Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
488 views
in Technique[技术] by (71.8m points)

r - adding text to ggplot geom_jitter points that match a condition

How can I add text to points rendered with geom_jittered to label them? geom_text will not work because I don't know the coordinates of the jittered dots. Could you capture the position of the jittered points so I can pass to geom_text?

My practical usage would be to plot a boxplot with the geom_jitter over it to show the data distribution and I would like to label the outliers dots or the ones that match certain condition (for example the lower 10% for the values used for color the plots).

One solution would be to capture the xy positions of the jittered plots and use it later in another layer, is that possible?

[update]

From Joran answer, a solution would be to calculate the jittered values with the jitter function from the base package, add them to a data frame and use them with geom_point. For filtering he used ddply to have a filter column (a logic vector) and use it for subsetting the data in geom_text.

He asked for a minimal dataset. I just modified his example (a unique identifier in the label colum)

dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
                      lab=paste('id_',1:300,sep='')) 

This is the result of joran example with my data and lowering the display of ids to the lowest 1% boxplot with jitter dots and label in lower 1% values

And this is a modification of the code to have colors by another variable and displaying some values of this variable (the lowest 1% for each group):

library("ggplot2")
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
                          lab=paste('id_',1:300,sep=''),quality= rnorm(300))

#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))

#Create an indicator variable that picks out those
# obs that are in lowest 1% by x
datJit <- ddply(datJit,.(x),.fun=function(g){
               g$grp <- g$y <= quantile(g$y,0.01);
               g$top_q <- g$qual <= quantile(g$qual,0.01);
               g})

#Create a boxplot, overlay the jittered points and
# label the bottom 1% points
ggplot(dat,aes(x=x,y=y)) +
  geom_boxplot() +
  geom_point(data=datJit,aes(x=xj,colour=quality)) +
  geom_text(data=subset(datJit,grp),aes(x=xj,label=lab)) +
  geom_text(data=subset(datJit,top_q),aes(x=xj,label=sprintf("%0.2f",quality)))

boxplot with jitter dots and label in lower 1% values

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your question isn't completely clear; for example, you mention labeling points at one point but also mention coloring points, so I'm not sure which you really mean, or perhaps both. A reproducible example would be very helpful. But using a little guesswork on my part, the following code does what I think you're describing:

#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
        lab=rep('label',300))

#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))

#Create an indicator variable that picks out those 
# obs that are in lowest 10% by x
datJit <- ddply(datJit,.(x),.fun=function(g){
             g$grp <- g$y <= quantile(g$y,0.1); g})

#Create a boxplot, overlay the jittered points and 
# label the bottom 10% points
ggplot(dat,aes(x=x,y=y)) + 
    geom_boxplot() + 
    geom_point(data=datJit,aes(x=xj)) + 
    geom_text(data=subset(datJit,grp),aes(x=xj,label=lab))        

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...