Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Condition a ..count.. summation on the faceting variable

I'm trying to annotate a bar chart with the percentage of observations falling into each bucket, within a facet. This is closely related to the question "Show % instead of counts in charts of categorical variables", but faceting introduces a wrinkle. The answer to that question is to use stat_bin with the text geom and construct the label like so:

 stat_bin(geom = "text",
          aes(x = bins,
              y = ..count..,
              label = paste(round(100 * (..count.. / sum(..count..)), 1), "%", sep = "")))

This works fine for an un-faceted plot. With facets, however, sum(..count..) sums over the entire collection of observations without regard for the facets. The plot below illustrates the issue: the percentages do not sum to 100% within a panel.

[Figure: faceted bar chart whose percentage labels do not sum to 100% within each panel]

Here is the actual code for the figure above:

 g.invite.distro <- ggplot(data = df.exp) +
   geom_bar(aes(x = invite_bins)) +
   facet_wrap(~cat1, ncol = 3) +
   stat_bin(geom = "text",
            aes(x = invite_bins,
                y = ..count..,
                label = paste(round(100 * (..count.. / sum(..count..)), 1), "%", sep = "")),
            vjust = -1, size = 3) +
   theme_bw() +
   scale_y_continuous(limits = c(0, 3000))

UPDATE: As requested, here's a small example reproducing the issue:

df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
ggplot(data = df) + geom_bar(aes(x = x)) +
 stat_bin(geom = "text", aes(
         x = x,
         y = ..count.., label = ..count../sum(..count..)), vjust = -1) +
 facet_wrap(~f)
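
To make the mismatch concrete, here is a rough base-R sketch of the arithmetic behind the labels in the small example. It only mimics what the plot computes; the column names global and within are made up for illustration.

```r
df <- data.frame(x = c('a', 'a', 'b', 'b'), f = c('c', 'd', 'd', 'd'))

# One row per (x, f) combination; drop empty combinations, which draw no bar
counts <- subset(as.data.frame(table(df)), Freq > 0)

# What sum(..count..) effectively does: divide by the total over all panels
counts$global <- counts$Freq / sum(counts$Freq)

# What is wanted: divide by the total within each level of f
counts$within <- counts$Freq / ave(counts$Freq, counts$f, FUN = sum)

print(counts)
```

The global column comes out as 0.25, 0.25, 0.5, which matches the labels in the faceted plot, while the within column sums to 1 inside each level of f.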


1 Reply


Update: geom_bar requires stat = "identity" here, because the counts are precomputed and mapped to y.

Sometimes it's easier to obtain summaries outside the call to ggplot.

df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))

# Load packages
library(ggplot2)
library(plyr)

# Obtain summary. 'Freq' is the count, 'pct' is the percent within each 'f'
m <- ddply(data.frame(table(df)), .(f), mutate, pct = round(Freq/sum(Freq) * 100, 1))

# Plot the data using the summary data frame
ggplot(data = m, aes(x = x, y = Freq)) + 
   geom_bar(stat = "identity", width = .7) +
   geom_text(aes(label = paste(pct, "%", sep = "")), vjust = -1, size = 3) +
   facet_wrap(~ f, ncol = 2) + theme_bw() +
   scale_y_continuous(limits = c(0, 1.2*max(m$Freq)))
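
If you would rather keep the computation inside ggplot, one possible sketch is to divide each count by its panel total, using the PANEL column that ggplot2 attaches to the computed data. This is not part of the answer above; it assumes ggplot2 3.3.0 or later, where after_stat() replaces the ..count.. notation and stat = "count" is the stat for a discrete x.

```r
library(ggplot2)

df <- data.frame(x = c('a', 'a', 'b', 'b'), f = c('c', 'd', 'd', 'd'))

p <- ggplot(data = df, aes(x = x)) +
  geom_bar(width = .7) +
  geom_text(
    stat = "count",
    # tapply() gives one total per panel; indexing by PANEL repeats that
    # total for every bar in the panel, so each label is a within-facet share
    aes(label = after_stat(
      paste0(round(100 * count / tapply(count, PANEL, sum)[PANEL], 1), "%")
    )),
    vjust = -1, size = 3
  ) +
  facet_wrap(~ f, ncol = 2) +
  theme_bw() +
  # Extra headroom so the labels above the tallest bars are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.2)))

print(p)
```

For the small example this should label the single bar in panel c with 100% and the two bars in panel d with 33.3% and 66.7%. Unlike the table()-based summary, no zero-height bar is labelled for combinations that do not occur.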
