Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
2.2k views
in Technique[技术] by (71.8m points)

ggplot2 - R: ggplot stacked bar chart with counts on y axis but percentage as label

I'm looking for a way to label a stacked bar chart with percentages while the y-axis shows the original count (using ggplot). Here is a MWE for the plot without labels:

library(ggplot2)
df <- as.data.frame(matrix(nrow = 7, ncol= 3,
                       data = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7",
                                "north", "north", "north", "north", "south", "south", "south",
                                "A", "B", "B", "C", "A", "A", "C"),
                      byrow = FALSE))

colnames(df) <- c("ID", "region", "species")

p <- ggplot(df, aes(x = region, fill = species))
p  + geom_bar()

I have a much larger table and R counts quite nicely the different species for every region. Now, I would like to show both, the original count value (preferably on the y-axis) and the percentage (as label) to compare proportions of species between regions.

I tried out many things using geom_text() but I think the main difference to other questions (e.g. this one) is that

  • I do not have a separate column for y values (they are just the counts of different species per region) and
  • I need the labels per region to sum up to 100% (since they are considered to represent seperate populations), not all labels of the entire plot.

Any help is much appreciated!!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As @Gregor mentioned, summarize the data separately and then feed the data summary to ggplot. In the code below, we use dplyr to create the summary on the fly:

library(dplyr)

ggplot(df %>% count(region, species) %>%    # Group by region and species, then count number in each group
         mutate(pct=n/sum(n),               # Calculate percent within each region
                ypos = cumsum(n) - 0.5*n),  # Calculate label positions
       aes(region, n, fill=species)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%"), y=ypos))

enter image description here

Update: With dplyr 0.5 and later, you no longer need to provide a y-value to center the text within each bar. Instead you can use position_stack(vjust=0.5):

ggplot(df %>% count(region, species) %>%    # Group by region and species, then count number in each group
         mutate(pct=n/sum(n)),              # Calculate percent within each region
       aes(region, n, fill=species)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%")), 
            position=position_stack(vjust=0.5))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...