Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - A histogram with individual proportions on one Y-axis and cumulative proportion on another

I am looking for a solution in R to make a chart like the one shown here (it was made in Excel): [image]

I can make a histogram using the code below:

library(ggplot2)
library(scales)  # provides percent_format()

ggplot(data = TestData, aes(x = QP1)) +
  geom_histogram(aes(y = (..count..) / sum(..count..)),
                 binwidth = 0.1, fill = "lightblue", color = "black") +
  scale_x_continuous(breaks = seq(8, 10, 0.1)) +
  scale_y_continuous(labels = percent_format(), breaks = seq(0, 1, 0.05)) +
  xlab("QP1")

[image: resulting histogram]

but I could not add a secondary axis with a line plot overlaid on the histogram. I found several examples on this site asking similar questions, but I still had difficulty truly understanding those solutions.

I need help with:

  1. Recommended code for the cumulative line plot. I would appreciate some guidance explaining the mechanism behind it.
  2. Controlling the bin width nicely and adding statistics such as mean and standard deviation to the chart. I don't mind using dplyr or doing some additional work.

Thanks.

Edit: I could achieve what I initially wanted. Enhancing it further, for example by adding data labels, is proving a challenge:

    ggplot(data = TestData, aes(x = QP1, y = after_stat(count / sum(count)))) +
      geom_histogram(fill = "darkorange", color = "black", binwidth = 0.1) +
      stat_bin(aes(y = after_stat(cumsum(count / sum(count)) * 0.5)),
               geom = "line", colour = "dodgerblue", binwidth = 0.1) +
      stat_bin(aes(label = after_stat(scales::percent(count / sum(count)))),
               geom = "text", colour = "blue", binwidth = 0.1, vjust = 1) +
      stat_bin(aes(label = after_stat(scales::percent(cumsum(count / sum(count))))),
               geom = "text", colour = "blue", binwidth = 0.1, vjust = -4) +
      scale_y_continuous(
        labels = scales::percent, breaks = seq(0, 5, .1),
        name = "Proportion",
        sec.axis = sec_axis(~ .x * 2,
                            name = "Cumulative Proportion",
                            labels = scales::percent, breaks = seq(0, 1, .2)))

[image: resulting chart with data labels]

The data labels are added and show the correct numbers, but the cumulative labels need to be positioned according to sec.axis. How can I do that? If I transform the values by adding or dividing, the label value changes, not the position. Please suggest.

question from:https://stackoverflow.com/questions/65830791/a-histogram-with-individual-proportions-on-one-y-axis-and-cumulative-proportion


1 Reply


So a couple of things about secondary axes:

  1. You must transform input data yourself to fit it on the primary axis.
  2. You must give the inverse transform to the trans argument of the secondary axis (renamed to transform in ggplot2 3.5.0).

In the code below we achieve (1) by setting y = after_stat(cumsum(count / sum(count)) * 0.1). The after_stat() part replaces the older ..variable.. syntax. cumsum() calculates the cumulative sum of the proportions, giving the cumulative proportions. The * 0.1 divides the cumulative data by 10 so that it fits on the primary axis, which achieves (1). Then, to achieve (2), give the secondary axis ~ .x * 10 to scale the numbers on the axis itself back up. You can change these scaling factors depending on the plot, but be sure to change them in both places.
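The round trip between the two axes can be checked with plain numbers before any plotting. The sketch below uses made-up bin proportions (prop) and hypothetical variable names; only the factor 0.1 and its inverse come from the answer. The last line shows one possible way to pick a factor, which is an assumption about what looks good and not part of the original answer.

```r
# Made-up bin proportions, purely for illustration (they sum to 1)
prop <- c(0.05, 0.20, 0.40, 0.25, 0.10)
cum_prop <- cumsum(prop)         # cumulative proportions, ending at 1

k <- 0.1                         # scaling factor used in the answer
plotted <- cum_prop * k          # step (1): squeeze the data onto the primary axis
relabelled <- plotted * (1 / k)  # step (2): inverse transform, i.e. ~ .x * 10

# One possible alternative factor (an assumption): let the line top out
# at the height of the tallest bar
k_alt <- max(prop) / max(cum_prop)
```

Because relabelled equals cum_prop again, the secondary axis shows the true cumulative proportions even though the line is drawn at one tenth of their height.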

library(ggplot2)

df <- data.frame(
  x = rnorm(100)
)

ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 fill = "darkorange")  +
  stat_bin(aes(y = after_stat(cumsum(count / sum(count)) * 0.1)),
           geom = "line", colour = "dodgerblue") +
  # Set secondary axis in y scale
  scale_y_continuous(
    labels = scales::percent,
    name = "Proportion",
    sec.axis = sec_axis(~ .x * 10, 
                        name = "Cumulative Proportion",
                        labels = scales::percent)
  ) +
  # For pretty colours
  theme(
    axis.line.y.left = element_line(colour = "darkorange"),
    axis.text.y.left = element_text(colour = "darkorange"),
    axis.ticks.y.left = element_line(colour = "darkorange"),
    axis.title.y.left = element_text(colour = "darkorange"),
    axis.line.y.right = element_line(colour = "dodgerblue"),
    axis.text.y.right = element_text(colour = "dodgerblue"),
    axis.ticks.y.right = element_line(colour = "dodgerblue"),
    axis.title.y.right = element_text(colour = "dodgerblue")
  )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2021-01-21 by the reprex package (v0.3.0)
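The question's edit also asks how to position the cumulative labels according to the secondary axis. One possible approach, sketched here and not taken from the original answer, is to give the text layer the same scaled y as the line while leaving the label text unscaled. The data frame df, the factor k, the seed and the binwidth of 0.5 are all assumptions chosen for illustration.

```r
library(ggplot2)

set.seed(42)                      # assumed seed, only for reproducibility
df <- data.frame(x = rnorm(100))  # stand-in for the asker's TestData$QP1
k <- 0.5                          # assumed factor: primary = secondary * k

p <- ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 binwidth = 0.5, fill = "darkorange", colour = "black") +
  # Cumulative line, scaled down by k to fit on the primary axis
  stat_bin(aes(y = after_stat(cumsum(count / sum(count)) * k)),
           geom = "line", colour = "dodgerblue", binwidth = 0.5) +
  # Cumulative labels: position uses the scaled value, text uses the true value
  stat_bin(aes(y = after_stat(cumsum(count / sum(count)) * k),
               label = after_stat(scales::percent(cumsum(count / sum(count))))),
           geom = "text", colour = "dodgerblue", binwidth = 0.5, vjust = -0.5) +
  scale_y_continuous(
    labels = scales::percent, name = "Proportion",
    sec.axis = sec_axis(~ .x / k, name = "Cumulative Proportion",
                        labels = scales::percent)
  )
```

Printing p draws the plot. The label values stay correct because only the y aesthetic is multiplied by k, which is then undone on the secondary axis by ~ .x / k.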

EDIT:

With regard to sec_axis(~ .x * 10, ...): this is called 'lambda syntax', where you write a one-sided formula (only the right-hand side is defined) that is converted to a function by rlang::as_function(). The .x is a placeholder for the input data, so ~ .x * 10 can be read as function(x) {x * 10}. This does not work everywhere in R, but many tidyverse packages accept this notation at various points.
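As a small illustration of that equivalence, the formula can be converted by hand with rlang::as_function() and compared against an explicit function. The names scale_up and scale_up_explicit are hypothetical and only used for this sketch.

```r
library(rlang)

# One-sided formula in lambda syntax, as passed to sec_axis() in the answer
scale_up <- as_function(~ .x * 10)

# The explicit function it can be read as
scale_up_explicit <- function(x) x * 10

# Both calls give the same values
scale_up(c(0.01, 0.05, 0.1))
scale_up_explicit(c(0.01, 0.05, 0.1))
```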

The after_stat() function is the newer notation for ..variable.., such that after_stat(count / sum(count)) is the same as the (..count..) / sum(..count..) you use in your example. The difference is that you don't need to wrap every variable in ..'s, and it is generally more flexible. The after_stat() function causes whatever is inside it to be evaluated after the stat layer has computed the stats. The count variable is not an aesthetic you define; it is a computed variable that the stat layer produces, so we need after_stat() to do something with that variable.
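To see the computed variables for yourself, one option is to build the plot and inspect the layer's data with ggplot2::layer_data(). This is a minimal sketch with assumed example data, seed and binwidth, and the object names p_hist and binned are hypothetical.

```r
library(ggplot2)

set.seed(1)                       # assumed seed, only for reproducibility
df <- data.frame(x = rnorm(100))  # assumed example data

p_hist <- ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 0.5)

# Data of the first layer after the stat has run: `count` is the computed
# variable, `y` is the result of the after_stat() expression
binned <- layer_data(p_hist, 1)
binned[, c("x", "count", "y")]
```

In this sketch the y column equals count / sum(count) for each bin, so the bar heights add up to 1.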

