Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
489 views
in Technique[技术] by (71.8m points)

r - Plotting cumulative counts in ggplot2

There are some posts about plotting cumulative densities in ggplot. I'm currently using the accepted answer from Easier way to plot the cumulative frequency distribution in ggplot? for plotting my cumulative counts. But this solution involves pre-calculating the values beforehand.

Here I'm looking for a pure ggplot solution. Let's show what I have so far:

x <- data.frame(A=replicate(200,sample(c("a","b","c"),1)),X=rnorm(200))

ggplot's stat_ecdf

I can use ggplot's stat_ecdf, but it only plots cumulative densities:

ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y..),stat="ecdf")

enter image description here

I'd like to do something like the following, but it doesn't work:

ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y.. * ..count..),stat="ecdf")

cumsum and stat_bin

I found an idea about using cumsum and stat_bin:

ggplot(x,aes(x=X,color=A)) + stat_bin(aes(y=cumsum(..count..)),geom="step")

enter image description here

But as you can see, the next color doesn't start at y=0, but where the last color ended.

What I ask for

What I'd like to have from best to worst:

  1. Ideally a simple fix to the not working

    ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y.. * ..count..),stat="ecdf")
    
  2. A more complicated way to use stat_ecdf with counts.

  3. Last resort would be to use the cumsum approach, since it gives worse (binned) results.
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This will not solve directly problem with grouping of lines but it will be workaround.

You can add three calls to stat_bin() where you subset your data according to A levels.

ggplot(x,aes(x=X,color=A)) +
  stat_bin(data=subset(x,A=="a"),aes(y=cumsum(..count..)),geom="step")+
  stat_bin(data=subset(x,A=="b"),aes(y=cumsum(..count..)),geom="step")+
  stat_bin(data=subset(x,A=="c"),aes(y=cumsum(..count..)),geom="step")

enter image description here

UPDATE - solution using geom_step()

Another possibility is to multiply values of ..y.. with number of observations in each level. To get this number of observations at this moment only way I found is to precalculate them before plotting and add them to original data frame. I named this column len. Then in geom_step() inside aes() you should define that you will use variable len=len and then define y values as y=..y.. * len.

set.seed(123)
x <- data.frame(A=replicate(200,sample(c("a","b","c"),1)),X=rnorm(200))
library(plyr)
df <- ddply(x,.(A),transform,len=length(X))
ggplot(df,aes(x=X,color=A)) + geom_step(aes(len=len,y=..y.. * len),stat="ecdf") 

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...