Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - How to plot many probability density functions (pdfs) without sharp edges?

I have trouble plotting continuous distributions without sharp edges in ggplot2. I need to show two of them on one plot. The solution does not have to use ggplot2.

I am aware that the number of data points directly influences the smoothness of the line, but that does not seem to be the cause here. Below is some sample data (from dput):

sample.data <- list(
  beta.data = structure(
    list(
      cluster = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
      beta.density = c(0, 3.42273368363934e-43, 8.42987148403021e-29,
                       2.04764468657484e-20, 1.69485562831516e-14,
                       6.07999638837842e-10, 2.88180370232676e-06,
                       0.00314064636750876, 0.954118897015866, 0,
                       0, 3.80101893822358e-36, 6.43342582657081e-22,
                       6.82956252277493e-14, 1.75291058931833e-08,
                       0.000131874335695378, 0.0750918340641428,
                       3.72532418759802, 5.05242078519544, 0),
      pr = c(0, 0.111111111111111, 0.222222222222222, 0.333333333333333,
             0.444444444444444, 0.555555555555556, 0.666666666666667,
             0.777777777777778, 0.888888888888889, 1,
             0, 0.111111111111111, 0.222222222222222, 0.333333333333333,
             0.444444444444444, 0.555555555555556, 0.666666666666667,
             0.777777777777778, 0.888888888888889, 1)
    ),
    row.names = c(NA, -20L), class = "data.frame"
  ),
  beta.params = structure(
    list(
      cluster = 1:2, a = c(49, 50), b = c(2, 10),
      ni.beta = c(0.961, 0.833), sd.beta = c(0.00072, 0.00228)
    ),
    row.names = c(NA, -2L), class = "data.frame"
  )
)

Previously I used geom_col, but it discretizes the values, so I switched to geom_area:

library(ggplot2)

ggplot(sample.data$beta.data, aes(x = pr, y = beta.density)) +
  geom_area(stat = "function",
            fun = dbeta, 
            args = list(shape1 = sample.data$beta.params[1,"a"], shape2 = sample.data$beta.params[1,"b"]),
            fill = "#F1C40F",
            colour = "black",
            alpha = 0.7) +
  geom_area(stat = "function",
            fun=dbeta, 
            args = list(shape1 = sample.data$beta.params[2,"a"], shape2 = sample.data$beta.params[2,"b"]),
            fill = "#3498DB",
            colour = "black",
            alpha = 0.7)

Output picture with sharp edges

The data shown above has 10 points, but with 1000 points the plot looks almost the same. That is not the case in the following base R plot, where even 100 points look fine:

p <- seq(0, 1, length.out = 100)
plot(p, dbeta(p, 50, 10), ylab = "Density", type = "l", col = 4, lwd = 2)

Here I am attaching code to simulate the data. Oh, and these troublesome beta parameters were a = 49 and b = 2.

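len <- 100
p <- seq(0, 1, length.out = len)
# One block of rows per cluster: cluster id, density value, evaluation point.
# The second block uses cluster id 2 and the Beta(49, 2) parameters mentioned above.
df <- data.frame(rbind(cbind("cl" = rep(1, times = length(p)), "beta" = dbeta(p, 50, 10), "p" = p),
                       cbind("cl" = rep(2, times = length(p)), "beta" = dbeta(p, 49, 2), "p" = p)))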

Do you have any ideas?

EDIT: Here "pdfs" stands for probability density functions, which is why I have not used "pdf" as a tag. My apologies for the confusion! That said, when I tried to print the graphic to a PDF file, the result was poor as well (sharp edges). In the end, the output format shouldn't matter: I want to see smooth lines whatever I do (within reason).

EDIT 2: A smooth curve is achievable, because the following works:

library(mosaic)
theme_set(theme_bw())
xpbeta(c(0.7, 0.90), shape1 = 49, shape2 = 2)

It produces a nice, smooth beta distribution with parameters (49, 2). But then again, I need to show two distributions in one chart.

question from:https://stackoverflow.com/questions/65947332/how-to-plot-many-probability-density-functions-pdfs-without-sharp-edges


1 Reply


I have found an answer. It still needs some work (for example transparency/alpha, which I couldn't figure out), but in general this is what I meant. Code:

library(mosaic)
plotDist('beta', params=list(49,2), kind='density', type = "h", col = "#3498DB", xlim = c(0,1))
plotDist('beta', params=list(50, 10), kind='density', type = "h", col = "#F1C40F", add = TRUE)
plotDist('beta', params=list(49,2), kind='density', add = TRUE, col = "black")
plotDist('beta', params=list(50, 10), kind='density', add = TRUE, col = "black")


We can add as many distributions as we want using the "add" parameter. The parameter type = "h" draws a filled distribution; without it, only the line is visible. In my answer, I draw the two filled distributions first and then the two outlines on top. I would be really happy if someone could show a better answer, though.
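
On the transparency point, one possible workaround is to leave mosaic aside and draw the filled curves with base graphics. The following is only a minimal sketch, not the plotDist approach above: it assumes a grid of 1001 evaluation points (an arbitrary choice) and uses adjustcolor() from grDevices to make the fill colours semi-transparent.

```r
# Evaluate both densities on a fine grid (1001 points is an arbitrary choice).
p  <- seq(0, 1, length.out = 1001)
d1 <- dbeta(p, 49, 2)
d2 <- dbeta(p, 50, 10)

# Empty plot with axes large enough for both curves.
plot(p, d1, type = "n", xlab = "p", ylab = "Density",
     ylim = c(0, max(d1, d2)))

# Both densities are 0 at p = 0 and p = 1, so each polygon closes
# along the baseline. alpha.f = 0.5 makes the fill half transparent.
polygon(p, d1, col = adjustcolor("#3498DB", alpha.f = 0.5), border = "black")
polygon(p, d2, col = adjustcolor("#F1C40F", alpha.f = 0.5), border = "black")
```

Because the fills are semi-transparent, the region where the two distributions overlap stays visible.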

EDIT: I think I found my perfect answer!

Here is the code:

library(ggformula)
theme_set(theme_bw())
gf_dist("beta", shape1 = 49, shape2 = 2, geom = "area", alpha = 0.5, fill = "#F1C40F")  %>%
gf_dist("beta", shape1 = 49, shape2 = 2) %>% 
gf_dist("beta", shape1 = 50, shape2 = 10, geom = "area", alpha = 0.5, fill = "#3498DB") %>%
gf_dist("beta", shape1 = 50, shape2 = 10)

It is much faster than the previous code, the alpha parameter is straightforward, and it is relatively easy to combine many plots! Because of the transparency, you can nicely see the overlap of both distributions.
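
For anyone who wants to stay in plain ggplot2: the sharp edges in the question most likely come from stat_function, which evaluates the function at only 101 points by default (its n argument), no matter how many rows the data has. Beta(49, 2) has a narrow peak near 0.98, so a step of 0.01 undersamples it. Passing a larger n should help; alternatively, the densities can be evaluated on a fine grid by hand, as in the minimal sketch below. The grid size of 1001 and the cluster labels are assumptions made for illustration.

```r
library(ggplot2)

# Evaluate both densities on a fine grid (1001 points is an arbitrary choice).
p <- seq(0, 1, length.out = 1001)
dens <- rbind(
  data.frame(cluster = "Beta(49, 2)",  pr = p, beta.density = dbeta(p, 49, 2)),
  data.frame(cluster = "Beta(50, 10)", pr = p, beta.density = dbeta(p, 50, 10))
)

plt <- ggplot(dens, aes(x = pr, y = beta.density, fill = cluster)) +
  # position = "identity" overlays the areas; the default would stack them.
  geom_area(position = "identity", colour = "black", alpha = 0.7) +
  scale_fill_manual(values = c("Beta(49, 2)"  = "#F1C40F",
                               "Beta(50, 10)" = "#3498DB")) +
  theme_bw()

print(plt)
```

Further distributions can be added by appending more rows to dens and more entries to the colour vector.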
