This is a follow-up to my previous question, Integrating ggplot2 with user-defined stat_function(), which I answered myself yesterday. My current problem is that, in the following reproducible example, the lines that are supposed to plot the components of the data's mixture distribution neither appear in the expected places nor have the expected shape (see the red lines at y = 0 in the second figure).
Complete reproducible example:
```r
library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)

NUM_COMPONENTS <- 2

set.seed(12345) # for reproducibility

data(diamonds, package = 'ggplot2') # use built-in data
myData <- diamonds$price

# extract 'k' components from the mixed distribution 'myData'
mix.info <- normalmixEM(myData, k = NUM_COMPONENTS,
                        maxit = 100, epsilon = 0.01)
summary(mix.info)

numComponents <- length(mix.info$sigma)
message("Extracted number of component distributions: ",
        numComponents)

# weighted density of one component: lambda_i * N(x; mu_i, sigma_i)
calc.components <- function(x, mix, comp.number) {
  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

g <- ggplot(data.frame(x = myData)) +
  scale_fill_continuous("Count", low = "#56B1F7", high = "#132B43") +
  scale_x_log10("Diamond Price [log10]",
                breaks = trans_breaks("log10", function(x) 10^x),
                labels = prettyNum) +
  scale_y_continuous("Count") +
  # binwidth is in log10 units because of scale_x_log10()
  geom_histogram(aes(x = x, fill = 0.01 * after_stat(density)),
                 binwidth = 0.01)
print(g)

# we could select the needed number of colors randomly:
#DISTRIB_COLORS <- sample(colors(), numComponents)
# or, better, use a palette with more color differentiation
# (brewer.pal() returns at least 3 colors, so trim to numComponents):
DISTRIB_COLORS <- brewer.pal(max(3, numComponents), "Set1")[seq_len(numComponents)]

distComps <- lapply(seq_len(numComponents), function(i)
  stat_function(fun = calc.components,
                args = list(mix = mix.info, comp.number = i),
                geom = "line",  # use alpha = .5 for "polygon"
                linewidth = 1,  # ggplot2 >= 3.4.0; older versions use size = 1
                color = "red")) # DISTRIB_COLORS[i]
print(g + distComps)
```
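For reference, calc.components() returns lambda_i * dnorm(x, mu_i, sigma_i), a probability density on the original price scale, so each curve should integrate to its weight lambda_i and peak at lambda_i / (sigma_i * sqrt(2 * pi)), while the histogram's y-axis shows counts. The base-R sketch below checks this; the parameter values are hypothetical stand-ins for the normalmixEM() fit, not the values fitted to the diamonds data.

```r
# Same helper as in the question: weighted density of one component,
# lambda_i * N(x; mu_i, sigma_i), evaluated on the original (price) scale
calc.components <- function(x, mix, comp.number) {
  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

# Hypothetical parameters standing in for the normalmixEM() result
mix <- list(lambda = c(0.7, 0.3), mu = c(1500, 9000), sigma = c(800, 3000))

# Area under each component curve; integrate over mu +/- 10 sd rather than
# (-Inf, Inf), since integrate() can miss a narrow peak on an infinite range
areas <- sapply(seq_along(mix$lambda), function(i)
  integrate(calc.components,
            lower = mix$mu[i] - 10 * mix$sigma[i],
            upper = mix$mu[i] + 10 * mix$sigma[i],
            mix = mix, comp.number = i)$value)
print(areas)

# Peak height of each curve, evaluated at its mean
peaks <- sapply(seq_along(mix$lambda), function(i)
  calc.components(mix$mu[i], mix, i))
print(peaks)
```

With spreads in the hundreds or thousands of dollars, the peaks are far below 1, which is tiny next to per-bin counts from tens of thousands of observations.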
UPDATE: Just a quick note on my efforts. I have also tried several other options, including switching the plot's x-axis back to a normal (linear) scale and log-transforming the original data values in the histogram layer instead, like this: geom_histogram(aes(x = log10(x), fill = ..count..), binwidth = 0.01), but the end result remains the same. Regarding my first comment, I realized that the transformation I mentioned there is not needed as long as I reference the computed ..count.. variable.
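Related to the log10 attempt: a density curve lines up with a count histogram only if both use the same x variable and the same y units. If price X has density f, then U = log10(X) has density g(u) = f(10^u) * 10^u * ln(10), and a histogram of U with bin width w has an expected count of roughly n * w * g(u) per bin. The base-R sketch below shows that conversion for one component; the helper component.counts.log10() and the parameter values are hypothetical, not part of the question's code.

```r
# Hypothetical helper: expected histogram count per bin for mixture component i
# when the histogram is drawn over u = log10(price) with bin width 'binwidth'
component.counts.log10 <- function(u, mix, comp.number, n, binwidth) {
  x <- 10^u                          # back to the price scale
  f <- mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
  g <- f * x * log(10)               # Jacobian |dx/du| = 10^u * ln(10)
  n * binwidth * g                   # density -> expected count per bin
}

# Hypothetical parameters standing in for the normalmixEM() result
mix <- list(lambda = c(0.7, 0.3), mu = c(1500, 9000), sigma = c(800, 3000))
n <- 53940        # e.g. length(myData) for the diamonds prices
binwidth <- 0.01  # same bin width as in geom_histogram()

# Expected counts for component 1 at a few points on the log10 axis
u <- c(2.5, 3, 3.2, 3.5, 4)
print(round(component.counts.log10(u, mix, 1, n, binwidth), 1))

# Sanity check: summing the expected counts over all bins recovers
# n * lambda_1 * P(X > 0), i.e. the positive-price part of component 1
u.grid <- seq(-2, 6, by = binwidth)
total <- sum(component.counts.log10(u.grid, mix, 1, n, binwidth))
expected <- n * mix$lambda[1] *
  pnorm(0, mix$mu[1], mix$sigma[1], lower.tail = FALSE)
print(c(total = total, expected = expected))
```

A normal component on the price scale puts some mass below zero, which a log10 axis cannot show; that is why the check compares against P(X > 0) rather than lambda_1 alone.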
UPDATE 2: Changed the color of the lines produced by stat_function() to red, to make the problem easier to see.