I think the answer to your previous question is wrong and misleading, although, to be fair, you did not ask the question very clearly.
I think what you are trying to do is compare the binomial distribution with its Normal approximation. The binomial distribution describes the number of successes you get if you do something N times independently, where the chance of each attempt being a success is p. Its mean is Np and its standard deviation is sqrt(Np(1-p)), so a Normal distribution with that mean and standard deviation can be used to approximate it.
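As a quick numerical illustration of those two formulas (a minimal sketch in base R; N = 100 and p = 0.1 are just example values, and the variable names are my own), you can compute the mean and standard deviation and put the exact binomial probabilities next to the Normal density at a few points:

```r
# Example values only; any N and p would do
N <- 100
p <- 0.1

mu    <- N * p                  # mean of the binomial: 10
sigma <- sqrt(N * p * (1 - p))  # standard deviation: 3

# Exact binomial probabilities next to the Normal density at the same points
k <- 5:15
comparison <- data.frame(k        = k,
                         binomial = dbinom(k, size = N, prob = p),
                         normal   = dnorm(k, mean = mu, sd = sigma))
print(round(comparison, 4))
```

The two columns should be close but not identical, which is all the approximation promises.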
One way to compare them using ggplot would be like this...
library(tidyverse)
trials <- 100   #i.e. N in the explanation above
prob <- 0.1     #i.e. p in the explanation above
sims <- 100000  #the number of simulations you want (1e5 in your previous question)
df <- tibble(n = 1:sims,
             normal = sort(rnorm(sims,                                   #no of variates
                                 trials * prob,                          #mean
                                 sqrt(trials * prob * (1 - prob)))),     #standard deviation
             binomial = sort(rbinom(sims,                                #no of variates
                                    trials,                              #number of trials
                                    prob)))                              #probability of success
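Before plotting, it can be worth checking that the simulated variates behave as the formulas predict. The following is only a sketch in base R: it redraws the samples itself so that it runs on its own, and the seed and the names x_binom and x_norm are my own choices.

```r
set.seed(1)  # arbitrary seed, only so the check is reproducible
trials <- 100
prob   <- 0.1
sims   <- 100000

x_binom <- rbinom(sims, trials, prob)
x_norm  <- rnorm(sims, trials * prob, sqrt(trials * prob * (1 - prob)))

# Both sample means should be close to 10 and both sample sds close to 3
print(c(mean_binomial = mean(x_binom), mean_normal = mean(x_norm)))
print(c(sd_binomial   = sd(x_binom),   sd_normal   = sd(x_norm)))
```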
Then, to compare the (discrete) histogram of the binomial distribution (in red) with the (continuous) density of the Normal approximation (in blue), you can do
df %>% ggplot() +
  geom_density(aes(x = normal),
               alpha = 0.5,
               fill = "blue") +
  geom_histogram(aes(x = binomial,
                     y = after_stat(density)), #normalises scale to sum to 1; after_stat() replaces the deprecated stat() in ggplot2 >= 3.3.0
                 alpha = 0.5,
                 fill = "red",
                 binwidth = 1)
And to compare the cumulative distributions (taking advantage of the fact that we have sorted the variates in our dataframe)...
df %>% ggplot(aes(y = n/sims)) +
  geom_line(aes(x = normal),
            colour = "blue") +
  geom_line(aes(x = binomial),
            colour = "red")
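If you would rather compare the cumulative distributions exactly instead of by simulation, something along these lines should work. It is a sketch in base R using pbinom() and pnorm(); the continuity correction (evaluating the Normal at k + 0.5) is my own addition rather than something from your question, and the variable names are illustrative.

```r
trials <- 100
prob   <- 0.1
mu     <- trials * prob
sigma  <- sqrt(trials * prob * (1 - prob))

k <- 0:trials
exact_cdf  <- pbinom(k, trials, prob)    # P(X <= k), exact binomial
normal_cdf <- pnorm(k, mu, sigma)        # plain Normal approximation
normal_cc  <- pnorm(k + 0.5, mu, sigma)  # Normal with a continuity correction

# Largest absolute gap between the exact CDF and each approximation
max_plain <- max(abs(exact_cdf - normal_cdf))
max_cc    <- max(abs(exact_cdf - normal_cc))
print(c(plain = max_plain, corrected = max_cc))
```

For these values the corrected version should sit noticeably closer to the exact binomial, because the binomial is discrete and the uncorrected Normal curve cuts through the middle of each step.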
I hope this helps!