I have the following sample data.
It looks like the right half of a normal distribution.
Suppose the data is the read time of a blog article.
What I want to do is find out how each blog article is performing in terms of read time.
With a regular normal-looking distribution, I'd find the mean and std of the population, then, given a sample (a blog), find the mean read time of that blog and compute the p-value of the sample mean.
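For reference, the normal-case procedure I have in mind could be sketched roughly like this. It is only a minimal sketch: the function name `sample_mean_p_value` and the numbers are made up for illustration, and it assumes a two-sided z-test with the population std treated as known.

```python
import numpy as np
from scipy.stats import norm

def sample_mean_p_value(population, sample):
    """Two-sided z-test of the sample mean against the population mean."""
    population = np.asarray(population, dtype=float)
    sample = np.asarray(sample, dtype=float)
    mu = population.mean()
    sigma = population.std()
    # Standard error of the mean for a sample of this size
    se = sigma / np.sqrt(sample.size)
    z = (sample.mean() - mu) / se
    # Two-sided p-value from the standard normal distribution
    return z, 2 * norm.sf(abs(z))

# Hypothetical numbers, for illustration only (not the data below)
population = [10.0, 12.0, 14.0, 16.0, 18.0]
blog = [15.0, 17.0, 19.0]
z, p = sample_mean_p_value(population, blog)
print(z, p)
```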
But since the distribution is not normal-like, what should I do?
Below is the data.
tds_ = [28.965,
12.172,
17.042,
36.98,
20.323,
3.481,
18.43,
5.638,
20.763,
48.104,
8.015,
21.2,
48.122,
32.51,
16.87,
10.402,
7.896,
3.827,
0.078,
18.63,
42.428,
0.975,
11.392,
15.937,
4.531,
44.635,
10.457,
53.821,
43.046,
39.572,
6.31,
52.039,
36.726,
19.67,
43.719,
9.421,
2.798,
20.013,
32.888,
43.622,
13.093,
38.688,
57.199,
13.627,
42.571,
34.076,
18.812,
49.251,
57.412,
35.089,
8.093,
15.141,
58.05,
17.936,
4.673,
5.475,
11.731,
46.649,
12.403,
6.442,
22.542,
44.069,
7.893,
26.484,
4.199,
6.575,
3.209,
32.125,
40.202,
37.918,
27.567,
22.634,
43.355,
44.481,
17.854,
29.538,
2.39,
16.52,
34.321,
8.003,
28.034,
20.963,
16.509,
26.279,
13.541,
22.654,
32.074,
9.474,
1.054,
11.612,
2.108,
19.015,
0.864,
7.577,
9.927,
7.295,
6.689,
13.908,
2.063,
31.57]
Here I'm showing the distribution together with a fitted normal curve.
(From the sample, I create a negated copy and append it to the sample.)
Then it looks like a normal distribution.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm
fig, ax = plt.subplots(1, 1)
tds_half = pd.Series(tds_)
tds_inverse = tds_half * -1
tds = np.append(tds_half, tds_inverse)
mean = np.mean(tds)
std = np.std(tds)
# Frozen normal distribution fitted to the mirrored data
rv = norm(mean, std)
x = np.linspace(rv.ppf(0.01), rv.ppf(0.99), 100)
ax.plot(x, rv.pdf(x), 'r-', lw=5, alpha=0.6, label='norm pdf')
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')
# Histogram of the mirrored data, normalized to a density
ax.hist(tds, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
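One way to look at the mirroring step: since the mirrored data has mean 0, its std is just the root mean square of the original values, which is the scale of a half-normal distribution located at 0. Below is a minimal sketch of that idea using `scipy.stats.halfnorm`. Whether a half-normal is actually appropriate for this data is an assumption on my part, and the threshold of 30.0 is an arbitrary example value.

```python
import numpy as np
from scipy.stats import halfnorm

# First few values of tds_ from above, copied so the block runs on its own
values = np.array([28.965, 12.172, 17.042, 36.98, 20.323])

# Mirroring around zero forces the mean to 0, so the std of the mirrored
# data equals the root mean square of the original values
mirrored = np.append(values, -values)
scale = np.sqrt(np.mean(values ** 2))

# Half-normal with that scale; sf gives P(X > t) for a read time t
rv = halfnorm(loc=0, scale=scale)
upper_tail = rv.sf(30.0)  # 30.0 is an arbitrary example threshold
print(scale, np.std(mirrored), upper_tail)
```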
question from:
https://stackoverflow.com/questions/65878569/whats-the-proper-distribution-for-the-following-data