Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - What's the proper distribution for the following data

I have the following sample data.
It looks like the right half of a normal distribution.

Suppose the data is the read time of a blog article. What I want to do is find out how each blog article is performing in terms of read time.

With a regular, normal-looking distribution, I would find the mean and standard deviation of the population; then, given a sample (one blog), I would find the mean read time of that blog and compute the p-value of the sample mean.
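The procedure described above, for the normal-looking case, could be sketched roughly as follows. This is only a minimal illustration of a two-sided z-test on a sample mean; the function name `sample_mean_p_value` and the numbers are hypothetical and are not part of the question's data.

```python
import numpy as np
from scipy.stats import norm


def sample_mean_p_value(population, sample):
    """Two-sided p-value of the sample mean under a normal approximation."""
    population = np.asarray(population, dtype=float)
    sample = np.asarray(sample, dtype=float)

    mu = population.mean()
    sigma = population.std()  # population standard deviation (ddof=0)

    # Standard error of the mean for a sample of this size.
    standard_error = sigma / np.sqrt(sample.size)
    z = (sample.mean() - mu) / standard_error

    # Probability of a |z| at least this large under the standard normal.
    return 2 * norm.sf(abs(z))


# Hypothetical read times, purely for illustration.
population_read_times = [10.0, 12.0, 14.0, 16.0, 18.0]
blog_read_times = [14.0, 14.0]
p = sample_mean_p_value(population_read_times, blog_read_times)
```

A small p-value would suggest that the blog's mean read time differs from the population mean, assuming the normal approximation is reasonable.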

But since the distribution does not look normal, what should I do?

Below is the data.

tds_ = [28.965,
 12.172,
 17.042,
 36.98,
 20.323,
 3.481,
 18.43,
 5.638,
 20.763,
 48.104,
 8.015,
 21.2,
 48.122,
 32.51,
 16.87,
 10.402,
 7.896,
 3.827,
 0.078,
 18.63,
 42.428,
 0.975,
 11.392,
 15.937,
 4.531,
 44.635,
 10.457,
 53.821,
 43.046,
 39.572,
 6.31,
 52.039,
 36.726,
 19.67,
 43.719,
 9.421,
 2.798,
 20.013,
 32.888,
 43.622,
 13.093,
 38.688,
 57.199,
 13.627,
 42.571,
 34.076,
 18.812,
 49.251,
 57.412,
 35.089,
 8.093,
 15.141,
 58.05,
 17.936,
 4.673,
 5.475,
 11.731,
 46.649,
 12.403,
 6.442,
 22.542,
 44.069,
 7.893,
 26.484,
 4.199,
 6.575,
 3.209,
 32.125,
 40.202,
 37.918,
 27.567,
 22.634,
 43.355,
 44.481,
 17.854,
 29.538,
 2.39,
 16.52,
 34.321,
 8.003,
 28.034,
 20.963,
 16.509,
 26.279,
 13.541,
 22.654,
 32.074,
 9.474,
 1.054,
 11.612,
 2.108,
 19.015,
 0.864,
 7.577,
 9.927,
 7.295,
 6.689,
 13.908,
 2.063,
 31.57]

Here I plot the distribution of the data together with a normal curve. (From the sample, I create a negated copy and append it to the sample.) The result then looks like a normal distribution.


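import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm

fig, ax = plt.subplots(1, 1)

# Mirror the sample around zero: append a negated copy of every value.
tds_half = pd.Series(tds_)
tds_inverse = tds_half * -1
tds = np.append(tds_half, tds_inverse)

# Fit a normal distribution to the mirrored sample.
mean = np.mean(tds)
std = np.std(tds)
rv = norm(mean, std)  # frozen distribution, reused below

# Theoretical moments of the fitted normal (not used further below).
mean, var, skew, kurt = rv.stats(moments='mvsk')

# Plot the fitted pdf between its 1st and 99th percentiles.
x = np.linspace(rv.ppf(0.01), rv.ppf(0.99), 100)
ax.plot(x, rv.pdf(x), 'r-', lw=5, alpha=0.6, label='norm pdf')
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')  # same curve, drawn again

# Sanity check: ppf and cdf of the standard normal are inverses.
vals = norm.ppf([0.001, 0.5, 0.999])
assert np.allclose([0.001, 0.5, 0.999], norm.cdf(vals))

# Histogram of the mirrored sample, normalized to a density.
ax.hist(tds, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()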
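Mirroring the sample around zero and fitting a normal amounts to treating the original data as half-normal with its location fixed at 0. A hedged sketch of that equivalent view uses `scipy.stats.halfnorm`. The helper name `half_normal_scale`, the five-value subset, and the 40.0 threshold are illustrative assumptions, and whether a half-normal actually suits this data is not established here.

```python
import numpy as np
from scipy.stats import halfnorm


def half_normal_scale(values):
    """Maximum-likelihood scale of a half-normal with location fixed at 0."""
    values = np.asarray(values, dtype=float)
    # With loc = 0 the estimate is the root mean square of the data,
    # which equals np.std of the sample mirrored around zero.
    return float(np.sqrt(np.mean(values ** 2)))


# First five values of tds_ from the question, used only as a small example.
subset = [28.965, 12.172, 17.042, 36.98, 20.323]

scale = half_normal_scale(subset)
dist = halfnorm(loc=0, scale=scale)

# Upper-tail probability of a read time of at least 40.0 (assumed threshold).
tail = dist.sf(40.0)
```

Working with the half-normal directly avoids building the artificial negative copy, and the frozen `dist` object gives `cdf`, `sf` and `ppf` on the original, non-negative scale.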
Question from: https://stackoverflow.com/questions/65878569/whats-the-proper-distribution-for-the-following-data


1 Reply

Waiting for answers
