Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
747 views
in Technique[技术] by (71.8m points)

python - What is y axis in seaborn distplot?

I have some geometrically distributed data. When I want to take a look at it, I use

sns.distplot(data, kde=False, norm_hist=True, bins=100)

which results is a picture:

Plot 1a

However, bins heights don't add up to 1, which means y axis doesn't show probability, it's something different. If instead we use

weights = np.ones_like(np.array(data))/float(len(np.array(data)))
plt.hist(data, weights=weights, bins = 100)

the y axis shall show probability, as bins heights sum up to 1:

Plot 1b

It can be seen more clearly here: suppose we have a list

l = [1, 3, 2, 1, 3]

We have two 1s, two 3s and one 2, so their respective probabilities are 2/5, 2/5 and 1/5. When we use seaborn histplot with 3 bins:

sns.distplot(l, kde=False, norm_hist=True, bins=3)

we get:

Plot 2a

As you can see, the 1st and the 3rd bin sum up to 0.6+0.6=1.2 which is already greater than 1, so y axis is not a probability. When we use

weights = np.ones_like(np.array(l))/float(len(np.array(l)))
plt.hist(l, weights=weights, bins = 3)

we get:

enter image description here

and the y axis is probability, as 0.4+0.4+0.2=1 as expected.

The amount of bins in these 2 cases are is the same for both methods used in each case: 100 bins for geometrically distributed data, 3 bins for small array l with 3 possible values. So bins amount is not the issue.

My question is: in seaborn distplot called with norm_hist=True, what is the meaning of y axis?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

From the documentation:

norm_hist : bool, optional

If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.

So you need to take into account your bin width as well, i.e. compute the area under the curve and not just the sum of the bin heights.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...