python - what is the most efficient way of counting occurrences in pandas?


posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I have a large (about 12M rows) DataFrame df with, say:

df.columns = ['word','documents','frequency']

So the following ran in a timely fashion:

word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']

However, this is taking an unexpectedly long time to run:

Occurrences_of_Words = word_grouping[['word']].count().reset_index()

What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame?

df.word.describe()

ran pretty well, so I really did not expect this Occurrences_of_Words dataframe to take very long to build.

PS: If the answer is obvious and you feel the need to penalize me for asking this question, please include the answer as well. Thank you.

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:33:19+0000

I think df['word'].value_counts() should serve. By skipping the groupby machinery, you'll save some time. I'm not sure why count should be much slower than max. Both spend some time skipping missing values. (Compare with size, which counts every row in each group, missing or not.)
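To make the count/size comparison concrete, here is a minimal sketch on a made-up five-row frame (the data is hypothetical, not the asker's). It only illustrates the difference in behaviour, not the timings: count skips missing values in the selected column, while size counts rows per group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; two of the frequency values are missing.
df = pd.DataFrame({
    "word": ["a", "a", "b", "b", "b"],
    "frequency": [1.0, np.nan, 2.0, 3.0, np.nan],
})

grouped = df.groupby("word")["frequency"]

non_null_counts = grouped.count()  # non-missing values per word
row_counts = grouped.size()        # all rows per word, NaN included
max_frequency = grouped.max()      # max per word, NaN skipped

print(non_null_counts)
print(row_counts)
print(max_frequency)
```

If the goal is simply "how many rows does each word have", size is the closer match, since it does not need to inspect the values at all.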

In any case, value_counts has been specifically optimized to handle object dtype, like your words, so I doubt you'll do much better than that.
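As a rough sketch of what that could look like, the snippet below builds a small stand-in frame with the question's three columns (the values are invented) and produces a two-column result. The column name Occurrences is an assumption chosen for illustration. Note that value_counts drops missing words by default and sorts by count, descending.

```python
import pandas as pd

# Hypothetical stand-in for the 12M-row frame in the question.
df = pd.DataFrame({
    "word": ["a", "b", "a", "c", "a", "b"],
    "documents": [1, 1, 2, 2, 3, 3],
    "frequency": [5, 2, 7, 1, 3, 4],
})

# Count occurrences of each word without going through groupby.
counts = df["word"].value_counts()

# Turn the Series into a DataFrame with explicit column names.
# rename_axis + reset_index(name=...) avoids depending on the default
# column names, which differ between pandas versions.
Occurrences_of_Words = counts.rename_axis("word").reset_index(name="Occurrences")

# groupby(...).size() gives the same numbers, sorted by word instead.
size_counts = df.groupby("word").size()

print(Occurrences_of_Words)
```

On the real data it would be worth timing both variants (for example with %timeit in IPython) rather than assuming one is faster.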
