I have a large DataFrame df (about 12 million rows) with, say:
df.columns = ['word','documents','frequency']
So the following ran in a timely fashion:
word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']
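For reference, here is a tiny, self-contained version of that setup; the real data is of course much larger, and the values below are made up purely for illustration:

import pandas as pd

# Tiny stand-in for the real 12M-row DataFrame; values are made up.
df = pd.DataFrame({
    'word': ['apple', 'apple', 'banana', 'apple', 'banana'],
    'documents': [1, 2, 2, 3, 3],
    'frequency': [5, 3, 7, 2, 4],
})

word_grouping = df[['word', 'frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word', 'MaxFrequency']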
However, this is taking an unexpectedly long time to run:
Occurrences_of_Words = word_grouping[['word']].count().reset_index()
What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame?
df.word.describe()
ran quickly, so I really did not expect this Occurrences_of_Words DataFrame to take very long to build.
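In case it matters, this is the kind of alternative I had in mind, though I have not benchmarked either on the full 12 million rows, so treat it as a sketch:

# Using the same toy df as above: count how many rows each word appears in.
# value_counts() works directly on the Series and returns counts per word.
Occurrences_of_Words = df['word'].value_counts()

# Alternatively, groupby(...).size() counts rows per group without
# selecting the 'word' column inside a group already keyed by 'word'.
Occurrences_alt = df.groupby('word').size().reset_index(name='count')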
PS: If the answer is obvious and you feel the need to penalize me for asking this question, please include the answer as well. Thank you.