python - Count distinct words from a Pandas Data Frame

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a pandas DataFrame in which one column contains text. I'd like to get a list of the unique words appearing across the entire column, splitting on spaces only.

import pandas as pd

r1 = ['My nickname is ft.jgt', 'Someone is going to my place']
df = pd.DataFrame(r1, columns=['text'])

The output should look like this:

['my','nickname','is','ft.jgt','someone','going','to','place']

A count for each word would also be welcome, but it is not required.

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:10:25+0000

Use a set to collect the unique words.

First convert the strings in df['text'] to lower case and split them on whitespace:

df['text'].str.lower().str.split()
Out[43]: 
0             [my, nickname, is, ft.jgt]
1    [someone, is, going, to, my, place]

Each list in this column can be passed to the set.update method, which adds its elements to the set and so accumulates the unique values. Use apply to do so:

results = set()
df['text'].str.lower().str.split().apply(results.update)
print(results)

{'someone', 'ft.jgt', 'my', 'is', 'to', 'going', 'place', 'nickname'}
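
A set does not keep its elements in any particular order, whereas the expected output in the question lists the words in order of first appearance. If that order matters, one possible approach is sketched below. It assumes a pandas version that provides Series.explode (0.25 or later), and the names words and unique_words are illustrative rather than part of the original answer:

```python
import pandas as pd

r1 = ['My nickname is ft.jgt', 'Someone is going to my place']
df = pd.DataFrame(r1, columns=['text'])

# Lower-case, split on whitespace, then flatten the lists to one word per row.
words = df['text'].str.lower().str.split().explode()

# unique() returns the distinct values in order of first appearance.
unique_words = words.unique().tolist()
print(unique_words)
# ['my', 'nickname', 'is', 'ft.jgt', 'someone', 'going', 'to', 'place']
```

Note that str.split() with no argument splits on any run of whitespace. If only the space character should act as a separator, str.split(' ') is the stricter variant.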

Or, as suggested in the comments, use a Counter instead of a set to get the word counts as well:

from collections import Counter
results = Counter()
df['text'].str.lower().str.split().apply(results.update)
print(results)
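
Once filled, the Counter can be queried for individual counts, the most frequent words, or simply the distinct words. The sketch below is self-contained and rebuilds the same frame. It swaps apply for a plain loop, which is an assumption of convenience (it avoids creating a throwaway Series of None values), and the name counts is chosen here for illustration:

```python
from collections import Counter

import pandas as pd

r1 = ['My nickname is ft.jgt', 'Someone is going to my place']
df = pd.DataFrame(r1, columns=['text'])

counts = Counter()
# Counter.update increments the tally of every word in the list it receives.
for tokens in df['text'].str.lower().str.split():
    counts.update(tokens)

print(counts['my'])           # number of occurrences of one word
print(counts.most_common(2))  # the two most frequent words with their counts
print(list(counts))           # distinct words, in order of first appearance
```

Because a Counter is a dict subclass, list(counts) yields the distinct words in insertion order on Python 3.7 and later, so the same object covers both the unique list and the counts.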
