Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Tokenize dataframe column and create new dataframe for result

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'category': [1,2,1], 'names' : ['ab c', 's', 'dm ab aaa']})

category   names
0   1      ab c
1   2      s
2   1      dm ab aaa

I need to find all unique tokens (separated by spaces) in the names column, assign each one the category of its row, and create a new dataframe, as shown below:

pd.DataFrame({'category' : [1, 1,2,1,1,1], 'names' : ['ab', 'c', 's', 'dm', 'ab', 'aaa']})

category   names
0   1      ab
1   1      c
2   2      s
3   1      dm
4   1      ab
5   1      aaa

What is the best way to do this?

Question from: https://stackoverflow.com/questions/66068405/tokenize-dataframe-column-and-create-new-dataframe-for-result


1 Reply


You can split the names column first and then explode it:

df.assign(names = df.names.str.split()).explode('names')

#   category names
#0         1    ab
#0         1     c
#1         2     s
#2         1    dm
#2         1    ab
#2         1   aaa
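To see why this works, it may help to look at the two steps separately. The sketch below rebuilds the question's dataframe (the names `tokens` and `exploded` are only illustrative) and assumes pandas 0.25 or later, where `explode` is available:

```python
import pandas as pd

df = pd.DataFrame({'category': [1, 2, 1], 'names': ['ab c', 's', 'dm ab aaa']})

# Step 1: str.split() with no arguments splits on runs of whitespace,
# so every cell becomes a list of tokens.
tokens = df['names'].str.split()

# Step 2: explode() emits one row per list element. The other columns
# (here: category) and the original index label are repeated for each token.
exploded = df.assign(names=tokens).explode('names')

print(tokens.tolist())
print(exploded)
```

Because the index labels are repeated (0, 0, 1, 2, 2, 2), each token can still be traced back to the row it came from.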

If you need to reset the index (from @KRKirov's comment):

df.assign(names = df.names.str.split()).explode('names').reset_index(drop=True)
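For reuse, the same chain can be wrapped in a small function. This is only a sketch: `tokenize_column` is a hypothetical helper name, not part of pandas. The question mentions "unique" tokens although its expected output keeps the repeated 'ab', so the last line shows `drop_duplicates` as an optional extra step, assuming duplicates within the same category are not wanted:

```python
import pandas as pd


def tokenize_column(df, column):
    """Hypothetical helper: one row per whitespace-separated token in `column`."""
    return (
        df.assign(**{column: df[column].str.split()})  # strings -> lists of tokens
          .explode(column)                             # one row per token
          .reset_index(drop=True)                      # fresh 0..n-1 index
    )


df = pd.DataFrame({'category': [1, 2, 1], 'names': ['ab c', 's', 'dm ab aaa']})

# Matches the output requested in the question (duplicates kept).
result = tokenize_column(df, 'names')

# Optional: keep each (category, token) pair only once.
unique_result = result.drop_duplicates().reset_index(drop=True)

print(result)
print(unique_result)
```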
