Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How can I use split() on a string when broadcasting a DataFrame's column?

Take the following dataframe:

import pandas as pd

df = pd.DataFrame({'col_1': [0, 1], 'col_2': ['here 123', 'here 456']})

Result:

   col_1     col_2
0      0  here 123
1      1  here 456

I need to create a third column, col_3, filled for all matching rows at once (broadcasting) rather than row by row, using a condition on col_1 and a piece of the string in col_2. With a fixed slice, this works:

df['col_3'] = float('NaN')

df.loc[df['col_1'] == 1, ['col_3']] = df['col_2'].str.slice(5, 8)

Result:

   col_1     col_2 col_3
0      0  here 123   NaN
1      1  here 456   456

But the part I need does not always start at the same position, so I need to split the string in col_2 dynamically (e.g. on the space) instead of slicing with the fixed indexes (5, 8).
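To see why a fixed slice is not enough, here is a minimal sketch with a hypothetical second string whose number starts at a different position. `str.slice(5, 8)` applies the same positions to every row, while splitting on the space and taking the last token through the `.str` accessor adapts to each string:

```python
import pandas as pd

# Hypothetical strings: the number starts at a different position in each one.
s = pd.Series(['here 123', 'over there 4567'])

# Fixed positions work only when every string has the same layout.
fixed = s.str.slice(5, 8)

# Splitting on the space and taking the last token adapts to each string.
dynamic = s.str.split(' ').str[-1]

print(fixed.tolist())    # ['123', 'the']
print(dynamic.tolist())  # ['123', '4567']
```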

When I try the following code, it fails because df['col_2'] is a Series, and a Series has no split() method (string methods are reached through its .str accessor):

df.loc[df['col_1'] == 1, ['col_3']] = df['col_2'].split(' ')[0]
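A minimal sketch of what goes wrong and what the element-wise version returns, assuming the same two-row DataFrame. `split()` has to be called through `Series.str`, which returns a Series of lists; `.str[i]` then picks the i-th token from each list. Note that token `[0]` is `'here'`; the number is token `[1]`:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1], 'col_2': ['here 123', 'here 456']})

error_message = None
try:
    df['col_2'].split(' ')  # a Series has no split() method
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# .str.split applies str.split to each element and returns a Series of lists.
parts = df['col_2'].str.split(' ')

# .str[i] takes the i-th item of each list.
first = parts.str[0]   # 'here' in both rows
second = parts.str[1]  # '123' and '456'
print(first.tolist(), second.tolist())
```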

I've spent a lot of time trying to solve this without iterating over the DataFrame.

Question from: https://stackoverflow.com/questions/65893903/how-can-i-use-split-in-a-string-when-broadcasting-a-dataframes-column

He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Reply

This one-liner does the trick:

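# Row-wise list comprehension: take the second space-separated token when col_1 == 1, else NaN
df['col_3'] = [y.split(' ')[1] if x == 1 else float('nan') for x, y in df[['col_1', 'col_2']].values]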
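The comprehension above still loops over the rows in Python. If you want to avoid that, a vectorized sketch (an alternative to the answer, not part of it) combines `.str.split` with `Series.where`, which keeps values where the condition holds and fills the other rows with NaN:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1], 'col_2': ['here 123', 'here 456']})

# Split every string at once and keep the second space-separated token.
tokens = df['col_2'].str.split(' ').str[1]

# Keep the token where col_1 == 1; Series.where fills the other rows with NaN.
df['col_3'] = tokens.where(df['col_1'] == 1)

print(df)
```

This gives NaN in row 0 and '456' in row 1 without an explicit Python loop over the rows, although pandas still performs the string splitting element by element internally.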

...