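- Use `pandas.Series.str.extract` with a positive lookahead assertion: `(\w+(?=\s+pass))` captures the word that is followed by whitespace and then "pass", without including "pass" itself in the match.
- `flags=re.IGNORECASE` makes the match ignore the case of 'pass', so 'PASS' and 'pass' both match.
- `df.Text.str.lower().str.extract(r'(\w+(?=\s+pass))')` can be used instead of importing `re` for the flag, but the extracted names will then be lowercase as well.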
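To see the lookahead on its own, outside pandas, here is a minimal sketch with the standard `re` module. It applies the same pattern to the three strings used in the test dataframe below; the helper name `find_passer` is hypothetical and only there for illustration.

```python
import re

# Case-insensitive pattern from the answer: capture a word that is followed by
# whitespace and then "pass". The lookahead (?=\s+pass) has to succeed, but it
# consumes no characters, so "pass" never ends up in the match.
PATTERN = re.compile(r'(\w+(?=\s+pass))', flags=re.IGNORECASE)


def find_passer(text):
    """Return the word right before 'pass' (any case), or None if there is none."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


for text in ['Jon PASS complete to Ben.',
             'Clock 14:52, Jon pass complete to Mitch.',
             'Frank rush.']:
    # prints the passer name, or None for the rush play
    print(f'{text!r} -> {find_passer(text)}')
```

`search` returns the first position where the whole pattern, lookahead included, succeeds, which is why 'Clock' and '14' are skipped in the second string and 'Jon' is returned.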
```python
import pandas as pd
import re

# test dataframe
data = {'play_id': ['1', '2', '3'],
        'type': ['pass', 'pass', 'rush'],
        'Text': ['Jon PASS complete to Ben.',
                 'Clock 14:52, Jon pass complete to Mitch.',
                 'Frank rush.']}
df = pd.DataFrame(data)

# display(df)
#   play_id  type  Text
#   1        pass  Jon PASS complete to Ben.
#   2        pass  Clock 14:52, Jon pass complete to Mitch.
#   3        rush  Frank rush.

# extract the word immediately before "pass" (any case);
# expand=False returns a Series because there is a single capture group
df['passer'] = df.Text.str.extract(r'(\w+(?=\s+pass))', flags=re.IGNORECASE, expand=False)

# display(df)
#   play_id  type  Text                                      passer
#   1        pass  Jon PASS complete to Ben.                 Jon
#   2        pass  Clock 14:52, Jon pass complete to Mitch.  Jon
#   3        rush  Frank rush.                               NaN
```
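The lowercase alternative from the third bullet can be sketched like this, on the same test data. It shows the trade-off mentioned there: no `re` import is needed, but the extracted names come back lowercase.

```python
import pandas as pd

# Same test data as in the answer
data = {'play_id': ['1', '2', '3'],
        'type': ['pass', 'pass', 'rush'],
        'Text': ['Jon PASS complete to Ben.',
                 'Clock 14:52, Jon pass complete to Mitch.',
                 'Frank rush.']}
df = pd.DataFrame(data)

# Lowercasing first makes 'PASS' and 'pass' identical, so no re flag is needed.
# Side effect: the captured names are lowercased too ('jon' instead of 'Jon').
df['passer'] = df.Text.str.lower().str.extract(r'(\w+(?=\s+pass))', expand=False)
print(df[['Text', 'passer']])
```

If the original capitalization matters, an inline `(?i)` flag at the start of the pattern, e.g. `r'(?i)(\w+(?=\s+pass))'`, also avoids importing `re` while keeping 'Jon' as written.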