Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Optimal implementation of a name-cleaning function


I'd like to ask which approach gives the best performance.
I have a function for cleaning names that runs through a pandas DataFrame column in which each element is a long string. The function checks whether the string starts with something (return x), whether it contains something (return y), and so on. I have tons of such conditions (20-30 elifs), each with a different regex cleaning construction (each regex depends on what the name looks like):
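import re

def lets_make_a_short_name(row):
    name = row['name']
    if name.startswith('something'):
        short_name = 'something'
    elif name.startswith('something') or name.startswith('something'):
        # first run of characters enclosed by underscores, e.g. 'a_b_c' -> 'b'
        # (re.search returns None when there is no such run, so .group() would fail)
        short_name = re.search(r'(?<=_)[^_]+(?=_)', name).group()
    # .... (20-30 more elif branches, each with its own regex)
    else:
        short_name = 'something'
    return short_name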
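A row-wise apply calls the Python function once per row. One commonly used alternative, sketched here under assumptions, is to turn each branch into a boolean mask built with pandas' vectorized string methods (`Series.str.startswith`, `Series.str.contains`, `Series.str.extract`) and let `np.select` pick the result, since it takes the first true condition just like an if/elif chain. The prefixes `'abc'`, `'xyz'`, `'qrs'`, the substring `'foo'`, the labels and the helper names `short_name_rowwise` / `short_name_vectorized` are all hypothetical stand-ins for the elided "something" placeholders.

```python
import re

import numpy as np
import pandas as pd


def short_name_rowwise(row):
    """If/elif chain in the style of the question (hypothetical rules)."""
    name = row['name']
    if name.startswith('abc'):
        return 'abc'
    elif name.startswith(('xyz', 'qrs')):
        match = re.search(r'(?<=_)[^_]+(?=_)', name)
        return match.group() if match else 'unknown'
    elif 'foo' in name:
        return 'foo'
    else:
        return 'other'


def short_name_vectorized(names):
    """The same hypothetical rules, evaluated on the whole column at once."""
    # One boolean mask per branch, in the same order as the if/elif chain.
    conditions = [
        names.str.startswith('abc', na=False),
        names.str.startswith('xyz', na=False) | names.str.startswith('qrs', na=False),
        names.str.contains('foo', regex=False, na=False),
    ]
    # str.extract needs a capture group; wrapping the question's pattern in one
    # keeps its meaning. Non-matches come back as NaN and are filled like the
    # row-wise version. This regex runs on EVERY row, not only 'xyz'/'qrs' rows.
    between_underscores = (
        names.str.extract(r'((?<=_)[^_]+(?=_))', expand=False).fillna('unknown')
    )
    choices = ['abc', between_underscores, 'foo']
    # np.select takes the choice of the first true condition, like if/elif.
    return pd.Series(np.select(conditions, choices, default='other'),
                     index=names.index)


df = pd.DataFrame({'name': ['abc_1', 'xyz_core_v2', 'qrs_nounderscore',
                            'my_foo_bar', 'plain']})
df['short_rowwise'] = df.apply(short_name_rowwise, axis=1)
df['short_vectorized'] = short_name_vectorized(df['name'])
```

Because np.select evaluates every choice on the full column, 20-30 expensive regexes can cost more than they save, so timing both versions on the real data (e.g. with timeit) is the safest way to decide. If the rules only need the name value, `df['name'].map(func)` with a function of the string also avoids building a Series for every row, which `df.apply(..., axis=1)` does.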

What will give me the best performance? For example, np.select like this:

import numpy as np

ratio = df['bruto'] / df['age']
conditions = [ratio > 100,
              (ratio <= 100) & (ratio > 50),
              (ratio <= 50) & (ratio > 0)]  # '<= 50' so a ratio of exactly 50 is not left out
outputs = ['high salary', 'medium salary', 'low salary']
df['salary_age_relation'] = np.select(conditions, outputs, 'no salary')
  • or maybe something else?
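When the conditions are contiguous numeric ranges of a single value, as in the salary example, `pd.cut` is one possible "something else": it bins the ratio in one call instead of combining several comparisons. A minimal sketch under assumptions: the sample numbers are made up (only the column names `bruto` and `age` come from the question), and a ratio of exactly 50 is assumed to count as 'low salary'. The equivalent np.select is computed alongside so the two can be compared.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({'bruto': [6000, 3000, 2500, 1000, 0],
                   'age': [50, 40, 50, 40, 30]})
ratio = df['bruto'] / df['age']  # 120, 75, 50, 25, 0

# pd.cut bins are right-closed by default:
# (0, 50] -> low, (50, 100] -> medium, (100, inf] -> high.
binned = pd.cut(ratio, bins=[0, 50, 100, np.inf],
                labels=['low salary', 'medium salary', 'high salary'])
# Ratios <= 0 (and NaN) fall outside every bin and come back as NaN.
df['salary_age_relation'] = binned.astype(object).fillna('no salary')

# np.select with the same boundaries, for comparison.
conditions = [ratio > 100,
              (ratio <= 100) & (ratio > 50),
              (ratio <= 50) & (ratio > 0)]
via_select = np.select(conditions,
                       ['high salary', 'medium salary', 'low salary'],
                       'no salary')
```

pd.cut only fits range checks on one numeric value; the name rules, which test prefixes, substrings and regexes, still need masks like the ones np.select takes.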

Thanks in advance!

Question from: https://stackoverflow.com/questions/65869147/optimal-realization-of-cleaning-the-names-function

Fight dragons too long and you become a dragon yourself; gaze into the abyss too long and the abyss gazes back into you…

...