I'd like to ask how to get the best performance out of a function.
I have a name-cleaning function that runs over a pandas DataFrame column in which each element is a long string. The function checks whether the string starts with something and returns x, whether it contains something and returns y, and so on.
I have a lot of these conditions (20-30 elifs), each with a different regex cleaning construct (the regex depends on what the name looks like):
import re

def lets_make_a_short_name(row):
    name = row['name']
    short_name = 0
    if name.startswith('something'):
        short_name = 'something'
    elif name.startswith('something') or name.startswith('something'):
        # take the first run of non-underscore characters between two underscores
        short_name = re.search(r'(?<=_)[^_]+(?=_)', name).group()
    # .... (remaining elif branches)
    else:
        short_name = 'something'
    return short_name
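For context, a function with this signature (it takes a `row` and reads `row['name']`) would normally be applied row by row with `DataFrame.apply(..., axis=1)`. The sketch below is only an illustration under assumptions: the sample names, the prefixes `abc`, `xyz`, `uvw`, and the fallback value `unknown` are hypothetical stand-ins for the real `'something'` values, and the `None` check on the regex match is an addition.

```python
import re

import pandas as pd

# Hypothetical sample data; the real column holds much longer strings.
df = pd.DataFrame({"name": ["abc_first_rest", "xyz_second_rest", "other"]})


def lets_make_a_short_name(row):
    name = row["name"]
    if name.startswith("abc"):  # hypothetical prefix
        return "abc"
    elif name.startswith(("xyz", "uvw")):  # str.startswith accepts a tuple
        match = re.search(r"(?<=_)[^_]+(?=_)", name)
        # re.search returns None when nothing matches, so guard before .group()
        return match.group() if match else name
    else:
        return "unknown"  # hypothetical fallback


# axis=1 calls the function once per row, in plain Python
df["short_name"] = df.apply(lets_make_a_short_name, axis=1)
print(df)
```

Because the function is called once per row in Python, the cost grows with both the number of rows and the number of branches tried before one matches.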
What will give me the best performance: a row-wise function like the one above, or a vectorized np.select approach like this one?
conditions = [
    df['bruto'] / df['age'] > 100,
    (df['bruto'] / df['age'] <= 100) & (df['bruto'] / df['age'] > 50),
    (df['bruto'] / df['age'] < 50) & (df['bruto'] / df['age'] > 0),
]
outputs = ['high salary', 'medium salary', 'low salary']
df['salary_age_relation'] = np.select(conditions, outputs, 'no salary')
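Adapted to the name-cleaning case, the same `np.select` pattern might look roughly like the sketch below. It is not a definitive implementation: the sample data, the prefixes `abc`, `xyz`, `uvw`, and the default `unknown` are hypothetical, and the capture-group regex `_([^_]+)_` is an assumed equivalent of the lookaround pattern in the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real column of long strings.
df = pd.DataFrame({"name": ["abc_first_rest", "xyz_second_rest", "other"]})
name = df["name"]

# Vectorized regex: with one capture group and expand=False, str.extract
# returns a Series (NaN where the pattern does not match).
between_underscores = name.str.extract(r"_([^_]+)_", expand=False)

# Conditions are checked in order; the first True one wins, like if/elif.
conditions = [
    name.str.startswith("abc"),
    name.str.startswith("xyz") | name.str.startswith("uvw"),
]
# Each output is either a scalar or a Series aligned with df.
outputs = ["abc", between_underscores]

df["short_name"] = np.select(conditions, outputs, default="unknown")
print(df)
```

Note that every condition and every output is computed for the whole column up front, so with 20-30 branches all of the regexes run on all rows, whereas the if/elif function stops at the first matching branch. Which approach is faster would need to be measured on the real data.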
Thanks in advance!
question from:
https://stackoverflow.com/questions/65869147/optimal-realization-of-cleaning-the-names-function