I have a pretty long loop in my function, which should overwrite an existing dataframe (31k rows and 370 columns).
The goal is to look up a value in a lookup dataframe (df_look_up) and, depending on that value and on the genotype stored in the df_patients dataframe, overwrite the corresponding cell in df_patients.
The function I have so far works fine on a small sample set, but runs for days on the full one:
def prepare_df_genetics(df_look_up, df_patients):
    # iterate through columns in look-up table
    for index_col, column_snip in enumerate(df_look_up):
        # print(str(index_col) + ":" + column_snip)
        # if A1 == ALT,
        if (df_look_up[column_snip].loc['A1'] == df_look_up[column_snip].loc['ALT']):
            data = df_patients.loc[:, [column_snip]]
            for index, row in data.iterrows():
                if (row[column_snip]) == '1/1':
                    df_patients.loc[index, column_snip] = "2"
                elif (row[column_snip]) == '0/1':
                    df_patients.loc[index, column_snip] = "1"
                elif (row[column_snip]) == '0/0':
                    df_patients.loc[index, column_snip] = "0"
                else:
                    df_patients.loc[index, column_snip] = "NaN"
        # if A1 == REF,
        elif (df_look_up[column_snip].loc['A1'] == df_look_up[column_snip].loc['REF']):
            data = df_patients.loc[:, [column_snip]]
            for index, row in data.iterrows():
                if (row[column_snip]) == '0/0':
                    df_patients.loc[index, column_snip] = "2"
                elif (row[column_snip]) == '0/1':
                    df_patients.loc[index, column_snip] = "1"
                elif (row[column_snip]) == '1/1':
                    df_patients.loc[index, column_snip] = "0"
                else:
                    df_patients.loc[index, column_snip] = "NaN"
    return df_patients
The two given tables look like this:
df_lookup table and df_patients table
And the desired, overwritten df_patients table looks like this:
Desired outcome: df_patients
Does anybody have an idea how to make this more efficient? I tried working with lambda, iterrows and so on, but none of these attempts really worked.
Any help would be highly appreciated!
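For reference, here is a rough sketch of how the inner row loop might be replaced by one column-wise mapping per SNP. It is not a tested solution for the real data. It assumes that df_look_up has the rows 'A1', 'ALT' and 'REF' and one column per SNP, as the function above implies. The names ALT_MAP, REF_MAP and prepare_df_genetics_vectorized are made up for illustration. Unlike the original, this version works on a copy and skips lookup columns that are missing from df_patients.

```python
import pandas as pd

# Genotype -> A1 allele count, mirroring the two branches of the original loops.
ALT_MAP = {"1/1": "2", "0/1": "1", "0/0": "0"}  # used when A1 == ALT
REF_MAP = {"0/0": "2", "0/1": "1", "1/1": "0"}  # used when A1 == REF


def prepare_df_genetics_vectorized(df_look_up, df_patients):
    """Hypothetical column-wise variant of prepare_df_genetics."""
    out = df_patients.copy()  # assumption: do not modify the input in place
    for snp in df_look_up.columns:
        if snp not in out.columns:
            continue  # assumption: ignore SNPs without a patient column
        a1 = df_look_up.at["A1", snp]
        if a1 == df_look_up.at["ALT", snp]:
            mapping = ALT_MAP
        elif a1 == df_look_up.at["REF", snp]:
            mapping = REF_MAP
        else:
            continue  # as in the original, such columns stay untouched
        # Map the whole column at once; anything not in the mapping
        # becomes the string "NaN", as in the original else-branches.
        out[snp] = out[snp].map(mapping).fillna("NaN")
    return out


# Tiny made-up example (SNP names and alleles are invented).
df_look_up = pd.DataFrame(
    {"rs1": ["A", "A", "G"], "rs2": ["C", "T", "C"]},
    index=["A1", "ALT", "REF"],
)
df_patients = pd.DataFrame(
    {"rs1": ["1/1", "0/1", "0/0", "./."], "rs2": ["1/1", "0/1", "0/0", "./."]}
)
result = prepare_df_genetics_vectorized(df_look_up, df_patients)
print(result)
```

The remaining loop runs once per column (about 370 iterations) instead of once per cell, which is where most of the time in the original appears to go.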
question from:
https://stackoverflow.com/questions/65901495/simplify-for-loop-in-python-for-pandas-dataframe