Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Simplify a for loop in Python for a pandas DataFrame

I have a pretty long loop in my function that should overwrite an existing dataframe (31k rows and 370 columns).

The goal is to look up the allele information for each column in a look-up dataframe (df_look_up) and, depending on whether A1 equals ALT or REF there, overwrite the genotype values in the matching column of df_patients.
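The recoding rule in the question's loop boils down to two small lookup tables, one for each orientation of A1. A minimal sketch, assuming the genotype strings are exactly the ones the loop checks for; the names `ALT_CODES`, `REF_CODES` and `recode_genotype` are hypothetical. Note that the original loop only applies this rule to columns where A1 matches ALT or REF; other columns are left untouched.

```python
# Hypothetical names; the mappings themselves are copied from the question's loop.
ALT_CODES = {"1/1": "2", "0/1": "1", "0/0": "0"}  # column where A1 == ALT
REF_CODES = {"0/0": "2", "0/1": "1", "1/1": "0"}  # column where A1 == REF


def recode_genotype(genotype, a1_is_alt):
    """Recode one genotype string like the loop does; unknown values become "NaN"."""
    codes = ALT_CODES if a1_is_alt else REF_CODES
    return codes.get(genotype, "NaN")


print(recode_genotype("1/1", a1_is_alt=True))   # 2
print(recode_genotype("1/1", a1_is_alt=False))  # 0
print(recode_genotype("./.", a1_is_alt=True))   # NaN
```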

The function I have so far works fine on a small sample, but on the full dataset it runs for days.

```python
def prepare_df_genetics(df_look_up, df_patients):
    # Iterate through the columns of the look-up table
    for index_col, column_snip in enumerate(df_look_up):
        # print(str(index_col) + ":" + column_snip)

        # If A1 == ALT: "1/1" -> "2", "0/1" -> "1", "0/0" -> "0", anything else -> "NaN"
        if df_look_up[column_snip].loc['A1'] == df_look_up[column_snip].loc['ALT']:
            data = df_patients.loc[:, [column_snip]]
            for index, row in data.iterrows():
                if row[column_snip] == '1/1':
                    df_patients.loc[index, column_snip] = "2"
                elif row[column_snip] == '0/1':
                    df_patients.loc[index, column_snip] = "1"
                elif row[column_snip] == '0/0':
                    df_patients.loc[index, column_snip] = "0"
                else:
                    df_patients.loc[index, column_snip] = "NaN"

        # If A1 == REF: "0/0" -> "2", "0/1" -> "1", "1/1" -> "0", anything else -> "NaN"
        elif df_look_up[column_snip].loc['A1'] == df_look_up[column_snip].loc['REF']:
            data = df_patients.loc[:, [column_snip]]
            for index, row in data.iterrows():
                if row[column_snip] == '0/0':
                    df_patients.loc[index, column_snip] = "2"
                elif row[column_snip] == '0/1':
                    df_patients.loc[index, column_snip] = "1"
                elif row[column_snip] == '1/1':
                    df_patients.loc[index, column_snip] = "0"
                else:
                    df_patients.loc[index, column_snip] = "NaN"

    return df_patients
```
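Most of the time in the loop above is likely spent in `iterrows()` combined with one `df_patients.loc[index, column_snip] = ...` write per cell. A sketch that keeps the per-column A1 check but replaces the inner row loop with a single vectorized `Series.map` call per column. The function name and the SNP column names in the example are hypothetical, and unlike the original it returns a modified copy instead of writing into `df_patients` in place.

```python
import pandas as pd

# Recoding tables taken from the question's loop
ALT_CODES = {"1/1": "2", "0/1": "1", "0/0": "0"}  # used when A1 == ALT
REF_CODES = {"0/0": "2", "0/1": "1", "1/1": "0"}  # used when A1 == REF


def prepare_df_genetics_mapped(df_look_up, df_patients):
    """Same recoding as the loop, but one vectorized Series.map call per column."""
    out = df_patients.copy()
    for column_snip in df_look_up.columns:
        a1 = df_look_up.at["A1", column_snip]
        if a1 == df_look_up.at["ALT", column_snip]:
            codes = ALT_CODES
        elif a1 == df_look_up.at["REF", column_snip]:
            codes = REF_CODES
        else:
            continue  # A1 matches neither: the loop leaves this column unchanged
        # map() yields NaN for values not in the dict; turn those into the string "NaN"
        out[column_snip] = out[column_snip].map(codes).fillna("NaN")
    return out


# Hypothetical example: made-up SNP columns rs1 (A1 == ALT) and rs2 (A1 == REF)
look_up = pd.DataFrame(
    {"rs1": ["A", "G", "A"], "rs2": ["C", "C", "T"]},
    index=["A1", "REF", "ALT"],
)
patients = pd.DataFrame({"rs1": ["1/1", "0/1", "./."], "rs2": ["0/0", "1/1", "0/1"]})
print(prepare_df_genetics_mapped(look_up, patients))
```

There is still a Python loop, but only over columns; each column is recoded in one call rather than cell by cell.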

The two input tables look like this:

[Image: the df_look_up and df_patients tables]

And the desired, overwritten df_patients table looks like this:

[Image: desired outcome, df_patients]

My question: does anybody have an idea how to make this more efficient? I tried lambda functions, iterrows and similar approaches, but none of them really worked.
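One option that avoids Python-level loops entirely is to compute both encodings for every cell with NumPy and then pick one per column. A sketch under the assumption, inferred from the `.loc['A1']`, `.loc['REF']` and `.loc['ALT']` calls, that `df_look_up` has those three row labels and one column per SNP; the function name and example SNP names are hypothetical. Unlike the original, columns present in `df_look_up` but missing from `df_patients` are silently ignored instead of raising a `KeyError`.

```python
import numpy as np
import pandas as pd


def prepare_df_genetics_vectorized(df_look_up, df_patients):
    """Recode all columns at once with NumPy; no Python loop over rows or columns."""
    a1 = df_look_up.loc["A1"]
    a1_is_alt = a1 == df_look_up.loc["ALT"]
    # Mirror the if/elif order: a column only counts as REF if it is not already ALT
    a1_is_ref = (a1 == df_look_up.loc["REF"]) & ~a1_is_alt
    # One flag per df_patients column; columns absent from df_look_up stay unchanged
    alt_cols = a1_is_alt.reindex(df_patients.columns, fill_value=False).to_numpy(dtype=bool)
    ref_cols = a1_is_ref.reindex(df_patients.columns, fill_value=False).to_numpy(dtype=bool)

    values = df_patients.to_numpy(dtype=object)
    hom_alt, het, hom_ref = values == "1/1", values == "0/1", values == "0/0"
    # Both encodings for every cell; anything unrecognised becomes the string "NaN"
    alt_coded = np.select([hom_alt, het, hom_ref], ["2", "1", "0"], default="NaN")
    ref_coded = np.select([hom_ref, het, hom_alt], ["2", "1", "0"], default="NaN")

    # Per column: ALT encoding, else REF encoding, else the original value
    result = np.where(alt_cols, alt_coded, np.where(ref_cols, ref_coded, values))
    return pd.DataFrame(result, index=df_patients.index, columns=df_patients.columns)


# Hypothetical example: rs1 has A1 == ALT, rs2 has A1 == REF, rs3 matches neither
look_up = pd.DataFrame(
    {"rs1": ["A", "G", "A"], "rs2": ["C", "C", "T"], "rs3": ["T", "G", "C"]},
    index=["A1", "REF", "ALT"],
)
patients = pd.DataFrame(
    {"rs1": ["1/1", "0/1", "./."], "rs2": ["0/0", "1/1", "0/1"], "rs3": ["0/0", "0/1", "1/1"]}
)
print(prepare_df_genetics_vectorized(look_up, patients))
```

Computing both encodings doubles the elementwise work, but it all happens inside NumPy, which is typically far cheaper than assigning cells one at a time through `.loc`.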

Any help would be highly appreciated!

Question from: https://stackoverflow.com/questions/65901495/simplify-for-loop-in-python-for-pandas-dataframe

He who fights dragons too long becomes a dragon himself; gaze into the abyss too long, and the abyss gazes back…
...