Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
87 views
in Technique[技术] by (71.8m points)

python - merging on pandas: reduce the set of merging variables when match is not possible

Using python, I want to merge on multiple variables; A, B, C, but when realization a-b-c in one dataset is missing, use the finer combination that the observation has (like b-c).

Example:

Suppose I have a dataset (df1) containing person's characteristics (gender, married, city). And another dataset (df2) that I have the median income of a person according to their gender, city, married (created with a groupby).

Then I want to input that median income into the first dataset (df1) matching in as many characterisics as possible. That is if individual has characteristics gender-city-married that has median income, use that value. If the individual has characteristics that there is only city-married median income, to use that value.

Something like that

df1 = pd.DataFrame({'Male':['0', '0', '1','1'],'Married':['0', '1', '0','1'], 'City': ['NY', 'NY', 'NY', 'NY']})
  Male Married City
    0       0   NY
    0       1   NY
    1       0   NY
    1       1   NY

df2 = pd.DataFrame({'Male':['0', '0', '1'],'Married':['0', '1', '1'], 'City': ['NY', 'NY','NY'], 'income':['300','400', '500']})

  Male Married City income
    0       0   NY    300
    0       1   NY    400
    1       1   NY    500
'''
and the desired outcome:
'''
desired_df1:
  Male Married City income
   0      0    NY    300
   0      1    NY    400
   1      0    NY    300
   1      1    NY    400

I was thinking to do a 1st merge by=['male','married','city'], and then fill missing values from a 2nd merge by=['married','city']. But I think there should be a more systematic and simpler way. Any suggestions?

Thanks and sorry if formulation is not correct or it is duplicate (I look deeply and didn't find anything).

question from:https://stackoverflow.com/questions/65830363/merging-on-pandas-reduce-the-set-of-merging-variables-when-match-is-not-possibl

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can do a groupby and fillna too after merging:

out = df1.merge(df2,on=['Male','Married','City'],how='left')
out['income'] = (out['income'].fillna(out.groupby(['Married','City'])['income']
                                                     .fillna(method='ffill')))

print(out)

  Male Married City income
0    0       0   NY    300
1    0       1   NY    400
2    1       0   NY    300
3    1       1   NY    500 # <- Note that this should be 500 not 400

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...