python - merging on pandas: reduce the set of merging variables when match is not possible

Question

Welcome To Ask or Share your Answers For Others

python - merging on pandas: reduce the set of merging variables when match is not possible

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - merging on pandas: reduce the set of merging variables when match is not possible

Using python, I want to merge on multiple variables; A, B, C, but when realization a-b-c in one dataset is missing, use the finer combination that the observation has (like b-c).

Example:

Suppose I have a dataset (df1) containing person's characteristics (gender, married, city). And another dataset (df2) that I have the median income of a person according to their gender, city, married (created with a groupby).

Then I want to input that median income into the first dataset (df1) matching in as many characterisics as possible. That is if individual has characteristics gender-city-married that has median income, use that value. If the individual has characteristics that there is only city-married median income, to use that value.

Something like that

df1 = pd.DataFrame({'Male':['0', '0', '1','1'],'Married':['0', '1', '0','1'], 'City': ['NY', 'NY', 'NY', 'NY']})
  Male Married City
    0       0   NY
    0       1   NY
    1       0   NY
    1       1   NY

df2 = pd.DataFrame({'Male':['0', '0', '1'],'Married':['0', '1', '1'], 'City': ['NY', 'NY','NY'], 'income':['300','400', '500']})

  Male Married City income
    0       0   NY    300
    0       1   NY    400
    1       1   NY    500
'''
and the desired outcome:
'''
desired_df1:
  Male Married City income
   0      0    NY    300
   0      1    NY    400
   1      0    NY    300
   1      1    NY    400

I was thinking to do a 1st merge by=['male','married','city'], and then fill missing values from a 2nd merge by=['married','city']. But I think there should be a more systematic and simpler way. Any suggestions?

Thanks and sorry if formulation is not correct or it is duplicate (I look deeply and didn't find anything).

question from:https://stackoverflow.com/questions/65830363/merging-on-pandas-reduce-the-set-of-merging-variables-when-match-is-not-possibl

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:36:47+0000

You can do a groupby and fillna too after merging:

out = df1.merge(df2,on=['Male','Married','City'],how='left')
out['income'] = (out['income'].fillna(out.groupby(['Married','City'])['income']
                                                     .fillna(method='ffill')))

print(out)

  Male Married City income
0    0       0   NY    300
1    0       1   NY    400
2    1       0   NY    300
3    1       1   NY    500 # <- Note that this should be 500 not 400

Categories

python - merging on pandas: reduce the set of merging variables when match is not possible

python - merging on pandas: reduce the set of merging variables when match is not possible

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags