pandas - Concat python dataframes based on unique rows

Question


posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have two dataframes that look like this:

df1

user_id  username  firstname  lastname
    123       abc        abc       abc
    456       def        def       def
    789       ghi        ghi       ghi

df2

user_id  username  firstname  lastname
    111       xyz        xyz       xyz
    456       def        def       def
    234       mnp        mnp       mnp

Now I want an output dataframe like this:

user_id  username  firstname  lastname
    123       abc        abc       abc
    456       def        def       def
    789       ghi        ghi       ghi
    111       xyz        xyz       xyz
    234       mnp        mnp       mnp

Since user_id 456 appears in both dataframes, it should show up only once in the output. I have tried grouping on user_id with groupby(['user_id']), but it looks like groupby needs to be followed by some aggregation, which I don't want here.

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:38:22+0000

Use concat + drop_duplicates:

df = pd.concat([df1, df2]).drop_duplicates(subset='user_id').reset_index(drop=True)
print(df)
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
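
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies concat + drop_duplicates. The helper name concat_unique is hypothetical and only introduced for illustration; it assumes the first row seen for each user_id is the one to keep.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "user_id": [123, 456, 789],
    "username": ["abc", "def", "ghi"],
    "firstname": ["abc", "def", "ghi"],
    "lastname": ["abc", "def", "ghi"],
})
df2 = pd.DataFrame({
    "user_id": [111, 456, 234],
    "username": ["xyz", "def", "mnp"],
    "firstname": ["xyz", "def", "mnp"],
    "lastname": ["xyz", "def", "mnp"],
})


def concat_unique(left, right, key="user_id"):
    """Hypothetical helper: stack two frames, keep the first row per key."""
    stacked = pd.concat([left, right], ignore_index=True)
    # keep="first" retains the earliest occurrence, so rows from `left` win.
    return stacked.drop_duplicates(subset=key, keep="first").reset_index(drop=True)


result = concat_unique(df1, df2)
print(result)
```

Because df1 comes first in the list passed to concat, its copy of user_id 456 is the one that survives.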

A solution with groupby and the first aggregation also works, but it is slower:

df = pd.concat([df1, df2]).groupby('user_id', as_index=False, sort=False).first()
print(df)
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
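
One thing to keep in mind when choosing between the two: groupby(...).first() picks the first non-null value in each column, while drop_duplicates keeps the first row as a whole. On the data in the question the results match, but they can differ when there are missing values. A small sketch with made-up data (not from the question) to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: user 456 appears twice and the first copy has no username.
stacked = pd.DataFrame({
    "user_id": [123, 456, 456],
    "username": ["abc", np.nan, "def"],
})

# Keeps the first row for 456 as-is, including its missing username.
kept_row = stacked.drop_duplicates(subset="user_id").reset_index(drop=True)

# Takes the first non-null username for 456, which comes from the second row.
first_non_null = stacked.groupby("user_id", as_index=False, sort=False).first()

print(kept_row)
print(first_non_null)
```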

EDIT:

Another solution with boolean indexing and numpy.in1d (note that numpy.in1d is deprecated as of NumPy 2.0 in favor of numpy.isin):

df = pd.concat([df1, df2[~np.in1d(df2['user_id'], df1['user_id'])]], ignore_index=True)
print(df)
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
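
The same filtering idea can also be written with pandas' own Series.isin, which avoids calling NumPy directly. This is a sketch of an equivalent approach, not the code from the answer above; the frames are rebuilt from the question and the name new_rows is only for illustration.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "user_id": [123, 456, 789],
    "username": ["abc", "def", "ghi"],
    "firstname": ["abc", "def", "ghi"],
    "lastname": ["abc", "def", "ghi"],
})
df2 = pd.DataFrame({
    "user_id": [111, 456, 234],
    "username": ["xyz", "def", "mnp"],
    "firstname": ["xyz", "def", "mnp"],
    "lastname": ["xyz", "def", "mnp"],
})

# Rows of df2 whose user_id is not already present in df1.
new_rows = df2[~df2["user_id"].isin(df1["user_id"])]

# Append only those rows to df1 and renumber the index.
result = pd.concat([df1, new_rows], ignore_index=True)
print(result)
```

Unlike drop_duplicates, this only filters df2 against df1. It does not remove duplicates that already exist inside df1 or inside df2.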
