Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Comparing two data frames that contain repeated data and performing some operations

The values in the poiid column are unique and do not repeat (the dataframe is named df1).


On the other hand, the second dataframe (df2) can contain repeated poiids.


My main goal is to delete the rows in df2 whose poiid is not found in df1. What is the most efficient way to do this?

I'm adding two separate dummy dataframes for easy testing.

data1 = {'userid': [1, 2, 5, 5, 7, 10, 10, 10, 15, 15], 
         'checkinid': [100, 120, 90, 95, 100, 130, 90, 80, 200, 120]}

data2 = {'checkinid': [100, 120, 90, 95], 
         'latitude': [-90, -92, 48, 52],
         'longitude': [42, 54, 51, -27]}

In these examples, some checkinid values in data1 do not appear in data2.

Expected output for data1, based on the dummy datasets:

expectingoutput = {'userid': [1, 2, 5, 5, 7, 10, 15],
                   'checkinid': [100, 120, 90, 95, 100, 90, 120]}
Question from: https://stackoverflow.com/questions/65921263/comparing-the-between-two-data-frames-consisting-of-repetitive-data-and-execute


1 Reply


Based on the data you added, one quick way to solve this is to left-merge the two dataframes, check where NaNs appear (they mark the keys in df1 that were not found in df2), and then filter those rows out.

Here is how to do it:

import pandas as pd

data1 = {'userid': [1, 2, 5, 5, 7, 10, 10, 10, 15, 15],
         'checkinid': [100, 120, 90, 95, 100, 130, 90, 80, 200, 120]}

data2 = {'checkinid': [100, 120, 90, 95], 
         'latitude': [-90, -92, 48, 52],
         'longitude': [42, 54, 51, -27]}

expectingoutput = {'userid': [1, 2, 5, 5, 7, 10, 15],
                   'checkinid': [100, 120, 90, 95, 100, 90, 120]}

# Create df1
df1 = pd.DataFrame(data1)
df1


# Create df2
df2 = pd.DataFrame(data2)
df2


# Left-merge the two dataframes on the key checkinid
merged_df = df1.merge(df2, how='left', on=['checkinid'])
merged_df
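As a variation on the merge step, here is a minimal sketch that uses the `indicator=True` option of `merge` to label each row of df1 instead of inferring a missing key from NaNs. The names `flagged` and `kept` are illustrative and not part of the original answer, and selecting only the key column of df2 (with duplicates dropped) is an assumption made here so that the merge cannot add extra rows.

```python
import pandas as pd

df1 = pd.DataFrame({'userid': [1, 2, 5, 5, 7, 10, 10, 10, 15, 15],
                    'checkinid': [100, 120, 90, 95, 100, 130, 90, 80, 200, 120]})
df2 = pd.DataFrame({'checkinid': [100, 120, 90, 95],
                    'latitude': [-90, -92, 48, 52],
                    'longitude': [42, 54, 51, -27]})

# indicator=True adds a '_merge' column: 'both' when the key exists in df2,
# 'left_only' when it exists only in df1.
# Only the (de-duplicated) key column of df2 is used, so no rows are multiplied.
flagged = df1.merge(df2[['checkinid']].drop_duplicates(),
                    how='left', on='checkinid', indicator=True)

# Keep the matched rows and drop the helper column (illustrative names).
kept = flagged.loc[flagged['_merge'] == 'both', ['userid', 'checkinid']]
kept = kept.reset_index(drop=True)
print(kept)
```

Compared with checking for NaNs, this does not misfire if df2 itself contains missing latitude or longitude values.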


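# Rows with NaNs after the left merge had no match in df2; drop them from df1.
# Note: this relies on merged_df having the same row order and index as df1,
# which holds here because checkinid is unique in df2 and df1 has a default index.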
df1[~merged_df.isna().any(axis=1)]
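An alternative that avoids the merge altogether is a membership test with `Series.isin`. This is a minimal sketch, not the method from the answer above; the names `mask` and `result` are illustrative, and resetting the index is optional.

```python
import pandas as pd

df1 = pd.DataFrame({'userid': [1, 2, 5, 5, 7, 10, 10, 10, 15, 15],
                    'checkinid': [100, 120, 90, 95, 100, 130, 90, 80, 200, 120]})
df2 = pd.DataFrame({'checkinid': [100, 120, 90, 95],
                    'latitude': [-90, -92, 48, 52],
                    'longitude': [42, 54, 51, -27]})

# True for every row of df1 whose checkinid also appears in df2.
mask = df1['checkinid'].isin(df2['checkinid'])

# Keep only those rows; reset_index is optional and just renumbers them 0..n-1.
result = df1[mask].reset_index(drop=True)
print(result)
```

Because the mask is built directly from df1, it stays aligned with df1 even if df2 contains repeated keys or NaNs in its other columns. For the original poiid case the same pattern would apply with the roles swapped, for example `df2[df2['poiid'].isin(df1['poiid'])]`, assuming both frames have a `poiid` column.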


