Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
317 views
in Technique[技术] by (71.8m points)

python - Compare two dataframes based on two numeric columns and find rows matching within a numeric threshold of eachother

I have two separate datasets that follow this structure:

import pandas as pd

data_one = {'ID':[281.1, 290.125, 450.123, 980.11,   1200,  130,  12,      500,   401.1,  1025.349], 
        'Code':  [201,   205.1,   900,     1200.54,  52,    44,   111.223, 192.1, 120.99, 11.1]} 

df_one = pd.DataFrame(data_one)
df_one
  ID        Code
0 281.100   201.000
1 290.125   205.100
2 450.123   900.000
3 980.110   1200.540
4 1200.000  52.000
5 130.000   44.000
6 12.000    111.223
7 500.000   192.100
8 401.100   120.990
9 1025.349  11.100
data_two = {'ID':[405.122, 12.125, 11356.12, 85.111,    1025.4069,   112.1111,  2112,      1200.6582,   980.1035,     441.123,  52842.584, 5648.12, 11], 
        'Code':  [1000,    33.1,   222,      101.541,   26.75,       25.2564,   11.08,     192.1,       1220.1258,    11.1,     7894,      1323,    123.123]}

df_two = pd.DataFrame(data_two) 
df_two
      ID          Code
0     405.1220    1000.0000
1     12.1250     33.1000
2     11356.1200  222.0000
3     85.1110     101.5410
4     1025.4069   26.7500
5     112.1111    25.2564
6     2112.0000   11.0800
7     1200.6582   192.1000
8     980.1035    1220.1258
9     441.1230    11.1000
10    52842.5840  7894.0000
11    5648.1200   1323.0000
12    11.0000     123.1230

I want to find rows between these two dataframes where ID and Code are in agreement but within a certain numeric threshold of ±0.5 for ID and ±30 for Code, respectively.

So a match is returned when for particular row:

df_one.ID is in this range: [df_two.ID-0.5,df_two.ID+0.5] AND df_one.Code is in the range [df_two.Code-30,df_two.code+30]

For example, row 3 from df_one matches with row 8 from df_two because ID and Code both satisfy the conditions.

            ID.1      Code.1       ID.2        Code.2
match_1     980.110   1200.540     980.1035    1220.1258

Another match is:

            ID.1      Code.1       ID.2        Code.2
match_2     1025.349  11.100       1025.4069   26.7500
question from:https://stackoverflow.com/questions/65941170/compare-two-dataframes-based-on-two-numeric-columns-and-find-rows-matching-withi

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is probably a brute force way but will always find your desired solution.

import pandas as pd
import numpy as np

df3 = pd.DataFrame({}, columns=['ID.1', 'Code.1', 'ID.2', 'Code.2'])
id_1 = []
id_2 = []
code_1 = []
code_2 = []
for i, a in enumerate(list(df_one['ID'].values)):
  for i2, a2 in enumerate(list(df_two['ID'].values)):
    if ((((df_one.iloc[i, 0]) >= (df_two.iloc[i2, 0] - 0.5)) and  ((df_one.iloc[i, 0]) <= (df_two.iloc[i2, 0] + 0.5))) and 
        (((df_one.iloc[i, 1]) >= (df_two.iloc[i2, 1] - 30)) and ((df_one.iloc[i, 1]) <= (df_two.iloc[i2, 1] + 30)))):
      id_1.append(df_one.iloc[i, 0])
      id_2.append(df_two.iloc[i2, 0])
      code_1.append(df_one.iloc[i, 1])
      code_2.append(df_two.iloc[i2, 1])

df3['ID.1'] = id_1
df3['Code.1'] = code_1
df3['ID.2'] = id_2
df3['Code.2'] = code_2
df3

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...