I have two separate datasets that follow this structure:
import pandas as pd
data_one = {'ID':[281.1, 290.125, 450.123, 980.11, 1200, 130, 12, 500, 401.1, 1025.349],
'Code': [201, 205.1, 900, 1200.54, 52, 44, 111.223, 192.1, 120.99, 11.1]}
df_one = pd.DataFrame(data_one)
df_one
ID Code
0 281.100 201.000
1 290.125 205.100
2 450.123 900.000
3 980.110 1200.540
4 1200.000 52.000
5 130.000 44.000
6 12.000 111.223
7 500.000 192.100
8 401.100 120.990
9 1025.349 11.100
data_two = {'ID':[405.122, 12.125, 11356.12, 85.111, 1025.4069, 112.1111, 2112, 1200.6582, 980.1035, 441.123, 52842.584, 5648.12, 11],
'Code': [1000, 33.1, 222, 101.541, 26.75, 25.2564, 11.08, 192.1, 1220.1258, 11.1, 7894, 1323, 123.123]}
df_two = pd.DataFrame(data_two)
df_two
ID Code
0 405.1220 1000.0000
1 12.1250 33.1000
2 11356.1200 222.0000
3 85.1110 101.5410
4 1025.4069 26.7500
5 112.1111 25.2564
6 2112.0000 11.0800
7 1200.6582 192.1000
8 980.1035 1220.1258
9 441.1230 11.1000
10 52842.5840 7894.0000
11 5648.1200 1323.0000
12 11.0000 123.1230
I want to find rows between these two dataframes where ID and Code are in agreement but within a certain numeric threshold of ±0.5
for ID
and ±30
for Code
, respectively.
So a match is returned when for particular row:
df_one.ID is in this range: [df_two.ID-0.5,df_two.ID+0.5] AND df_one.Code is in the range [df_two.Code-30,df_two.code+30]
For example, row 3 from df_one matches with row 8 from df_two because ID and Code both satisfy the conditions.
ID.1 Code.1 ID.2 Code.2
match_1 980.110 1200.540 980.1035 1220.1258
Another match is:
ID.1 Code.1 ID.2 Code.2
match_2 1025.349 11.100 1025.4069 26.7500
question from:
https://stackoverflow.com/questions/65941170/compare-two-dataframes-based-on-two-numeric-columns-and-find-rows-matching-withi