Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
224 views
in Technique[技术] by (71.8m points)

Merging of 2 separate excel file based 3 condition using python, pandas

1st Step of my question which is currently not solved is in Python removing rows with time condition

2nd Step of the Question: After having the filter data from step 1, would like to do a join from DF1 to DF2 based on the condition of same ID and Type and join the nearest Datetime from DF 2 to DF 1 (Date time of DF1 must be after DF2)

E.g. DF1 - EXIT data

   DatetimeX            ID    Type
   2020-01-01 02:00:01 12345 C
   2020-01-01 02:00:01 13333 D
   2020-01-01 02:00:50 13333 E
   2020-01-01 16:00:01 12211 C
   2020-01-02 21:00:01 12211 C
   2020-01-03 17:00:01 12211 C
   2020-01-04 17:00:01 12211 C
   2020-01-05 21:00:01 12211 C

E.g. DF2 - ENTRY data

   DatetimeE            ID    Type
   2020-01-01 01:00:00 12345 C
   2020-01-01 00:00:01 77777 C
   2020-01-01 00:00:05 12345 C
   2020-01-01 00:00:20 12345 C
   2020-01-01 02:00:05 13333 D
   2020-01-01 04:00:50 13333 E
   2020-01-01 07:00:01 12211 C

Endstate after join e.g. DF3

   DatetimeX            ID   Type DatetimeE
   2020-01-01 02:00:01 12345 C    2020-01-01 01:00:00 
   2020-01-01 02:00:01 13333 D
   2020-01-01 02:00:50 13333 E
   2020-01-01 02:00:01 12211 C    2020-01-01 01:00:30 
   2020-01-02 21:00:01 12211 C    2020-01-01 07:00:01

I've tried exploring merge and merge_asof but the combined data of DF3 does not match any of the data. Would like to just request for some guidance on how I can start comparing.

Thank you!

Updated with more data sample

question from:https://stackoverflow.com/questions/65840485/merging-of-2-separate-excel-file-based-3-condition-using-python-pandas

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
DF1['DatetimeX'] = pd.to_datetime(DF1['DatetimeX'])
DF2['DatetimeE'] = pd.to_datetime(DF2['DatetimeE'])

output = []

for index, row in DF1.iterrows():
    output.append(DF2[(DF2['ID'] == row['ID']) & (DF2['Type'] == row['Type']) & (DF2['DatetimeE'] < row['DatetimeX'])]['DatetimeE'].max())

DF1['DatetimeE'] = output

another way using apply

DF1['DatetimeX'] = pd.to_datetime(DF1['DatetimeX'])
DF2['DatetimeE'] = pd.to_datetime(DF2['DatetimeE'])

DF1['DatetimeE']  = DF1.apply(lambda row: DF2[(DF2['ID'] == row[1]) & (DF2['Type'] == row[2]) & (DF2['DatetimeE'] < row[0])]['DatetimeE'].max(), axis=1)

output

DatetimeX   ID  Type    DatetimeE
0   2020-01-01 02:00:01 12345   C   2020-01-01 01:00:00
1   2020-01-01 02:00:01 13333   D   NaT
2   2020-01-01 02:00:50 13333   E   NaT
3   2020-01-01 02:00:01 12211   C   2020-01-01 01:00:30

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...