Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
248 views
in Technique[技术] by (71.8m points)

python - How to join two pandas dataframes based on a date in df1 being >= date in df2

I have a large data frame with key IDs, states, start dates and other characteristics. I have another data frame with states, a start date and a "1" to signify a flag.

I want to join the two, based on the state and the date in df1 being greater than or equal to the date in df2.

Take the example below. df1 is the table of states, start dates, and a 1 for a flag. df2 is a dataframe that needs those flags if the date in df2 is >= the date in df1. The end result is df3. The only observations get the flag whose states match and dates are >= the original dates.

import pandas as pd

dict1 = {'date':['2020-01-01', '2020-02-15', '2020-02-04','2020-03-17',
                 '2020-06-15'],
         'state':['AL','FL','MD','NC','SC'],
         'flag': [1,1,1,1,1]}
df1 = pd.DataFrame(dict1)
df1['date'] = pd.to_datetime(df1['date'])

dict2 = {'state': ['AL','FL','MD','NC','SC'],
         'keyid': ['001','002','003','004','005'],
         'start_date':['2020-02-01', '2020-01-15', '2020-01-30','2020-05-18',
                       '2020-05-16']}
df2 = pd.DataFrame(dict2)
df2['start_date'] = pd.to_datetime(df2['start_date'])

df3 = df2
df3['flag'] = [0,1,1,0,1]

How do I get to df3 programmatically? My actual df1 has a row for each state. My actual df2 has over a million observations with different dates.

question from:https://stackoverflow.com/questions/65843716/how-to-join-two-pandas-dataframes-based-on-a-date-in-df1-being-date-in-df2

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use merge_asof for merge by greater or equal datetimes by parameter direction='forward':

A “forward” search selects the first row in the right DataFrame whose ‘on’ key is greater than or equal to the left’s key.

df2['need'] = [0,1,1,0,1]

df1 = df1.sort_values('date')
df2 = df2.sort_values('start_date')

df = pd.merge_asof(df2, 
                   df1, 
                   left_on='start_date', 
                   right_on='date', 
                   by='state',
                   direction='forward')


df['flag'] = df['flag'].fillna(0).astype(int)
print (df)

  state keyid start_date  need       date  flag
0    FL   002 2020-01-15     1 2020-02-15     1
1    MD   003 2020-01-30     1 2020-02-04     1
2    AL   001 2020-02-01     0        NaT     0
3    SC   005 2020-05-16     1 2020-06-15     1
4    NC   004 2020-05-18     0        NaT     0

You can also rename column for avoid appending in output DataFrame:

df2['need'] = [0,1,1,0,1]

df1 = df1.sort_values('date')
df2 = df2.sort_values('start_date')

df = pd.merge_asof(df2, 
                   df1.rename(columns={'date':'start_date'}), 
                   on='start_date', 
                   by='state',
                   direction='forward')

df['flag'] = df['flag'].fillna(0).astype(int)
print (df)
  state keyid start_date  need  flag
0    FL   002 2020-01-15     1     1
1    MD   003 2020-01-30     1     1
2    AL   001 2020-02-01     0     0
3    SC   005 2020-05-16     1     1
4    NC   004 2020-05-18     0     0

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...