it can also be done using numpy broadcast and boolean indexing like below
## load sample data
df1 = pd.DataFrame([('0', '2019-07-14 05:31:00', '0.020228', '0.026572'), ('1', '2019-07-14 06:32:00', '0.057780', '0.175499'), ('2', '2019-07-14 07:02:00', '0.076623', '0.875499')], columns=('id', 'timestamp', 'A', 'B'))
df2 = pd.DataFrame([('0', '2019-07-14 05:30:00', '2019-07-14 06:30:00', 'E1'), ('1', '2019-07-14 06:00:00', '2019-07-14 07:00:00', 'E2'), ('2', '2019-07-14 06:30:01', '2019-07-14 07:30:00', 'E3'), ('3', '2019-07-14 07:30:01', '2019-07-14 08:30:00', 'E4')], columns=('id', 'start', 'end', 'event'))
df1["timestamp"] = pd.to_datetime(df1["timestamp"])
df2["start"] = pd.to_datetime(df2["start"])
df2["end"] = pd.to_datetime(df2["end"])
Solution
## df2[["start"]] is a column vector of size m and df1.timestamp.values is row
## vector of size n then broad cast will result matrix of shape m,n which is
## result of comparing each pair of m and n
compare = (df2[["start"]].values<df1.timestamp.values) & (df2[["end"]].values>df1.timestamp.values)
## get cell numbers which is in range 0 to matrix size which meets the condition
ind = np.arange(len(df1)*len(df2))[compare.ravel()]
## calculate row and column index from cell number
pd.concat([df2.iloc[ind//len(df1)].reset_index(drop=True), df1.iloc[ind%len(df1)].reset_index(drop=True)], axis=1, sort=False)
Result
start end event timestamp A B
0 2019-07-14 05:30:00 2019-07-14 06:30:00 E1 2019-07-14 05:31:00 0.020228 0.026572
1 2019-07-14 06:00:00 2019-07-14 07:00:00 E2 2019-07-14 06:32:00 0.057780 0.175499
2 2019-07-14 06:30:01 2019-07-14 07:30:00 E3 2019-07-14 06:32:00 0.057780 0.175499
3 2019-07-14 06:30:01 2019-07-14 07:30:00 E3 2019-07-14 07:02:00 0.076623 0.875499
Edit
in response to comment from @baccandr here is some more explanation how the indexing working.
After comparison we get the compare matrix with boolean values like below
array([[ True, False, False],
[False, True, False],
[False, True, True],
[False, False, False]])
- you can think of this matrix as table with column representing index of
df1
which is (0,1,2) and rows representing index of df2
which is (0,1,2,3)
- The value in cell is True if corresponding row in df1 and df2 mets the condition e.g row 0 of df1 and row 0 of df2; row 2 of df1 and row 1 of df2 meets the condition
- To find rows meeting the condition from df1 and df2 separately we can use
compare
as index directly like df1[compare.T] and df2[compare] but it will not give rows in correct order for pairing. This will give row in increasing order of index for both which will not always be true.
- Now our aim here is to get indxes of both dataframes which meets the condition, in correct order. so what we need is index of df1 [0, 1, 2, 2] and index of df2 [0,1,1,2]. Using those indexes in df1 and df2 we get the match in correct order from both
- so what we did here is count cells of matrix from left to right then down then again left two right giving a unique number to each cell. Then filter the cells where conditions meets and convert that to index for df1 and df2.
As the matrix represents index for 2 dataframes in matrix format I think np.where may not work. Another way we can do it is use compare
as index for df2 and only find index of df1 like below
this will repeat index of df1 for each row of df2 and find the index of df1 in order with df2
ind_df1 = np.tile(np.arange(len(df1)), len(df2))[compare.ravel()]
pd.concat([df2[compare].reset_index(drop=True), df1.iloc[ind_df1].reset_index(drop=True)], axis=1, sort=False)
I hope this makes it clear, if you have some other idea then I would love to see it in comment or as answer