I have two columns in my dataframe, 'START_TIME' and 'END_TIME', which I zipped into a list of tuples with the following snippet:
zippedList = list(zip(new_df['START_TIME'], new_df['END_TIME']))
The resulting list looks like this:
[(Timestamp('2020-06-09 06:00:00'), Timestamp('2020-06-09 16:00:00')),
(Timestamp('2020-06-09 02:00:00'), Timestamp('2020-06-09 06:00:00')),
(Timestamp('2020-06-10 02:00:00'), Timestamp('2020-06-10 06:00:00')),
(Timestamp('2020-06-09 16:00:00'), Timestamp('2020-06-10 02:00:00')),
(Timestamp('2020-06-10 06:00:00'), Timestamp('2020-06-10 16:00:00')),
(Timestamp('2020-06-10 16:00:00'), Timestamp('2020-06-11 02:00:00')),
(Timestamp('2020-06-11 02:00:00'), Timestamp('2020-06-11 06:00:00')),
(Timestamp('2020-06-11 01:00:00'), Timestamp('2020-06-11 05:00:00')),
(Timestamp('2020-06-11 06:00:00'), Timestamp('2020-06-11 16:00:00')),
(Timestamp('2020-06-11 16:00:00'), Timestamp('2020-06-12 02:00:00'))]
I then iterated over this list, comparing every tuple against every other, to find overlapping intervals:
for elem1 in zippedList:
    for elem2 in zippedList:
        # print(elem1, elem2)
        i1 = pd.Interval(elem1[0], elem1[1], closed='neither')
        i2 = pd.Interval(elem2[0], elem2[1], closed='neither')
        if i1.overlaps(i2) and elem1 != elem2:
            print('OVERLAP FOUND!!')
            print(i1, i2)
This reports every overlap twice, once in each order:
OVERLAP FOUND!!
(2020-06-10 16:00:00, 2020-06-11 02:00:00) (2020-06-11 01:00:00, 2020-06-11 05:00:00)
OVERLAP FOUND!!
(2020-06-11 02:00:00, 2020-06-11 06:00:00) (2020-06-11 01:00:00, 2020-06-11 05:00:00)
OVERLAP FOUND!!
(2020-06-11 01:00:00, 2020-06-11 05:00:00) (2020-06-10 16:00:00, 2020-06-11 02:00:00)
OVERLAP FOUND!!
(2020-06-11 01:00:00, 2020-06-11 05:00:00) (2020-06-11 02:00:00, 2020-06-11 06:00:00)
I have two questions.
How do I avoid reporting duplicate overlaps? For instance, (2020-06-11 02:00:00, 2020-06-11 06:00:00) (2020-06-11 01:00:00, 2020-06-11 05:00:00) and (2020-06-11 01:00:00, 2020-06-11 05:00:00) (2020-06-11 02:00:00, 2020-06-11 06:00:00) are the same pair!
How do I create a boolean column in the original dataframe (new_df) that is True for every row whose timestamp pair overlaps another row's? For instance, the rows (Timestamp('2020-06-11 02:00:00'), Timestamp('2020-06-11 06:00:00')) and (Timestamp('2020-06-11 01:00:00'), Timestamp('2020-06-11 05:00:00')) should both be marked True.
Note that the overlap check runs on the zipped list (zippedList), not directly on the dataframe (new_df).
Thanks in advance!
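For reference, here is a rough sketch of one way both questions might be approached. It is not a definitive solution. It assumes that visiting each unordered pair once with itertools.combinations is acceptable, and that rows can be tracked by position. The names intervals, overlap_pairs, overlapping_rows and the OVERLAP column are made up for illustration, and new_df is rebuilt here from the timestamps listed above only so the snippet runs on its own.

```python
from itertools import combinations

import pandas as pd

# Hypothetical stand-in for new_df, rebuilt from the timestamps in the question.
new_df = pd.DataFrame({
    "START_TIME": pd.to_datetime([
        "2020-06-09 06:00", "2020-06-09 02:00", "2020-06-10 02:00",
        "2020-06-09 16:00", "2020-06-10 06:00", "2020-06-10 16:00",
        "2020-06-11 02:00", "2020-06-11 01:00", "2020-06-11 06:00",
        "2020-06-11 16:00",
    ]),
    "END_TIME": pd.to_datetime([
        "2020-06-09 16:00", "2020-06-09 06:00", "2020-06-10 06:00",
        "2020-06-10 02:00", "2020-06-10 16:00", "2020-06-11 02:00",
        "2020-06-11 06:00", "2020-06-11 05:00", "2020-06-11 16:00",
        "2020-06-12 02:00",
    ]),
})

# One open interval per row, in row order (same construction as in the question).
intervals = [
    pd.Interval(start, end, closed="neither")
    for start, end in zip(new_df["START_TIME"], new_df["END_TIME"])
]

overlap_pairs = []          # (row position, row position), each pair recorded once
overlapping_rows = set()    # positions of all rows involved in any overlap

# combinations() yields each unordered pair exactly once, so (a, b) and (b, a)
# are never both visited and an interval is never compared with itself.
for (i, a), (j, b) in combinations(enumerate(intervals), 2):
    if a.overlaps(b):
        overlap_pairs.append((i, j))
        overlapping_rows.update((i, j))
        print("OVERLAP FOUND!!")
        print(a, b)

# Boolean flag per row, assigned by position so it does not depend on the index.
new_df["OVERLAP"] = [pos in overlapping_rows for pos in range(len(new_df))]
```

Because rows are tracked by position rather than by tuple value, two rows with identical start and end times would still be compared with each other, which the elem1 != elem2 check in the original loop would skip. This is still a quadratic pairwise comparison, so it is only a sketch for small frames.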