EDIT: Session generation from log file analysis with pandas seems to be exactly what I was looking for.
I have a DataFrame that includes non-unique timestamps, and I'd like to group them by time windows. The basic logic would be:
1) Create a time range from each timestamp by adding a buffer of n minutes before and after it.
2) Group together time ranges that overlap. The end effect is that a time window could be as small as a single timestamp +/- the time buffer, but there would be no cap on how large a window could grow, as long as each event is closer to the previous one than the time buffer.
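The two steps above can be sketched literally as interval merging. This is a minimal sketch, not a built-in pandas feature: `overlap_groups` is a hypothetical helper, and it assumes the Series has a unique index. After sorting, each interval [t - buffer, t + buffer] starts a new window only if it begins after the furthest end reached so far. Note that with a symmetric buffer of n, two consecutive events end up in the same window when their gap is at most 2n, so a 20-second gap corresponds to a buffer of 10 seconds.

```python
import pandas as pd


def overlap_groups(times: pd.Series, buffer: pd.Timedelta) -> pd.Series:
    """Hypothetical helper: label each timestamp with the id of its merged window.

    Step 1: every timestamp t becomes the interval [t - buffer, t + buffer].
    Step 2: overlapping (or touching) intervals are merged into one window.
    Assumes `times` has a unique index.
    """
    order = times.sort_values()
    starts = order - buffer
    ends = order + buffer
    # Furthest interval end seen so far, walking the events in time order.
    reach = ends.cummax()
    # A new window begins when an interval starts after every earlier interval
    # has ended. The first comparison is against NaT, which yields False.
    new_window = starts > reach.shift()
    labels = new_window.cumsum()
    # Put the labels back in the caller's original row order.
    return labels.reindex(times.index)
```

The labels can then be passed straight to `groupby`, e.g. `df.groupby(overlap_groups(df["time"], pd.Timedelta(seconds=10)))`, where the `time` column name is an assumption.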
It feels like df.groupby(pd.TimeGrouper(minutes=n)) is the right answer, but I don't know how to make the TimeGrouper create dynamic time ranges when it sees events that fall within the time buffer of each other.
For instance, if I use TimeGrouper('20s') on the events 10:34:00, 10:34:08, 10:34:08, 10:34:15, 10:34:28 and 10:34:54, pandas gives me three groups (events falling in 10:34:00-10:34:20, 10:34:20-10:34:40, and 10:34:40-10:35:00). I would like to get just two groups back: one spanning 10:34:00-10:34:28, since no two consecutive events in that range are more than 20 seconds apart, and a second one containing only 10:34:54.
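A minimal sketch contrasting the two behaviors on the six events above, assuming they sit in a DataFrame column named `time` (the column name is an assumption). `pd.TimeGrouper` was deprecated and later removed in favor of `pd.Grouper(freq=...)`, which is used here for the fixed bins. The elastic grouping uses the common "gap then cumulative sum" idiom: a new group starts whenever the gap to the previous event exceeds 20 seconds.

```python
import pandas as pd

df = pd.DataFrame({"time": pd.to_datetime([
    "2013-01-01 10:34:00", "2013-01-01 10:34:08", "2013-01-01 10:34:08",
    "2013-01-01 10:34:15", "2013-01-01 10:34:28", "2013-01-01 10:34:54",
])})

# Fixed 20-second bins (pd.Grouper replaces the old pd.TimeGrouper).
fixed = df.groupby(pd.Grouper(key="time", freq="20s")).size()

# Elastic windows: the gap test needs the events in time order.
df = df.sort_values("time")
# True where an event is more than 20 s after the previous one (first row is NaT -> False).
gap = df["time"].diff() > pd.Timedelta(seconds=20)
# Each True bumps the running count, so it becomes a new window id.
df["window"] = gap.cumsum()
elastic = df.groupby("window")["time"].agg(["min", "max", "count"])

print(fixed)
print(elastic)
```

The fixed bins split the first burst across three slots, while the gap-based ids keep 10:34:00 through 10:34:28 together and leave 10:34:54 on its own.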
What is the best way to find temporal windows that are not static bins of time ranges?
Given a Series that looks something like this:
time
0 2013-01-01 10:34:00+00:00
1 2013-01-01 10:34:12+00:00
2 2013-01-01 10:34:28+00:00
3 2013-01-01 10:34:54+00:00
4 2013-01-01 10:34:55+00:00
5 2013-01-01 10:35:19+00:00
6 2013-01-01 10:35:30+00:00
If I do a df.groupby(pd.TimeGrouper('20s')) on that Series, I get back 5 groups: 10:34:00-:20, :20-:40, :40-10:35:00, etc. What I want is a function that creates elastic time ranges: as long as events are within 20 seconds of each other, the time range keeps expanding. So I expect to get back:
2013-01-01 10:34:00 - 2013-01-01 10:34:48
0 2013-01-01 10:34:00+00:00
1 2013-01-01 10:34:12+00:00
2 2013-01-01 10:34:28+00:00
2013-01-01 10:34:54 - 2013-01-01 10:35:15
3 2013-01-01 10:34:54+00:00
4 2013-01-01 10:34:55+00:00
2013-01-01 10:35:19 - 2013-01-01 10:35:50
5 2013-01-01 10:35:19+00:00
6 2013-01-01 10:35:30+00:00
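The expected output above appears to use the first event as the start of each window and the last event plus 20 seconds as its end. A minimal sketch that reproduces it from the sample Series, assuming the timestamps are already sorted; `BUFFER`, `events`, and `bounds` are illustrative names, not part of the original question. The fixed-bin version is included for comparison and uses `dt.floor("20s")` as the bin key.

```python
import pandas as pd

BUFFER = pd.Timedelta(seconds=20)  # the 20-second tolerance from the question

events = pd.Series(pd.to_datetime([
    "2013-01-01 10:34:00", "2013-01-01 10:34:12", "2013-01-01 10:34:28",
    "2013-01-01 10:34:54", "2013-01-01 10:34:55", "2013-01-01 10:35:19",
    "2013-01-01 10:35:30",
], utc=True), name="time")

# Fixed bins: one group per 20-second slot that contains at least one event.
fixed = events.groupby(events.dt.floor("20s")).size()

# Elastic windows: a gap larger than BUFFER starts a new window id.
window_id = (events.diff() > BUFFER).cumsum()
grouped = events.groupby(window_id)

# Window bounds in the form shown in the expected output:
# first event -> last event + BUFFER.
bounds = pd.DataFrame({"start": grouped.min(), "end": grouped.max() + BUFFER})

for wid, members in grouped:
    print(f"{bounds.loc[wid, 'start']} - {bounds.loc[wid, 'end']}")
    print(members.to_string())
```

If the real data is unsorted, calling `events.sort_values()` first keeps the `diff` step meaningful.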
Thanks.