I would do this using a shift and a cumsum (here's a simple example, with numbers instead of times, but they would work exactly the same):
In [11]: s = pd.Series([1., 1.1, 1.2, 2.7, 3.2, 3.8, 3.9])
In [12]: (s - s.shift(1) > 0.5).fillna(0).cumsum(skipna=False) # *
Out[12]:
0 0
1 0
2 0
3 1
4 1
5 2
6 2
dtype: int64
* The need for skipna=False here appears to be a bug in pandas.
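For actual timestamps the same idea might look like the sketch below. The sample times and the 30-minute `timeout` are made up for illustration, and `diff()` is used as shorthand for `s - s.shift(1)`. The comparison against NaT in the first row evaluates to False, so the first row starts session 0.

```python
import pandas as pd

# Hypothetical request times (not from the question's data).
times = pd.Series(pd.to_datetime([
    "2013-01-01 10:00:00",
    "2013-01-01 10:05:00",
    "2013-01-01 10:50:00",
    "2013-01-01 10:55:00",
    "2013-01-01 12:00:00",
]))

# Assumed threshold: a gap longer than this starts a new session.
timeout = pd.Timedelta(minutes=30)

# True wherever the gap to the previous row exceeds the timeout.
new_session = times.diff() > timeout

# Running count of session starts gives a session number per row.
session_number = new_session.cumsum()
print(session_number.tolist())  # [0, 0, 1, 1, 2]
```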
Then you can use this in a groupby apply:
In [21]: df = pd.DataFrame([[1.1, 1.7, 2.5, 2.6, 2.7, 3.4], list('AAABBB')]).T
In [22]: df.columns = ['time', 'ip']
In [23]: df
Out[23]:
  time ip
0  1.1  A
1  1.7  A
2  2.5  A
3  2.6  B
4  2.7  B
5  3.4  B
In [24]: g = df.groupby('ip')
In [25]: df['session_number'] = g['time'].apply(lambda s: (s - s.shift(1) > 0.5).fillna(0).cumsum(skipna=False))
In [26]: df
Out[26]:
  time ip  session_number
0  1.1  A               0
1  1.7  A               1
2  2.5  A               2
3  2.6  B               0
4  2.7  B               0
5  3.4  B               1
Now you can groupby 'ip' and 'session_number' (and analyse each session).
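As a rough sketch of that last step, one option is to aggregate each (ip, session_number) pair into a summary row. The column names `start`, `end` and `hits` are only illustrative, and the session numbers are rebuilt here with a grouped `diff` and `cumsum` so the snippet runs on its own; it assumes a pandas version that supports named aggregation.

```python
import pandas as pd

# Same toy data as above, built with numeric dtype for 'time'.
df = pd.DataFrame({
    "time": [1.1, 1.7, 2.5, 2.6, 2.7, 3.4],
    "ip": list("AAABBB"),
})

# Gap to the previous row within each ip (NaN for the first row of each ip).
gap = df.groupby("ip")["time"].diff()

# Count session starts per ip; NaN > 0.5 is False, so each ip starts at 0.
df["session_number"] = (gap > 0.5).astype(int).groupby(df["ip"]).cumsum()

# One summary row per session: first time, last time, number of rows.
sessions = df.groupby(["ip", "session_number"])["time"].agg(
    start="min", end="max", hits="count"
)
print(sessions)
```

Any other per-session computation can be plugged in the same way, for example by calling `apply` on the grouped object instead of `agg`.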