Suppose I have the following pandas table:
import pandas as pd
l = [
    ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605148870, 51.98157826, 5.85744811],
    ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605141900, 51.98157842, 5.85744476],
    ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605145244, 51.98157826, 5.85744811],
    ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605153343, 51.98157826, 5.85744811],
    ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605143645, 51.98157842, 5.85744476],
    ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605159323, 51.98157826, 5.85744811],
    ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605157740, 51.98157826, 5.85744811],
    ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605150342, 51.98157826, 5.85744811],
]
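# Build the frame and order the rows chronologically
d = pd.DataFrame.from_records(l, columns=['device_zip', 'ts', 'lat', 'lon'])
d = d.sort_values(by='ts')
# Convert Unix seconds to datetimes, then derive a placeholder value column
d['t'] = pd.to_datetime(d['ts'].astype(int), unit='s')
d['dummy'] = d['t'].dt.hour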
How do I add a new column that, for every row, counts the rows whose dummy value is greater than 40 within the interval from 1 minute before to 1 minute after that row's timestamp? I've played around with the rolling function, which accepts a time-based window, but I don't think the window can be centered on each row.
I've been able to do what I want with an ugly loop, but it's quite slow. There must be a faster and more elegant way to do this.
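One possible loop-free approach, sketched here under some assumptions, is to sort by timestamp, take a cumulative sum of the "dummy > threshold" flag, and use binary search to find the edges of each row's window. The helper name `count_in_centered_window`, its parameters, the output column `n_in_window`, and the demo data are all hypothetical. The sketch also assumes the window is inclusive at both ends and that the row itself counts.

```python
import numpy as np
import pandas as pd


def count_in_centered_window(df, threshold=40, half_window_s=60):
    """Count rows with dummy > threshold within +/- half_window_s seconds of each row.

    Hypothetical helper: expects integer Unix-second timestamps in 'ts'
    and numeric values in 'dummy'.
    """
    # Binary search requires the timestamps to be sorted.
    out = df.sort_values("ts").copy()
    ts = out["ts"].to_numpy()
    flag = (out["dummy"].to_numpy() > threshold).astype(np.int64)
    # prefix[i] = number of flagged rows among the first i sorted rows.
    prefix = np.concatenate(([0], np.cumsum(flag)))
    # First row with ts >= t - half_window_s (inclusive lower bound).
    lo = np.searchsorted(ts, ts - half_window_s, side="left")
    # One past the last row with ts <= t + half_window_s (inclusive upper bound).
    hi = np.searchsorted(ts, ts + half_window_s, side="right")
    out["n_in_window"] = prefix[hi] - prefix[lo]
    return out


# Made-up demo data, not taken from the question.
demo = pd.DataFrame({"ts": [100, 0, 30, 150, 400],
                     "dummy": [60, 50, 10, 45, 41]})
result = count_in_centered_window(demo)
print(result["n_in_window"].tolist())  # [1, 1, 2, 2, 1]
```

This runs in O(n log n) and avoids a Python-level loop. Note that with the sample frame above, dummy holds the hour of day (0 to 23), so a threshold of 40 would give 0 for every row; a lower threshold makes for a more informative check.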
question from:
https://stackoverflow.com/questions/65851888/how-do-i-calculate-a-rolling-statistic-on-this-pandas-table-but-with-the-time