Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How do I calculate a "rolling" statistic on this pandas table, but with the time-window centered on the datapoint?

Suppose I have the following pandas table:

import pandas as pd
import math
l = [['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605148870, 51.98157826, 5.85744811], ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605141900, 51.98157842, 5.85744476], ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605145244, 51.98157826, 5.85744811], ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605153343, 51.98157826, 5.85744811], ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605143645, 51.98157842, 5.85744476], ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605159323, 51.98157826, 5.85744811], ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605157740, 51.98157826, 5.85744811], ['f8196bb6d34a9f44e950e30f15e1a2ab_6862', 1605150342, 51.98157826, 5.85744811]]
d = pd.DataFrame.from_records(l, columns=['device_zip', 'ts', 'lat', 'lon'])
d.sort_values(by=['ts'], inplace=True)  # sort chronologically
d['t'] = pd.to_datetime(d['ts'].astype(int), unit='s')  # epoch seconds -> datetime
d['dummy'] = d.t.dt.hour  # placeholder value column

How do I calculate a new column where, for every row, I count the number of rows with a dummy value > 40 in the interval from 1 minute before the row's timestamp to 1 minute after it? I've played around with the rolling function, which can take a time-window parameter, but I don't think it's possible to center the time window on each row.

I've been able to do what I want with an ugly loop construct, but it's quite slow. There must be a faster and more elegant way to do this.
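
For reference, a loop of the kind described above could look like the sketch below. The actual loop is not shown in the question, so the function name count_in_centered_window, the small demo frame, and the choice to include both endpoints of the interval are all assumptions made for illustration.

```python
import pandas as pd


def count_in_centered_window(df, half_window="1min", threshold=40):
    """Naive O(n^2) baseline (hypothetical helper, not from the question).

    For each row, count the rows whose dummy value exceeds `threshold` and
    whose timestamp lies within +/- half_window of the row's own timestamp.
    Both endpoints of the interval are included (an assumption).
    """
    half = pd.Timedelta(half_window)
    flagged = df["dummy"] > threshold
    counts = []
    for ts in df["t"]:
        in_window = df["t"].between(ts - half, ts + half)
        counts.append(int((in_window & flagged).sum()))
    return pd.Series(counts, index=df.index)


# Small made-up frame, only to show the call.
demo = pd.DataFrame({
    "t": pd.to_datetime([
        "2020-01-22 12:00:00",
        "2020-01-22 12:00:30",
        "2020-01-22 12:01:00",
        "2020-01-22 12:01:45",
        "2020-01-22 12:03:00",
    ]),
    "dummy": [41, 39, 41, 41, 39],
})
demo["count_loop"] = count_in_centered_window(demo)
print(demo)
```

Every row triggers a scan of the whole frame, which is why this gets slow on large tables. It can still serve as a reference to check a faster solution against.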

question from: https://stackoverflow.com/questions/65851888/how-do-i-calculate-a-rolling-statistic-on-this-pandas-table-but-with-the-time

1 Reply

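Indeed, centered rolling with a datetime window does not seem to be possible directly, at least in older pandas releases. One workaround is to do two rolling sums, each over half of the window you want: the first on the data as is, the second on the reversed data (with [::-1]) so that it looks forward in time. Then subtract the value of the current row, since both passes count it. With the provided data it is hard to demonstrate this (dummy is the hour, which never exceeds 40, and the rows are far more than a minute apart), so here is some random data: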

# random data
import numpy as np
import pandas as pd

np.random.seed(2)
nb_rows = 20
d = pd.DataFrame(
    {'t': np.sort(
            np.random.choice(
                pd.date_range('2020-01-22 12:00:00', periods=nb_rows*10, freq='1s'),
                size=nb_rows, replace=False)),
     'dummy': 40 + np.random.choice([1, -1], size=nb_rows)})

Now create a boolean column that flags the rows meeting your criterion, so that summing it gives a count. Then define the half window and add the two rolling sums:

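# flag the rows to count; summing booleans counts the True values
d['dummy_count'] = d['dummy'] > 40
semi_win = '1min'  # half window: one minute each side, i.e. a centered 2-minute window
d['roll_2T'] = (
    # backward-looking pass: rows in the minute up to and including the current row
    d.rolling(window=semi_win, min_periods=1, on='t')['dummy_count'].sum()
    # forward-looking pass: same rolling on the reversed frame (aligned back by index)
    + d[::-1].rolling(window=semi_win, min_periods=1, on='t')['dummy_count'].sum()
    # the current row was counted in both passes
    - d['dummy_count']
)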

print(d)
                     t  dummy  dummy_count  roll_2T
0  2020-01-22 12:00:02     39        False      3.0 # dummy is 41 three times in the window, which ends at 12:01:02
1  2020-01-22 12:00:03     39        False      3.0
2  2020-01-22 12:00:10     39        False      3.0
3  2020-01-22 12:00:12     39        False      3.0
4  2020-01-22 12:00:13     39        False      3.0
5  2020-01-22 12:00:14     41         True      3.0 
6  2020-01-22 12:00:29     39        False      3.0
7  2020-01-22 12:00:35     39        False      4.0
8  2020-01-22 12:00:44     41         True      4.0
9  2020-01-22 12:00:54     41         True      5.0 
10 2020-01-22 12:01:25     39        False      4.0 # dummy is 41 four times between 12:00:25 and 12:02:25
11 2020-01-22 12:01:32     41         True      4.0
12 2020-01-22 12:01:52     41         True      3.0
13 2020-01-22 12:01:53     39        False      3.0
14 2020-01-22 12:01:55     39        False      3.0
15 2020-01-22 12:02:06     39        False      3.0
16 2020-01-22 12:02:54     41         True      2.0
17 2020-01-22 12:03:02     39        False      2.0
18 2020-01-22 12:03:13     41         True      2.0
19 2020-01-22 12:03:19     39        False      2.0
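
As far as I know, pandas 1.3 and later accept center=True together with a time-based window, so on a recent version the two-pass trick may not be needed. The sketch below rebuilds the frame from the printed table above (timestamps and dummy values copied by hand as second offsets from 12:00:00) and computes the same count in one call. The column name roll_centered is made up for this example, and closed='neither' is chosen on the assumption that it matches the two-pass sum, which leaves out rows lying exactly one minute away.

```python
import pandas as pd

# Second offsets from 12:00:00 and dummy values, copied from the printed table.
seconds = [2, 3, 10, 12, 13, 14, 29, 35, 44, 54,
           85, 92, 112, 113, 115, 126, 174, 182, 193, 199]
dummy = [39, 39, 39, 39, 39, 41, 39, 39, 41, 41,
         39, 41, 41, 39, 39, 39, 41, 39, 41, 39]

d = pd.DataFrame({
    "t": pd.Timestamp("2020-01-22 12:00:00") + pd.to_timedelta(seconds, unit="s"),
    "dummy": dummy,
})

# 1 where the row should be counted, 0 otherwise.
d["dummy_count"] = (d["dummy"] > 40).astype(int)

# Full 2-minute window centered on each row (needs pandas >= 1.3).
# closed="neither" excludes rows exactly 1 minute before or after;
# closed="both" would include them.
d["roll_centered"] = (
    d.rolling(window="2min", on="t", center=True, closed="neither", min_periods=1)
    ["dummy_count"]
    .sum()
)
print(d)
```

On this data the roll_centered column should agree with roll_2T in the table above. Note that the window here is the full width ('2min'), not the half width used in the two-pass version.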
