Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
412 views
in Technique[技术] by (71.8m points)

python - forward looking rolling window in pandas - ragged index

hope someone can help me here I am trying to create a forward rolling window on a ragged time index. Pandas complains about monotonicity - which is obviously respected in my index. The normal backward window works just fine.

[EDIT] The reverse time index does not pass is_monotonic. So I guess it requires a monotonic rising index and not just monotonic index.

Anyone has a better alternative please thanks a lot!

In [352] tmp[::-1]
Out[352]: 
stamp
2018-04-23 06:45:16.920   -0.11
2018-04-23 06:45:16.919   -0.03
2018-04-23 06:45:16.918   -0.01
2018-04-23 06:45:16.917   -0.02
2018-04-23 06:45:16.916    0.03
2018-04-23 06:45:16.914    0.03
2018-04-23 06:45:16.911    0.03
2018-04-23 06:45:16.910    0.06
2018-04-23 06:45:16.909    0.09
2018-04-23 06:45:16.908    0.08
2018-04-23 06:45:16.907    0.18
2018-04-23 06:45:16.906    0.28
2018-04-23 06:45:16.905    0.28
2018-04-23 06:45:16.904    0.02
2018-04-23 06:45:16.903    0.09
2018-04-23 06:45:16.902    0.09
2018-04-23 06:45:16.901    0.09
2018-04-23 06:45:16.900    0.09
2018-04-23 06:45:16.899   -0.24
2018-04-23 06:45:16.898   -0.22
2018-04-23 06:45:16.894   -0.22
2018-04-23 06:45:16.799   -0.21
2018-04-23 06:45:16.798   -0.19
2018-04-23 06:45:16.797   -0.21
2018-04-23 06:45:15.057   -0.13
2018-04-23 06:45:15.056   -0.16
2018-04-23 06:45:13.382   -0.04
2018-04-23 06:45:13.381   -0.02
2018-04-23 06:45:13.380   -0.05
2018-04-23 06:45:13.379   -0.08
Name: d66, dtype: float64

In [353]: tmp[::-1].rolling('20L')
Traceback (most recent call last):

  File "<ipython-input-355-74bdfcdfbbd1>", line 1, in <module>
    tmp[::-1].rolling('20L')

  File "C:UsersluigiAnaconda3libsite-packagespandascoregeneric.py", line 7067, in rolling
    on=on, axis=axis, closed=closed)

  File "C:UsersluigiAnaconda3libsite-packagespandascorewindow.py", line 2069, in rolling
    return Rolling(obj, **kwds)

  File "C:UsersluigiAnaconda3libsite-packagespandascorewindow.py", line 86, in __init__
    self.validate()

  File "C:UsersluigiAnaconda3libsite-packagespandascorewindow.py", line 1104, in validate
    self._validate_monotonic()

  File "C:UsersluigiAnaconda3libsite-packagespandascorewindow.py", line 1136, in _validate_monotonic
    "monotonic".format(formatted))

ValueError: index must be monotonic

In [356]: tmp.index.is_monotonic
Out[356]: True

In [357]: tmp[::-1].index.is_monotonic
Out[357]: False

In [358]: tmp[::-1].index.is_monotonic_decreasing
Out[358]: True
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Just in case you are still looking for a solution. with reindex() and the help of an extra column, the rolling functions with ragged forward-looking window should be doable.

import pandas as pd
from io import StringIO

str = """dtime          value
2018-04-23 06:45:16.920   -0.11
2018-04-23 06:45:16.919   -0.03
2018-04-23 06:45:16.918   -0.01
2018-04-23 06:45:16.917   -0.02
2018-04-23 06:45:16.916    0.03
2018-04-23 06:45:16.914    0.03
2018-04-23 06:45:16.911    0.03
2018-04-23 06:45:16.910    0.06
2018-04-23 06:45:16.909    0.09
2018-04-23 06:45:16.908    0.08
2018-04-23 06:45:16.907    0.18
2018-04-23 06:45:16.906    0.28
2018-04-23 06:45:16.905    0.28
2018-04-23 06:45:16.904    0.02
2018-04-23 06:45:16.903    0.09
2018-04-23 06:45:16.902    0.09
2018-04-23 06:45:16.901    0.09
2018-04-23 06:45:16.900    0.09
2018-04-23 06:45:16.899   -0.24
2018-04-23 06:45:16.898   -0.22
2018-04-23 06:45:16.894   -0.22
2018-04-23 06:45:16.799   -0.21
2018-04-23 06:45:16.798   -0.19
2018-04-23 06:45:16.797   -0.21
2018-04-23 06:45:15.057   -0.13
2018-04-23 06:45:15.056   -0.16
2018-04-23 06:45:13.382   -0.04
2018-04-23 06:45:13.381   -0.02
2018-04-23 06:45:13.380   -0.05
2018-04-23 06:45:13.379   -0.08
"""

## read the data tmp[::-1]
df = pd.read_table(StringIO(str), sep="ss+", engine="python", index_col=["dtime"], parse_dates=['dtime'])

## reverse the data to its original order
df = df[::-1]   

## setup the offset, i.e. 10ms
offset = '10ms'

# create a new column with values as index datetime plus the window timedelta 10ms
df['dt_new'] = df.index + pd.Timedelta(offset)

# use df.index and this new column to form the new index(remove duplicates and sort the list)
idx = sorted(set([*df.index.tolist(), *df.dt_new.tolist()]))

# reindex the original dataframe and calculate the backward rolling sum
df1 = df.reindex(idx).fillna(value={'value':0}).value.rolling(offset, closed='left').sum().to_frame()

# make a LEFt join to the original dataframe. `value_y` should be the forward rolling sum
df.merge(df1, left_on='dt_new', right_index=True, how='left')
#                         value_x                  dt_new  value_y
#dtime                                                            
#2018-04-23 06:45:13.379    -0.08 2018-04-23 06:45:13.389    -0.19
#2018-04-23 06:45:13.380    -0.05 2018-04-23 06:45:13.390    -0.11
#2018-04-23 06:45:13.381    -0.02 2018-04-23 06:45:13.391    -0.06
#2018-04-23 06:45:13.382    -0.04 2018-04-23 06:45:13.392    -0.04
#2018-04-23 06:45:15.056    -0.16 2018-04-23 06:45:15.066    -0.29
#2018-04-23 06:45:15.057    -0.13 2018-04-23 06:45:15.067    -0.13
#2018-04-23 06:45:16.797    -0.21 2018-04-23 06:45:16.807    -0.61
#2018-04-23 06:45:16.798    -0.19 2018-04-23 06:45:16.808    -0.40
#2018-04-23 06:45:16.799    -0.21 2018-04-23 06:45:16.809    -0.21
#2018-04-23 06:45:16.894    -0.22 2018-04-23 06:45:16.904    -0.32
#2018-04-23 06:45:16.898    -0.22 2018-04-23 06:45:16.908     0.66
#2018-04-23 06:45:16.899    -0.24 2018-04-23 06:45:16.909     0.96
#2018-04-23 06:45:16.900     0.09 2018-04-23 06:45:16.910     1.29
#2018-04-23 06:45:16.901     0.09 2018-04-23 06:45:16.911     1.26
#2018-04-23 06:45:16.902     0.09 2018-04-23 06:45:16.912     1.20
#2018-04-23 06:45:16.903     0.09 2018-04-23 06:45:16.913     1.11
#2018-04-23 06:45:16.904     0.02 2018-04-23 06:45:16.914     1.02
#2018-04-23 06:45:16.905     0.28 2018-04-23 06:45:16.915     1.03
#2018-04-23 06:45:16.906     0.28 2018-04-23 06:45:16.916     0.75
#2018-04-23 06:45:16.907     0.18 2018-04-23 06:45:16.917     0.50
#2018-04-23 06:45:16.908     0.08 2018-04-23 06:45:16.918     0.30
#2018-04-23 06:45:16.909     0.09 2018-04-23 06:45:16.919     0.21
#2018-04-23 06:45:16.910     0.06 2018-04-23 06:45:16.920     0.09
#2018-04-23 06:45:16.911     0.03 2018-04-23 06:45:16.921    -0.08
#2018-04-23 06:45:16.914     0.03 2018-04-23 06:45:16.924    -0.11
#2018-04-23 06:45:16.916     0.03 2018-04-23 06:45:16.926    -0.14
#2018-04-23 06:45:16.917    -0.02 2018-04-23 06:45:16.927    -0.17
#2018-04-23 06:45:16.918    -0.01 2018-04-23 06:45:16.928    -0.15
#2018-04-23 06:45:16.919    -0.03 2018-04-23 06:45:16.929    -0.14
#2018-04-23 06:45:16.920    -0.11 2018-04-23 06:45:16.930    -0.11

Some Notes:

  1. The results might vary based on how you define and select the closed option when the size of the rolling window is an offset. By default closed is set to right. If shifting 'offset' is the method applied(as in this example), the rolling aggregation must be calculated with closed = left. (you might have different design though). When the window size is a fixed number, the deault closed is 'both'.

  2. The index (dtime field) should not contain duplicates, if not, idx should be de-duplicated based on two fields (dtime, value).

Potential issues:

  1. The reindex() could potentially double the number of rows at the worst scenario.
  2. Join dataframes by using a datetime field, this might not work on every system if the datetimes are saved as floating number.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...