It seems like you need two boolean masks: one to determine the breaks between groups, and one to determine which dates are in a group in the first place.
There's also one tricky part that's easiest to show by example. Notice that df
below contains an added row that doesn't have any consecutive dates before or after it.
>>> df
DateAnalyzed Val
1 2018-03-18 0.470253
2 2018-03-19 0.470253
3 2018-03-20 0.470253
4 2017-01-20 0.485949 # < watch out for this
5 2018-09-25 0.467729
6 2018-09-26 0.467729
7 2018-09-27 0.467729
>>> df.dtypes
DateAnalyzed datetime64[ns]
Val float64
dtype: object
The answer below assumes that you want to ignore 2017-01-20
completely, without processing it. (See end of answer for a solution if you do want to process this date.)
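(If you want to reproduce this frame locally, something along these lines should work; the index labels and Val values are simply copied from the output above.)

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {'DateAnalyzed': pd.to_datetime(['2018-03-18', '2018-03-19', '2018-03-20',
...                                      '2017-01-20', '2018-09-25', '2018-09-26',
...                                      '2018-09-27']),
...      'Val': [0.470253, 0.470253, 0.470253, 0.485949,
...              0.467729, 0.467729, 0.467729]},
...     index=range(1, 8))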
First:
>>> dt = df['DateAnalyzed']
>>> day = pd.Timedelta('1d')
>>> in_block = ((dt - dt.shift(-1)).abs() == day) | (dt.diff() == day)
>>> in_block
1 True
2 True
3 True
4 False
5 True
6 True
7 True
Name: DateAnalyzed, dtype: bool
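The mask combines two checks: whether the next date is exactly one day away (the shift(-1) term) and whether the previous date is exactly one day away (the diff() term). Using either check alone would drop the last or the first date of each block. Broken out for clarity (the intermediate names here are just for illustration):

>>> next_is_consec = (dt - dt.shift(-1)).abs() == day   # next date is exactly one day away
>>> prev_is_consec = dt.diff() == day                   # previous date is exactly one day away
>>> in_block = next_is_consec | prev_is_consec          # same mask as above; names are illustrative only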
Now, in_block
will tell you which dates are in a "consecutive" block, but it won't tell you to which groups each date belongs.
The next step is to derive the groupings themselves:
>>> filt = df.loc[in_block]
>>> breaks = filt['DateAnalyzed'].diff() != day
>>> groups = breaks.cumsum()
>>> groups
1 1
2 1
3 1
5 2
6 2
7 2
Name: DateAnalyzed, dtype: int64
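For reference, breaks is True at the first date of each block (its diff is either NaT or larger than one day), and cumsum turns those flags into incrementing group labels:

>>> breaks
1     True
2    False
3    False
5     True
6    False
7    False
Name: DateAnalyzed, dtype: bool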
Then you can call filt.groupby(groups)
with your operation of choice.
>>> for _, frame in filt.groupby(groups):
...     print(frame, end='\n\n')
...
DateAnalyzed Val
1 2018-03-18 0.470253
2 2018-03-19 0.470253
3 2018-03-20 0.470253
DateAnalyzed Val
5 2018-09-25 0.467729
6 2018-09-26 0.467729
7 2018-09-27 0.467729
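Printing is just for illustration; any groupby operation works the same way. For example, a per-block mean of Val (the choice of aggregation here is arbitrary):

>>> filt.groupby(groups)['Val'].mean()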
To incorporate this back into df
, assign groups to a new column; the isolated date will get NaN
:
>>> df['groups'] = groups
>>> df
DateAnalyzed Val groups
1 2018-03-18 0.470253 1.0
2 2018-03-19 0.470253 1.0
3 2018-03-20 0.470253 1.0
4 2017-01-20 0.485949 NaN
5 2018-09-25 0.467729 2.0
6 2018-09-26 0.467729 2.0
7 2018-09-27 0.467729 2.0
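One convenient consequence: if you later group on that column, groupby skips NaN keys by default, so the isolated row is left out automatically, e.g.:

>>> df.groupby('groups')['Val'].mean()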
If you do want to include the "lone" date, things become a bit more straightforward:
dt = df['DateAnalyzed']
day = pd.Timedelta('1d')
breaks = dt.diff() != day
groups = breaks.cumsum()
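Here every row gets a label, and the lone date simply becomes its own single-member group; with the sample df above, groups should come out as 1, 1, 1, 2, 3, 3, 3 (the 2017-01-20 row is group 2 by itself). You can then group exactly as before:

for _, frame in df.groupby(groups):
    print(frame, end='\n\n')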