pd.TimeGrouper() was formally deprecated in pandas v0.21.0 in favor of pd.Grouper().
The best use of pd.Grouper() is within groupby() when you're also grouping on non-datetime columns. If you just need to group on a frequency, use resample().
For example, say you have:
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(444)
>>> df = pd.DataFrame({'a': np.random.choice(['x', 'y'], size=50),
...                    'b': np.random.rand(50)},
...                   index=pd.date_range('2010', periods=50))
>>> df.head()
            a         b
2010-01-01  y  0.959568
2010-01-02  x  0.784837
2010-01-03  y  0.745148
2010-01-04  x  0.965686
2010-01-05  y  0.654552
You could do:
>>> # `a` is dropped because it is non-numeric (pandas < 2.0; newer versions need sum(numeric_only=True))
>>> df.groupby(pd.Grouper(freq='M')).sum()
                  b
2010-01-31  18.5123
2010-02-28   7.7670
But the above is a little unnecessary because you're only grouping on the index. Instead you could do:
>>> df.resample('M').sum()
                  b
2010-01-31  18.5123
2010-02-28   7.7670
to produce the same result.
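One quick way to check that the two spellings agree is to compute both and compare them. This is a sketch rather than the answer's original code: it passes the offset object pd.offsets.MonthEnd() instead of the 'M' alias (pandas 2.2 deprecated 'M' in favour of 'ME', while the offset object works across versions), and it selects column b up front so the result does not depend on how a given pandas version treats the non-numeric column a. The names month_end, via_grouper and via_resample are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from above (same seed, same columns).
np.random.seed(444)
df = pd.DataFrame({'a': np.random.choice(['x', 'y'], size=50),
                   'b': np.random.rand(50)},
                  index=pd.date_range('2010', periods=50))

# A DateOffset object sidesteps the 'M' vs 'ME' alias change.
month_end = pd.offsets.MonthEnd()

# Select 'b' explicitly so the string column 'a' is never summed.
via_grouper = df.groupby(pd.Grouper(freq=month_end))['b'].sum()
via_resample = df['b'].resample(month_end).sum()

# Same month-end bin labels and the same totals (up to float rounding).
assert via_grouper.index.equals(via_resample.index)
assert np.allclose(via_grouper.to_numpy(), via_resample.to_numpy())
print(via_resample)
```

Comparing with np.allclose and Index.equals, rather than exact Series equality, keeps the check independent of index metadata such as freq, which the two code paths are not guaranteed to set identically.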
Conversely, here's a case where Grouper() would be useful:
>>> df.groupby([pd.Grouper(freq='M'), 'a']).sum()
                   b
           a
2010-01-31 x  8.9452
           y  9.5671
2010-02-28 x  4.2522
           y  3.5148
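Because each month's total is just split across the values of a, collapsing the a level of the two-key result should give back the plain monthly sums, and unstacking it gives a month-by-category table. The sketch below checks that, again assuming the pd.offsets.MonthEnd() offset in place of the 'M' alias; by_month_and_a, monthly_from_pairs and table are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(444)
df = pd.DataFrame({'a': np.random.choice(['x', 'y'], size=50),
                   'b': np.random.rand(50)},
                  index=pd.date_range('2010', periods=50))
month_end = pd.offsets.MonthEnd()

# Two grouping keys: monthly bins of the DatetimeIndex, plus column 'a'.
by_month_and_a = df.groupby([pd.Grouper(freq=month_end), 'a'])['b'].sum()

# Summing over the 'a' level recovers the one-key monthly totals.
monthly_from_pairs = by_month_and_a.groupby(level=0).sum()
monthly = df['b'].resample(month_end).sum()
assert monthly_from_pairs.index.equals(monthly.index)
assert np.allclose(monthly_from_pairs.to_numpy(), monthly.to_numpy())

# Rows are month ends, columns are the categories of 'a'.
table = by_month_and_a.unstack('a')
print(table)
```

Note that this round trip only matches resample() when every month has at least one row: resample() emits empty bins, whereas the two-key groupby() only emits (month, a) pairs that actually occur.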
For more detail, take a look at Chapter 7 of Ted Petrou's Pandas Cookbook.