Given a DataFrame with a timestamp index, I can use groupby, pd.Grouper and a for loop to group rows by period and keep track of each row's group label in the original DataFrame.
For instance, consider the following DataFrame and periods of 2 hours:
import pandas as pd

df1 = pd.DataFrame({'humidity': [0.3, 0.8, 0.9],
                    'pressure': [1e5, 1.1e5, 0.95e5],
                    'location': ['Paris', 'Paris', 'Milan']},
                   index=[pd.Timestamp('2020/01/02 01:59:00'),
                          pd.Timestamp('2020/01/02 03:59:00'),
                          pd.Timestamp('2020/01/02 02:59:00')])
# Group rows into 2-hour bins anchored at midnight of the first day
grps = df1.groupby(pd.Grouper(freq='2H', origin='start_day'))
# Write each group's label back onto its rows in the original DataFrame
for label, group in grps:
    df1.loc[group.index, 'grp'] = label
The result is then:
df1
Out[23]:
                     humidity  pressure location                 grp
2020-01-02 01:59:00       0.3  100000.0    Paris 2020-01-02 00:00:00
2020-01-02 03:59:00       0.8  110000.0    Paris 2020-01-02 02:00:00
2020-01-02 02:59:00       0.9   95000.0    Milan 2020-01-02 02:00:00
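To make the target output concrete: in this specific example the labels coincide with each index timestamp floored to a 2-hour boundary. The following is only a minimal sketch under that assumption (fixed 2-hour bins that line up with midnight); it is not a general groupby mechanism and may not match other `origin` settings or non-fixed frequencies. The lowercase alias `'2h'` is used because recent pandas versions deprecate the uppercase `'H'`.

```python
import pandas as pd

df1 = pd.DataFrame({'humidity': [0.3, 0.8, 0.9],
                    'pressure': [1e5, 1.1e5, 0.95e5],
                    'location': ['Paris', 'Paris', 'Milan']},
                   index=[pd.Timestamp('2020/01/02 01:59:00'),
                          pd.Timestamp('2020/01/02 03:59:00'),
                          pd.Timestamp('2020/01/02 02:59:00')])

# Assumption: bins are fixed 2-hour windows aligned to midnight, so the
# bin label equals the timestamp rounded down to a multiple of 2 hours.
df1['grp'] = df1.index.floor('2h')
```

This reproduces the `grp` column shown above for this data without iterating over the groups, but only because the bin edges here happen to fall on multiples of the frequency.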
Since I intend to handle large datasets, I wonder whether there is a way to get rid of this for loop. Is there a function or a parameter in groupby that returns the original DataFrame with just a new column containing the group label?
Thanks for your help.
Bests,
question from:
https://stackoverflow.com/questions/65951840/vectorized-way-to-store-group-name-from-groupby-into-a-new-column-of-the-origi