Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Vectorized way to store group name (from groupby) into a new column of the original DataFrame?

I have a DataFrame with a timestamp index. Using groupby, pd.Grouper and a for loop, I can group rows by period and record each row's group label in the original DataFrame.

For instance, consider the following DataFrame and periods of 2 hours:

import pandas as pd

df1 = pd.DataFrame({'humidity': [0.3, 0.8, 0.9],
                    'pressure': [1e5, 1.1e5, 0.95e5],
                    'location': ['Paris', 'Paris', 'Milan']},
                   index=[pd.Timestamp('2020/01/02 01:59:00'),
                          pd.Timestamp('2020/01/02 03:59:00'),
                          pd.Timestamp('2020/01/02 02:59:00')])
# Group rows into 2-hour bins anchored at midnight of the first day.
grps = df1.groupby(pd.Grouper(freq='2H', origin='start_day'))
# Write each group's label (the bin start) back onto its rows, one group at a time.
for label, group in grps:
    df1.loc[group.index, 'grp'] = label

The result is then:

df1
Out[23]: 
                     humidity  pressure location                 grp
2020-01-02 01:59:00       0.3  100000.0    Paris 2020-01-02 00:00:00
2020-01-02 03:59:00       0.8  110000.0    Paris 2020-01-02 02:00:00
2020-01-02 02:59:00       0.9   95000.0    Milan 2020-01-02 02:00:00

Since I intend to work with large datasets, I would like to get rid of this for loop. Is there a function, or a groupby parameter, that returns the original DataFrame with just one new column holding each row's group label?

Thanks for your help. Best,

question from:https://stackoverflow.com/questions/65951840/vectorized-way-to-store-group-name-from-groupby-into-a-new-column-of-the-origi

1 Reply

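Use GroupBy.transform on any single column. Inside transform, each group is passed as a Series whose name attribute holds the group label, so returning x.name broadcasts that label to every row of the group: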

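grps = df1.groupby(pd.Grouper(freq='2H', origin='start_day'))
# Loop from the question, kept only to compare 'grp' with the new column.
for label, group in grps:
    df1.loc[group.index, 'grp'] = label

# Loop-free version: x.name is the label of the group that x belongs to.
df1['new'] = grps['humidity'].transform(lambda x: x.name)
print(df1)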
                     humidity  pressure location                 grp                 new
2020-01-02 01:59:00       0.3  100000.0    Paris 2020-01-02 00:00:00 2020-01-02 00:00:00
2020-01-02 03:59:00       0.8  110000.0    Paris 2020-01-02 02:00:00 2020-01-02 02:00:00
2020-01-02 02:59:00       0.9   95000.0    Milan 2020-01-02 02:00:00 2020-01-02 02:00:00
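
Note that transform with a Python lambda still calls that function once per group, so it may not scale much better than the explicit loop when there are many groups. As a possible alternative that is not part of the answer above, the sketch below floors the index directly. It assumes the bins are anchored at midnight and that the period divides a day evenly (as with 2 hours and origin='start_day'); under those assumptions, flooring should give the same labels. The column names grp_transform and grp_floor are made up for illustration, and pd.offsets.Hour(2) is used instead of the '2H' string only because the spelling of the hourly alias has changed across pandas versions.

```python
import pandas as pd

# Same sample data as in the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame(
    {"humidity": [0.3, 0.8, 0.9],
     "pressure": [1e5, 1.1e5, 0.95e5],
     "location": ["Paris", "Paris", "Milan"]},
    index=[pd.Timestamp("2020-01-02 01:59:00"),
           pd.Timestamp("2020-01-02 03:59:00"),
           pd.Timestamp("2020-01-02 02:59:00")],
)

period = pd.offsets.Hour(2)  # 2-hour bins, same meaning as freq='2H'

# Approach from the answer: broadcast each group's label with transform.
grouped = df.groupby(pd.Grouper(freq=period, origin="start_day"))
df["grp_transform"] = grouped["humidity"].transform(lambda s: s.name)

# Alternative (assumption: midnight-anchored bins whose length divides 24 h):
# round every timestamp down to the start of its 2-hour bin in one call.
df["grp_floor"] = df.index.floor(period)

# Both columns should agree on this sample.
labels_match = bool((df["grp_transform"] == df["grp_floor"]).all())
print(df[["grp_transform", "grp_floor"]])
print("labels match:", labels_match)
```

If the bins need a different origin or a calendar-based period (months, for example), flooring no longer applies as-is, and the transform approach from the answer is the safer choice.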
