python - Vectorized way to store group name (from groupby) into a new column of the original DataFrame?

Question

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

Given a DataFrame with a timestamp index, I can group rows into periods using groupby, pd.Grouper and a for loop, and record each row's group label in the original DataFrame.

For instance, consider the following DataFrame and periods of 2 hours:

import pandas as pd
df1 = pd.DataFrame({'humidity': [0.3, 0.8, 0.9],
                    'pressure': [1e5, 1.1e5, 0.95e5],
                    'location': ['Paris', 'Paris', 'Milan']},
                    index = [pd.Timestamp('2020/01/02 01:59:00'),
                             pd.Timestamp('2020/01/02 03:59:00'),
                             pd.Timestamp('2020/01/02 02:59:00')])
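# Group rows into 2-hour bins counted from midnight of the first day
grps = df1.groupby(pd.Grouper(freq='2H', origin='start_day'))
# Each item is a (label, sub-DataFrame) pair; write the label back row by row
for label, group in grps:
    df1.loc[group.index, 'grp'] = label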

The result is then:

df1
Out[23]: 
                     humidity  pressure location                 grp
2020-01-02 01:59:00       0.3  100000.0    Paris 2020-01-02 00:00:00
2020-01-02 03:59:00       0.8  110000.0    Paris 2020-01-02 02:00:00
2020-01-02 02:59:00       0.9   95000.0    Milan 2020-01-02 02:00:00

Since I intend to handle large datasets, I wonder whether there is a way to get rid of this for loop. Is there a function or a groupby parameter that returns the original DataFrame with just one new column holding each row's group label?

Thanks for your help. Best,

question from:https://stackoverflow.com/questions/65951840/vectorized-way-to-store-group-name-from-groupby-into-a-new-column-of-the-origi

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:02:21+0000

Use GroupBy.transform on any one column; inside the function, x.name is the label of the current group:

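grps = df1.groupby(pd.Grouper(freq='2H', origin='start_day'))
# Original loop, kept only to compare its 'grp' column with 'new' below
for gr in grps:
    print(gr)
    df1.loc[gr[1].index, 'grp'] = gr[0]

# Loop-free version: broadcast each group's label to all of its rows
df1['new'] = grps['humidity'].transform(lambda x: x.name)
print(df1)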
                     humidity  pressure location                 grp                 new
2020-01-02 01:59:00       0.3  100000.0    Paris 2020-01-02 00:00:00 2020-01-02 00:00:00
2020-01-02 03:59:00       0.8  110000.0    Paris 2020-01-02 02:00:00 2020-01-02 02:00:00
2020-01-02 02:59:00       0.9   95000.0    Milan 2020-01-02 02:00:00 2020-01-02 02:00:00
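
A different technique from the transform call above, offered only as a sketch: since the labels here are the start of each 2-hour period, they can probably also be obtained by rounding the index down with DatetimeIndex.floor, with no groupby at all. The helper name add_group_label and its parameters are hypothetical, not part of pandas. The sketch assumes a timezone-naive DatetimeIndex and a frequency that divides a day evenly (such as 2 hours); for other frequencies the bins of floor need not line up with origin='start_day', so it is worth checking against the groupby result on your own data.

```python
import pandas as pd


def add_group_label(df, freq="2h", column="grp"):
    """Return a copy of df with each row's period start stored in `column`.

    Hypothetical helper for illustration; assumes df has a DatetimeIndex.
    """
    out = df.copy()
    # floor() rounds every timestamp of the index down in one call
    out[column] = out.index.floor(freq)
    return out


df1 = pd.DataFrame(
    {
        "humidity": [0.3, 0.8, 0.9],
        "pressure": [1e5, 1.1e5, 0.95e5],
        "location": ["Paris", "Paris", "Milan"],
    },
    index=pd.DatetimeIndex(
        ["2020-01-02 01:59:00", "2020-01-02 03:59:00", "2020-01-02 02:59:00"]
    ),
)

labelled = add_group_label(df1)
print(labelled)
```

The helper works on a copy, so df1 itself is left unchanged; assign the result back if the new column should live on the original DataFrame.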
