Welcome To Ask or Share your Answers For Others

python - What is as_index in groupby in pandas?

Welcome To Ask or Share your Answers For Others

1 Reply

replyed Oct 17, 2021 by 深蓝 (71.8m points)

print() is your friend when you don't understand a thing. It clears out doubts many times.

Take a look:

import pandas as pd

df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})

print(df)

print(df.groupby('books', as_index=True).sum())

print(df.groupby('books', as_index=False).sum())

Output:

  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17

       price
books       
bk1       36
bk2       30
bk3       17

  books  price
0   bk1     36
1   bk2     30
2   bk3     17

When as_index=True the key(s) you use in groupby() will become an index in the new dataframe.

The benefits you get when you set the column as index are:

Speed. When you filter values based on the index column eg. df.loc['bk1'], it would be faster because of hashing of index column. It doesn't have to traverse the entire books column to find 'bk1'. It will just calculate the hash value of 'bk1' and find it in 1 go.
Ease. When as_index=True you can use this syntax df.loc['bk1'] which is shorter and faster as opposed to df.loc[df.books=='bk1'] which is longer and slower.

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

...