`print()` is your friend when you don't understand something. It often clears up doubts.
Take a look:
```python
import pandas as pd
df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})
print(df)
print(df.groupby('books', as_index=True).sum())
print(df.groupby('books', as_index=False).sum())
```
Output:

```
  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17

       price
books       
bk1       36
bk2       30
bk3       17

  books  price
0   bk1     36
1   bk2     30
2   bk3     17
```
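When `as_index=True`, the key(s) you pass to `groupby()` become the index of the new DataFrame. With `as_index=False`, they stay an ordinary column and the result gets a default `0, 1, 2, ...` index, as the third printout shows.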
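A quick way to see the difference is to inspect the index and columns of both results. The sketch below rebuilds the same `df`; the names `indexed` and `flat` are just illustrative. It also checks that, for a plain `sum()` like this one, `as_index=False` gives the same table as calling `reset_index()` on the `as_index=True` result.

```python
import pandas as pd

df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                        'price': [12, 12, 12, 15, 15, 17]})

# as_index=True (the default): the group keys become the index of the result
indexed = df.groupby('books', as_index=True).sum()
print(indexed.index.name)       # books
print(list(indexed.columns))    # ['price']

# as_index=False: the group keys stay an ordinary column, with a fresh 0..n-1 index
flat = df.groupby('books', as_index=False).sum()
print(list(flat.columns))       # ['books', 'price']
print(list(flat.index))         # [0, 1, 2]

# For this aggregation, as_index=False matches moving the index back into a column
print(flat.equals(indexed.reset_index()))
```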
Setting the key column as the index has these benefits:

Speed. When you filter by the index, e.g. `df.loc['bk1']` where `df` is the grouped result, the lookup can be faster (especially for repeated lookups) because pandas finds index labels through a hash table in the common case. It doesn't have to scan the entire `books` column for `'bk1'`; it computes the hash of `'bk1'` and jumps straight to the matching row.
Ease. With `as_index=True` you can write `df.loc['bk1']`, which is shorter and faster than `df.loc[df.books == 'bk1']`, the filter you would need when `books` is a regular column.
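Both points can be checked on the grouped results. The sketch below rebuilds the same `df` so it runs on its own; `indexed`, `flat`, `row` and `rows` are illustrative names. Note that the two lookups don't return the same shape: `.loc` with a label on a unique index returns a single row as a Series, while the boolean mask returns a one-row DataFrame. The `timeit` part is only there so you can measure on your own data; on a table this small the difference is noise, and real numbers depend on data size and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                        'price': [12, 12, 12, 15, 15, 17]})
indexed = df.groupby('books', as_index=True).sum()   # 'books' is the index
flat = df.groupby('books', as_index=False).sum()     # 'books' is a column

# Label lookup on the index: one row, returned as a Series
row = indexed.loc['bk1']
print(row['price'])  # 36

# Boolean mask on the column: compares every value in 'books', returns a DataFrame
rows = flat.loc[flat.books == 'bk1']
print(rows['price'].iloc[0])  # 36

# Optional: time both lookups yourself; results vary with data size and version
t_index = timeit.timeit(lambda: indexed.loc['bk1'], number=1000)
t_mask = timeit.timeit(lambda: flat.loc[flat.books == 'bk1'], number=1000)
print(f"index lookup: {t_index:.4f}s, boolean mask: {t_mask:.4f}s")
```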