If df
has an index with no duplicate values, then you can use idxmax
to return the index of the maximum row for each group. Then use df.loc
to select the entire row:
In [322]: df.loc[df.groupby('type').votes.agg('idxmax')]
Out[322]:
name type votes
3 max cat 9
0 bob dog 10
If df.index
has duplicate values, i.e. is not a unique index, then make the index unique first:
df = df.reset_index()
then use idxmax
:
result = df.loc[df.groupby('type').votes.agg('idxmax')]
If you really need to, you can return df
to its original state:
df = df.set_index(['index'], drop=True)
but in general life is much better with a unique index.
Here is an example showing what goes wrong when df
does not have a unique
index. Suppose the index
is AABB
:
import pandas as pd
df = pd.DataFrame({'name': ['bob', 'pete', 'fluffy', 'max'],
'type': ['dog', 'cat', 'dog', 'cat'],
'votes': [10, 8, 5, 9]},
index=list('AABB'))
print(df)
# name type votes
# A bob dog 10
# A pete cat 8
# B fluffy dog 5
# B max cat 9
idxmax
returns the index values A
and B
:
print(df.groupby('type').votes.agg('idxmax'))
type
cat B
dog A
Name: votes, dtype: object
But A
and B
do not uniquely specify the desired rows. df.loc[...]
returns all rows whose index value is A
or B
:
print(df.loc[df.groupby('type').votes.agg('idxmax')])
# name type votes
# B fluffy dog 5
# B max cat 9
# A bob dog 10
# A pete cat 8
In contrast, if we reset the index:
df = df.reset_index()
# index name type votes
# 0 A bob dog 10
# 1 A pete cat 8
# 2 B fluffy dog 5
# 3 B max cat 9
then df.loc
can be used to select the desired rows:
print(df.groupby('type').votes.agg('idxmax'))
# type
# cat 3
# dog 0
# Name: votes, dtype: int64
print(df.loc[df.groupby('type').votes.agg('idxmax')])
# index name type votes
# 3 B max cat 9
# 0 A bob dog 10