While answering the question "Sort a pandas's dataframe series by month name?", we ran into some odd behavior of groupby.
import pandas as pd
df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21], ["aug", 11], ["jan", 11], ["jan", 1]], columns=["Month", "Price"])
df["Month_dig"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df.sort_values(by="Month_dig", inplace=True)
# Now df looks like
  Month  Price  Month_dig
1   jan     40          1
5   jan     11          1
6   jan      1          1
2   mar     11          3
3   aug     21          8
4   aug     11          8
0   dec     12         12
total = (df.groupby(df['Month'])['Price'].mean())
print(total)
# output
Month
aug 16.000000
dec 12.000000
jan 17.333333
mar 11.000000
Name: Price, dtype: float64
It seems that in total, the data is sorted alphabetically, while the OP and I were expecting:
Month
jan 17.333333
mar 11.000000
aug 16.000000
dec 12.000000
Name: Price, dtype: float64
What is the mechanism behind groupby? I know from the documentation that it preserves the order of rows within each group, but is there a rule for the order among groups? To me, a pretty straightforward group order would be ["jan", "mar", "aug", "dec"], since the data in df is sorted this way.
P.S. Judging from ["aug", "dec", "jan", "mar"], the group names appear to be sorted alphabetically.
I am using Python 3.6 and pandas 0.20.3.
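For reference, here is a minimal sketch of what seems to control the order among groups. It relies on the sort parameter of groupby, which defaults to True and sorts the group keys; passing sort=False should keep the groups in order of first appearance instead. The names sorted_keys and in_order are illustrative only and not part of the original snippet.

```python
import pandas as pd

df = pd.DataFrame(
    [["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
     ["aug", 11], ["jan", 11], ["jan", 1]],
    columns=["Month", "Price"],
)
# Same preparation as above: derive the month number and sort by it.
df["Month_dig"] = pd.to_datetime(df["Month"], format="%b", errors="coerce").dt.month
df = df.sort_values(by="Month_dig")

# Default behavior (sort=True): group keys come back sorted alphabetically.
sorted_keys = df.groupby("Month")["Price"].mean()

# sort=False: group keys keep the order in which they first appear in df.
in_order = df.groupby("Month", sort=False)["Price"].mean()

print(list(sorted_keys.index))
print(list(in_order.index))
```

Because df is already sorted by Month_dig, first-appearance order here coincides with calendar order, which is the order the OP and I were expecting.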