Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
582 views
in Technique[技术] by (71.8m points)

python - Pandas rank by multiple columns

I am trying to rank a pandas data frame based on two columns. I can rank it based on one column, but how can to rank it based on two columns? 'SaleCount', then 'TotalRevenue'?

import pandas as pd

df = pd.DataFrame({'TotalRevenue':[300,9000,1000,750,500,2000,0,600,50,500],
    'Date':['2016-12-02' for i in range(10)],
    'SaleCount':[10,100,30,35,20,100,0,30,2,20],
    'shops':['S3','S2','S1','S5','S4','S8','S6','S7','S9','S10']})

df['Rank'] = df.SaleCount.rank(method='dense',ascending = False).astype(int)

#df['Rank'] = df.TotalRevenue.rank(method='dense',ascending = False).astype(int)
df.sort_values(['Rank'], inplace=True)

print(df)

current output:

    Date        SaleCount   TotalRevenue    shops   Rank
1   2016-12-02  100          9000            S2      1
5   2016-12-06  100          2000            S8      1
3   2016-12-04  35           750             S5      2
2   2016-12-03  30           1000            S1      3
7   2016-12-08  30           600             S7      3
9   2016-12-10  20           500             S10     4
4   2016-12-05  20           500             S4      4
0   2016-12-01  10           300             S3      5
8   2016-12-09  2            50              S9      6
6   2016-12-07  0            0               S6      7

I'm trying to generate an output like this:

    Date        SaleCount   TotalRevenue    shops   Rank
1   2016-12-02  100          9000            S2      1
5   2016-12-02  100          2000            S8      2
3   2016-12-02  35           750             S5      3
2   2016-12-02  30           1000            S1      4
7   2016-12-02  30           600             S7      5
9   2016-12-02  20           500             S10     6
4   2016-12-02  20           500             S4      6
0   2016-12-02  10           300             S3      7
8   2016-12-02  2            50              S9      8
6   2016-12-02  0            0               S6      9
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The generic way to do that is to group the desired fiels in a tuple, whatever the types.

df["Rank"] = df[["SaleCount","TotalRevenue"]].apply(tuple,axis=1)
             .rank(method='dense',ascending=False).astype(int)

df.sort_values("Rank")

   TotalRevenue        Date  SaleCount shops  Rank
1          9000  2016-12-02        100    S2     1
5          2000  2016-12-02        100    S8     2
3           750  2016-12-02         35    S5     3
2          1000  2016-12-02         30    S1     4
7           600  2016-12-02         30    S7     5
4           500  2016-12-02         20    S4     6
9           500  2016-12-02         20   S10     6
0           300  2016-12-02         10    S3     7
8            50  2016-12-02          2    S9     8
6             0  2016-12-02          0    S6     9

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...