SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe

Question

Welcome To Ask or Share your Answers For Others

SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe

I come from a sql background and I use the following data processing step frequently:

Partition the table of data by one or more fields
For each partition, add a rownumber to each of its rows that ranks the row by one or more other fields, where the analyst specifies ascending or descending

EX:

df = pd.DataFrame({'key1' : ['a','a','a','b','a'],
           'data1' : [1,2,2,3,3],
           'data2' : [1,10,2,3,30]})
df
     data1        data2     key1    
0    1            1         a           
1    2            10        a        
2    2            2         a       
3    3            3         b       
4    3            30        a

I'm looking for how to do the PANDAS equivalent to this sql window function:

RN = ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY Data1 ASC, Data2 DESC)


    data1        data2     key1    RN
0    1            1         a       1    
1    2            10        a       2 
2    2            2         a       3
3    3            3         b       1
4    3            30        a       4

I've tried the following which I've gotten to work where there are no 'partitions':

def row_number(frame,orderby_columns, orderby_direction,name):
    frame.sort_index(by = orderby_columns, ascending = orderby_direction, inplace = True)
    frame[name] = list(xrange(len(frame.index)))

I tried to extend this idea to work with partitions (groups in pandas) but the following didn't work:

df1 = df.groupby('key1').apply(lambda t: t.sort_index(by=['data1', 'data2'], ascending=[True, False], inplace = True)).reset_index()

def nf(x):
    x['rn'] = list(xrange(len(x.index)))

df1['rn1'] = df1.groupby('key1').apply(nf)

But I just got a lot of NaNs when I do this.

Ideally, there'd be a succinct way to replicate the window function capability of sql (i've figured out the window based aggregates...that's a one liner in pandas)...can someone share with me the most idiomatic way to number rows like this in PANDAS?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:35:21+0000

you can also use sort_values(), groupby() and finally cumcount() + 1:

df['RN'] = df.sort_values(['data1','data2'], ascending=[True,False]) 
             .groupby(['key1']) 
             .cumcount() + 1
print(df)

yields:

   data1  data2 key1  RN
0      1      1    a   1
1      2     10    a   2
2      2      2    a   3
3      3      3    b   1
4      3     30    a   4

PS tested with pandas 0.18

Categories

SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe

SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags