Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
97 views
in Technique[技术] by (71.8m points)

python - How do I count inside a dataframe using unique values and conditional?

Take this df:

df = pd.DataFrame({'client_id':[0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'key':['0_382','0_382','0_356','1_365',float('nan'),'1_365',float('nan'),'2_284','2_405'],
                   'operation':['buy','sell','sell','buy','transfer','buy','fee','buy','buy']})
   client_id    key operation
0          0  0_382       buy
1          0  0_382      sell
2          0  0_356      sell
3          1  1_365       buy
4          1    NaN  transfer
5          1  1_365       buy
6          2    NaN       fee
7          2  2_284       buy
8          2  2_405       buy

I need to create a column named pos_id that will give an incremental value (1,2,3...) for each row, for unique values of client_id and key, and using a conditional to skip transfer and fee values of operation.

The result should be like this:

   client_id    key operation pos_id
0          0  0_382       buy      1
1          0  0_382      sell      1
2          0  0_356      sell      2
3          1  1_365       buy      1
4          1    NaN  transfer    NaN
5          1  1_365       buy      1
6          2    NaN       fee    NaN
7          2  2_284       buy      1
8          2  2_405       buy      2
question from:https://stackoverflow.com/questions/65946342/how-do-i-count-inside-a-dataframe-using-unique-values-and-conditional

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here are two ways.

The first method groups ['client_id', 'key'] to the same 'pos_id' within 'client_id' regardless of whether or not they appear consecutively.

Use where to mask the rows you want to ignore and then groupby + ngroup with sort=False will count the unique combinations. Then subtract off the min within each group to get the counter starting at 1.

s = (df.where(~df['operation'].isin(['transfer', 'fee']))
       .groupby(['client_id', 'key'], sort=False).ngroup()
       .replace(-1, np.NaN))  # ngroup makes NaN group keys -1.

df['pos_id'] = s - s.groupby(df['client_id']).transform('min') + 1

This method requires the input sorted on at least 'client_id' and then will only group the same key if they are consecutive into the same 'pos_id'. Remove the rows you want to ignore, then check for differences in each row and take the cumsum within 'client_id'

s = (df.where(~df['operation'].isin(['transfer', 'fee']))
       .dropna(how='all'))

s = s['key'].ne(s['key'].shift()) | s['client_id'].ne(s['client_id'].shift())
df['pos_id'] = s.groupby(df['client_id']).cumsum()

For your input either results in:

   client_id    key operation  pos_id
0          0  0_382       buy     1.0
1          0  0_382      sell     1.0
2          0  0_356      sell     2.0
3          1  1_365       buy     1.0
4          1    NaN  transfer     NaN
5          1  1_365       buy     1.0
6          2    NaN       fee     NaN
7          2  2_284       buy     1.0
8          2  2_405       buy     2.0

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...