python pandas conditional cumulative sum

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

Consider my dataframe df

data  data_binary  sum_data
  2       1            1
  5       0            0
  1       1            1
  4       1            2
  3       1            3
  10      0            0
  7       0            0
  3       1            1

I want to calculate the cumulative sum of data_binary within groups of contiguous 1 values.

The first group of 1's has a single 1, so sum_data has only a 1. The second group has three 1's, so sum_data is [1, 2, 3].

I've tried using np.where(df['data_binary'] == 1, df['data_binary'].cumsum(), 0) but that returns

array([1, 0, 2, 3, 4, 0, 0, 5])

That is not what I want, because the sum never restarts after a 0.
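
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a minimal sketch: the values are copied from the table, and the variable name `attempt` is illustrative, not something from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the table above (without the desired sum_data column).
df = pd.DataFrame({
    "data": [2, 5, 1, 4, 3, 10, 7, 3],
    "data_binary": [1, 0, 1, 1, 1, 0, 0, 1],
})

# The attempt from the question: cumsum() runs over the whole column,
# so the count keeps growing across the 0 rows instead of restarting.
attempt = np.where(df["data_binary"] == 1, df["data_binary"].cumsum(), 0)
print(attempt)  # [1 0 2 3 4 0 0 5]
```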

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:58:04+0000

You want to take the cumulative sum of data_binary and subtract its value at the most recent row where data_binary was zero.

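b = df.data_binary
c = b.cumsum()  # running total over the whole column
# subtract the running total as of the most recent 0 (NaN before the first 0, treated as 0)
c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)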

0    1
1    0
2    1
3    2
4    3
5    0
6    0
7    1
Name: data_binary, dtype: int64
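
If you want the result back in the frame, the same three lines can be wrapped in a small helper. This is a sketch under assumptions: the function name `reset_cumsum` and the local name `baseline` are made up here, and the frame is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

def reset_cumsum(b: pd.Series) -> pd.Series:
    """Running count of 1s that restarts at every 0 (hypothetical helper name)."""
    c = b.cumsum()
    # Running total as of the most recent 0; NaN for rows before the first 0.
    baseline = c.mask(b != 0).ffill()
    # fill_value=0 treats the missing baseline at the start as 0.
    return c.sub(baseline, fill_value=0).astype(int)

df = pd.DataFrame({"data_binary": [1, 0, 1, 1, 1, 0, 0, 1]})
df["sum_data"] = reset_cumsum(df["data_binary"])
print(df["sum_data"].tolist())  # [1, 0, 1, 2, 3, 0, 0, 1]
```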

Explanation

Let's start by looking at each step side by side:

cols = ['data_binary', 'cumulative_sum', 'nan_non_zero', 'forward_fill', 'final_result']
print(pd.concat([
        b, c,
        c.mask(b != 0),
        c.mask(b != 0).ffill(),
        c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(int)
    ], axis=1, keys=cols))


   data_binary  cumulative_sum  nan_non_zero  forward_fill  final_result
0            1               1           NaN           NaN             1
1            0               1           1.0           1.0             0
2            1               2           NaN           1.0             1
3            1               3           NaN           1.0             2
4            1               4           NaN           1.0             3
5            0               4           4.0           4.0             0
6            0               4           4.0           4.0             0
7            1               5           NaN           4.0             1

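The problem with cumulative_sum is that rows where data_binary is zero do not reset the sum, and that is the motivation for this solution. How do we "reset" the sum when data_binary is zero? I keep the cumulative sum only on the rows where data_binary is zero (every other row is masked to NaN) and forward fill those values, so each row carries the running total as of the most recent zero. When I take the difference between this and the cumulative sum, I've effectively reset the sum. Rows before the first zero have nothing to forward fill from, which is why sub is called with fill_value=0.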
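
As a cross-check, the same reset can also be expressed with a groupby. Note that this is a different technique from the answer above, not part of it: every zero starts a new group, and the cumulative sum is taken within each group. The names `group_id` and `alt` are illustrative only.

```python
import pandas as pd

b = pd.Series([1, 0, 1, 1, 1, 0, 0, 1], name="data_binary")

# The group label increases by one at each 0, so every 0 opens a new group.
group_id = (b == 0).cumsum()

# Cumulative sum within each group; the leading 0 of a group contributes nothing.
alt = b.groupby(group_id).cumsum()
print(alt.tolist())  # [1, 0, 1, 2, 3, 0, 0, 1]
```

Both versions give the same numbers on this data. The groupby form is arguably easier to read, while the mask and ffill form avoids grouping altogether.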
