python - Pandas: How to sum columns based on conditional of other column values?

Question

Welcome To Ask or Share your Answers For Others

python - Pandas: How to sum columns based on conditional of other column values?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Pandas: How to sum columns based on conditional of other column values?

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

     dog      A         B           C
0     dog1    0.787575  0.159330    0.053095
1     dog10   0.770698  0.169487    0.059815
2     dog11   0.792689  0.152043    0.055268
3     dog12   0.785066  0.160361    0.054573
4     dog13   0.795455  0.150464    0.054081
5     dog14   0.794873  0.150700    0.054426
..    ....
8     dog19   0.811585  0.140207    0.048208
9     dog2    0.797202  0.152033    0.050765
10    dog20   0.801607  0.145137    0.053256
11    dog21   0.792689  0.152043    0.055268
    ....

I create a new column by summing columns "A", "B", "C" as follows:

df['total_ABC'] = df[["A", "B", "B"]].sum(axis=1)

Now I would like to do this based on a conditional, i.e. if "A" < 0.78 then create a new summed column df['smallA_sum'] = df[["A", "B", "B"]].sum(axis=1). Otherwise, the value should be zero.

How does one create conditional statements like this?

My thought would be to use

df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))

However, this doesn't work and I'm not able to specify axis.

How do you create a column based on the values of other columns?

You could also do something like for each df['dog'] == 'dog2', create column dog2_sum, i.e.

 df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))

but my approach is incorrect.

`

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:34:50+0000

The following should work, here we mask the df where the condition is met, this will set NaN to the rows where the condition isn't met so we call fillna on the new col:

In [67]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df

Out[67]:
          A         B         C
0  0.197334  0.707852 -0.443475
1 -1.063765 -0.914877  1.585882
2  0.899477  1.064308  1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858  0.777523 -0.453747

In [73]:    
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
df['total'].fillna(0, inplace=True)
df

Out[73]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000

Another approach is to call where on the sum result, this takes a value param to return when the condition isn't met:

In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df

Out[75]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000

Categories

python - Pandas: How to sum columns based on conditional of other column values?

python - Pandas: How to sum columns based on conditional of other column values?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags