Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Comparing two columns in pandas dataframe to create a third one

I have the following dataframe:

In [25]: df1
Out[25]: 
          a         b
0  0.752072  0.813426
1  0.868841  0.354665
2  0.944651  0.745505
3  0.485834  0.163747
4  0.001487  0.820176
5  0.904039  0.136355
6  0.572265  0.250570
7  0.514955  0.868373
8  0.195440  0.484160
9  0.506443  0.523912

Now I want to create another column, df1['c'], whose value in each row is the larger of df1['a'] and df1['b']. The output I am after looks like this:

In [25]: df1
Out[25]: 
          a         b         c
0  0.752072  0.813426  0.813426
1  0.868841  0.354665  0.868841
2  0.944651  0.745505  0.944651
3  0.485834  0.163747  0.485834
4  0.001487  0.820176  0.820176

I tried:

In [23]: df1['c'] = np.where(max(df1['a'], df1['b'], df1['a'], df1['b'])

However, this throws a syntax error, and I don't see how to do this in pandas. My actual dataframe is much more complex, so I would like a generic solution. Any ideas?


1 Reply


You can use Series.where (df below is the question's df1):

# Keep b where a < b, otherwise fall back to a
df['c'] = df.b.where(df.a < df.b, df.a)
print(df)
          a         b         c
0  0.752072  0.813426  0.813426
1  0.868841  0.354665  0.868841
2  0.944651  0.745505  0.944651
3  0.485834  0.163747  0.485834
4  0.001487  0.820176  0.820176
5  0.904039  0.136355  0.904039
6  0.572265  0.250570  0.572265
7  0.514955  0.868373  0.868373
8  0.195440  0.484160  0.484160
9  0.506443  0.523912  0.523912
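
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds only the first five rows of the sample data, and the intermediate name `mask` is mine, not part of the original snippet. The point it illustrates is that Series.where keeps the caller's value where the condition is True and substitutes the second argument where it is False.

```python
import pandas as pd

# First five rows of the sample data from the question
df = pd.DataFrame({
    "a": [0.752072, 0.868841, 0.944651, 0.485834, 0.001487],
    "b": [0.813426, 0.354665, 0.745505, 0.163747, 0.820176],
})

# True where b is the larger value (name `mask` is illustrative)
mask = df["a"] < df["b"]

# Keep b where mask is True, otherwise take the value from a
df["c"] = df["b"].where(mask, df["a"])
print(df)
```

Because the comparison and the replacement are both aligned on the index, this works row by row without an explicit loop.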

Solution with numpy.where:

# Take a where a > b, otherwise b (requires: import numpy as np)
df['c'] = np.where(df['a'] > df['b'], df['a'], df['b'])
print(df)
          a         b         c
0  0.752072  0.813426  0.813426
1  0.868841  0.354665  0.868841
2  0.944651  0.745505  0.944651
3  0.485834  0.163747  0.485834
4  0.001487  0.820176  0.820176
5  0.904039  0.136355  0.904039
6  0.572265  0.250570  0.572265
7  0.514955  0.868373  0.868373
8  0.195440  0.484160  0.484160
9  0.506443  0.523912  0.523912
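
As a side note, numpy.where returns a plain NumPy array, which pandas then assigns positionally to the new column. If the condition is only ever "pick the larger one", `np.maximum` is an elementwise alternative that skips the explicit comparison. This is a swapped-in variant, not something the snippet above uses; a small sketch on three of the sample rows, with variable names of my own choosing:

```python
import numpy as np
import pandas as pd

# Three rows taken from the sample data
df = pd.DataFrame({
    "a": [0.752072, 0.868841, 0.001487],
    "b": [0.813426, 0.354665, 0.820176],
})

# Explicit condition: choose a where a > b, else b (result is an ndarray)
via_where = np.where(df["a"] > df["b"], df["a"], df["b"])

# Elementwise maximum of the two columns
via_maximum = np.maximum(df["a"], df["b"])

df["c"] = via_where
print(df)
print(np.array_equal(via_where, np.asarray(via_maximum)))
```

numpy.where is still the more flexible tool when the values to pick are not simply the two columns being compared.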

Or, more simply, take the row-wise maximum:

# axis=1 computes the maximum across the selected columns for each row
df['c'] = df[['a','b']].max(axis=1)
print(df)
          a         b         c
0  0.752072  0.813426  0.813426
1  0.868841  0.354665  0.868841
2  0.944651  0.745505  0.944651
3  0.485834  0.163747  0.485834
4  0.001487  0.820176  0.820176
5  0.904039  0.136355  0.904039
6  0.572265  0.250570  0.572265
7  0.514955  0.868373  0.868373
8  0.195440  0.484160  0.484160
9  0.506443  0.523912  0.523912
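
Since the question asks for something generic, the row-wise max is probably the easiest to extend: pass any list of columns. The sketch below uses made-up values, a hypothetical extra column `d`, and a deliberate NaN to show how missing values are handled. By default `max` skips NaN (`skipna=True`); pass `skipna=False` if a missing value should make the result NaN.

```python
import numpy as np
import pandas as pd

# Illustrative data, not the question's; column "d" is hypothetical
df = pd.DataFrame({
    "a": [0.75, 0.87, np.nan],
    "b": [0.81, 0.35, 0.82],
    "d": [0.10, 0.99, 0.50],
})

cols = ["a", "b", "d"]  # any subset of columns to compare

# Row-wise maximum, ignoring NaN (the default behaviour)
df["c"] = df[cols].max(axis=1)

# Row-wise maximum where any NaN in the row yields NaN
df["c_strict"] = df[cols].max(axis=1, skipna=False)
print(df)
```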
