Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
3.9k views
in Technique[技术] by (71.8m points)

python 3.x - Populate column based on other column values using multiple if else

I'm trying to compare 4 columns in a pandas dataframe and populate 5'th column based on the result. In normal SQL it would be something like this:

if speciality_new is null and location_new is null then 'No match found'
elif specialty <> specialty_new and location <> location_new then 'both are different'
elif specialty_new is null then 'specialty not found'
elif location_new is null then 'location not found'
else 'true'

I read that this could be achieved using np.where but my code is failing. Can someone please advice me what I'm doing wrong. This is what I wrote:

masterDf['Match'] = np.where(
    masterDf[speciality_new].isnull() & masterDf[location_new].isnull(), 'No match found',
    masterDf[speciality] != masterDf[speciality_new] & masterDf[location] != masterDf[location_new], 'Both specialty and location didnt match',
    masterDf[speciality] != masterDf[speciality_new], 'Specialty didnt match',
    masterDf[location] != masterDf[location_new], 'Location didnt match',
    True)

The error message is TypeError: unsupported operand type(s) for &: 'str' and 'str' which doesn't make any sense since '&' is the syntax for 'and'

dfsample is what I have and dfFinal is what I want

dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
       'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

For analyzing multiple conditions using numpy, you're better using numpy.select, where you should specify the conditions, the expected output for each condition, and a default output, just like an if-elif-else statement:

import numpy as np

condlist = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    dfsample['speciality'].ne(dfsample['speciality_new']) & 
    dfsample['location'].ne(dfsample['location_new']),
    dfsample['speciality'].ne(dfsample['speciality_new']),
    dfsample['location'].ne(dfsample['location_new']),
]

choicelist = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match'
]

dfsample['match'] = np.select(condlist, choicelist, default=True)
print(dfsample)

where ne stands by "not equal" (you can simply use !=).


Output:

   ID speciality location speciality_new location_new                                    match
0   1     doctor    texas         doctor        texas                                     True
1   2      nurse       dc          nurse       alaska                     Location didnt match
2   3    patient  georgia       director      georgia                    Specialty didnt match
3   4     driver                   nurse     maryland  Both specialty and location didnt match
4   5   director  florida                     florida                    Specialty didnt match

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...