Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
241 views
in Technique[技术] by (71.8m points)

python - If a pandas df column has a specific value, another column only allow a list of values

I have a dataframe that has a department, its function, its subfunction and its sub-subfunction. A example of the dataframe would be this:

d = [['Dept1'  ,               'HR'  ,        'Talent'     ,     'Good Employee 3'],
    ['Dept2'  ,               'IT'   ,      'Garbage'  ,  'Analysis HR 2'],
       ['Dept3'   ,              'IT'  ,        'Tech Sup' ,           'IT Tech 2'],
      [ 'Dept4'   ,              'HR'  ,        'Hardware' ,        'Trash Can' ] ,             
      [ 'Dept5'   ,              'MKT'  ,         'Sales'  ,         'Facebook Promo 1'],
       ['Dept6'   ,              'MKT'  ,       'Communication',     'Car profit']]
df = pd.DataFrame(d, columns=['Department', 'Function' ,     'Subfunction' ,  'Sub-subfunction'])

       Department           Function      Subfunction   Sub-subfunction
0        Dept1                 HR          Talent          Good Employee 3
1        Dept2                 IT         Garbage     Analysis HR 2
2        Dept3                 IT          Tech Sup            IT Tech 2
3        Dept4                 HR          Hardware         Trash Can               
4        Dept5                 MKT           Sales           Facebook Promo 1
5        Dept6                 MKT         Communication     Car profit

I need to create a rule that would check if a department has a certain value in function, it only allow a possible list of values in subfunction. Then, in subfunction the same, each unique value would allow only a possible list of values in sub-subfunction.

The mapping would be the following:

subfunction = {'HR': ['Talent', 'Analysis Human'],
              'IT': ['Tech Sup', 'Hardware'],
              'MKT': ['Sales', 'Communication']}

sub_subfunction = {'Talent': ['Good Employee 1', 'Good Employee 2', 'Good Employee 3'],
                   'Analysis Human': [ 'Analysis HR 1', 'Analysis HR 2', 'Analysis HR 3'],
                   'Tech Sup': ['IT Tech 1', 'IT Tech 2', 'IT Tech 3', 'Tech Master'],
                   'Hardware': ['PC pieces', 'Phone pieces'],
                   'Sales': ['Car profit', 'Bolt profit'],
                   'Communication': ['Facebook Promo 1', 'Instagram Promo 1']}

In this dataframe, this function would return the ones that do not obey this rule, in this example it would return:

       Department           Function      Subfunction   Sub-subfunction
1        Dept2                 IT         Garbage    Analysis HR 2
3        Dept4                 HR          Hardware         Trash Can               
4        Dept5                 MKT           Sales           Facebook Promo 1
5        Dept6                 MKT         Communication     Car profit

What would be the best way to apply those rules? As you can see, the values of the dataframe can be outside the values of the mapped dictionary (although I can have the "Function" values all mapped in the subfunction dictionary if that helps too much).

If it also can help, it could be realized in steps. First using a condition to deal with the Function/Subfunction mapping and then dealing separetly with each condition of subfunction / sub-subfunction (although really not ideal)

Thank you for the support!

question from:https://stackoverflow.com/questions/65860489/if-a-pandas-df-column-has-a-specific-value-another-column-only-allow-a-list-of

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
subfunction = {'HR': ['Talent', 'Analysis Human'],
              'IT': ['Tech Sup', 'Hardware'],
              'MKT': ['Sales', 'Communication']}
sub_subfunction = {'Talent': ['Good Employee 1', 'Good Employee 2', 'Good Employee 3'],
                   'Analysis Human': [ 'Analysis HR 1', 'Analysis HR 2', 'Analysis HR 3'],
                   'Tech Sup': ['IT Tech 1', 'IT Tech 2', 'IT Tech 3', 'Tech Master'],
                   'Hardware': ['PC pieces', 'Phone pieces'],
                   'Sales': ['Car profit', 'Bolt profit'],
                   'Communication': ['Facebook Promo 1', 'Instagram Promo 1']}

df[~((df.apply(lambda x: x[2] in subfunction[x[1]], axis=1)) & (df.apply(lambda x: x[3] in sub_subfunction[x[2]], axis=1)))]

output

Department  Function    Subfunction Sub-subfunction
1   Dept2   IT  Analysis Human  Analysis HR 2
3   Dept4   HR  Hardware    Good Employee 1
4   Dept5   MKT Sales   Facebook Promo 1
5   Dept6   MKT Communication   Car profit

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...