Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
234 views
in Technique[技术] by (71.8m points)

python - Selecting data from Pandas dataframe based on criteria stored in a dict

I have a Pandas dataframe that contains a large number of variables. This can be simplified as:

tempDF = pd.DataFrame({ 'var1': [12,12,12,12,45,45,45,51,51,51],
                        'var2': ['a','a','b','b','b','b','b','c','c','d'],
                        'var3': ['e','f','f','f','f','g','g','g','g','g'],
                        'var4': [1,2,3,3,4,5,6,6,6,7]})

If I wanted to select a subset of the dataframe (e.g. var2='b' and var4=3), I would use:

tempDF.loc[(tempDF['var2']=='b') & (tempDF['var4']==3),:]

However, is it possible to select a subset of the dataframe if the matching criteria are stored within a dict, such as:

tempDict = {'var2': 'b','var4': 3}

It's important that the variable names are not predefined and the number of variables included in the dict is changeable.

I've been puzzling over this for a while and so any suggestions would be greatly appreciated.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can evaluate a series of conditions. They don't have to be just an equality.

df = tempDF
d = tempDict

# `repr` returns the string representation of an object.    
>>> df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond)) 
       for col, cond in d.iteritems()]))]
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3

Looking at what eval does here:

conditions = " & ".join(["(df['{0}'] == {1})".format(col, repr(cond)) 
       for col, cond in d.iteritems()])

>>> conditions
"(df['var4'] == 3) & (df['var2'] == 'b')"

>>> eval(conditions)
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

Here is another example using an equality constraint:

>>> eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond)) 
                      for col, cond in d.iteritems()]))
d = {'var2': ('==', "'b'"),
     'var4': ('>', 3)}

>>> df[eval(" & ".join(["(df['{0}'] {1} {2})".format(col, cond[0], cond[1]) 
       for col, cond in d.iteritems()]))]
   var1 var2 var3  var4
4    45    b    f     4
5    45    b    g     5
6    45    b    g     6

Another alternative is to use query:

qry = " & ".join('{0} {1} {2}'.format(k, cond[0], cond[1]) for k, cond in d.iteritems())

>>> qry
"var4 > 3 & var2 == 'b'"

>>> df.query(qry)
   var1 var2 var3  var4
4    45    b    f     4
5    45    b    g     5
6    45    b    g     6

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...