Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Logical OR / bitwise OR in a pandas DataFrame

I am trying to use a Boolean mask to get a match from two different DataFrames.

Using the logical OR operator:

x = df[(df['A'].isin(df2['B']))
      or df['A'].isin(df2['C'])]

Output:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, using the bitwise OR operator, the results are returned successfully:

x = df[(df['A'].isin(df2['B']))
      | df['A'].isin(df2['C'])]

Output: x (the rows of df whose A value appears in df2['B'] or df2['C'])

Is there a difference between the two, and is bitwise OR the best option here? Why doesn't the logical OR work?


He who fights with dragons too long becomes a dragon himself; gaze into the abyss too long, and the abyss gazes back…

1 Reply


As far as I have come to understand this issue (coming from a C++ background and currently learning Python for data science), the key point is that the bitwise operators (&, |) can be overloaded in classes, just as in C++. In Python this is done by defining the special methods __and__ and __or__.
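A minimal sketch of what such overloading looks like, using a hypothetical BoolVector class (not part of pandas or any library): Python translates a & b into a.__and__(b) and a | b into a.__or__(b), so a class only has to define those two methods to give the operators element-wise behaviour.

```python
class BoolVector:
    """Hypothetical toy container that overloads & and | element by element."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def __and__(self, other):
        # Python calls this for `self & other`.
        return BoolVector(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        # Python calls this for `self | other`.
        return BoolVector(a or b for a, b in zip(self.values, other.values))

    def __repr__(self):
        return f"BoolVector({self.values})"


a = BoolVector([True, False, True])
b = BoolVector([False, False, True])
print(a | b)  # BoolVector([True, False, True])
print(a & b)  # BoolVector([False, False, True])
```

pandas follows the same idea, only with far more machinery behind the methods.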

On plain integers, these bitwise operators compare the numbers bit by bit and give you the result. For instance:

1 | 2 # will result in 3

What Python will actually do is compare the bits of these numbers:

00000001 | 00000010

The result will be:

00000011 (each bit position is ORed separately: 0 | 0 gives 0, while 0 | 1 and 1 | 0 give 1)

As an integer: 3

It ORs each pair of corresponding bits and returns the combined result: eight separate operations in this 8-bit illustration (Python integers are not actually limited to eight bits). This is the normal behaviour of these operators.
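To double-check the arithmetic, this small sketch prints the 8-bit binary form of each number and then recomputes 1 | 2 one bit position at a time. The 8-bit width is only for display, and the variable names are my own.

```python
x, y = 1, 2
result = x | y  # bitwise OR on plain integers

print(format(x, "08b"))       # 00000001
print(format(y, "08b"))       # 00000010
print(format(result, "08b"))  # 00000011
print(result)                 # 3

# The same OR spelled out one bit position at a time (8-bit illustration).
manual = 0
for bit in range(8):
    x_bit = (x >> bit) & 1          # extract this bit of x
    y_bit = (y >> bit) & 1          # extract this bit of y
    manual |= (x_bit | y_bit) << bit  # put the ORed bit back in place

print(manual == result)  # True
```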

Enter pandas. Because these operators can be overloaded, pandas overloads them, so the bitwise operators do the following on pandas objects:

(dataframe1['column'] == "expression") & (dataframe1['column'] != "another expression")

In this case, pandas first creates a Series of True and False values for each of the == and != operations, by comparing every value in the column to the given expression. (Be careful: you have to put parentheses around each comparison, because & and | bind more tightly than comparison operators such as == and !=, so Python would otherwise evaluate the bitwise operator first.)
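The precedence pitfall can be reproduced with a tiny, hypothetical DataFrame that has one integer column c. Integers are used here instead of strings so that the unparenthesized version fails in a predictable way.

```python
import pandas as pd

df = pd.DataFrame({"c": [1, 2, 3]})

# With parentheses: each comparison becomes a boolean Series, then & combines them.
ok = (df["c"] == 1) & (df["c"] != 2)
print(ok.tolist())  # [True, False, False]

# Without parentheses, & binds tighter than == and !=, so Python reads this as
#     df["c"] == (1 & df["c"]) != 2
# which is a chained comparison; chaining needs bool() of a Series -> ValueError.
raised = False
try:
    df["c"] == 1 & df["c"] != 2
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```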

You then have two Series of the same length, filled with True and False values. pandas combines these two Series element by element with either a logical AND (&) or a logical OR (|), and returns one boolean Series that is True exactly where the combined condition holds.
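Applied to the question itself, here is a minimal sketch with made-up data. The values in df and df2 are assumptions; only the column names A, B and C come from the question. It shows the two boolean Series and the single Series that | produces from them.

```python
import pandas as pd

# Hypothetical data standing in for the question's df and df2.
df = pd.DataFrame({"A": [1, 2, 3, 4]})
df2 = pd.DataFrame({"B": [1, 5, 6], "C": [3, 7, 8]})

in_b = df["A"].isin(df2["B"])  # [True, False, False, False]
in_c = df["A"].isin(df2["C"])  # [False, False, True, False]

mask = in_b | in_c             # element-wise OR of the two Series
print(mask.tolist())           # [True, False, True, False]
print(df[mask])                # only the rows where A is 1 or 3
```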

Under the hood, the & operator calls a pandas method: Python turns a & b into a.__and__(b), so Series.__and__ receives both already-evaluated Series (the operands to the left and right of the operator). pandas then combines the values pairwise, matching elements by index label, and returns True or False for each pair.
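The dispatch can be checked directly. In this sketch (the Series values are arbitrary), the & operator, an explicit call to Series.__and__, and operator.and_ from the standard library all produce the same element-wise result.

```python
import operator

import pandas as pd

left = pd.Series([True, False, True, False])
right = pd.Series([True, True, False, False])

via_operator = left & right
via_method = left.__and__(right)           # what `left & right` dispatches to
via_function = operator.and_(left, right)  # the stdlib spelling of &

print(via_operator.equals(via_method))    # True
print(via_operator.equals(via_function))  # True
print(via_operator.tolist())              # [True, False, False, False]
```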

pandas applies the same principle to the comparison operators (>, <, >=, <=, ==, !=), which is why each of them also returns an element-wise boolean Series.
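For example, with an arbitrary integer Series, every comparison operator returns a boolean Series rather than a single True or False:

```python
import pandas as pd

s = pd.Series([1, 5, 10])

# Each comparison is evaluated element by element.
print((s > 3).tolist())   # [False, True, True]
print((s <= 5).tolist())  # [True, True, False]
print((s == 5).tolist())  # [False, True, False]
print((s != 5).tolist())  # [True, False, True]
print(type(s > 3).__name__)  # Series
```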

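Why go to the trouble of using & when you have the nice and neat "and"? Because "and", "or" and "not" cannot be overloaded: they are built into the language. "x or y" first calls bool(x) to decide which operand to return, and pandas deliberately refuses to collapse a whole Series into a single True or False, since it cannot know whether you mean "any element is true" or "all elements are true". That refusal is exactly the ValueError from the question.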
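This sketch (with arbitrary boolean values) reproduces the question's error with a plain "or", then shows the two ways out: reduce the Series explicitly with any() or all() if a single truth value is really wanted, or use | for an element-wise result.

```python
import pandas as pd

in_b = pd.Series([True, False, False])
in_c = pd.Series([False, False, True])

# `in_b or in_c` first evaluates bool(in_b), which pandas refuses to do.
message = ""
try:
    in_b or in_c
except ValueError as exc:
    message = str(exc)
print(message)  # starts with: The truth value of a Series is ambiguous.

# If one True/False is really wanted, name the reduction explicitly:
print(in_b.any())  # True  (at least one element is True)
print(in_b.all())  # False (not every element is True)

# For an element-wise result, use the overloaded operator instead:
print((in_b | in_c).tolist())  # [True, False, True]
```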

Hope that helps!

