NaN is handled perfectly when I check for its presence in a list or a set. But I don't understand how. [UPDATE: no it's not; it is reported as present if the identical instance of NaN is found; if only non-identical instances of NaN are found, it is reported as absent.]
I thought presence in a list is tested by equality, so I expected NaN to not be found since NaN != NaN.
hash(NaN) and hash(0) are both 0. How do dictionaries and sets tell NaN and 0 apart?
Is it safe to check for NaN presence in an arbitrary container using in
operator? Or is it implementation dependent?
My question is about Python 3.2.1; but if there are any changes existing/planned in future versions, I'd like to know that too.
NaN = float('nan')
print(NaN != NaN) # True
print(NaN == NaN) # False
list_ = (1, 2, NaN)
print(NaN in list_) # True; works fine but how?
set_ = {1, 2, NaN}
print(NaN in set_) # True; hash(NaN) is some fixed integer, so no surprise here
print(hash(0)) # 0
print(hash(NaN)) # 0
set_ = {1, 2, 0}
print(NaN in set_) # False; works fine, but how?
Note that if I add an instance of a user-defined class to a list
, and then check for containment, the instance's __eq__
method is called (if defined) - at least in CPython. That's why I assumed that list
containment is tested using operator ==
.
EDIT:
Per Roman's answer, it would seem that __contains__
for list
, tuple
, set
, dict
behaves in a very strange way:
def __contains__(self, x):
for element in self:
if x is element:
return True
if x == element:
return True
return False
I say 'strange' because I didn't see it explained in the documentation (maybe I missed it), and I think this is something that shouldn't be left as an implementation choice.
Of course, one NaN object may not be identical (in the sense of id
) to another NaN object. (This not really surprising; Python doesn't guarantee such identity. In fact, I never saw CPython share an instance of NaN created in different places, even though it shares an instance of a small number or a short string.) This means that testing for NaN presence in a built-in container is undefined.
This is very dangerous, and very subtle. Someone might run the very code I showed above, and incorrectly conclude that it's safe to test for NaN membership using in
.
I don't think there is a perfect workaround to this issue. One, very safe approach, is to ensure that NaN's are never added to built-in containers. (It's a pain to check for that all over the code...)
Another alternative is watch out for cases where in
might have NaN on the left side, and in such cases, test for NaN membership separately, using math.isnan()
. In addition, other operations (e.g., set intersection) need to also be avoided or rewritten.
See Question&Answers more detail:
os