Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - 'is' operator behaves differently when comparing strings with spaces

I've started learning Python (Python 3.3) and was trying out the is operator. I tried this:

>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False

It seems like the space and the question mark make is behave differently. What's going on?

EDIT: I know I should be using ==, I just wanted to know why is behaves like this.

1 Reply

Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with is is a bad idea; use == instead.
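To make the warning concrete, here is a minimal sketch of the difference between == and is. The names a and b are only illustrative, and whatever `a is b` prints is an implementation detail rather than something the language promises. sys.intern is the standard-library way to ask explicitly for a shared object.

```python
import sys

# Build one string at run time so it is not a compile-time constant.
a = ''.join(['is it ', 'the space?'])
b = 'is it the space?'

print(a == b)   # value comparison: True whenever the text is equal
print(a is b)   # identity comparison: implementation-dependent

# sys.intern returns one canonical object per distinct string value.
print(sys.intern(a) is sys.intern(b))
```

If identity really matters (for example, as a fast path before a dictionary lookup), interning both strings explicitly is the only approach that does not rely on interpreter behavior.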

Well, at least for CPython 3.4/2.7.3, the answer is "no, it is not the whitespace". Or rather, not only the whitespace:

  • Two string literals will share memory if they are either alphanumeric (made up only of ASCII letters, digits, and underscores) or reside in the same block (file, function, class, or single interpreter command).

  • An expression that evaluates to a string yields an object identical to the one created by a string literal if and only if it is built from constants and binary/unary operators and the resulting string is shorter than 21 characters.

  • Single-character strings are shared: each one exists only once.

Examples

Alphanumeric string literals always share memory:

>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
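The "alphanumeric" rule can be approximated in a few lines. This is only a sketch: looks_like_identifier is a hypothetical helper, not a CPython API, and it assumes the rule is "ASCII letters, digits, and underscores only". It predicts which literals from the question would be shared even across separate interpreter commands.

```python
import re

# Hypothetical helper: approximates the "alphanumeric" test with a regex.
_NAME_CHARS = re.compile(r'[A-Za-z0-9_]*')

def looks_like_identifier(s):
    """Return True if s contains only ASCII letters, digits and underscores."""
    return _NAME_CHARS.fullmatch(s) is not None

# The three literals from the question.
for literal in ('is it the space?', 'isitthespace', 'isitthespace?'):
    print(repr(literal), looks_like_identifier(literal))
```

Only 'isitthespace' passes the check, which matches the single True result in the question.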

Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:

(interpreter)

>>> x='`!@#$%^&*() ][=-. >:"?<a'; y='`!@#$%^&*() ][=-. >:"?<a'
>>> z='`!@#$%^&*() ][=-. >:"?<a'
>>> x is y
True
>>> x is z
False

(file)

x = '`!@#$%^&*() ][=-. >:"?<a'
y = '`!@#$%^&*() ][=-. >:"?<a'
# The lambda body is compiled as a separate code object, i.e. a different block.
z = (lambda: '`!@#$%^&*() ][=-. >:"?<a')()
print(x is y)
print(x is z)

Output: True and False
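One way to see why literals in the same block are shared is to inspect the compiled code object. The sketch below assumes CPython, where co_consts holds a block's constants; the source string and the variable names are made up for illustration.

```python
# Two identical non-alphanumeric literals compiled as ONE block.
source = "x = 'a b?'\ny = 'a b?'\n"
code = compile(source, '<sketch>', 'exec')

# Count how many times the literal appears among the block's constants.
copies = sum(1 for const in code.co_consts if const == 'a b?')
print(copies)

# Run the block and compare the two resulting objects by identity.
namespace = {}
exec(code, namespace)
print(namespace['x'] is namespace['y'])
```

The literal is stored once per code object, so both assignments load the very same object. Literals typed as separate interpreter commands are compiled separately and get no such sharing unless they are interned.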

For simple binary operations, the compiler performs very simple constant folding (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. If this is the case, the rules mentioned earlier are in force:

>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
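Whether an expression was folded at compile time can be checked without relying on is at all. The following is a sketch under the assumption that a folded result shows up in the code object's co_consts (a CPython detail); is_folded is a hypothetical helper, and the length limit for folding is version-specific, so only short strings are tried here.

```python
def is_folded(expression):
    """Return True if the expression's value is stored as a compile-time constant."""
    code = compile(expression, '<sketch>', 'eval')
    value = eval(code)
    # Compare type as well as value so that, e.g., 1 and True are not confused.
    return any(type(c) is type(value) and c == value for c in code.co_consts)

print(is_folded("'a' * 10"))          # constants combined with a binary operator
print(is_folded("'a'.__add__('a')"))  # a method call is not folded
```

The first expression is built only from constants and an operator, so the compiler can store the result directly; the method call has to run at execution time and therefore produces a fresh object.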

Single characters always share memory, of course:

>>> chr(0x20) is ' '
True
