I have searched the web and Stack Overflow but have been unable to find an answer to this question. I have observed that in Python 2.7.3, if you assign the same single-character string to two variables, e.g.
>>> a = 'a'
>>> b = 'a'
>>> c = ' '
>>> d = ' '
Then the variables will share the same reference:
>>> a is b
True
>>> c is d
True
This is also true for some longer strings:
>>> a = 'abc'
>>> b = 'abc'
>>> a is b
True
>>> ' ' is ' '
True
>>> ' ' * 1 is ' ' * 1
True
However, there are many cases where the reference is (unexpectedly) not shared:
>>> a = 'a c'
>>> b = 'a c'
>>> a is b
False
>>> c = '  '
>>> d = '  '
>>> c is d
False
>>> ' ' * 2 is ' ' * 2
False
Can someone please explain the reason for this?
I suspect the interpreter performs some simplification/substitution, or uses a caching mechanism that exploits the immutability of Python strings to optimize certain special cases, but what do I know? I tried making deep copies of strings using the str constructor and the copy.deepcopy function, but the strings still inconsistently share references.
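For illustration, here is a minimal sketch of those copy attempts. It assumes CPython, and the variable names are made up for this example. Both calls appear to hand back the very same object rather than a new one, which would explain why copying does not help:

```python
import copy

# 'a c' is one of the strings from the examples above.
original = 'a c'

# Attempt 1: pass the string through the str constructor.
via_str = str(original)

# Attempt 2: ask the copy module for a deep copy.
via_deepcopy = copy.deepcopy(original)

# In CPython both results are the original object itself (an
# implementation detail, not something the language guarantees).
print(via_str is original)
print(via_deepcopy is original)
```

If that holds in general, identity checks on strings say nothing about whether a copy was "really" made.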
This matters to me because I check that string references differ in some unit tests I'm writing for the clone methods of new-style Python classes.
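To make the setup concrete, here is a rough sketch of that kind of test. The Node class, its attributes, and its clone method are hypothetical stand-ins, not my real code. The sketch compares string attributes by value and reserves identity checks for the object itself and its mutable containers, since a string check like `duplicate.name is not original.name` is exactly the kind that passes or fails unpredictably:

```python
import copy


class Node(object):
    """Hypothetical new-style class, used only for this illustration."""

    def __init__(self, name, children):
        self.name = name          # immutable string attribute
        self.children = children  # mutable list attribute

    def clone(self):
        # Deep-copy the mutable state; the string can be passed along as-is.
        return Node(self.name, copy.deepcopy(self.children))


original = Node('a c', ['x', 'y'])
duplicate = original.clone()

# Identity checks are meaningful for the object and its mutable parts.
assert duplicate is not original
assert duplicate.children is not original.children

# Strings (and contents in general) are compared by value instead.
assert duplicate.name == original.name
assert duplicate.children == original.children
```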