Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
108 views
in Technique[技术] by (71.8m points)

python - Why does the "is" keyword have a different behavior when there is a dot in the string?

Consider this code:

>>> x = "google"
>>> x is "google"
True
>>> x = "google.com"
>>> x is "google.com"
False
>>>

Why is it like that?

To make sure the above is correct, I have just tested on Python 2.5.4, 2.6.5, 2.7b2, Python 3.1 on windows and Python 2.7b1 on Linux.

It looks like there is consistency across all of them, so it's by design. Am I missing something?

I just find it out that from some of my personal domain filtering script failing with that.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

is verifies object identity, and any implementation of Python, when it meets literal of immutable types, is perfectly free to either make a new object of that immutable type, or seek through existing objects of that type to see if some of them could be reused (by adding a new reference to the same underlying object). This is a pragmatic choice of optimization and not subject to semantic constraints, so your code should never rely on which path a give implementation may take (or it could break with a bugfix/optimization release of Python!).

Consider for example:

>>> import dis
>>> def f():
...   x = 'google.com'
...   return x is 'google.com'
... 
>>> dis.dis(f)
  2           0 LOAD_CONST               1 ('google.com')
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               1 ('google.com')
             12 COMPARE_OP               8 (is)
             15 RETURN_VALUE    

so in this particular implementation, within a function, your observation does not apply and only one object is made for the literal (any literal), and, indeed:

>>> f()
True

Pragmatically that's because within a function making a pass through the local table of constants (to save some memory by not making multiple constant immutable objects where one suffices) is pretty cheap and fast, and may offer good performance returns since the function may be called repeatedly afterwards.

But, the very same implementation, at the interactive prompt (Edit: I originally thought this would also happen at a module's top level, but a comment by @Thomas set me right, see later):

>>> x = 'google.com'
>>> y = 'google.com'
>>> id(x), id(y)
(4213000, 4290864)

does NOT bother trying to save memory that way -- the ids are different, i.e., distinct objects. There are potentially higher costs and lower returns and so the heuristics of this implementation's optimizer tell it to not bother searching and just go ahead.

Edit: at module top level, per @Thomas' observation, given e.g.:

$ cat aaa.py
x = 'google.com'
y = 'google.com'
print id(x), id(y)

again we see the table-of-constants-based memory-optimization in this implementation:

>>> import aaa
4291104 4291104

(end of Edit per @Thomas' observation).

Lastly, again on the same implementation:

>>> x = 'google'
>>> y = 'google'
>>> id(x), id(y)
(2484672, 2484672)

the heuristics are different here because the literal string "looks like it might be an identifier" -- so it might be used in operation requiring interning... so the optimizer interns it anyway (and once interned, looking for it becomes very fast of course). And indeed, surprise surprise...:

>>> z = intern(x)
>>> id(z)
2484672

...x has been interned the very first time (as you see, the return value of intern is the same object as x and y, as it has the same id()). Of course, you shouldn't rely on this either -- the optimizer doesn't have to intern anything automatically, it's just an optimization heuristic; if you need interned string, intern them explicitly, just to be safe. When you do intern strings explicitly...:

>>> x = intern('google.com')
>>> y = intern('google.com')
>>> id(x), id(y)
(4213000, 4213000)

...then you do ensure exactly the same object (i.e., same id()) results each and every time -- so you can apply micro-optimizations such as checking with is rather than == (I've hardly ever found the miniscule performance gain to be worth the bother;-).

Edit: just to clarify, here are the kind of performance differences I'm talking about, on a slow Macbook Air...:

$ python -mtimeit -s"a='google';b='google'" 'a==b'
10000000 loops, best of 3: 0.132 usec per loop
$ python -mtimeit -s"a='google';b='google'" 'a is b'
10000000 loops, best of 3: 0.107 usec per loop
$ python -mtimeit -s"a='goo.gle';b='goo.gle'" 'a==b'
10000000 loops, best of 3: 0.132 usec per loop
$ python -mtimeit -s"a='google';b='google'" 'a is b'
10000000 loops, best of 3: 0.106 usec per loop
$ python -mtimeit -s"a=intern('goo.gle');b=intern('goo.gle')" 'a is b'
10000000 loops, best of 3: 0.0966 usec per loop
$ python -mtimeit -s"a=intern('goo.gle');b=intern('goo.gle')" 'a == b'
10000000 loops, best of 3: 0.126 usec per loop

...a few tens of nanoseconds either way, at most. So, worth even thinking about only in the most extreme "optimize the [expletive deleted] out of this [expletive deleted] performance bottleneck" situations!-)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...