Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
308 views
in Technique[技术] by (71.8m points)

python - Why is startswith slower than slicing

Why is the implementation of startwith slower than slicing?

In [1]: x = 'foobar'

In [2]: y = 'foo'

In [3]: %timeit x.startswith(y)
1000000 loops, best of 3: 321 ns per loop

In [4]: %timeit x[:3] == y
10000000 loops, best of 3: 164 ns per loop

Surprisingly, even including calculation for the length, slicing still appears significantly faster:

In [5]: %timeit x[:len(y)] == y
1000000 loops, best of 3: 251 ns per loop

Note: the first part of this behaviour is noted in Python for Data Analysis (Chapter 3), but no explanation for it is offered.

.

If helpful: here is the C code for startswith; and here is the output of dis.dis:

In [6]: import dis

In [7]: dis_it = lambda x: dis.dis(compile(x, '<none>', 'eval'))

In [8]: dis_it('x[:3]==y')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (3)
              6 SLICE+2             
              7 LOAD_NAME                1 (y)
             10 COMPARE_OP               2 (==)
             13 RETURN_VALUE        

In [9]: dis_it('x.startswith(y)')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_ATTR                1 (startswith)
              6 LOAD_NAME                2 (y)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE 
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Some of the performance difference can be explained by taking into account the time it takes the . operator to do its thing:

>>> x = 'foobar'
>>> y = 'foo'
>>> sw = x.startswith
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 316 ns per loop
>>> %timeit sw(y)
1000000 loops, best of 3: 267 ns per loop
>>> %timeit x[:3] == y
10000000 loops, best of 3: 151 ns per loop

Another portion of the difference can be explained by the fact that startswith is a function, and even no-op function calls take a bit of time:

>>> def f():
...     pass
... 
>>> %timeit f()
10000000 loops, best of 3: 105 ns per loop

This does not totally explain the difference, since the version using slicing and len calls a function and is still faster (compare to sw(y) above -- 267 ns):

>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 213 ns per loop

My only guess here is that maybe Python optimizes lookup time for built-in functions, or that len calls are heavily optimized (which is probably true). It might be possible to test that with a custom len func. Or possibly this is where the differences identified by LastCoder kick in. Note also larsmans' results, which indicate that startswith is actually faster for longer strings. The whole line of reasoning above applies only to those cases where the overhead I'm talking about actually matters.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...