There are, as far as I know, three ways to create a generator through a comprehension1.
The classical one:
def f1():
g = (i for i in range(10))
The yield
variant:
def f2():
g = [(yield i) for i in range(10)]
The yield from
variant (that raises a SyntaxError
except inside of a function):
def f3():
g = [(yield from range(10))]
The three variants lead to different bytecode, which is not really surprising.
It would seem logical that the first one is the best, since it's a dedicated, straightforward syntax to create a generator through comprehension.
However, it is not the one that produces the shortest bytecode.
Disassembled in Python 3.6
Classical generator comprehension
>>> dis.dis(f1)
4 0 LOAD_CONST 1 (<code object <genexpr> at...>)
2 LOAD_CONST 2 ('f1.<locals>.<genexpr>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (10)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 STORE_FAST 0 (g)
5 18 LOAD_FAST 0 (g)
20 RETURN_VALUE
yield
variant
>>> dis.dis(f2)
8 0 LOAD_CONST 1 (<code object <listcomp> at...>)
2 LOAD_CONST 2 ('f2.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (10)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 STORE_FAST 0 (g)
9 18 LOAD_FAST 0 (g)
20 RETURN_VALUE
yield from
variant
>>> dis.dis(f3)
12 0 LOAD_GLOBAL 0 (range)
2 LOAD_CONST 1 (10)
4 CALL_FUNCTION 1
6 GET_YIELD_FROM_ITER
8 LOAD_CONST 0 (None)
10 YIELD_FROM
12 BUILD_LIST 1
14 STORE_FAST 0 (g)
13 16 LOAD_FAST 0 (g)
18 RETURN_VALUE
In addition, a timeit
comparison shows that the yield from
variant is the fastest (still run with Python 3.6):
>>> timeit(f1)
0.5334039637357152
>>> timeit(f2)
0.5358906506760719
>>> timeit(f3)
0.19329123352712596
f3
is more or less 2.7 times as fast as f1
and f2
.
As Leon mentioned in a comment, the efficiency of a generator is best measured by the speed it can be iterated over.
So I changed the three functions so they iterate over the generators, and call a dummy function.
def f():
pass
def fn():
g = ...
for _ in g:
f()
The results are even more blatant:
>>> timeit(f1)
1.6017412817975778
>>> timeit(f2)
1.778684261368946
>>> timeit(f3)
0.1960603619517669
f3
is now 8.4 times as fast as f1
, and 9.3 times as fast as f2
.
Note: The results are more or less the same when the iterable is not range(10)
but a static iterable, such as [0, 1, 2, 3, 4, 5]
.
Therefore, the difference of speed has nothing to do with range
being somehow optimized.
So, what are the differences between the three ways?
More specifically, what is the difference between the yield from
variant and the two other?
Is this normal behaviour that the natural construct (elt for elt in it)
is slower than the tricky [(yield from it)]
?
Shall I from now on replace the former by the latter in all of my scripts, or is there any drawbacks to using the yield from
construct?
Edit
This is all related, so I don't feel like opening a new question, but this is getting even stranger.
I tried comparing range(10)
and [(yield from range(10))]
.
def f1():
for i in range(10):
print(i)
def f2():
for i in [(yield from range(10))]:
print(i)
>>> timeit(f1, number=100000)
26.715589237537195
>>> timeit(f2, number=100000)
0.019948781941049987
So. Now, iterating over [(yield from range(10))]
is 186 times as fast as iterating over a bare range(10)
?
How do you explain why iterating over [(yield from range(10))]
is so much faster than iterating over range(10)
?
1: For the sceptical, the three expressions that follow do produce a generator
object; try and call type
on them.
See Question&Answers more detail:
os