I read "List comprehension without [ ] in Python", so now I know that
''.join([str(x) for x in mylist])
is faster than
''.join(str(x) for x in mylist)
because "list comprehensions are highly optimized".
So I suppose the optimization relies on parsing the for expression: it sees mylist, computes its length, and uses it to pre-allocate the exact array size, which saves a lot of reallocation.
When using ''.join(str(x) for x in mylist), join receives a generator blindly and has to build its list without knowing the size in advance.
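One way to check this assumption instead of guessing is to disassemble the comprehension with the standard dis module. This is only a sketch: the exact opcodes vary between CPython versions, and the helper name disassemble is made up for illustration. If the list were pre-sized from len(mylist), the bytecode should show something other than an empty list being built and then appended to one element at a time.

```python
import dis
import io


def disassemble(source):
    """Return the CPython disassembly of `source` as a string (hypothetical helper)."""
    buf = io.StringIO()
    dis.dis(source, file=buf)  # compiles the source string, then disassembles it
    return buf.getvalue()


# mylist is never evaluated here: the expression is only compiled, not run.
listcomp_asm = disassemble("[str(x) for x in mylist]")
print(listcomp_asm)
```

Looking for BUILD_LIST and LIST_APPEND in the output shows how the result list is created and grown.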
But now consider this:
mylist = [1,2,5,6,3,4,5]
''.join([str(x) for x in mylist if x < 4])
How does Python decide the size of the list built by the comprehension? Is it computed from the size of mylist and downsized when the iterations are done (which could be very bad if the list is big and the condition filters out 99% of the elements), or does it revert to the "don't know the size in advance" case?
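A rough way to probe this from Python itself is to compare the memory footprint of the resulting lists with sys.getsizeof. This is a sketch under assumptions: the 1000-element list and the x < 10 filter are arbitrary choices, and the byte counts depend on the CPython version and platform, so only the relative sizes are meaningful.

```python
import sys

mylist = list(range(1000))

full = [x for x in mylist]                # no condition
filtered = [x for x in mylist if x < 10]  # keeps 1% of the elements
copied = list(mylist)                     # plain copy, for comparison

for name, lst in [("full", full), ("filtered", filtered), ("copied", copied)]:
    # getsizeof reports the list object itself, including any spare capacity,
    # but not the elements it refers to.
    print(name, len(lst), sys.getsizeof(lst))
```

If the filtered list were allocated at the size of mylist and never shrunk, its reported size would be close to that of the full list.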
EDIT: I've done some small benchmarks and they seem to confirm that there's an optimization:
without a condition:
import timeit
print(timeit.timeit("''.join([str(x) for x in [1,5,6,3,5,23,334,23234]])"))
print(timeit.timeit("''.join(str(x) for x in [1,5,6,3,5,23,334,23234])"))
yields (as expected):
3.11010817019474
3.3457350077491026
with a condition:
print(timeit.timeit("''.join([str(x) for x in [1,5,6,3,5,23,334,23234] if x < 50])"))
print(timeit.timeit("''.join(str(x) for x in [1,5,6,3,5,23,334,23234] if x < 50)"))
yields:
2.7942209702566965
3.0316467566203276
So the conditional listcomp is still faster.
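The lists above have only 8 elements, so fixed per-call overhead may dominate the timings. A sketch of a slightly more careful benchmark follows; the list size, the x < 10 filter, and the number and repeat values are arbitrary assumptions, not tuned settings.

```python
import timeit

setup = "mylist = list(range(1000))"
statements = {
    "listcomp": "''.join([str(x) for x in mylist])",
    "genexp": "''.join(str(x) for x in mylist)",
    "listcomp + if": "''.join([str(x) for x in mylist if x < 10])",
    "genexp + if": "''.join(str(x) for x in mylist if x < 10)",
}

results = {}
for label, stmt in statements.items():
    # Keep the minimum of several runs to reduce noise from other processes.
    results[label] = min(timeit.repeat(stmt, setup=setup, number=200, repeat=3))
    print(f"{label:15s} {results[label]:.4f} s")
```

Building mylist in setup keeps the cost of creating the input out of the measured statements.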