python - could sum be faster on lists

Question

Welcome To Ask or Share your Answers For Others

python - could sum be faster on lists

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - could sum be faster on lists

This is somehow a follow-up to this question

So first, you'll notice that you cannot perform a sum on a list of strings to concatenate them, python tells you to use str.join instead, and that's good advice because no matter how you use + on strings, the performance is bad.

The "cannot use sum" restriction doesn't apply to list, and though, itertools.chain.from_iterable is the preferred way to perform such list flattening.

But sum(x,[]) when x is a list of lists is definitively bad.

But should it stay that way?

I compared 3 approaches

import time
import itertools

a = [list(range(1,1000)) for _ in range(1000)]

start=time.time()
sum(a,[])
print(time.time()-start)

start=time.time()
list(itertools.chain.from_iterable(a))
print(time.time()-start)


start=time.time()
z=[]
for s in a:
    z += s
print(time.time()-start)

results:

sum() on the list of lists: 10.46647310256958. Okay, we knew.
itertools.chain: 0.07705187797546387
custom accumulated sum using in-place addition: 0.057044029235839844 (can be faster than itertools.chain as you see)

So sum is way behind because it performs result = result + b instead of result += b

So now my question:

Why can't sum use this accumulative approach when available?

(That would be transparent for already existing applications and would make possible the use of the sum built-in to flatten lists efficiently)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T03:09:00+0000

We could try to make sum() smarter, but Alex Martelli and Guido van Rossum wanted to keep it focused on arithmetic summations.

FWIW, you should get reasonable performance with this simple code:

result = []
for seq in mylists:
    result += seq

For your other question, "why can't sum use this accumulative approach when available?", see this comment for builtin_sum() in Python/bltinmodule.c:

    /* It's tempting to use PyNumber_InPlaceAdd instead of
       PyNumber_Add here, to avoid quadratic running time
       when doing 'sum(list_of_lists, [])'.  However, this
       would produce a change in behaviour: a snippet like

         empty = []
         sum([[x] for x in range(10)], empty)

       would change the value of empty. */

Categories

python - could sum be faster on lists

python - could sum be faster on lists

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags