Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
194 views
in Technique[技术] by (71.8m points)

python - list comprehension filtering - "the set() trap"

A reasonably common operation is to filter one list based on another list. People quickly find that this:

[x for x in list_1 if x in list_2]

is slow for large inputs - it's O(n*m). Yuck. How do we speed this up? Use a set to make filtering lookups O(1):

s = set(list_2)
[x for x in list_1 if x in s]

This gives nice overall O(n) behavior. I however often see even veteran coders fall into The Trap?:

[x for x in list_1 if x in set(list_2)]

Ack! This is again O(n*m) since python builds set(list_2) every time, not just once.


I thought that was the end of the story - python can't optimize it away to only build the set once. Just be aware of the pitfall. Gotta live with it. Hmm.

#python 3.3.2+
list_2 = list(range(20)) #small for demonstration purposes
s = set(list_2)
list_1 = list(range(100000))
def f():
    return [x for x in list_1 if x in s]
def g():
    return [x for x in list_1 if x in set(list_2)]
def h():
    return [x for x in list_1 if x in {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}]

%timeit f()
100 loops, best of 3: 7.31 ms per loop

%timeit g()
10 loops, best of 3: 77.4 ms per loop

%timeit h()
100 loops, best of 3: 6.66 ms per loop

Huh, python (3.3) can optimize away a set literal. It's even faster than f() in this case, presumably because it gets to replace a LOAD_GLOBAL with a LOAD_FAST.

#python 2.7.5+
%timeit h()
10 loops, best of 3: 72.5 ms per loop

Python 2 notably doesn't do this optimization. I've tried investigating further what python3 is doing but unfortunately dis.dis cannot probe the innards of comprehension expressions. Basically everything interesting turns into MAKE_FUNCTION.

So now I'm wondering - why can python 3.x optimize away the set literal to only build once, but not set(list_2)?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In order to optimize set(list_2), the interpreter needs to prove that list_2 (and all of its elements) does not change between iterations. This is a hard problem in the general case, and it would not surprise me if the interpreter does not even attempt to tackle it.

On the other hand a set literal cannot change its value between iterations, so the optimization is known to be safe.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...