I'm rewriting Dumb Guy's code below using the modern Python modules threading and queue (and urllib.request in place of the old urllib2):

import threading
import urllib.request
import queue

urls_to_load = [
    'http://stackoverflow.com/',
    'http://slashdot.org/',
    'http://www.archive.org/',
    'http://www.yahoo.co.jp/',
]

def read_url(url, out_q):
    # out_q instead of queue, so the parameter does not shadow the module
    data = urllib.request.urlopen(url).read()
    print('Fetched %s bytes from %s' % (len(data), url))
    out_q.put(data)

def fetch_parallel():
    result = queue.Queue()
    threads = [threading.Thread(target=read_url, args=(url, result))
               for url in urls_to_load]
    for t in threads:
        t.start()
    for t in threads:   # wait for every download to finish
        t.join()
    return result

def fetch_sequential():
    result = queue.Queue()
    for url in urls_to_load:
        read_url(url, result)
    return result

The best time for fetch_sequential() is 2 s; the best time for fetch_parallel() is 0.9 s.
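For completeness, here is a minimal sketch of draining the queue that fetch_parallel() returns. The byte strings below are stand-ins for downloaded page bodies, not real fetched data:

```python
import queue

# Stand-in for the queue returned by fetch_parallel(); the byte
# strings play the role of downloaded page bodies.
result = queue.Queue()
for data in (b'page one', b'page two'):
    result.put(data)

# Every worker thread was join()ed before the queue was returned,
# so empty() is reliable here and get() never blocks.
pages = []
while not result.empty():
    pages.append(result.get())

print(len(pages))  # → 2
```

This only works because all threads have finished by the time the queue is handed back; on a live queue you would use get() with blocking or a sentinel instead of empty().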
Also, it is incorrect to say that threads are useless in Python because of the GIL. This is one of those cases where threads are useful in Python, because they are blocked on I/O and release the GIL while waiting. As you can see from my results, the parallel case is about twice as fast.