The simplest way to limit the number of concurrent connections is to use a thread pool:
#!/usr/bin/env python
from itertools import izip, repeat
from multiprocessing.dummy import Pool  # use threads for I/O bound tasks
from urllib2 import urlopen

def fetch(url_data):
    # url_data is a (url, data) pair; return (url, content, error)
    try:
        return url_data[0], urlopen(*url_data).read(), None
    except EnvironmentError as e:
        return url_data[0], None, str(e)

if __name__ == "__main__":
    # urls (an iterable of URLs) and data (the request body, or None)
    # are assumed to be defined elsewhere
    pool = Pool(20)  # use 20 concurrent connections
    params = izip(urls, repeat(data))  # use the same data for all urls
    for url, content, error in pool.imap_unordered(fetch, params):
        if error is None:
            print("done: %s: %d" % (url, len(content)))
        else:
            print("error: %s: %s" % (url, error))
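The code above targets Python 2 (`izip`, `urllib2`). A rough Python 3 sketch of the same thread-pool approach might look like the following. The `fetch_all` helper and its `connections` and `fetch_one` parameters are names made up for this illustration; they are not part of any library.

```python
from itertools import repeat
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API
from urllib.request import urlopen


def fetch(url_data):
    """Return (url, content, error) for a (url, data) pair."""
    url, data = url_data
    try:
        with urlopen(url, data) as response:
            return url, response.read(), None
    except EnvironmentError as e:  # alias of OSError; covers URLError and HTTPError
        return url, None, str(e)


def fetch_all(urls, data=None, connections=20, fetch_one=fetch):
    """Yield (url, content, error) tuples as they complete.

    At most `connections` requests are in flight at once.
    `fetch_one` is a hypothetical hook so the fetcher can be swapped out.
    """
    with Pool(connections) as pool:
        # same data for all urls; results arrive in completion order
        for result in pool.imap_unordered(fetch_one, zip(urls, repeat(data))):
            yield result
```

Usage would be along the lines of `for url, content, error in fetch_all(urls, data): ...`, with `urls` and `data` supplied by the caller as in the original script.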
"503 Service Unavailable" is a server error: the server may be failing to handle the load.
"Name or service not known" is a DNS error. If you need to make many requests, install or enable a local caching DNS server.
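To tell these two kinds of failure apart in code rather than by reading the message, one option is to inspect the exception before it is converted to a string. The sketch below assumes Python 3's `urllib.error`; the `classify` function and its return labels are hypothetical names chosen for this example.

```python
import socket
from urllib.error import HTTPError, URLError


def classify(error):
    """Return a coarse label for an exception raised by urlopen()."""
    # Check HTTPError first: it is a subclass of URLError.
    if isinstance(error, HTTPError):
        # 5xx status codes (such as 503) are server-side errors
        return "server" if 500 <= error.code < 600 else "http"
    # Failed name resolution surfaces as URLError wrapping socket.gaierror
    if isinstance(error, URLError) and isinstance(error.reason, socket.gaierror):
        return "dns"
    return "other"
```

A fetch function could call `classify(e)` inside its `except` block and return the label alongside `str(e)`, so that the caller can, for example, slow down on "server" results and check the resolver setup on "dns" results.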