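My solution for running 200+ spiders at once has been to create a custom command for the project. The command asks a running scrapyd instance to schedule every spider in the project through its schedule.json endpoint. See http://doc.scrapy.org/en/latest/topics/commands.html#custom-project-commands for more information about implementing custom commands.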
YOURPROJECTNAME/commands/allcrawl.py:
import urllib
import urllib2

from scrapy import log
from scrapy.command import ScrapyCommand


class AllCrawlCommand(ScrapyCommand):

    requires_project = True
    default_settings = {'LOG_ENABLED': False}

    def short_desc(self):
        return "Schedule a run for all available spiders"

    def run(self, args, opts):
        # scrapyd's scheduling endpoint; adjust host and port to your setup
        url = 'http://localhost:6800/schedule.json'
        for s in self.crawler.spiders.list():
            # One POST per spider asks scrapyd to queue a run
            values = {'project': 'YOUR_PROJECT_NAME', 'spider': s}
            data = urllib.urlencode(values)
            req = urllib2.Request(url, data)
            response = urllib2.urlopen(req)
            # Log the body of scrapyd's reply rather than the response object
            log.msg(response.read())
Make sure to include the following in your settings.py:
COMMANDS_MODULE = 'YOURPROJECTNAME.commands'
Then, from the command line (in your project directory), you can simply type:
scrapy allcrawl
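The command above is written for Python 2 (urllib2). If you are on Python 3, the same scheduling request can be sketched with the standard library roughly as follows. This is only a sketch under assumptions: build_schedule_request and schedule_all are hypothetical helper names of my own, not part of Scrapy or scrapyd, and I am assuming scrapyd listens on localhost:6800 as above and replies with JSON.

```python
import json
import urllib.parse
import urllib.request

# Same endpoint as in the command above (assumed local scrapyd instance)
SCHEDULE_URL = "http://localhost:6800/schedule.json"


def build_schedule_request(project, spider, url=SCHEDULE_URL):
    """Build the POST request asking scrapyd to schedule one spider.

    Hypothetical helper, not a Scrapy or scrapyd API.
    """
    data = urllib.parse.urlencode({"project": project, "spider": spider})
    # Supplying a bytes body makes urllib issue a POST instead of a GET
    return urllib.request.Request(url, data=data.encode("utf-8"))


def schedule_all(project, spiders, opener=urllib.request.urlopen):
    """Schedule every spider; return {spider_name: decoded JSON reply}."""
    results = {}
    for spider in spiders:
        with opener(build_schedule_request(project, spider)) as response:
            results[spider] = json.loads(response.read().decode("utf-8"))
    return results
```

Inside the command's run method you would call something like schedule_all('YOUR_PROJECT_NAME', spider_names), where spider_names is whatever list of spider names your Scrapy version gives you.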