python - Locally run all of the spiders in Scrapy

Question

Welcome To Ask or Share your Answers For Others

python - Locally run all of the spiders in Scrapy

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Locally run all of the spiders in Scrapy

Is there a way to run all of the spiders in a Scrapy project without using the Scrapy daemon? There used to be a way to run multiple spiders with scrapy crawl, but that syntax was removed and Scrapy's code changed quite a bit.

I tried creating my own command:

from scrapy.command import ScrapyCommand
from scrapy.utils.misc import load_object
from scrapy.conf import settings

class Command(ScrapyCommand):
    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        spman_cls = load_object(settings['SPIDER_MANAGER_CLASS'])
        spiders = spman_cls.from_settings(settings)

        for spider_name in spiders.list():
            spider = self.crawler.spiders.create(spider_name)
            self.crawler.crawl(spider)

        self.crawler.start()

But once a spider is registered with self.crawler.crawl(), I get assertion errors for all of the other spiders:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 138, in _run_command
    cmd.run(args, opts)
  File "/home/blender/Projects/scrapers/store_crawler/store_crawler/commands/crawlall.py", line 22, in run
    self.crawler.crawl(spider)
  File "/usr/lib/python2.7/site-packages/scrapy/crawler.py", line 47, in crawl
    return self.engine.open_spider(spider, requests)
  File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 1214, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 1071, in _inlineCallbacks
    result = g.send(result)
  File "/usr/lib/python2.7/site-packages/scrapy/core/engine.py", line 215, in open_spider
    spider.name
exceptions.AssertionError: No free spider slots when opening 'spidername'

Is there any way to do this? I'd rather not start subclassing core Scrapy components just to run all of my spiders like this.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T01:06:46+0000

replyed Oct 17, 2021 by 深蓝 (71.8m points)

Why didn't you just use something like:

scrapy list|xargs -n 1 scrapy crawl

?

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

Categories

python - Locally run all of the spiders in Scrapy

python - Locally run all of the spiders in Scrapy

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags