I have a Django site where a scrape happens when a user requests it, and my code kicks off a Scrapy spider standalone script in a new process. Naturally, this isn't working with an increase of users.
Something like this:
class StandAloneSpider(Spider):
#a regular spider
settings.overrides['LOG_ENABLED'] = True
#more settings can be changed...
crawler = CrawlerProcess( settings )
crawler.install()
crawler.configure()
spider = StandAloneSpider()
crawler.crawl( spider )
crawler.start()
I've decided to use Celery and use workers to queue up the crawl requests.
However, I'm running into issues with Tornado reactors not being able to restart. The first and second spider runs successfully, but subsequent spiders will throw the ReactorNotRestartable error.
Anyone can share any tips with running Spiders within the Celery framework?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…