I've created a spider, and have linked a method to the spider_idle event.
How do I add a request manually? I can't just return the item from parse -- parse is not running in this case, as all known URLs have been parsed. I have a method to generate new requests, and I would like to run it from the spider_idle callback to add the created request(s).
class FooSpider(BaseSpider):
    name = 'foo'

    def __init__(self):
        dispatcher.connect(self.dont_close_me, signals.spider_idle)

    def dont_close_me(self, spider):
        if spider != self:
            return
        # The engine instance will allow me to schedule requests, but
        # how do I get the engine object?
        engine = unknown_get_engine()
        engine.schedule(self.create_request())
        # afterward, ensure we stay alive by raising DontCloseSpider
        raise DontCloseSpider("..I prefer live spiders.")
UPDATE: I've determined that I probably need the ExecutionEngine object, but I don't know exactly how to get it from a spider, though it is available from a Crawler instance.
UPDATE 2: Thanks. The crawler is attached as a property of the superclass, so I can just use self.crawler (and from there self.crawler.engine) with no additional effort. >.>
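For anyone landing here later, this is a minimal sketch of what the idle handler could look like once the engine is reached through self.crawler. Several things in it are assumptions rather than part of the original code: the mixin name IdleFeederMixin, the create_requests hook (standing in for the request-generating method mentioned above), and the one-argument engine.crawl(request) call, which I believe matches recent Scrapy releases (older ones took the spider as a second argument, or used engine.schedule(request, spider)). The try/except around the import is only there so the sketch can be run without Scrapy installed.

```python
try:
    from scrapy.exceptions import DontCloseSpider
except ImportError:
    # Stand-in so this sketch runs without Scrapy; with Scrapy installed
    # the real exception class above is used instead.
    class DontCloseSpider(Exception):
        pass


class IdleFeederMixin:
    """Hypothetical mixin for a Scrapy spider.

    Assumes the spider has a `crawler` attribute (Scrapy sets this on
    spiders created through `from_crawler`).
    """

    def create_requests(self):
        # Hypothetical hook: return an iterable of new requests,
        # or an empty iterable when there is nothing left to crawl.
        return []

    def on_idle(self, spider):
        # spider_idle is delivered for every spider in the process,
        # so ignore signals that are not about this one.
        if spider is not self:
            return
        requests = list(self.create_requests())
        if not requests:
            # Nothing new to do: return normally and let the spider close.
            return
        for request in requests:
            # Assumed signature: recent Scrapy versions take only the request.
            self.crawler.engine.crawl(request)
        # Tell the engine to keep the spider open for the new requests.
        raise DontCloseSpider("new requests were scheduled")
```

In a real spider the handler would still need to be connected to the signal, for example with crawler.signals.connect(spider.on_idle, signal=signals.spider_idle) inside from_crawler. Raising DontCloseSpider only when something was actually scheduled avoids keeping the spider alive forever once create_requests runs dry.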