As Gallaecio proposed, you can add a counter, but the difference here is that you yield the item after the if statement. This way, the spider will almost always end up exporting exactly 2 items.
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.exceptions import CloseSpider


class TitleSpider(scrapy.Spider):
    name = "title_bot"
    start_urls = ["https://www.google.com/", "https://www.yahoo.com/", "https://www.bing.com/"]
    item_limit = 2

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.counter = 0

    def parse(self, response):
        # Count this response first, then decide whether to stop or to yield.
        self.counter += 1
        if self.counter > self.item_limit:
            raise CloseSpider
        yield {'title': response.css('title::text').get()}
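Since the snippet already imports CrawlerProcess, here is a minimal sketch of how you could run it as a standalone script. The FEEDS setting and the items.json file name are illustrative assumptions, not part of the original answer:

if __name__ == "__main__":
    # Assumed runner: write the scraped items to items.json.
    process = CrawlerProcess(settings={"FEEDS": {"items.json": {"format": "json"}}})
    process.crawl(TitleSpider)
    process.start()  # blocks until the crawl finishes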
Why almost always, you may ask? It has to do with a race condition in the parse method. Imagine that self.counter is currently equal to 1, which means that one more item is expected to be exported. But now Scrapy receives two responses at the same moment and invokes the parse method for both of them. If both invocations of parse increase the counter before either of them checks it, they will both see self.counter equal to 3 and thus both raise the CloseSpider exception. In that case (which is very unlikely, but can still happen), the spider will export only one item.
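If you want to guard against that interleaving, one option is to make the increment-and-check step atomic, so that each invocation sees its own distinct counter value and only the one that actually crosses the limit raises. This is only a sketch of that idea; the threading.Lock and the current local variable are additions for illustration, not part of the original answer:

import threading

import scrapy
from scrapy.exceptions import CloseSpider


class TitleSpider(scrapy.Spider):
    name = "title_bot"
    start_urls = ["https://www.google.com/", "https://www.yahoo.com/", "https://www.bing.com/"]
    item_limit = 2

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.counter = 0
        # Assumption: a lock so the increment and the read happen as one atomic step.
        self._lock = threading.Lock()

    def parse(self, response):
        with self._lock:
            self.counter += 1
            current = self.counter  # each invocation gets a distinct value
        if current > self.item_limit:
            raise CloseSpider("item_limit reached")
        # Only invocations that landed at or below the limit reach this point,
        # so exactly item_limit items are yielded.
        yield {"title": response.css("title::text").get()}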