I wrote a simple crawler.
In settings.py, following the Scrapy documentation, I set:
DUPEFILTER_CLASS = 'scrapy.dupefilter.RFPDupeFilter'
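For context, the surrounding part of my settings.py looks roughly like this (the project name mycrawler is just a placeholder made up for this question; DUPEFILTER_CLASS is the only setting I changed for deduplication):

    # settings.py -- minimal sketch, placeholder project names
    BOT_NAME = 'mycrawler'
    SPIDER_MODULES = ['mycrawler.spiders']
    NEWSPIDER_MODULE = 'mycrawler.spiders'

    # request-fingerprint based duplicate filter
    # (RFPDupeFilter is also Scrapy's default)
    DUPEFILTER_CLASS = 'scrapy.dupefilter.RFPDupeFilter'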
If I stop the crawler and restart it, it scrapes the URLs it already visited in the previous run all over again.
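In case it matters, I start each run from the command line in the usual way (the spider name myspider is a placeholder):

    scrapy crawl myspider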
Am I doing something wrong?