The parse_news_item callback doesn't retrieve the day links. I think the links are being generated dynamically, which may be the cause of the problem. I have looked at other posts but have not been able to solve it. Any help will be much appreciated.
import scrapy
import json
from scrapy.crawler import CrawlerProcess


class EtSpider(scrapy.Spider):
    name = 'et'
    start_urls = ["https://economictimes.indiatimes.com/archive.cms"]

    def parse(self, response):
        # Collect the per-month archive links from the archive index page.
        months = response.xpath('//table//tr//a/@href').re(r'/archive/year-\d+,month-\d+\.cms')
        for month in months:
            month = 'https://economictimes.indiatimes.com' + month
            yield scrapy.Request(month, self.parse_news_item)

    def parse_news_item(self, response):
        # Collect the per-day archive links from a month page (this is the part that returns nothing).
        days = response.xpath('//table//tr//td//tbody//tr//td//a/@href').re(r'/archivelist/year-\d+,month-\d+,starttime-\d+\.cms')
        for day in days:
            self.logger.info(day)


process = CrawlerProcess()
process.crawl(EtSpider)
process.start()
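One way to rule out the regular expressions as the culprit is to test them on their own, outside Scrapy. The sketch below assumes the digits are meant to be matched with `\d+` and that the dot before `cms` is literal. The hrefs in `sample_hrefs` are made up for illustration and are only shaped like the archive links, they are not taken from the site.

```python
import re

# The two patterns from the spider, with digits matched by \d+ and the dot escaped.
MONTH_RE = re.compile(r'/archive/year-\d+,month-\d+\.cms')
DAY_RE = re.compile(r'/archivelist/year-\d+,month-\d+,starttime-\d+\.cms')

# Hypothetical hrefs, shaped like the links the patterns are meant to catch.
sample_hrefs = [
    '/archive/year-2021,month-1.cms',
    '/archivelist/year-2021,month-1,starttime-44197.cms',
    '/markets',
]

# Keep only the hrefs that match each pattern in full.
month_links = [h for h in sample_hrefs if MONTH_RE.fullmatch(h)]
day_links = [h for h in sample_hrefs if DAY_RE.fullmatch(h)]

print(month_links)
print(day_links)
```

If both lists come back non-empty, the patterns themselves are fine, and the XPath selector or the content of the downloaded page is the more likely suspect.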
question from:
https://stackoverflow.com/questions/65932035/how-to-retrieve-dynamic-link-from-economic-times-using-scrapy