python - How to Retrieve Dynamic Link from Economic-Times using Scrapy

Posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

The parse_news_item callback doesn't retrieve the day links. I suspect the links are generated dynamically, which would explain the problem. I have looked at other posts but have not been able to solve it. Any help would be much appreciated.

import scrapy
import json
from scrapy.crawler import CrawlerProcess


class EtSpider(scrapy.Spider):
    name = 'et'
    start_urls = ["https://economictimes.indiatimes.com/archive.cms"]

    def parse(self, response):
        # Collect the relative links to each monthly archive page.
        months = response.xpath('//table//tr//a/@href').re(r'/archive/year-\d+,month-\d+.cms')
        for month in months:
            month = 'https://economictimes.indiatimes.com' + month
            yield scrapy.Request(month, self.parse_news_item)

    def parse_news_item(self, response):
        # Expected to find the per-day archive links; this is where nothing comes back.
        days = response.xpath('//table//tr//td//tbody//tr//td//a/@href').re(r'/archivelist/year-\d+,month-\d+,starttime-\d+.cms')
        for day in days:
            self.logger.info(day)

process = CrawlerProcess()
process.crawl(EtSpider)
process.start()
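
Before concluding that the links are built by JavaScript, two offline checks may help narrow things down. The sketch below is only an illustration under stated assumptions: the sample hrefs and the HTML snippet are made up, MONTH_RE, DAY_RE and hrefs are hypothetical names, and xml.etree.ElementTree stands in for Scrapy's selectors so the example runs without any third-party package. The first check confirms that patterns of this shape match hrefs of the expected form (and that a bare d+ would not). The second shows that an XPath requiring tbody selects nothing when the raw markup has no tbody element; browsers often insert tbody when rendering tables, so a path copied from developer tools may not match the HTML that Scrapy downloads.

```python
import re
import xml.etree.ElementTree as ET

# Patterns of the same shape as the spider's; \d+ matches one or more digits.
MONTH_RE = re.compile(r'/archive/year-\d+,month-\d+.cms')
DAY_RE = re.compile(r'/archivelist/year-\d+,month-\d+,starttime-\d+.cms')

# Hypothetical hrefs, made up for illustration only.
SAMPLE_MONTH = '/archive/year-2021,month-1.cms'
SAMPLE_DAY = '/archivelist/year-2021,month-1,starttime-44197.cms'

# Hypothetical raw markup with no <tbody>, as a server might send it.
RAW_HTML = (
    '<html><body><table><tr><td>'
    '<a href="/archivelist/year-2021,month-1,starttime-44197.cms">1</a>'
    '</td></tr></table></body></html>'
)


def hrefs(markup, path):
    """Return the href of every <a> element that the given path selects."""
    root = ET.fromstring(markup)
    return [a.get('href') for a in root.findall(path)]


if __name__ == '__main__':
    # Check 1: do the patterns match hrefs of the expected shape?
    print(bool(MONTH_RE.fullmatch(SAMPLE_MONTH)))
    print(bool(DAY_RE.fullmatch(SAMPLE_DAY)))
    # Without the backslash, d+ means one or more literal letters "d".
    print(bool(re.fullmatch(r'/archive/year-d+,month-d+.cms', SAMPLE_MONTH)))

    # Check 2: a path that demands <tbody> versus one that does not.
    print(hrefs(RAW_HTML, './/table//tbody//a'))
    print(hrefs(RAW_HTML, './/table//a'))
```

If the second check mirrors what the real page does, dropping tbody from the XPath would be the first thing to try. If the links are missing from the raw response altogether (for example when inspected in scrapy shell), the page is more likely building them in JavaScript.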

Question from: https://stackoverflow.com/questions/65932035/how-to-retrieve-dynamic-link-from-economic-times-using-scrapy
