Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to Retrieve Dynamic Link from Economic-Times using Scrapy

The parse_news_item callback doesn't retrieve the day links. I think the links are being generated dynamically, which would explain the problem. I have looked at other posts but have not been able to solve this. Any help would be much appreciated.

import scrapy
import json
from scrapy.crawler import CrawlerProcess


class EtSpider(scrapy.Spider):
    name = 'et'
    start_urls = ["https://economictimes.indiatimes.com/archive.cms"]

    def parse(self, response):
        months = response.xpath('//table//tr//a/@href').re(r'/archive/year-\d+,month-\d+\.cms')
        for month in months:
            month = 'https://economictimes.indiatimes.com' + month  
            yield scrapy.Request(month, self.parse_news_item)
            
    def parse_news_item(self, response):
        days = response.xpath('//table//tr//td//tbody//tr//td//a/@href').re(r'/archivelist/year-\d+,month-\d+,starttime-\d+\.cms')
        for day in days:
            self.logger.info(day)
        
process = CrawlerProcess()
process.crawl(EtSpider)
process.start()
Question from: https://stackoverflow.com/questions/65932035/how-to-retrieve-dynamic-link-from-economic-times-using-scrapy
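
One thing worth ruling out before blaming dynamic generation is the pattern itself: `d+` without a backslash matches the literal letter "d", not digits, so it can never match a link such as `year-2021`. The sketch below checks this with the standard `re` module only. The sample HTML, the `starttime` values, and the helper name `extract_day_links` are hypothetical and are not taken from the real site; only the base URL and the link shape come from the question.

```python
import re

BASE_URL = "https://economictimes.indiatimes.com"

# Hypothetical stand-in for a month page; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/archivelist/year-2021,month-1,starttime-44197.cms">1</a></td></tr>
  <tr><td><a href="/archivelist/year-2021,month-1,starttime-44198.cms">2</a></td></tr>
  <tr><td><a href="/about.cms">About</a></td></tr>
</table>
"""

# \d+ matches one or more digits; \. matches a literal dot.
DAY_LINK = re.compile(r"/archivelist/year-\d+,month-\d+,starttime-\d+\.cms")

# Same pattern with the backslashes missing: d+ means "one or more letters d".
BROKEN_DAY_LINK = re.compile(r"/archivelist/year-d+,month-d+,starttime-d+.cms")


def extract_day_links(html, base_url=BASE_URL):
    """Return absolute URLs for every day-archive path found in html."""
    return [base_url + path for path in DAY_LINK.findall(html)]


if __name__ == "__main__":
    for link in extract_day_links(SAMPLE_HTML):
        print(link)
    print("broken pattern matches:", BROKEN_DAY_LINK.findall(SAMPLE_HTML))
```

If the pattern is correct and the spider still logs nothing, two other causes are possible. The `tbody` step in the XPath may not exist in the raw response, since browsers commonly insert `<tbody>` into the DOM even when the server's HTML omits it. Or the links may really be built by JavaScript, in which case they will not appear in `response.text` at all. Inspecting the response body with `scrapy shell` should help tell these apart.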


1 Reply

Waiting for answers


...