
python - Unable to scrape the Next page URLs using Selenium and scrapy

I am struggling to parse/scrape each page after clicking the Next button using Selenium. I am able to get to the second page, but it fails after that. I'm not sure how to solve this; any suggestions? Here is the code:

import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException


class PropertyFoxSpider(scrapy.Spider):
    name = 'property_fox'
    start_urls = [
        'https://propertyfox.co.za/listing-search?currentpage=1&term_id=62515&keywords=Western+Cape&orderby=createddate:desc&status%5B%5D=Active'
    ]

    def __init__(self):
        # path to the chromedriver executable
        self.driver = webdriver.Chrome('path')


    def parse(self,response):
        self.driver.get(response.url)
        while True: 
            try: 
                elem = WebDriverWait(self.driver, 10).until(EC.element_to_be_clickable((By.ID, "pagerNext")))
                elem.click()
                url = self.driver.current_url
                yield scrapy.Request(url=url, callback=self.parse_page, dont_filter=False)
            except TimeoutException:
                break



    def parse_page(self, response):
        #self.driver.get(response.url)
        for prop in response.css('div.property-item'):
            link = prop.css('a::attr(href)').get()
            banner = prop.css('div.property-figure-icon div::text').get()
            sold_tag = None
            if banner:
                banner = banner.strip()
                sold_tag = 'sold' if 'sold' in banner.lower() else None

            yield scrapy.Request(
                link,
                callback=self.parse_property,
                meta={'item': {
                    'agency': self.name,
                    'url': link,
                    'offering': 'buy',
                    'banners': banner,
                    'sold_tag':  sold_tag,
                }},
            )

    def parse_property(self, response):
        item = response.meta.get('item')
        ...

question from: https://stackoverflow.com/questions/65901666/unable-to-scrape-the-next-page-urls-using-selenium-and-scrapy


1 Reply


You can wait until the URL has changed and then scrape it:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Remember the URL of the page we are currently on.
url = self.driver.current_url
# Click the "Next" button once it becomes clickable.
elem = WebDriverWait(self.driver, 10).until(EC.element_to_be_clickable((By.ID, "pagerNext")))
elem.click()
# Block until the browser has actually navigated, i.e. the URL has changed.
WebDriverWait(self.driver, 10).until(lambda driver: self.driver.current_url != url)
# current_url now points at the next page.
url = self.driver.current_url
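
For completeness, here is a sketch of how that wait could be folded back into the parse() method from the question. It reuses the same imports, the pagerNext id, and the page structure assumed above, and it has not been tested against the live site:

    def parse(self, response):
        self.driver.get(response.url)
        while True:
            # Hand the page the driver is currently showing over to Scrapy.
            # dont_filter=True so the first page (same URL as start_urls)
            # is not dropped by the duplicate filter.
            url = self.driver.current_url
            yield scrapy.Request(url=url, callback=self.parse_page, dont_filter=True)
            try:
                elem = WebDriverWait(self.driver, 10).until(
                    EC.element_to_be_clickable((By.ID, "pagerNext"))
                )
                elem.click()
                # Wait for the browser to actually navigate before reading
                # current_url again on the next loop iteration.
                WebDriverWait(self.driver, 10).until(
                    lambda driver: self.driver.current_url != url
                )
            except TimeoutException:
                break

Yielding before clicking means every page, including the first, goes through parse_page, and the explicit wait keeps the loop from re-yielding the same URL while the click is still being processed.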
