python - Scraping contents of multi web pages of a website using BeautifulSoup and Selenium

Question

Welcome To Ask or Share your Answers For Others

python - Scraping contents of multi web pages of a website using BeautifulSoup and Selenium

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Scraping contents of multi web pages of a website using BeautifulSoup and Selenium

The website I want to scrap is :

http://www.mouthshut.com/mobile-operators/Reliance-Jio-reviews-925812061

I want to get the last page number of the above the link for proceeding, which is 499 while taking the screenshot.

My code :

   from bs4 import BeautifulSoup 
   from urllib.request import urlopen as uReq
   from selenium import webdriver;import time
   from selenium.webdriver.common.by import By
   from selenium.webdriver.support.ui import WebDriverWait
   from selenium.webdriver.support import expected_conditions as EC
   from selenium.webdriver.common.desired_capabilities import         DesiredCapabilities

   firefox_capabilities = DesiredCapabilities.FIREFOX
   firefox_capabilities['marionette'] = True
   firefox_capabilities['binary'] = '/etc/firefox'

   driver = webdriver.Firefox(capabilities=firefox_capabilities)
   url = "http://www.mouthshut.com/mobile-operators/Reliance-Jio-reviews-925812061"

   driver.get(url)
   wait = WebDriverWait(driver, 10)
   soup=BeautifulSoup(driver.page_source,"lxml")
   containers = soup.findAll("ul",{"class":"pages table"})
   containers[0] = soup.findAll("li")
   li_len = len(containers[0])
   for item in soup.find("ul",{"class":"pages table"}) : 
   li_text = item.select("li")[li_len].text
   print("li_text : {}
".format(li_text))
   driver.quit()

I need help to figure out the error in my code for getting the last page number. Also, I would be grateful if someone give the alternate solution for the same and suggest ways to achieve my intention.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:39+0000

If you want to get the last page number of the above the link for proceeding, which is 499 you can use either Selenium or Beautifulsoup as follows :

Selenium :

from selenium import webdriver

driver = webdriver.Firefox(executable_path=r'C:UtilityBrowserDriversgeckodriver.exe')
url = "http://www.mouthshut.com/mobile-operators/Reliance-Jio-reviews-925812061"
driver.get(url)
element = driver.find_element_by_xpath("//div[@class='row pagination']//p/span[contains(.,'Reviews on Reliance Jio')]")
driver.execute_script("return arguments[0].scrollIntoView(true);", element)
print(driver.find_element_by_xpath("//ul[@class='pagination table']/li/ul[@class='pages table']//li[last()]/a").get_attribute("innerHTML"))
driver.quit()

Console Output :

Beautifulsoup :

import bs4
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

url = "http://www.mouthshut.com/mobile-operators/Reliance-Jio-reviews-925812061"
uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
container = page_soup.find("ul",{"class":"pages table"})
all_li = container.findAll("li")
last_div = None
for last_div in all_li:pass
if last_div:
    content = last_div.getText()
    print(content)

Console Output :

Categories

python - Scraping contents of multi web pages of a website using BeautifulSoup and Selenium

python - Scraping contents of multi web pages of a website using BeautifulSoup and Selenium

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Selenium :

Beautifulsoup :

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags