
python - Scraping with selenium and BeautifulSoup doesn't return all the items in the page

This follows on from my earlier question here.

Now I can interact with the page: scroll down, close the popup that appears, and click the button at the bottom to expand the page.

The problem is that when I count the items, the code returns only 20 when it should return 40.

I have checked the code again and again; I'm missing something, but I don't know what.

See my code below:

from selenium import webdriver 
from bs4 import BeautifulSoup
import pandas as pd
import time
import datetime

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
#options.add_argument('--headless')
driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe", options=options)

url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'

driver.get(url)  

iter = 1
while True:
    scrollHeight = driver.execute_script("return document.documentElement.scrollHeight")
    Height = 10 * iter
    driver.execute_script("window.scrollTo(0, " + str(Height) + ");")

    if Height > scrollHeight:
        print('End of page')
        break
    iter += 1

time.sleep(3)

popup = driver.find_element_by_class_name('confirm').click()

time.sleep(3)

ver_mas = driver.find_elements_by_class_name('button-load-more')

for x in range(len(ver_mas)):
    if ver_mas[x].is_displayed():
        driver.execute_script("arguments[0].click();", ver_mas[x])
        time.sleep(10)

page_source = driver.page_source

soup = BeautifulSoup(page_source, 'lxml')
# print(soup)

items = soup.find_all('div',class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))

What is wrong? I'm a newbie in the scraping world.

Regards
Question from: https://stackoverflow.com/questions/65833515/scraping-with-selenium-and-beautifulsoup-doesn%c2%b4t-return-all-the-items-in-the-pag

1 Reply


Your while and for statements don't work as intended.

  1. Using while True: is a bad practice.
  2. You scroll all the way to the bottom, but the button-load-more button isn't visible there, so Selenium does not report it as displayed.
  3. find_elements_by_class_name looks for multiple elements, but the page has only one element with that class.
  4. if ver_mas[x].is_displayed(): will run at most once, because the range has length 1.

Below you can find the solution: the code looks for the button, moves to it instead of scrolling, and clicks it. If it fails to find the button, which means all the items have been loaded, it breaks out of the while loop and moves on.

from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException

url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'

driver.get(url)
time.sleep(3)
popup = driver.find_element_by_class_name('confirm').click()

iter = 1
while iter > 0:
    time.sleep(3)
    try:
        # The page has a single load-more button; scroll it into view and click it.
        ver_mas = driver.find_element_by_class_name('button-load-more')
        actions = ActionChains(driver)
        actions.move_to_element(ver_mas).perform()
        driver.execute_script("arguments[0].click();", ver_mas)
    except NoSuchElementException:
        # The button is no longer found, so all items have been loaded.
        break
    iter += 1

page_source = driver.page_source

soup = BeautifulSoup(page_source, 'lxml')
# print(soup)

items = soup.find_all('div', class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))
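
As a side note, the fixed time.sleep(3) before each lookup can be swapped for an explicit wait, which polls until the button is actually clickable instead of pausing for a fixed interval. A minimal sketch, assuming the same driver and the same button-load-more class as above:

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

iter = 1
while iter > 0:
    try:
        # Wait up to 10 seconds for the button to become clickable.
        ver_mas = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CLASS_NAME, 'button-load-more'))
        )
        ActionChains(driver).move_to_element(ver_mas).perform()
        driver.execute_script("arguments[0].click();", ver_mas)
    except TimeoutException:
        # No clickable button appeared in time, so all items are loaded.
        break
    iter += 1

This avoids waiting longer than necessary on fast pages and too little on slow ones.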
