
python - Scraping with Selenium and BeautifulSoup doesn't return all the items on the page

This follows up on my earlier question here.

Now I am able to interact with the page: scroll down, close the popup that appears, and click the button at the bottom to expand the listing.

The problem is that when I count the items, the code returns only 20 when it should return 40.

I have checked the code again and again - I'm missing something but I don't know what.

See my code below:

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import datetime

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
# options.add_argument('--headless')
driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe", options=options)

url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'

driver.get(url)

# Scroll down in small increments until the bottom of the page is reached
iter = 1
while True:
    scrollHeight = driver.execute_script("return document.documentElement.scrollHeight")
    Height = 10 * iter
    driver.execute_script("window.scrollTo(0, " + str(Height) + ");")
    if Height > scrollHeight:
        print('End of page')
        break
    iter += 1

time.sleep(3)

# Close the popup that appears on load
popup = driver.find_element_by_class_name('confirm').click()

time.sleep(3)

# Click the "load more" button at the bottom to expand the page
ver_mas = driver.find_elements_by_class_name('button-load-more')
for x in range(len(ver_mas)):
    if ver_mas[x].is_displayed():
        driver.execute_script("arguments[0].click();", ver_mas[x])
        time.sleep(10)

# Parse the fully rendered page and count the product items
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
# print(soup)

items = soup.find_all('div', class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))

What is wrong? I'm a newbie in the scraping world.

Regards
question from: https://stackoverflow.com/questions/65833515/scraping-with-selenium-and-beautifulsoup-doesn%c2%b4t-return-all-the-items-in-the-pag


1 Reply


Your while and for statements don't work as intended.

  1. Using while True: without a clear exit condition is bad practice.
  2. You scroll all the way to the bottom, but the button-load-more button isn't displayed there, so Selenium does not report it as displayed (see the diagnostic sketch after this list).
  3. find_elements_by_class_name looks for multiple elements, but the page has only one element with that class.
  4. if ver_mas[x].is_displayed(): executes at most once, because len(ver_mas) is 1 and so the range is range(1).
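
To make points 2 and 3 concrete, here is a minimal diagnostic sketch - hypothetical, not part of the original post - run in the same driver session right after the scroll loop:

ver_mas = driver.find_elements_by_class_name('button-load-more')
print(len(ver_mas))              # 1 - the page has a single "load more" button
for button in ver_mas:
    # Reported False at the bottom of the page (per point 2), so the
    # original loop skips the click entirely
    print(button.is_displayed())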

Below you can find the solution: the code looks for the button, moves to it instead of scrolling, and performs a click. If it fails to find the button - meaning all the items have been loaded - it breaks out of the while loop and moves on.

from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup
import time

# driver is created exactly as in the question
url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'

driver.get(url)
time.sleep(3)
popup = driver.find_element_by_class_name('confirm').click()

# Keep clicking the "load more" button until it disappears from the DOM
iter = 1
while iter > 0:
    time.sleep(3)
    try:
        ver_mas = driver.find_element_by_class_name('button-load-more')
        actions = ActionChains(driver)
        actions.move_to_element(ver_mas).perform()  # move to the button instead of scrolling
        driver.execute_script("arguments[0].click();", ver_mas)
    except NoSuchElementException:
        # Button not found - all items have been loaded
        break
    iter += 1

page_source = driver.page_source

soup = BeautifulSoup(page_source, 'lxml')
# print(soup)

items = soup.find_all('div', class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))
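
As a side note, the fixed time.sleep(3) pauses can be replaced with an explicit wait. The sketch below is a variation on the same approach, not part of the original answer; it assumes, as the answer's logic does, that the button-load-more element stops being clickable once every item has been loaded:

import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

wait = WebDriverWait(driver, 10)
while True:
    try:
        # Wait up to 10 seconds for a visible, enabled "load more" button
        ver_mas = wait.until(
            EC.element_to_be_clickable((By.CLASS_NAME, 'button-load-more')))
        driver.execute_script("arguments[0].click();", ver_mas)
        time.sleep(2)  # give the next batch of products time to render
    except TimeoutException:
        break  # no clickable button appeared - assume all items are loaded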
