
python 3.x - How to handle lazy-loaded images in selenium?

Before marking this as a duplicate, please consider that I have already looked through many related Stack Overflow posts, as well as other websites and articles. I have not found a solution yet.

This question is a follow-up to Selenium Webdriver not finding XPATH despite seemingly identical strings. I determined that the problem did not, in fact, come from the XPath method by updating the code to work in a more elegant manner:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

for item in feed:
    img_div = item.find_element_by_class_name('listing-cover-photo ')
    img = WebDriverWait(img_div, 10).until(
            EC.visibility_of_element_located((By.TAG_NAME, 'img')))

This works for roughly the first five elements, but after that it times out. By getting the inner HTML of img_div and printing it, I found that for the elements that time out, instead of the image I want there is a div with the class "lazyload-placeholder". That led me to read up on scraping lazy-loaded elements, but I could not find an answer. As you can see, I am using a WebDriverWait to give the image time to load, and I have also tried a site-wide wait call as well as a time.sleep call; waiting does not seem to fix it. I am looking for the easiest way to handle these lazy-loaded images, preferably in Selenium, but if there are other libraries or tools I can use in tandem with the Selenium code I already have, that would be great. Any help is appreciated.
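
For reference, the check that surfaced the placeholder looked roughly like this (a sketch; feed is the same list of listing elements used above):

for item in feed:
    img_div = item.find_element_by_class_name('listing-cover-photo ')
    # Dump the container's markup: items that time out contain a
    # <div class="lazyload-placeholder"> instead of an <img> tag
    print(img_div.get_attribute('innerHTML'))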


1 Reply


Your images only load when they're scrolled into view. This is such a common requirement that the Selenium Python docs cover it in their FAQ. Adapting from this answer, the script below scrolls down the page before scraping the images.

    driver.get("https://www.grailed.com/categories/footwear")

    SCROLL_PAUSE_TIME = 0.5
    i = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(SCROLL_PAUSE_TIME)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
        i += 1
        if i == 5:
            break

    driver.implicitly_wait(10)
    shoe_images = driver.find_elements(By.CSS_SELECTOR, 'div.listing-cover-photo img')

    print(len(shoe_images))

In the interest of not scrolling through shoes (seemingly) forever, I have added a break after 5 iterations; however, you're free to remove the i counter, and the loop will scroll down for as long as it can.

The implicit wait is there to give any remaining images that are still loading time to catch up.
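
If you prefer an explicit wait over the implicit one, something along these lines should also work (a sketch using the same selector; it waits until at least one cover image is present and then returns all matches):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for at least one cover image, then collect them all
shoe_images = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.listing-cover-photo img')))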

A test run yielded 82 images. I confirmed that it had scraped everything on the page by using Chrome's DevTools selector, which also highlighted 82. You'll see a different number depending on how many images you allow to load.
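
As a quick sanity check that the collected elements are fully loaded images rather than leftover placeholders, you can print each element's src attribute, roughly like so:

# Loaded images should point at a real URL; lazy-load stubs typically don't
for img in shoe_images:
    print(img.get_attribute('src'))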

