Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


html - rvest web scraping is returning an empty data frame when attempting to collect product price information

I am trying to use 'rvest' to scrape product pricing from: https://www.lowes.com/pl/Lawn-garden-hand-tools-Outdoor-tools-equipment-Outdoors/4294612737?goToProdList=true&int_cmp=LawnGardenHandTools:C:Outdoors:Merch:shop_all_copy. I am using the below code:

library(rvest)
library(tidyverse)

url <- "https://www.lowes.com/pl/Lawn-garden-hand-tools-Outdoor-tools-equipment-Outdoors/4294612737?goToProdList=true&int_cmp=LawnGardenHandTools:C:Outdoors:Merch:shop_all_copy"

html <- read_html(url)

price <- html %>%
  html_node('body') %>%
  xml_find_all("//span[contains(@class, 'h5 js-price v-spacing-mini art-pl-price')]") %>% 
  html_text() %>%
  data.frame()

However, this returns an empty data frame.

Any advice would be much appreciated.


1 Reply


rvest can only scrape static HTML content: it parses the markup the server returns and does not run JavaScript.

Most modern commercial websites use dynamic content that is generated on the fly by JavaScript.

In order to scrape such websites, you will first need to make the site generate the HTML content you are looking for, and then you'll be able to scrape it with rvest.
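To make the static/dynamic distinction concrete, here is a minimal sketch that needs no network access. The page source and the class names `static-price` and `js-price` are made up for illustration; the point is only that an element created by a script is absent from what rvest parses.

```r
library(rvest)

# Hypothetical page source: the served HTML contains one price, and a script
# would add a second one only if a browser executed it.
page_source <- '
<html><body>
  <span class="static-price">$9.98</span>
  <script>
    var s = document.createElement("span");
    s.className = "js-price";
    s.textContent = "$12.48";
    document.body.appendChild(s);
  </script>
</body></html>'

doc <- read_html(page_source)

# Present in the served markup, so rvest finds it.
static_price <- doc %>% html_nodes("span.static-price") %>% html_text()

# Created by JavaScript, which rvest never runs, so nothing matches.
dynamic_price <- doc %>% html_nodes("span.js-price") %>% html_text()

length(static_price)   # 1
length(dynamic_price)  # 0
```

Wrapping `dynamic_price` in `data.frame()` gives a data frame with zero rows, which is consistent with the symptom described in the question.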

To do so, you'll need a browser automation tool such as RSelenium, or a JavaScript rendering service such as Splash, so that you can move around the site and query data programmatically.

RSelenium needs a running Selenium server, and a Docker container is the usual way to provide one. Docker is also the recommended way to run Splash.
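A rough sketch of the RSelenium route could look like the following. It assumes a Selenium server is already listening on `localhost:4444` with a Firefox browser available, and that the rendered page still marks prices with the `js-price` class from the question; neither is guaranteed. The helper names `extract_prices` and `fetch_rendered_source` are hypothetical, introduced only for this example.

```r
library(rvest)

# Parse rendered HTML (a string) and pull out the price text.
# The CSS class is taken from the question; whether the live page
# still uses it is an assumption.
extract_prices <- function(page_source, css = "span.js-price") {
  prices <- read_html(page_source) %>% html_nodes(css) %>% html_text()
  data.frame(price = trimws(prices), stringsAsFactors = FALSE)
}

# Load the page in a real browser through Selenium and return the HTML
# after scripts have run. Assumes a server started with something like:
#   docker run -d -p 4444:4444 selenium/standalone-firefox
# (the image tag must be one your RSelenium version can talk to).
fetch_rendered_source <- function(url, wait_seconds = 5) {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = "localhost",
    port = 4444L,
    browserName = "firefox"
  )
  remDr$open(silent = TRUE)
  on.exit(remDr$close(), add = TRUE)
  remDr$navigate(url)
  Sys.sleep(wait_seconds)  # crude fixed wait for the scripts to finish
  remDr$getPageSource()[[1]]
}

# Usage (needs the Selenium server and network access):
# price <- extract_prices(fetch_rendered_source(url))
```

Keeping the parsing step separate from the browser step means the selector can be checked against a saved copy of the page without starting Selenium each time.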

At the end of this long and interesting journey, you'll have to be creative so that the website doesn't decide it is being queried by a robot and show you a bot CAPTCHA.

