I want to scrape all the shapefiles from the following website: https://www.sciencebase.gov/catalog/items?q=&filter0=browseCategory%3DData&community=California+Condor&filter1=browseType%3DMap+Service&filter2=browseType%3DOGC+WMS+Layer&filter3=browseType%3DDownloadable&filter4=facets.facetName%3DShapefile&&filter5=browseType%3DShapefile
I used the following script:
install.packages('rvest')
library(rvest)
library(tidyverse)
## CONSTANT ----
URL <- "https://www.sciencebase.gov/catalog/item/54471eb5e4b0f888a81b82ca"
dir_out <- "~/condor"
## MAIN ----
# Get the webpage content
webpage <- read_html(URL)
# Extract the nodes that carry the download links
data <- html_nodes(webpage, ".sb-file-get sb-download-link")
# Grab the base URLs to download all the referenced data
url_base <- html_attr(data,"href")
# Filter the zip files
shapefile_base <- grep("*.zip",url_base, value=TRUE)
# Fix the double `//`
shapefile_fixed <- gsub("//", "/", shapefile_base)
# Add the URL prefix
shapefile_full <- paste0("https://www.sciencebase.gov/",shapefile_fixed)
# Create the output directory
dir.create(dir_out, showWarnings = FALSE)
# Create a list of filenames
filenames_full <- file.path(dir_out,basename(shapefile_full))
# Download the files
lapply(shapefile_full, FUN=function(x) download.file(x, file.path(dir_out,basename(x))))
# Unzip the files
unzip(filenames_full, overwrite = TRUE)
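To sanity-check the scraping steps, the intermediate objects the script builds can be printed as follows (a minimal check using only the names defined above, nothing new):
# How many download-link nodes did the selector match?
length(data)
# What do the raw hrefs look like?
head(url_base)
# The final URLs that will be passed to download.file()
shapefile_full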
However, at the download step I get the following output, and the subsequent unzip then fails:
> lapply(shapefile_full, FUN=function(x) download.file(x, file.path(dir_out,basename(x))))
trying URL 'https://www.sciencebase.gov/'
Content type 'text/html' length unknown
downloaded 111 bytes
[[1]]
[1] 0
> # Unzip the files
> unzip(filenames_full, overwrite = TRUE)
Warning message:
In unzip(filenames_full, overwrite = TRUE) :
error 1 in extracting from zip file
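Since the console output reports only 111 bytes of content type 'text/html', a minimal check of what was actually written to disk (reusing filenames_full from the script; readLines is only there to peek at a small file) should show whether these are real zip archives or just an HTML page:
# Sizes of whatever download.file() saved
file.info(filenames_full)$size
# Peek at the first file: a genuine zip archive is binary and starts with "PK",
# while an HTML response will be readable markup
readLines(filenames_full[1], n = 3, warn = FALSE)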