I have a basic R script which I have cobbled together using Rselenium which allows me to log into a website, once authenticated my script then goes to the first page of interest and pulls 3 pieces of text from the page.
Luckily for me the URL has been created in such a way that I can pass in a vector of numbers to the URL to take me to the next page of interest hence the use of map().
While on each page I want to scrape the same 3 elements off the page and store them in a master data frame for later analysis.
I wish to use the map family of functions so that I can become more familiar with them but I am really struggling to get these to work, could anyone kindly tell me where I am going wrong?
Here is the main part of my code (go to website and log in)
library(RSelenium)
# https://stackoverflow.com/questions/55201226/session-not-created-this-version-of-chromedriver-only-supports-chrome-version-7/56173984
rd <- rsDriver(browser = "chrome",
chromever = "88.0.4324.27",
port = netstat::free_port())
remdr <- rd[["client"]]
# url of the site's login page
url <- "https://www.myWebsite.com/"
# Navigating to the page
remdr$navigate(url)
# Wait 5 secs for the page to load
Sys.sleep(5)
# Find the initial login button to bring up the username and password fields
loginbutton <- remdr$findElement(using = 'css selector','.plain')
# Click the initial login button to bring up the username and password fields
loginbutton$clickElement()
# Find the username box
username <- remdr$findElement(using = 'css selector','#username')
# Find the password box
password <- remdr$findElement(using = 'css selector','#password')
# Find the final login button
login <- remdr$findElement(using = 'css selector','#btnLoginSubmit1')
# Input the username
username$sendKeysToElement(list("myUsername"))
# Input the password
password$sendKeysToElement(list("myPassword"))
# Click login
login$clickElement()
And hey presto we're in!
Now my code takes me to the initial page of interest (index = 1)
Above I mentioned that I am looking to increment through each page and I can do this by substituting an integer into the URL at the rcId
element, see below
#remdr$navigate("https://myWebsite.com/rc_redesign/#/layout/jcard/drugCard?accountId=XXXXXX&rcId=1&searchType=R&reimbCode=&searchTerm=&searchTexts=*") # Navigating to the page
For each rcId in 1:9999 I wish to grab the following 3 elements and store them in a data frame
hcpcs_info <- remdr$findElement(using = 'class','is-jcard-heading')
hcpcs <- hcpcs_info$getElementText()[[1]]
hcpcs_description <- remdr$findElement(using = 'class','is-jcard-desc')
hcpcs_desc <- hcpcs_description$getElementText()[[1]]
tc_info <- remdr$findElement(using = 'css selector','.col-12.ng-star-inserted')
therapeutic_class <- tc_info$getElementText()[[1]]
I have tried creating a separate function and passing to map but I am not advance enough to piece this together, below is what I have tried.
my_function <- function(index) {
remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*",index)
Sys.sleep(5)
hcpcs_info[index] <- remdr$findElement(using = 'class','is-jcard-heading')
hcpcs[index] <- hcpcs_info$getElementText()[index][[1]])
}
x <- 1:10 %>%
map(~ my_function(.x))
Any help would be greatly appreciated
question from:
https://stackoverflow.com/questions/66056336/using-purrrmap-to-loop-through-web-pages-for-scraping-with-rselenium