We can extend your base code and treat the web page as an API endpoint since it takes parameters:
library(httr)
library(rvest)
I use more than ^^ below via ::
but I don't want to pollute the namespace.
I'd usually end up writing a small, parameterized function or small package with a cpl parameterized functions to encapsulate the logic below.
httr::GET(
url = "http://weather.uwyo.edu/cgi-bin/sounding",
query = list(
region = "seasia",
TYPE = "TEXT:LIST",
YEAR = "2006",
MONTH = "09",
FROM = "0100",
TO = "0100",
STNM = "48657"
)
) -> res
^^ makes the web page request and gathers the response.
httr::content(res, as="parsed") %>%
html_nodes("pre") -> wx_dat
^^ turns it into an html_document
Now, we extract the readings:
html_text(wx_dat[[1]]) %>% # turn the first <pre> node into text
strsplit("
") %>% # split it into lines
unlist() %>% # turn it back into a character vector
{ col_names <<- .[3]; . } %>% # pull out the column names (we'll use them later)
.[-(1:5)] %>% # strip off the header
paste0(collapse="
") -> readings # turn it back into a big text blob
^^ cleaned up the table and we'll use readr::read_table()
to parse it. We'll also turn the extract column names into the actual colum names:
readr::read_table(readings, col_names = tolower(unlist(strsplit(trimws(col_names), " +"))))
## # A tibble: 106 x 11
## pres hght temp dwpt relh mixr drct sknt thta thte thtv
## <dbl> <int> <dbl> <dbl> <int> <dbl> <int> <int> <dbl> <dbl> <dbl>
## 1 1009 16 23.8 22.7 94 17.6 170 2 296. 347. 299.
## 2 1002 78 24.6 21.6 83 16.5 252 4 298. 346. 300.
## 3 1000 96 24.4 21.3 83 16.2 275 4 298. 345. 300.
## 4 962 434 22.9 20 84 15.6 235 10 299. 345 302.
## 5 925 777 21.4 18.7 85 14.9 245 11 301. 345. 304.
## 6 887 1142 20.3 16 76 13.0 255 15 304. 343. 306.
## 7 850 1512 19.2 13.2 68 11.3 230 17 306. 341. 308.
## 8 839 1624 18.8 11.8 64 10.5 225 17 307 339. 309.
## 9 828 1735 18 11.4 65 10.3 220 17 307. 339. 309.
## 10 789 2142 15.1 10 72 9.84 205 16 308. 339. 310.
## # ... with 96 more rows
You didn't say you wanted the station metadata but we can get that too (in the second <pre>
:
html_text(wx_dat[[2]]) %>%
strsplit("
") %>%
unlist() %>%
trimws() %>% # get rid of whitespace
.[-1] %>% # blank line removal
strsplit(": ") %>% # separate field and value
lapply(function(x) setNames(as.list(x), c("measure", "value"))) %>% # make each pair a named list
dplyr::bind_rows() -> metadata # turn it into a data frame
metadata
## # A tibble: 30 x 2
## measure value
## <chr> <chr>
## 1 Station identifier WMKD
## 2 Station number 48657
## 3 Observation time 060901/0000
## 4 Station latitude 3.78
## 5 Station longitude 103.21
## 6 Station elevation 16.0
## 7 Showalter index 0.34
## 8 Lifted index -1.40
## 9 LIFT computed using virtual temperature -1.63
## 10 SWEAT index 195.39
## # ... with 20 more rows