One way around this is to call the API that the page is using to get that info.
The API returns json which you can parse with a json parser. I use jsonconverter.bas. After installing the code from that link in a standard module called JsonConverter, go to VBE > Tools > References > Add a reference to Microsoft Scripting Runtime.
Finding the API:
If you press F12 to open developer tools and go to the Network
tab and then press F5 to refresh any url of interest you will see the recorded web traffic. You can find the API call there.
See my answer here on how to search the network traffic using a specific observation value you expect to see in the response - this will filter the list of network traffic to those items containing the value of interest. Be sensible in selecting the value - you want something unlikely to occur elsewhere. You can also filter the network traffic to XHR only.
The API response:
The API returns json. More specifically, it returns a dictionary containing 2 keys. The second key, "observations
", can be used to return a collection (denoted by []
) of dictionaries (denoted by {}
).
Each dictionary represents a row of the table (daily observations). You can loop this collection, and then loop the inner dictionaries, to access the table row values and reconstruct the table by populating an array. Explore example json response here.
Explanation of json structure:
click here to enlarge
Explanation of code:
The code is broken down into a number of helper subs and functions, allocating certains tasks to each, to
make code easier to debug and follow, as well as better align with Object Oriented Programming Principles.
Overall the process is:
- Gather urls for
Worksheet("Sheet1")
. Helper function GetAllUrls
.
- Process those urls and only retain the dates which correspond with Tue-Thur. These are kept as strings formatted as
"yyyymmdd"
so can be passed to API later. This is handled by helper functions GetOnlyQualifyingUrlsDates
and IncludeThisDate
. IncludeThisDate
performs the check for whether to include; GetOnlyQualifyingUrlsDates
handles the looping and formatting of results.
- Issue xmlhttp requests by looping over qualifying url dates and concatenating those into the url for the API call, then issuing the request. This is performed by the main sub
GetTables
.
- Sheet creation, for output, is handled by helper function
CreateWorksheet
. This function calls another helper function, SheetExists
, to ensure sheets are only created if they don't already exist, otherwise, the existing sheet by that name is used.
- The resultant json response, from step 3, is passed to a helper sub
WriteOutResults
which accepts the json variable and the output sheet object as arguments. It extracts all the info from the json response; essentially reconstructing the table. It adds the table and headers to the appropriate sheet.
It calls helper function Epoch2Date
, which handles the unix timestamp to datetime conversion for the two unix fields in the json object.
TODO:
- The API key may be time limited. Add a helper function which returns the current valid key.
- The API accepts start date and end date parameters in the url construct. It would be far better to issue one request for the entire range if possible, or chunked ranges e.g. months, to reduce the number of requests made. This would also reduce the likelihood of being blocked. This would mean some additional code would need to be written, before writing out results, to ensure only dates of interest are being written to sheets. Though you could write out all then simply loop all sheets and delete those that aren't wanted (perfectly doable if we are talking about 365 dates total). Personally, I would handle the include date part in the construction of the table from a single request (if possible) that has the min and max dates for entire urls listed passed as start and end date parameters. I would then write a single flat table out to one sheet as this will be much easier for later data analysis.
VBA:
Option Explicit
Public Sub GetTables()
'VBE > Tools > References > Microsoft Scripting Runtime
Dim json As Object, qualifyingUrlsDates(), urls(), url As String
Dim ws As Worksheet, wsOutput As Worksheet, i As Long, startDate As String, endDate As String
Set ws = ThisWorkbook.Worksheets("Sheet1")
urls = GetAllUrls(2, ws, "A")
qualifyingUrlsDates = GetOnlyQualifyingUrlsDates(urls)
'API key may be not be valid over time so look at obtaining by prior request
With CreateObject("MSXML2.XMLHTTP") 'issue xmlhttp request for each valid date (this would be better done using start and enddate to specify entire range _
of batches e.g. months within total range to cut down on requests
For i = LBound(qualifyingUrlsDates) To UBound(qualifyingUrlsDates)
startDate = qualifyingUrlsDates(i)
endDate = startDate ' a little verbose but useful for explaining
url = "https://api.weather.com/v1/geocode/31.76/-106.49/observations/historical.json?apiKey=6532d6454b8aa370768e63d6ba5a832e&startDate=" & startDate & "&endDate=" & endDate & "&units=e"
.Open "GET", url, False
.send
Set json = JsonConverter.ParseJson(.responseText)("observations")
Set wsOutput = CreateWorksheet(qualifyingUrlsDates(i))
WriteOutResults wsOutput, json
Next
End With
End Sub
Public Sub WriteOutResults(ByVal wsOutput As Worksheet, ByVal json As Object)
'json is a collection of dictionaries. Each dictionary is a time period reading from the day i.e. one row in output
Dim results(), item As Object, headers(), r As Long, c As Long, key As Variant
headers = json.item(1).keys 'get the headers which are the keys of each dictionary
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each item In json
r = r + 1: c = 0 'increase row in results array to store results for table row
For Each key In item.keys
c = c + 1 'increase column number in results array for writing out results
Select Case key
Case "valid_time_gmt", "expire_time_gmt" 'convert unix timestamp fields to datetime
results(r, c) = Epoch2Date(item(key))
Case Else
results(r, c) = item(key)
End Select
Next
Next
With wsOutput
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetOnlyQualifyingUrlsDates(ByVal urls As Variant) As Variant
Dim i As Long, output(), counter As Long
ReDim output(1 To UBound(urls))
For i = LBound(urls) To UBound(urls)
If IncludeThisDate(urls(i)) Then 'check if weekday is to be included
counter = counter + 1
output(counter) = Format$(Right$(urls(i), 8), "yyyymmdd") 'if to include then add to output array of urls of interest
End If
Next
ReDim Preserve output(1 To counter)
GetOnlyQualifyingUrlsDates = output
End Function
Public Function IncludeThisDate(ByVal url As String) As Boolean
'tue, wed, thurs are valid
IncludeThisDate = Not IsError(Application.Match(Weekday(Right$(url, 8), vbSunday), Array(3, 4, 5)))
End Function
Public Function SheetExists(ByVal sheetName As String) As Boolean '<== function by @Rory
SheetExists = Evaluate("ISREF('" & sheetName & "'!A1)")
End Function
Public Function GetAllUrls(ByVal startRow As Long, ByVal ws As Worksheet, ByVal columnName As String) As Variant
'transpose used based on premise no more than a couple of years of dates
'startRow is start row for urls, ws is sheet where urls found, columnName is string representation of column for urls e.g. "A"
With ws
GetAllUrls = Application.Transpose(ws.Range("A" & startRow & ":A" & .Cells(.rows.Count, columnName).End(xlUp).Row).Value)
End With
End Function
Public Function CreateWorksheet(ByVal sheetName As String) As Worksheet
Dim ws As Worksheet
If SheetExists(sheetName) Then
Set ws = ThisWorkbook.Worksheets(sheetName)
'do something.... clear it? Then add new data to it?
Else
Set ws = ThisWorkbook.Worksheets.Add
ws.Name = sheetName
End If
Set CreateWorksheet = ws
End Function
Public Function Epoch2Date(ByVal E As Currency, Optional msFrac) As Date '@ Schmidt http://www.vbforums.com/showthread.php?805245-EPOCH-to-Date-and-vice-versa
Const Estart As Double = #1/1/1970#
msFrac = 0
If E > 10000000000@ Then E = E * 0.001: msFrac = E - Int(E)
Epoch2Date = Estart + (E - msFrac) / 86400
End Function