Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
272 views
in Technique[技术] by (71.8m points)

excel - Scraping using VBA

i am trying to extract one figure from a gov website, I have done a lot of googling and I am kinda lost for ideas, my code below returns a figure but it isnt the figure I want to get and I am not entirely sure why.

I want to subtract the figure from the 'Cases by Area (Whole Pandemic)' table 'Upper tier LA' section and 'Southend on Sea' Case number.

https://coronavirus.data.gov.uk/details/cases

I stole this code from online somewhere and tried to replicate with my class number I found within F12 section on the site.

Sub ExtractLastValue()

Set objIE = CreateObject("InternetExplorer.Application")

objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600

objIE.Visible = True

objIE.Navigate ("https://coronavirus.data.gov.uk/details/cases")

Do
DoEvents
Loop Until objIE.readystate = 4

MsgBox objIE.document.getElementsByClassName("sc-bYEvPH khGBIg govuk-table__cell govuk-table__cell--numeric ")(0).innerText

Set objIE = Nothing

End Sub



与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Data comes from the official API and returns a json response dynamically on that page when you click the Upper Tier panel.


Have a look and play with the API guidance here:

https://coronavirus.data.gov.uk/details/developers-guide


You can make a direct xhr request by following the guidance in the API documentation and then using a json parser to handle the response. For your request it would be something like the following:

https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=Southend-on-Sea&areaType=utla&latestBy=cumCasesByPublishDate&structure=
{"date":"date", "areaName":"areaName","cumCasesByPublishDate":"cumCasesByPublishDate",
"cumCasesByPublishDateRate":"cumCasesByPublishDateRate"}

XHR:

A worked example using jsonconverter.bas as the json parser

Option Explicit

Public Sub GetCovidNumbers()
    Dim http As Object, json As Object

    Set http = CreateObject("MSXML2.XMLHTTP")

    With http
        .Open "GET", "https://coronavirus.data.gov.uk/api/v1/data?filters=areaName=Southend-on-Sea&areaType=utla&latestBy=cumCasesByPublishDate&structure={""date"":""date"",""areaName"":""areaName"",""cumCasesByPublishDate"":""cumCasesByPublishDate"",""cumCasesByPublishDateRate"":""cumCasesByPublishDateRate""}", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        Set json = JsonConverter.ParseJson(.responseText)("data")(1)
    End With
    With ActiveSheet
        Dim arr()
        arr = json.Keys
        .Cells(1, 1).Resize(1, UBound(arr) + 1) = arr
        arr = json.Items
        .Cells(2, 1).Resize(1, UBound(arr) + 1) = arr
    End With
End Sub

Json library (Used in above solution):

I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.


Internet Explorer:

You could do a slower, more complicated, internet explorer solution where you need to select the utla option when present, then select from the table the desired value:

Option Explicit

Public Sub GetCovidNumbers()
    'Tools references Microsoft Internet Controls and Microsoft HTML Object Library
    
    Dim ie As SHDocVw.InternetExplorer, t As Date, ele As Object
    Const MAX_WAIT_SEC As Long = 10
    
    Set ie = New SHDocVw.InternetExplorer
    
    With ie
        .Visible = True
        .Navigate2 "https://coronavirus.data.gov.uk/details/cases"
        While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
        t = Timer 'timed loop for element to be present to click on (to get utla)
        Do
            On Error Resume Next
            Set ele = .Document.querySelector("#card-cases_by_area_whole_pandemic [aria-label='Upper tier LA']")
            On Error GoTo 0
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While ele Is Nothing

        If ele Is Nothing Then Exit Sub
        
        ele.Click
        
        While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
        Dim table As MSHTML.HTMLTable, datetime As String, result()
        
        Set table = .Document.querySelector("table[download='cumCasesByPublishDate,cumCasesByPublishDateRate']")
        datetime = .Document.querySelector("time").getAttribute("datetime")
        result = GetDataForUtla("Southend-on-Sea", datetime, table)
        
        With ActiveSheet
            .Cells(1, 1).Resize(1, 4) = Array("Datetime", "Area", "Cases", "Rate per 100,000 population")
            .Cells(2, 1).Resize(1, UBound(result) + 1) = result
        End With
        .Quit
    End With
    
End Sub

Public Function GetDataForUtla(ByVal utla As String, ByVal datetime As String, ByVal table As MSHTML.HTMLTable) As Variant
    Dim row As MSHTML.HTMLTableRow, i As Long

    For Each row In table.Rows
        
        If InStr(row.outerHTML, utla) > 0 Then
            Dim arr(4)
            arr(0) = datetime
            For i = 0 To 2
                arr(i + 1) = row.Children(i).innerText
            Next
            GetDataForUtla = arr
            Exit Function
        End If
    Next
    GetDataForUtla = Array("Not found")
End Function

References:

  1. https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
  2. https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...