Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
261 views
in Technique[技术] by (71.8m points)

html - Scrape table from website

I have the following code which navigates to a website, enters in two names (used here for example, the real names will pull a list of 10 names from a spreadsheet), then searches for their records. I'm trying to pull the resulting table that is generated into a spreadsheet. I've tried it a few ways but can't seem to get it to work. Looking for code to go under the comment "Scrape Table Here". I know this involves accessing the site's HTML which I can also do but I'm not familiar enough with HTML to figure this one out on my own. Bonus question: I'd like to also add each person's ID# to the spreadsheet. In the HTML, it's listed after "MP_Details?". For example, for "Robert Jones" it's "36481" that I'm looking to grab. Basically everything highlighted in red in the screenshot, I'd like to pull from the table and spit out on a spreadsheet:

Sub Input_And_Return()

'Create new instance of Internet Explorer
Dim ieApp As Object: Set ieApp = New InternetExplorer
Dim ieDoc As Object
Dim html As HTMLDocument

ieApp.Visible = True
ieApp.navigate "https://hdmaster.net/MP/MP_Public"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop

Set ieDoc = ieApp.document
Set html = ieApp.document

'Enter names into search box and click search
With ieDoc.forms(0)
    .SearchFor.Value = "Anderson, Kelly" & Chr(10) & "Jones, Robert"
    .submit
End With

'Scrape Table Here

'Close down IE and reset status bar
Set ieApp = Nothing
Application.StatusBar = ""

End Sub

HTML Screenshot

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You could copy the table outerHTML to the clipboard and paste that to Excel. It is nice, easy and quick.

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    Dim nameList As String
    nameList = "Anderson, Kelly" & Chr$(10) & "Jones, Robert"
    With IE
        .Visible = True
        .navigate "https://hdmaster.net/MP/MP_Public"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("[name=SearchFor]").Value = nameList
            .querySelector("#search").Click
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim clipboard As Object
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        clipboard.SetText .document.querySelector(".newTable").outerHTML
        clipboard.PutInClipboard
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
        .Quit
    End With
End Sub

References (VBE > Tools > References):

  1. Microsoft HTML Object Library
  2. Microsoft Internet Controls

Your code version of the above:

Public Sub Input_And_Return()
    Dim ieApp As Object: Set ieApp = New InternetExplorer
    Dim ieDoc As Object

    With ieApp
        .Visible = True
        .navigate "https://hdmaster.net/MP/MP_Public"
        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document.forms(0)
            .SearchFor.Value = "Anderson, Kelly" & Chr$(10) & "Jones, Robert"
            .submit
            Dim clipboard As Object
            Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
            clipboard.SetText .getElementsByClassName("newTable")(0).outerHTML
            clipboard.PutInClipboard
        End With
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
        .Quit
    End With
End Sub

Or by looping rows and columns of the table:

Public Sub Input_And_Return()
    Dim ieApp As Object, ieDoc As Object
    Set ieApp = New InternetExplorer
    With ieApp
        .Visible = True
        .navigate "https://hdmaster.net/MP/MP_Public"
        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document.forms(0)
            .SearchFor.Value = "Anderson, Kelly" & Chr$(10) & "Jones, Robert"
            .submit
            Dim r As Long, c As Long, tr As Object, td As Object
            With .getElementsByClassName("newTable")(0)
                For Each tr In .getElementsByTagName("tr")
                    r = r + 1: c = 1
                    For Each td In tr.getElementsByTagName("td")
                        Cells(r, c).Value = td.innerText
                        c = c + 1
                    Next td
                Next tr
            End With
        End With
        .Quit
    End With
End Sub

Output:

output


EDIT:

Some ugly code to get the short ids

Option Explicit
Public Sub Input_And_Return()
    Dim ieApp As Object, ieDoc As Object
    Set ieApp = New InternetExplorer
    With ieApp
        .Visible = True
        .navigate "https://hdmaster.net/MP/MP_Public"
        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document.forms(0)
            .SearchFor.Value = "Anderson, Kelly" & Chr$(10) & "Jones, Robert"
            .submit
            Dim r As Long, c As Long, tr As Object, td As Object, hTable As Object, aNodeList As Object
            Set hTable = .getElementsByClassName("newTable")(0)
            Set aNodeList = .getElementsByClassName("newTable")(0).querySelectorAll("[align=center][onclick*='javascript:rowClick']")

            Dim idDict As Object, i As Long, tempVal As Long
            Set idDict = CreateObject("Scripting.Dictionary")

            For i = 0 To aNodeList.Length - 1
                tempVal = Split(Split(aNodeList.Item(i).onclick, "id=")(1), Chr$(39))(0)
                If Not idDict.exists(tempVal) Then idDict.Add tempVal, vbNullString
            Next i

            With hTable
                For Each tr In .getElementsByTagName("tr")
                    r = r + 1: c = 1
                    For Each td In tr.getElementsByTagName("td")
                        Cells(r, c).Value = td.innerText
                        c = c + 1
                    Next td
                Next tr
               If idDict.Count = r - 1 Then Cells(2, c).Resize(idDict.Count, 1) = Application.WorksheetFunction.Transpose(idDict.keys)
            End With
        End With
        .Quit
    End With
End Sub

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...