You can either 1) use the WebBrowser's built-in methods to iterate through all <h1>
tags or get the very first one, or 2) use a Regex.
Using the built-in methods
Iterating through all tags is simple: you just have to use the HtmlDocument.GetElementsByTagName()
method.
Getting the first tag found (in document order):
Dim h1Text As String = WebBrowser1.Document.GetElementsByTagName("h1")(0).InnerText
Iterating through all tags:
Dim h1Strings As New List(Of String)
For Each h1Tag As HtmlElement In WebBrowser1.Document.GetElementsByTagName("h1")
    h1Strings.Add(h1Tag.InnerText)
Next
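Both snippets above assume that a page is already loaded and contains at least one <h1>. A more defensive variant could look like the sketch below. The module and function names (H1Helpers, GetH1Texts) are made up for illustration, and it assumes you call it after the page has finished loading, for example from the DocumentCompleted event handler.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module H1Helpers
    ' Hypothetical helper, not part of the WebBrowser API:
    ' returns the text of every <h1> in the loaded page, or an empty list.
    Public Function GetH1Texts(browser As WebBrowser) As List(Of String)
        Dim result As New List(Of String)

        ' Document is Nothing until a page has been loaded.
        If browser.Document Is Nothing Then Return result

        For Each h1Tag As HtmlElement In browser.Document.GetElementsByTagName("h1")
            ' Skip headings without text (InnerText may be Nothing here).
            If Not String.IsNullOrEmpty(h1Tag.InnerText) Then
                result.Add(h1Tag.InnerText.Trim())
            End If
        Next

        Return result
    End Function
End Module
```

Calling GetH1Texts(WebBrowser1) and checking whether the returned list has any items avoids the exception that indexing with (0) would throw on a page without headings.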
Using a Regex
Using a Regex is not that hard if you know what you are doing. To start with, put this Imports
statement at the very top of your code file:
Imports System.Text.RegularExpressions
Now you just have to search the WebBrowser's DocumentText for the <h1> tag.
Dim h1Text As String = Regex.Match(WebBrowser1.DocumentText, "(?<=<h1[^<>/]*>)((?!</h1>).)*(?=</h1>)", RegexOptions.IgnoreCase).Value
The Regex pattern explained:
(?<=<h1[^<>/]*>)((?!</h1>).)*(?=</h1>)
(?<= ...): The matched text must be preceded by whatever ... is.
<h1[^<>/]*>: Match the <h1> opening tag, with any attributes.
[^<>/]*: Match any number of characters that are not <, > or /.
((?!</h1>).)*: Match characters one at a time, as long as the current position is not the start of a </h1> closing tag.
(?=</h1>): The match must be followed by a </h1> closing tag.
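To see the pattern at work without a WebBrowser control, here is a small console sketch. The markup in the html variable is invented sample input standing in for WebBrowser1.DocumentText, and the module name H1RegexDemo is arbitrary.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module H1RegexDemo
    Sub Main()
        ' Made-up sample markup standing in for WebBrowser1.DocumentText.
        Dim html As String = "<body><h1 class=""title"">Hello World</h1><p>text</p><H1>Second</H1></body>"
        Dim pattern As String = "(?<=<h1[^<>/]*>)((?!</h1>).)*(?=</h1>)"

        ' First heading only; check Success before using the value.
        Dim first As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If first.Success Then
            Console.WriteLine(first.Value) ' prints: Hello World
        End If

        ' Every heading, in document order.
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

Note that . does not match a line break by default, so a heading whose text spans several lines would not be found by this pattern unless you also pass RegexOptions.Singleline. The match also includes any nested markup inside the heading, since it returns the raw text between the tags.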