html - How to get the text of an element in a WebBrowser document without its attributes (VB.NET)

Posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)

How can I get the text inside an element of a WebBrowser document in VB.NET, without picking up any of the element's attributes?

Example 1:

<h1>text here</h1>

Example 2:

<h1 name="anything">text here</h1>

How can I get "text here" in both cases?

Thanks. :)

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:21:34+0000

You could either 1) use the WebBrowser's built-in methods to iterate through all <h1> tags or get the very first one, or 2) use a Regex.

Using the built-in methods

Iterating through all tags is simple: just use the HtmlDocument.GetElementsByTagName() method.

Getting the first tag found (in document order):

Dim h1Text As String = WebBrowser1.Document.GetElementsByTagName("h1")(0).InnerText
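Note that the one-liner throws if no page is loaded or if the page contains no <h1> element. A minimal defensive sketch, assuming a Windows Forms project; the module and the helper name GetFirstTagText are hypothetical and not part of the WebBrowser API:

```vbnet
Imports System.Windows.Forms

Public Module HtmlTextHelpers
    ' Hypothetical helper: returns the text of the first element with the
    ' given tag name, or Nothing when the document or the tag is missing.
    Public Function GetFirstTagText(browser As WebBrowser, tagName As String) As String
        ' Document can be Nothing when no page is loaded.
        If browser.Document Is Nothing Then Return Nothing

        Dim elements As HtmlElementCollection = browser.Document.GetElementsByTagName(tagName)
        If elements.Count = 0 Then Return Nothing

        ' InnerText holds only the text content, never the attributes.
        Return elements(0).InnerText
    End Function
End Module
```

It could then be called as GetFirstTagText(WebBrowser1, "h1"), checking the result for Nothing before using it.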

Iterating through all tags:

Dim h1Strings As New List(Of String)

For Each h1Tag As HtmlElement In WebBrowser1.Document.GetElementsByTagName("h1")
    h1Strings.Add(h1Tag.InnerText)
Next
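The loop only finds elements once the page has finished loading, so one reasonable place to run it is the DocumentCompleted event. The following is a rough sketch under stated assumptions: the form class H1DemoForm, the field name Browser, and the inline sample markup are all made up for illustration, and the form is assumed to be the startup form of a Windows Forms project.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

' Hypothetical demo form; in a designer-based project the WebBrowser
' control would normally be declared in the designer file instead.
Public Class H1DemoForm
    Inherits Form

    Private WithEvents Browser As New WebBrowser()
    Private ReadOnly h1Strings As New List(Of String)

    Public Sub New()
        Browser.Dock = DockStyle.Fill
        Controls.Add(Browser)
        ' Assigning DocumentText loads the markup asynchronously.
        Browser.DocumentText = "<html><body><h1>text here</h1><h1 name=""anything"">text here</h1></body></html>"
    End Sub

    Private Sub Browser_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles Browser.DocumentCompleted
        ' The event can fire more than once, so start from an empty list.
        h1Strings.Clear()
        If Browser.Document Is Nothing Then Return

        For Each h1Tag As HtmlElement In Browser.Document.GetElementsByTagName("h1")
            h1Strings.Add(h1Tag.InnerText)
        Next

        ' Show the collected heading texts in the window title.
        Text = String.Join(", ", h1Strings)
    End Sub
End Class
```

Clearing the list at the top of the handler is a design choice that keeps the result consistent if the event is raised several times for one navigation.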

Using a Regex

Using a Regex is not that hard if you know what you are doing. To start with, put this Imports statement at the very top of your code file:

Imports System.Text.RegularExpressions

Now you just have to search the WebBrowser's DocumentText for the <h1> tag.

Dim h1Text As String = Regex.Match(WebBrowser1.DocumentText, "(?<=<h1[^<>/]*>)((?!</h1>).)*(?=</h1>)", RegexOptions.IgnoreCase).Value

The Regex pattern explained:

(?<=<h1[^<>/]*>)((?!</h1>).)*(?=</h1>)

(?<= ...): Positive lookbehind. The matched text must be preceded by whatever ... is.

<h1[^<>/]*>: Match the <h1> opening tag with any attributes.

[^<>/]*: Match all characters that are not <, > or /.

((?!</h1>).)*: Match characters one at a time, as long as the current position is not the start of a </h1> closing tag.

(?=</h1>): The match must be followed by a </h1> tag.
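To try the pattern without a WebBrowser control, it can be run against the two examples from the question in a small console program. This is only a sketch: the module name H1RegexDemo and the constant H1Pattern are hypothetical, and the sample strings stand in for WebBrowser1.DocumentText.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module H1RegexDemo
    ' Same pattern as in the answer above.
    Private Const H1Pattern As String = "(?<=<h1[^<>/]*>)((?!</h1>).)*(?=</h1>)"

    Sub Main()
        ' The two examples from the question.
        Dim samples As String() = {
            "<h1>text here</h1>",
            "<h1 name=""anything"">text here</h1>"
        }

        For Each html As String In samples
            Dim m As Match = Regex.Match(html, H1Pattern, RegexOptions.IgnoreCase)
            ' Prints "text here" for both samples.
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

Two limitations are worth keeping in mind. By default . does not match a newline, so a heading whose text spans several lines is only matched if RegexOptions.Singleline is added. And because [^<>/]* excludes /, an opening tag whose attributes contain a slash (a URL, for instance) is not matched. Regex.Matches can be used instead of Regex.Match to collect every heading rather than just the first.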
