c# - How do you parse an HTML string for image tags to get at the SRC information?


深蓝 · Answer 1 · 2021-10-23T17:57:53+0000

If your input string is valid XHTML, you can treat it as XML, load it into an XmlDocument, and do XPath magic :) But that's not always the case.
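For the XHTML case, here is a minimal sketch of what the XPath route could look like. The class and method names (XhtmlImages, GetImageSources) are made up for illustration. It assumes the input is a well-formed fragment with no DOCTYPE and no named entities such as &nbsp;, since those need extra reader setup.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XhtmlImages
{
    // Hypothetical helper: returns the src value of every <img> element
    // found in a well-formed XHTML string.
    public static List<string> GetImageSources(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        var sources = new List<string>();
        // local-name() matches img whether or not the XHTML namespace is declared.
        foreach (XmlNode attr in doc.SelectNodes("//*[local-name()='img']/@src"))
        {
            sources.Add(attr.Value);
        }
        return sources;
    }
}
```

Calling GetImageSources on `<div><img src="a.png" /><p><img src="b.jpg" alt="" /></p></div>` should give back a.png and b.jpg in document order. An unclosed `<img>` tag, which is common in ordinary HTML, makes LoadXml throw, and that is why this only works for real XHTML.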

Otherwise you can try this function, which returns all image links found in htmlSource:

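// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
public List<Uri> FetchLinksFromSource(string htmlSource)
{
    List<Uri> links = new List<Uri>();
    // Group 1 captures the src value; the quotes around it are optional.
    string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";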
    MatchCollection matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    foreach (Match m in matchesImgSrc)
    {
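        string href = m.Groups[1].Value;
        // new Uri(href) throws on relative paths such as "images/a.png",
        // so accept both relative and absolute values and skip malformed ones.
        if (Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out Uri link))
        {
            links.Add(link);
        }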
    }
    return links;
}

And you can use it like this:

// Requires: using System.IO; using System.Net;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.example.com");
request.Credentials = System.Net.CredentialCache.DefaultCredentials;
// Dispose the response so the underlying connection is released.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.OK)
    {
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
            List<Uri> links = FetchLinksFromSource(sr.ReadToEnd());
            foreach (Uri link in links)
            {
                Console.WriteLine(link);
            }
        }
    }
}
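
Real pages often use relative src values (for example images/logo.png), which are not much use until they are combined with the address of the page they came from. Below is a rough sketch of that step. ImageLinkResolver and Resolve are hypothetical names, and it assumes you already have the raw src strings and know the page URI.

```csharp
using System;
using System.Collections.Generic;

public static class ImageLinkResolver
{
    // Hypothetical helper: turns raw src values into absolute URIs,
    // using the page address as the base for relative ones.
    public static List<Uri> Resolve(Uri pageUri, IEnumerable<string> srcValues)
    {
        var resolved = new List<Uri>();
        foreach (string src in srcValues)
        {
            // TryCreate returns false instead of throwing on malformed input.
            if (Uri.TryCreate(pageUri, src, out Uri absolute))
            {
                resolved.Add(absolute);
            }
        }
        return resolved;
    }
}
```

With a page URI of http://www.example.com/gallery/index.html, the value images/logo.png resolves to http://www.example.com/gallery/images/logo.png, while a value that is already absolute is kept as it is.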
