Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


html - C# WebClient.DownloadData only returns garbage

I have the following code to download the HTML source of a web address, but when I run it, the output is only random characters and a lot of question marks.

The code:

ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(
    delegate
    {
        return true;
    });
using (WebClient webClient = new WebClient())
{
    webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 (.NET CLR 3.5.30729)";
    webClient.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    webClient.Headers["Accept-Language"] = "en-us,en;q=0.5";
    webClient.Headers["Accept-Encoding"] = "gzip,deflate";
    webClient.Headers["Accept-Charset"] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";

    var htmlData = webClient.DownloadData("https://de.WEBSITE.com/EXAMPLE");
    var htmlCode = Encoding.UTF8.GetString(htmlData);

    Console.WriteLine(htmlCode);
}
question from:https://stackoverflow.com/questions/65905193/c-sharp-webclient-downloaddata-only-returns-garbage


1 Reply


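You get garbage because the response body is gzip-compressed. The request sends Accept-Encoding: gzip,deflate, so the server is allowed to compress its reply, and DownloadData returns the raw bytes without decompressing them. Decompress the data first, then decode it as UTF-8.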

// Requires: using System.IO; using System.IO.Compression; using System.Text;
webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 (.NET CLR 3.5.30729)";
webClient.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
webClient.Headers["Accept-Language"] = "en-us,en;q=0.5";
// This header invites a compressed response; the code below assumes the server picks gzip.
webClient.Headers["Accept-Encoding"] = "gzip,deflate";
webClient.Headers["Accept-Charset"] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";

// DownloadData returns the raw (still compressed) response bytes.
var htmlData = webClient.DownloadData("https://de.WEBSITE.com/EXAMPLE");
using (var msi = new MemoryStream(htmlData))
using (var mso = new MemoryStream()) {
    // Decompress the gzip data from msi into mso.
    using (var gs = new GZipStream(msi, CompressionMode.Decompress)) {
        gs.CopyTo(mso);
    }

    // Decode the decompressed bytes as UTF-8 text.
    var htmlCode = Encoding.UTF8.GetString(mso.ToArray());
    Console.WriteLine(htmlCode);
}
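
The snippet above always runs the bytes through GZipStream, which throws if the server happens to reply uncompressed. A slightly more defensive variant might look like the sketch below. It is not part of the original answer: BodyDecoder, DecodeBody and Gzip are hypothetical names made up for illustration, the body is assumed to be UTF-8, and only gzip is handled (a deflate reply would still come out as garbage).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class, not part of WebClient or the original answer.
public static class BodyDecoder
{
    // Turns a downloaded body into text. The bytes are decompressed only when
    // the Content-Encoding value reported by the server mentions gzip;
    // otherwise they are decoded as-is. UTF-8 is an assumption in both cases.
    public static string DecodeBody(byte[] body, string contentEncoding)
    {
        bool isGzip = contentEncoding != null &&
                      contentEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0;
        if (!isGzip)
        {
            return Encoding.UTF8.GetString(body);
        }

        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    // Compresses text with gzip so the helper can be tried without a network call.
    public static byte[] Gzip(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final compressed block.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }
}
```

With the code from the question, this would be called right after the download as BodyDecoder.DecodeBody(htmlData, webClient.ResponseHeaders["Content-Encoding"]). Another option, if subclassing is acceptable, is to override WebClient.GetWebRequest and set HttpWebRequest.AutomaticDecompression to DecompressionMethods.GZip | DecompressionMethods.Deflate, so that the framework decompresses the response itself.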
