Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - WebClient hangs until timeout

I am trying to download a web page using WebClient, but the call hangs until the WebClient timeout is reached and then fails with an exception.

The following code does not work:

WebClient client = new WebClient();
string url = "https://www.nasdaq.com/de/symbol/aapl/dividend-history";
string page = client.DownloadString(url);

With a different URL, the transfer works fine. For example,

WebClient client = new WebClient();
string url = "https://www.ariva.de/apple-aktie";
string page = client.DownloadString(url);

completes very quickly and leaves the whole HTML in the page variable.

Using HttpClient or WebRequest/WebResponse gives the same result for the first URL: the call blocks until a timeout exception is thrown.

Both URLs load fine in a browser, in roughly 2-5 seconds. Any idea what the problem is, and what solution is available?

I noticed that when using a WebBrowser control on a Windows Forms dialog, the first URL loads with 20+ JavaScript errors that each need to be confirmed with a click. The same errors can be observed in a browser when the developer tools are open while accessing the first URL.

However, WebClient does NOT act on the response it gets. It does not run the JavaScript and does not load referenced images, CSS or other scripts, so this should not be a problem.

Thanks!

Ralf


1 Reply


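The first site, "https://www.nasdaq.com/de/symbol/aapl/dividend-history", requires:

  • ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
  • Server Certificate validation (ServicePointManager.ServerCertificateValidationCallback)
  • A specific User-agent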

The User-agent is important here. If a recent User-agent is specified in HttpWebRequest.UserAgent, the website may activate the HTTP/2 protocol and HSTS (HTTP Strict Transport Security). These are supported only by recent browsers (for reference, Firefox 56 or newer).

Using a less recent browser as the User-agent is necessary; otherwise the website will expect (and wait for) a dynamic response. With an older User-agent, the website falls back to the HTTP/1.1 protocol and never activates HSTS.
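If WebClient has to stay in place, one way to send an older User-agent is a derived class that overrides GetWebRequest. The following is only a minimal sketch and has not been verified against the site: the class name LegacyAgentWebClient and the constant LegacyUserAgent are hypothetical, and the IE11 string is an assumption about what counts as "less recent".

```csharp
using System;
using System.Net;

// Hypothetical helper class (not part of the original answer).
public class LegacyAgentWebClient : WebClient
{
    // Assumption: an IE11 User-agent string is "old" enough for the site.
    public const string LegacyUserAgent =
        "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";

    // One cookie container shared by all requests made through this client.
    private readonly CookieContainer cookieJar = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose these properties.
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.UserAgent = LegacyUserAgent;
            httpRequest.CookieContainer = cookieJar;
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
            httpRequest.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
        }
        return request;
    }
}

// Usage (sketch):
//   ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
//   using (var client = new LegacyAgentWebClient())
//   {
//       string page = client.DownloadString(
//           "https://www.nasdaq.com/de/symbol/aapl/dividend-history");
//   }
```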

The second site, "https://www.ariva.de/apple-aktie", requires:

  • ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
  • No Server Certificate validation is required
  • No specific User-agent is required

I suggest setting up an HttpWebRequest (or an equivalent HttpClient setup) this way:
(WebClient could work, but it would probably require a derived custom class.)

private async void button1_Click(object sender, EventArgs e)
{
    button1.Enabled = false;
    Uri uri = new Uri("https://www.nasdaq.com/de/symbol/aapl/dividend-history");
    string destinationFile = "[Some Local File]";
    await HTTPDownload(uri, destinationFile);
    button1.Enabled = true;
}


CookieContainer httpCookieJar = new CookieContainer();

//The 32bit IE11 header is the User-agent used here
public async Task HTTPDownload(Uri resourceURI, string filePath)
{
    // On Windows 7, the protocol may have to be set explicitly
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    // Only blindly accept the Server certificates if you know and trust the source
    ServicePointManager.ServerCertificateValidationCallback += (s, cert, ch, sec) => { return true; };
    ServicePointManager.DefaultConnectionLimit = 50;

    var httpRequest = WebRequest.CreateHttp(resourceURI);

    try
    {
        httpRequest.CookieContainer = httpCookieJar;
        httpRequest.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
        httpRequest.AllowAutoRedirect = true;
        httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        httpRequest.ServicePoint.Expect100Continue = false;
        httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";
        httpRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        httpRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate;q=0.8");
        httpRequest.Headers.Add(HttpRequestHeader.CacheControl, "no-cache");

        using (var httpResponse = (HttpWebResponse)await httpRequest.GetResponseAsync())
        using (var responseStream = httpResponse.GetResponseStream())
        {
            if (httpResponse.StatusCode == HttpStatusCode.OK) {
                try {
                    int buffersize = 132072;
                    using (var fileStream = File.Create(filePath, buffersize, FileOptions.Asynchronous)) {
                        int read;
                        byte[] buffer = new byte[buffersize];
                        while ((read = await responseStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                        {
                            await fileStream.WriteAsync(buffer, 0, read);
                        }
                    }
                }
                catch (DirectoryNotFoundException) { /* Log or throw */}
                catch (PathTooLongException) { /* Log or throw */}
                catch (IOException) { /* Log or throw */}
            }
        }
    }
    catch (WebException) { /* Log and message */} 
    catch (Exception) { /* Log and message */}
}

The payload returned by the first website (nasdaq.com) is 101,562 bytes long.
The payload returned by the second website (www.ariva.de) is 56,919 bytes long.
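For the "corresponding HttpClient setup" mentioned above, a rough equivalent could look like the sketch below. It assumes .NET Framework 4.5 or later; the class and method names (HttpClientDownload, CreateClient, DownloadPageAsync) are hypothetical, and the sketch has not been tested against either site. It deliberately omits the certificate validation callback, which should only be added for sources you know and trust.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class (not part of the original answer).
public static class HttpClientDownload
{
    // Builds an HttpClient that mirrors the HttpWebRequest settings used above.
    public static HttpClient CreateClient()
    {
        // On Windows 7, the protocol may have to be set explicitly
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(),
            UseCookies = true,
            AllowAutoRedirect = true,
            AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(15) };

        // Same (older) IE11 User-agent and Accept header as in the request above
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent",
            "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko");
        client.DefaultRequestHeaders.TryAddWithoutValidation("Accept",
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        client.DefaultRequestHeaders.ExpectContinue = false;
        return client;
    }

    // Downloads the page body as a string; throws on a non-success status code.
    public static async Task<string> DownloadPageAsync(HttpClient client, Uri uri)
    {
        using (HttpResponseMessage response = await client.GetAsync(uri))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The client is meant to be created once and reused for all requests, so that cookies and connections are shared between calls.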

