Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - HttpWebRequest unable to download data from nasdaq.com, but browsers can

I am trying to download a CSV file from this website. The file is small and takes only about 2 seconds to download in any browser:

http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download

I have tried both HttpWebRequest and WebClient, but nasdaq.com does not seem to let the data through with either of them. I also tried with Fiddler and nothing came back. I can only download this data with a browser.

I tried changing the headers, the user agent, the security protocol, the redirect settings, some cookie handling and many other settings, but I'm still stuck with this problem.

If anyone has an idea of how to make this work, please let me know, but please only reply if you have a working solution. Thank you.

The code below is C# for .NET Framework 4.5+.

It can download from other websites, but not from nasdaq.com.

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                string testUrl = "https://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download";
                HttpWebRequestTestDownload(testUrl);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        public static void HttpWebRequestTestDownload(string address)
        {
            // Example from
            // https://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.getresponse(v=vs.110).aspx

            // Process-wide settings: apply them before the request opens a connection.
            ServicePointManager.Expect100Continue = true;
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
            // Accepts every server certificate, i.e. disables TLS validation (debugging only).
            ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };

            HttpWebRequest wReq = (HttpWebRequest)WebRequest.Create(address);
            wReq.KeepAlive = false;

            // I also tried the settings below and it still did not work:

            //wReq.AllowAutoRedirect = true;
            //wReq.KeepAlive = false;
            //wReq.Timeout = 10 * 60 * 1000; // 10 minutes

            //// Accept
            //wReq.Accept = "application/csv,application/json,text/csv,text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
            //// Plain text/html request format, to refine if necessary: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
            //// User-agent strings from http://www.useragentstring.com/
            //wReq.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36";
            //wReq.ProtocolVersion = HttpVersion.Version11;
            //wReq.Headers.Add("Accept-Language", "en_eg");
            //wReq.ServicePoint.Expect100Continue = false;
            //// Work around an invalid SSL certificate
            //ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
            //// Work around "The underlying connection was closed: An unexpected error occurred on a send" (Framework 4.5 or higher)
            //ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3 | SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
            //wReq.Headers.Add("Accept-Encoding", "gzip, deflate");

            // Set some reasonable limits on resources used by this request.
            wReq.MaximumAutomaticRedirections = 4;
            // Measured in kilobytes (the default is 64), so this caps the response headers at 4 KB.
            wReq.MaximumResponseHeadersLength = 4;
            // Set credentials to use for this request.
            wReq.Credentials = CredentialCache.DefaultCredentials;

            using (HttpWebResponse response = (HttpWebResponse)wReq.GetResponse())
            {
                Console.WriteLine("Content length is {0}", response.ContentLength);
                Console.WriteLine("Content type is {0}", response.ContentType);

                // Get the stream associated with the response and read it as UTF-8 text.
                using (Stream receiveStream = response.GetResponseStream())
                using (StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8))
                {
                    Console.WriteLine("Response stream received.");
                    Console.WriteLine(readStream.ReadToEnd());
                }
            }
        }

        public static void WebClientTestDownload(string address)
        {
            using (WebClient client = new WebClient())
            {
                string reply = client.DownloadString(address);
                Console.WriteLine(reply);
            }
        }
    }
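For comparison with the WebClient fix in the reply below, here is a minimal sketch of how the same browser-like headers could be sent with HttpWebRequest. On .NET Framework, User-Agent and Accept are restricted headers for HttpWebRequest, so they have to be set through the typed properties rather than Headers.Add; the Host header is filled in from the URL automatically. The class name BrowserLikeRequest and the 30-second timeout are illustrative assumptions, and reusing the reply's header values assumes those are what the server actually checks. AutomaticDecompression makes the request advertise gzip/deflate and decompress the body itself, and the shorter Timeout makes an unanswered request fail with a WebException instead of waiting out the 100-second default.

```csharp
using System.Net;

// Hypothetical helper: builds an HttpWebRequest that imitates the browser request
// from the reply below. The header values are copied from that reply; whether the
// server needs every one of them is an assumption.
public static class BrowserLikeRequest
{
    public static HttpWebRequest Create(string address)
    {
        var request = (HttpWebRequest)WebRequest.Create(address);

        // Restricted headers: set them via properties (on .NET Framework,
        // Headers.Add would throw). Host is taken from the URL automatically.
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
        request.UserAgent = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Mobile Safari/537.36";

        // Advertises gzip/deflate and decompresses the response body automatically.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        // Illustrative 30 s limit (default is 100 s): an unanswered request then
        // fails with a WebException whose Status is WebExceptionStatus.Timeout.
        request.Timeout = 30 * 1000;
        return request;
    }
}
```

In the question's code, HttpWebRequest wReq = BrowserLikeRequest.Create(address); would replace the WebRequest.Create call, and the existing response-reading code stays the same.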

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Reply

I was able to resolve the problem. A tip for everyone: use Fiddler to capture the browser's network traffic and send the same headers from your code. It worked once I included all of the headers this website requires:

    // Same CSV URL as in the question.
    string url = "https://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download";

    using (WebClient web = new WebClient())
    {
        // Headers copied from the browser request captured in Fiddler.
        web.Headers[HttpRequestHeader.Host] = "www.nasdaq.com";
        web.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
        web.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
        web.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Mobile Safari/537.36";
        string reply = web.DownloadString(url);
    }
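A caveat about the code above: it sets Accept-Encoding: gzip, deflate by hand, but WebClient does not decompress responses on its own, so if the server actually compresses the CSV, DownloadString returns the compressed bytes decoded as text. A minimal sketch of one way to handle this, assuming you switch to DownloadData and check the Content-Encoding response header; ResponseDecoder is a hypothetical helper name, the body is assumed to be UTF-8, and only gzip is handled (a deflate-encoded response would need its own branch).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: converts a raw response body to text, gunzipping it first
// when the server reported "Content-Encoding: gzip". Deflate is NOT handled.
public static class ResponseDecoder
{
    public static string Decode(byte[] body, string contentEncoding)
    {
        bool isGzip = contentEncoding != null &&
                      contentEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0;

        Stream stream = new MemoryStream(body);
        if (isGzip)
        {
            // Wrap the raw bytes so that reading yields the decompressed payload.
            stream = new GZipStream(stream, CompressionMode.Decompress);
        }

        // Assumes UTF-8 text; disposing the reader also disposes the wrapped streams.
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside the reply's using block, that would mean replacing the DownloadString line with byte[] body = web.DownloadData(url); followed by string reply = ResponseDecoder.Decode(body, web.ResponseHeaders[HttpResponseHeader.ContentEncoding]);. Dropping the AcceptEncoding header is the simpler alternative, if the server still answers without it.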
