Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
994 views
in Technique[技术] by (71.8m points)

delphi - Indy - IdHttp how to handle page redirects?

Using: Delphi 2010, latest version of Indy

I am trying to scrape the data off Googles Adsense web page, with an aim to get the reports. However I have been unsuccessful so far. It stops after the first request and does not proceed.

Using Fiddler to debug the traffic/requests to Google Adsense website, and a web browser to load the Adsense page, I can see that the request (from the webbrowser) generates a number of redirects until the page is loaded.

However, my Delphi application is only generating a couple of requests before it stops.

Here are the steps I have followed:

  1. Drop a IdHTTP and a IdSSLIOHandlerSocketOpenSSL1 component on the form.
  2. Set the IdHTTP component properties AllowCookies and HandleRedirects to True, and IOHandler property to the IdSSLIOHandlerSocketOpenSSL1.
  3. Set the IdSSLIOHandlerSocketOpenSSL1 component property Method := 'sslvSSLv23'

Finally I have this code:

procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
 Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    IdHTTP1.Get(FURL, Output);
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;

However, it does not get to the login page as expected. I would expect it to behave as if it was a webbrowser and proceed through the redirects until it finds the final page.

This is the output of the headers from Fiddler:

HTTP/1.1 302 Found
Location: https://encrypted.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly
Date: Tue, 28 Dec 2010 21:29:43 GMT
Server: gws
Content-Length: 226
X-XSS-Protection: 1; mode=block

Firstly, is there anything wrong with this output?

Is there something more that I should do to get the IdHTTP component to keep pursuing the redirects until the final page?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

IdHTTP component property values prior to making the call:

    Name := 'IdHTTP1';
    IOHandler := IdSSLIOHandlerSocketOpenSSL1;
    AllowCookies := True;
    HandleRedirects := True;
    RedirectMaximum := 35;
    Request.UserAgent := 
      'Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.' +
      '0b8';
    HTTPOptions := [hoForceEncodeParams];
    OnRedirect := IdHTTP1Redirect;
    CookieManager := IdCookieManager1;

Redirect event handler:

procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string; var
    NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
   Handled := True;
end;

Making the call:

  FURL := 'https://www.google.com';

  GetUrlToFile( (FURL + '/adsense/'), 'a.html');




  procedure TfmMain.GetUrlToFile(AURL, AFile : String);
  var
   Output : TMemoryStream;
  begin
    Output := TMemoryStream.Create;
    try
      try
       IdHTTP1.Get(AURL, Output);
       IdHTTP1.Disconnect;
      except

      end;
      Output.SaveToFile(AFile);
    finally
      Output.Free;
    end;
  end;





Here's the (request and response headers) output from Fiddler:

alt text


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.8k users

...