I need to preserve the same session id when navigating over a site's pages using C#.Net (like a crawler). I found a couple of methods, a http sniffer was very handy, to compare what my IE browser was sending (HTTP request) and receiving from the web server (HTTP response), as the important information is in the headers (that are not displayed by the browser).
Please don't make confusion between session id which is public from server to browser, and server's session variables which are private to server code (like php).
WebHeaderCollection headerCollection = new WebHeaderCollection();
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
/* save headers */
for (int i = 0; i < response.Headers.Count; i++)
{
headerCollection.Add(response.Headers.AllKeys[i], response.Headers.Get(i));
}
/* save cookies */
cookieContainer = new CookieContainer();
foreach (Cookie cookie in response.Cookies)
{
cookieContainer.Add(cookie);
}
}
to make the other GET or POST requests:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
...
/* restore PHPSESSID */
for (int i = 0; i < headerCollection.Count; i++)
{
string key = headerCollection.GetKey(i);
if (key == "Set-Cookie")
{
key = "Cookie";
}
else
{
continue;
}
string value = headerCollection.Get(i);
request.Headers.Add(key, value);
}
/* restore cookies */
request.CookieContainer = cookieContainer;
/* complete request */
Stream writeStream = request.GetRequestStream()
My request is to contribute with better code, or additional ideas to make a better crawler session preserving.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…