
http headers - HEAD request receives "403 forbidden" while GET "200 ok"?

After the site had been missing from the search results of every major search engine for several months, I finally found a possible reason.

I used WebBug to investigate the server headers. Note the difference depending on whether the request is HEAD or GET (a short script to reproduce the comparison follows the transcripts below).

HEAD Sent data:

HEAD / HTTP/1.1
Host: www.attu.it
Connection: close
Accept: */*
User-Agent: WebBug/5.0

HEAD Received data:

HTTP/1.1 403 Forbidden
Date: Tue, 10 Aug 2010 23:01:00 GMT
Server: Apache/2.2
Connection: close
Content-Type: text/html; charset=iso-8859-1

GET Sent data:

GET / HTTP/1.1
Host: www.attu.it
Connection: close
Accept: */*
User-Agent: WebBug/5.0

GET Received data:

HTTP/1.1 200 OK
Date: Tue, 10 Aug 2010 23:06:15 GMT
Server: Apache/2.2
Last-Modified: Fri, 08 Jan 2010 08:58:01 GMT
ETag: "671f91b-2d2-47ca362815840"
Accept-Ranges: bytes
Content-Length: 722
Connection: close
Content-Type: text/html

// HTML code here
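
For anyone who wants to reproduce the comparison without WebBug, here is a minimal sketch using nothing but Python's standard library. Only the hostname and the User-Agent/Accept headers are taken from the transcripts above; nothing else about the server is assumed.

import http.client

HOST = "www.attu.it"

for method in ("HEAD", "GET"):
    conn = http.client.HTTPConnection(HOST, timeout=10)
    # Send the same User-Agent and Accept headers that WebBug sent,
    # so the comparison is like-for-like
    conn.request(method, "/", headers={"User-Agent": "WebBug/5.0", "Accept": "*/*"})
    resp = conn.getresponse()
    print(f"{method:4s} -> {resp.status} {resp.reason}")
    resp.read()   # drain the body (empty for HEAD) before closing
    conn.close()

If the behaviour from the transcripts still holds, this prints a 403 for HEAD and a 200 for GET.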

Now, browsers send a GET request by default (at least that is what Firebug says). Is it possible that crawlers send a HEAD request instead? If so, why does only this server respond with a 403, while the servers of other sites I maintain do not?

In case it's important, the only line present in the .htaccess is the following (unless my client changed it, since they don't want to give me access to their server):

AddType text/x-component .htc

UPDATE
Thanks @Ryk. Firebug and Fiddler both send GET requests, which get 200 (or 300) responses, as expected. So I guess it's either a bad server setting (which would be strange, as the hosting is with a major company with millions of clients) or something they put in the .htaccess. They will have to let me look into their account.

The second part of my question was whether that could be the reason the website does not appear in any search engine (site:www.attu.it gives no results). Any thoughts?

UPDATE 2
After some fiddling around, it turned out that the phpMyAdmin robots-blocking .htaccess was sitting in the root directory, which caused any request coming from a robot to be answered with a 403 Forbidden.
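
For reference, a rough way to confirm that kind of robot blocking is to send the same GET request with a browser-like and a crawler-like User-Agent and compare the status codes. This is only a sketch: the User-Agent strings below are illustrative, and the actual rules in that .htaccess are not shown here.

import http.client

HOST = "www.attu.it"
AGENTS = {
    "browser-like": "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0",
    "crawler-like": "Googlebot/2.1 (+http://www.google.com/bot.html)",
}

for label, user_agent in AGENTS.items():
    conn = http.client.HTTPConnection(HOST, timeout=10)
    conn.request("GET", "/", headers={"User-Agent": user_agent, "Accept": "*/*"})
    resp = conn.getresponse()
    print(f"{label:12s} -> {resp.status} {resp.reason}")
    resp.read()
    conn.close()

If the crawler-like request comes back 403 while the browser-like one comes back 200, the block is keyed off the User-Agent, which would also explain why the site dropped out of the search indexes.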



1 Reply


I would suggest installing Fiddler and looking carefully at the request. I have sometimes seen a 403 returned because an icon on the page lives in a folder that requires authentication.

Fiddler will give you a good idea of what is going on; you can also try Firefox with the Firebug add-on installed and inspect the page for errors.

Looking at the site, I get a bunch of 404s for favicon.ico, but apart from that a simple GET request returns 200 OK, while a HEAD returns 403. Looking into it now.

UPDATE: I think it might be a configuration issue on the Apache server, but I'm not 100% sure. http://hc.apache.org/httpclient-3.x/methods/head.html

UPDATE 2: Reading this http://www.pubbs.net/200811/httpd/17210-usershttpd-how-to-reject-head-request.html makes me believe that your Apache server could be set to reject HEAD requests; in that case it will return a 403.
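
For illustration only, a rule that rejects HEAD requests might look something like the following .htaccess sketch using mod_rewrite; this is purely hypothetical, and nothing in the question shows what the host actually configured.

RewriteEngine On
# Answer any HEAD request with 403 Forbidden; GET is unaffected
RewriteCond %{REQUEST_METHOD} ^HEAD$
RewriteRule .* - [F]

Whether the block lives in a rule like this or in the robots-blocking .htaccess mentioned in the question's second update, the visible symptom is the same: a 403 for some requests and a 200 for others.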

