Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

http - Is there any advantage of using X-Robot-Tag instead of robots.txt?

It looks like there are two mainstream solutions for instructing crawlers what to index and what not to index: adding an X-Robot-Tag HTTP header, or indicating a robots.txt.

Is there any advantage to using the former?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

With robots.txt you cannot disallow indexing of your documents.

They have different purposes:

  • robots.txt can disallow crawling (with Disallow)
  • X-Robots-Tag 1 can disallow indexing (with noindex)

(And both offer additional different features, e.g., linking to your Sitemap in robots.txt, disallowing following links in X-Robots-Tag, and many more.)

Crawling means accessing the document. Indexing means providing a link to (and possibly metadata from or about) the document in an index. In the typical case, a bot indexes a document after having crawled it, but that’s not necessary.

A bot that isn’t allowed to crawl a document may still index it (without ever accessing it). A bot that isn’t allowed to index a document may still crawl it. You can’t disallow both.

1 Note that the header is called X-Robots-Tag, not X-Robot-Tag. By the way, the metadata name robots (for the HTML meta element) is an alternative to the HTTP header.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...