python - Screen scraping: getting around "HTTP Error 403: request disallowed by robots.txt"

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Is there a way to get around the following?

httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Is contacting the site owner (barnesandnoble.com) the only way around this? I'm building a site that would bring them more sales, so I'm not sure why they would deny access at a certain depth.

I'm using mechanize and BeautifulSoup on Python 2.6, and I'm hoping for a workaround.

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:28:21+0000

replied Oct 24, 2021 by 深蓝 (71.8m points)

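You need to tell mechanize to ignore robots.txt. By default, mechanize.Browser honours robots.txt and raises this error for any URL the file disallows; you can turn that check off:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # stop checking robots.txt before each request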
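
To see what kind of check set_handle_robots(False) switches off, here is a minimal sketch using the standard library's robots.txt parser. It is not mechanize's own code, only the same idea in miniature. The rules, the "my-scraper" user agent, the example.com URLs, and the is_allowed helper are all hypothetical and chosen for illustration; they are not Barnes & Noble's actual robots.txt. The sketch targets Python 3, where the module is urllib.robotparser (on Python 2 it is named robotparser).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not any real site's robots.txt.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


# A path outside the Disallow rule, and one under it.
allowed_public = is_allowed(EXAMPLE_RULES, "my-scraper", "http://example.com/books/1")
allowed_private = is_allowed(EXAMPLE_RULES, "my-scraper", "http://example.com/private/1")
print(allowed_public, allowed_private)
```

A URL for which a check like this fails is the kind of request mechanize refuses with "request disallowed by robots.txt" while robots handling is left on.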