http - Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

Question

Welcome To Ask or Share your Answers For Others

http - Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

http - Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

I have a strange bug when trying to urlopen a certain page from Wikipedia. This is the page:

http://en.wikipedia.org/wiki/OpenCola_(drink)

This is the shell session:

>>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
Traceback (most recent call last):
  File "C:Program FilesWing IDE 4.0srcdebugserver\_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "c:Python26Liburllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "c:Python26Liburllib2.py", line 397, in open
    response = meth(req, response)
  File "c:Python26Liburllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:Python26Liburllib2.py", line 435, in error
    return self._call_chain(*args)
  File "c:Python26Liburllib2.py", line 369, in _call_chain
    result = func(*args)
  File "c:Python26Liburllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

This happened to me on two different systems in different continents. Does anyone have an idea why this happens?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

1.4m articles

1.4m replys

5 comments

57.0k users

Most popular tags

javascript python c# java How android c++ php ios html sql r c node.js .net iphone asp.net css reactjs jquery ruby What Android objective mysql linux Is git Python windows Why regex angular swift amazon excel algorithm macos Java visual how bash Can multithreading PHP Using scala angularjs typescript apache spring performance postgresql database flutter json rust arrays C# dart vba django wpf xml vue.js In go Get google jQuery xcode jsf http Google mongodb string shell oop powershell SQL C++ security assembly docker Javascript Android: Does haskell Convert azure debugging delphi vb.net Spring datetime pandas oracle math Django

联盟问答网站-Union QA website

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

DevDocs API Documentations

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

在这了问答社区

DevDocs API Documentations

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

DevDocs API Documentations

广告位招租

深蓝 · Answer 1 · 2021-10-17T03:10:22+0000

Wikipedias stance is:

Data retrieval: Bots may not be used to retrieve bulk content for any use not directly related to an approved bot task. This includes dynamically loading pages from another website, which may result in the website being blacklisted and permanently denied access. If you would like to download bulk content or mirror a project, please do so by downloading or hosting your own copy of our database.

That is why Python is blocked. You're supposed to download data dumps.

Anyways, you can read pages like this in Python 2:

req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
con = urllib2.urlopen( req )
print con.read()

Or in Python 3:

import urllib
req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"}) 
con = urllib.request.urlopen( req )
print(con.read())

Categories

http - Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

http - Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags