When you call con.read(), it returns a bytes object. Calling str() on that object without specifying an encoding returns a string containing its representation, so you get the escape sequences rather than the real characters. (That is why your string ends up containing \xe2\x80\x99, as well as other undesired things such as the b'...' wrapper.) bytes is mostly like str in Python 2: it is a raw sequence of bytes with no encoding information attached. str in Python 3 is like unicode in Python 2: it holds decoded Unicode text. So, when turning a bytes object into a str object, you need to tell Python which encoding the bytes are actually in. In this case, that's utf-8.
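As a small illustration of the difference, here is a sketch that needs no network access. The byte string is a made-up stand-in for what the server might send, not the actual response:

```python
# Hypothetical sample: the UTF-8 bytes for "Indiana’s" (not fetched from the site)
raw = b"Indiana\xe2\x80\x99s"

# str() with no encoding returns the repr of the bytes object, escapes and all
as_repr = str(raw)
print(as_repr)   # b'Indiana\xe2\x80\x99s'

# Decoding with the correct encoding recovers the real characters
text = raw.decode("utf-8")
print(text)      # Indiana’s
```

The three bytes \xe2\x80\x99 collapse into a single character (the right single quotation mark) once they are decoded as utf-8.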
Instead of calling str() on it, it is better to use bytes.decode; it does the same thing as str() with an encoding argument, just more neatly.
>>> import urllib.request as u
>>> zipcode = 47401
>>> url = 'http://watchdog.net/us/?zip={}'.format(zipcode)
>>> con = u.urlopen(url)
>>> page = con.read().decode('utf-8')
>>> page[page.find("<title>") + 7:page.find("</title>") - 15]
'IN-09: Indiana’s 9th'
The only functional change made here is decoding the bytes object as 'utf-8'.
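To check that bytes.decode and str() with an explicit encoding really agree, and to see what happens when the wrong encoding is named, a quick sketch along these lines may help. The sample bytes and the choice of latin-1 as the "wrong" encoding are assumptions made purely for illustration:

```python
# Hypothetical sample bytes, standing in for part of the page title
raw = b"IN-09: Indiana\xe2\x80\x99s 9th"

# Both spellings decode the same bytes with the same codec
via_decode = raw.decode("utf-8")
via_str = str(raw, "utf-8")
print(via_decode == via_str)   # True

# Naming the wrong encoding does not fail here; it silently yields wrong text
wrong = raw.decode("latin-1")
print(wrong == via_decode)     # False
```

latin-1 maps every byte to some character, so the mistake goes unnoticed until the garbled text is displayed.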