The "educated guesses" (mentioned above) are probably just a check of the Content-Type
header sent by the server (a rather misleading use of "educated", imho).
For the response header Content-Type: text/html
the result is ISO-8859-1 (the default for HTML4), regardless of any content analysis (i.e. even though the default for HTML5 is UTF-8).
For the response header Content-Type: text/html; charset=utf-8
the result is UTF-8.
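To make the two cases concrete, here is a rough stdlib-only sketch of a header-only check. It is an approximation for illustration, not the actual code in requests; the helper name encoding_from_content_type is made up.

```python
from email.message import Message


def encoding_from_content_type(content_type):
    """Hypothetical helper: derive an encoding from a Content-Type value only.

    The response body is never inspected, which is the point being made.
    """
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        # An explicit charset parameter wins.
        return charset.strip("'\"")
    if msg.get_content_maintype() == "text":
        # No charset on a text/* type: fall back to the legacy default.
        return "ISO-8859-1"
    return None


no_charset = encoding_from_content_type("text/html")
with_charset = encoding_from_content_type("text/html; charset=utf-8")
print(no_charset, with_charset)
```

With no charset parameter, a page that is really UTF-8 still gets labelled ISO-8859-1, which is where the garbled text comes from.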
Luckily for us, requests uses the chardet library (charset_normalizer in newer releases), which usually works quite well and is exposed as the attribute requests.Response.apparent_encoding, so you usually want to do:
import requests

r = requests.get("https://martin.slouf.name/")
# override the header-derived encoding with the real educated guess from content detection
r.encoding = r.apparent_encoding
# access the decoded data
r.text
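If you would rather trust the server when it does declare a charset, a slightly more cautious variant could look like the sketch below. The helper pick_encoding is hypothetical, and the SimpleNamespace objects are stand-ins for requests.Response so that the example runs offline; it only relies on the headers, encoding and apparent_encoding attributes.

```python
from types import SimpleNamespace


def pick_encoding(response):
    """Hypothetical helper: keep a declared charset, otherwise use detection."""
    content_type = response.headers.get("Content-Type", "")
    if "charset=" in content_type.lower():
        # The server was explicit, so keep the header-derived encoding.
        return response.encoding
    # No charset declared: prefer the content-based guess.
    return response.apparent_encoding


# Stand-ins for real responses (assumed values, for illustration only).
declared = SimpleNamespace(
    headers={"Content-Type": "text/html; charset=utf-8"},
    encoding="utf-8",
    apparent_encoding="ascii",
)
bare = SimpleNamespace(
    headers={"Content-Type": "text/html"},
    encoding="ISO-8859-1",
    apparent_encoding="utf-8",
)

print(pick_encoding(declared), pick_encoding(bare))

# With a real response this would be used as:
#   r = requests.get(url)
#   r.encoding = pick_encoding(r)
```

Whether this is better than always overriding depends on how much you trust the servers you talk to; some declare a charset that does not match the body.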