utf-8 - Python requests.get() returns improperly decoded text instead of UTF-8?

Question

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:05:54+0000

The "educated guesses" mentioned above are probably just a check of the Content-Type header sent by the server (a rather misleading use of "educated", imho).

For the response header Content-Type: text/html, the result is ISO-8859-1 (the default for HTML4), regardless of any content analysis (i.e. even though the default for HTML5 is UTF-8).

For response header Content-Type: text/html; charset=utf-8 the result is UTF-8.
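The header rule described above can be sketched without touching the network. This is only an illustration of the rule, not the code requests itself runs: guess_encoding_from_content_type is a hypothetical helper, and the standard library's email.message.Message is used here purely as a convenient way to parse header parameters.

```python
from email.message import Message


def guess_encoding_from_content_type(content_type):
    """Hypothetical helper mirroring the header-only rule described above."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        # An explicit charset parameter always wins.
        return charset
    if msg.get_content_maintype() == "text":
        # No charset on a text/* response: fall back to ISO-8859-1.
        return "ISO-8859-1"
    # Anything else is left undecided in this sketch.
    return None


print(guess_encoding_from_content_type("text/html"))                 # → ISO-8859-1
print(guess_encoding_from_content_type("text/html; charset=utf-8"))  # → utf-8

# A UTF-8 body decoded with the ISO-8859-1 fallback turns into mojibake.
body = "café".encode("utf-8")
print(body.decode("ISO-8859-1"))  # → cafÃ©
```

The last line shows the symptom from the question: the bytes are valid UTF-8, but decoding them as ISO-8859-1 never fails, it just produces the wrong characters.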

Luckily for us, requests ships with a character-detection library (chardet, or charset_normalizer in newer releases) that usually works quite well. Its guess is exposed as the attribute requests.Response.apparent_encoding, so you usually want to do:

import requests

r = requests.get("https://martin.slouf.name/")
# override the header-based encoding with the real educated guess from content detection
r.encoding = r.apparent_encoding
# access the decoded text
print(r.text)
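If overriding the encoding unconditionally feels too aggressive, one option is to do it only when the server did not declare a charset. The sketch below assumes exactly that policy. text_with_detected_encoding is a hypothetical helper, not part of requests, and it relies only on the headers, encoding, apparent_encoding, and text attributes of a requests.Response.

```python
def text_with_detected_encoding(response):
    """Hypothetical helper: trust a declared charset, otherwise use detection."""
    content_type = response.headers.get("Content-Type", "")
    if "charset=" not in content_type.lower():
        # The server gave no charset, so response.encoding is only the
        # header-based default; replace it with the detector's guess.
        response.encoding = response.apparent_encoding
    # Decoding happens lazily when .text is read, so it uses the value set above.
    return response.text
```

With the request from the answer, this would be called as text_with_detected_encoding(requests.get("https://martin.slouf.name/")). Detection is still a guess, so a charset the server states explicitly is left alone here.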
