My question is why the two encoding variables are different in the first place
They serve different purposes.
sys.stdout.encoding should be the encoding that your terminal uses to interpret text; otherwise, you may get mojibake in the output. It may be utf-8 in one environment, cp437 in another, etc.
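A minimal Python 3 sketch of how mojibake arises: text is encoded with one codec, and the resulting bytes are interpreted with another. The helper show_mojibake is hypothetical, and the value printed for sys.stdout.encoding depends on your environment.

```python
import sys

# Whatever Python detected for this stream; it varies by environment
# (for example 'utf-8' in one terminal, 'cp437' in an old Windows console).
print("sys.stdout.encoding:", sys.stdout.encoding)


def show_mojibake(text, written_as, read_as):
    """Hypothetical helper: encode with one codec and decode with another,
    as happens when a program and a terminal disagree on the encoding."""
    return text.encode(written_as).decode(read_as)


garbled = show_mojibake("\u20ac", "utf-8", "cp437")  # the euro sign
# ascii() keeps the demo printable even on a non-UTF-8 stdout.
print(ascii(garbled))  # '\u0393\xe9\xbc', i.e. the three characters Γé¼
```

The three UTF-8 bytes of the euro sign (E2 82 AC) show up as three unrelated cp437 characters.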
sys.getdefaultencoding() is used on Python 2 for implicit conversions (when the encoding is not set explicitly), i.e., Python 2 may mix ascii-only bytestrings and Unicode strings together. For example, xml.etree.ElementTree stores text in the ascii range as bytestrings, and json.dumps() returns an ascii-only bytestring instead of Unicode in Python 2, perhaps for performance reasons: bytes were cheaper than Unicode for representing ascii characters. Implicit conversions are forbidden in Python 3.
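On Python 3, the implicit conversion that Python 2 performed silently is refused outright. This sketch (variable names are illustrative) shows the TypeError and the explicit alternative.

```python
import sys

# Python 3 reports 'utf-8' here, yet never uses it to mix bytes and str silently.
print(sys.getdefaultencoding())

data = b"20 000"   # bytes, e.g. read from a file or a socket
label = "I need "  # text (str)

try:
    # The Python 2 equivalent, u"I need " + "20 000", would decode the
    # bytestring implicitly using sys.getdefaultencoding().
    label + data
except TypeError as exc:
    print("implicit conversion refused:", exc)

# Explicit conversion: the encoding is named where the conversion happens.
print(label + data.decode("ascii"))  # I need 20 000
```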
sys.getdefaultencoding() is always 'ascii' on all systems in Python 2 unless you override it, which you should not do: overriding it may hide bugs, and your data may easily be corrupted by implicit conversions that use a possibly wrong encoding for the data.
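Python 3 has no sys.setdefaultencoding(), so this sketch instead uses explicit calls to show the kind of damage a wrong encoding does: decoding with the wrong codec often raises no error at all, and writing the result back out corrupts the bytes further. The choice of cp1252 as the "wrong guess" is just an example.

```python
# UTF-8 bytes for "café", e.g. read from a file.
raw = "caf\u00e9".encode("utf-8")        # b'caf\xc3\xa9'

# A wrong guess (cp1252 instead of utf-8) raises no error...
wrong = raw.decode("cp1252")             # 'cafÃ©'
# ...and re-encoding the damaged text corrupts the bytes further.
double_encoded = wrong.encode("utf-8")   # b'caf\xc3\x83\xc2\xa9'

print(ascii(wrong), double_encoded)

# Decoding with the right codec recovers the original text.
print(raw.decode("utf-8") == "caf\u00e9")  # True
```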
By the way, there is another common encoding, sys.getfilesystemencoding(), that may differ from the two. sys.getfilesystemencoding() should be the encoding used to encode OS data (filenames, command-line arguments, environment variables).
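A quick Python 3 sketch for comparing the encodings on your own system. os.fsencode() and os.fsdecode() convert file names using the filesystem encoding; the round trip below assumes a UTF-8 filesystem encoding, which is the default on most current systems, and the file name is hypothetical.

```python
import os
import sys

# These can all differ from each other on the same machine.
print("stdout:    ", sys.stdout.encoding)
print("default:   ", sys.getdefaultencoding())
print("filesystem:", sys.getfilesystemencoding())

# File names are converted with the filesystem encoding, not with
# sys.stdout.encoding or the default encoding.
name = "r\u00e9sum\u00e9.txt"  # hypothetical file name
raw_name = os.fsencode(name)
print(raw_name, os.fsdecode(raw_name) == name)
```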
The source code encoding declared using # -*- coding: utf-8 -*- may be different from all of the already-mentioned encodings.
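The coding declaration only tells the parser how to decode the source file's bytes. This sketch compiles a source snippet, held as bytes, that declares cp1252; the resulting string is ordinary Unicode, independent of the stdout, default, and filesystem encodings.

```python
# Source bytes that declare cp1252; byte 0x80 is the euro sign in cp1252.
source = b"# -*- coding: cp1252 -*-\ntext = '\x80'\n"

namespace = {}
exec(compile(source, "<cp1252 demo>", "exec"), namespace)

# The parser decoded the literal using the declared encoding.
print(ascii(namespace["text"]))  # '\u20ac'
```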
Naturally, if you read data from a file or the network, it may use a character encoding different from all of the above. For example, if a file created in Notepad is saved using a Windows ANSI encoding such as cp1252, then on another system all the standard encodings may differ from it.
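A sketch of reading such a file in Python 3: the Notepad-style file is simulated by writing cp1252 bytes to a temporary file (the name notes.txt is hypothetical), and the reader names the encoding explicitly instead of relying on the locale default.

```python
import os
import tempfile

# Simulate a file saved in cp1252 (a Windows "ANSI" code page).
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "wb") as f:
    f.write("Price: 20 \u20ac\n".encode("cp1252"))  # the euro sign becomes byte 0x80

# Name the file's encoding explicitly; a utf-8 default on another system
# would fail on byte 0x80, and other defaults could garble the text.
with open(path, encoding="cp1252") as f:
    text = f.read()

print(ascii(text))  # 'Price: 20 \u20ac\n'
```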
The point being: there could be multiple encodings for reasons unrelated to Python. To avoid the headache, use Unicode to represent text: decode encoded text to Unicode as soon as possible on input, and encode it to bytes (possibly using a different encoding) as late as possible on output. This is the so-called Unicode sandwich.
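A minimal sketch of the Unicode sandwich in Python 3. The function name process and the upper-casing step are hypothetical stand-ins for real processing; the point is where the decode and encode happen.

```python
def process(raw_input: bytes, input_encoding: str, output_encoding: str) -> bytes:
    # Bottom slice: decode bytes to text as soon as they arrive.
    text = raw_input.decode(input_encoding)

    # Filling: all processing works on str, never on bytes.
    text = text.upper()

    # Top slice: encode text to bytes only when it leaves the program,
    # possibly with a different encoding than the input used.
    return text.encode(output_encoding)


legacy = "caf\u00e9 \u20ac".encode("cp1252")  # bytes from a cp1252 source
print(process(legacy, "cp1252", "utf-8"))      # b'CAF\xc3\x89 \xe2\x82\xac'
```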
how do I manage to use the wrong encoding in this simple piece of code?
Your first code example is not fine. You use non-ascii literal characters in a bytestring on Python 2, which you should not do. Use bytestring literals only for binary data (or so-called native strings if necessary). The code may produce mojibake such as "I need 20 000Γé?." (notice the character noise) if you run it with Python 2 in any environment that does not use a utf-8-compatible encoding, such as the Windows console.
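Python 3 turns this mistake into an error: non-ASCII characters are rejected in bytes literals at compile time. The sketch below checks that with compile(); the helper bytes_literal_allowed is hypothetical.

```python
def bytes_literal_allowed(source):
    """Hypothetical helper: does this source compile on Python 3?"""
    try:
        compile(source, "<bytes literal>", "exec")
    except SyntaxError:
        return False
    return True


print(bytes_literal_allowed('x = b"20 000\u20ac"'))          # False: literal euro sign
print(bytes_literal_allowed('x = b"20 000\\xe2\\x82\\xac"'))  # True: escaped bytes

# Text belongs in str literals; bytes are produced explicitly when needed.
price = "I need 20 000\u20ac."
payload = price.encode("utf-8")
print(payload)  # b'I need 20 000\xe2\x82\xac.'
```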
The second code example is ok, assuming reload(sys) is not part of it. If you don't want to prefix all string literals with u'', you could use from __future__ import unicode_literals.
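A small sketch of the unicode_literals approach. The future import must come before any other statement in the module; on Python 2 it makes unprefixed literals Unicode, and on Python 3 it is accepted and has no effect. The escape \u20ac is used instead of a literal euro sign so the file stays ASCII-only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# With unicode_literals, this behaves like u"..." on Python 2.
price = "I need 20 000\u20ac."
print(type(price).__name__)  # 'unicode' on Python 2, 'str' on Python 3
```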
Your actual issue is the UnicodeEncodeError, and reload(sys) is not the right solution! The correct solution is to configure your locale properly on POSIX (LANG, LC_CTYPE), or set the PYTHONIOENCODING envvar if the output is redirected to a pipe/file, or install win-unicode-console to print Unicode to the Windows console.
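A Python 3 sketch of the PYTHONIOENCODING route: a child interpreter writes to a pipe, and the environment variable fixes the encoding it uses for stdout. The child's one-line program is illustrative; note that its source uses the \u20ac escape, so only the output contains non-ASCII text.

```python
import os
import subprocess
import sys

# Child process whose stdout is a pipe, not a terminal.
child_code = "print('I need 20 000\\u20ac.')"

# PYTHONIOENCODING sets the encoding for the child's stdin/stdout/stderr.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

print(result.stdout)                           # UTF-8 bytes from the pipe
print(ascii(result.stdout.decode("utf-8")))    # decoded explicitly by the parent
```

On Python 3.7 and later, a program can also change its own stream encoding with sys.stdout.reconfigure(encoding="utf-8"), though the environment variable has the advantage of requiring no code changes.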