Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

encoding - Why do different Python versions behave differently when printing to standard output?

Python 3.4 and Python 3.8/3.9 behave differently when I execute the statement below:

print('\u212B')

Python 3.8/3.9 prints it correctly:

Å

Python 3.4 raises an exception:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print('\u212B')
UnicodeEncodeError: 'gbk' codec can't encode character '\u212b' in position 0: illegal multibyte sequence

According to this page, I can avoid the exception by replacing sys.stdout with the statement:

sys.stdout = io.TextIOWrapper(buffer=sys.stdout.buffer,encoding='utf-8')

But Python 3.4 still prints a different character, as shown below:

鈩?

So my questions are:

  1. Why do different Python versions behave differently when printing to standard output?
  2. How can I print the correct value Å in Python 3.4?

Edit 1:

I guess the difference is caused by PEP 528 -- Change Windows console encoding to UTF-8. But I still don't understand the mechanism of console encoding or how I can print the correct character in Python 3.4.


Edit 2:

One more difference: sys.getfilesystemencoding() returns utf-8 in Python 3.8/3.9 and mbcs in Python 3.4.

Question from: https://stackoverflow.com/questions/65839911/why-do-different-python-versions-have-different-behaviors-on-stand-output-print


1 Reply


Why?

For the rationale behind the stdout encoding, see the answers to the question "Changing default encoding of Python?".

In short, Python 3.4 uses your OS's encoding (here the GBK code page) as the default for stdout, whereas Python 3.8 uses UTF-8 for the Windows console. That change comes from PEP 528, which took effect in Python 3.6.
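To make the two symptoms from the question concrete, here is a minimal sketch, assuming the console code page is GBK as in the traceback. The helper name can_encode is made up for illustration. It shows that GBK has no mapping for U+212B, and that UTF-8 bytes read back as GBK come out garbled, which is what happens when sys.stdout is wrapped as UTF-8 while the console still decodes GBK.

```python
import sys

ANGSTROM = '\u212b'  # ANGSTROM SIGN, the character from the question


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The encoding Python picked for stdout; print() encodes with this.
print('stdout encoding:', sys.stdout.encoding)

# GBK has no mapping for U+212B, which is the error in the traceback.
print('gbk can encode it:', can_encode(ANGSTROM, 'gbk'))
print('utf-8 can encode it:', can_encode(ANGSTROM, 'utf-8'))

# Assumption: the console decodes bytes as GBK. UTF-8 bytes sent to such
# a console are misread, so a different character appears.
garbled = ANGSTROM.encode('utf-8').decode('gbk', errors='replace')
print('utf-8 bytes read as gbk:', ascii(garbled))
```

The last line uses ascii() so that the demonstration itself cannot fail on a console that is unable to display the garbled text.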

How to fix this?

On Python 3.7 and later, you can use the reconfigure method of sys.stdout:

sys.stdout.reconfigure(encoding='utf-8')
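Because reconfigure only exists from Python 3.7 on, a script that must also run on older interpreters may want to guard the call. The following is a sketch under that assumption. The function name use_utf8 is hypothetical, and the in-memory stream only stands in for a console stream so the effect can be observed.

```python
import io
import sys


def use_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+).

    Returns True when the stream was reconfigured, False otherwise.
    """
    reconfigure = getattr(stream, 'reconfigure', None)
    if reconfigure is None:
        return False
    reconfigure(encoding='utf-8')
    return True


# Demonstration on an in-memory stream that starts out as GBK,
# standing in for a console stream.
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding='gbk')
if use_utf8(demo):
    demo.write('\u212b')
    demo.flush()
    print(raw.getvalue())  # the UTF-8 bytes that were written

# In a real script the call would be: use_utf8(sys.stdout)
```

On an interpreter without reconfigure the function simply returns False, so the caller can fall back to one of the other options described here.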

You can also try setting the environment variable PYTHONIOENCODING to utf-8:

set PYTHONIOENCODING=utf8

This works on most operating systems. On Windows with Python 3.6 or later, the variable is ignored for the interactive console unless another environment variable is also set:

set PYTHONLEGACYWINDOWSIOENCODING=1

In Python versions before 3.7, you can fix it by installing the win-unicode-console package, which handles Unicode issues transparently on Windows:

pip install win-unicode-console
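After installation the package still has to be switched on from within the script. A minimal sketch, assuming the package exposes win_unicode_console.enable() as described in its README. The wrapper name enable_unicode_console is made up for illustration, and it does nothing outside Windows or when the package is missing.

```python
import sys


def enable_unicode_console():
    """Enable win-unicode-console when available (hypothetical wrapper).

    Returns True if the package was enabled, False otherwise.
    """
    if sys.platform != 'win32':
        return False  # the package only targets the Windows console
    try:
        import win_unicode_console
    except ImportError:
        return False  # not installed; leave stdout untouched
    win_unicode_console.enable()
    return True


if enable_unicode_console():
    print('\u212b')
```

The print call is kept behind the check so the sketch does not raise UnicodeEncodeError on a console that was left unchanged.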

If you are not running the code directly from a console, your IDE configuration may be interfering.

