What's the deal with Python 3.4, Unicode, different languages and Windows?

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

Happy examples:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

czech = u'Leoš Janáček'.encode("utf-8")
print(czech)

pl = u'Zdzisław Beksiński'.encode("utf-8")
print(pl)

jp = u'リング 山村 貞子'.encode("utf-8")
print(jp)

chinese = u'五行'.encode("utf-8")
print(chinese)

MIR = u'Машина для Инженерных Расчётов'.encode("utf-8")
print(MIR)

pt = u'Minha Língua Portuguesa: çáà'.encode("utf-8")
print(pt)

Unhappy output:

b'Leo\xc5\xa1 Jan\xc3\xa1\xc4\x8dek'
b'Zdzis\xc5\x82aw Beksi\xc5\x84ski'
b'\xe3\x83\xaa\xe3\x83\xb3\xe3\x82\xb0 \xe5\xb1\xb1\xe6\x9d\x91 \xe8\xb2\x9e\xe5\xad\x90'
b'\xe4\xba\x94\xe8\xa1\x8c'
b'\xd0\x9c\xd0\xb0\xd1\x88\xd0\xb8\xd0\xbd\xd0\xb0 \xd0\xb4\xd0\xbb\xd1\x8f \xd0\x98\xd0\xbd\xd0\xb6\xd0\xb5\xd0\xbd\xd0\xb5\xd1\x80\xd0\xbd\xd1\x8b\xd1\x85 \xd0\xa0\xd0\xb0\xd1\x81\xd1\x87\xd1\x91\xd1\x82\xd0\xbe\xd0\xb2'
b'Minha L\xc3\xadngua Portuguesa: \xc3\xa7\xc3\xa1\xc3\xa0'

And if I print them like this:

jp = u'リング 山村 貞子'
print(jp)

I get:

Traceback (most recent call last):
  File "x.py", line 5, in <module>
    print(jp)
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position
0-2: character maps to <undefined>

I've also tried the following, taken from another question (and other alternatives that involve sys.stdout.encoding):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

jp = u'リング 山村 貞子'
safeprint(jp)

And things get even more cryptic:

πa?πa│πé? σ??μ￥? Φ▓?σ?é

And the docs were not very helpful.

So, what's the deal with Python 3.4, Unicode, different languages and Windows? Almost all the examples I could find deal with Python 2.x.

Is there a general and cross-platform way of printing ANY Unicode character from any language in a decent and non-nasty way in Python 3.4?

EDIT:

I've tried typing at the terminal:

chcp 65001

to change the code page, as proposed elsewhere and in the comments, and it did not work (including the attempt with sys.stdout.encoding).

深蓝 · Answer 1 · 2021-10-16T22:10:24+0000

Update: Since Python 3.6, the code example that prints Unicode strings directly should just work now (even without py -mrun).
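
To check which situation a given interpreter is in, a minimal sketch like the following reports the Python version and the encoding attached to sys.stdout, then tests whether a string survives that encoding. The helper name can_encode is hypothetical and not part of any library. The samples are written as escapes and printed via ascii() so that the script itself cannot fail on a limited console.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # sys.stdout.encoding can be None when stdout has been replaced,
    # so fall back to ASCII in that case.
    stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    print(sys.version_info[:3], stdout_encoding)

    # "Minha Lingua" with an accented i, and the Japanese sample from the question.
    for sample in ("Minha L\u00edngua", "\u30ea\u30f3\u30b0"):
        # ascii() keeps the report itself printable on any console.
        print(ascii(sample), can_encode(sample, stdout_encoding))
```

With cp850, the code page from the question's traceback, can_encode returns False for the Japanese sample, which is the same condition that makes print() raise UnicodeEncodeError.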

Python can print text in multiple languages in the Windows console regardless of what chcp reports:

T:\> py -mpip install win-unicode-console
T:\> py -mrun your_script.py

where your_script.py prints Unicode directly e.g.:

#!/usr/bin/env python3
print('š áč')      # cz
print('ł ń')       # pl
print('リング')     # jp
print('五行')      # cn
print('ш я жх ё') # ru
print('í çáà')    # pt

All you need is to configure a font in your Windows console that can display the desired characters.

You could also run your Python script via IDLE without installing non-stdlib modules:

T:\> py -midlelib -r your_script.py

To write to a file/pipe, use PYTHONIOENCODING=utf-8 as @Mark Tolonen suggested:

T:\> set PYTHONIOENCODING=utf-8
T:\> py your_script.py >output-utf8.txt
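
When changing the environment is not an option, a similar result for file output can be sketched inside the script by opening the file with an explicit encoding instead of redirecting stdout. This is an alternative to the command above, not a description of what PYTHONIOENCODING does internally, and the write_lines helper is a hypothetical name used only for illustration.

```python
def write_lines(path, lines):
    """Write each string in lines to path as UTF-8, one per line.

    Hypothetical helper. The explicit encoding makes the result independent
    of the console code page; newline="\n" avoids platform line-ending
    translation so the bytes are the same on every OS.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")
```

For example, write_lines("output-utf8.txt", ["リング", "五行"]) would produce a UTF-8 file regardless of the active code page.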

Only the last solution supports non-BMP characters such as 😒 (U+1F612 UNAMUSED FACE). py -mrun can write them, but the Windows console displays them as boxes even if the font supports the corresponding Unicode characters (though you can copy-paste the boxes into another program to get the characters).
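
What sets such characters apart can be shown with a short sketch: a non-BMP code point lies above U+FFFF, so it takes four bytes in UTF-8 and a surrogate pair (two 16-bit units) in UTF-16, the encoding used by Windows wide-character APIs. The helper name is_non_bmp is hypothetical.

```python
def is_non_bmp(char):
    """Return True if char lies outside the Basic Multilingual Plane."""
    return ord(char) > 0xFFFF


face = "\U0001F612"  # UNAMUSED FACE, the example character from the answer

utf8_bytes = face.encode("utf-8")
# Each UTF-16 code unit is two bytes; "utf-16-le" adds no byte-order mark.
utf16_units = len(face.encode("utf-16-le")) // 2

print(is_non_bmp(face), len(utf8_bytes), utf16_units)  # True 4 2
print(is_non_bmp("\u4e94"))  # False: a BMP character for comparison
```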
