Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python 3: How to specify stdin encoding

While porting code from Python 2 to Python 3, I ran into a problem when reading UTF-8 text from standard input. In Python 2, this works fine:

for line in sys.stdin:
    ...

But Python 3 seems to expect ASCII from sys.stdin; if the input contains non-ASCII characters, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)

For a regular file, I would specify the encoding when opening the file:

with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...

But how can I specify the encoding for standard input? Other Stack Overflow posts (e.g. "How to change the stdin encoding on python") have suggested using

input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...

However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.

1 Reply

Python 3 does not expect ASCII from sys.stdin. It opens stdin in text mode and picks an encoding from the environment, normally the locale settings. That choice may come down to ASCII (for example under the C or POSIX locale on older Python 3 versions), but that is not a given. See the sys.stdin documentation for how the codec is selected.
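
To check which codec was actually picked on a given machine, something like the following sketch may help. The helper name describe_stdin_encoding is made up for this example; the values it collects are the ones that commonly influence the choice, not an exhaustive list.

```python
import locale
import os
import sys


def describe_stdin_encoding():
    """Collect values that commonly influence the stdin codec (hypothetical helper)."""
    return {
        # Encoding of the text wrapper Python created for stdin.
        # getattr() guards against stdin having been replaced or set to None.
        "stdin": getattr(sys.stdin, "encoding", None),
        # Locale-derived default encoding for text I/O.
        "locale": locale.getpreferredencoding(False),
        # Explicit override from the environment, if any.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in describe_stdin_encoding().items():
        print(f"{name}: {value}")
```

If the "stdin" entry shows an ASCII codec even though the shell locale is UTF-8, the script is probably being started from an environment (cron, a service manager, ssh without locale forwarding) that does not pass the locale variables through.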

Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).
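
The layering described above can be seen on any file opened in text mode. This is only an illustrative sketch: it uses a temporary file rather than the real sys.stdin so that it runs without piped input, and the variable names are chosen for the example.

```python
import os
import tempfile

# Write a small UTF-8 file so the example does not depend on real stdin.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("naïve café\n".encode("utf-8"))
    path = tmp.name

try:
    with open(path, "r", encoding="utf-8") as text_layer:
        buffered_layer = text_layer.buffer  # binary, buffered
        raw_layer = buffered_layer.raw      # binary, unbuffered
        layer_types = (
            type(text_layer).__name__,
            type(buffered_layer).__name__,
            type(raw_layer).__name__,
        )
        first_line = text_layer.readline()
finally:
    os.remove(path)

print(layer_types)  # ('TextIOWrapper', 'BufferedReader', 'FileIO')
```

Only the outermost layer knows about encodings; the two layers below it deal in bytes, which is why re-wrapping .buffer is enough to change the codec.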

Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:

import io
import sys

input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
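
A self-contained sketch of the same wrapping, using an io.BytesIO object as a stand-in for sys.stdin.buffer (an assumption made only so the example runs without piped input):

```python
import io

# Stand-in for sys.stdin.buffer: raw UTF-8 bytes, as they would arrive on a pipe.
fake_stdin_buffer = io.BytesIO("grüße\n日本語\n".encode("utf-8"))

# Same call as above, applied to the stand-in.
input_stream = io.TextIOWrapper(fake_stdin_buffer, encoding="utf-8")

# Iterating the wrapper yields decoded str lines.
lines = [line.rstrip("\n") for line in input_stream]
print(lines)  # ['grüße', '日本語']
```

Keep the wrapper referenced for as long as you read from it: a TextIOWrapper closes the buffer it wraps when it is closed or garbage-collected.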

Alternatively, set the PYTHONIOENCODING environment variable to the desired codec before running Python.
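
One way to observe the effect of the variable is to start a child interpreter with it set. The sketch below is an illustration only; the child program and the variable names are invented for the example, and it assumes sys.executable points at a usable interpreter.

```python
import os
import subprocess
import sys

# Child program: report the codec chosen for stdin, then check the decoded input.
child_code = (
    "import sys\n"
    "print(sys.stdin.encoding)\n"
    "print(sys.stdin.read().strip() == 'h\\u00e9llo')\n"
)

# Copy the current environment and add the override for the child only.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="héllo\n".encode("utf-8"),  # UTF-8 bytes piped to the child's stdin
    stdout=subprocess.PIPE,
    env=env,
    check=True,
)
reported_encoding, decoded_ok = result.stdout.decode("utf-8").split()
print(reported_encoding, decoded_ok)
```

The variable affects stdout and stderr as well as stdin, so it is a blunt but convenient tool when the script itself cannot be changed.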

From Python 3.7 onwards, you can also reconfigure the existing std* wrappers, provided you do it at the start (before any data has been read):

# Python 3.7 and newer
sys.stdin.reconfigure(encoding='utf-8')
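
If the same script has to run on interpreters older than 3.7 as well, the two approaches could be combined roughly as follows. The helper as_utf8 is hypothetical, and an in-memory stream stands in for sys.stdin so the sketch is runnable on its own.

```python
import io


def as_utf8(stream):
    """Return a text stream decoding the same underlying bytes as UTF-8 (hypothetical helper)."""
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: switch the codec on the existing wrapper in place.
        # This must happen before any data has been read from the stream.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Older versions: build a new wrapper around the binary layer.
    return io.TextIOWrapper(stream.buffer, encoding="utf-8")


# Stand-in for sys.stdin: a text wrapper that starts out with the wrong codec.
fake_stdin = io.TextIOWrapper(
    io.BytesIO("Ünïcödé\n".encode("utf-8")), encoding="ascii"
)
first_line = as_utf8(fake_stdin).readline()
print(first_line, end="")
```

In a real script the call would be as_utf8(sys.stdin), made once at startup before the first read.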
