While porting code from Python 2 to Python 3, I ran into this problem when reading UTF-8 text from standard input. In Python 2, this works fine:
import sys
for line in sys.stdin:
    ...
But in Python 3, sys.stdin decodes its input as ASCII here, so any non-ASCII character in the input produces this error:
UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)
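The same error can be reproduced without stdin at all. In UTF-8, a character such as "é" is encoded as the two bytes 0xC3 0xA9, both outside the 0-127 range the ASCII codec accepts. A minimal sketch (`ascii_error_message` is a hypothetical helper, not part of any library):

```python
def ascii_error_message(data: bytes) -> str:
    """Return the message Python raises when `data` is decoded as ASCII."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return ''  # data was pure ASCII, so no error


utf8_bytes = 'café'.encode('utf-8')  # b'caf\xc3\xa9'
print(ascii_error_message(utf8_bytes))
# 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
print(utf8_bytes.decode('utf-8'))  # café
```

Decoding the same bytes as UTF-8 succeeds, which is why the fix is about which codec reads the bytes, not about the input itself.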
For a regular file, I would specify the encoding when opening the file:
with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...
But how can I specify the encoding for standard input? Other Stack Overflow posts (e.g. "How to change the stdin encoding on python") suggest using:
import codecs
import sys
input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...
However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.
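A likely reason the codecs approach fails in Python 3: sys.stdin is already a text stream, so it decodes the bytes with its own (here ASCII) encoding before the codecs reader ever sees them. The underlying byte stream is available as sys.stdin.buffer, and wrapping that instead bypasses the locale encoding. A minimal sketch, assuming Python 3 and input that really is UTF-8; `utf8_lines` and `utf8_lines_codecs` are hypothetical helper names:

```python
import codecs
import io
import sys
from typing import BinaryIO, Iterator


def utf8_lines(binary_stream: BinaryIO) -> Iterator[str]:
    """Yield lines from a *binary* stream, decoding them as UTF-8.

    The locale's encoding is never consulted, because this wrapper
    (not sys.stdin's own text layer) turns the bytes into str.
    """
    yield from io.TextIOWrapper(binary_stream, encoding='utf-8')


def utf8_lines_codecs(binary_stream: BinaryIO) -> Iterator[str]:
    """The codecs variant: same getreader call, but over the binary stream."""
    yield from codecs.getreader('utf-8')(binary_stream)


if __name__ == '__main__':
    # sys.stdin.buffer is the raw byte stream underneath sys.stdin.
    for line in utf8_lines(sys.stdin.buffer):
        # sys.stdout may use the same ASCII encoding, so write UTF-8 bytes directly.
        sys.stdout.buffer.write(line.encode('utf-8'))
```

On Python 3.7 and later, calling sys.stdin.reconfigure(encoding='utf-8') before reading any input is another option, as is setting the environment variable PYTHONIOENCODING=utf-8 when launching the script.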