python - UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128)

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am parsing an XLS file using xlrd. Most things are working fine. I have a dictionary where keys are strings and values are lists of strings. All the keys and values are Unicode. I can print most of the keys and values using str(). But some values contain the Unicode character \u2013, for which I get the above error.

I suspect that this is happening because this is Unicode embedded in Unicode and the Python interpreter cannot decode it. So how can I get rid of this error?

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:13:27+0000

You can print Unicode objects directly; you don't need to wrap them in str().

Assuming you really want a str:

When you call str(u'\u2013') you are trying to convert the Unicode string to an 8-bit string. To do this you need an encoding, a mapping between Unicode data and 8-bit data. str() uses the system default encoding, which under Python 2 is ASCII. ASCII contains only the first 128 code points of Unicode, that is \u0000 to \u007F. The result is the above error: the ASCII codec just doesn't know what \u2013 is (it's an en dash, btw).
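The failure can be reproduced without str() by asking for the ASCII codec explicitly, which is roughly what the implicit conversion amounts to under Python 2. This is only a minimal sketch: the variable names are illustrative, and the u'' prefix is kept so the snippet also runs on Python 3.3 and later.

```python
text = u'\u2013'  # EN DASH, code point U+2013

try:
    # On Python 2, str(text) attempts this same conversion implicitly.
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The exception object records which codec failed and where.
print(error.encoding)
print(error.start)
```

The position reported in the message is the index of the first character the codec could not encode.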

You therefore need to specify which encoding you want to use. Common ones are ISO-8859-1, more commonly known as Latin-1, which contains the first 256 code points; UTF-8, which can encode all code points by using a variable-length encoding; CP1252, which is common on Windows; and various Chinese and Japanese encodings.
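To make the difference between these encodings concrete, here is a small sketch that encodes the same character with several of them. The variable names are made up for illustration, and the errors='replace' fallback at the end is an extra option not discussed above; it loses information, so treat it as a last resort.

```python
dash = u'\u2013'  # EN DASH

# UTF-8 can encode every code point; this one takes three bytes.
utf8_bytes = dash.encode('utf-8')

# CP1252 happens to include the en dash, as the single byte 0x96.
cp1252_bytes = dash.encode('cp1252')

# Latin-1 only covers U+0000..U+00FF, so it cannot represent U+2013.
try:
    dash.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# errors='replace' substitutes '?' instead of raising (lossy).
ascii_lossy = dash.encode('ascii', 'replace')

print(repr(utf8_bytes))
print(repr(cp1252_bytes))
print(latin1_ok)
print(repr(ascii_lossy))
```

UTF-8 is usually the safest choice when you do not know in advance which characters the spreadsheet contains.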

You use them like this:

u'\u2013'.encode('utf8')

The result is a str containing a sequence of bytes that is the UTF-8 representation of the character in question:

'\xe2\x80\x93'

And you can print it:

>>> print '\xe2\x80\x93'
–
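Applied to the situation in the question, a dictionary whose keys are Unicode strings and whose values are lists of Unicode strings, the same encode call can be mapped over the whole structure. This is a sketch under assumptions: the sample data and the helper name encode_all are hypothetical, not taken from the asker's code or from xlrd.

```python
# Hypothetical stand-in for the data read from the spreadsheet.
data = {u'range': [u'1\u20135', u'10\u201320']}


def encode_all(mapping, encoding='utf-8'):
    """Return a copy with every key and every list item encoded to bytes."""
    return dict(
        (key.encode(encoding), [item.encode(encoding) for item in values])
        for key, values in mapping.items()
    )


encoded = encode_all(data)
print(repr(encoded))
```

On Python 2 each encoded value is a plain str, so it can be passed to print or written to a file without triggering the implicit ASCII conversion.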
