Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Converting a byte string to a Unicode string

I have the following code:

a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')

print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)

And output:

<class 'str'> в
<class 'str'> в
<class 'bytes'> b'\\u0432'
<class 'str'> \u0432

Why do I see the character's escape code in the last case, instead of the character itself? How can I convert the byte string to a Unicode string so that printing it shows the character rather than its code?


1 Reply


In strings (or Unicode objects in Python 2), \u has a special meaning: it says, "here comes a Unicode character specified by its code point". Hence u"\u0432" results in the character в.

The b'' prefix tells you this is a sequence of 8-bit bytes. A bytes object holds no Unicode characters, so the \u escape has no special meaning there. Hence, b"\u0432" is just the sequence of the six bytes \, u, 0, 4, 3 and 2.
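One way to check this is to inspect both literals directly. The following is a small sketch; the names text and raw are illustrative and not from the question. The bytes literal is written with a doubled backslash, which yields the same six bytes as b"\u0432" while avoiding the invalid-escape warning that newer Python versions issue for \u inside a bytes literal.

```python
# Illustrative names; they do not appear in the original question.
text = "\u0432"     # str literal: \u0432 is processed as a Unicode escape
raw = b"\\u0432"    # bytes: a literal backslash followed by u, 0, 4, 3, 2

# The str holds a single character whose code point is 0x432.
print(len(text), hex(ord(text)))

# The bytes object holds six separate ASCII bytes.
print(len(raw), list(raw))
```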

Essentially you have an 8-bit string containing not a Unicode character, but the specification of a Unicode character.

You can convert this specification into the actual character by decoding with the unicode_escape codec.

>>> c.decode('unicode_escape')
'в'
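As a rough sketch, the decoding step could be wrapped in a small helper. The name decode_escapes is hypothetical, chosen only for this example. One caveat worth knowing: unicode_escape treats any bytes that are not part of an escape as Latin-1, so it is a reasonable fit only when the input is plain ASCII with escapes, not when it already contains UTF-8 encoded text.

```python
def decode_escapes(raw: bytes) -> str:
    # Hypothetical helper: interpret backslash escapes stored in the bytes.
    return raw.decode("unicode_escape")

c = b"\\u0432"                   # same six bytes as b"\u0432" in the question
d = decode_escapes(c)
print(len(d), d == "\u0432")     # a single character, equal to the str literal

# Caveat: bytes that are already UTF-8 are read as Latin-1 instead.
utf8_bytes = "\u0432".encode("utf-8")   # two bytes: 0xd0 0xb2
mixed = decode_escapes(utf8_bytes)
print(len(mixed), ascii(mixed))         # two characters, not the original one
```

For input that is already UTF-8, plain raw.decode('utf-8') is the appropriate call; unicode_escape is only for text that spells characters out as escapes.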
