python - Test whether a string is Unicode, which UTF standard it uses, and get its length in bytes?

Question

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:45:29+0000

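# Python 2: `string` is assumed to be a byte string (type str), not unicode.
try:
    string.decode('utf-8')  # raises UnicodeDecodeError if the bytes are not valid UTF-8
    print "string is UTF-8, length %d bytes" % len(string)  # len() of a str counts bytes
except UnicodeError:
    print "string is not UTF-8"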
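For comparison, here is a minimal Python 3 sketch of the same check. It assumes the input is already a bytes object (in Python 3, str has no .decode method, so passing a str would raise AttributeError). The helper name utf8_byte_length is hypothetical, not part of any library.

```python
from typing import Optional


def utf8_byte_length(data: bytes) -> Optional[int]:
    """Hypothetical helper: return len(data) if data is valid UTF-8, else None."""
    try:
        data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    except UnicodeDecodeError:
        return None
    return len(data)  # len() of a bytes object counts bytes, not characters


print(utf8_byte_length("é".encode("utf-8")))  # 2
print(utf8_byte_length(b"\xff\xfe"))           # None: 0xFF never appears in UTF-8
```

Returning None rather than printing keeps the check reusable; the caller decides what to report.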

In Python 2, str is a sequence of bytes and unicode is a sequence of characters. You use str.decode to decode a byte sequence into unicode, and unicode.encode to encode a sequence of characters into str. For example, u"é" is the unicode string containing the single character U+00E9, which can also be written u"\xe9"; encoding it into UTF-8 gives the two-byte sequence "\xc3\xa9".
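The é example can be checked interactively. This sketch is written in Python 3 syntax, where a plain "..." literal is Unicode text like Python 2's u"..." and encode produces a bytes object; under Python 2 you would add the u prefix and use the print statement.

```python
text = "\xe9"                      # the single character U+00E9, same as "é"
assert text == "é" and len(text) == 1

encoded = text.encode("utf-8")     # characters -> bytes
print(encoded)                     # b'\xc3\xa9'
print(len(encoded))                # 2 bytes for 1 character

decoded = encoded.decode("utf-8")  # bytes -> characters
print(decoded == text)             # True: the round trip is lossless
```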

In Python 3 this changed: bytes is a sequence of bytes and str is a sequence of characters, so decode is a method of bytes and encode is a method of str.
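Because a Python 3 str counts characters, its "length in bytes" only exists relative to a chosen encoding. A small sketch (the sample text and codec list are illustrative choices) shows the same two characters taking a different number of bytes under each UTF standard. The little-endian codec names are used so Python does not prepend a byte order mark; plain "utf-16" and "utf-32" would add a 2- or 4-byte BOM.

```python
text = "é€"  # two characters: U+00E9 and U+20AC
print(len(text))  # 2 characters

# Encode the same text with several UTF encodings and record the byte counts.
lengths = {codec: len(text.encode(codec))
           for codec in ("utf-8", "utf-16-le", "utf-32-le")}
print(lengths)  # {'utf-8': 5, 'utf-16-le': 4, 'utf-32-le': 8}
```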
