Python 3.x makes a clear distinction between two types:
- str = '...' literals = a sequence of Unicode characters (code points; the internal storage is an implementation detail, and before Python 3.3 it was UTF-16 or UTF-32, depending on how Python was compiled)
- bytes = b'...' literals = a sequence of octets (integers between 0 and 255)
If you're familiar with Java or C#, think of str as String and bytes as byte[]. If you're familiar with SQL, think of str as NVARCHAR and bytes as BINARY or BLOB. If you're familiar with the Windows registry, think of str as REG_SZ and bytes as REG_BINARY. If you're familiar with C(++), then forget everything you've learned about char and strings, because A CHARACTER IS NOT A BYTE. The idea that one character equals one byte is long obsolete.
You use str when you want to represent text.
print('שלום עולם')
You use bytes when you want to represent low-level binary data like structs.
import struct
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
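To make the round trip concrete, here is a minimal sketch that packs a float into bytes and unpacks it again. It assumes the same big-endian IEEE 754 double format ('>d') as the line above; the variable names are only illustrative.

```python
import math
import struct

# Pack a float into 8 raw bytes (big-endian IEEE 754 double).
packed = struct.pack('>d', 1.5)

# Unpack the bytes back into a float; unpack always returns a tuple.
(value,) = struct.unpack('>d', packed)

# The 8-byte pattern used above decodes to a NaN.
(nan,) = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')

print(type(packed).__name__, len(packed), value, math.isnan(nan))
```

The point is that struct.pack produces bytes and struct.unpack consumes bytes; no text encoding is involved at any step.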
You can encode a str to a bytes object.
>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'
And you can decode a bytes object into a str.
>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
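A short sketch of why the encoding name matters: the same character becomes different bytes under different encodings, and decoding with the wrong codec fails. The choice of UTF-16-LE and ASCII here is just for illustration.

```python
text = '\u20ac'  # the euro sign

utf8 = text.encode('utf-8')        # three bytes: e2 82 ac
utf16 = text.encode('utf-16-le')   # two bytes: ac 20

# Decoding with the codec that was used to encode restores the text.
assert utf8.decode('utf-8') == text
assert utf16.decode('utf-16-le') == text

# Decoding with a codec that cannot represent these bytes raises an error.
decode_error = None
try:
    utf8.decode('ascii')
except UnicodeDecodeError as exc:
    decode_error = exc

print(utf8, utf16, type(decode_error).__name__)
```

So bytes on their own carry no encoding; you have to know (or be told) which one was used.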
But you can't freely mix the two types.
>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
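The way out of that error is to convert one side explicitly so both operands have the same type. A minimal sketch, assuming UTF-8 is the intended encoding:

```python
bom = b'\xEF\xBB\xBF'
text = 'Text with a UTF-8 BOM'

# Option 1: stay in bytes by encoding the text.
as_bytes = bom + text.encode('utf-8')

# Option 2: stay in text by decoding the bytes.
# The plain 'utf-8' codec keeps the BOM as the character U+FEFF.
as_text = bom.decode('utf-8') + text

print(as_bytes)
print(repr(as_text))
```

Which option is right depends on what you do next: write to a binary file or socket (bytes), or process the content as text (str).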
The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.
>>> b'A' == b'\x41'
True
But I must emphasize, a character is not a byte.
>>> 'A' == b'A'
False
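A small sketch that shows the difference in practice: the number of characters in a str is generally not the number of bytes in its encoded form, and indexing the two types returns different things. The sample string is arbitrary.

```python
s = 'na\u00efve \u20ac'   # "naive" with a diaeresis, a space, and a euro sign
b = s.encode('utf-8')

char_count = len(s)   # counts Unicode code points
byte_count = len(b)   # counts octets after UTF-8 encoding

# Indexing a str gives a one-character str; indexing bytes gives an int.
first_char = s[0]
first_byte = b[0]

print(char_count, byte_count, repr(first_char), first_byte)
```

Only for pure ASCII text encoded as UTF-8 (or ASCII) do the two lengths happen to coincide.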
In Python 2.x
Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there were:
- unicode = u'...' literals = sequence of Unicode characters = 3.x str
- str = '...' literals = sequences of confounded bytes/characters
  - Usually text, encoded in some unspecified encoding.
  - But also used to represent binary data like struct.pack output.
To ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, so that binary strings (which should be bytes in 3.x) can be distinguished from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert the literal to a Unicode string in 3.x.
So yes, b'...' literals in Python have the same purpose that they do in PHP.
Also, just out of curiosity, are there more symbols than the b and u that do other things?
The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
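A quick sketch of both features, with illustrative variable names:

```python
tab = '\t'    # one character: a tab
raw = r'\t'   # two characters: a backslash followed by t

# Triple quotes let the literal span several lines;
# the line break becomes part of the string.
multi = """first line
second line"""

print(len(tab), len(raw))
print(multi.splitlines())
```

Raw strings are mostly useful for regular expressions and Windows paths, where backslashes would otherwise need doubling.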