Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - What does the 'b' character do in front of a string literal?

Apparently, the following is valid syntax:

my_string = b'The string'

I would like to know:

  1. What does this b character in front of the string mean?
  2. What are the effects of using it?
  3. What are appropriate situations to use it?

I found a related question right here on SO, but that question is about PHP. It states that the b prefix marks a string as binary rather than Unicode, which was needed for code written for PHP < 6 to stay compatible when migrating to PHP 6. I don't think this applies to Python.

I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn't mention the b character anywhere in that document.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

  asked by Jesse Webb, translated from SO

1 Reply

Python 3.x makes a clear distinction between the types:

  • str = '...' literals = a sequence of Unicode characters (stored internally as UTF-16 or UTF-32 in Python 3.0-3.2, depending on how Python was compiled; CPython 3.3 and later choose a compact representation per string)
  • bytes = b'...' literals = a sequence of octets (integers between 0 and 255)
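
The two types can be compared side by side with a minimal sketch. It uses only built-ins, and the variable names are illustrative rather than taken from the answer:

```python
# Illustrative names; only built-in behaviour is shown.
text = 'abc'      # str: a sequence of Unicode characters
data = b'abc'     # bytes: a sequence of integers in range(256)

text_type = type(text).__name__   # 'str'
data_type = type(data).__name__   # 'bytes'

# Indexing a str yields a one-character str; indexing bytes yields an int.
first_char = text[0]   # 'a'
first_byte = data[0]   # 97

# Iterating over bytes yields the octet values.
octets = list(data)    # [97, 98, 99]
```

The int-versus-str result of indexing is usually the first place where the difference shows up in real code.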

If you're familiar with Java or C#, think of str as String and bytes as byte[]. If you're familiar with SQL, think of str as NVARCHAR and bytes as BINARY or BLOB. If you're familiar with the Windows registry, think of str as REG_SZ and bytes as REG_BINARY. If you're familiar with C(++), then forget everything you've learned about char and strings, because A CHARACTER IS NOT A BYTE. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

import struct
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
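
As a hedged illustration of why struct works with bytes, the sketch below packs a float into eight octets and unpacks it again. The names `packed`, `unpacked`, `nan_value`, and `is_nan` are made up for this example:

```python
import math
import struct

# '>d' means a big-endian IEEE 754 double (8 bytes).
packed = struct.pack('>d', 1.5)            # a bytes object of length 8
unpacked = struct.unpack('>d', packed)[0]  # back to the float 1.5

# The byte pattern used in the answer's NaN example decodes to a NaN.
nan_value = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
is_nan = math.isnan(nan_value)
```

Both struct.pack and struct.unpack deal in bytes rather than str, because the data is a raw memory layout and not text.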

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
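
A short round-trip sketch makes the relationship concrete. The variable names are illustrative, and the choice of UTF-16-LE and Latin-1 as comparison codecs is an assumption made for this example:

```python
euro = '€'                         # one Unicode character, U+20AC

utf8 = euro.encode('utf-8')        # three bytes: 0xE2 0x82 0xAC
utf16 = euro.encode('utf-16-le')   # two bytes: 0xAC 0x20

char_count = len(euro)   # number of characters
utf8_len = len(utf8)     # number of bytes in UTF-8
utf16_len = len(utf16)   # number of bytes in UTF-16-LE

# Decoding with the same codec restores the original text.
round_trip = utf8.decode('utf-8')

# Decoding with a different codec does not fail here, but yields wrong text.
mojibake = utf8.decode('latin-1')
```

The same character occupies a different number of bytes in each encoding, which is why bytes only become text once a codec is named.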

But you can't freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat str to bytes
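
One way around the error, sketched here under the assumption that the text is meant to be UTF-8, is to convert explicitly so that both operands have the same type. The names `bom`, `as_bytes`, `as_text`, and `mixed_ok` are illustrative:

```python
bom = b'\xEF\xBB\xBF'
text = 'Text with a UTF-8 BOM'

# Option 1: encode the text and stay in bytes.
as_bytes = bom + text.encode('utf-8')

# Option 2: decode the bytes and stay in str (the result starts with U+FEFF).
as_text = bom.decode('utf-8') + text

# Mixing the two without a conversion raises TypeError.
try:
    bom + text
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Which option fits depends on whether the result is headed for a file or socket (bytes) or for display and text processing (str).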

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'\x41'
True

But I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False
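
The following sketch puts both comparisons next to the integer value behind the literal. The variable names are illustrative:

```python
# b'A' is just another spelling of the single byte 0x41.
same_literal = b'A' == b'\x41'
from_int = bytes([0x41])     # builds the same one-byte object from an int

byte_value = b'A'[0]         # 65, an int
code_point = ord('A')        # 65, the code point of the character

# A str never compares equal to a bytes object, even for ASCII content.
cross_type = 'A' == b'A'

# After an explicit decode the comparison is str against str.
matches_after_decode = b'A'.decode('ascii') == 'A'
```

The numeric values coincide for ASCII, but the objects only compare equal after an explicit encode or decode.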

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

  • unicode = u'...' literals = sequence of Unicode characters = 3.x str
  • str = '...' literals = sequences of confounded bytes/characters
    • Usually text, encoded in some unspecified encoding.
    • But also used to represent binary data like struct.pack output.

To ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6 so that binary strings (which should be bytes in 3.x) can be distinguished from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.

"Also, just out of curiosity, are there more symbols than the b and u that do other things?"

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
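
A brief sketch of these forms follows. The variable names are illustrative, and the combined rb prefix is an addition not mentioned in the answer (it is accepted by Python 3.3 and later):

```python
escaped = '\t'    # one character: a tab
raw = r'\t'       # two characters: a backslash and the letter t

escaped_len = len(escaped)
raw_len = len(raw)

# Triple quotes let a literal span several lines.
multi = """line one
line two"""
line_count = len(multi.splitlines())

# Assumption beyond the answer: r and b can be combined into a raw bytes literal.
raw_bytes = rb'\x41'   # four bytes: backslash, x, 4, 1 (not the single byte 0x41)
```

Raw literals are mostly useful for regular expressions and Windows paths, where backslashes should reach the consumer untouched.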

