Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Why do numbers in a string become "\x0n" when a backslash precedes them?

I was doing a few experiments with escape backslashes in the Python 3.4 shell and noticed something quite strange.

>>> string = "\test\test\1\2\3"
>>> string
'\test\test\x01\x02\x03'
>>> string = "5"
>>> string
'5'
>>> string = "5\6\7"
>>> string
'5\x06\x07'

As you can see in the above code, I defined a variable string as "\test\test\1\2\3". However, when I entered string in the console, instead of printing "\test\test\1\2\3", it printed "\test\test\x01\x02\x03". Why does this occur, and what is it used for?


1 Reply


In Python string literals, the \ character starts an escape sequence. \n translates to a newline character, \t to a tab, and so on. \xhh hex sequences let you produce codepoints from 2-digit hex values instead, \uhhhh sequences produce codepoints from 4-digit hex values, and \Uhhhhhhhh sequences produce codepoints from 8-digit hex values.
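As a small sketch of the escapes listed above (the variable names are only for illustration), each escape sequence ends up as exactly one character in the resulting string, and the three hex forms can all spell the same codepoint:

```python
# Each escape is resolved when the literal is read,
# so every variable below holds a single character.
newline = "\n"
tab = "\t"
hex_a = "\x41"          # 2 hex digits
short_a = "\u0041"      # 4 hex digits
long_a = "\U00000041"   # 8 hex digits

print(len(newline), len(tab))             # 1 1
print(ord(newline), ord(tab))             # 10 9
print(hex_a == short_a == long_a == "A")  # True
```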

See the String and Bytes Literals documentation, which contains a table of all the possible escape sequences.

When Python echoes a string object in the interpreter (or you use the repr() function on a string object), then Python creates a representation of the string value. That representation happens to use the exact same Python string literal syntax, to make it easier to debug your values, as you can use the representation to recreate the exact same value.

To keep non-printable characters from either causing havoc or not being shown at all, Python uses the same escape sequence syntax to represent those characters. Thus characters that are not printable are represented using suitable \xhh sequences or, where possible, one of the \c single-letter escapes (so newlines are shown as \n).
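To illustrate the round trip, here is a minimal sketch; using ast.literal_eval to read the representation back is my choice for the demonstration, and pasting the echoed text into the shell would do the same job:

```python
import ast

value = "a\x01b\nc"        # contains a control character and a newline
shown = repr(value)        # the text the interactive shell would echo

print(shown)               # 'a\x01b\nc'

# The representation is itself a valid string literal,
# so evaluating it recreates the original value.
rebuilt = ast.literal_eval(shown)
print(rebuilt == value)    # True
```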

In your example, you created non-printable characters using the \ooo octal value escape sequence syntax. The digits are interpreted as an octal number to create the corresponding codepoint. When echoing that string value back, the default \xhh syntax is used to represent the exact same value in hexadecimal:

>>> '\20' # Octal for 16
'\x10'

while your \t became a tab character:

>>> print('\test')
    est

Note how there is no letter t there; instead, the remaining est is indented by whitespace, a horizontal tab.
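If you want to check this yourself, a short sketch like the following (the variable name is arbitrary) prints the codepoints behind the octal escapes and confirms that \t collapses into a single character:

```python
s = "\20"                            # octal 20
print(ord(s))                        # 16
print(ord(s) == 0o20 == 0x10)        # True
print(repr(s))                       # '\x10'

# The \1\2\3 from the question are the codepoints 1, 2 and 3.
print([ord(c) for c in "\1\2\3"])    # [1, 2, 3]

# \t is one tab character, so "\test" is 4 characters long, not 5.
print(len("\test"))                  # 4
```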

If you need to include literal backslash characters (\), you need to double them:

>>> '\\test\\1\\2\\3'
'\\test\\1\\2\\3'
>>> print('\\test\\1\\2\\3')
\test\1\2\3
>>> len('\\test\\1\\2\\3')
11

Note that the representation used doubled backslashes! If it didn't, you'd not be able to copy the string and paste it back into Python to recreate the value. Using print() to write the value to the terminal as actual characters (and not as a string representation) shows that there are single backslashes there, and taking the length shows we have just 11 characters in the string, not 15.
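For comparison, a quick sketch that puts the doubled and the single-backslash spellings side by side (the names escaped and unescaped are just for illustration):

```python
escaped = "\\test\\1\\2\\3"   # doubled: four literal backslash characters
unescaped = "\test\1\2\3"     # single: a tab, 'est', then three control characters

print(len(escaped))           # 11
print(len(unescaped))         # 7
print(escaped.count("\\"))    # 4
print(unescaped.count("\\"))  # 0
```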

You can also use a raw string literal. That's just a different syntax; the string objects created from it are the exact same type, with the same value. It is just a different way of spelling out string values. In a raw string literal, backslashes are just backslashes, as long as they are not the last character in the string; most escape sequences do not work in a raw string literal:

>>> r'\test\1\2\3'
'\\test\\1\\2\\3'
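A small sketch of both points, that the raw spelling produces an ordinary str and that a trailing backslash needs special handling (the C:\dir path is a made-up example):

```python
raw = r"\test\1\2\3"
doubled = "\\test\\1\\2\\3"

print(raw == doubled)        # True
print(type(raw) is str)      # True

# A raw literal cannot end in a single backslash: r"C:\dir\" is a syntax error.
# One workaround is to append the final backslash from a regular literal.
trailing = r"C:\dir" + "\\"
print(trailing)              # C:\dir\
```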

Last but not least, if you are creating strings that represent filenames on your Windows system, you could also use forward slashes; most APIs in Windows don't mind and accept both types of slash as separators in the filename:

>>> 'C:/This/is/a/valid/path'
'C:/This/is/a/valid/path'
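The standard library's pathlib (not mentioned above, so treat this as an optional extra) applies the same rule: as a rough sketch, PureWindowsPath treats both separators as equivalent, and it can be tried on any operating system because it never touches the filesystem:

```python
from pathlib import PureWindowsPath

forward = PureWindowsPath("C:/This/is/a/valid/path")
backward = PureWindowsPath(r"C:\This\is\a\valid\path")

print(forward == backward)   # True
print(str(forward))          # C:\This\is\a\valid\path
```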
