Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - What is the difference in an XOR function between Python 2 and Python 3?

I have two strings:

string1 = "\xc5\x06\x92\xd0\x02k=\x91"
string2 = "qwert\0\0\0"

and a function:

def xor(str1,str2):
    ret = ''
    for i in range(8):
        ret += chr(ord(str1[i]) ^ ord(str2[i]))
    return ret

The result of the above function is:

in python2.7: �q��vk=� ; in hex: ef bf bd 71 ef bf bd ef bf bd 76 6b 3d ef bf bd

in python3.6: ´q÷¢vk=\x91 ; in hex: b4 71 f7 a2 76 6b 3d 91

I suppose this is connected with the fact that in python2 the str type is limited to ASCII, but how can I get the same value in both versions?



1 Reply


It is the same value in both versions. You're just printing it in a locale that doesn't support some of the characters, so the Unicode replacement character is displayed in their place. The ef bf bd sequences in your output mark where an unrecognized character became the replacement character; whatever you used to convert to bytes then encoded each replacement character as UTF-8.

When the locale is correct and your terminal and font can handle the result, it works identically on Python 2 and Python 3. The only real difference is that Python 3 has somewhat saner behaviors under some locales (e.g. the Windows console using UTF-8 automatically in 3.6, legacy C locale coercion in 3.7). You got the same string either way; it's the outputting and displaying step that produces the wrong result while trying to avoid unencodable characters.
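To check this without involving the terminal at all, compare ordinals instead of printed glyphs. The sketch below assumes the question's strings are the escaped literals "\xc5\x06\x92\xd0\x02k=\x91" and "qwert\0\0\0". The second half is only one plausible reconstruction of how the ef bf bd bytes could arise (something in the output path reading the raw bytes as UTF-8 and substituting U+FFFD); the actual tool involved is not known.

```python
def xor(str1, str2):
    # Same logic as the question's function: XOR the ordinals pairwise.
    ret = ''
    for i in range(8):
        ret += chr(ord(str1[i]) ^ ord(str2[i]))
    return ret

string1 = "\xc5\x06\x92\xd0\x02k=\x91"
string2 = "qwert\0\0\0"
result = xor(string1, string2)

# Ordinals do not depend on locale, terminal, or font.
ordinals = [ord(ch) for ch in result]
print(' '.join('%02x' % n for n in ordinals))
# b4 71 f7 a2 76 6b 3d 91

# Assumption: the raw bytes were decoded as UTF-8 with invalid bytes
# replaced by U+FFFD, then re-encoded as UTF-8 for the hex dump.
raw = bytes(bytearray(ordinals))
replaced = raw.decode('utf-8', 'replace').encode('utf-8')
print(' '.join('%02x' % b for b in bytearray(replaced)))
# ef bf bd 71 ef bf bd ef bf bd 76 6b 3d ef bf bd
```

The first line of output is the same on Python 2.7 and Python 3, and the second matches the "python2.7" hex dump in the question.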

To be clear, Python 2 str is not limited to ASCII. In terms of what it can hold, it's equivalent to Python 3 bytes; both can hold arbitrary values in the range [0, 256). The literals differ (Py2 allows non-ASCII characters in a literal without escapes, though without a file encoding declaration, it's not portable), but Py2 str can hold '\xff' just like Py3 bytes can hold b'\xff'.
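Since Py2 str and Py3 bytes hold the same kind of data, one way to get byte-for-byte identical behavior is to work on bytes explicitly. This is a minimal sketch, not the only approach; xor_bytes is a hypothetical helper name, and the inputs are assumed to be the question's strings written as b'' literals.

```python
import binascii

def xor_bytes(data1, data2):
    # bytearray yields ints in range(256) on both Python 2 and Python 3,
    # so no ord()/chr() round trip is needed.
    pairs = zip(bytearray(data1), bytearray(data2))
    return bytes(bytearray(a ^ b for a, b in pairs))

string1 = b"\xc5\x06\x92\xd0\x02k=\x91"
string2 = b"qwert\0\0\0"

result = xor_bytes(string1, string2)
hex_view = binascii.hexlify(result)
print(hex_view)  # b471f7a2766b3d91 (shown with a b'' prefix on Python 3)
```

Comparing or logging the hexlify output avoids any dependence on how the terminal renders non-ASCII bytes.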

Note that your code often won't work identically when the str contains characters outside the ASCII range that aren't inserted using escapes: in Python 2, what non-ASCII characters in a string literal mean depends on the file's encoding declaration. It definitely won't work the same for text outside latin-1 (those characters have ordinals of 256 or larger in Py3, and who knows what in Py2) unless the inputs are of unicode type in Python 2 (e.g. for literals, prefixed with u).
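A small illustration of that caveat, using u-prefixed literals with escapes so the snippet means the same thing on both versions. The two sample characters are arbitrary choices for the example, not taken from the question.

```python
e_acute = u"\u00e9"   # inside latin-1
euro = u"\u20ac"      # outside latin-1

# As text, each is a single code point.
e_ordinal = ord(e_acute)    # 233, fits in one byte
euro_ordinal = ord(euro)    # 8364, too large for chr() on a Py2 str

# The byte values depend on the encoding. An unprefixed Py2 literal in a
# UTF-8 source file would hold the UTF-8 bytes, not the latin-1 byte.
e_latin1 = list(bytearray(e_acute.encode('latin-1')))
e_utf8 = list(bytearray(e_acute.encode('utf-8')))

# Text outside latin-1 has no single-byte latin-1 form at all.
try:
    euro.encode('latin-1')
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

print(e_ordinal, euro_ordinal, e_latin1, e_utf8, euro_fits_latin1)
```

XORing ordinals of such text therefore gives different results depending on whether the input is a byte string in some encoding or a unicode string.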

