Encoding.Unicode
is Microsoft's misleading name for UTF-16 (a double-wide encoding, used in the Windows world for historical reasons but not used by anyone else). http://msdn.microsoft.com/en-us/library/system.text.encoding.unicode.aspx
If you inspect your bytes
array, you'll see that every second byte is 0x00
(because of the double-wide encoding).
You should be using Encoding.UTF8.GetBytes
instead.
But also, you will see different results depending on whether or not you consider the terminating ''
byte to be part of the data you're hashing. Hashing the two bytes "Hi"
will give a different result from hashing the three bytes "Hi"
. You'll have to decide which you want to do. (Presumably you want to do whichever one your friend's PHP code is doing.)
For ASCII text, Encoding.UTF8
will definitely be suitable. If you're aiming for perfect compatibility with your friend's code, even on non-ASCII inputs, you'd better try a few test cases with non-ASCII characters such as é
and 家
and see whether your results still match up. If not, you'll have to figure out what encoding your friend is really using; it might be one of the 8-bit "code pages" that used to be popular before the invention of Unicode. (Again, I think Windows is the main reason that anyone still needs to worry about "code pages".)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…