c# - Encoding used in cast from char to byte

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Take a look at the following C# code (function extracted from the BuildProtectedURLWithValidity function in http://wmsauth.org/examples):

byte[] StringToBytesToBeHashed(string to_be_hashed) {
    byte[] to_be_hashed_byte_array = new byte[to_be_hashed.Length];
    int i = 0;
    foreach (char cur_char in to_be_hashed)
    {
        to_be_hashed_byte_array[i++] = (byte)cur_char;
    }
    return to_be_hashed_byte_array;
}

My question is: what does the cast from char to byte do in terms of encoding?

I guess it does nothing in terms of encoding, but does that mean Encoding.Default is used, so that the returned bytes depend on how the framework encodes the underlying string on the specific operating system?

Also, is a char actually bigger than a byte (I'm guessing 2 bytes), and if so, does the cast simply drop one of those bytes?

I was thinking of replacing all this with:

Encoding.UTF8.GetBytes(stringToBeHashed)

What do you think?

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:30:41+0000

The .NET Framework uses Unicode to represent all its characters and strings. The integer value of a char (which you may obtain by casting to int) is equivalent to its UTF-16 code unit. For characters in the Basic Multilingual Plane (which constitute the majority of characters you'll ever encounter), this value is the Unicode code point.

As the Char Structure documentation puts it: "The .NET Framework uses the Char structure to represent a Unicode character. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure. The value of a Char object is its 16-bit numeric (ordinal) value."
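To make the size question concrete, here is a minimal sketch of what a char holds. The class and method names (CharSizeDemo, CodeUnitCount) are made up for illustration, and the G clef character is just an arbitrary example of a code point outside the Basic Multilingual Plane.

```csharp
using System;

static class CharSizeDemo
{
    // Hypothetical helper: the number of 16-bit UTF-16 code units in a string.
    public static int CodeUnitCount(string s)
    {
        return s.Length;
    }

    static void Main()
    {
        // A char is one 16-bit UTF-16 code unit, so it occupies 2 bytes.
        Console.WriteLine(sizeof(char));                    // 2

        // For a BMP character the numeric value equals the code point.
        char bmp = '\u0144';                                // 'ń'
        Console.WriteLine((int)bmp);                        // 324

        // A character outside the BMP needs two chars (a surrogate pair).
        string clef = "\U0001D11E";                         // MUSICAL SYMBOL G CLEF
        Console.WriteLine(CodeUnitCount(clef));             // 2
        Console.WriteLine(char.IsHighSurrogate(clef[0]));   // True
        Console.WriteLine(char.ConvertToUtf32(clef, 0));    // 119070 (0x1D11E)
    }
}
```

So a char is indeed 2 bytes, and a single user-visible character can even span two chars.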

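Casting a char to byte involves no encoding at all, so Encoding.Default plays no part: in the default unchecked context the cast simply keeps the low 8 bits of the 16-bit value. That means data loss for any character whose value is larger than 255. Try running the following simple example to understand why: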

char c1 = 'D';        // code point 68
byte b1 = (byte)c1;   // b1 is 68

char c2 = 'ń';        // code point 324
byte b2 = (byte)c2;   // b2 is 68 too!
                      // 324 % 256 == 68

Yes, you should definitely use Encoding.UTF8.GetBytes instead.
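As a rough side-by-side comparison, the sketch below runs the cast-based loop and Encoding.UTF8.GetBytes on the same inputs. CastToBytes is a hypothetical stand-in for the function in the question, and the sample strings are arbitrary. One assumption worth checking before switching: the hash only matches if the party verifying it derives its bytes the same way. For pure ASCII input the two approaches produce identical bytes, but they diverge for anything above 127.

```csharp
using System;
using System.Text;

static class ByteConversionDemo
{
    // Hypothetical stand-in for the question's loop:
    // keeps only the low 8 bits of each UTF-16 code unit.
    public static byte[] CastToBytes(string s)
    {
        byte[] result = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            result[i] = unchecked((byte)s[i]);
        }
        return result;
    }

    static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        string ascii = "abc123";
        string latin = "\u00E9";   // 'é', value 233
        string wide = "\u0144";    // 'ń', value 324

        // ASCII: both approaches agree.
        Console.WriteLine(Hex(CastToBytes(ascii)));            // 61-62-63-31-32-33
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(ascii))); // 61-62-63-31-32-33

        // 128..255: the cast keeps the value, UTF-8 uses two bytes.
        Console.WriteLine(Hex(CastToBytes(latin)));            // E9
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(latin))); // C3-A9

        // Above 255: the cast silently loses data, UTF-8 does not.
        Console.WriteLine(Hex(CastToBytes(wide)));             // 44
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(wide)));  // C5-84
    }
}
```

If the input can only ever be ASCII, the replacement changes nothing about the resulting hash; otherwise UTF-8 at least preserves every character.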
