According to the Java SE 7 Specification, Java uses the Unicode UTF-16 standard to represent characters.
When imagining a String as a simple array of 16-bit variables, each containing one character, life is simple.
Unfortunately, there are code points for which 16 bits simply aren't enough (I believe this applies to 16/17 of the Unicode code space). In a String this poses no direct problem: to store one of these ~1,048,576 code points, which need an additional two bytes, two array positions in that String are simply used.
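To make the "two array positions" point concrete, here is a minimal sketch using only standard java.lang calls. The class name SurrogatePairDemo, the method buildString, and the choice of U+1F600 as the sample code point are my own assumptions for illustration, not anything taken from the specification text.

```java
public class SurrogatePairDemo {
    // Assumption for illustration: U+1F600 lies outside the Basic
    // Multilingual Plane, so it does not fit into one 16-bit code unit.
    static final int SUPPLEMENTARY_CODE_POINT = 0x1F600;

    // Builds a String that holds exactly one supplementary code point.
    static String buildString() {
        return new String(Character.toChars(SUPPLEMENTARY_CODE_POINT));
    }

    public static void main(String[] args) {
        String s = buildString();

        // length() counts 16-bit code units, not code points.
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1

        // The two array positions hold a high and a low surrogate.
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1)); // D83D DE00
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true

        // Reading the pair back as one code point yields the original value.
        System.out.println(s.codePointAt(0) == SUPPLEMENTARY_CODE_POINT); // true
    }
}
```

The String reports a length of 2 even though it contains a single code point, which is exactly the "two array positions" behaviour described above.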
This works for Strings because there can always be an additional two bytes. A single variable, however, has a fixed length of 16 bits, in contrast to the variable-length UTF-16 encoding. How can these characters be stored there, and in particular, how does Java do it with its 2-byte "char" type?