The JSON RFC (RFC 4627), section 2.5, says in part:
To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".
Assume I have a valid reason to encode JSON as UTF-16BE (which is allowed). When doing so, is it still necessary to escape characters that are not in the Basic Multilingual Plane? E.g., instead of this:
00 5C 00 75 00 44 00 38 00 33 00 34 00 5C 00 75 00 44 00 44 00 31 00 45
   \     u     D     8     3     4     \     u     D     D     1     E
which is the 24-byte UTF-16BE byte sequence for \uD834\uDD1E, is it legal to do this:
D8 34 DD 1E
i.e., use the 4-byte UTF-16BE values directly?
Similarly, if I were to encode the same JSON string as UTF-32BE, could I simply use the code-point value directly:
00 01 D1 1E
?
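To make the three byte sequences concrete, here is a minimal Python sketch that reproduces them. The names `clef`, `escaped`, and `hex_bytes` are my own, chosen for illustration. The last part only shows what one particular parser (Python's `json` module) accepts; it is not evidence of what the RFC requires.

```python
import json

clef = "\U0001D11E"          # G clef, U+1D11E (outside the BMP)
escaped = "\\uD834\\uDD1E"   # the twelve-character escape quoted from the RFC


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


escaped_utf16 = hex_bytes(escaped.encode("utf-16-be"))
direct_utf16 = hex_bytes(clef.encode("utf-16-be"))
direct_utf32 = hex_bytes(clef.encode("utf-32-be"))

print(escaped_utf16)  # 00 5C 00 75 00 44 00 38 00 33 00 34 00 5C 00 75 00 44 00 44 00 31 00 45
print(direct_utf16)   # D8 34 DD 1E
print(direct_utf32)   # 00 01 D1 1E

# What one parser does (an observation, not a statement about the spec):
# wrap the character in quotes to form a complete JSON string document.
doc = '"' + clef + '"'
from_escape = json.loads('"' + escaped + '"')
from_utf16 = json.loads(doc.encode("utf-16-be"))
from_utf32 = json.loads(doc.encode("utf-32-be"))
print(from_escape == from_utf16 == from_utf32 == clef)  # True
```

In this sketch the escaped form and the two direct encodings all decode to the same single character, which is the equivalence I am asking about.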