There is an inconsistency when creating a String with UTF-8 encoding.
Run this code:
public static void encodingIssue() throws IOException {
    // 0xED 0xBB 0x9C written as signed byte values
    byte[] array = new byte[3];
    array[0] = (byte) -19;
    array[1] = (byte) -69;
    array[2] = (byte) -100;
    String str = new String(array, "UTF-8");
    // Print the numeric value of each decoded char
    for (char c : str.toCharArray()) {
        System.out.println((int) c);
    }
}
On Java 1.8.0_20 (and earlier Java 8 builds) the output is:
65533
On Java 1.7 and 1.6 we get the result I expected:
57052
Have you encountered this error? Is there a workaround for this?
The same inconsistency also shows up with Shift_JIS, JIS_X0212-1990, x-IBM300, x-IBM834, x-IBM942, x-IBM942C and x-JIS0208, but UTF-8 is obviously the most urgent case.
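For reference, here is a minimal sketch of where the two numbers may come from. It combines the payload bits of the three bytes by hand, which gives 57052 (0xDEDC, a value in the low-surrogate range), and then asks a strict decoder whether it accepts the bytes at all. The class and method names (Utf8SurrogateDemo, rawThreeByteValue, isWellFormedUtf8) are hypothetical and only for illustration; the result of the strict check depends on the JDK in use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class for illustration; not part of the code in the question.
final class Utf8SurrogateDemo {

    // Combines the payload bits of a three-byte UTF-8 sequence
    // (1110xxxx 10xxxxxx 10xxxxxx) without any validity check.
    static int rawThreeByteValue(byte b0, byte b1, byte b2) {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    // Returns true if the JDK's UTF-8 decoder, configured to report errors
    // instead of substituting U+FFFD, accepts the bytes.
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] array = {(byte) -19, (byte) -69, (byte) -100};
        int value = rawThreeByteValue(array[0], array[1], array[2]);
        System.out.println(value);                                  // 57052
        System.out.println(Character.isLowSurrogate((char) value)); // true
        // Result depends on the JDK version running this code.
        System.out.println(isWellFormedUtf8(array));
    }
}
```

If the strict check reports the bytes as malformed, 65533 (U+FFFD, the replacement character) is what a lenient decode such as new String(array, "UTF-8") would be expected to substitute.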