Yes, Java uses a UTF-16-esque encoding for internal representations of Strings, and, yes, it encodes characters outside the Basic Multilingual Plane (BMP) using the surrogacy scheme.
If you know you'll be dealing with characters outside the BMP, then here is the canonical way to iterate over the characters of a Java String:
final int length = s.length();
for (int offset = 0; offset < length; ) {
final int codepoint = s.codePointAt(offset);
// do something with the codepoint
offset += Character.charCount(codepoint);
}
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…