I've read in some places that JavaScript strings are UTF-16, and in other places they're UCS-2. I did some searching around to try to figure out the difference and found this:
Q: What is the difference between UCS-2 and UTF-16?
A: UCS-2 is obsolete terminology which refers to a Unicode
implementation up to Unicode 1.1, before surrogate code points and
UTF-16 were added to Version 2.0 of the standard. This term should now
be avoided.
UCS-2 does not define a distinct data format, because UTF-16 and UCS-2
are identical for purposes of data exchange. Both are 16-bit, and have
exactly the same code unit representation.
Sometimes in the past an implementation has been labeled "UCS-2" to
indicate that it does not support supplementary characters and doesn't
interpret pairs of surrogate code points as characters. Such an
implementation would not handle processing of character properties,
code point boundaries, collation, etc. for supplementary characters.
via: http://www.unicode.org/faq/utf_bom.html#utf16-11
So my question is, is it because the JavaScript string object's methods and indexes act on 16-bit data values instead of characters what make some people consider it UCS-2? And if so, would a JavaScript string object oriented around characters instead of 16-bit data chunks be considered UTF-16? Or is there something else I'm missing?
Edit: As requested, here are some sources saying JavaScript strings are UCS-2:
http://blog.mozilla.com/nnethercote/2011/07/01/faster-javascript-parsing/
http://terenceyim.wordpress.com/tag/ucs2/
EDIT: For anyone who may come across this, be sure to check out this link:
http://mathiasbynens.be/notes/javascript-encoding
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…