Recently I have been reading a lot about Unicode code points and how they evolved over time, and of course I read http://www.joelonsoftware.com/articles/Unicode.html as well.
But the one thing I couldn't find a real reason for is why Java uses UTF-16 for a char.
For example, suppose I have a string of 1024 characters that are all in the ASCII range. With UTF-16 that means 1024 * 2 bytes = 2 KB of memory, no matter what.
If Java's base char were UTF-8 instead, that same string would be just 1 KB of data. Even if the string contains some characters that need more than one byte, say 10 occurrences of "字" (which takes 3 bytes in UTF-8), the memory consumption only grows slightly: (1014 * 1 byte) + (10 * 3 bytes) = 1044 bytes, i.e. roughly 1 KB + 20 bytes.
The difference isn't exactly subtle: roughly 1 KB + 20 bytes vs. 2 KB.
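To make the comparison concrete, here is a small sketch (assuming Java 11+ for String.repeat) that builds such a string and prints its encoded size in both charsets; UTF_16LE is used so that no byte-order mark inflates the count:

    import java.nio.charset.StandardCharsets;

    public class EncodingSizeDemo {
        public static void main(String[] args) {
            // 1014 ASCII characters plus 10 copies of "字" -> 1024 characters total
            String s = "a".repeat(1014) + "字".repeat(10);

            byte[] utf8  = s.getBytes(StandardCharsets.UTF_8);
            byte[] utf16 = s.getBytes(StandardCharsets.UTF_16LE); // LE variant: no byte-order mark

            System.out.println("UTF-8 bytes:  " + utf8.length);  // 1014*1 + 10*3 = 1044
            System.out.println("UTF-16 bytes: " + utf16.length); // 1024*2        = 2048
        }
    }

Running this prints 1044 bytes for UTF-8 and 2048 bytes for UTF-16, which matches the arithmetic above.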
My point isn't really about ASCII; what I'm curious about is why Java didn't go with UTF-8, which handles multi-byte characters just as well. UTF-16 looks like a waste of memory for any string that is mostly made up of single-byte characters.
Is there any good reason behind this?