Here is a beginner question on Unicode. I'm using Embarcadero C++ Builder 2009, where they supposedly changed the default strings to use Unicode.
- I type various symbols in my source editor that aren't part of standard 7-bit ASCII.
- My program uses the String type of C++ Builder to fetch user input.
- I am also adding input manually by assigning a value to a wchar_t.
There seem to be conflicts in how the symbols are interpreted. Sometimes I get a symbol with, for example, the code 0x00C7 ('Ç'), but sometimes the same symbol is coded as 0xFFC7, for example in the source code editor. To my understanding, the former is proper Unicode and the latter is "something else". Can someone confirm this?
I wonder where this "something else" encoding is coming from, and how to get rid of it?
EDIT: Further research: it seems that one place where the 0xFF** encoding appears is when I do something like this:
string str = ...;
wchar_t wch = (wchar_t)str[i];
The result is the same whether it is std::string or VCL String. Is wchar_t not the same as Unicode?
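One possible explanation for the 0xFF** values is sign extension: if char is signed, the byte 0xC7 holds a negative value, and casting it straight to a wider type fills the upper bits with ones. Below is a minimal sketch of that effect. It assumes a signed 8-bit char on a two's-complement target and uses std::uint16_t as a stand-in for a 16-bit wchar_t; the helper names are made up for illustration and are not part of the VCL or the standard library.

```cpp
#include <cstdint>

// Hypothetical helper: widen a char the way a plain (wchar_t) cast does
// when char is signed. The negative value is sign-extended.
constexpr std::uint16_t widen_sign_extended(signed char c) {
    return static_cast<std::uint16_t>(c);
}

// Hypothetical helper: go through unsigned char first, so the byte is
// zero-extended instead.
constexpr std::uint16_t widen_zero_extended(signed char c) {
    return static_cast<std::uint16_t>(static_cast<unsigned char>(c));
}

// Byte 0xC7 stored in a signed char (value -57 on two's-complement targets).
constexpr signed char kByte = static_cast<signed char>(0xC7);

// Checked at compile time, under the assumptions stated above.
static_assert(widen_sign_extended(kByte) == 0xFFC7, "sign extension yields 0xFFC7");
static_assert(widen_zero_extended(kByte) == 0x00C7, "zero extension yields 0x00C7");
```

If this is what is happening, casting through unsigned char first would remove the 0xFF prefix. The resulting value only matches the Unicode code point when the narrow string is Latin-1 (or agrees with it for that character); other encodings would need a real conversion rather than a cast.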