Let's imagine I have a UTF-8 encoded std::string containing the following:
óó
and I'd like to convert it to the following:
ÓÓ
Ideally I want the uppercase/lowercase approach I'm using to be generic across all of UTF-8, if that's even possible.
The original byte sequence in the string is 0xc3b3c3b3 (two bytes per character, and two instances of ó), and I'd like the output to be 0xc393c393 (two instances of Ó). There are some examples on StackOverflow, but they use wide character strings, and other answers say you shouldn't be using wide character strings for UTF-8. It also appears that this problem can be very "tricky" in that the output might depend on the user's locale.
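As a sanity check on the byte sequences quoted above, here is a minimal sketch of how a two-byte UTF-8 sequence maps to a code point. The helper name decode_two_byte is hypothetical (it is not a standard library function), and it assumes well-formed input of exactly two bytes.

```cpp
#include <cstdint>

// Hypothetical helper: decode a well-formed two-byte UTF-8 sequence.
// Bit layout: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
constexpr std::uint32_t decode_two_byte(unsigned char lead, unsigned char trail) {
    return (static_cast<std::uint32_t>(lead & 0x1F) << 6) |
           static_cast<std::uint32_t>(trail & 0x3F);
}

// 0xC3 0xB3 decodes to U+00F3 (ó); 0xC3 0x93 decodes to U+00D3 (Ó).
static_assert(decode_two_byte(0xC3, 0xB3) == 0x00F3, "lowercase o with acute");
static_assert(decode_two_byte(0xC3, 0x93) == 0x00D3, "uppercase O with acute");
```

So the conversion I'm after is U+00F3 to U+00D3 at the code-point level. In this particular case only the second byte of each pair changes (0xb3 becomes 0x93).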
I was expecting to just use something like std::toupper(), but the usage is really unclear to me because it seems like I'm not just converting one character at a time but an entire string. Also, this Ideone example I put together seems to show that toupper() of 0xc3b3 is just 0xc3b3, which is an unexpected result. Calling setlocale to either UTF-8 or ISO8859-1 doesn't appear to change the outcome.
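The observation above can be reproduced with a sketch like the following. This is not the Ideone example itself (which is not shown here), and naive_upper is a hypothetical helper name. It assumes the program runs in the default "C" locale, where std::toupper only maps the ASCII letters a-z.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: apply std::toupper to every byte of the string,
// the way a naive per-char loop would.
std::string naive_upper(std::string s) {
    for (char& c : s) {
        // Cast through unsigned char: passing a negative char value to
        // std::toupper is undefined behaviour.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

With this helper, naive_upper("abc") gives "ABC", but the bytes 0xc3 0xb3 0xc3 0xb3 come back unchanged. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a per-byte std::toupper never sees a whole character to map. Under ISO8859-1 those same bytes would presumably be read as Ã (already uppercase) and ³, which would also be left alone. That suggests case mapping has to work on decoded code points rather than on individual bytes.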
I'd love some guidance if you could shed some light on either what I'm doing wrong or why my question/premise is faulty!