string - How to uppercase/lowercase UTF-8 characters in C++?

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Let's imagine I have a UTF-8 encoded std::string containing the following:

óó

and I'd like to convert it to the following:

ÓÓ

Ideally I want the uppercase/lowercase approach I'm using to be generic across all of UTF-8, if that's even possible.

The original byte sequence in the string is 0xc3b3c3b3 (two bytes per character, and two instances of ó) and I'd like the output to be 0xc393c393 (two instances of Ó). There are some examples on StackOverflow, but they use wide character strings, and other answers say you shouldn't be using wide character strings for UTF-8. It also appears that this problem can be very "tricky" in that the output might depend on the user's locale.

I was expecting to just use something like std::toupper(), but the usage is really unclear to me because it seems like I'm not just converting one character at a time but an entire string. Also, this Ideone example I put together seems to show that toupper() of 0xc3b3 is just 0xc3b3, which is an unexpected result. Calling setlocale to either UTF-8 or ISO8859-1 doesn't appear to change the outcome.

I'd love some guidance if you could shed some light on either what I'm doing wrong or why my question/premise is faulty!
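The toupper() result described above can be reproduced without the Ideone link. The following is a minimal sketch, assuming the program is still in the default "C" locale it starts with; `upper_bytewise` is a hypothetical helper name, not something from the original post. It applies std::toupper to each byte in turn, which is all the single-character interface can do with a std::string.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, for illustration only: runs std::toupper over every
// byte of s. It knows nothing about multi-byte UTF-8 sequences.
std::string upper_bytewise(const std::string& s) {
    std::string out = s;
    for (char& c : out) {
        // Cast through unsigned char: passing a negative char value to
        // std::toupper is undefined behavior.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Under that assumption, `upper_bytewise("abc")` gives `"ABC"`, but the bytes 0xc3 and 0xb3 are not lowercase letters in the "C" locale, so 0xc3b3c3b3 comes back unchanged. Each ó is one code point spread over two bytes, and std::toupper never sees the two bytes together.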

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:16:30+0000

There is no standard way to do Unicode case conversion in C++. There are ways that work on some C++ implementations, but the standard doesn't require them to.

If you want guaranteed Unicode case conversion, you will need to use a library such as ICU or Boost.Locale (essentially ICU behind a more C++-like interface).
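To make concrete what such a library has to do, here is a deliberately tiny sketch using only the standard library. It is not how ICU or Boost.Locale work internally and it is not a substitute for them. It assumes the input is UTF-8 and only handles ASCII a-z plus the lowercase letters U+00E0 to U+00FE, which happen to sit exactly 0x20 above their uppercase forms. The name `upper_latin1_subset` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only. Uppercases ASCII a-z and the
// lowercase letters U+00E0..U+00FE (except U+00F7, the division sign).
// Every other byte, including ill-formed UTF-8, is copied through unchanged.
std::string upper_latin1_subset(const std::string& utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 >= 'a' && b0 <= 'z') {
            out.push_back(static_cast<char>(b0 - 0x20));  // ASCII: one byte per letter
            continue;
        }
        // 0xC3 is the lead byte for U+00C0..U+00FF; one continuation byte follows.
        if (b0 == 0xC3 && i + 1 < utf8.size()) {
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            if ((b1 & 0xC0) == 0x80) {
                unsigned cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);  // decode to a code point
                if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7) {
                    cp -= 0x20;  // map, e.g. U+00F3 (ó) -> U+00D3 (Ó)
                }
                out.push_back(static_cast<char>(0xC0u | (cp >> 6)));     // re-encode lead byte
                out.push_back(static_cast<char>(0x80u | (cp & 0x3Fu)));  // re-encode continuation
                ++i;  // the continuation byte has been consumed
                continue;
            }
        }
        out.push_back(static_cast<char>(b0));
    }
    return out;
}
```

For the string in the question, this turns 0xc3b3c3b3 into 0xc393c393. The limits show up immediately, though: ß (U+00DF) uppercases to the two letters "SS", ÿ (U+00FF) uppercases to U+0178, which lies outside this range, and Turkish treats i differently from most other locales. None of that fits a fixed offset, which is why the answer points to a full Unicode library.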
