Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - char vs wchar_t when to use which data type

I want to understand the difference between char and wchar_t. I understand that wchar_t uses more bytes, but can I get a clear-cut example showing when I would use char and when I would use wchar_t?


Fight the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss will gaze back…

1 Reply


Short answer:

You should never use wchar_t in modern C++, except when interacting with OS-specific APIs (basically use wchar_t only to call Windows API functions).

Long answer:

The design of the standard C++ library implies there is only one way to handle Unicode: storing UTF-8 encoded strings in char arrays, because almost all functions exist only in char variants (think of std::exception::what).
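To make the "UTF-8 in char arrays" idea concrete, here is a minimal sketch that keeps text in a plain std::string and counts code points by skipping continuation bytes. The helper name utf8_code_points is hypothetical (not part of any library), and the code assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Assumption: the input is valid UTF-8. Continuation bytes have the
// bit pattern 10xxxxxx, so every other byte starts a new code point.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "caf\xC3\xA9" (the word "café" in UTF-8), size() reports 5 bytes while this helper reports 4 code points. That is the main thing to keep in mind when a std::string holds UTF-8: size() counts bytes, not characters.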

In a C++ program you have two locales:

  • Standard C library locale set by std::setlocale
  • Standard C++ library locale set by std::locale::global

Unfortunately, neither of them defines the behavior of the standard functions that open files (std::fopen, std::fstream::open, etc.). The behavior differs between operating systems:

  • Linux is encoding agnostic, so those functions simply pass the char string to the underlying system call
  • On Windows, the char string is converted to a wide string using the user-specific locale before the system call is made

Everything usually works fine on Linux because nearly everyone uses UTF-8 based locales, so all user input and the arguments passed to main will be UTF-8 encoded. You might still need to switch the current locales to UTF-8 variants explicitly, because a C++ program starts in the default "C" locale. At this point, if you only care about Linux and don't need to support Windows, you can use char arrays and std::string, assume they hold UTF-8 sequences, and everything "just works".
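A rough sketch of switching away from the default "C" locale could look like the following. The function names current_c_locale and adopt_environment_locale are made up for this example, and whether the environment actually provides a UTF-8 locale depends on the system configuration.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the name of the current C library locale.
std::string current_c_locale() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Hypothetical helper: tries to switch both the C and the C++ global
// locale to the one configured in the environment (for example via LANG).
// Returns false if that locale is not available on this system.
bool adopt_environment_locale() {
    if (std::setlocale(LC_ALL, "") == nullptr) {
        return false;
    }
    try {
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        return false;  // the environment names a locale that is not installed
    }
    return true;
}
```

Calling adopt_environment_locale() once at the start of main is usually enough. On a typical Linux desktop the result is something like a UTF-8 variant of the user's language, but that is an assumption about the environment, not a guarantee.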

Problems appear when you want to support Windows, because there you always have an additional third locale: the one set for the current user, which can be configured in the Control Panel. The main issue is that this locale is normally not a Unicode locale, so you cannot rely on functions like std::fopen(const char *) and std::fstream::open(const char *) to open a file using a Unicode path. On Windows you have to use custom wrappers around non-standard, Windows-specific functions such as _wfopen and std::fstream::open(const wchar_t *). You can check Boost.Nowide (part of Boost since version 1.73) to see how this can be done: http://cppcms.com/files/nowide/html/
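A wrapper of the kind described above might look roughly like this. It is only a sketch in the spirit of Boost.Nowide, not its actual implementation. The name fopen_utf8 is hypothetical, and the Windows branch assumes the Win32 function MultiByteToWideChar and the CRT function _wfopen are available.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: opens a file whose path is given as UTF-8.
std::FILE* fopen_utf8(const std::string& path, const char* mode) {
#ifdef _WIN32
    // Convert UTF-8 to UTF-16 so the wide CRT function can be used.
    auto widen = [](const std::string& s) -> std::wstring {
        if (s.empty()) return std::wstring();
        const int size = static_cast<int>(s.size());
        const int len = MultiByteToWideChar(CP_UTF8, 0, s.data(), size, nullptr, 0);
        std::wstring w(len > 0 ? len : 0, L'\0');
        if (len > 0) {
            MultiByteToWideChar(CP_UTF8, 0, s.data(), size, &w[0], len);
        }
        return w;
    };
    return _wfopen(widen(path).c_str(), widen(mode).c_str());
#else
    // Assumption: on Linux and similar systems the bytes are passed through
    // unchanged, so a UTF-8 path works as is.
    return std::fopen(path.c_str(), mode);
#endif
}
```

The rest of the program can then keep every path as UTF-8 in std::string and only touch wchar_t inside this one function, which matches the "short answer" above.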

With C++17 you can use std::filesystem::path to store a file path in a portable way, but it is still broken on Windows:

  • The implicit constructor std::filesystem::path::path(const char *) uses the user-specific locale on MSVC, and there is no way to make it use UTF-8. The function std::filesystem::u8path should be used to construct a path from a UTF-8 string, but it is too easy to forget this and use the implicit constructor instead.
  • std::error_category::message(int) for both error categories returns the error description in the user-specific encoding.
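To avoid the implicit constructor, the conversions can be funneled through two small functions, sketched below. The names path_from_utf8 and path_to_utf8 are hypothetical. The sketch uses std::filesystem::u8path in C++17 and the char8_t based constructor when the library supports char8_t (C++20), where u8path is deprecated.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: builds a path from a UTF-8 encoded std::string.
fs::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: char8_t sources are always interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path interprets its argument as UTF-8.
    return fs::u8path(utf8);
#endif
}

// Hypothetical helper: converts a path back to a UTF-8 encoded std::string.
std::string path_to_utf8(const fs::path& p) {
    // u8string() returns std::string in C++17 and std::u8string in C++20;
    // copying through iterators works for both.
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

If every conversion between strings and paths in a project goes through these two functions, the user-specific locale never gets a chance to interfere.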

So what we have on Windows is:

  • Standard library functions that open files are broken and should never be used.
  • Arguments passed to main(int, char**) are broken and should never be used.
  • WinAPI functions ending in A, and the generic macros that can expand to them, are broken and should never be used.
  • std::filesystem::path is partially broken and should never be used directly.
  • Error categories returned by std::generic_category and std::system_category are broken and should never be used.
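For the program arguments, one possible workaround is to ignore argv on Windows and re-read the command line through the wide API. The sketch below assumes the Win32 functions GetCommandLineW, CommandLineToArgvW (declared in shellapi.h, linked from Shell32) and WideCharToMultiByte. The name utf8_args is hypothetical.

```cpp
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW; link with Shell32.lib
#endif

// Hypothetical helper: returns the program arguments as UTF-8 strings.
std::vector<std::string> utf8_args(int argc, char** argv) {
    std::vector<std::string> args;
#ifdef _WIN32
    (void)argc;
    (void)argv;  // the narrow arguments are deliberately ignored on Windows
    int wargc = 0;
    LPWSTR* wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv == nullptr) return args;
    for (int i = 0; i < wargc; ++i) {
        // With length -1 the reported size includes the terminating null.
        const int len = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                                            nullptr, 0, nullptr, nullptr);
        std::string s(len > 0 ? len : 0, '\0');
        if (len > 0) {
            WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                                &s[0], len, nullptr, nullptr);
            s.resize(len - 1);  // drop the terminating null
        }
        args.push_back(s);
    }
    LocalFree(wargv);
#else
    // Assumption: on Linux the arguments are already UTF-8 in practice.
    args.assign(argv, argv + argc);
#endif
    return args;
}
```

Calling utf8_args(argc, argv) as the first statement of main gives the rest of the program a single UTF-8 view of the command line on both platforms.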

If you need a long-term solution for a non-trivial project, I would recommend:

  • Using Boost.Nowide or implementing similar functionality directly; this fixes the broken standard library functions.
  • Re-implementing the standard error categories returned by std::generic_category and std::system_category so that they always return UTF-8 encoded strings.
  • Wrapping std::filesystem::path so that the new class always uses UTF-8 when converting a path to a string and a string to a path.
  • Wrapping all required functions from std::filesystem so that they use your path wrapper and your error categories.
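As an illustration of the error category item, a replacement for std::system_category could be sketched as follows. The class and function names are hypothetical, and the Windows branch assumes FormatMessageW and WideCharToMultiByte from the Win32 API. Treat it as a starting point rather than a complete implementation (it does not trim the trailing line break that Windows messages usually carry, for example).

```cpp
#include <string>
#include <system_error>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical replacement for std::system_category whose messages are
// meant to be UTF-8 on every platform.
class utf8_system_category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "utf8_system"; }

    std::string message(int code) const override {
#ifdef _WIN32
        wchar_t wide[512];
        const DWORD n = FormatMessageW(
            FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
            nullptr, static_cast<DWORD>(code), 0, wide, 512, nullptr);
        if (n == 0) return "unknown error";
        const int len = WideCharToMultiByte(CP_UTF8, 0, wide, static_cast<int>(n),
                                            nullptr, 0, nullptr, nullptr);
        if (len <= 0) return "unknown error";
        std::string utf8(len, '\0');
        WideCharToMultiByte(CP_UTF8, 0, wide, static_cast<int>(n),
                            &utf8[0], len, nullptr, nullptr);
        return utf8;
#else
        // Assumption: on POSIX systems the standard message is already
        // usable as is (typically ASCII or UTF-8).
        return std::system_category().message(code);
#endif
    }
};

// Accessor in the style of std::system_category().
const std::error_category& utf8_system_category() {
    static utf8_system_category_impl instance;
    return instance;
}
```

An error code is then created with std::error_code(code, utf8_system_category()), and its message() can be logged or shown without worrying about the user-specific encoding.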

Unfortunately, this won't fix issues with other libraries that work with files, but many of them are broken anyway (they do not support Unicode).

You can check this link for further explanation: http://utf8everywhere.org/

