c++ - std::wstring vs std::string

posted Feb 21, 2021 in Technique by 深蓝 (71.8m points)

I can't work out the differences between std::string and std::wstring. I know wstring supports wide characters such as Unicode characters. I have the following questions:

When should I use std::wstring over std::string?
Can std::string hold the entire ASCII character set, including the special characters?
Is std::wstring supported by all popular C++ compilers?
What exactly is a "wide character"?

asked by: translated from SO

1 Reply

深蓝 · Answer 1 · 2021-02-20T21:31:29+0000

`string`? `wstring`?

std::string is a basic_string templated on char, and std::wstring is a basic_string templated on wchar_t.
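As a small sketch of what that means in practice, the two names can be checked against the underlying template at compile time. The `code_units` helper is a hypothetical name introduced only for illustration; it shows that one template can serve both string types.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Both names are aliases for specializations of the same class template;
// only the character type differs.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// Hypothetical helper: number of code units (char or wchar_t elements),
// which is not necessarily the number of user-visible characters.
template <typename CharT>
std::size_t code_units(const std::basic_string<CharT>& s)
{
    return s.size();
}
```

Calling `code_units` on a std::string counts chars, and on a std::wstring it counts wchar_t elements.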

`char` vs. `wchar_t`

char is supposed to hold a character, usually an 8-bit character.
wchar_t is supposed to hold a wide character, and then things get tricky:
On Linux, a wchar_t is 4 bytes, while on Windows, it's 2 bytes.

What about Unicode, then?

The problem is that neither char nor wchar_t is directly tied to Unicode.

On Linux?

Let's take a Linux OS: my Ubuntu system is already Unicode-aware. When I work with a char string, it is natively encoded in UTF-8 (i.e. a Unicode string of chars). The following code:

#include <cstring>   // strlen
#include <cwchar>    // wcslen
#include <iostream>

int main(int argc, char* argv[])
{
   const char text[] = "olé" ;


   std::cout << "sizeof(char)    : " << sizeof(char) << std::endl ;
   std::cout << "text            : " << text << std::endl ;
   std::cout << "sizeof(text)    : " << sizeof(text) << std::endl ;
   std::cout << "strlen(text)    : " << strlen(text) << std::endl ;

   std::cout << "text(ordinals)  :" ;

   for(size_t i = 0, iMax = strlen(text); i < iMax; ++i)
   {
      std::cout << " " << static_cast<unsigned int>(
                              static_cast<unsigned char>(text[i])
                          );
   }

   std::cout << std::endl << std::endl ;

   // - - - 

   const wchar_t wtext[] = L"olé" ;

   std::cout << "sizeof(wchar_t) : " << sizeof(wchar_t) << std::endl ;
   //std::cout << "wtext           : " << wtext << std::endl ; <- error
   std::cout << "wtext           : UNABLE TO CONVERT NATIVELY." << std::endl ;
   std::wcout << L"wtext           : " << wtext << std::endl;

   std::cout << "sizeof(wtext)   : " << sizeof(wtext) << std::endl ;
   std::cout << "wcslen(wtext)   : " << wcslen(wtext) << std::endl ;

   std::cout << "wtext(ordinals) :" ;

   for(size_t i = 0, iMax = wcslen(wtext); i < iMax; ++i)
   {
      // wchar_t is 4 bytes on Linux, so a cast to unsigned short could truncate.
      std::cout << " " << static_cast<unsigned long>(wtext[i]);
   }

   std::cout << std::endl << std::endl ;

   return 0;
}

outputs the following text:

sizeof(char)    : 1
text            : olé
sizeof(text)    : 5
strlen(text)    : 4
text(ordinals)  : 111 108 195 169

sizeof(wchar_t) : 4
wtext           : UNABLE TO CONVERT NATIVELY.
wtext           : ol?
sizeof(wtext)   : 16
wcslen(wtext)   : 3
wtext(ordinals) : 111 108 233

You'll see the "olé" text in char is really made of four chars: 111, 108, 195 and 169 (not counting the trailing zero). (I'll let you study the wchar_t code as an exercise.)

So, when working with char on Linux, you usually end up using Unicode without even knowing it. And since std::string works with char, std::string is already Unicode-ready.

Note that std::string, like the C string API, will consider the "olé" string to have 4 characters, not 3. So be cautious when truncating or otherwise manipulating Unicode chars, because some combinations of chars are forbidden in UTF-8.
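To make the truncation caveat concrete, here is a minimal sketch that counts code points and truncates on a code point boundary. It assumes the std::string already holds well-formed UTF-8, and the helper names (`is_continuation`, `count_code_points`, `truncate_utf8`) are hypothetical, not part of any library.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Counts code points by counting every byte that is not a continuation byte.
// Assumes `s` holds well-formed UTF-8.
inline std::size_t count_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char byte : s)
        if (!is_continuation(byte)) ++n;
    return n;
}

// Truncates to at most `max_bytes` bytes without splitting a multi-byte sequence.
inline std::string truncate_utf8(const std::string& s, std::size_t max_bytes)
{
    if (s.size() <= max_bytes) return s;
    std::size_t cut = max_bytes;
    // Back up while the first dropped byte is a continuation byte,
    // which would mean the cut falls inside a sequence.
    while (cut > 0 && is_continuation(static_cast<unsigned char>(s[cut])))
        --cut;
    return s.substr(0, cut);
}
```

With the "olé" bytes from the output above (111 108 195 169), `size()` is 4 while `count_code_points` gives 3, and truncating to 3 bytes yields "ol" rather than a string ending in a dangling 195. Note that this counts code points, which is still not the same as user-perceived characters when combining marks are involved.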

On Windows?

On Windows, this is a bit different. Before the advent of Unicode, Win32 had to support a lot of applications working with char and with the different charsets/code pages produced all over the world.

So their solution was an interesting one: if an application works with char, then the char strings are encoded/printed/shown on GUI labels using the local charset/code page on the machine. For example, "olé" would be "olé" on a French-localized Windows, but would be something different on a Cyrillic-localized Windows ("olй" if you use Windows-1251). Thus, "historical apps" will usually still work the same old way.
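The "olé" versus "olй" effect comes from one byte value being mapped to different code points by different code pages. The sketch below covers only the byte range 0xC0..0xFF, where both mappings are a simple offset; the function names are hypothetical and this is not how Windows implements the lookup.

```cpp
#include <cassert>

// Windows-1252: bytes 0xC0..0xFF map to the code points of the same value
// (the Latin-1 range, so 0xE9 is U+00E9, 'é').
inline char32_t decode_cp1252_high(unsigned char byte)
{
    assert(byte >= 0xC0);  // sketch is only valid for this range
    return byte;
}

// Windows-1251: the same bytes map to the Cyrillic letters U+0410..U+044F
// (so 0xE9 is U+0439, 'й').
inline char32_t decode_cp1251_high(unsigned char byte)
{
    assert(byte >= 0xC0);  // sketch is only valid for this range
    return 0x0410 + (byte - 0xC0);
}
```

The byte stored in the char string never changes; only the code page used to interpret it does.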

For Unicode-based applications, Windows uses wchar_t, which is 2 bytes wide, and is encoded in UTF-16, which is Unicode encoded in 2-byte units (or at the very least in UCS-2, its mostly compatible fixed-width predecessor, which is almost the same thing IIRC).
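The difference between UTF-16 and UCS-2 shows up for code points above U+FFFF, which UTF-16 stores as a pair of 2-byte units (a surrogate pair). A minimal sketch of the encoding step, using the portable std::u16string rather than wchar_t, with `append_utf16` as a hypothetical helper name:

```cpp
#include <string>

// Appends one Unicode code point to a UTF-16 string.
// Assumes cp is a valid scalar value (at most 0x10FFFF and not a surrogate).
inline void append_utf16(std::u16string& out, char32_t cp)
{
    if (cp < 0x10000) {
        // Fits in a single 2-byte unit, identical to UCS-2.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Split the 20 remaining bits into a high and a low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}
```

Here 'é' (U+00E9) takes one unit, while U+1F600 becomes the two units 0xD83D 0xDE00.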

Applications using char are called "multibyte" (because each glyph is composed of one or more chars), while applications using wchar_t are called "widechar" (because each glyph is composed of one or two wchar_t). See the MultiByteToWideChar and WideCharToMultiByte Win32 conversion APIs for more info.

Thus, if you work on Windows, you badly want to use wchar_t (unless you use a framework hiding that, like GTK+ or Qt...). The fact is that, behind the scenes, Windows works with wchar_t strings, so even historical applications will have their char strings converted to wchar_t when using an API like SetWindowText() (a low-level API function to set the label on a Win32 GUI).

Memory issues?

UTF-32 is 4 bytes per character, so there is not much to add, other than that a UTF-8 text and a UTF-16 text will always use less than or the same amount of memory as a UTF-32 text (and usually less).

If there is a memory issue, then you should know that for most western languages, UTF-8 text will use less memory than the same UTF-16 one.
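These size relations can be checked with a short sketch that adds up the bytes each encoding needs per code point. The `EncodedSizes` struct and `encoded_sizes` function are hypothetical names, and the input is assumed to contain only valid Unicode scalar values.

```cpp
#include <cstddef>
#include <string>

// Byte counts for the same text in each encoding (terminators not included).
struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Bytes needed to store `text` (a sequence of code points) in each encoding.
inline EncodedSizes encoded_sizes(const std::u32string& text)
{
    EncodedSizes sizes{0, 0, 0};
    for (char32_t cp : text) {
        sizes.utf8  += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        sizes.utf16 += cp < 0x10000 ? 2 : 4;
        sizes.utf32 += 4;
    }
    return sizes;
}
```

For example, `encoded_sizes(U"ol\u00E9")` gives 4 bytes for UTF-8, 6 for UTF-16 and 12 for UTF-32, which matches the strlen result of 4 in the Linux output above.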

Still, for o
