Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - Why doesn't printf format unicode parameters?

When using printf to format a double-byte string into a single-byte string:

printf("%ls\n", L"s:\яшертыHello");   // %ls for a wide string (%s varies meaning depending on the project's unicode settings).

Clearly, some characters can't be represented as ASCII characters, so I have sometimes seen double-byte characters get turned into '?' characters. But this seems to depend on the particular characters. For the printf above, the output is:

s:

I was hoping I might get something like:

s:??????Hello

I'm afraid I've lost the example, but I think that for one string, when it encountered Unicode characters, it replaced the first one with a '?' and then gave up on the rest.

So, my question is: what is supposed to happen when you format a wide string into a single-byte string? The documentation here: http://msdn.microsoft.com/en-us/library/hf4y5e3w.aspx says "Characters are displayed up to the first null character", but I'm not seeing that. Is this a bug in printf, or is the behaviour I'm seeing documented somewhere? If so, where?

Thanks for your help.

UPDATE

Thanks to everyone who suggested alternatives to printf. I am going to switch to one of them, but out of curiosity I'm really interested in why printf does not have reliable, documented behaviour here. It almost looks as if the implementer went out of their way to make this not work.


1 Reply


I expect your code to work (and it does work here on Linux), but the result is locale-dependent. That means you have to set the locale, and that locale must support the character set used. Here is my test program:

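#include <locale.h>
#include <stdio.h>

int main()
{
    /* Adopt the locale from the environment (LC_ALL, LANG, ...). */
    char* l = setlocale(LC_ALL, "");
    if (l == NULL) {
        printf("Locale not set\n");
    } else {
        printf("Locale set to %s\n", l);
    }
    /* %ls converts the wide string using the LC_CTYPE locale set above. */
    printf("%ls\n", L"s:\яшертыHello");
    return 0;
}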

and here is an execution trace:

$ env LC_ALL=en_US.utf8 ./a.out
Locale set to en_US.utf8
s:яшертыHello

If the program reports that the locale isn't set, or that it is set to "C", then it is normal that you don't get the result you expect.

Edit: see the answers to this question for the equivalent of en_US.utf8 for Windows.



...