Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - double byte character sequence conversion issue in Visual Studio 2015

I am trying to convert a double-byte character sequence (DBCS) in CP936 to wchar_t using a C++ locale. This is the code:

#include <cerrno>    // errno
#include <cstring>   // std::memset
#include <iostream>
#include <locale>
#include <codecvt>

// 国 in CP936
char const src[] = "\xB9\xFA";

int main()
{
    std::locale loc(".936");
    typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;
    codecvt_type const & cvt = std::use_facet<codecvt_type>(loc);

    std::mbstate_t state;
    std::memset(&state, 0, sizeof(state));

    char const * src_mid = src;
    wchar_t buf[10];
    wchar_t * buf_mid = buf;

    std::codecvt_base::result res = cvt.in(state,
        src, src + 2, src_mid,
        buf, buf + 10, buf_mid);
    int eno = errno;
    std::cout << "res: " << +res << "\n"
        << "errno: " << eno << "\n";

    return 0;
}

Now, the conversion always ends with an error, with errno set to 42, which is EILSEQ. I have debugged the code and I think I can see what goes wrong, but I do not understand why.

What goes wrong is that the code that ultimately leads to the call to MultiByteToWideChar() has a conditional like this:

if ( ploc->_Isleadbyte[ch >> 3] & (1 << (ch & 7)) )

This branch is never taken, even though the source string, AFAIK, contains a correct lead byte and trail byte. I have checked the _Isleadbyte array in the debugger and it is all zeroes. So the branch that sets the input length to 2 is never taken; the one that sets the length to 1 is taken instead, and MultiByteToWideChar() fails because a lead byte has to be accompanied by a trail byte.
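To make the bit test concrete, here is a minimal, portable sketch of a 256-bit lead-byte table. LeadByteTable, mark_lead_range, is_lead_byte and char_length are hypothetical names for illustration, not the CRT's own types, and the 0x81..0xFE lead-byte range used for CP936 in the usage note is stated as an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the CRT's lead-byte table: one bit per byte value.
struct LeadByteTable {
    std::uint8_t bits[32] = {};  // all zeroes, like the array seen in the debugger
};

// Mark every byte value in [first, last] (both <= 0xFF) as a lead byte.
inline void mark_lead_range(LeadByteTable & t, unsigned first, unsigned last)
{
    for (unsigned ch = first; ch <= last; ++ch)
        t.bits[ch >> 3] |= static_cast<std::uint8_t>(1u << (ch & 7));
}

// Same shape as the conditional quoted above: byte index is ch >> 3,
// bit index within that byte is ch & 7.
inline bool is_lead_byte(LeadByteTable const & t, unsigned char ch)
{
    return (t.bits[ch >> 3] & (1u << (ch & 7))) != 0;
}

// Number of source bytes that would be passed on for one character.
inline std::size_t char_length(LeadByteTable const & t, unsigned char ch)
{
    return is_lead_byte(t, ch) ? 2 : 1;
}
```

With a default-constructed (all-zero) table, char_length(table, 0xB9) yields 1, which matches the failing path described above. After mark_lead_range(table, 0x81, 0xFE), the same call yields 2.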

I have even checked that C_936.NLS is present in C:\Windows\System32, so that should not be the problem.

So, I guess the question is: is this issue on my end (the test code, the Windows OS setup, missing components), or is it in the Visual Studio 2015 code?

UPDATE

So I have incidentally stumbled upon this question: Shift-JIS decoding fails using wifstrem in Visual C++ 2013

The OP's own answer shows a workaround:

const int oldMbcp = _getmbcp();
_setmbcp(932);
const std::locale locale("Japanese_Japan.932");
_setmbcp(oldMbcp);

The same workaround seems to work for the CP936 that I am trying to use.
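For reference, a sketch of how that workaround might look for CP936. MbcpGuard and make_cp936_locale are hypothetical helper names, and the locale name ".936" is taken from my test code above rather than from the linked answer. The guard only ensures the previous multibyte code page is restored even if the std::locale constructor throws. Outside MSVC the helper just returns the classic locale so that the snippet still compiles.

```cpp
#include <locale>
#ifdef _MSC_VER
#include <mbctype.h>  // _getmbcp, _setmbcp (MSVC CRT only)

// Hypothetical RAII helper: switches the multibyte code page and restores it on scope exit.
struct MbcpGuard {
    int old_cp;
    explicit MbcpGuard(int cp) : old_cp(_getmbcp()) { _setmbcp(cp); }
    ~MbcpGuard() { _setmbcp(old_cp); }
    MbcpGuard(MbcpGuard const &) = delete;
    MbcpGuard & operator=(MbcpGuard const &) = delete;
};
#endif

// Hypothetical helper: constructs the CP936 locale while the multibyte
// code page is temporarily set to 936, as in the workaround above.
inline std::locale make_cp936_locale()
{
#ifdef _MSC_VER
    MbcpGuard guard(936);
    return std::locale(".936");
#else
    // Assumption: no equivalent workaround is needed or available here.
    return std::locale::classic();
#endif
}
```

In the test program, std::locale loc(".936"); would then be replaced by std::locale loc = make_cp936_locale();.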

UPDATE 2

I filed a bug report with Microsoft.
