Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - Why is csc.exe crashing when I last left the output encoding as UTF8?

I have run into a very strange thing.

I wonder if others have and why it's happening.

Having run a one-line program consisting of System.Console.WriteLine(System.Console.OutputEncoding.EncodingName); I see that the encoding is Western European (DOS).

Fine

Here are some codepages, from https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx: 1200 is Unicode, 65001 is UTF-8, 1252 is Western European (Windows), and 850 is Western European (DOS).
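
To check those names on your own machine, something like the following could be used. It is only a sketch: the class name CodePageNames and the helper Describe are made up for illustration, and it assumes the .NET Framework (as with the csc discussed here), where code pages 850 and 1252 are available without registering an extra encoding provider.

```csharp
using System;
using System.Text;

class CodePageNames
{
    // Returns "number: display name" for a code page (hypothetical helper).
    public static string Describe(int codePage)
    {
        Encoding enc = Encoding.GetEncoding(codePage);
        return codePage + ": " + enc.EncodingName;
    }

    static void Main()
    {
        // The four code pages mentioned in the question.
        foreach (int cp in new int[] { 1200, 65001, 1252, 850 })
        {
            Console.WriteLine(Describe(cp));
        }

        // The code page the console is currently using for output.
        Console.WriteLine("console output code page: " + Console.OutputEncoding.CodePage);
    }
}
```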

Say I write a C# program to change the encoding to UTF-8:

class sdf
{
  static void Main(string[] args)
  {
    System.Console.WriteLine(System.Console.OutputEncoding.EncodingName);
    System.Console.OutputEncoding = System.Text.Encoding.GetEncoding(65001);
    System.Console.WriteLine(System.Console.OutputEncoding.EncodingName);
  }
}

It works, it prints

Western European (DOS)
Unicode (UTF-8)

Now when I run csc again, csc crashes.


I checked my RAM for 14 hours, 8 passes, with memtest. I ran chkdsk on my hard drive. All fine. And this is definitely not a hardware problem; it is a coding issue. I know that because if I open a new cmd prompt and then run csc, it doesn't crash.

So running that C# program changes the shell in such a way that the next run of csc crashes csc itself, in that big way.

If I compile the code below, run it, and then run csc (either plain csc or csc whatever.cs), csc crashes.

So close the cmd prompt and open a new one.

This time, experiment with commenting and uncommenting the second line of the program.

I find that if the second line (the line that changes the codepage to 850, DOS Western Europe) is there, then it won't crash the next time I run csc.

Whereas if I comment out that second line, so the program exits with the codepage/encoding changed to UTF-8, then the next time csc runs, csc crashes.

// Comment out the last line, and then
// this runs but makes csc crash next time.

class asdf
{
  static void Main()
  {
     System.Console.OutputEncoding = System.Text.Encoding.UTF8; // set output to UTF-8
     System.Console.OutputEncoding = System.Text.Encoding.GetEncoding(850);
  }
}

I am not the only person who has run into something like this, though no explanation was found there: https://social.msdn.microsoft.com/Forums/vstudio/en-US/0e5f477e-0c32-4e88-acf7-d53d43d5b566/c-command-line-compiler-cscexe-immediately-crashes-when-run-in-code-page-65001-utf8?forum=csharpgeneral

I can deal with it by making sure the last line sets the codepage to 850, though as I'll explain, that's an inadequate solution.

I'd also like to know whether this is a problem with csc that others have too, and whether there are any other solutions.
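
For reference, the "set it back at the end" workaround does not have to hard-code 850. A minimal sketch, assuming it is acceptable to restore whatever encoding was active when the program started (the class RestoreEncoding and the helper WithUtf8Output are made-up names, not anything from the framework):

```csharp
using System;
using System.Text;

class RestoreEncoding
{
    // Runs 'body' with console output switched to UTF-8, then puts back
    // whatever encoding was active before, even if 'body' throws.
    public static void WithUtf8Output(Action body)
    {
        Encoding saved = Console.OutputEncoding;
        try
        {
            Console.OutputEncoding = Encoding.UTF8;
            body();
        }
        finally
        {
            Console.OutputEncoding = saved;
        }
    }

    static void Main()
    {
        WithUtf8Output(delegate
        {
            // Inside the helper the console reports UTF-8.
            Console.WriteLine(Console.OutputEncoding.EncodingName);
        });

        // Back to the encoding the shell started with.
        Console.WriteLine(Console.OutputEncoding.EncodingName);
    }
}
```

This only tidies up the workaround; it leaves the console in its original code page on exit, so it shares whatever limitations resetting to 850 has.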

Added

uuu1.cs

// uuu1.cs
class asdf
{
  static void Main()
  {
    System.Console.InputEncoding  = System.Text.Encoding.UTF8;
    System.Console.OutputEncoding = System.Text.Encoding.UTF8;

    // UTF8, not Unicode; UTF8 means redirection will then work

    System.Console.WriteLine("?");

    // try redirecting too..

    // and try checking for csc crash or not
    //System.Console.OutputEncoding=System.Text.Encoding.GetEncoding(850);
    //System.Console.InputEncoding =System.Text.Encoding.GetEncoding(850);
    // problem is that when those lines are uncommented, it breaks the redirection
  }
}

Uncommenting the last lines, so that I have

System.Console.OutputEncoding=System.Text.Encoding.GetEncoding(850);

would stop the crash, but it is an inadequate solution. For example, if I want to redirect the output of a program to a file, then I need UTF8 all the way from beginning to end, otherwise it doesn't work.

This works with the codepage 850 lines commented out:

c:\blah>uuu1>r.r<ENTER>
c:\blah>type r.r <ENTER>
c:\blah>?

If I uncomment the last lines, thus changing the codepage to 850 then sure csc won't crash on the next run, but the redirection doesn't work and r.r doesn't contain that character.
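
One possible way around that trade-off, offered as a sketch rather than a verified fix: leave the console code page alone and write UTF-8 bytes straight to the standard output stream. A redirected file then receives UTF-8 regardless of the code page, and the shell is left exactly as it was found. The class name Utf8Stdout and the helper Encode are made up for illustration, and "\u00e9" is only a stand-in, since the character used in uuu1.cs did not survive in the listing above.

```csharp
using System;
using System.IO;
using System.Text;

class Utf8Stdout
{
    // Encodes text as UTF-8 without a byte order mark (hypothetical helper).
    public static byte[] Encode(string text)
    {
        return new UTF8Encoding(false).GetBytes(text);
    }

    static void Main()
    {
        // Stand-in character; replace with whatever needs to reach the file.
        byte[] bytes = Encode("\u00e9" + Environment.NewLine);

        // Write the raw bytes to standard output. Console.OutputEncoding is
        // never touched, so there is nothing to restore afterwards.
        using (Stream stdout = Console.OpenStandardOutput())
        {
            stdout.Write(bytes, 0, bytes.Length);
        }
    }
}
```

When the output is not redirected, a console in codepage 850 will show those bytes as 850 characters, so this is aimed at the redirection case only.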

Added 2

Han's answer made me notice another way of triggering this error:

C:\Users\harvey\somecs3>csc<ENTER>
Microsoft (R) Visual C# Compiler version 4.0.30319.18408
for Microsoft (R) .NET Framework 4.5
Copyright (C) Microsoft Corporation. All rights reserved.

warning CS2008: No source files specified
error CS1562: Outputs without source must have the /out option specified

C:\Users\harvey\somecs3>chcp  65001<ENTER>
Active code page: 65001

C:\Users\harvey\somecs3>csc<ENTER>  <-- CRASH

C:\Users\harvey\somecs3>

1 Reply


Well, you found a bug in the way the C# compiler deals with having to output text to the console when it is switched to UTF-8. It has a self-diagnostic that ensures the conversion from a UTF-16 encoded string to the console output code page worked correctly, and it slams the Big Red Button when it didn't. The stack trace looks like this:

csc.exe!OnCriticalInternalError()  + 0x4 bytes  
csc.exe!ConsoleOutput::WideToConsole()  + 0xdc51 bytes  
csc.exe!ConsoleOutput::print_internal()  + 0x2c bytes   
csc.exe!ConsoleOutput::print()  + 0x80 bytes    
csc.exe!ConsoleOutput::PrintString()  + 0xb5 bytes  
csc.exe!ConsoleOutput::PrintBanner()  + 0x50 bytes  
csc.exe!_main()  + 0x2d0eb bytes    

The actual code for WideToConsole() is not available. The closest match is this version from the SSCLI20 distribution:

/*
 * Like WideCharToMultiByte, but translates to the console code page. Returns length,
 * INCLUDING null terminator.
 */
int ConsoleOutput::WideCharToConsole(LPCWSTR wideStr, LPSTR lpBuffer, int nBufferMax)
{
    if (m_fUTF8Output) {
        if (nBufferMax == 0) {
            return UTF8LengthOfUnicode(wideStr, (int)wcslen(wideStr)) + 1; // +1 for nul terminator
        }
        else {
            int cchConverted = NULL_TERMINATED_MODE;
            return UnicodeToUTF8 (wideStr, &cchConverted, lpBuffer, nBufferMax);
        }

    }
    else {
        return WideCharToMultiByte(GetConsoleOutputCP(), 0, wideStr, -1, lpBuffer, nBufferMax, 0, 0);
    }
}

/*
 * Convert Unicode string to Console ANSI string allocated with VSAlloc
 */
HRESULT ConsoleOutput::WideToConsole(LPCWSTR wideStr, CAllocBuffer &buffer)
{
    int cch = WideCharToConsole(wideStr, NULL, 0);
    buffer.AllocCount(cch);
    if (0 == WideCharToConsole(wideStr, buffer.GetData(), cch)) {
        VSFAIL("How'd the string size change?");
        // We have to NULL terminate the output because WideCharToMultiByte didn't
        buffer.SetAt(0, '\0');
        return E_FAIL;
    }
    return S_OK;
}

The crash occurs somewhere around the VSFAIL() assert, judging from the machine code; I can see the return E_FAIL statement. It was, however, changed from the version I posted: the if() statement was modified, and it looks like VSFAIL() was replaced by RETAILVERIFY(). Something broke when they made those changes, probably in UnicodeToUTF8(), which is now named UTF16ToUTF8(). To re-emphasize, the version I posted does not in fact crash; you can see for yourself by running C:\Windows\Microsoft.NET\Framework\v2.0.50727\csc.exe. Only the v4 version of csc.exe has this bug.
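
To make the shape of that self-diagnostic easier to follow, here is the same measure, allocate, convert, verify pattern written in C#. This is an analogy only, not the compiler's code: the class TwoPassConversion and the helper ToBytes are made-up names, and the check here compares the two byte counts, whereas the SSCLI20 listing only tests whether the second call returned 0.

```csharp
using System;
using System.Text;

class TwoPassConversion
{
    // Pass 1 measures the encoded size, pass 2 converts into a buffer of
    // exactly that size, and the result of pass 2 is checked against pass 1.
    public static byte[] ToBytes(string text, Encoding target)
    {
        int expected = target.GetByteCount(text);
        byte[] buffer = new byte[expected];
        int written = target.GetBytes(text, 0, text.Length, buffer, 0);
        if (written != expected)
        {
            // Rough counterpart of the "How'd the string size change?" check.
            throw new InvalidOperationException("conversion size mismatch");
        }
        return buffer;
    }

    static void Main()
    {
        byte[] utf8 = ToBytes("warning CS2008: No source files specified", Encoding.UTF8);
        Console.WriteLine(utf8.Length + " bytes");
    }
}
```

If the two passes ever disagree, the only sensible reaction is to fail loudly, which appears to be what the v4 compiler does in a much more drastic way.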

The actual bug is hard to dig out from the machine code, so it is best to let Microsoft worry about that. You can file the bug at connect.microsoft.com. I don't see a report that resembles it, which is fairly remarkable, btw. The workaround for this bug is to use CHCP to change the codepage back (for example, chcp 850) before running csc.
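
If typing CHCP by hand gets tedious, the same idea can be wrapped in a small launcher. The following is a rough sketch under several assumptions: csc.exe is on the PATH, the culture's OEM code page is an acceptable one to fall back to, and arguments contain no spaces that would need quoting. The class RunCscSafely and the helper NeedsReset are made-up names.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Text;

class RunCscSafely
{
    const int Utf8CodePage = 65001;

    // True when the given code page is the one that triggers the crash.
    public static bool NeedsReset(int codePage)
    {
        return codePage == Utf8CodePage;
    }

    static int Main(string[] args)
    {
        Encoding saved = Console.OutputEncoding;
        if (NeedsReset(saved.CodePage))
        {
            // Assumption: the current culture's OEM code page is a safe choice.
            int oem = CultureInfo.CurrentCulture.TextInfo.OEMCodePage;
            Console.OutputEncoding = Encoding.GetEncoding(oem);
        }
        try
        {
            // Naive argument joining; arguments with spaces would need quoting.
            ProcessStartInfo psi = new ProcessStartInfo("csc.exe", string.Join(" ", args));
            psi.UseShellExecute = false; // the child process shares this console
            using (Process p = Process.Start(psi))
            {
                p.WaitForExit();
                return p.ExitCode;
            }
        }
        finally
        {
            // Put the console back the way the caller had it.
            Console.OutputEncoding = saved;
        }
    }
}
```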

