Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.0k views
in Technique[技术] by (71.8m points)

.net - C# Encoding.Converting Latin to Hebrew

I'm trying to fetch and parse an online excel document which is written in hebrew but unfortunately in a non-hebrew encoding.

As an example I'm trying to convert the following string: "aìé??_1", which serves as the 1st sheet name to hebrew using C# code, but I'm unable to do so.

I know the above is convertible, since when I open it up in NotePad++ and select Encoding/Character Sets/Hebrew/Windows 1255, I can see: "?????_1" which is the correct hebrew representation of the above string.

I'm using the below code

            string str = "aìé??_1";

            Encoding windows = Encoding.GetEncoding("Windows-1255");
            Encoding ascii = Encoding.GetEncoding("Windows-1252");
            byte[] asciiBytes = ascii.GetBytes(str);
            byte[] windowsBytes = Encoding.Convert(ascii, windows, asciiBytes);

            char[] windowsChars = new char[windows.GetCharCount(windowsBytes, 0, windowsBytes.Length)];
            windows.GetChars(windowsBytes, 0, windowsBytes.Length, windowsChars, 0);
            string windowsString = new string(windowsChars);

I assumed that the encoding of the origin string is Windows-1252 since when I paste it in NotePad++ and change the encoding to Windows-1252 the string remains the same...

I'm probably doing something wrong here, anyone know how to convert the above correctly?

Thanks,

Mikey

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
const string Str = "aìé??_1";

Encoding latinEncoding = Encoding.GetEncoding("Windows-1252");
Encoding hebrewEncoding = Encoding.GetEncoding("Windows-1255");

byte[] latinBytes = latinEncoding.GetBytes(Str);

string hebrewString = hebrewEncoding.GetString(latinBytes);

hebrewString:

?????_1

In your supplied example "Window-1252" is not actualy ASCII, it is extended ASCII, and for some reason Encoding.Convert with these two encodings cannot convert extended range ASCII, so all +127 characters are converted as 63 (i.e. ?). When "converting" from one extended ASCII character byte[] to another, I would expect the bytes to be the same, it is only when you convert them to a .Net unicode string I would expect them to be different. Not sure why Convert is converting +127 chars to '?'.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...