Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
201 views
in Technique[技术] by (71.8m points)

How to find out the Encoding of a File? C#

Well i need to find out which of the files i found in some directory is UTF8 Encoded either ANSI encoded to change the Encoding in something else i decide later. My problem is.. how can i find out if a file is UTF8 or ANSI Encoded? Both of the encodings are actually posible in my files.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is no reliable way to do it (since the file might be just random binary), however the process done by Windows Notepad software is detailed in Micheal S Kaplan's blog:

http://www.siao2.com/2007/04/22/2239345.aspx

  1. Check the first two bytes; 1. If there is a UTF-16 LE BOM, then treat it (and load it) as a "Unicode" file; 2. If there is a UTF-16 BE BOM, then treat it (and load it) as a "Unicode (Big Endian)" file; 3. If the first two bytes look like the start of a UTF-8 BOM, then check the next byte and if we have a UTF-8 BOM, then treat it (and load it) as a "UTF-8" file;
  2. Check with IsTextUnicode to see if that function think it is BOM-less UTF-16 LE, if so, then treat it (and load it) as a "Unicode" file;
  3. Check to see if it UTF-8 using the original RFC 2279 definition from 1998 and if it then treat it (and load it) as a "UTF-8" file;
  4. Assume an ANSI file using the default system code page of the machine.

Now note that there are some holes here, like the fact that step 2 does not do quite as good with BOM-less UTF-16 BE (there may even be a bug here, I'm not sure -- if so it's a bug in Notepad beyond any bug in IsTextUnicode).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...