Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

ruby - How to read a file in utf8 encoding and output in Windows 10?

What is the proper procedure to read and output UTF-8 encoded data in Windows 10?

My attempt to read a UTF-8 encoded file in Windows 10 and print its lines to the terminal does not reproduce the symbols of some languages.

  • OS: Windows 10
  • Native codepage: 437
  • Switched codepage: 65001

In a cmd window I issued the command chcp 65001. The following Ruby code reads the UTF-8 encoded file and outputs its lines with puts.

fname = 'hello_world.dat'

File.open(fname,'r:UTF-8') do |f|
    puts f.read
end
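One way to rule out the read side of the problem is to check that the string Ruby hands back is tagged as UTF-8 and that its bytes are valid UTF-8. The following is only a minimal sketch: `read_utf8` is a hypothetical helper, and a temporary file stands in for hello_world.dat so the snippet runs on its own.

```ruby
require 'tempfile'

# Hypothetical helper (not part of the original script): read a file as
# UTF-8 and report whether the bytes actually form valid UTF-8.
def read_utf8(path)
  text = File.read(path, mode: 'r:UTF-8')
  { text: text, encoding: text.encoding.name, valid: text.valid_encoding? }
end

# Stand-in for hello_world.dat; the escapes spell the Russian sample line.
Tempfile.create('hello_world') do |f|
  f.binmode # write the bytes untouched
  f.write("Russian: \u041F\u0440\u0438\u0432\u0435\u0442 \u043C\u0438\u0440!\n")
  f.flush

  info = read_utf8(f.path)
  line = info[:text].chomp
  puts "encoding: #{info[:encoding]}"
  puts "valid:    #{info[:valid]}"
  puts "chars:    #{line.length}, bytes: #{line.bytesize}"
end
```

If the string is valid UTF-8 and the character count is lower than the byte count, the file was decoded as expected, which would point at the console rather than at the read.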

Contents of hello_world.dat:

Afrikaans:    Hello Wêreld!
Albanian:     Përshendetje Botë!
Amharic:      ??? ???!
Arabic:       ????? ???????!
Armenian:     ????? ??????!
Basque:       Kaixo Mundua!
Belarussian:  Прывітанне Сусвет!
Bengali:      ??? ?????!
Bulgarian:    Здравей свят!
Catalan:      Hola món!
Chichewa:     Moni Dziko Lapansi!
Chinese:      你好世界!
Croatian:     Pozdrav svijete!
Czech:        Ahoj světe!
Danish:       Hej Verden!
Dutch:        Hallo Wereld!
English:      Hello World!
Estonian:     Tere maailm!
Finnish:      Hei maailma!
French:       Bonjour monde!
Frisian:      Hallo wrald!
Georgian:     ????????? ???????!
German:       Hallo Welt!
Greek:        Γειά σου Κόσμε!
Hausa:        Sannu Duniya!
Hebrew:       ???? ????!
Hindi:        ?????? ??????!
Hungarian:    Helló Világ!
Icelandic:    Halló heimur!
Igbo:         Ndewo Ụwa!
Indonesian:   Halo Dunia!
Italian:      Ciao mondo!
Japanese:     こんにちは世界!
Kazakh:       Сәлем Әлем!
Khmer:        ??????????????!
Kyrgyz:       Салам дүйнө!
Lao:          ?????????????????!
Latvian:      Sveika pasaule!
Lithuanian:   Labas pasauli!
Luxemburgish: Moien Welt!
Macedonian:   Здраво свету!
Malay:        Hai dunia!
Malayalam:    ???? ?????!
Mongolian:    Сайн уу дэлхий!
Myanmar:      ??????????????????!
Nepali:       ??????? ?????!
Norwegian:    Hei Verden!
Pashto:       ???? ???!
Persian:      ???? ????!
Polish:       Witaj świecie!
Portuguese:   Olá Mundo!
Punjabi:      ??? ???? ???? ?????!
Romanian:     Salut Lume!
Russian:      Привет мир!
Scots Gaelic: Hàlo a Shaoghail!
Serbian:      Здраво Свете!
Sesotho:      Lefatše Lumela!
Sinhala:      ???? ???????!
Slovenian:    Pozdravljen svet!
Spanish:      ¡Hola Mundo!
Sundanese:    Halo Dunya!
Swahili:      Salamu Dunia!
Swedish:      Hej världen!
Tajik:        Салом Ҷаҳон!
Thai:         ????????????!
Turkish:      Selam Dünya!
Ukrainian:    Привіт Світ!
Uzbek:        Salom Dunyo!
Vietnamese:   Chào thế giới!
Welsh:        Helo Byd!
Xhosa:        Molo Lizwe!
Yiddish:      ???? ?????!
Yoruba:       Mo ki O Ile Aiye!
Zulu:         Sawubona Mhlaba!

Steven Penny suggested using PowerShell without changing the code page. The following picture shows that the issue persists.

Installing Windows Terminal (which is not part of the Windows distribution) solves the UTF-8 output issue; please see the included screen capture.

[Screenshots: image_1, PowerShell; image_2, Windows Terminal]

1 Reply

The problem is that you are using methods and tools that are really old. First:

  • Native codepage: 437
  • Switched codepage: 65001

You don't need to mess with the codepage any more; just leave it as the default. Also, from your picture I see you are using Console Host, which is also really old. Windows Terminal [1] has been available since 2019 and has built-in UTF-8 support. Using Windows Terminal, I can run your script, even without specifying UTF-8:

fname = 'hello_world.dat'

File.open(fname,'r') do |f|
   puts f.read
end

and I get a perfect result.
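When the mode string carries no encoding, Ruby tags what it reads with its default external encoding, so whether plain 'r' works depends on the Ruby version and environment. A rough sketch for checking what your own setup uses follows; `encoding_report` is a hypothetical helper name, and the values it prints will differ from machine to machine.

```ruby
# Hypothetical helper: collect the encodings Ruby is currently using.
def encoding_report
  {
    default_external: Encoding.default_external.name,   # used for reads when no encoding is given
    default_internal: Encoding.default_internal&.name,  # nil unless transcoding on read is configured
    locale_charmap:   Encoding.locale_charmap,          # what the OS locale or code page reports
    stdout_external:  $stdout.external_encoding&.name   # nil means puts writes bytes as-is
  }
end

encoding_report.each { |name, value| puts "#{name}: #{value.inspect}" }
```

If default_external is not UTF-8 on your machine, keeping 'r:UTF-8' in the File.open call is the safer choice.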

To use Windows Terminal, download the msixbundle file [2], then install it. Or, as it's essentially just a Zip file, you can rename it to file.zip and extract it with Windows, then run WindowsTerminal.exe. Or, since you are really having trouble with this process, you can use a portable version I just created [3] (at your own risk).

  1. https://github.com/microsoft/terminal
  2. https://github.com/microsoft/terminal/releases/tag/v1.8.1444.0
  3. https://github.com/microsoft/terminal/files/6563899/CascadiaPackage_1.8.1444.0_x64.zip
