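import java.io.PrintStream;

class Kyrill {
    public static void main(String[] args)
            throws java.io.UnsupportedEncodingException
    {
        String ru = "Русский язык";
        // Wrap System.out in a PrintStream that encodes its output as UTF-8.
        PrintStream ps = new PrintStream(System.out, true, "UTF-8");
        System.out.println(ru.length()); // number of chars in the string
        System.out.println(ru);          // goes through the platform encoding
        ps.println(ru);                  // goes through UTF-8
    }
}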
D:Temp :: chcp 65001
Aktive Codepage: 65001.
D:Temp :: javac -encoding utf-8 Kyrill.java && java Kyrill
12
??????? ????
Русский языкй язык
Note that you might see some trailing junk in the output (I do, as in the last line above), but if you redirect the output to a file you'll see that this is just a display artefact.
So you can make it work by wrapping System.out in a PrintStream with an explicit encoding. System.out itself uses the platform encoding (cp1252 for me), and that has no Cyrillic characters, so they come out as question marks.
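If you want to see where the question marks come from without involving the console at all, here is a minimal sketch. It is not part of the program above: the class name EncodingLoss and the helper roundTrip are made up for illustration, and it assumes your JRE ships the windows-1252 charset (the common desktop JDKs do). The sample string is written with Unicode escapes so the sketch does not depend on the source file encoding.

```java
import java.nio.charset.Charset;

class EncodingLoss {
    // Hypothetical helper: encode the text with the named charset and decode
    // the resulting bytes again with the same charset.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        // getBytes(Charset) substitutes the charset's replacement byte
        // (here '?') for every character the charset cannot represent.
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // The same 12-character Russian sample as in Kyrill, as escapes.
        String ru = "\u0420\u0443\u0441\u0441\u043a\u0438\u0439 \u044f\u0437\u044b\u043a";
        System.out.println(roundTrip(ru, "windows-1252"));      // prints ??????? ????
        System.out.println(roundTrip(ru, "UTF-8").equals(ru));  // prints true
    }
}
```

The space survives because cp1252 can represent it; only the eleven Cyrillic letters are replaced.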
Additional note for you to grok the encoding business:
D:Temp :: chcp 1251
Aktive Codepage: 1251.
:: This is another codepage (8-bit, one byte per character) that maps bytes to Cyrillic characters.
:: Edit the source file to have:
:: PrintStream ps = new PrintStream(System.out, true, "Windows-1251");
:: The PrintStream encoding must match the console codepage; otherwise we won't get the expected result.
D:Temp :: javac -encoding utf-8 Kyrill.java && java Kyrill
12
??????? ????
Русский язык
So you can see that contrary to what some people believe, the Windows console does grok Unicode in the casual sense that it can print Greek and Russian.
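The "encoding must match the codepage" rule can also be checked in plain Java. The following is only a sketch under assumptions: the class CodepageMatch and the method asSeenBy are hypothetical names, decoding the bytes with a charset merely stands in for what the console does with its active codepage, and the JRE is assumed to provide windows-1251.

```java
import java.nio.charset.Charset;

class CodepageMatch {
    // Hypothetical helper: encode the text the way the writer's PrintStream
    // would, then decode the bytes the way a console using consoleCharset
    // would interpret them.
    static String asSeenBy(String text, String writerCharset, String consoleCharset) {
        byte[] bytes = text.getBytes(Charset.forName(writerCharset));
        return new String(bytes, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        // The same 12-character Russian sample, written as Unicode escapes.
        String ru = "\u0420\u0443\u0441\u0441\u043a\u0438\u0439 \u044f\u0437\u044b\u043a";
        // Matching encodings: the text comes back unchanged.
        System.out.println(asSeenBy(ru, "windows-1251", "windows-1251").equals(ru)); // prints true
        // Mismatch: 23 UTF-8 bytes are read as 23 single-byte characters.
        System.out.println(asSeenBy(ru, "UTF-8", "windows-1251").length());          // prints 23
    }
}
```

In the mismatched case every Cyrillic letter turns into two unrelated characters, which is the kind of garbage you get when the PrintStream encoding and the chcp setting disagree.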