utf 8 - Java convert Windows-1252 to UTF-8, some letters are wrong

Question


posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)


I receive data from an external Microsoft SQL Server 2008 database (I make the queries with MyBatis). The data is encoded as "Windows-1252".

I have tried to re-encode to UTF-8:

String textoFormado = ...value from MyBatis... ; 
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");

Almost the whole string is correctly decoded, but some letters with accents are not.

For example:

I received this: ??vila
The code above makes: ??vila
I expected: ávila


1 Reply

深蓝 · Answer 1 · 2021-10-23T19:39:44+0000

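Obviously, textoFormado is a variable of type String. This means that the bytes have already been decoded into characters, which Java represents internally as UTF-16. What you did was encode your string as Windows-1252 and then decode the resulting bytes as UTF-8. That does not work: the two encodings agree only for ASCII characters, which is why most of your string survives. An accented letter such as á becomes the single byte 0xE1 in Windows-1252, which is not a valid UTF-8 sequence on its own, so the UTF-8 decoder replaces it.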
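A minimal sketch that reproduces the conversion from the question and shows where it goes wrong. The class and method names are made up for illustration; the text is written with the escape \u00E1 (á) so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {
    // Same steps as the question: encode as Windows-1252, then decode the bytes as UTF-8.
    static String wrongRoundTrip(String text) {
        byte[] cp1252Bytes = text.getBytes(Charset.forName("windows-1252"));
        // 'á' is the single byte 0xE1 here; in UTF-8 that byte starts a 3-byte sequence,
        // but the next byte ('v') is not a continuation byte, so the decoder emits U+FFFD.
        return new String(cp1252Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00E1vila"; // "ávila"
        String broken = wrongRoundTrip(original);
        System.out.println(broken.equals(original));   // false
        System.out.println(broken.contains("\uFFFD")); // true: replacement character
        // ASCII letters survive because Windows-1252 and UTF-8 encode them identically.
        System.out.println(broken.endsWith("vila"));   // true
    }
}
```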

What you need is to specify the correct encoding at the point where the raw bytes are decoded:

byte[] sourceBytes = getRawBytes(); // placeholder: wherever the undecoded bytes come from
String data = new String(sourceBytes, "Windows-1252");
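A self-contained sketch of that decoding step. Since getRawBytes() is not defined in the answer, the hypothetical sampleRawBytes() below stands in for it and returns "ávila" as raw Windows-1252 bytes. This version passes a Charset object instead of a charset name, which avoids the checked UnsupportedEncodingException of the String(byte[], String) constructor.

```java
import java.nio.charset.Charset;

public class DecodeCp1252 {
    // Hypothetical stand-in for getRawBytes(): "ávila" encoded as Windows-1252.
    static byte[] sampleRawBytes() {
        return new byte[] {(byte) 0xE1, 'v', 'i', 'l', 'a'};
    }

    static String decodeWindows1252(byte[] raw) {
        // Charset names are case-insensitive, so "Windows-1252" would also work.
        // Charset.forName throws an unchecked exception if the charset is unavailable.
        return new String(raw, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String data = decodeWindows1252(sampleRawBytes());
        System.out.println(data.equals("\u00E1vila")); // true
    }
}
```

Note that when the value comes through JDBC (as with MyBatis), you normally never see these raw bytes, because the driver has already performed this step.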

To use this string inside your program, you do not need to do anything else; just use it. If, however, you want to write the data back out, for example to a file, you need to encode it again:

byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
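One way to fill in the "write bytes to destination file here" step, as a sketch assuming Java 11+ (for Files.readString). The class name is made up, and a temporary file stands in for the real destination. Files.writeString(path, data, StandardCharsets.UTF_8) would do the encoding and writing in a single call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUtf8 {
    static byte[] toUtf8(String data) {
        return data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String data = "\u00E1vila"; // "ávila"
        byte[] destinationBytes = toUtf8(data);
        // In UTF-8, 'á' (U+00E1) takes two bytes (0xC3 0xA1), so 6 bytes in total.
        System.out.println(destinationBytes.length); // 6

        // A temporary file stands in for the real destination file.
        Path out = Files.createTempFile("utf8-example", ".txt");
        Files.write(out, destinationBytes);
        // Reading the file back as UTF-8 recovers the original text.
        System.out.println(Files.readString(out, StandardCharsets.UTF_8).equals(data)); // true
        Files.delete(out);
    }
}
```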
