I am afraid that code patching a wrong encoding into a correct one cannot guarantee that all characters survive the process. The main principle in Java is that a String always holds Unicode text; so any conversion to bytes represents that text in some specific encoding.
response = new String(response.getBytes(), "UTF-8");
This is wrong. getBytes() without a charset uses the default charset of the platform the application runs on, so it behaves differently on your development Windows PC and on the production Linux server. Any apparent fix it produces is misleading.
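A minimal sketch of that platform dependence (the sample string "Gülşen" is an assumption, not your data):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        String response = "Gülşen";  // hypothetical non-ASCII value

        // getBytes() without an argument uses Charset.defaultCharset(),
        // which differs per machine (e.g. windows-1254 vs UTF-8).
        byte[] platformBytes = response.getBytes();
        System.out.println("default charset: " + Charset.defaultCharset());

        // Re-interpreting those bytes as UTF-8 only round-trips when
        // the default charset happens to be UTF-8 already.
        String patched = new String(platformBytes, StandardCharsets.UTF_8);
        System.out.println(patched.equals(response));  // platform-dependent

        // Being explicit on both sides round-trips on every platform.
        byte[] utf8Bytes = response.getBytes(StandardCharsets.UTF_8);
        String safe = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(safe.equals(response));  // true
    }
}
```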
response = Html.fromHtml(response).toString();
This decodes HTML entities. In a request, receiving entities is a sign that the <form> is missing an accept-charset="UTF-8" attribute, or that the charset is missing from the request headers; the browser then sends non-Latin characters as numeric HTML entities. So this may point to a communication failure between layers, where the request side is missing a UTF-8 charset declaration.
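As an illustration of that symptom (the helper and the sample value are assumptions, not your code), this is what such entity-mangled input looks like and how it could be decoded as a stopgap; fixing the form's accept-charset is the real solution:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDemo {
    // Decode numeric character references like &#351; back to text.
    static String decodeNumericEntities(String s) {
        Matcher m = Pattern.compile("&#(\\d+);").matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            out.appendCodePoint(Integer.parseInt(m.group(1)));
            last = m.end();
        }
        out.append(s.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        // What a browser may submit for "Gülşen" when the form's
        // charset can represent ü but not ş (U+015F):
        String param = "G\u00fcl&#351;en";
        System.out.println(decodeNumericEntities(param));  // Gülşen
    }
}
```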
response = fixEncodingUnicode(response);
or str = new String(response.getBytes("windows-1254"), "UTF-8");
Unneeded, as a String in Java already is Unicode. Worse, it introduces a question mark for every Unicode character not representable in Windows-1254, and a replacement character � (the "diamond") for every windows-1254 byte that is not valid UTF-8.
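A minimal sketch of why that conversion is destructive in both directions (the sample strings are assumptions):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyRoundTripDemo {
    public static void main(String[] args) {
        Charset windows1254 = Charset.forName("windows-1254");

        // "Gülşen" is a hypothetical Turkish value; ü and ş are the
        // single bytes 0xFC and 0xFE in windows-1254.
        String original = "Gülşen";
        byte[] win = original.getBytes(windows1254);

        // Those single bytes are malformed UTF-8, so decoding them as
        // UTF-8 yields the replacement character U+FFFD (the "diamond").
        String back = new String(win, StandardCharsets.UTF_8);
        System.out.println(back);

        // The reverse direction is lossy too: any character outside
        // windows-1254 becomes '?' during encoding.
        byte[] q = "\u0394".getBytes(windows1254);  // Greek capital delta
        System.out.println((char) q[0]);  // ?
    }
}
```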
So all of these attempts seem wrong; the error is likely made earlier on. Correct the requests first, since otherwise even a correct request can give wrong results. And go for UTF-8 rather than Windows-1254.
You can dump or log the bytes of the input parameter response with something like:
Arrays.toString(response.codePoints().toArray())
(A hexadecimal format would be more readable.)
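For instance, a small sketch of both forms (the sample string is an assumption):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class CodePointDump {
    public static void main(String[] args) {
        String response = "Gülşen";  // hypothetical input

        // Decimal code points, as suggested above:
        System.out.println(Arrays.toString(response.codePoints().toArray()));

        // The more readable hexadecimal U+XXXX form:
        String hex = response.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
        System.out.println(hex);  // U+0047 U+00FC U+006C U+015F U+0065 U+006E
    }
}
```

If the dump already shows U+FFFD or stray question marks, the text was destroyed before reaching this method.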