Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


json - Gson Unicode characters conversion to Unicode character codes

Check out my code below. I have a JSON string that contains Unicode character codes. I convert it to a Java object and then convert that back to a JSON string. However, the input and output JSON strings don't match. Is it possible to convert my object back to the original JSON string using Gson? I want outputJson to be the same as inputJson.

static class Book {
    String description;
}

public static void test() {
    Gson gson = new Gson();

    String inputJson = "{\"description\":\"Tikrovi\\u0161kai para\\u0161ytas k\\u016brinys\"}";
    Book book = gson.fromJson(inputJson, Book.class);
    String outputJson = gson.toJson(book);

    System.out.println(inputJson);
    System.out.println(outputJson);
    // Prints:
    // {"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
    // {"description":"Tikroviškai parašytas kūrinys"}
}

1 Reply


Unfortunately, Gson does not seem to support this. All JSON input and output goes through Gson's JsonReader and JsonWriter respectively (as of 2.8.0). JsonReader can read Unicode escapes using its private readEscapeCharacter method. JsonWriter, however, simply writes the string to the backing Writer instance and does not escape characters above 127, except for \u2028 and \u2029. Probably the only thing you can do here is write a custom escaping Writer that emits the Unicode escapes itself.
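
To make the two directions concrete, here is a minimal sketch using only the JDK. The class and method names (UnicodeEscapeRoundTrip, decode, encode) are made up for illustration and are not part of Gson; Gson does its own decoding internally, and this only shows the arithmetic behind one escape.

```java
final class UnicodeEscapeRoundTrip {

    // Reading direction: the four hex digits that follow a backslash and a 'u'
    // are parsed into a single UTF-16 code unit.
    static char decode(final String fourHexDigits) {
        return (char) Integer.parseInt(fourHexDigits, 16);
    }

    // Writing direction: one UTF-16 code unit becomes a six-character escape.
    // This is the step that has to be added by hand on the output side.
    static String encode(final char ch) {
        return String.format("\\u%04x", (int) ch);
    }

    public static void main(final String[] args) {
        final char decoded = decode("0161");
        System.out.println((int) decoded);   // prints 353, the code unit for s-caron
        System.out.println(encode(decoded)); // prints the escape for that code unit again
    }
}
```

Because the escape works on UTF-16 code units, a character outside the Basic Multilingual Plane would come out as two escapes (a surrogate pair), which JSON allows.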

import java.io.IOException;
import java.io.Writer;

final class EscapedWriter
        extends Writer {

    private static final char[] hex = {
            '0', '1', '2', '3',
            '4', '5', '6', '7',
            '8', '9', 'a', 'b',
            'c', 'd', 'e', 'f'
    };

    private final Writer writer;

    // I/O components are usually not thread-safe anyway,
    // so a single reusable buffer saves allocating a new escape for every character
    private final char[] escape = { '\\', 'u', 0, 0, 0, 0 };

    EscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    // This implementation is not very efficient and is open for enhancements:
    // * constructing a single "normalized" buffer character array so that it could be passed to the downstream writer
    //   rather than writing characters one by one
    // * etc...
    @Override
    public void write(final char[] buffer, final int offset, final int length)
            throws IOException {
        // length is a character count, not an end index
        for ( int i = offset; i < offset + length; i++ ) {
            final int ch = buffer[i];
            if ( ch < 128 ) {
                writer.write(ch);
            } else {
                escape[2] = hex[(ch & 0xF000) >> 12];
                escape[3] = hex[(ch & 0x0F00) >> 8];
                escape[4] = hex[(ch & 0x00F0) >> 4];
                escape[5] = hex[ch & 0x000F];
                writer.write(escape);
            }
        }
    }

    @Override
    public void flush()
            throws IOException {
        writer.flush();
    }

    @Override
    public void close()
            throws IOException {
        writer.close();
    }

    // Some java.io.Writer subclasses may use java.lang.Object.toString() to materialize their accumulated state by design
    // so it has to be overridden and forwarded as well
    @Override
    public String toString() {
        return writer.toString();
    }

}

This writer is NOT well-tested and does not treat \u2028 and \u2029 specially. Then just wrap the output destination in it when invoking the toJson method:

final String input = "{\"description\":\"Tikrovi\\u0161kai para\\u0161ytas k\\u016brinys\"}";
final Book book = gson.fromJson(input, Book.class);
final Writer output = new EscapedWriter(new StringWriter());
gson.toJson(book, output);
System.out.println(input);
System.out.println(output);

Output:

{"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
{"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
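
One of the enhancements mentioned in the comments of the writer above is to collect the escaped characters first and hand them to the downstream writer in a single call. A rough sketch of that idea follows. BufferedEscapedWriter is a hypothetical name, the class is as untested as the original, and the demo uses a plain StringWriter so that it runs without Gson on the classpath.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

final class BufferedEscapedWriter extends Writer {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private final Writer writer;

    BufferedEscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    @Override
    public void write(final char[] buffer, final int offset, final int length)
            throws IOException {
        // Build the whole escaped chunk in memory, then write it downstream once.
        final StringBuilder normalized = new StringBuilder(length + 16);
        final int end = offset + length;
        for (int i = offset; i < end; i++) {
            final char ch = buffer[i];
            if (ch < 128) {
                normalized.append(ch);
            } else {
                normalized.append('\\').append('u')
                        .append(HEX[(ch >> 12) & 0xF])
                        .append(HEX[(ch >> 8) & 0xF])
                        .append(HEX[(ch >> 4) & 0xF])
                        .append(HEX[ch & 0xF]);
            }
        }
        writer.write(normalized.toString());
    }

    @Override
    public void flush() throws IOException {
        writer.flush();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    // Forwarded so that a wrapped StringWriter can still be printed directly.
    @Override
    public String toString() {
        return writer.toString();
    }

    public static void main(final String[] args) throws IOException {
        final Writer output = new BufferedEscapedWriter(new StringWriter());
        // Java source escapes keep this file ASCII-only: 0161 is s-caron, 016b is u-macron.
        output.write("Tikrovi\u0161kai k\u016brinys");
        System.out.println(output);
    }
}
```

It should be usable in the same place as EscapedWriter, that is, as the second argument to gson.toJson. Whether batching is actually faster depends on the downstream writer, so measure before relying on it.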

It's an interesting problem, and you might also raise an issue on google/gson asking for a string-writing configuration option, or at least to get some comments from the development team. I believe they are well aware of this behavior and made it work like that by design, but they could shed some light on it. The only reason I can think of right now is that skipping an extra transformation before writing a string gains some performance, but that is a weak guess.

