serialization - Zero-garbage large String deserialization in Java, Humongous object issue

I am looking for a way to deserialize a String from a byte[] in Java with as little garbage produced as possible. Because I am creating my own serializer and de-serializer, I have complete freedom to implement any solution on the server-side (i.e. when serializing data), and on the client-side (i.e. when de-serializing data).

I have managed to efficiently serialize a String without incurring any garbage overhead by iterating through the String's chars (String.charAt(i)) and converting each char (16-bit value) to 2x 8-bit value. There is a nice debate regarding this here. An alternative is to use Reflection to access String's underlying char[] directly, but this in outside the scope of the problem.

However, it seems impossible for me to deserialize the byte[] without creating the char[] twice, which seems, well, weird.

The procedure:

Create char[]
Iterate through byte[] and fill-in the char[]
Create String with String(char[]) constructor

Because of Java's String immutability rules, the constructor copies the char[], creating 2x GC overhead. I can always use mechanisms to circumvent this (Unsafe String allocation + Reflection to set the char[] instance), but I just wanted to ask if there are any consequences to this other than me breaking every convention on String's immutability.

Of course, the wisest response to this would be "come on, stop doing this and have trust in GC, the original char[] will be extremely short-lived and G1 will get rid of it momentarily", which actually makes sense, if the char[] is smaller than 1/2 of the G1's region size. If it is larger, the char[] will be directly allocated as a humongous object (i.e. automatically propagated outside of the G1's region). Such objects are extremely hard to be efficiently garbage collected in G1. That's why each allocation matters.

Any ideas on how to tackle the issue?

Many thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

replyed Oct 24, 2021 by 深蓝 (71.8m points)

Such objects are extremely hard to be efficiently garbage collected in G1.

This may not be true any longer, but you will have to evaluate it for your own application. JDK Bugs 8027959 and 8048179 introduce new mechanisms for collecting humongous, short-lived objects. According to the bug flags you might have to run with jdk versions ≥8u40 and ≥8u60 to reap their respective benefits.

Experimental option of interest:

-XX:+G1ReclaimDeadHumongousObjectsAtYoungGC

Tracing:

-XX:+G1TraceReclaimDeadHumongousObjectsAtYoungGC

For further advice and questions regarding those features I would recommend hitting the hotspot-gc-use mailing list.

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

Categories

serialization - Zero-garbage large String deserialization in Java, Humongous object issue

serialization - Zero-garbage large String deserialization in Java, Humongous object issue

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags