And what we might do about it…
Brian Goetz and I gave this presentation in Antwerp, Belgium on November 7, 2019. The original title of this talk (as posted on the conference program) was “Why We Hate Java Serialization And What We’re Doing About It.” We made a slight adjustment to the title just before the presentation.
I initially proposed this talk to Brian because I felt we needed to correct the record about Java serialization. It’s very easy to criticize Java serialization in retrospect. We hear a lot of comments like “Just get rid of it!” but in fact serialization was introduced because it solves — and continues to solve — a very important problem. Like many complex systems, it has flaws, not because its designers were stupid, but because of typical software project difficulties: disagreements over the fundamental goals, being designed and implemented in a hurry, and a healthy dose of corporate politics.
We wanted to document very precisely where we think Java serialization’s flaws are: at the binding to the object model. In addition, Brian and the Java team had been thinking a lot about what the future of serialization would be, and we wanted to present that as well. Those ideas are described in more detail in Towards Better Serialization (June, 2019).
(This is a backdated article, posted on April 2, 2021. I’ve represented to the best of my ability my perspective at the time the presentation was given in November, 2019.)
Hi Stuart, one of the great things that just works with the existing serialization libraries is serialization of not just data but also of anonymous inner classes and lambdas. This allows frameworks like Apache Spark to serialize the computation graph (expressed as a series of lambda transforms) to its worker nodes, and the worker nodes are able to load that class and execute the logic. The benefit here, of course, is lower latency and greater efficiency, because the code is moved closer to the data rather than the other way around. Will this work in the new serialization technique?
I don’t think we’ve gotten quite far enough with “new serialization” to understand its impact on serialization of lambdas and anonymous inner classes (AICs). I think it’s important to understand, though, that serializing them does not actually move code around: the code must already exist in the JVM where the lambda or AIC is deserialized. What does get deserialized are things like captured values and (in the case of AICs) the enclosing instance.
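To make the point about captured values concrete, here is a minimal sketch of a serializable lambda making a round trip through the existing serialization mechanism within a single JVM. The class name `LambdaRoundTrip`, the interface `SerializableOp`, and the helper methods are hypothetical names chosen for illustration. The byte stream carries a `java.lang.invoke.SerializedLambda` descriptor plus the captured `int`, not the lambda's bytecode; deserialization succeeds here only because `LambdaRoundTrip` is already loaded in the receiving JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.function.IntUnaryOperator;

public class LambdaRoundTrip {
    // Hypothetical helper type: a lambda is serializable only if its target type is.
    public interface SerializableOp extends IntUnaryOperator, Serializable {}

    // The lambda captures the value of 'captured'; that value is what gets serialized.
    public static SerializableOp adder(int captured) {
        return x -> x + captured;
    }

    public static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = serialize(adder(40));

        // Inspect the stream: it names SerializedLambda rather than containing code.
        String raw = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(raw.contains("java.lang.invoke.SerializedLambda"));

        // Deserialization re-links to code already present in this JVM.
        SerializableOp copy = (SerializableOp) deserialize(wire);
        System.out.println(copy.applyAsInt(2)); // 42
    }
}
```

If the bytes were instead read in a JVM that did not have `LambdaRoundTrip` on its class path, deserialization would fail, which is why frameworks that ship lambdas to worker nodes also have to distribute the classes separately.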
For lambdas, since serialization support was explicitly designed (see java.lang.invoke.SerializedLambda), it seems possible to migrate or adapt that to “new serialization” somehow. But for AICs, there is magic in old serialization and there is magic in AICs themselves (hidden fields, hidden constructor args, undefined naming). Old serialization only worked for AICs under certain circumstances where the different magics were compatible. It might be difficult to migrate all of that to new serialization.
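One of those hidden fields can be observed indirectly. The sketch below, using hypothetical names (`AicCapture`, `Outer`, `SerializableSupplier`), creates the same kind of serializable AIC in two contexts. When the AIC is created in an instance method and uses a field of the enclosing object, the compiler gives it a hidden reference to that enclosing instance, and default serialization tries to write it. Because `Outer` is deliberately not `Serializable` here, the attempt fails. The AIC created in a static method has no enclosing instance and serializes without trouble.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.IntSupplier;

public class AicCapture {
    // Hypothetical helper type so the anonymous classes are serializable.
    public interface SerializableSupplier extends IntSupplier, Serializable {}

    // Deliberately NOT Serializable.
    public static class Outer {
        int base = 40;

        public SerializableSupplier makeSupplier() {
            return new SerializableSupplier() {
                @Override
                public int getAsInt() {
                    return base + 2; // reads Outer.this.base via the hidden outer reference
                }
            };
        }
    }

    // An AIC created in a static context has no enclosing instance to capture.
    public static SerializableSupplier makeStatic() {
        return new SerializableSupplier() {
            @Override
            public int getAsInt() {
                return 42;
            }
        };
    }

    // Returns false if default serialization hits a non-serializable object.
    public static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new Outer().makeSupplier())); // false
        System.out.println(canSerialize(makeStatic()));               // true
    }
}
```

Nothing in the source of the first anonymous class mentions the enclosing instance as a field, yet it determines whether serialization succeeds. This is only one illustration of the compiler-generated state involved; the names and exact shape of such hidden members are not specified.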