This article is from: Le Byte

This article covers commonly used Java serialization frameworks.

I. Background Introduction

Serialization and deserialization are common techniques in everyday data persistence and network transmission, but the number of serialization frameworks available is confusing, and it is not obvious which one fits which scenario. This article compares and tests open-source serialization frameworks used in the industry on five aspects: generality, ease of use, extensibility, performance, and support for Java data types and syntax.

The following tests compare JDK Serializable, FST, Kryo, Protobuf, Thrift, Hessian, and Avro.

II. Binary Serialization Frameworks

1 JDK Serializable

A class can use Java's built-in serialization mechanism by implementing the java.io.Serializable or java.io.Externalizable interface. Implementing the interface only marks the class as serializable and deserializable; the actual serialization and deserialization are performed with the I/O classes ObjectOutputStream and ObjectInputStream.
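The mechanism described above can be sketched as follows. This is a minimal illustration rather than the code used in the tests: DemoUser is a hypothetical entity, and the helper names serialize and deserialize are assumptions introduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JdkSerializationDemo {
    // Object -> bytes: ObjectOutputStream writes into an in-memory buffer.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // bytes -> Object: ObjectInputStream reads the object back from the bytes.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DemoUser copy = (DemoUser) deserialize(serialize(new DemoUser("alice", 30)));
        System.out.println(copy.name + " " + copy.age);
    }
}

// Hypothetical entity used only for illustration; not the article's test class.
class DemoUser implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int age;

    DemoUser(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
```

Note that the caller has to manage the byte streams and cast the result by hand, which is the rigidity discussed under ease of use.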

Generality

Because it is Java's built-in serialization framework, JDK Serializable does not support cross-language serialization and deserialization.

Ease of use

As Java's built-in serialization framework, it completes serialization without referencing any external dependency. However, JDK Serializable is considerably more awkward to use than the open-source frameworks: the encoding and decoding code is rigid and needs ByteArrayOutputStream and ByteArrayInputStream to convert to and from bytes.

Extensibility

JDK serialization controls class versions through serialVersionUID. If the serialVersionUID of the class differs between serialization and deserialization, a java.io.InvalidClassException is thrown, reporting that the two SUIDs are inconsistent.
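As a small sketch of where the SUID comes from, the snippet below reads the value the JDK associates with a class through ObjectStreamClass. The two record classes and the helper suidOf are hypothetical names introduced for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidDemo {
    // Returns the serialVersionUID the JDK associates with a serializable class.
    static long suidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            throw new IllegalArgumentException(type + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("declared: " + suidOf(VersionedRecord.class));
        System.out.println("computed: " + suidOf(UnversionedRecord.class));
    }
}

// Hypothetical class with an explicit SUID that the developer controls.
class VersionedRecord implements Serializable {
    private static final long serialVersionUID = 2L;
    String name;
}

// Hypothetical class without an explicit SUID: the JDK derives one from
// the class structure, so changing the class can change the value.
class UnversionedRecord implements Serializable {
    String name;
}
```

Declaring the SUID explicitly keeps it stable when fields are added, whereas a derived SUID may change with the class and trigger the InvalidClassException described above.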

Performance

Although JDK Serializable ships with Java, its performance does not live up to that built-in status. The following test case is one of the test entities used throughout this article. We serialized the test case 10 million times and then summed the elapsed times:

We will compare these results with the other serialization frameworks later.
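The measurement approach (serialize repeatedly and sum the elapsed times) can be sketched like this. It is only an illustration under stated assumptions: BenchEntity is a hypothetical stand-in for the test entity, whose definition is not shown here, and the iteration count is reduced so the example finishes quickly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JdkSerializationTiming {
    // Serializes one object with the JDK mechanism and returns the bytes.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Serializes obj `iterations` times and returns the summed elapsed nanoseconds.
    static long totalSerializeNanos(Object obj, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            serialize(obj);
            total += System.nanoTime() - start;
        }
        return total;
    }

    public static void main(String[] args) {
        BenchEntity entity = new BenchEntity();
        totalSerializeNanos(entity, 10_000); // warm-up run, result discarded
        int iterations = 100_000;            // the article's tests used 10 million
        long nanos = totalSerializeNanos(entity, iterations);
        System.out.println("size=" + serialize(entity).length + " bytes, total="
                + nanos / 1_000_000 + " ms for " + iterations + " iterations");
    }
}

// Hypothetical stand-in for the article's test entity.
class BenchEntity implements Serializable {
    private static final long serialVersionUID = 1L;
    int id = 7;
    String label = "sample";
    double[] values = {1.0, 2.0, 3.0};
}
```

A harness like this gives only rough numbers; results depend heavily on JIT warm-up and on the shape of the entity being serialized.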

Data types and syntactic structure support

Because JDK Serializable is the serialization framework native to Java, it supports almost all Java data types and syntax.

WeakHashMap does not implement the Serializable interface.
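The WeakHashMap exception can be checked directly. The sketch below uses a hypothetical helper, isJdkSerializable, that reports whether ObjectOutputStream accepts an object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

class WeakHashMapCheck {
    // Returns true if JDK serialization accepts the object,
    // false if it is rejected with NotSerializableException.
    static boolean isJdkSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> plain = new HashMap<>();
        Map<String, String> weak = new WeakHashMap<>();
        System.out.println(plain instanceof Serializable); // true
        System.out.println(weak instanceof Serializable);  // false
        System.out.println(isJdkSerializable(plain));      // true
        System.out.println(isJdkSerializable(weak));       // false
    }
}
```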

2 FST serialization framework

FST (fast-serialization) is a Java serialization framework that is fully compatible with the JDK serialization protocol. It can be up to 10 times as fast as JDK serialization, and the serialized result is only about 1/3 the size of the JDK's. At the time of writing, the current version of FST is 2.56, with Android support since 2.17.

Generality

FST is also a serialization framework developed for Java, so it has no cross-language support.

Ease of use

In terms of ease of use, FST is far ahead of JDK Serializable: the syntax is extremely concise, and FSTConfiguration encapsulates most of the methods.

Extensibility

FST makes new fields compatible with older data streams through the @Version annotation. Every newly added field must be marked with @Version; a field without the annotation is treated as version 0.

Note:

Overall, FST supports extensibility, but it is cumbersome to use.

Performance

Serializing the above test case with FST produces 172 bytes, roughly 40% of the 432 bytes produced by JDK serialization. Now let's look at the time overhead of serialization and deserialization.

Data types and syntactic structure support

FST was developed on top of the JDK serialization framework, so its support for data types and syntax is consistent with the JDK's.

3 Kryo serialization framework

Kryo is a fast and efficient Java binary serialization framework. It relies on the underlying ASM library for bytecode generation, which gives it good runtime speed. Kryo's goal is to provide a serialization framework that is fast, produces small results, and has an easy-to-use API. Kryo also supports automatic deep and shallow copying, which copies directly from object to object rather than going object -> bytes -> object.

Generality

Kryo's official website describes it as a Java binary serialization framework, and a search of the Internet did not turn up any cross-language use of Kryo. Some articles mention that cross-language use is very complex, but I did not find implementations for other languages.

Ease of use

Kryo's API is also simple and easy to use; Input and Output encapsulate almost any stream operation you can think of. Kryo also offers rich, flexible configuration, such as custom serializers and setting the default serializer, although these options take some effort to use.

Extensibility

Kryo's default serializer, FieldSerializer, does not support field extension. To support field extension, you need to configure a different default serializer.

Performance

Serializing the above test case with Kryo produces 172 bytes, the same as the unoptimized FST result. The time cost is as follows:

We then turned off the circular-reference configuration and pre-registered the serialized class. The serialized size drops to 120 bytes, because the class is now identified by a number rather than by its full class name. The time used is as follows:

Data types and syntactic structure support

Kryo's basic requirement for a serialized class is that it has a no-argument constructor, because deserialization uses that constructor to create the object.

4 Protocol Buffers

Protocol Buffers (Protobuf) is a language-neutral, platform-neutral, extensible serialization framework. Unlike the previous serialization frameworks, Protobuf requires a predefined schema.

Generality

Protobuf was designed from the start as a language-independent serialization framework. It currently supports Java, Python, C++, Go, and C#, and third-party packages exist for many other languages. So in terms of generality, Protobuf is very strong.

Ease of use

Protobuf requires a schema description file written in its IDL. Once the description file is defined, you can run protoc directly to generate the serialization and deserialization code. So in practice you only need to write a simple description file to use Protobuf.

Extensibility

Extensibility is also one of Protobuf's original design goals, and .proto files can be modified easily.

New fields: a new field must have a corresponding default value so that it can interact with old code. Messages generated by the new protocol can then still be parsed by the old protocol.

Deleted fields: the field number (tag) of a deleted field must not be reused in later updates. To avoid mistakes, mark it as reserved.

Protobuf is also friendly to type changes: int32, uint32, int64, uint64, and bool are mutually compatible, so a field's type can be changed among them as needed. As seen above, Protobuf puts a lot of work into extensibility and supports protocol evolution well.

Performance

We ran the same example as a performance test. The serialized size using Protobuf is 192 bytes. Here is the corresponding time overhead.

As you can see, Protobuf's deserialization performance is worse than that of FST and Kryo.

Data type and syntax structure support

Protobuf uses IDL to define the schema, so it does not support defining Java methods. The following tests cover the serialization of variables:

Note: List, Set, and Queue were tested through Protobuf's repeated fields. A repeated field can be populated from any class that implements the Iterable interface.

5 Thrift serialization framework

Thrift is an efficient, multi-language remote service invocation framework, that is, an RPC (Remote Procedure Call) framework. It was developed by Facebook, which later open-sourced it and donated it to Apache. Thrift is primarily an RPC framework, but because it provides RPC services across multiple languages, it is often used for serialization as well.

Implementing serialization with Thrift takes three main steps: creating the Thrift IDL file, compiling it to generate Java code, and using TSerializer and TDeserializer for serialization and deserialization.

Generality

Thrift is similar to Protobuf in that both require description files written in an IDL, which is currently an efficient way to implement cross-language serialization/RPC. Thrift currently supports C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and other languages, so Thrift is highly general.

Ease of use

Thrift is similar to Protobuf in ease of use: both require three steps, namely writing the description file in IDL, compiling it to generate Java code, and calling the serialization and deserialization methods. The difference is that Protobuf builds the serialization and deserialization methods into the generated classes, whereas Thrift requires a separate call to its built-in serializer to encode and decode.

Extensibility

Thrift supports field extension, and there are a few things to be aware of when extending fields:

Performance

For the test case above, the serialized size using Thrift is 257 bytes. Here is the corresponding serialization and deserialization time overhead:

Thrift and Protobuf are about the same in combined serialization and deserialization time. Protobuf has the advantage in serialization time, whereas Thrift has the advantage in deserialization time.

Data type and syntax structure support

Data type support: because Thrift uses IDL to define serialized classes, only Thrift's own data types are supported. The Java data types that Thrift can support are:

Thrift also does not support defining Java methods.

6 Hessian serialization framework

Hessian is a lightweight RPC (Remote Procedure Call) framework developed by Caucho. It uses HTTP for transport and the Hessian binary format for serialization. Because the Hessian protocol supports efficient, cross-language binary serialization, Hessian is often used purely as a serialization framework. The Hessian serialization protocol comes in two versions, Hessian 1.0 and Hessian 2.0. Hessian 2.0 optimizes the serialization process and performs significantly better than Hessian 1.0. Hessian serialization is very simple to use: objects are serialized with HessianOutput and deserialized with HessianInput. Here is a demo of Hessian serialization:

Generality

Hessian, like Protobuf and Thrift, supports cross-language RPC communication. A major advantage of Hessian over other cross-language RPC frameworks is that it does not use an IDL to define data and services; instead, it relies on self-description. Hessian has implementations in languages including Java, Flash/Flex, Python, C++, .NET/C#, D, Erlang, PHP, Ruby, and Objective-C.

Ease of use

Because Hessian does not need data and services to be defined through an IDL, it is easier to use than Protobuf and Thrift: the classes to be serialized only need to implement the Serializable interface.

Extensibility

Although a class serialized with Hessian must implement the Serializable interface, it is not affected by serialVersionUID and can easily support field extension.

Performance

Serializing the above test case with the Hessian 1.0 protocol produces 277 bytes. With the Hessian 2.0 protocol, the result is 178 bytes.

The time overhead of serialization and deserialization is as follows:

As you can see, Hessian 1.0 and Hessian 2.0 differ significantly in both serialized size and serialization time.

Data type and syntax structure support

Because Hessian serializes classes using Java self-description, it supports Java's native data types, collection classes, custom classes, enumerations, and so on (SynchronousQueue is not supported), and it supports Java syntax structures well.

7 Avro serialization framework

Avro is a data serialization framework. It began as a sub-project of Apache Hadoop, developed under the lead of Doug Cutting during his work on Hadoop. Designed to support data-intensive applications, Avro is well suited to large-scale data exchange and storage, both remote and local.

Generality

Avro defines data structures through a schema and currently supports Java, C, C++, C#, Python, PHP, and Ruby, so it is general across these languages.

Ease of use

Avro does not need generated code for dynamic languages, but for static languages such as Java you still need avro-tools.jar to compile the schema and generate Java code. In my opinion, writing an Avro schema is more complex than writing one for Thrift or Protobuf.

Extensibility

Performance

Serializing with the Avro-generated code produces a result of 111 bytes. Here is the time spent on Avro serialization:

Data type and syntax structure support

Avro schemas must be written with the data types that Avro supports, so the Java data types that can be used are limited to those types. Avro supports primitive types (null, boolean, int, long, float, double, bytes, string) and complex types (record, enum, array, map, union, fixed).

Whether Avro generates the code automatically or the schema is used directly, defining Java methods in serialized classes is not supported.

III. Summary

1 Generality

The following compares the serialization frameworks in terms of generality. Protobuf is the best in generality and supports many mainstream programming languages.

2 Ease of use

The following compares the serialization frameworks in terms of API ease of use. With the exception of JDK Serializable, all of the serialization frameworks provide an API that is easy to use.

3 Extensibility

Below is a comparison of the extensibility of each serialization framework. Protobuf's extensibility is the most convenient and natural; the other serialization frameworks require some configuration, annotations, and so on.

4 Performance

Serialization size comparison

Comparing the serialized data sizes across the serialization frameworks, both Kryo Preregister (with pre-registered serialization classes) and Avro produce small results. So if serialized size matters to you, choose Kryo or Avro.

Serialization time overhead comparison

Below are the serialization and deserialization times. Both Kryo Preregister and FST Preregister perform excellently: FST Preregister has the best serialization time, and Kryo Preregister has roughly equal serialization and deserialization times. So if serialization time is a major consideration, choose Kryo or FST, both of which perform well.

5 Data type and syntax structure support

Java data types supported by serialization frameworks:

Note: The collection type tests cover almost all of the corresponding implementation classes.

The following summarizes, based on the tests, the data types and syntax supported by the serialization frameworks above.

Protobuf and Thrift define classes in IDL files and then use their respective compilers to generate Java code. Because IDL provides no syntax for defining static inner classes, non-static inner classes, and so on, these features could not be tested.
