Serialization and deserialization come up whenever a program needs to store objects or send them elsewhere. Simply put, serialization converts an object into a byte stream, and deserialization converts a byte stream back into an object. The two are inverse operations: an object that is serialized and then deserialized should yield an equivalent object.

Serialization and deserialization are a general technique shared by most programming languages, with standards such as XDR (External Data Representation, RFC 1014). Some languages have it built in (Java, PHP, etc.), while others rely on third-party libraries (C/C++).

Usage scenarios

  • Persistence of objects (saving object contents to a database or file)
  • Remote data transfer (sending objects to other computer systems)

Why serialization and deserialization?

Serialization and deserialization mainly solve the problem of data consistency: the data read back must be identical to the data that was written.

For local persistence, it is often enough to convert data into strings for storage. For remote transmission, however, differences in operating systems and hardware introduce problems such as byte order (big-endian versus little-endian) and memory alignment, which can prevent the receiver from parsing the data correctly. To solve this, Sun Microsystems proposed the XDR specification in the 1980s, and it became an IETF standard in 1995.
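The byte-order problem described above can be seen in a minimal Java sketch. It encodes the same int under both byte orders with java.nio.ByteBuffer; the class name ByteOrderDemo and the helpers encode and decode are hypothetical names chosen for illustration. XDR avoids the ambiguity by fixing big-endian order on the wire.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class: one int value, two different byte layouts.
class ByteOrderDemo {

    // Encode a 32-bit int into 4 bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Decode 4 bytes back into an int using the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] big = encode(1, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(1, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.toString(big));    // [0, 0, 0, 1]
        System.out.println(Arrays.toString(little)); // [1, 0, 0, 0]
        // Reading with the wrong byte order silently yields a different number.
        System.out.println(decode(big, ByteOrder.LITTLE_ENDIAN)); // 16777216
    }
}
```

A receiver that assumes the wrong order does not fail; it just reads a wrong value, which is why a wire format has to pin the order down.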

Serialization and deserialization in Java

Serialization and deserialization are built into the Java language and are implemented through the Serializable interface.

public class Account implements Serializable {

	private int age;
	private long birthday;
	private String name;
}
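As a minimal sketch of what implementing Serializable buys you, the following round-trips an object through an in-memory byte array. The class RoundTripDemo and its serialize and deserialize helpers are hypothetical, and the nested Account is a simplified stand-in for the class above (it omits birthday).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class RoundTripDemo {

    // Simplified stand-in for the article's Account class.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int age;
        private final String name;

        Account(int age, String name) {
            this.age = age;
            this.name = name;
        }

        int getAge() { return age; }
        String getName() { return name; }
    }

    // Object -> byte stream
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Byte stream -> object
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = (Account) deserialize(serialize(new Account(30, "Freeman")));
        System.out.println(copy.getName() + ", " + copy.getAge()); // Freeman, 30
    }
}
```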

Serialization compatibility

Serialization compatibility refers to the impact that structural changes to a class (adding or deleting fields, modifying fields, changing field modifiers, and so on) have on serialization. To detect such changes, Serializable uses the serialVersionUID field to identify the class structure. If the class does not declare it, the value is generated automatically from the class structure and changes when the structure changes. The virtual machine checks serialVersionUID during deserialization: if the value stored in the byte stream differs from that of the target class, deserialization fails.

Example: Save the Account object to a file, then add the Address field to the Account class, and read the previously saved contents from the file.

// Step 1: serialize the original Account to a file
FileOutputStream fos = new FileOutputStream(file);
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(account);
oos.flush();

// Step 2: add the address field to the Account class
public class Account implements Serializable {

    private int age;
    private long birthday;
    private String name;
    private String address;

    public Account(int age, String name) {
        this.age = age;
        this.name = name;
    }
}

// Step 3: read the previously saved object back from the file
FileInputStream fis = new FileInputStream(file);
ObjectInputStream ois = new ObjectInputStream(fis);
Account account2 = (Account) ois.readObject();

Because the structure of the Account class was modified after the object was saved, the automatically generated serialVersionUID changes, and reading the file (deserialization) fails with a java.io.InvalidClassException. For better compatibility, it is therefore best to declare serialVersionUID explicitly with a fixed value in the class.

public class Account implements Serializable {

    private static final long serialVersionUID = 1L;
    
    private int age;
    private long birthday;
    private String name;
}
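To see which serialVersionUID the runtime associates with a class, one option is java.io.ObjectStreamClass. The sketch below compares a class that declares the field with one that does not; SerialVersionDemo, ImplicitAccount, ExplicitAccount and uidOf are hypothetical names used only for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {

    // No explicit serialVersionUID: the value is computed from the class structure.
    static class ImplicitAccount implements Serializable {
        private int age;
        private String name;
    }

    // Explicit serialVersionUID: the value stays fixed when fields are added or removed.
    static class ExplicitAccount implements Serializable {
        private static final long serialVersionUID = 1L;
        private int age;
        private String name;
    }

    // Look up the serialVersionUID the serialization runtime uses for a Serializable class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + uidOf(ExplicitAccount.class)); // explicit: 1
        // The computed value is a hash of the class details, so it is not shown here.
        System.out.println("implicit: " + uidOf(ImplicitAccount.class));
    }
}
```

Adding a field to ImplicitAccount and re-running would print a different computed value, while ExplicitAccount would keep reporting 1.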

Storage rules for serialization

Java serialization avoids storing the same object twice in one stream, which saves space when objects are persisted. When the same object is written to a stream multiple times, only the first write stores its contents; each later write stores just a reference to the first.

// Save the same account object twice, changing its user name before the second write.
Account account = new Account("Freeman");

FileOutputStream fos = new FileOutputStream(file);
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(account);
oos.flush(); // flush so that file.length() reflects the bytes written so far
System.out.println("fileSize=" + file.length());

account.setUserName("Tom");
oos.writeObject(account); // only a reference to the first write is stored
oos.flush();
System.out.println("fileSize=" + file.length());

FileInputStream fis = new FileInputStream(file);
ObjectInputStream ois = new ObjectInputStream(fis);
Account account2 = (Account) ois.readObject();
Account account3 = (Account) ois.readObject();
System.out.println("account2.name=" + account2.getUserName()
        + "\naccount3.name=" + account3.getUserName()
        + "\naccount2==account3 -> " + (account2 == account3));

Output result:

account2.name=Freeman  
account3.name=Freeman 
account2==account3 -> true

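Therefore, to record a changed state when serializing the same object multiple times to one stream, it is best to clone a new object and serialize the clone (calling reset() on the ObjectOutputStream before the second write is an alternative).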
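The storage rule can be reproduced without a file. The sketch below writes one object twice to an in-memory stream, optionally calling ObjectOutputStream.reset() between the writes so that the stream forgets what it has already written. SharedReferenceDemo and writeTwice are hypothetical names, and the nested Account is a simplified stand-in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {

    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private String userName;

        Account(String userName) { this.userName = userName; }
        String getUserName() { return userName; }
        void setUserName(String userName) { this.userName = userName; }
    }

    // Writes the account, renames it to "Tom", writes it again, then reads both back.
    // If resetBetweenWrites is true, the stream discards its record of the first write.
    static Account[] writeTwice(boolean resetBetweenWrites) {
        try {
            Account account = new Account("Freeman");
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(account);
                account.setUserName("Tom");
                if (resetBetweenWrites) {
                    oos.reset();
                }
                oos.writeObject(account);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                Account first = (Account) ois.readObject();
                Account second = (Account) ois.readObject();
                return new Account[] {first, second};
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account[] shared = writeTwice(false);
        System.out.println(shared[1].getUserName() + " " + (shared[0] == shared[1])); // Freeman true
        Account[] fresh = writeTwice(true);
        System.out.println(fresh[1].getUserName() + " " + (fresh[0] == fresh[1]));    // Tom false
    }
}
```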

The impact of serialization on singletons

When deserializing, the JVM constructs a new object from the serialized content, which effectively gives a singleton class that implements Serializable a hidden public constructor. To keep the singleton instance unique, the class needs to define a readResolve method that returns the existing instance.

/**
 * Called during deserialization; the object returned here replaces the newly deserialized one.
 *
 * @return the existing singleton instance
 * @throws ObjectStreamException
 */
private Object readResolve() throws ObjectStreamException {
    return instance;
}
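Put together, a serializable singleton might look like the following sketch. The names SingletonDemo, Registry and roundTrip are hypothetical; the point is only that the deserialized reference is the same object as the original instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class SingletonDemo {

    // Hypothetical singleton that survives a serialization round trip.
    static final class Registry implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final Registry INSTANCE = new Registry();

        private Registry() { }

        static Registry getInstance() { return INSTANCE; }

        // Invoked by the serialization runtime; its return value replaces the new object.
        private Object readResolve() throws ObjectStreamException {
            return INSTANCE;
        }
    }

    // Serialize to memory and immediately deserialize again.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(Registry.getInstance());
        System.out.println(copy == Registry.getInstance()); // true
    }
}
```

Without readResolve, the same comparison would print false, because deserialization would have produced a second Registry.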

Control serialization process

Although using Serializable directly is convenient, sometimes we do not want to serialize every field, such as an isSelected field that tracks selection state or a password field that is sensitive for security reasons. In such cases, you can use the following methods:

  1. Add a static or transient modifier to a field that you do not want to serialize:

Java serialization saves only an object's instance fields; it includes neither static members (which belong to the class) nor methods. To make serialization more flexible, Java provides the transient keyword, which excludes a field from serialization.

public class Account implements Serializable {

    private static final long serialVersionUID = 1L;

    private String userName;
    private static String idcard;
    private transient String password;
}
  2. Use the Externalizable interface directly to control the serialization process:

Externalizable is also a serialization interface provided by Java. Unlike Serializable, it does not serialize any member variables by default. All serialization and deserialization must be done manually.

public class Account implements Externalizable {

    private static final long serialVersionUID = 1L;

    private String userName;
    private String idcard;
    private String password;

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(userName);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        userName = (String) in.readObject();
    }
}
  3. Implement your own serialization/deserialization process with private writeObject and readObject methods:

public class Account implements Serializable {

    private static final long serialVersionUID = 1L;

    private String userName;
    private transient String idcard;
    private String password;

    private void writeObject(ObjectOutputStream oos) throws IOException {
        // Call the default serialization method to write the non-transient, non-static fields
        oos.defaultWriteObject();
        // Then write the transient field manually
        oos.writeObject(idcard);
    }

    private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException {
        // Call the default deserialization method first
        ois.defaultReadObject();
        // Then read the manually written field, in the order it was written
        idcard = (String) ois.readObject();
    }
}
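The effect of transient and of custom writeObject/readObject methods can be checked with a small round trip. In this sketch, which differs from the class above, both idcard and password are marked transient so the two outcomes sit side by side: idcard is written manually and survives, while password is dropped. CustomSerializationDemo, describe and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CustomSerializationDemo {

    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private String userName;
        private transient String idcard;   // skipped by default, written manually below
        private transient String password; // skipped and never written

        Account(String userName, String idcard, String password) {
            this.userName = userName;
            this.idcard = idcard;
            this.password = password;
        }

        private void writeObject(ObjectOutputStream oos) throws IOException {
            oos.defaultWriteObject(); // writes userName only
            oos.writeObject(idcard);  // adds the transient field by hand
        }

        private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException {
            ois.defaultReadObject();
            idcard = (String) ois.readObject();
        }

        String describe() {
            return userName + ", " + idcard + ", " + password;
        }
    }

    // Serialize to memory and immediately deserialize again.
    static Account roundTrip(Account account) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(account);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Account) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("Freeman", "ID-001", "secret"));
        System.out.println(copy.describe()); // Freeman, ID-001, null
    }
}
```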

For a detailed introduction to the Java serialization algorithm, see: An In-Depth Analysis of the Java Serialization Algorithm

Java serialization considerations

  1. When Serializable objects are deserialized, they are reconstructed directly from the byte stream without calling the class's own constructor.
  2. When a Serializable subclass extends a parent class that does not implement Serializable, the parent class must provide an accessible no-argument constructor; otherwise deserialization throws java.io.InvalidClassException.
  3. When serialization is implemented via Externalizable, the public no-argument constructor of the class is called during deserialization.
  4. Since Externalizable does not serialize any member variables by default, the transient keyword only has an effect with Serializable serialization.
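Points 1 and 3 can be observed by counting constructor calls. The sketch below is one way to do it; ConstructorCallDemo, PlainAccount, ExternalAccount and countConstructorCalls are hypothetical names, and the static counters exist only for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ConstructorCallDemo {
    static int plainCalls;
    static int externalCalls;

    static class PlainAccount implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;

        PlainAccount(String name) {
            this.name = name;
            plainCalls++;
        }
    }

    static class ExternalAccount implements Externalizable {
        String name;

        // Externalizable requires a public no-argument constructor.
        public ExternalAccount() { externalCalls++; }

        ExternalAccount(String name) {
            this();
            this.name = name;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeObject(name);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            name = (String) in.readObject();
        }
    }

    // Serialize to memory and immediately deserialize again.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {constructor calls for PlainAccount, constructor calls for ExternalAccount}.
    static int[] countConstructorCalls() {
        plainCalls = 0;
        externalCalls = 0;
        roundTrip(new PlainAccount("Freeman"));
        roundTrip(new ExternalAccount("Freeman"));
        return new int[] {plainCalls, externalCalls};
    }

    public static void main(String[] args) {
        int[] calls = countConstructorCalls();
        // One call each for "new"; only Externalizable adds a second call on deserialization.
        System.out.println("Serializable: " + calls[0] + ", Externalizable: " + calls[1]);
        // Serializable: 1, Externalizable: 2
    }
}
```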

Data exchange protocol

Serialization and deserialization make it possible to exchange data, but the result is binary and not human-readable, which makes debugging difficult during application-layer development. The most straightforward fix is to transfer the contents of an object as a string. The format can be custom, but custom formats have a big problem: compatibility. Integrating a module from another system requires converting between data formats, and maintaining another system requires first understanding its serialization scheme. To unify data transmission formats, several data exchange protocols have emerged, such as JSON, Protobuf, and XML. These can be thought of as application-level serialization/deserialization.

JSON

JSON (JavaScript Object Notation) is a lightweight, language-independent data interchange format. It is currently widely used for exchanging data between front ends and back ends.

Syntax

Elements in JSON are key-value pairs of the form key:value, with the key and the value separated by ":". Each key must be enclosed in double quotation marks. A value can be an object, an array, or a simple value, and each type has its own syntax.

object

An object is an unordered collection of key-value pairs. It starts with “{” and ends with “}”, and its members are separated by “,”. For example:

"value" : {
    "name": "Freeman",
    "gender": 1
}
array

An array is an ordered collection that begins with “[” and ends with “]”, with its elements separated by “,”. For example:

"value" : [
    {
        "name": "zhangsan",
        "gender": 1
    },
    {
        "name": "lisi",
        "gender": 2
    }
]
value

Value types are the basic types in JSON: String, Number (which covers Java's byte, short, int, long, float, and double), Boolean, and null.

"name": "Freeman"
"gender": 1
"registered": false
"article": null

Note: objects, arrays, and values can be nested within one another.

{
    "code": 1,
    "msg": "success",
    "data": [
        {
            "name": "zhangsan",
            "gender": 1
        },
        {
            "name": "lisi",
            "gender": 2
        }
    ]
}
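The nesting rules map naturally onto Java's Map, List and primitive wrappers. The following sketch shows the idea with a small recursive writer that reproduces the response above; MiniJsonWriter and its methods are hypothetical, it escapes only quotes and backslashes, and it is not a replacement for a real JSON library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MiniJsonWriter {

    // Recursively convert Map (object), List (array), and simple values to JSON text.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(quote(String.valueOf(entry.getKey())))
                  .append(':')
                  .append(toJson(entry.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    // Minimal escaping: backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static Map<String, Object> person(String name, int gender) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("name", name);
        p.put("gender", gender);
        return p;
    }

    // Builds the same structure as the nested JSON example in the text.
    static Map<String, Object> sampleResponse() {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("code", 1);
        response.put("msg", "success");
        response.put("data", Arrays.asList(person("zhangsan", 1), person("lisi", 2)));
        return response;
    }

    public static void main(String[] args) {
        System.out.println(toJson(sampleResponse()));
        // {"code":1,"msg":"success","data":[{"name":"zhangsan","gender":1},{"name":"lisi","gender":2}]}
    }
}
```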

For JSON, popular third-party libraries include Gson and FastJSON. For details on Gson, see the Gson tutorial.

Protobuf

Protobuf (Protocol Buffers) is a language-independent, platform-independent, extensible serialization mechanism developed by Google that is smaller, faster, and easier to use than XML.

Protobuf is very efficient and supports almost all major development languages. Refer to the Protobuf development documentation for details.
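One reason Protobuf messages are small is that integers are stored in a variable-length encoding (varints), so small numbers take fewer bytes than a fixed 4-byte or 8-byte field. The sketch below hand-rolls that encoding purely for illustration; VarintDemo and encodeVarint are hypothetical names, and real projects would use the generated Protobuf classes instead.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class VarintDemo {

    // Encode a value as a base-128 varint: 7 data bits per byte,
    // with the high bit set on every byte except the last.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeVarint(1).length);               // 1
        System.out.println(Arrays.toString(encodeVarint(300)));   // [-84, 2], i.e. 0xAC 0x02
    }
}
```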

To use Protobuf on Android, you need the protobuf-gradle-plugin. See the project description for details.

XML

XML (Extensible Markup Language) describes data through tags. The following is an example:

<?xml version="1.0" encoding="utf-8"?>
<person>
    <name>Freeman</name>
    <gender>1</gender>
</person>

To transfer data this way, the sender converts the object into this tagged form, and the receiver converts the tags back into the corresponding object.
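A minimal sketch of that conversion, using the DOM parser that ships with the JDK, might look like this. XmlPersonDemo, Person, toXml and parse are hypothetical names; toXml does no escaping of special characters, so it only suits simple values like the ones shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlPersonDemo {

    static class Person {
        final String name;
        final int gender;

        Person(String name, int gender) {
            this.name = name;
            this.gender = gender;
        }
    }

    // Object -> XML text (no escaping; illustration only).
    static String toXml(Person p) {
        return "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<person><name>" + p.name + "</name>"
                + "<gender>" + p.gender + "</gender></person>";
    }

    // XML text -> object, using the JDK's built-in DOM parser.
    static Person parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            String name = root.getElementsByTagName("name").item(0).getTextContent();
            String gender = root.getElementsByTagName("gender").item(0).getTextContent();
            return new Person(name, Integer.parseInt(gender.trim()));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse person XML", e);
        }
    }

    public static void main(String[] args) {
        Person copy = parse(toXml(new Person("Freeman", 1)));
        System.out.println(copy.name + ", " + copy.gender); // Freeman, 1
    }
}
```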

For XML parsing in Java, see Four Methods for Generating and Parsing XML Documents.

How to choose data exchange protocol

In terms of performance, data size and readability, the results are as follows:

Protocol   Performance   Data size   Readability
JSON       good          good        best
Protobuf   best          best        poor
XML        medium        medium      medium

For interactions where the data volume is modest and the real-time requirements are not especially strict, JSON fully meets the need. It is highly readable, which makes problems easy to locate (note: it is the mainstream format used by front ends, apps, and back ends to exchange data). Protobuf suits scenarios that demand high real-time performance or involve large data volumes. Benchmarks comparing specific data exchange protocols are available at github.com/eishay/jvm-…