Java IO series:

First, sort out the huge Java IO system

Path, Files, RandomAccessFile

Serialization and deserialization

Java native serialization

Save and load serialized objects

1. Basic use

// java.io.ObjectOutputStream

// Creates an ObjectOutputStream so that you can write objects to the specified OutputStream.
ObjectOutputStream(OutputStream out)

// Writes the specified object to the ObjectOutputStream. This method stores the object's class, the signature of the class, and the values of all non-static, non-transient fields of the class and its superclasses.
void writeObject(Object obj)

// java.io.ObjectInputStream

// Creates an ObjectInputStream to read object information back from the specified InputStream.
ObjectInputStream(InputStream in)

// Reads an object from the ObjectInputStream. In particular, this method reads back the class of the object, the signature of the class, and the values of all non-static, non-transient fields of the class and its superclasses. The deserialization it performs allows multiple references to the same object to be recovered.
Object readObject()

In addition, any class whose objects are serialized must implement the Serializable interface:

class Employee implements Serializable { ... }

Serialization examples:

// 1. Open an ObjectOutputStream
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
// 2. Call writeObject
Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
out.writeObject(harry);
out.writeObject(boss);
// 3. Close the stream so that buffered data is flushed to the file
out.close();

Deserialization example:

// 1. Open an ObjectInputStream
ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
// 2. Call readObject(); objects come back in the order in which they were written
Employee e1 = (Employee) in.readObject();
Employee e2 = (Employee) in.readObject();
// 3. Close the stream
in.close();
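The two snippets above can be combined into one runnable round trip. The following is only a minimal sketch: the Employee class here is a simplified stand-in with just a name and a salary (the article does not show the real class body), and the data goes to an in-memory byte array instead of employee.dat so that no file is left behind.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RoundTripDemo {
    // Simplified stand-in for the article's Employee class (fields are assumed).
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double salary;

        Employee(String name, double salary) {
            this.name = name;
            this.salary = salary;
        }
    }

    // Serializes one object into a byte array.
    static byte[] save(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Reads one object back from a byte array.
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Employee harry = new Employee("Harry Hacker", 50000);
        Employee copy = (Employee) load(save(harry));
        System.out.println(copy.name + " earns " + copy.salary);
        System.out.println(copy == harry); // false: deserialization builds a new object
    }
}
```

Using try-with-resources closes the streams even when an exception is thrown, which the shorter snippets above leave to the caller.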

2. How to handle serialization dependencies

Consider a class that holds a reference to another object:

class Manager extends Employee {
  ...
  private Employee secretary;
  ...
}

Saving only the secretary’s memory address is clearly not an option, since the object may occupy a completely different address when it is reloaded.

Java’s solution to this problem is that each object is stored with a serial number, which is why this mechanism is called object serialization. Here’s the algorithm:

When writing an object (serialization):

Associate a serial number with each object reference you encounter (as shown in Figure 2-6).
For each object, when first encountered, the object data is saved to the output stream.
If an object has been saved before, just write “same as a previously saved object with serial number X.”

When reading objects (deserialization), the process is reversed:

When an object in the object input stream is encountered for the first time, construct it, initialize it with data from the stream, and record the association between its serial number and the new object.
When the “same as a previously saved object with serial number X” marker is encountered, retrieve the object reference associated with that serial number.
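The effect of the serial-number algorithm can be observed directly. In the sketch below, two managers share one secretary; the class bodies are simplified assumptions (only a name and a secretary reference), not the article's real classes. After a round trip, both managers still point to one and the same secretary object, which is a new object distinct from the original.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {
    // Simplified stand-ins for the article's classes (fields are assumed).
    static class Employee implements Serializable {
        String name;
        Employee(String name) { this.name = name; }
    }

    static class Manager extends Employee {
        Employee secretary;
        Manager(String name, Employee secretary) {
            super(name);
            this.secretary = secretary;
        }
    }

    // Writes the object to memory and immediately reads it back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Employee harry = new Employee("Harry Hacker");
        Manager carl = new Manager("Carl Cracker", harry);
        Manager tony = new Manager("Tony Tester", harry);
        Employee[] staff = {carl, harry, tony};

        Employee[] copy = (Employee[]) roundTrip(staff);
        Manager carlCopy = (Manager) copy[0];
        Manager tonyCopy = (Manager) copy[2];

        System.out.println(carlCopy.secretary == tonyCopy.secretary); // true: still shared
        System.out.println(carlCopy.secretary == copy[1]);            // true: same object as the array entry
        System.out.println(copy[1] == harry);                         // false: a new object was built
    }
}
```

Note that the sharing is only preserved within a single stream: objects written to two separate ObjectOutputStream instances are restored as independent copies.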

Change the default serialization mechanism

1. transient

Some data fields should never be serialized, for example integer values that store file handles or window handles, which are meaningful only to native methods. Such information is useless when the object is reloaded later or transported to another machine, and it may even cause system errors.

Java has a simple mechanism to prevent such fields from being serialized: mark them as transient.

class Manager extends Employee {
  ...
  private transient String name;
  private Employee secretary;
  ...
}
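A short sketch of what transient does in practice. The Connection class and its fields are hypothetical and exist only for this illustration: after a round trip the ordinary field is restored, while the transient fields come back with their default values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical class: the handle and the cached status only make sense
    // inside the running process, so they are excluded from serialization.
    static class Connection implements Serializable {
        String host;
        transient int fileHandle;
        transient String cachedStatus;

        Connection(String host, int fileHandle, String cachedStatus) {
            this.host = host;
            this.fileHandle = fileHandle;
            this.cachedStatus = cachedStatus;
        }
    }

    // Writes the object to memory and immediately reads it back.
    static Connection roundTrip(Connection c) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(c);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Connection) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Connection copy = roundTrip(new Connection("example.org", 42, "OPEN"));
        System.out.println(copy.host);         // example.org
        System.out.println(copy.fileHandle);   // 0    (transient int falls back to its default)
        System.out.println(copy.cachedStatus); // null (transient reference falls back to null)
    }
}
```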

2. Custom serialization

The serialization mechanism provides a way for individual classes to add validation or any other desired behavior to the default read/write behavior. Serializable classes can define methods with the following signatures:

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;
private void writeObject(ObjectOutputStream out) throws IOException;

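Once these methods are defined, the data fields are no longer serialized automatically; the methods are called instead.

Because these two methods control exactly what is written and read, they can also save and restore fields that are marked transient.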

Example:

public class LabeledPoint implements Serializable {
  private String label;
  private transient Point2D.Double point;
  ...
}

private void writeObject(ObjectOutputStream out) throws IOException {
  out.defaultWriteObject(); // Write the object descriptor and the String field label first
  out.writeDouble(point.getX());
  out.writeDouble(point.getY());
}

// In the readObject method, do the same in reverse.
private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
  in.defaultReadObject(); // Read the object descriptor and the String field label first
  double x = in.readDouble();
  double y = in.readDouble();
  point = new Point2D.Double(x, y);
}
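Put together, the fragments above form a complete class. The sketch below is one way to assemble them so that it runs on its own; to avoid a dependency on java.awt, it replaces Point2D.Double with a small hypothetical Point class that is deliberately not serializable, which is the situation the custom methods are meant to handle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class LabeledPointDemo {
    // Hypothetical stand-in for Point2D.Double; it does NOT implement Serializable.
    static class Point {
        final double x;
        final double y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static class LabeledPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        private String label;
        private transient Point point; // skipped by default serialization

        LabeledPoint(String label, double x, double y) {
            this.label = label;
            this.point = new Point(x, y);
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();  // writes the non-transient field: label
            out.writeDouble(point.x);  // then the coordinates by hand
            out.writeDouble(point.y);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();    // restores label
            double x = in.readDouble();
            double y = in.readDouble();
            point = new Point(x, y);   // rebuilds the transient field
        }

        @Override
        public String toString() {
            return label + " (" + point.x + ", " + point.y + ")";
        }
    }

    // Writes the object to memory and immediately reads it back.
    static LabeledPoint roundTrip(LabeledPoint p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (LabeledPoint) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new LabeledPoint("corner", 1.5, 2.5))); // corner (1.5, 2.5)
    }
}
```

The values must be read in exactly the order in which writeObject wrote them, because the stream stores no field names for the hand-written part.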

Serializing singletons and type-safe enumerations

When serializing and deserializing, you have to be extra careful if the target object is unique, which usually happens when implementing singletons and type-safe enumerations.

If you use the Java language’s enum construct, you don’t have to worry about serialization; it just works.

But be careful with legacy code that implements enumerated types by hand, using a class with a private constructor. For example:

public class Orientation implements Serializable {
  public static final Orientation HORIZONTAL = new Orientation(1);
  public static final Orientation VERTICAL = new Orientation(2);
  private int value;
  private Orientation(int v) { value = v; }
}

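Default deserialization always creates a new object, even though the constructor is private, so a deserialized Orientation would not be == to either of the predefined constants. To handle this case, we need another special serialization method: readResolve.

If the readResolve method is defined, it is called after the object is deserialized. It must return an object, which then becomes the return value of readObject.

In the case above, the readResolve method checks the value field and returns the appropriate enumerated constant: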

protected Object readResolve() throws ObjectStreamException {
  if (value == 1) return Orientation.HORIZONTAL;
  if (value == 2) return Orientation.VERTICAL;
  // This shouldn't happen. ObjectStreamException is abstract, so throw a concrete subclass.
  throw new InvalidObjectException("Unexpected value: " + value);
}
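The difference readResolve makes can be checked with an identity comparison. The following sketch uses the Orientation class from above, nested inside a demo class so that it is self-contained; the Token class is a hypothetical counterpart without readResolve, added only for contrast.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class ReadResolveDemo {
    static class Orientation implements Serializable {
        static final Orientation HORIZONTAL = new Orientation(1);
        static final Orientation VERTICAL = new Orientation(2);
        private final int value;

        private Orientation(int v) { value = v; }

        // Replaces the freshly deserialized object with the matching constant.
        protected Object readResolve() throws ObjectStreamException {
            if (value == 1) return HORIZONTAL;
            if (value == 2) return VERTICAL;
            throw new InvalidObjectException("Unexpected value: " + value);
        }
    }

    // Hypothetical singleton WITHOUT readResolve, for comparison only.
    static class Token implements Serializable {
        static final Token INSTANCE = new Token();
        private Token() { }
    }

    // Writes the object to memory and immediately reads it back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Orientation.VERTICAL) == Orientation.VERTICAL); // true
        System.out.println(roundTrip(Token.INSTANCE) == Token.INSTANCE);             // false: a second instance exists
    }
}
```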

Version management of serialization

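Normally, if the definition of a class changes in any way, objects serialized with the old version can no longer be deserialized, because the serialization system detects the change through the class’s fingerprint. In real development, however, classes often undergo minor changes. To remain compatible with earlier or later versions, the class needs a static data member: serialVersionUID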

class Employee implements Serializable {
  ...
  public static final long serialVersionUID = -1814239825517340645L;
  ...
}

If a class has a static data member named serialVersionUID, the serialization system no longer computes the fingerprint itself but simply uses that value. Once the member is in place, the serialization system can read different versions of the class’s objects.

If only the methods of the class change, reading the object data causes no problems.

If the data fields have changed:

If the names match but the types do not, the object input stream makes no attempt to convert one type to the other, because the two objects are incompatible.
If the serialized object has a data field that is not present in the current version, the object input stream ignores the additional data.
If the current version has data fields that are not present in the serialized object, the newly added fields are set to their default values (null for objects, 0 for numbers, and false for boolean values).
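To see which version ID the serialization system actually uses for a class, you can ask java.io.ObjectStreamClass. The sketch below is illustrative only: Employee declares the ID from the example above, while Untagged is a hypothetical class without a declared ID, for which the runtime computes a fingerprint from the class structure.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class VersionIdDemo {
    static class Employee implements Serializable {
        private static final long serialVersionUID = -1814239825517340645L;
        String name;
    }

    // Hypothetical class with no declared serialVersionUID.
    static class Untagged implements Serializable {
        String name;
    }

    // Returns the version ID the serialization system uses for the given class.
    static long versionIdOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            throw new IllegalArgumentException(type + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(versionIdOf(Employee.class)); // the declared value is used as-is
        System.out.println(versionIdOf(Untagged.class)); // computed; changes when the class structure changes
    }
}
```

The JDK's serialver command-line tool prints the same computed value, which is convenient when a serialVersionUID has to be added to an existing class after the fact.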

Serialization applications – deep cloning

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class Test {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Person p1 = new Person("JonyJava", 18);
        Person p2 = (Person) p1.clone();
        p2.age++;
        System.out.println("p1 = " + p1);
        System.out.println("p2 = " + p2);
    }
}

class Person implements Serializable, Cloneable {
    public String name;
    public Integer age;

    public Person(String name, Integer age) {
        this.name = name;
        this.age = age;
    }

    @Override
    protected Object clone() {
        try {
            // Serialize this object into an in-memory byte array ...
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
                out.writeObject(this);
            }

            // ... then read it back to obtain an independent deep copy.
            ByteArrayInputStream bin = new ByteArrayInputStream(bout.toByteArray());
            try (ObjectInputStream in = new ObjectInputStream(bin)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
            return null;
        }
    }

    @Override
    public String toString() {
        return "Person{" +
                "name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}
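The same trick can be factored into a reusable helper. The sketch below is an assumption-laden illustration rather than part of the original example: deepCopy and the Team class are hypothetical names. It shows why the copy is "deep": the list inside the copy is a separate object, so changing it leaves the original untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class DeepCopyDemo {
    // Hypothetical class with a mutable, nested object (the list).
    static class Team implements Serializable {
        final String name;
        final List<String> members = new ArrayList<>();

        Team(String name) { this.name = name; }
    }

    // Copies an object graph by serializing it and reading it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Team original = new Team("core");
        original.members.add("Harry");

        Team copy = deepCopy(original);
        copy.members.add("Carl");

        System.out.println(original.members); // [Harry]
        System.out.println(copy.members);     // [Harry, Carl]
    }
}
```

This approach is convenient but usually much slower than a hand-written copy constructor, and it only works when every object in the graph is serializable.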

Serialization in a distributed environment

Serialization in a distributed environment has to meet several requirements:

Speed of serialization and deserialization
Size of the serialized data (packet size)
Cross-language support

In these respects, Java native serialization is very limited. Many excellent serialization tools are available; the Dubbo source code is a good reference for the serialization methods a real framework supports.
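The packet-size concern is easy to measure. The following sketch compares the number of bytes produced by Java native serialization with a hand-written encoding of the same two fields through DataOutputStream. The Person class is hypothetical, and the hand-written encoding is only a baseline for comparison, not one of the frameworks discussed below.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SizeDemo {
    // Hypothetical class used only for the measurement.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Bytes produced by Java native serialization (includes class metadata).
    static int nativeSize(Person p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.size();
    }

    // Bytes produced when only the field values are written by hand.
    static int manualSize(Person p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(p.name); // 2-byte length prefix + encoded characters
            out.writeInt(p.age);  // 4 bytes
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Person harry = new Person("Harry", 30);
        System.out.println("native: " + nativeSize(harry) + " bytes");
        System.out.println("manual: " + manualSize(harry) + " bytes"); // manual: 11 bytes
    }
}
```

The native format is larger because the stream carries a header, the class name, the serialVersionUID, and a description of each field in addition to the values themselves. For small objects this overhead dominates the payload.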

A brief introduction to various serialization techniques

XML serialization framework

The benefits of XML serialization are that the output is human-readable and easy to debug. However, the serialized output is relatively large and processing it is inefficient, so XML suits data exchange between internal enterprise systems where performance requirements and QPS are low. XML is also language-independent, so it can be used as the data exchange format and protocol between heterogeneous systems. The familiar web services, for example, serialize data in XML format. XML serialization and deserialization can be implemented in many ways, including XStream and Java’s own XML serialization and deserialization.

JSON serialization framework

JSON (JavaScript Object Notation) is a lightweight data interchange format that produces a smaller byte stream than XML and is very readable. JSON is now the most common serialization format in enterprise applications, and many open-source tools are available:

Jackson (github.com/FasterXML/j…)
Alibaba’s open-source Fastjson (github.com/alibaba/fas…)
Google Gson (github.com/google/gson)

Of these JSON serialization tools, Jackson and Fastjson perform better than Gson, while Jackson and Gson are more stable than Fastjson. The advantage of Fastjson is that its API is very easy to use.

Hessian serialization framework

Hessian is a binary serialization protocol that supports cross-language transfer. Compared with Java’s default serialization mechanism, Hessian offers better performance and ease of use, and it supports many different languages. Dubbo has refactored Hessian to achieve even better performance.

Avro serialization

Avro is a data serialization system designed for applications that exchange large volumes of data. Its main features are binary serialization, which makes it convenient and fast to process large amounts of data, and friendliness to dynamic languages: Avro provides mechanisms that make it easy for dynamic languages to process Avro data.

Kryo serialization framework

Kryo is a very mature serialization implementation that is already widely used in Hive and Storm, but it does not support other languages. Dubbo supports Kryo serialization as of version 2.6, where it performs better than the previously used Hessian2.

Protobuf serialization framework

Protobuf is a data exchange format from Google that is language- and platform-independent. Google provides implementations for multiple languages, such as Java, C++, Go, and Python, and each implementation includes a compiler and library files for that language. Protobuf is a pure presentation-layer protocol and can be combined with various transport-layer protocols. It is widely used because of its low space overhead and high performance, which makes it well suited to intra-company RPC calls with demanding performance requirements. Because parsing is fast and the serialized data is relatively small, it can also be used for object persistence. Protobuf is, however, relatively troublesome to use: it has its own syntax and its own compiler, so adopting it means investing time in learning the technology.

A drawback of Protobuf is that every class structure to be transferred needs a corresponding .proto file, and whenever a class changes, its .proto file must be updated and the code regenerated.

How to choose a serialization technology

Technical considerations

Space overhead: the size of the serialized result affects transport performance.
Time overhead: slow serialization increases the response time of the service.
Cross-platform and cross-language support: today’s architectures are more flexible, so this must be considered whenever heterogeneous systems need to communicate.
Extensibility and compatibility: in real business development, systems must iterate quickly as requirements change, so the serialization protocol should extend well and stay compatible. For example, adding a new business field to an existing serialized data structure should not affect existing services.
Popularity: the more popular a technology is, the more companies use it, which means that many pitfalls have already been found and fixed and the solution is relatively mature.
Learning curve and ease of use

Selection suggestions

For scenarios without demanding performance requirements, the XML-based SOAP protocol can be used.
For scenarios that demand high performance and compact output, Hessian, Protobuf, Thrift, and Avro are all suitable.
For front-end/back-end separation or standalone external API services, JSON is the better choice because it is easy to debug and very readable.
Avro was designed to work well with dynamically typed languages, so it is a good fit for scenarios that involve them.

Performance comparisons of different serialization technologies are available at: github.com/eishay/jvms…
