Java IO series:
- First, sort out the huge Java IO system
- Path, Files, RandomAccessFile
- Serialization and deserialization
Java native serialization
Save and load serialized objects
1. Basic use
// java.io.ObjectOutputStream
// Creates an ObjectOutputStream so that you can write objects to the specified OutputStream.
ObjectOutputStream(OutputStream out)
// Writes the specified object to the ObjectOutputStream. This method stores the object's class, the class signature, and the values of all non-static, non-transient fields of the class and its superclasses.
void writeObject(Object obj)
// java.io.ObjectInputStream
// Creates an ObjectInputStream to read back object information from the specified InputStream.
ObjectInputStream(InputStream in)
// Reads an object from the ObjectInputStream. In particular, this method reads back the object's class, the class signature, and the values of all non-static, non-transient fields of the class and its superclasses. The deserialization it performs allows multiple object references to be recovered.
Object readObject()
At the same time, any class to be serialized must implement the Serializable interface:
class Employee implements Serializable { ... }
Serialization examples:
// 1. Open an ObjectOutputStream
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
// 2. Call writeObject
Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
out.writeObject(harry);
out.writeObject(boss);
Deserialization example:
// 1. Open an ObjectInputStream
ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
// 2. Call readObject()
Employee e1 = (Employee) in.readObject();
Employee e2 = (Employee) in.readObject();
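The two snippets above can be combined into one runnable round trip. This is a minimal sketch, assuming a hypothetical two-field Employee (the article's real class has more fields) and an in-memory byte array in place of employee.dat so that it has no file-system dependency:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RoundTripDemo {
    // Hypothetical minimal Employee, for illustration only.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        double salary;

        Employee(String name, double salary) {
            this.name = name;
            this.salary = salary;
        }
    }

    // Writes the given objects, in order, to an in-memory stream.
    static byte[] write(Object... objects) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        }
        return bytes.toByteArray();
    }

    // Reads back 'count' objects in the order they were written.
    static Object[] read(byte[] data, int count) throws IOException, ClassNotFoundException {
        Object[] result = new Object[count];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result[i] = in.readObject();
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        byte[] data = write(new Employee("Harry Hacker", 50000), new Employee("Carl Cracker", 80000));
        Object[] restored = read(data, 2);
        Employee e1 = (Employee) restored[0];
        Employee e2 = (Employee) restored[1];
        System.out.println(e1.name + ", " + e2.name); // prints "Harry Hacker, Carl Cracker"
    }
}
```

Objects come back in the order they were written, which is why the reading side has to know how many objects to expect and what type each one is.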
2. How to handle serialization dependencies
class Manager extends Employee {
    ...
    private Employee secretary;
    ...
}
It is certainly not possible to save only the memory address of the secretary object, since the object may occupy a completely different memory address when it is reloaded.
Java’s solution to this problem is to store each object with a serial number, which is why this mechanism is called object serialization. Here’s the algorithm:
When writing an object (serialization):
- Associate a serial number with each object reference you encounter (as shown in Figure 2-6).
- For each object, when first encountered, the object data is saved to the output stream.
- If an object has been saved before, just write “same as a previously saved object with serial number X.”
When reading objects (deserialization), the process is reversed:
- For an object in the object input stream, when its serial number is first encountered, construct the object, initialize it with data from the stream, and then record the association between the serial number and the new object.
- When the “same as a previously saved object with serial number X” marker is encountered, retrieve the object reference associated with that serial number.
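The effect of the serial-number algorithm can be observed directly. The following sketch uses hypothetical stand-ins for the Employee and Manager classes: two managers share one secretary, both are written to the same stream, and after reading them back the two secretary references still point to a single (new) object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedReferenceDemo {
    // Hypothetical stand-ins for the article's Employee and Manager classes.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;

        Employee(String name) {
            this.name = name;
        }
    }

    static class Manager extends Employee {
        private static final long serialVersionUID = 1L;
        Employee secretary;

        Manager(String name, Employee secretary) {
            super(name);
            this.secretary = secretary;
        }
    }

    // Writes both managers to ONE stream and reads them back from it.
    static Manager[] roundTrip(Manager a, Manager b) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a); // the secretary's data is written here, on first encounter
            out.writeObject(b); // here only a back-reference to the secretary is written
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return new Manager[] { (Manager) in.readObject(), (Manager) in.readObject() };
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Employee harry = new Employee("Harry Hacker");
        Manager[] copies = roundTrip(new Manager("Carl Cracker", harry), new Manager("Tony Tester", harry));
        System.out.println(copies[0].secretary == copies[1].secretary); // prints "true"
        System.out.println(copies[0].secretary == harry);               // prints "false"
    }
}
```

The sharing is only preserved within a single stream. Writing the two managers to two separate ObjectOutputStream instances would produce two independent secretary copies.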
Change the default serialization mechanism
1. transient
Certain data fields should not be serialized, for example integer values that store file handles or window handles, which are meaningful only to native methods. Such information is useless when the object is later reloaded or transferred to another machine, and may even cause system errors.
Java has a very simple mechanism to prevent such fields from being serialized: mark them as transient.
class Manager extends Employee {
    ...
    private transient String name;
    private Employee secretary;
    ...
}
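What happens to a transient field on reload can be checked with a small sketch. The Session class below is hypothetical and exists only for illustration: after a round trip, the transient fields come back with their default values (null and 0) rather than the values they held when written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical class, not taken from the article.
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient String password; // skipped by default serialization
        transient int fileHandle;  // skipped as well

        Session(String user, String password, int fileHandle) {
            this.user = user;
            this.password = password;
            this.fileHandle = fileHandle;
        }
    }

    // Serializes the session to memory and reads it back.
    static Session roundTrip(Session s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Session) in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Session copy = roundTrip(new Session("harry", "secret", 42));
        System.out.println(copy.user + " " + copy.password + " " + copy.fileHandle); // prints "harry null 0"
    }
}
```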
2. Custom serialization
The serialization mechanism provides a way for individual classes to add validation or any other desired behavior to the default read/write behavior. Serializable classes can define methods with the following signatures:
private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;
private void writeObject(ObjectOutputStream out) throws IOException;
After that, the data fields are no longer automatically serialized; instead, these methods are called.
These two methods can also be used to save and restore fields that are marked transient.
Example:
public class LabeledPoint implements Serializable {
    private String label;
    private transient Point2D.Double point;

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // write the object descriptor and the String field label first
        out.writeDouble(point.getX());
        out.writeDouble(point.getY());
    }

    // In the readObject method, do the same in reverse.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        double x = in.readDouble();
        double y = in.readDouble();
        point = new Point2D.Double(x, y);
    }
}
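The other use mentioned above, adding validation to the default read behavior, can be sketched in the same way. The Percentage class is hypothetical, and its constructor deliberately skips validation so that an out-of-range value can reach the stream; readObject then rejects it with InvalidObjectException, a standard java.io exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ValidatingReadDemo {
    // Hypothetical class; the constructor does not validate, for demonstration purposes.
    static class Percentage implements Serializable {
        private static final long serialVersionUID = 1L;
        private int value;

        Percentage(int value) {
            this.value = value;
        }

        int value() {
            return value;
        }

        // Runs the default field restoration, then checks the invariant 0..100.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (value < 0 || value > 100) {
                throw new InvalidObjectException("value out of range: " + value);
            }
        }
    }

    // Serializes the object to memory and reads it back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Percentage ok = (Percentage) roundTrip(new Percentage(42));
        System.out.println(ok.value()); // prints "42"
        try {
            roundTrip(new Percentage(150));
        } catch (InvalidObjectException e) {
            System.out.println("rejected: " + e.getMessage()); // prints "rejected: value out of range: 150"
        }
    }
}
```

Because deserialization does not run the constructor of a Serializable class, readObject is the natural place to re-check invariants that a constructor would normally enforce.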
Serializing singletons and type-safe enumerations
When serializing and deserializing, you have to be extra careful if the target object is meant to be unique, which usually happens when implementing singletons and type-safe enumerations.
If you use the Java language’s enum construct, you don’t have to worry about serialization; it works just fine.
But be careful with legacy code that hand-writes enumeration types using private constructors, such as:
public class Orientation implements Serializable {
    public static final Orientation HORIZONTAL = new Orientation(1);
    public static final Orientation VERTICAL = new Orientation(2);
    private int value;
    private Orientation(int v) { value = v; }
}
For this case, we need the special serialization method readResolve.
If the readResolve method is defined, it is called after the object has been deserialized, and it must return an object that then becomes the return value of readObject.
In the above case, the readResolve method checks the value field and returns the appropriate enumerated constant:
protected Object readResolve() throws ObjectStreamException {
    if (value == 1) return Orientation.HORIZONTAL;
    if (value == 2) return Orientation.VERTICAL;
    throw new InvalidObjectException("unknown value: " + value); // this shouldn't happen
}
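Whether readResolve actually preserves uniqueness can be verified with an identity comparison. The sketch below reuses the Orientation idea as a nested class and adds a hypothetical roundTrip helper; without readResolve, the comparison would yield false because deserialization creates a fresh instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class ReadResolveDemo {
    // Hand-written type-safe enumeration in the style of the example above.
    static class Orientation implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Orientation HORIZONTAL = new Orientation(1);
        static final Orientation VERTICAL = new Orientation(2);
        private final int value;

        private Orientation(int v) {
            value = v;
        }

        // Replaces the freshly deserialized instance with the canonical constant.
        private Object readResolve() throws ObjectStreamException {
            if (value == 1) return HORIZONTAL;
            if (value == 2) return VERTICAL;
            throw new InvalidObjectException("unknown orientation value: " + value);
        }
    }

    // Serializes the object to memory and reads it back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Object copy = roundTrip(Orientation.HORIZONTAL);
        System.out.println(copy == Orientation.HORIZONTAL); // prints "true"
    }
}
```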
Version management of serialization
Normally, if the class used for reading differs even slightly from the class that wrote the data, deserialization fails. In real development, however, classes do undergo minor changes. To stay compatible with earlier or later versions of a class, we need a static data member: serialVersionUID
class Employee implements Serializable {
    ...
    public static final long serialVersionUID = -1814239825517340645L;
    ...
}
If a class has a static data member named serialVersionUID, the serialization system no longer computes the class fingerprint itself but simply uses that value. Once this member is in place, the serialization system can read objects written by different versions of the class.
If only the class’s methods change, reading the new object data poses no problem.
If the data field has changed:
- If the names match but the types do not, the object input stream will not attempt to convert one type to the other because the two objects are incompatible.
- If the serialized object has a data field that is not present in the current version, the object input stream ignores this additional data.
- If the current version has data fields that are not present in the serialized object, these newly added fields are set to their default values (null for objects, 0 for numbers, and false for booleans).
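The version number that the serialization system associates with a class can be inspected through java.io.ObjectStreamClass. The sketch below uses two hypothetical classes: one that pins serialVersionUID explicitly and one that leaves it to be computed from the class structure.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {
    // Declares an explicit version, so structural changes do not alter it.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = -1814239825517340645L;
        String name;
    }

    // No explicit version: the value is derived from the class structure,
    // so adding or removing members may change it.
    static class Computed implements Serializable {
        String name;
    }

    // Returns the serialVersionUID the serialization system uses for the given class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Pinned.class));   // prints "-1814239825517340645"
        System.out.println(uidOf(Computed.class)); // prints a computed hash of the class structure
    }
}
```

The JDK also ships a serialver command-line tool that reports the same value for a compiled class.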
Serialization applications – deep cloning
import java.io.*;

public class Test {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Person p1 = new Person("JonyJava", 18);
        Person p2 = (Person) p1.clone();
        p2.age++;
        System.out.println("p1 = " + p1);
        System.out.println("p2 = " + p2);
    }
}

class Person implements Serializable, Cloneable {
    public String name;
    public Integer age;

    public Person(String name, Integer age) {
        this.name = name;
        this.age = age;
    }

    @Override
    protected Object clone() {
        // Serialize this object into a byte array, then read it back to obtain a deep copy.
        try {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bout);
            out.writeObject(this);
            ByteArrayInputStream bin = new ByteArrayInputStream(bout.toByteArray());
            ObjectInputStream in = new ObjectInputStream(bin);
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
            return null;
        }
    }

    @Override
    public String toString() {
        return "Person{" +
                "name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}
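Person holds only immutable fields, so the example above does not really show the difference between a deep and a shallow copy. The following sketch, with a hypothetical Team class and a generic deepCopy helper, copies an object that owns a mutable list and shows that changing the copy leaves the original untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class DeepCopyDemo {
    // Hypothetical class that owns a mutable collection.
    static class Team implements Serializable {
        private static final long serialVersionUID = 1L;
        List<String> members = new ArrayList<>();
    }

    // Copies any Serializable object graph by writing it to memory and reading it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Team a = new Team();
        a.members.add("Harry");
        Team b = deepCopy(a);
        b.members.add("Carl"); // modifies only the copy's list
        System.out.println(a.members); // prints "[Harry]"
        System.out.println(b.members); // prints "[Harry, Carl]"
    }
}
```

This approach is convenient but considerably slower than a hand-written copy, and every object in the graph must be Serializable.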
Serialization in a distributed environment
Requirements for serialization in a distributed environment:
- Serialization computing performance
- Packet size
- Cross-language support
In these respects, Java native serialization is very limited. Many excellent serialization tools are available; the Dubbo source code is a useful reference for the serialization methods it supports.
A brief introduction to various serialization techniques
XML serialization framework
The benefits of XML serialization are readability and ease of debugging. However, the serialized output is relatively large and processing is inefficient, so XML suits data exchange between internal enterprise systems with modest performance requirements and low QPS. XML is also language independent, so it can be used for data exchange and protocols between heterogeneous systems. The familiar web service, for example, serializes data in XML format. XML serialization/deserialization can be implemented in many ways, including XStream and Java’s built-in XML serialization and deserialization.
JSON serialization framework
JSON (JavaScript Object Notation) is a lightweight data interchange format that produces a smaller byte stream than XML and is very readable. JSON is now the most common serialization format in the enterprise, and there are many open source tools available:
- Jackson (github.com/FasterXML/j…
- Alibaba's open source Fastjson (github.com/alibaba/fas…
- Google GSON (github.com/google/gson)
Of these JSON serialization tools, Jackson and Fastjson have better performance than GSON, but Jackson and GSON are more stable than Fastjson. The advantage of Fastjson is that its API is very easy to use.
Hessian serialization framework
Hessian is a binary serialization protocol that supports cross-language transfer. Compared to the Java default serialization mechanism, Hessian has better performance and ease of use, and supports many different languages. Dubbo, however, has refactored Hessian for better performance.
Avro serialization
Avro is a data serialization system designed for applications that exchange large volumes of data. Its main features are binary serialization, which makes it convenient and fast to process large amounts of data, and friendliness to dynamic languages: Avro provides mechanisms that make it easy for dynamic languages to process Avro data.
Kryo serialization framework
Kryo is a very mature serialization implementation that is already widely used in Hive and Storm, but it is not cross-language. Dubbo supports Kryo serialization as of version 2.6, and it performs better than the previous Hessian2.
Protobuf serialization framework
Protobuf is a data exchange format from Google that is language and platform independent. Google provides implementations for multiple languages, such as Java, C, Go, and Python, and each implementation includes the compiler and library files for that language. Protobuf is a pure presentation layer protocol and can be used with various transport layer protocols. It is widely used because of its low space overhead and high performance, which makes it ideal for intra-company RPC calls that require high performance. Because parsing is fast and the serialized data is relatively small, it can also be used for object persistence. Protobuf is relatively troublesome to use, however: it has its own syntax and its own compiler, so adopting it means investing in learning the technology.
A drawback of Protobuf is that every structure to be transferred must be described in a corresponding .proto file, and if a structure changes, its .proto file must be updated and the code for it regenerated.
Selection reference of serialization technology
Technical considerations
- Serialization space overhead, which is the size of the results of serialization, affects transport performance
- Duration of serialization. A long serialization duration affects the service response time
- Whether the serialization protocol supports cross-platform, cross-language. Because today’s architectures are more flexible, this must be considered if there is a need for heterogeneous systems to communicate
- Scalability/compatibility. In real business development, systems often have to iterate quickly as requirements change, which requires the serialization protocol to have good scalability/compatibility. For example, adding a new business field to an existing serialized data structure should not affect existing services
- Popularity. A more popular technology is used by more companies, so many pitfalls have already been found and fixed, and the technical solution is relatively mature
- Learning difficulty and ease of use
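The space overhead criterion in the list above is easy to measure. As a rough sketch with a hypothetical Point class, the code below compares the number of bytes Java native serialization produces with a hand-written encoding that writes only the two int fields through DataOutputStream. It measures size only, not serialization time.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SizeComparisonDemo {
    // Hypothetical payload class with two int fields.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Size of the Java native serialized form, including stream header and class descriptor.
    static int nativeSize(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.size();
    }

    // Size of a hand-written encoding that stores only the field values.
    static int manualSize(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Point p = new Point(3, 4);
        System.out.println("native: " + nativeSize(p) + " bytes");
        System.out.println("manual: " + manualSize(p) + " bytes"); // prints "manual: 8 bytes"
    }
}
```

The native form is several times larger for such a small object because it carries the class name, the field descriptors, and the serialVersionUID along with the data. Compact binary formats avoid most of that metadata.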
Selection Suggestions
- The XML-based SOAP protocol can be used in scenarios that do not require high performance
- Hessian, Protobuf, Thrift, and Avro can be used for scenarios that require high performance and compact data.
- For front end/back end separation, or for independent external API services, JSON is the better choice; it is easy to debug and very readable
- Avro is designed to be friendly to dynamically typed languages, so it is a good fit for scenarios that involve them
This address has performance comparisons for different serialization technologies: github.com/eishay/jvms…