JDK serialization is not easy

Serialization: what, why, and how

**What is serialization?** Put simply, it encodes an object into a byte stream according to a serialization protocol; the reverse process is called deserialization. Take the common JSON serialization as an example, applied to this class:

public class A {
    private int x = 1;
    private String y = "2";
}

After JSON serialization:

{
    "x": 1,
    "y": "2"
}

**Why serialize?** In short, so that objects can be transmitted and stored in a compact form, independent of programming language. The two communicating parties only need to serialize and deserialize according to the agreed protocol, regardless of the language the other party uses.

**How to serialize?** There are many existing serialization formats and libraries, such as XML, JSON, FastJSON, Protobuf, and Protostuff. There is also JDK serialization, which Java developers encounter often (it is not cross-language). Each has its own strengths and weaknesses. They are not the focus of this article, but a comparison of several popular serialization protocols is worth a look.

JDK serialization is not easy

Using JDK serialization is as simple as adding implements Serializable to the class declaration. Precisely because it is so simple, it is often abused. In reality, JDK serialization is complex, and the cost of making a class serializable is long-term.
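As a minimal sketch of how little code the happy path needs, the following round-trips an object through JDK serialization. The Point class and the RoundTrip helper are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class: opting in takes nothing more than "implements Serializable".
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class RoundTrip {
    // Encodes an object into a byte array using the JDK serialization protocol.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bos.toByteArray();
    }

    // Decodes a byte array back into an object.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = (Point) deserialize(serialize(new Point(1, 2)));
        System.out.println(copy.x + "," + copy.y); // prints 1,2
    }
}
```

The simplicity is the trap: nothing in this code hints at the compatibility and security obligations discussed next.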

Why is that?

First, it reduces the flexibility of the class and limits how the class can evolve. Once a class is serializable, its serialized byte stream effectively becomes part of its API, and you must keep supporting that serialized form. If one of the communicating parties changes the class structure and releases it, the two sides become incompatible and errors follow.

In addition, every serializable class has a serial version UID (serialVersionUID). During deserialization the version is checked first using this UID. If the versions do not match, deserialization fails with an InvalidClassException. If the UID is not declared explicitly, it is computed at run time from the class name, the interfaces it implements, and most of its members. This is why a serializable class should declare the UID explicitly: if one of the communicating parties adds even an unrelated field or method, the implicitly generated UIDs no longer match and deserialization fails with an incompatibility exception. Computing the implicit UID is also relatively expensive.

Second, it increases the likelihood of bugs and security vulnerabilities. Deserialization acts as a hidden constructor. Unless you take explicit measures, an attacker can easily use it to create objects that violate the invariants the real constructors enforce.

Third, the testing burden grows with every new release of a serializable class: each release has to be checked for compatibility with the serialized forms produced by earlier releases.
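To see which UID the runtime will actually use, one option is ObjectStreamClass.lookup. This is a small sketch; the classes Explicit and Implicit and the helper UidDemo are hypothetical names used only here.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class that pins its stream version explicitly.
class Explicit implements Serializable {
    private static final long serialVersionUID = 42L;
    int value;
}

// Hypothetical class that leaves the UID to be computed from its structure.
class Implicit implements Serializable {
    int value;
}

class UidDemo {
    // Returns the UID the serialization runtime associates with the given serializable class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Explicit.class)); // the declared constant, 42
        System.out.println(uidOf(Implicit.class)); // a hash derived from the class structure
    }
}
```

The value printed for Implicit changes when the class structure changes, for example when a method is added, which is exactly the incompatibility the explicit declaration avoids.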
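The hidden constructor mentioned in the second point can be observed directly. The sketch below counts constructor invocations; Positive and ImplicitConstructorDemo are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class whose constructor enforces an invariant and counts its invocations.
class Positive implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0; // static, so not part of the serialized form
    final int value;

    Positive(int value) {
        constructorCalls++;
        if (value <= 0) {
            throw new IllegalArgumentException("value must be positive");
        }
        this.value = value;
    }
}

class ImplicitConstructorDemo {
    // Builds one instance, round-trips it, and reports how often the constructor ran.
    static int callsAfterRoundTrip() {
        Positive.constructorCalls = 0;
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(new Positive(7)); // the only constructor call
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                in.readObject(); // creates a second instance without running the constructor
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return Positive.constructorCalls;
    }

    public static void main(String[] args) {
        System.out.println(callsAfterRoundTrip()); // prints 1
    }
}
```

Two instances exist after the round trip, yet the constructor and its check ran only once. Any check that lives solely in the constructor is skipped for the deserialized instance.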

Serialization attacks

Since serialization converts an object into a byte stream and deserialization restores an object from that byte stream, can the byte stream in between be forged? The answer is yes.

For example, our Period class requires that its Date field start not come after end:

import java.io.Serializable;
import java.util.Date;

public class Period implements Serializable {
    private static final long serialVersionUID = 4647424730390249716L;
    private Date start;
    private Date end;

    // The constructor enforces the invariant: start must not come after end
    public Period(Date start, Date end) {
        if (start.after(end)) {
            throw new IllegalArgumentException();
        }
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "PeriodA{" +
                "start=" + start +
                ", end=" + end +
                '}';
    }
}

Now we forge the following byte stream and deserialize it:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.ObjectInputStream;

public class SerializeTest {
    // Hand-crafted stream describing a Period whose start is later than its end
    private static final byte[] serializedForm = new byte[] {
            (byte) 0xac, (byte) 0xed, 0x00, 0x05, 0x73, 0x72, 0x00, 0x06, 0x50, 0x65, 0x72, 0x69, 0x6f, 0x64, 0x40, 0x7e,
            (byte) 0xf8, 0x2b, 0x4f, 0x46, (byte) 0xc0, (byte) 0xf4, 0x02, 0x00, 0x02, 0x4c, 0x00, 0x03, 0x65, 0x6e, 0x64,
            0x74, 0x00, 0x10, 0x4c, 0x6a, 0x61, 0x76, 0x61, 0x2f, 0x75, 0x74, 0x69, 0x6c, 0x2f, 0x44, 0x61, 0x74, 0x65,
            0x3b, 0x4c, 0x00, 0x05, 0x73, 0x74, 0x61, 0x72, 0x74, 0x71, 0x00, 0x7e, 0x00, 0x01, 0x78, 0x70, 0x73, 0x72,
            0x00, 0x0e, 0x6a, 0x61, 0x76, 0x61, 0x2e, 0x75, 0x74, 0x69, 0x6c, 0x2e, 0x44, 0x61, 0x74, 0x65, 0x68, 0x6a,
            (byte) 0x81, 0x01, 0x4b, 0x59, 0x74, 0x19, 0x03, 0x00, 0x00, 0x78, 0x70, 0x77, 0x08, 0x00, 0x00, 0x00, 0x66,
            (byte) 0xdf, 0x6e, 0x1e, 0x00, 0x78, 0x73, 0x71, 0x00, 0x7e, 0x00, 0x03, 0x77, 0x08, 0x00, 0x00, 0x00,
            (byte) 0xd5, 0x17, 0x69, 0x22, 0x00, 0x78
    };

    public static void main(String[] args) {
        Period p = (Period) deserialize(serializedForm);
        System.out.println(p);
    }

    // Returns the object with the specified serialized form
    public static Object deserialize(byte[] sf) {
        try {
            InputStream is = new ByteArrayInputStream(sf);
            ObjectInputStream ois = new ObjectInputStream(is);
            return ois.readObject();
        } catch (Exception e) {
            throw new IllegalArgumentException(e.toString());
        }
    }
}

The result of deserialization is:

PeriodA{start=Sat Jan 02 04:00:00 CST 1999, end=Mon Jan 02 04:00:00 CST 1984}

The hidden constructor of deserialization described earlier has built an object that violates our constructor's constraint: start is later than end. This can be dangerous for the program. For how such a byte stream can be forged, see the Java Object Serialization Specification, which describes the serialization format.

For this reason, Effective Java stresses repeatedly that a serializable class with invariants must provide a readObject method that enforces them:
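As a starting point for reading such a stream, the sketch below dumps the bytes of a tiny serialized string and prints a few of the type codes defined in java.io.ObjectStreamConstants. StreamHeaderDemo is a hypothetical helper name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;

class StreamHeaderDemo {
    // Serializes an object and returns the raw stream bytes.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bos.toByteArray();
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // magic (ac ed), version (00 05), TC_STRING (74), length (00 02), "hi" (68 69)
        System.out.println(hex(serialize("hi")));
        System.out.printf("TC_OBJECT=%02x TC_CLASSDESC=%02x TC_REFERENCE=%02x%n",
                ObjectStreamConstants.TC_OBJECT,
                ObjectStreamConstants.TC_CLASSDESC,
                ObjectStreamConstants.TC_REFERENCE);
    }
}
```

The forged array begins with the same ac ed 00 05 header, followed by 73 (a new object) and 72 (a class descriptor) for the class named Period.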

private void readObject(ObjectInputStream stream) throws IOException, ClassNotFoundException {
    stream.defaultReadObject();
    // Re-check the invariant that the constructor enforces
    if (start.after(end)) {
        throw new IllegalArgumentException();
    }
}

However, the constraint can still be broken. An attacker can build a byte stream that starts with a valid Period instance and then appends two extra references to the Date objects held in its private fields. After deserialization the attacker holds those references and can modify the Period's internals at will, as the following demonstrates:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Date;

public class MutablePeriod {
    // The valid period object
    public final Period period;
    // Two additional references to the period's private Date fields
    public final Date start;
    public final Date end;

    public MutablePeriod() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(new Period(new Date(), new Date()));
            // Append back-references to the Date objects already in the stream
            byte[] ref = { 0x71, 0, 0x7e, 0, 5 }; // reference to handle 5 (start)
            bos.write(ref);
            ref[4] = 4; // reference to handle 4 (end)
            bos.write(ref);

            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            period = (Period) in.readObject();
            start = (Date) in.readObject();
            end = (Date) in.readObject();
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        MutablePeriod mp = new MutablePeriod();
        Period p = mp.period;
        Date pEnd = mp.end;

        // Mutate the period's internals through the stolen reference
        pEnd.setYear(78);
        System.out.println(p);
        pEnd.setYear(69);
        System.out.println(p);
    }
}

The result is:

PeriodA{start=Fri Aug 23 12:26:53 CST 2019, end=Wed Aug 23 12:26:53 CST 1978}
PeriodA{start=Fri Aug 23 12:26:53 CST 2019, end=Sat Aug 23 12:26:53 CST 1969}
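The five appended bytes in MutablePeriod follow the stream's back-reference record: the type code 0x71 (TC_REFERENCE) followed by a four-byte handle that starts at 0x7e0000 (baseWireHandle). A minimal sketch, using the hypothetical helper BackReferenceDemo, shows the runtime emitting the same kind of record when one object is written twice:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Arrays;

class BackReferenceDemo {
    // Writes the same object twice and returns the last five bytes of the stream.
    static byte[] secondWrite() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            String s = "hi";
            out.writeObject(s); // full record; the string is assigned the first handle
            out.writeObject(s); // same instance again, so only a back-reference is written
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        byte[] all = bos.toByteArray();
        return Arrays.copyOfRange(all, all.length - 5, all.length);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (byte b : secondWrite()) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(sb.toString().trim()); // prints 71 00 7e 00 00
    }
}
```

Here the string receives the first handle, so the reference ends in 00. In the Period stream the two Date objects receive handles 4 and 5, which is why the attack appends references ending in 5 and 4.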

The root cause is that readObject does not make defensive copies of the deserialized fields. The fix is to copy each deserialized Date into a newly created object and store the copy in the field, so that the attacker's two extra references point at the discarded originals and can no longer modify the instance's internals:

private void readObject(ObjectInputStream stream) throws IOException, ClassNotFoundException {
    stream.defaultReadObject();
    // Defensive copies of the mutable fields
    start = new Date(start.getTime());
    end = new Date(end.getTime());
    // Check the invariant on the copies
    if (start.after(end)) {
        throw new IllegalArgumentException();
    }
}

Note that the defensive copies must be made before the constraint is checked, and that clone should not be used to make them: Date is not final, so clone could return an instance of an untrusted subclass.

A more general approach is the serialization proxy pattern, described below.

Serialization proxy pattern

A serialization proxy is a simple private static nested class that implements Serializable and represents the logical state of the enclosing class. It has a single constructor whose parameter is the enclosing class, and that constructor just copies the data out of its argument. The enclosing class provides writeReplace, so that what actually gets serialized is the proxy, and its readObject rejects direct deserialization, so that instances can only be created by way of the proxy. The proxy provides readResolve, which converts it back into an instance of the enclosing class through the public constructor. See the code for details:
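To check that the defensive copies really neutralise the extra references, the attack can be replayed against a hardened class. This is a sketch under assumptions: SafePeriod and RogueReferenceDemo are hypothetical names, InvalidObjectException is used for the failed check instead of the article's IllegalArgumentException, and the handle number 4 relies on the same field layout as Period.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

// Hypothetical hardened variant of Period.
class SafePeriod implements Serializable {
    private static final long serialVersionUID = 1L;
    private Date start;
    private Date end;

    SafePeriod(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.after(this.end)) {
            throw new IllegalArgumentException("start after end");
        }
    }

    long endMillis() {
        return end.getTime();
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Copy first, so stream back-references point at the discarded originals.
        start = new Date(start.getTime());
        end = new Date(end.getTime());
        // Then validate the copies.
        if (start.after(end)) {
            throw new InvalidObjectException("start after end");
        }
    }
}

class RogueReferenceDemo {
    // Replays the back-reference attack and reports whether the period's end moved.
    static boolean attackSucceeds() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(new SafePeriod(new Date(1000L), new Date(2000L)));
            out.flush();
            // Append a TC_REFERENCE (0x71) to wire handle 0x7e0004, the "end" Date.
            bos.write(new byte[] {0x71, 0, 0x7e, 0, 4});

            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            SafePeriod period = (SafePeriod) in.readObject();
            Date stolenEnd = (Date) in.readObject();

            long before = period.endMillis();
            stolenEnd.setTime(0L); // mutate through the stolen reference
            return period.endMillis() != before;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attackSucceeds()); // prints false
    }
}
```

The stolen reference still resolves to a Date, but it is the original object read from the stream, not the copy stored in the field, so mutating it has no effect on the period.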

import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.Date;

public class Period implements Serializable {

    private static final long serialVersionUID = 4647424730390249716L;
    private Date start;
    private Date end;

    public Period(Date start, Date end) {
        // Copy before checking, so that readResolve cannot hand in Dates that
        // the stream still references
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.after(this.end)) {
            throw new IllegalArgumentException();
        }
    }

    @Override
    public String toString() {
        return "PeriodA{" +
                "start=" + start +
                ", end=" + end +
                '}';
    }

    public Date getStart() {
        return start;
    }

    public Date getEnd() {
        return end;
    }

    private void readObject(ObjectInputStream stream) throws InvalidObjectException {
        // Direct deserialization is not allowed; instances come only from the proxy
        throw new InvalidObjectException("Deserialization by proxy only");
    }

    private Object writeReplace() {
        // Serialize the proxy instead of this instance
        return new SerializeProxy(this);
    }

    // The serialization proxy; it must be static so that it holds no reference
    // to the enclosing Period instance
    private static class SerializeProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Date start;
        private final Date end;

        // Copy the state of the proxied instance
        SerializeProxy(Period period) {
            this.start = period.getStart();
            this.end = period.getEnd();
        }

        // Convert the proxy back into the proxied class through the public constructor
        private Object readResolve() {
            return new Period(start, end);
        }
    }
}
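A short usage sketch of the pattern follows. It is self-contained, so it repeats the idea on a hypothetical class named ProxiedPeriod with a hypothetical ProxyDemo helper. One design benefit it shows is that the proxied class can keep its fields final, which a readObject that makes defensive copies cannot.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

// Hypothetical class that uses a serialization proxy; note the final fields.
class ProxiedPeriod implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Date start;
    private final Date end;

    ProxiedPeriod(Date start, Date end) {
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.after(this.end)) {
            throw new IllegalArgumentException("start after end");
        }
    }

    long startMillis() {
        return start.getTime();
    }

    long endMillis() {
        return end.getTime();
    }

    // The proxy is written to the stream in place of this instance.
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // Any stream that describes a ProxiedPeriod directly is rejected.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    private static class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Date start;
        private final Date end;

        SerializationProxy(ProxiedPeriod p) {
            this.start = p.start;
            this.end = p.end;
        }

        // Rebuilds the real object through the validating constructor.
        private Object readResolve() {
            return new ProxiedPeriod(start, end);
        }
    }
}

class ProxyDemo {
    // Serializes and immediately deserializes the given object.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(new ProxiedPeriod(new Date(1000L), new Date(2000L)));
        System.out.println(copy instanceof ProxiedPeriod); // prints true
    }
}
```

Because every deserialized instance is created by the constructor, the invariant check and the defensive copies live in one place.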

Reference

[1] Effective Java, Chapter 11