Overview

1. Preface

  • Common serialization protocols:
    • JSON
    • XML
    • Hessian
    • Thrift
  • Advantages of ProtoBuf:
    • Fast parsing: serialization and deserialization are both fast.
    • Compact output and good compatibility, which makes it well suited to data storage and to transmitting data over the network.

2. JDK native serialization

package protobuf;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class Teacher implements Serializable {

    private long teacherId;
    private String name;
    private int age;
    private List<String> courses = new ArrayList<>();

    public Teacher(long teacherId, String name, int age) {
        this.teacherId = teacherId;
        this.name = name;
        this.age = age;
    }

    // Other getters and setters omitted; setCourses is the one used by the test below.
    public void setCourses(List<String> courses) {
        this.courses = courses;
    }

    @Override
    public String toString() {
        return "Teacher{" +
                "teacherId=" + teacherId +
                ", name='" + name + '\'' +
                ", age=" + age +
                ", courses=" + courses +
                '}';
    }
}
// Assumed to live in the same package as Teacher.
package protobuf;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;

public class Test_JDK {
    public static void main(String[] args) throws Exception {
        Teacher tim = new Teacher(1L, "Tim", 34);
        tim.setCourses(new ArrayList<>(Arrays.asList("aaaa", "aaaa")));
        // serialize
        byte[] byteArray = serialize(tim);
        System.out.println(Arrays.toString(byteArray));
        // deserialize
        Teacher teacher = deserialize(byteArray);
        System.out.println(teacher);
    }

    // serialize: write the object into an in-memory byte array
    private static byte[] serialize(Teacher tim) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(tim);
        }
        return bos.toByteArray();
    }

    // deserialize: rebuild the object from the byte array
    private static Teacher deserialize(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Teacher) ois.readObject();
        }
    }
}
  • Output:
[-84, -19, 0, 5, 115, 114, 0, 16, 112, 114, 111, 116, 111, 98, 117, 102, 46, 84, 101, 97, 99, 104, 101, 114, -109, 117, -76, 44, 106, 50, -50, -61, 2, 0, 4, 73, 0, 3, 97, 103, 101, 74, 0, 9, 116, 101, 97, 99, 104, 101, 114, 73, 100, 76, 0, 7, 99, 111, 117, 114, 115, 101, 115, 116, 0, 16, 76, 106, 97, 118, 97, 47, 117, 116, 105, 108, 47, 76, 105, 115, 116, 59, 76, 0, 4, 110, 97, 109, 101, 116, 0, 18, 76, 106, 97, 118, 97, 47, 108, 97, 110, 103, 47, 83, 116, 114, 105, 110, 103, 59, 120, 112, 0, 0, 0, 34, 0, 0, 0, 0, 0, 0, 0, 1, 115, 114, 0, 19, 106, 97, 118, 97, 46, 117, 116, 105, 108, 46, 65, 114, 114, 97, 121, 76, 105, 115, 116, 120, -127, -46, 29, -103, -57, 97, -99, 3, 0, 1, 73, 0, 4, 115, 105, 122, 101, 120, 112, 0, 0, 0, 2, 119, 4, 0, 0, 0, 2, 116, 0, 4, 97, 97, 97, 97, 113, 0, 126, 0, 6, 120, 116, 0, 3, 84, 105, 109]

Teacher{teacherId=1, name='Tim', age=34, courses=[aaaa, aaaa]}

3. Serialization via Protobuf

  • First, download:
https://github.com/protocolbuffers/protobuf/releases/download/v3.7.0/protobuf-java-3.7.0.zip
https://github.com/protocolbuffers/protobuf/releases/download/v3.7.0/protoc-3.7.0-win64.zip
  • Define a teacher.proto file:
syntax = "proto2";
option java_package = "edu.xpu";
option java_outer_classname = "TeacherSerializer";
message Teacher{
	required int64 teacherId = 1;
	required int32 age = 2;
	required string name = 3;
	repeated string courses = 4;
}

message xxx {
  // Field rule: required -> the field must appear exactly once
  // Field rule: optional -> the field may appear 0 or 1 times
  // Field rule: repeated -> the field may appear any number of times (including 0)
  // Type: int32, int64, sint32, sint64, string, 32-bit....
  // Field number: 1 ~ 536870911 (excluding 19000 to 19999)
  // Declaration format: <field rule> <type> <name> = <field number>;
}

  • The generated Java file uses classes from the Protobuf runtime, so the project also needs the Protobuf dependency:
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.13.0</version>
</dependency>
  • Copy the generated Java files into the project and test the serialization and deserialization
import java.util.Arrays;

import edu.xpu.TeacherSerializer; // generated by protoc from teacher.proto

public class ProtobufTest {
    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize();
        System.out.println(Arrays.toString(bytes));

        TeacherSerializer.Teacher teacher = deserialize(bytes);
        System.out.println(teacher);
    }

    // serialize
    private static byte[] serialize() {
        // construct Teacher through the generated builder
        TeacherSerializer.Teacher.Builder builder = TeacherSerializer.Teacher.newBuilder();
        builder.setName("Tim")
                .setAge(34)
                .setTeacherId(1L)
                .addCourses("Java");
        TeacherSerializer.Teacher teacher = builder.build();
        return teacher.toByteArray();
    }

    // deserialize
    private static TeacherSerializer.Teacher deserialize(byte[] bytes) throws Exception {
        return TeacherSerializer.Teacher.parseFrom(bytes);
    }
}
  • For a JavaBean with the same properties, Protobuf serialization and deserialization cost much less, and the serialized output is far smaller than that of Java native serialization.
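To give a rough sense of the size difference, the sketch below hand-encodes a Teacher with the same values as the ProtobufTest example (teacherId=1, age=34, name="Tim", one course "Java") by following the Protobuf wire format directly. `WireSizeSketch` and its helper methods are hypothetical names used only for illustration; in a real project the generated `toByteArray()` does this work, and this sketch covers only the varint and string fields that teacher.proto uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the Protobuf library: it hand-encodes the
// Teacher message from teacher.proto to show how few bytes the format needs.
class WireSizeSketch {

    // Writes a value as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Writes the tag: field number in the upper bits, wire type in the low 3 bits.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Writes a length-delimited (wire type 2) UTF-8 string field.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, 2);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static byte[] encodeTeacher(long teacherId, int age, String name, String course) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTag(out, 1, 0);          // teacherId = 1, wire type 0 (varint)
        writeVarint(out, teacherId);
        writeTag(out, 2, 0);          // age = 2, wire type 0 (varint)
        writeVarint(out, age);
        writeString(out, 3, name);    // name = 3
        writeString(out, 4, course);  // courses = 4, a single element
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeTeacher(1L, 34, "Tim", "Java");
        System.out.println(bytes.length + " bytes: " + Arrays.toString(bytes));
    }
}
```

With these inputs the sketch produces 15 bytes, a small fraction of the JDK-serialized array shown earlier for a comparable object.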

Protobuf characteristics and basic principles

1. Reduce the field length

  • In JSON, a piece of information is expressed as:
{ "age": 30, "name": "zhangsan", "height": 175.33, "weight": 140 }
  • There is a lot of redundant data here: quotes, colons, commas, and the field names themselves.
  • If both sides agree on the fields in advance and only the values are stored, much of this data can be saved.
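As a rough illustration of the saving, the sketch below compares the byte length of the JSON text with a values-only form. It assumes that sender and receiver have agreed on the field order age|name|height|weight in advance; the `|` separator and the class name `FieldLengthSketch` are hypothetical choices for this example, not something Protobuf itself uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical comparison: the same record as compact JSON text versus values
// only, assuming both sides already know the order age|name|height|weight.
class FieldLengthSketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String json = "{\"age\":30,\"name\":\"zhangsan\",\"height\":175.33,\"weight\":140}";
        String valuesOnly = "30|zhangsan|175.33|140";
        System.out.println("JSON:        " + utf8Length(json) + " bytes");
        System.out.println("values only: " + utf8Length(valuesOnly) + " bytes");
    }
}
```

Even in this text form, dropping the field names and punctuation cuts the record from 57 bytes to 22.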

2. Compress further

  • Suppose the height field is null. We do not actually need to transmit it, so only age, name, and weight have to be sent.

  • Tagging

  • Each field is stored as a tag|value pair. The tag records two pieces of information: the number of the field that the value belongs to, and the data type of the value (integer, string, and so on). Because the tag carries the field number, each value is still matched to the right field even when the height field is not transmitted.

  • In JSON, the role of the tag is played by the field name, which is stored as a string.

  • In Protobuf, the tag is stored as a binary number and typically takes only one byte.
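The tag layout described above can be sketched in a few lines. In the Protobuf wire format the tag is computed as (field number << 3) | wire type; the class and method names below (`TagSketch`, `makeTag`, and so on) are hypothetical and exist only for this illustration.

```java
// Hypothetical helper illustrating how a Protobuf tag packs the field number
// and the wire type into a single integer: (fieldNumber << 3) | wireType.
class TagSketch {
    static final int WIRETYPE_VARINT = 0;            // int32, int64, bool, ...
    static final int WIRETYPE_LENGTH_DELIMITED = 2;  // string, bytes, nested messages

    static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // The field number lives in the bits above the low three.
    static int fieldNumberOf(int tag) {
        return tag >>> 3;
    }

    // The wire type lives in the low three bits.
    static int wireTypeOf(int tag) {
        return tag & 0x7;
    }

    public static void main(String[] args) {
        int ageTag = makeTag(2, WIRETYPE_VARINT);             // "required int32 age = 2"
        int nameTag = makeTag(3, WIRETYPE_LENGTH_DELIMITED);  // "required string name = 3"
        System.out.println("age tag = " + ageTag + ", name tag = " + nameTag);
        System.out.println("name tag -> field " + fieldNumberOf(nameTag)
                + ", wire type " + wireTypeOf(nameTag));
    }
}
```

Field numbers 1 through 15 give a tag below 128, which is why the tag typically fits in one byte.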

3. Field types supported by Protobuf

Since a tag usually takes only one byte, its overhead is small, and the overall storage footprint of a Protobuf message is small as well.

4. Compress further

  • Integers are transmitted all the time in practice. A 32-bit integer occupies 4 bytes, but most integers, such as prices or inventory counts, are small and do not need all 4 bytes. For example, 127 is 00000000 00000000 00000000 01111111 (4 bytes, 32 bits).

  • Such a value fits in the last byte alone. Protobuf defines the Varint data type, which stores integers in a variable number of bytes, further compressing the data.

  • But there is a problem. Negative numbers are represented in two's complement, so -1 is 11111111 11111111 11111111 11111111 (4 bytes, 32 bits). Every byte is in use, and a plain Varint cannot shorten it.

  • But -1 is a relatively simple number, so Protobuf applies a further mapping to negative values (ZigZag encoding, used by the sint32 and sint64 types) that turns -1 into the small unsigned value 1. The field then takes only 2 bytes: one for the tag and one for the value.
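A minimal sketch of both ideas, assuming the standard Protobuf wire rules: a varint stores 7 bits per byte with a continuation bit, and ZigZag maps signed values to small unsigned ones before varint encoding. The class `VarintSketch` and its method names are hypothetical; the Protobuf runtime has its own implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper showing varint and ZigZag encoding by hand.
class VarintSketch {

    // Base-128 varint: low-order 7-bit group first; the high bit of each byte
    // says whether another byte follows.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // ZigZag for 32-bit values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // sint32-style encoding: ZigZag first, then varint of the unsigned result.
    static byte[] encodeSInt32(int n) {
        return encodeVarint(zigZag32(n) & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println("127 -> " + Arrays.toString(encodeVarint(127)));
        System.out.println("300 -> " + Arrays.toString(encodeVarint(300)));
        // A plain int32/int64 field sign-extends -1 to 64 bits before encoding.
        System.out.println("-1 as plain varint: " + encodeVarint(-1L).length + " bytes");
        System.out.println("-1 with ZigZag:     " + encodeSInt32(-1).length + " byte");
    }
}
```

In this sketch 127 takes one byte and 300 takes two, while -1 takes ten bytes as a plain varint and a single byte once ZigZag is applied, which is why sint32 and sint64 are the better choice for fields that are often negative.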

5. Faster

  • Although the data is now very small, parsing speed can be improved a great deal as well. Each field is stored as tag|value, and the tag contains the data type of the value. Different data types have different sizes, so once the tag is read the parser knows how to read the value. For example, if the value is a bool, it must take exactly one byte, so the program can read one byte after the tag and obtain the value directly, which is very fast. JSON, by contrast, has to parse strings to do the same.
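The parsing loop described above can be sketched as follows. The reader handles only the two wire types that teacher.proto needs (varint and length-delimited), and the input bytes are a hand-derived encoding of a Teacher with teacherId=1, age=34, name="Tim", and one course "Java". `TagValueReaderSketch` is a hypothetical class for illustration, not the generated parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader: walks a buffer of tag|value pairs and lets the wire
// type in each tag decide how many bytes the value occupies.
class TagValueReaderSketch {
    private final byte[] buf;
    private int pos = 0;

    TagValueReaderSketch(byte[] buf) {
        this.buf = buf;
    }

    // Reads one base-128 varint starting at the current position.
    long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Returns each field as "fieldNumber=value", in the order it was read.
    List<String> readAll() {
        List<String> fields = new ArrayList<>();
        while (pos < buf.length) {
            int tag = (int) readVarint();
            int fieldNumber = tag >>> 3;
            int wireType = tag & 0x7;
            if (wireType == 0) {            // varint: the value ends at the first byte without the high bit
                fields.add(fieldNumber + "=" + readVarint());
            } else if (wireType == 2) {     // length-delimited: a length prefix, then that many bytes
                int len = (int) readVarint();
                fields.add(fieldNumber + "=" + new String(buf, pos, len, StandardCharsets.UTF_8));
                pos += len;
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] bytes = {8, 1, 16, 34, 26, 3, 84, 105, 109, 34, 4, 74, 97, 118, 97};
        System.out.println(new TagValueReaderSketch(bytes).readAll());
    }
}
```

Running it prints [1=1, 2=34, 3=Tim, 4=Java]. The reader never scans for quotes or delimiters: each tag tells it exactly how to consume the value that follows.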

Reference

  • www.jianshu.com/p/72108f0ae…