ProtoBuf, a cross-platform, language-independent, and extensible method for serializing structured data, is widely used for network data exchange and storage. As the Internet evolves, systems are becoming more heterogeneous and cross-language requirements more common. At the same time, gRPC has the potential to replace RESTful APIs, and ProtoBuf is the cross-language, high-performance backbone of gRPC. For engineers, an in-depth understanding of ProtoBuf's principles lays the foundation for future technology upgrades and selection.

I have distilled my past learning and hands-on experience into this series of articles to study and discuss with you. I hope you find it useful; corrections are of course welcome wherever I am mistaken.

This series of articles mainly includes:

  1. In-depth understanding of ProtoBuf principles and engineering practices (Overview)
  2. In-depth understanding of ProtoBuf principles and engineering practices (coding)
  3. In-depth understanding of ProtoBuf principles and engineering practices (serialization)
  4. In-depth understanding of ProtoBuf principles and engineering practices (Engineering Practice)

What is ProtoBuf

ProtoBuf(Protocol Buffers) is a cross-platform, language-independent, extensible method of serializing structured data for network data exchange and storage.

As a mechanism for serializing structured data, ProtoBuf is flexible, efficient, and automated. Compared with common text formats such as XML and JSON, ProtoBuf produces a smaller payload for the same information, serializes and deserializes faster, and is simpler to use.

Once you have defined the data structures you want to work with, you can use ProtoBuf's code generation tool to generate the corresponding code. Having described a data structure once, you can then easily read and write your structured data in a variety of languages (Proto3 supports C++, Java, Python, Go, Ruby, Objective-C, and C#) and from a variety of data streams.

Why Protobuf

While you might think that Google invented ProtoBuf to address serialization speed, that’s not really the case.

ProtoBuf was originally created at Google to handle the request/response protocol of its index servers. Before ProtoBuf, Google already had a request/response format in which requests and responses were marshalled and unmarshalled by hand. It supported multiple versions of the protocol, but the resulting code was not elegant:

if (protocolVersion == 1) { doSomething(); } else if (protocolVersion == 2) { doOtherThing(); } ...

Rolling out a new version of such an explicitly formatted protocol was also complicated: developers had to make sure every server between the request initiator and the server actually handling the request understood the new format before they could flip the switch and start using it.

This is the classic problem of compatibility between old and new protocol versions that every server developer runs into.

To solve these problems, ProtoBuf was born.

ProtoBuf was originally designed for the following two features:

  • It is easier to introduce new fields, and intermediate servers that do not need to inspect the data can simply parse it and pass it along without having to know about all of the fields.
  • The data format is more self-descriptive and can be handled in a variety of languages (C++, Java, etc.).

That early version of ProtoBuf still required hand-written parsing code.

But as the system evolves, ProtoBuf gets more features:

  • Automatically generated serialization and deserialization code eliminates the need for manual parsing (the official toolchain generates this code, and generators are available for essentially every major language and platform).
  • In addition to being used for data interchange, ProtoBuf is used as a convenient, self-describing format for persisting data.

ProtoBuf is now Google’s common language for data exchange and storage. There are 48,162 different message types defined in the Google code tree, across 12,183 .proto files. They are used both in RPC systems and for persisting data in a variety of storage systems.

ProtoBuf was originally created to solve compatibility between old and new versions (high and low versions) of server-side protocols, hence the fitting name “Protocol Buffers.” Only later was it used to transmit data.

The official FAQ explains the origin of the name:

Why the name “Protocol Buffers”?

The name originates from the early days of the format, before we had the protocol buffer compiler to generate classes for us. At the time, there was a class called ProtocolBuffer which actually acted as a buffer for an individual method. Users would add tag/value pairs to this buffer individually by calling methods like AddValue(tag, value). The raw bytes were stored in a buffer which could then be written out once the message had been constructed.

Since that time, the “buffers” part of the name has lost its meaning, but it is still the name we use. Today, people usually use the term “protocol message” to refer to a message in an abstract sense, “protocol buffer” to refer to a serialized copy of a message, and “protocol message object” to refer to an in-memory object representing the parsed message.

How to use ProtoBuf

3.1 ProtoBuf protocol workflow

For a serialization protocol such as ProtoBuf, the workflow is: the user focuses only on the business objects themselves and defines them in an IDL, and the serialization and deserialization code is then generated by a tool.

3.2 ProtoBuf message definition

The ProtoBuf message is described in the IDL file (.proto). Here is the message descriptor Customer.proto used in this example:

syntax = "proto3";

package domain;

option java_package = "com.protobuf.generated.domain";
option java_outer_classname = "CustomerProtos";

message Customers {
    repeated Customer customer = 1;
}

message Customer {
    int32 id = 1;
    string firstName = 2;
    string lastName = 3;

    enum EmailType {
        PRIVATE = 0;
        PROFESSIONAL = 1;
    }

    message EmailAddress {
        string email = 1;
        EmailType type = 2;
    }

    repeated EmailAddress email = 5;
}

The messages above are simple: Customers contains a repeated list of Customer messages, and each Customer contains an id, a firstName, a lastName, and a collection of EmailAddress messages.

In addition to these definitions, there are three lines at the top of the file that help the code generator:

  1. First, syntax = "proto3" declares the IDL syntax version. There are currently two versions, proto2 and proto3, and their syntaxes are not compatible; if no version is specified, proto2 is assumed. This article uses proto3 because it supports more languages and has a more concise syntax than proto2.
  2. Second, there is the package domain; definition. The code generator uses this configuration to namespace the generated classes/objects.
  3. Third, there is the option java_package definition. The generator also uses this configuration to namespace the generated sources, but it applies only to Java. Two separate configurations are used so that the generator can behave differently when producing Java code versus, say, JavaScript code: the Java classes are created under the package com.protobuf.generated.domain, while the JavaScript objects are created under the package domain (see the short snippet below).
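
As a quick illustration of how these options affect the generated Java code (a minimal sketch assuming the default protoc Java generator; the class GeneratedTypesExample is just a hypothetical caller), every message becomes a nested class of the outer class named by java_outer_classname, inside the java_package package:

import com.protobuf.generated.domain.CustomerProtos;

public class GeneratedTypesExample {
    public static void main(String[] args) {
        // Messages defined in Customer.proto are nested inside CustomerProtos.
        CustomerProtos.Customer customer = CustomerProtos.Customer.getDefaultInstance();
        CustomerProtos.Customers customers = CustomerProtos.Customers.getDefaultInstance();

        // Prints: com.protobuf.generated.domain.CustomerProtos$Customer
        System.out.println(customer.getClass().getName());
        // Prints: com.protobuf.generated.domain.CustomerProtos$Customers
        System.out.println(customers.getClass().getName());
    }
}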

ProtoBuf provides many more options and data types than this article covers in detail; if you are interested, see the official documentation.

3.3 Code generation

First install the ProtoBuf compiler, protoc (a detailed installation tutorial is available here). Once the installation is complete, you can generate the Java source code with the following command:

protoc --java_out=./src/main/java ./src/main/idl/customer.proto

Execute the command from the project root with two arguments: --java_out, which sets ./src/main/java as the output directory for the generated Java code, and ./src/main/idl/customer.proto, the path to the .proto file.

The generated code is very complex, but fortunately it is very simple to use.

CustomerProtos.Customer.EmailAddress email = CustomerProtos.Customer.EmailAddress.newBuilder()
        .setType(CustomerProtos.Customer.EmailType.PROFESSIONAL)
        .setEmail("[email protected]")
        .build();

CustomerProtos.Customer customer = CustomerProtos.Customer.newBuilder()
        .setId(1)
        .setFirstName("Lee")
        .setLastName("Richardson")
        .addEmail(email)
        .build();

// Serialization
byte[] binaryInfo = customer.toByteArray();
System.out.println(bytes_String16(binaryInfo));
System.out.println(customer.toByteArray().length);

// Deserialization
CustomerProtos.Customer anotherCustomer = CustomerProtos.Customer.parseFrom(binaryInfo);
System.out.println(anotherCustomer.toString());
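
ProtoBuf is also used to persist data (as mentioned earlier), and the generated classes can write to and read from streams directly. The following is a minimal sketch based on the generated CustomerProtos classes from this example; the file name customers.bin is only illustrative:

import com.protobuf.generated.domain.CustomerProtos;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CustomerStore {
    public static void main(String[] args) throws IOException {
        CustomerProtos.Customer customer = CustomerProtos.Customer.newBuilder()
                .setId(1)
                .setFirstName("Lee")
                .setLastName("Richardson")
                .build();

        // Wrap the single customer in the Customers collection message.
        CustomerProtos.Customers customers = CustomerProtos.Customers.newBuilder()
                .addCustomer(customer)
                .build();

        // Persist the message to a file as raw ProtoBuf bytes.
        try (FileOutputStream out = new FileOutputStream("customers.bin")) {
            customers.writeTo(out);
        }

        // Read the bytes back and rebuild the in-memory objects.
        try (FileInputStream in = new FileInputStream("customers.bin")) {
            CustomerProtos.Customers restored = CustomerProtos.Customers.parseFrom(in);
            System.out.println(restored.getCustomer(0).getFirstName()); // Lee
        }
    }
}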

3.4 Performance data

For a simple comparison, we take Customers as the model and construct small, ordinary, and large objects respectively for the performance comparison.
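
The numbers below come from the article's own test run. As a rough idea of how such a comparison can be measured, here is a minimal sketch that times serialization and deserialization of the generated CustomerProtos.Customers message; the object size, warm-up count, and iteration count are arbitrary choices for illustration, not the article's actual harness:

import com.google.protobuf.InvalidProtocolBufferException;
import com.protobuf.generated.domain.CustomerProtos;

public class SimpleBenchmark {
    public static void main(String[] args) throws InvalidProtocolBufferException {
        // Build an "ordinary" sized Customers object with 100 entries.
        CustomerProtos.Customers.Builder builder = CustomerProtos.Customers.newBuilder();
        for (int i = 0; i < 100; i++) {
            builder.addCustomer(CustomerProtos.Customer.newBuilder()
                    .setId(i)
                    .setFirstName("Chris")
                    .setLastName("Richardson"));
        }
        CustomerProtos.Customers customers = builder.build();

        // Warm up the JIT before timing anything.
        for (int i = 0; i < 10_000; i++) {
            CustomerProtos.Customers.parseFrom(customers.toByteArray());
        }

        // Time serialization.
        long start = System.nanoTime();
        byte[] bytes = null;
        for (int i = 0; i < 10_000; i++) {
            bytes = customers.toByteArray();
        }
        long serializeNanos = System.nanoTime() - start;

        // Time deserialization.
        start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {
            CustomerProtos.Customers.parseFrom(bytes);
        }
        long deserializeNanos = System.nanoTime() - start;

        System.out.println("serialized size: " + bytes.length + " bytes");
        System.out.println("serialize:   " + serializeNanos / 10_000 + " ns/op");
        System.out.println("deserialize: " + deserializeNanos / 10_000 + " ns/op");
    }
}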

Serialization time and serialized data size comparison

Deserialization time

See the official Benchmark for more performance data.

Four, summary

To wrap up, here is a summary of ProtoBuf's main advantages and disadvantages.

Advantages:

1. High efficiency

Looking at the size of the serialized data: compared with text protocols such as XML and JSON, ProtoBuf encodes data as T-(L)-V (Tag-Length-Value), so it needs no delimiters such as quotes, braces, or colons to structure the information, and it applies Varint compression at the encoding level. As a result, the same information serializes to a much smaller payload and consumes less network traffic, which makes ProtoBuf a good choice for scenarios with demanding network-resource and performance requirements.

// Let's make a simple comparison
// To describe the following JSON data
{"id":1,"firstName":"Chris","lastName":"Richardson","email":[{"type":"PROFESSIONAL","email":"[email protected]"}]}

# The data serialized with JSON is 118 bytes
7b226964223a312c2266697273744e616d65223a224368726973222c226c6173744e616d65223a2252696368617264736f6e222c22656d61696c223a5b7b2274797065223a2250524f46455353494f4e414c222c22656d61696c223a226372696368617264736f6e40656d61696c2e636f6d227d5d7d

# The data serialized using ProtoBuf is 48 bytes
0801120543687269731a0a52696368617264736f6e2a190a156372696368617264736f6e40656d61696c2e636f6d1001
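
To see the T-(L)-V layout concretely, here is a quick by-hand decode of the first bytes of the ProtoBuf dump above; each key byte is (field_number << 3) | wire_type:

08                              key: field 1 (id), wire type 0 (Varint)
01                              value: 1
12                              key: field 2 (firstName), wire type 2 (length-delimited)
05                              length: 5
43 68 72 69 73                  "Chris"
1a                              key: field 3 (lastName), wire type 2
0a                              length: 10
52 69 63 68 61 72 64 73 6f 6e   "Richardson"
2a                              key: field 5 (email), wire type 2
19                              length: 25, the embedded EmailAddress message ("[email protected]", type PROFESSIONAL)

No field names or text delimiters appear in the payload, only compact one-byte keys, which is why the same record takes 48 bytes in ProtoBuf versus 118 bytes in JSON.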

In terms of serialization/deserialization speed, ProtoBuf is also faster than XML and JSON, roughly 20 to 100 times faster than XML.

2. Cross-platform and multi-language support

ProtoBuf is platform-independent: it works equally well for communication between Android and a PC backend, or between C# and Java services.

Proto3 supports C++, Java, Python, Go, Ruby, Objective-C, and C#.

3. Good extensibility and compatibility

ProtoBuf is backward compatible: you can extend the data structure without breaking programs built against the old definition. This is exactly the problem ProtoBuf was originally designed to solve, because a parser that does not recognize a new field simply skips it instead of failing.
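
For example (a hypothetical evolution of the Customer message from section 3.2, not part of the article's code; shown abridged, with the enum and EmailAddress definitions unchanged), a new field can be added under a previously unused field number, and readers generated from the old definition will simply skip it:

message Customer {
    int32 id = 1;
    string firstName = 2;
    string lastName = 3;
    repeated EmailAddress email = 5;
    // New in a later version: old parsers skip field 6; new parsers reading
    // old data see the default value "" for it.
    string phoneNumber = 6;
}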

4. Easy to use

ProtoBuf provides a set of compiler tools that automatically generate the serialization and deserialization boilerplate code, so developers only need to focus on the business data IDL; this greatly simplifies encoding, decoding, and multi-language interaction.

Disadvantages:

Poor readability and lack of self-description

XML and JSON are self-describing, while ProtoBuf is not.

ProtoBuf is a binary protocol, and the encoded data is not human-readable. Without the IDL file you cannot make sense of the binary stream, which makes debugging harder.

However, Charles already supports the ProtoBuf protocol, so you can import a description file for the data; see Charles Protocol Buffers for details.

On the other hand, precisely because the binary stream cannot be parsed without the IDL file, ProtoBuf provides a degree of data protection: it raises the bar for reverse-engineering core data and reduces the risk of data theft and scraping.

Five, references

  1. Wikipedia
  2. Serialization and deserialization
  3. The official Benchmark
  4. Charles Protocol Buffers
  5. choose-protocol-buffers

Author: Li Guanyun