Protobuf basic tutorial

I have recently become interested in RPC serialization, but found little information about Protobuf, so I took the introductory guide to using Protocol Buffers with Java from the official website and translated it for my fellow users.

Getting Started: Define the Protocol Format

The example is a simple address book application. Its .proto file is addressbook.proto.

syntax = "proto2";

package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phones = 4;
}

message AddressBook {
    repeated Person people = 1;
}

The .proto file begins with a package declaration, which helps prevent naming conflicts between different projects. In Java, the package name is used as the Java package unless you explicitly specify java_package, as we did in addressbook.proto.

Even if you specify a java_package, you should still define a regular package to avoid name collisions in the Protocol Buffers namespace.

After the package declaration come two Java-specific options: java_package and java_outer_classname. java_package specifies the Java package in which the generated classes are placed; if it is not specified, the value of package is used. The java_outer_classname option defines the name of the wrapper class that contains all the classes in this file. If you do not set it explicitly, it is generated by converting the file name to camel case. For example, my_proto.proto generates the class name MyProto by default.

Next, there is the message definition.

A message is just an aggregate containing a set of typed fields.

A number of standard simple data types are available for fields, including:

  • bool
  • int32
  • float
  • double
  • string

You can also add further structure by using other message types as field types. In the example above, the Person message contains PhoneNumber messages, and the AddressBook message contains Person messages. You can also define enum types if you want a field to take one of a predefined list of values. In this address book example, a phone number can be one of three types: MOBILE, HOME, or WORK.

The " = 1" and " = 2" markers on each element identify the unique "tag" that the field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can use those tags for commonly used or repeated elements, leaving tags 16 and higher for less commonly used optional elements. Each element in a repeated field requires re-encoding the tag number, so repeated fields are particularly good candidates for this optimization.
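
The size difference between low and high tag numbers can be checked with a few lines of Java. This is only a sketch, assuming the documented encoding in which the field key is (field number << 3) | wire type, written as a base-128 varint. The class and method names are made up for illustration and are not part of the Protobuf API.

```java
final class TagSizeSketch {
    // Wire type 0 is the varint wire type (used by int32, bool and enum fields).
    static final int WIRETYPE_VARINT = 0;

    // Returns how many bytes the field key occupies on the wire.
    // The key is (fieldNumber << 3) | wireType, stored as a varint
    // where every byte carries 7 bits of payload.
    static int keySize(int fieldNumber, int wireType) {
        int key = (fieldNumber << 3) | wireType;
        int bytes = 1;
        while ((key >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(keySize(1, WIRETYPE_VARINT));   // 1
        System.out.println(keySize(15, WIRETYPE_VARINT));  // 1
        System.out.println(keySize(16, WIRETYPE_VARINT));  // 2
    }
}
```

Tag 15 is the last one whose key fits into 7 bits, which is why the one-byte range ends there.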

Each field must be annotated with one of the following modifiers:

  • required: A value for the field must be provided; otherwise the message is considered "uninitialized". Trying to build an uninitialized message throws a RuntimeException. Parsing an uninitialized message throws an IOException. Other than this, a required field behaves exactly like an optional field.

  • optional: The field may or may not be set. If an optional field value is not set, a default value is used. For simple types, you can specify your own default value, as we did for the phone number's type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, and false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set.

  • repeated: The field may be repeated any number of times (including zero). The order of the repeated values is preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
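
The difference between "not set" and "set to the default" is easier to see in code. The class below is a hand-written stand-in for the PhoneNumber message, not the code that protoc generates; it only mimics the behavior described in the list above, using null to mean "not set".

```java
final class FieldRulesSketch {
    enum PhoneType { MOBILE, HOME, WORK }

    // Hand-written stand-in for the PhoneNumber message; null means "not set".
    static final class PhoneNumber {
        private final String number;   // required string number = 1;
        private final PhoneType type;  // optional PhoneType type = 2 [default = HOME];

        PhoneNumber(String number, PhoneType type) {
            this.number = number;
            this.type = type;
        }

        // A message is initialized only when every required field is set.
        boolean isInitialized() {
            return number != null;
        }

        // Reports whether the optional field was explicitly set.
        boolean hasType() {
            return type != null;
        }

        // An unset optional field reads as its declared default.
        PhoneType getType() {
            return type != null ? type : PhoneType.HOME;
        }
    }

    public static void main(String[] args) {
        PhoneNumber p = new PhoneNumber("555-4321", null);
        System.out.println(p.isInitialized()); // true
        System.out.println(p.hasType());       // false
        System.out.println(p.getType());       // HOME
    }
}
```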

Required Is Forever: Be very careful about marking fields as required. If at some point you want to stop writing or sending a required field, changing it to an optional field will be problematic: old readers will consider messages without the field incomplete and may reject or drop them. You should consider writing application-specific validation routines for your buffers instead. Some engineers at Google have concluded that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.

You can find a complete guide to writing .proto files in the Protocol Buffer Language Guide. Don't go looking for anything like class inheritance, though; protocol buffers do not support it.

Compile your Protocol Buffers

Now that you have a .proto file, the next step is to generate the classes you need to read and write AddressBook messages. To do this, run the protocol buffer compiler, protoc, on your .proto file:

  • If you haven't installed the compiler, download the package and follow the instructions in the README.

Protocol Compilation and Installation

The protocol compiler is written in C++. If you use C++, follow the C++ installation instructions to install protoc. For non-C++ users, the easiest way to install the protocol compiler is to download a pre-built binary from the release page: github.com/protocolbuf…

The protoc-$VERSION-$PLATFORM.zip archive contains the protoc binary as well as a set of standard .proto files distributed with Protobuf. If you need an older version, you can find it at https://repo1.maven.org/maven2/com/google/protobuf/protoc/.

These pre-built binaries are only provided for released versions.

Protobuf runtime installation

Protobuf supports several programming languages. For each language, refer to the language-specific instructions in the source code.

  • Now run the compiler, specifying the source directory (where your application's source code lives; the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto. In this case, you would run:

protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto

Because you want Java classes, you use the --java_out option; similar options are provided for the other supported languages.

This generates com/example/tutorial/AddressBookProtos.java in your specified destination directory.

Protocol Buffer API

Let's look at some of the generated code and see what classes and methods the compiler has created for you. If you look in AddressBookProtos.java, you can see that it defines a class called AddressBookProtos, nested within which is a class for each message you specified in addressbook.proto. Each class has its own Builder class that you use to create instances of that class. You can find out more about builders in the Builders vs. Messages section below.

Both messages and builders have auto-generated accessor methods for each field of the message. Messages have only getters, while builders have both getters and setters. Here are some of the accessors for the Person class (implementations omitted for brevity):

// required string name = 1;
public boolean hasName();
public String getName();

// required int32 id = 2;
public boolean hasId();
public int getId();

// optional string email = 3;
public boolean hasEmail();
public String getEmail();

// repeated .tutorial.Person.PhoneNumber phones = 4;
public List<PhoneNumber> getPhonesList();
public int getPhonesCount();
public PhoneNumber getPhones(int index);

Person.Builder, by contrast, has the same getters plus setters:

// required string name = 1;
public boolean hasName();
public java.lang.String getName();
public Builder setName(String value);
public Builder clearName();

// required int32 id = 2;
public boolean hasId();
public int getId();
public Builder setId(int value);
public Builder clearId();

// optional string email = 3;
public boolean hasEmail();
public String getEmail();
public Builder setEmail(String value);
public Builder clearEmail();

// repeated .tutorial.Person.PhoneNumber phones = 4;
public List<PhoneNumber> getPhonesList();
public int getPhonesCount();
public PhoneNumber getPhones(int index);
public Builder setPhones(int index, PhoneNumber value);
public Builder addPhones(PhoneNumber value);
public Builder addAllPhones(Iterable<PhoneNumber> value);
public Builder clearPhones();

As you can see, there are simple JavaBeans-style getters and setters for each field. There is also a has method for each singular field, which returns true if that field has been set. Finally, each field has a clear method that returns the field to its empty state.

Repeated fields have some additional methods:

  • A Count method (getPhonesCount()) returns the size of the list.
  • Getters and setters read or replace a specific element by index (public PhoneNumber getPhones(int index); and public Builder setPhones(int index, PhoneNumber value);).
  • An add method appends a new element to the list, and an addAll method appends an entire container of elements.

Notice that all of these accessor methods use camel-case naming, even though the .proto file uses lowercase with underscores. The protocol buffer compiler performs this transformation automatically so that the generated classes follow standard Java style conventions. You should always use lowercase with underscores for field names in your .proto files. See the style guide for more on good .proto style. For details on exactly which members the compiler generates for a particular field definition, see the Java Generated Code reference.
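
To make the naming rule concrete, here is a small sketch that derives the message-side accessor names from a .proto field name. It is an approximation written for this article, not the compiler's own code: it only handles lowercase words separated by underscores and ignores special cases such as digits in names. The field name home_address is a made-up example that does not appear in addressbook.proto.

```java
import java.util.ArrayList;
import java.util.List;

final class AccessorNameSketch {
    // lowercase_with_underscores -> CamelCase, e.g. "home_address" -> "HomeAddress".
    static String toCamelCase(String fieldName) {
        StringBuilder out = new StringBuilder();
        for (String word : fieldName.split("_")) {
            if (word.isEmpty()) {
                continue; // tolerate leading or doubled underscores
            }
            out.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
        }
        return out.toString();
    }

    // Names of the getters a message class exposes for one field.
    static List<String> accessorNames(String fieldName, boolean repeated) {
        String camel = toCamelCase(fieldName);
        List<String> names = new ArrayList<>();
        if (repeated) {
            names.add("get" + camel + "List");
            names.add("get" + camel + "Count");
            names.add("get" + camel);
        } else {
            names.add("has" + camel);
            names.add("get" + camel);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(accessorNames("email", false)); // [hasEmail, getEmail]
        System.out.println(accessorNames("phones", true)); // [getPhonesList, getPhonesCount, getPhones]
    }
}
```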

Enumeration and inner classes

The generated code includes a PhoneType enum, nested within Person:

public static enum PhoneType {
  MOBILE(0, 0),
  HOME(1, 1),
  WORK(2, 2),
  ;
  ...
}

The nested type Person.PhoneNumber is generated, as you'd expect, as a nested class within Person.

Builders vs. Messages

The message classes generated by the protocol buffer compiler are all immutable. Once a message object has been constructed, it cannot be modified, just like a Java String. To construct a message, you first create a builder, set any fields you want to set to your chosen values, and then call the builder's build() method. (If you have used Lombok, this is very similar to its @Builder annotation.)

You may have noticed that each builder method that modifies the message returns another builder. The returned object is actually the same builder on which you called the method; it is returned so that you can chain several setters together on a single line of code.
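
The two properties described above (setters that return the same builder, and messages that cannot change after build()) can be reproduced in a few lines of plain Java. The Contact class below is a hypothetical, hand-written illustration of the pattern, not generated code, and its internals are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BuilderSketch {
    // Stand-in for a generated message: all fields final, no setters.
    static final class Contact {
        private final String name;
        private final List<String> phones;

        private Contact(Builder b) {
            this.name = b.name;
            // Copy the list so later changes to the builder cannot leak in.
            this.phones = Collections.unmodifiableList(new ArrayList<>(b.phones));
        }

        String getName() { return name; }
        List<String> getPhonesList() { return phones; }

        static Builder newBuilder() { return new Builder(); }
    }

    static final class Builder {
        private String name = "";
        private final List<String> phones = new ArrayList<>();

        // Each mutator returns this, which is what makes chaining possible.
        Builder setName(String value) { this.name = value; return this; }
        Builder addPhones(String value) { phones.add(value); return this; }

        Contact build() { return new Contact(this); }
    }

    public static void main(String[] args) {
        Builder builder = Contact.newBuilder();
        Contact john = builder.setName("John Doe").addPhones("555-4321").build();
        builder.addPhones("555-0000");                    // changes the builder only
        System.out.println(john.getPhonesList().size());  // 1
    }
}
```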

Here’s an example of creating an instance of Person:

Person john =
  Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .addPhones(
      Person.PhoneNumber.newBuilder()
        .setNumber("555-4321")
        .setType(Person.PhoneType.HOME)
        .build())
    .build();

Standard message methods

Each message and builder class also contains a number of other methods that let you check or manipulate the entire message:

  • isInitialized(): checks whether all the required fields have been set.
  • toString(): returns a human-readable representation of the message, particularly useful for debugging.
  • mergeFrom(Message other): (builder only) merges the contents of other into this message.
  • clear(): (builder only) clears all the fields back to the empty state.

Parsing and serialization

Finally, each protocol buffer class has methods for writing and reading messages in the protocol buffer binary format:

  • byte[] toByteArray(): serializes the message and returns a byte array containing its raw bytes.
  • static Person parseFrom(byte[] data): parses a message from the given byte array.
  • void writeTo(OutputStream output): serializes the message and writes it to an OutputStream.
  • static Person parseFrom(InputStream input): reads and parses a message from an InputStream.

These are just a couple of the options provided for parsing and serialization. For the complete list, see the Message API reference.
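
To give a feel for what the serialized bytes look like, the following sketch encodes the name and id fields of a Person by hand, following the documented wire format (a key per field, varints for integers, a length prefix for strings). It is an illustration only: real code should call toByteArray(), the class and method names are invented here, and the sketch handles non-negative ids only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class WireSketch {
    // Writes a non-negative value as a base-128 varint, low 7 bits first.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes name (field 1, length-delimited) and id (field 2, varint).
    static byte[] encodePerson(String name, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (1 << 3) | 2);   // key: field 1, wire type 2
        writeVarint(out, utf8.length);    // length prefix
        out.write(utf8, 0, utf8.length);  // string payload
        writeVarint(out, (2 << 3) | 0);   // key: field 2, wire type 0
        writeVarint(out, id);             // varint payload
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodePerson("John", 1234)) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // 0A 04 4A 6F 68 6E 10 D2 09
    }
}
```

Nine bytes carry both fields, and no field names appear in the output: only the tag numbers from the .proto file identify the data.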

Protocol Buffers and Object-Oriented Design: Protocol buffer classes are basically dumb data holders (like structs in C); they don't make good first-class citizens in an object model. If you want to add richer behavior to a generated class, the best way to do it is to wrap the generated protocol buffer class in an application-specific class. Wrapping protocol buffers is also a good idea if you have no control over the design of the .proto file (for example, if you are reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, and so on. Never add behavior to the generated classes by inheriting from them: doing so breaks internal mechanisms and is not good object-oriented practice anyway.
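
A wrapper of this kind might look like the sketch below. So that it compiles without protoc output, the Person class here is a reduced, hand-written stand-in for the generated one, and Contact and displayLabel() are hypothetical names chosen for the example.

```java
final class WrapperSketch {
    // Stand-in for the generated Person class, reduced to a few getters.
    static final class Person {
        private final String name;
        private final String email;

        Person(String name, String email) {
            this.name = name;
            this.email = email;
        }

        String getName() { return name; }
        boolean hasEmail() { return !email.isEmpty(); }
        String getEmail() { return email; }
    }

    // Application-specific wrapper: it holds a Person instead of extending it.
    static final class Contact {
        private final Person person;

        Contact(Person person) {
            this.person = person;
        }

        // Convenience behavior that does not belong in the data holder.
        String displayLabel() {
            return person.hasEmail()
                ? person.getName() + " <" + person.getEmail() + ">"
                : person.getName();
        }
    }

    public static void main(String[] args) {
        Contact contact = new Contact(new Person("John Doe", ""));
        System.out.println(contact.displayLabel()); // John Doe
    }
}
```

Because Contact only delegates, the generated class can be regenerated from a changed .proto file without touching the behavior built on top of it.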

Write a Message

Now let's try using your protocol buffer classes. The first thing you want your address book application to do is write personal details to your address book file. To do this, you need to create and populate instances of your protocol buffer classes and then write them to an output stream.

Here is a program that reads an AddressBook from a file, adds a new Person to the AddressBook based on the user’s input, and writes the new AddressBook back to the file.


import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;

class AddPerson {
  // This function fills in a Person message based on user input.
  static Person PromptForAddress(BufferedReader stdin, PrintStream stdout) throws IOException {
    Person.Builder person = Person.newBuilder();

    stdout.print("Enter person ID: ");
    person.setId(Integer.valueOf(stdin.readLine()));

    stdout.print("Enter name: ");
    person.setName(stdin.readLine());

    stdout.print("Enter email address (blank for none): ");
    String email = stdin.readLine();
    if (email.length() > 0) {
      person.setEmail(email);
    }

    while (true) {
      stdout.print("Enter a phone number (or leave blank to finish): ");
      String number = stdin.readLine();
      if (number.length() == 0) {
        break;
      }

      Person.PhoneNumber.Builder phoneNumber =
        Person.PhoneNumber.newBuilder().setNumber(number);

      stdout.print("Is this a mobile, home, or work phone? ");
      String type = stdin.readLine();
      if (type.equals("mobile")) {
        phoneNumber.setType(Person.PhoneType.MOBILE);
      } else if (type.equals("home")) {
        phoneNumber.setType(Person.PhoneType.HOME);
      } else if (type.equals("work")) {
        phoneNumber.setType(Person.PhoneType.WORK);
      } else {
        stdout.println("Unknown phone type. Using default.");
      }

      person.addPhones(phoneNumber);
    }

    return person.build();
  }

  // Main function: Reads the entire address book from a file,
  // adds one person based on user input, then writes it back out to the same
  // file.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: AddPerson ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    AddressBook.Builder addressBook = AddressBook.newBuilder();

    // Read the existing address book.
    try {
      addressBook.mergeFrom(new FileInputStream(args[0]));
    } catch (FileNotFoundException e) {
      System.out.println(args[0] + ": File not found. Creating a new file.");
    }

    // Add an address.
    addressBook.addPeople(
      PromptForAddress(new BufferedReader(new InputStreamReader(System.in)),
                       System.out));

    // Write the new address book back to disk.
    FileOutputStream output = new FileOutputStream(args[0]);
    addressBook.build().writeTo(output);
    output.close();
  }
}

Read a Message

Of course, an address book wouldn't be much use if you couldn't get any information out of it. This example reads the file created by the previous example and prints all the information in it:

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintStream;

class ListPeople {
  // Iterates through all people in the AddressBook and prints info about them.
  static void Print(AddressBook addressBook) {
    for (Person person: addressBook.getPeopleList()) {
      System.out.println("Person ID: " + person.getId());
      System.out.println(" Name: " + person.getName());
      if (person.hasEmail()) {
        System.out.println(" E-mail address: " + person.getEmail());
      }

      for (Person.PhoneNumber phoneNumber : person.getPhonesList()) {
        switch (phoneNumber.getType()) {
          case MOBILE:
            System.out.print(" Mobile phone #: ");
            break;
          case HOME:
            System.out.print(" Home phone #: ");
            break;
          case WORK:
            System.out.print(" Work phone #: ");
            break;
        }
        System.out.println(phoneNumber.getNumber());
      }
    }
  }

  // Main function: Reads the entire address book from a file and prints all
  // the information inside.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: ListPeople ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    // Read the existing address book.
    AddressBook addressBook =
      AddressBook.parseFrom(new FileInputStream(args[0]));

    Print(addressBook);
  }
}

Extending a Protocol Buffer

Sooner or later after you release code that uses your protocol buffer, you will undoubtedly want to improve the protocol buffer's definition. If you want your new buffers to be backward-compatible and your old buffers to be forward-compatible (and you almost certainly do), there are some rules you need to follow. In the new version of the protocol buffer:

  • You must not change the tag numbers of any existing fields.
  • You must not add or delete any required fields.
  • You may delete optional or repeated fields.
  • You may add new optional or repeated fields, but you must use fresh tag numbers (that is, tag numbers that were never used in this protocol buffer, not even by deleted fields).

If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, optional fields that were deleted will have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages. However, keep in mind that new optional fields will not be present in old messages, so you need to either check explicitly whether they are set using the field's has method (for example hasEmail()), or provide a reasonable default value in your .proto file with [default = value] after the tag number. If no default value is specified for an optional field, a type-specific default is used instead: the empty string for strings, false for booleans, and zero for numeric types. Note also that if you add a new repeated field, your new code cannot tell whether it was left empty (by new code) or never set at all (by old code), because repeated fields have no has method.
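
The compatibility rules can be illustrated with a toy model in which a decoded message is simply a map from tag number to value. This is not how the library stores messages; it is a simplified sketch, and the field with tag 5 (a "nickname") is a hypothetical later addition to Person.

```java
import java.util.HashMap;
import java.util.Map;

final class EvolutionSketch {
    // Old reader: only knows tag 1 (name) and tag 3 (email).
    static String oldReader(Map<Integer, String> message) {
        String name = message.getOrDefault(1, "");
        String email = message.getOrDefault(3, ""); // missing field -> default value
        // Unknown tags, such as 5, are never looked at and so are ignored.
        return name + "|" + email;
    }

    // New reader: also knows tag 5, and checks for presence before using it.
    static String newReader(Map<Integer, String> message) {
        String nickname = message.containsKey(5) ? message.get(5) : "(none)";
        return oldReader(message) + "|" + nickname;
    }

    public static void main(String[] args) {
        Map<Integer, String> newMessage = new HashMap<>();
        newMessage.put(1, "John Doe");
        newMessage.put(3, "jdoe@example.com");
        newMessage.put(5, "JD");

        Map<Integer, String> oldMessage = new HashMap<>();
        oldMessage.put(1, "John Doe");

        System.out.println(oldReader(newMessage)); // John Doe|jdoe@example.com
        System.out.println(newReader(oldMessage)); // John Doe||(none)
    }
}
```

Reusing tag 3 for a different field would break this immediately: the old reader would interpret the new value as an email address, which is why tag numbers must never be changed or recycled.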

Advanced usage

Protocol buffers have uses that go beyond simple accessors and serialization. Be sure to explore the Java API reference to see what else you can do with them.

One key feature provided by protocol message classes is reflection. You can iterate over the fields of a message and manipulate their values without writing your code against any specific message type. One very useful way to use reflection is for converting protocol messages to and from other encodings, such as XML or JSON. A more advanced use of reflection might be to find differences between two messages of the same type, or to develop a sort of "regular expressions for protocol messages" in which you can write expressions that match certain message contents. If you use your imagination, it's possible to apply protocol buffers to a much wider range of problems than you might initially expect.