In Kafka, messages are sent as byte arrays, so you need a method to serialize message objects to byte arrays and then deserialize them to objects on the consumer side. The most common serialization format is JSON. While JSON is human-friendly, the easier format for machines to serialize and deserialize is binary.

Protobuf (Protocol buffers) is a binary Protocol developed by Google for serializing and deserializing structured data. This format takes up less space, is simpler and easier to maintain, and has better performance. Protobuf is a better alternative to formats such as XML and JSON for communication between machines.

The use of Protobuf is more complex, requiring writing.proto files to define the data format, which are then compiled into the appropriate language by the Protoc compiler, and referenced in the program.

use

The installation

The first step is to install the Protoc compiler, which can go directly to Github to download the binary package, unzip it to the appropriate location and set the PATH.

Install the client for the corresponding language. For golang, execute the following two statements:

go get github.com/golang/protobuf/proto
go get github.com/golang/protobuf/protoc-gen-go
Copy the code

compile

Create a new.proto file in which to define the format of the message:

syntax = "proto3";
package test;

option go_package = "proto/test";

message Award {
    int64 uid = 1;
    int64 awardId = 2;
    string userName = 3;
}
Copy the code

Then use the Protoc compiler to compile it into a GO file:

protoc --go_out=. proto/*.proto
Copy the code

Serialization and deserialization can be done by introducing the generated package into the GO program:

import pb "proto/test"

func marshal(a) {
    award := &pb.Award{
		Uid:      628,
		AwardId:  1,
		UserName: "Haruka",
	}
	msg, err := proto.Marshal(award)
}

func unmarshal(a) {
    award := &pb.Award{}
    iferr := proto.Unmarshal(msg.Value, award); err ! =nil {
        panic(err)
    }
}
Copy the code

proto3language

The first line in the.proto file is syntax = “proto3”; To declare that the file is proto3. You can then declare the package to avoid naming conflicts, and finally you can define message.

Field ids

Each field of message is assigned a unique ID. The minimum ID is 1 and the maximum ID is 2^ 29-1. The ID 19000-19999 cannot be used. Ids can be assigned arbitrarily, but 1 to 15 will take up only 1 byte, and 16 to 2047 will take up 2 bytes, so start as small as possible and assign the smallest to the most frequently occurring fields.

You can use reserved to reserve field names and IDS. It is generally used to reserve new fields in the future, or reserve deleted fields to avoid future use of fields and prevent conflicts:

message Foo {
  reserved 2.15.9 to 11;
  reserved "foo"."bar";
}
Copy the code

The field type

The default type of the field is singular, which can occur only once or zero times, or multiple times if you prefix the field declaration with repeated.

There are several scalar types for protobuf: Double FLOAT INT32 INT64 uint32 SinT32 SINT64 FIXed32 Fixed64 SFIXED32 SFIXED64 bool String bytes.

Enumerations and compound types are also available with Protobuf. For example can import “Google/protobuf/timestamp. The proto type to use timestamp.

The default value rules for protobuf datatypes are as follows: if a field is set to default, it will not be serialized:

  • string bytesType is null by default
  • boolThe default type isfalse
  • The default value is0
  • The enumeration type defaults to the first, i.e0
  • Compound types depend on the language

Enumerated type

message SearchRequest {
  enum Corpus {
    option allow_alias = true;
    UNIVERSAL = 0;
    WEB = 1;
    TEST = 1;
    IMAGES = 2;
  }
  Corpus corpus = 4;
}
Copy the code

To declare an enumeration with an enum, note that the enumeration must have a value of 0 as the default value, and 0 should be the first element to be compatible with Proto2. Option allow_alias = true; To allow two enumerated elements to have the same value, that is, the two elements can be substituted for each other.

The compound type

Within a message, you can embed other messages as compound types:

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}
Copy the code

There are other advanced types in Protobuf, such as Any Oneof Map, which are not covered in detail.