Protocol Buffer (Protobuf) is a high-performance, cross-language, cross-platform serialization library from Google.

For better hardware efficiency, numbers in computers are often represented using fixed length intergers. But in a world of microservices, a more flexible way to transmit numbers is needed to save bandwidth. Varint (Variable Length integers) is a method for encoding integer numbers that allows you to flexibly adjust the amount of space required for integer values.

Varint coding

Varint in Protobuf is variably binary encoded according to the size of the integer, less than 128 () takes up 1 byte after encoding, less than 16384 (), and so on, up to 10 bytes can be used to indicate greater than or equal toThe integer; Its implementation principle is shown in the figure below:

For each encoded byte, the first byte identifies whether it is a tail, and the next seven bits record the binary of the original number, for example, 299 under int32

00000000 00000000 00000001 00101011

The encoded result is 10101011 00000010, or 2 bytes saved per transfer.

Since only 7 bits per byte in the Varint encoding result are significant bits (storing raw data), for less thanFor int32 or int64, Varint encoding can play a compression effect. Of course, small numbers are often used much more often than large numbers, so Varint encoding can be compressed for most scenarios.

Varint encoding is as follows:

const maxVarintBytes = 10 // maximum length of a varint

// EncodeVarint returns the varint encoding of x.
// This is the format for the
// int32, int64, uint32, uint64, bool, and enum
// protocol buffer types.
// Not used by the package itself, but helpful to clients
// wishing to use the same encoding.
func EncodeVarint(x uint64) []byte {
	var buf [maxVarintBytes]byte
	var n int
	for n = 0; x > 127; n++ {
		// Write the first 7 bits of the original number beginning with the lowest digit
		buf[n] = 0x80 | uint8(x&0x7F)
		// Remove the 7 bits recorded
		x >>= 7
	}
	// The first digit is 0
	buf[n] = uint8(x)
	n++
	return buf[0:n]
}
Copy the code

I use the two magic number, see they can understand the binary, & 7 bit 0 x7f, | 0 x80 will first marked as 1.

0x80 => 0000000010000000
0x7f => 0000000001111111
Copy the code

Therefore, the deserialization of Varint is to take the last 7 bits of each byte in reverse order, as shown in the following figure (taking the encoding result of 299 as an example) :

The source code is as follows:

// DecodeVarint reads a varint-encoded integer from the slice.
// It returns the integer and the number of bytes consumed, or
// zero if there is not enough.
// This is the format for the
// int32, int64, uint32, uint64, bool, and enum
// protocol buffer types.
func DecodeVarint(buf []byte) (x uint64, n int) {
	for shift := uint(0); shift < 64; shift += 7 {
		if n >= len(buf) {
			return 0.0
		}
		b := uint64(buf[n])
		n++
		// Discard the first digit, take 7 bits and add back x
		x |= (b & 0x7F) << shift
		// The first digit is 0
		if (b & 0x80) = =0 {
			return x, n
		}
	}

	// The number is too large to represent in a 64-bit value.
	return 0.0
}
Copy the code

EncodeVarint and DecodeVarint both handle uint64 types. What if we need to handle negative numbers? Let’s see what we get when we directly encode negative numbers as uint64:

fmt.Println(EncodeVarint(uint64(- 299.)))
// output:
// [213 253 255 255 255 255 255 255 255 1]
Copy the code

The result is a 10-byte encoding, expressed in 64-bit, because the value of uint64(-299) complement is 299. The maximum length of varint encoding is obtained. The source code for varint encoding is as follows:

// SizeVarint returns the varint encoding size of an integer.
func SizeVarint(x uint64) int {
	switch {
	case x < 1<<7:
		return 1
	case x < 1<<14:
		return 2
	case x < 1<<21:
		return 3
	case x < 1<<28:
		return 4
	case x < 1<<35:
		return 5
	case x < 1<<42:
		return 6
	case x < 1<<49:
		return 7
	case x < 1<<56:
		return 8
	case x < 1<<63:
		return 9
	}
	return 10
}
Copy the code

Zigzag

Zigzag encoding maps signed to unsigned integers, and as the name suggests, the encoded values oscillate between positive and negative integers, as shown in the following table:

Signed Original Encoded As
0 0
– 1 1
1 2
2 – 3
2147483647 4294967294
– 2147483648. 4294967295

Its implementation is as follows:

func Zigzag64(x uint64) uint64 {
	// Move XOR to the left one bit (-1/0 64-bit complement)
	return (x << 1) ^ uint64(int64(x) >> 63)}Copy the code

And the thing to notice here is that if x is negative, the left-hand side of XOR is -x’s complement moved one bit to the left. The following figure takes -299 as an example. The complement of 299 is calculated first, followed by the complement of XOR symbol (-1/0), and the result is 597. If it is positive, Zigzag results in twice the original number.

summary

While writing this article, I reviewed a wave of basic knowledge, such as why Varint encodes negative numbers and the result is 10 bytes. The reason is that negative numbers are stored in the form of complement. So what seems like an introductory concept in college is something you’ll encounter in actual programming, and there’s a long way to go

Update: 2020.01.22 Fixes the error description