In conjunction with the previous article “Learning Protobuf, what is Varint?” We learned that encoding integers by Varint does not have the compression advantage if we encounter negative numbers or large integers. With the introduction of MSB, not only does it not have good compression, but it also has more storage, which is obviously not what we want. Here’s how to solve these problems.

This article is also about learningProtobufThe algorithm is simple and short in length. The reading time is expected to be 8 minutes. If it is helpful to you, please feel free to comment.Ask for likes, ask for comments, ask for retweets.

Before we talk about ZigZag algorithm, let’s talk about base, source code, inverse code, complement related knowledge points, if you understand, you can skip to the next page.

What is base?

The so-called base is when the information on a certain bit is full, need to carry forward. In decimal, for example, a digit is carried when it reaches ten; And when the number in one place is full two, the carry is binary, and so on.

For example, decimal: 10 → binary: 1010 → hexadecimal: A

I read an answer earlier that said: Why is the decimal system universal? Since we humans only have 10 fingers, counting exactly 10 times, the decimal system became the default base. So if humans have 11 fingers, maybe it’s hexadecimal.

Later, with the emergence of computers, whether or not a data is the most natural information carrying unit, so the binary system composed of 0 and 1 naturally becomes the base system of computers. – old seedlings

In the computer system, source code, inverse code and complement code are defined for binary. For a simpler understanding, we will use 1 Byte=8 bits to explain the following.

What’s the source code?

Definition: Use the first bit to represent symbols (0 is non-negative, 1 is negative) and the remaining bits to represent values, as follows:

  • +8 -> Source code: 0000 1000
  • -8 -> Source code: 1000 1000

With the representation method of the original code, logarithmic algorithm can be carried out, but it was soon found that the result of multiplication and division by using the original code with signed bits was correct, but there were problems in addition and subtraction operations, as follows:

Decimal: +8 * (-8) = -64 Source code: 0000 1000 * 1000 1000 = 1100 0000 = -64 Source code: +8 + (-8) = 0 Source code: 0000 1000 + 1000 1000 = 1000 0000 = -16Copy the code

It looks like there’s nothing wrong with addition, but it’s the sign bits, right? So the computer gurus introduced inverse code.

What did the reverse code do?

Definition: Use the first bit to represent the symbol (0 is non-negative, 1 is negative), and the rest of the bits, the non-negative bits remain unchanged, and the negative bits are reversed as follows:

  • +8 -> Original code: 0000 1000 -> Inverse code: 0000 1000
  • -8 -> Source code: 1000 1000 -> Reverse code: 1111 0111

Let’s continue with the addition

  • Decimal: +8 + (-8) = 0
  • 0000 1000 + 1111 0111 = 1111 1111 = -0

The result was -0, which was totally unexpected!!

It is found that, on the face of it, binary computation looks fine if it is represented by source code + complement. But a closer look reveals two problems:

First, 0 can actually be represented in two codes, +0 and -0:

  • +0 -> < p style = “max-width: 100%; clear: both
  • -0 -> Source code: 1000 0000 -> Inverse code: 1111 1111

Second, the computer does not know the existence of the sign bit, so after participating in the operation, the result will be a phenomenon like -0.

Strange as it may seem, to solve these problems, computer giants have introduced complementary codes

What’s the use of the complement?

Definition: the symbol is represented by the first digit (0 is non-negative, 1 is negative), the remaining non-negative bits remain unchanged, negative bits are reversed and one is added to the end.

  • +8 -> Source: 0000 1000 -> Complement: 0000 1000
  • -8 -> Source code: 1000 1000 -> Complement code: 1111 1000

Now let’s go ahead and see, what happens when you plug in the sign bit?

-> 8 + (-8)
-> 0000 1000 + 1111 1000
-> 0000 0000
-> 0
Copy the code

Obviously, by introducing the complement code, we have solved this kind of problem. In the process of computer operation, we do not need to care about the symbol problem, and we can uniformly deal with it according to the full binary rule

Okay, so that’s it for the little facts, but then, let’s get to the real topic.

What is ZigZag?

In most computer systems, we usually use fixed length intergers to represent values. Such as:

  • with4 bytessaidInt32
  • with8 bytessaidInt64

Why is that? This will facilitate our computer processing, speed up the processing.

But in system network communication (RPC), in order to transmit a 1, we need to transmit 32 bits 00000000 00000000 00000000 00000001. What a waste of T&M with so many characters and only 1 bit of valuable data!

So what to do? ZigZag algorithm was born from this.

The principle of the ZigZag

Coding is introduced

ZigZag coding maps signed integers to unsigned integers so that numbers with smaller absolute values correspond to smaller coded values, such as: -1 -> 1, 1 -> 2, as shown in the figure below:

The original number coding
0 0
– 1 1
1 2
2 – 3
2 4
. .
– (2 ^ 31-1) 2^32 – 3
2 ^ 31-1 2 ^ 32-2

As mentioned above, this method is usually encoded by a zigzag between positive and negative integers, which is fun to watch. Is that why computer gurus have given the name ZigZag??

Encoding rules

  • A. Non-negative integer, the sign bit moves backward
  • B. Negative integer, sign bit moved backward, data bit inverted by bit

On most computer systems, integers (Int32, Int64) are expressed as 4 Bytes and 8 Bytes. Let’s choose Int32 for a simple demo, as follows:

  • Decimal: 0\

    • Complement: 00000000 00000000 00000000\

    • ZigZag: 00000000 00000000 00000000 00000000 00000000

  • Decimal: 1\

    • Complement: 00000000 00000000 00000000 00000001\

    • ZigZag: 00000000 00000000 00000000 00000010

  • Decimal: -1\

    • Complement: 111111111111 11111111 11111111\

    • ZigZag: 00000000 00000000 00000000 00000001

The decoding rules

  • Similar coding, reverse operation can be

ZigZag Coding Implementation (Python)

def int32_to_zigzag(n):
    return (n << 1) ^ (n >> 31)
Copy the code

ZigZag Decoding implementation (Python)

def zigzag_to_int32(zz):
    return (zz >> 1) ^ -(zz & 1)
Copy the code

To summarize

In most cases, ZigZag coding combined with Varint algorithm has a good compression effect on integers, but if integers with large absolute values are encountered, there is no compression advantage.

However, the integers we usually use tend to be small.

Reference documentation

  • en.wikipedia.org/wiki/Zigzag
  • Developers.google.com/protocol-bu…
  • Studygolang.com/articles/35…

❤️❤️❤️ Every love from readers is my motivation! I am thirty-one, thank each friend: beg point praise, beg comment, beg forward, everybody next period see!