Information entropy

In information theory, Shannon introduced the concept of information entropy.

The formula is as follows, representing the entropy of a source S whose symbols sᵢ occur with probabilities p(sᵢ):

H(S) = -∑ᵢ p(sᵢ) log₂ p(sᵢ)

So what exactly is information entropy? It is a measure of information content. Rather than thinking of it as "uncertainty," which is abstract and hard to grasp, think of it as the diversity of the information, which has two practical meanings:

  1. The average amount of information carried by source S
  2. The average number of bits required to encode the symbols of S

So what is entropy coding? Coding that stays within the limit set by the information entropy is entropy coding. For example, if the entropy of a source works out to 3 bits per character, then encoding it at 4 bits per character is entropy coding, but encoding it at 2 bits per character is not, because at that rate the signal is distorted. From this you can also see that the source entropy is the minimum average number of bits required to encode the source. Entropy coding is therefore lossless compression.
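As a small sketch of the idea above, the entropy of a message can be computed directly from its symbol frequencies (the function name `shannon_entropy` is my own):

```python
from collections import Counter
from math import log2

def shannon_entropy(message: str) -> float:
    """Average information content of the message, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Eight equally likely symbols need log2(8) = 3 bits per symbol.
print(shannon_entropy("abcdefgh"))  # 3.0
```

Any lossless code for this source must therefore spend at least 3 bits per symbol on average.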

For a video sequence, the original signal always contains redundancy, called representation redundancy. If the shortest piece of data that can still represent the whole video can be found, the length of that shortest representation is the entropy. The basic idea of entropy coding is to assign short code words to high-probability symbols and long code words to low-probability symbols, so that the average code length is the shortest. Huffman codes can be uniquely and correctly decoded because no Huffman code word is a prefix of any other code word.

There are many kinds of entropy coding:

  • Huffman coding
  • Arithmetic coding
  • Run-length encoding (RLE)
  • Context-based adaptive variable-length coding (CAVLC)
  • Context-based adaptive binary arithmetic coding (CABAC)
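Of these, run-length encoding is the simplest to illustrate: it replaces each run of identical symbols with a (symbol, count) pair. A minimal sketch (the helper names `rle_encode` and `rle_decode` are my own):

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of repeated symbols into a (symbol, run_length) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(data)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(sym * n for sym, n in pairs)

print(rle_encode("aaaabbbcca"))  # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
```

RLE only pays off when the source actually contains long runs, which is why it is usually combined with other techniques rather than used alone.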

Huffman coding

Huffman coding is a variable-length coding method that uses the probabilities of the symbols to construct a code with the shortest average length. The key step is building a binary tree that obeys the Huffman rules, namely the Huffman tree:

  • A special binary tree in which the number of leaf nodes equals the number of symbols, and each leaf node carries its own weight;
  • The weighted path length, i.e. the sum over all leaves of the path length from root to leaf multiplied by the leaf's weight, is minimal.

Note: after encoding, the Huffman code table must also be saved. Different sources have different symbol frequencies, so you cannot decode without the code table.
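The construction above can be sketched with a min-heap: repeatedly merge the two lowest-weight subtrees until one tree remains, then read codes off the root-to-leaf paths. This is a minimal illustration, not the implementation from the linked article; `huffman_code` and the nested-tuple tree representation are my own choices:

```python
import heapq
from collections import Counter

def huffman_code(message: str) -> dict[str, str]:
    """Build a Huffman tree from symbol frequencies and return the code table."""
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(w, i, sym) for i, (sym, w) in enumerate(Counter(message).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # two lowest-weight subtrees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes: dict[str, str] = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")   # left edge contributes a 0
            walk(tree[1], prefix + "1")   # right edge contributes a 1
        else:
            codes[tree] = prefix or "0"   # single-symbol source edge case
    walk(heap[0][2])
    return codes
```

For `"aaabbc"` the most frequent symbol `a` receives a 1-bit code while `b` and `c` receive 2-bit codes, matching the short-codes-for-frequent-symbols principle.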

The article [[DSA] tree – Huffman tree detail (3)] in the Data Structures and Algorithms column describes the detailed process.

Drawbacks of Huffman coding

  • The decoder needs to know the structure of the Huffman tree, so the encoder must store or transmit the tree for the decoder, increasing the storage requirement;
  • The traditional decoding method reads bits from the stream one at a time until a code word is matched in the Huffman tree, which increases the computational complexity of the decoder.
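That bit-by-bit decoding loop can be sketched as follows; the tiny hand-built tree (codes a=0, b=10, c=11) and the nested-tuple representation are illustrative assumptions, matching the encoder sketch above:

```python
# Internal nodes are (left, right) pairs, leaves are symbols.
# This tree gives the codes: a = 0, b = 10, c = 11.
tree = ("a", ("b", "c"))

def decode(bits: str, tree) -> str:
    """Walk the tree one bit at a time; emit a symbol at each leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached a leaf: emit and restart at root
            out.append(node)
            node = tree
    return "".join(out)

print(decode("010110", tree))  # "abca"
```

Every symbol costs a full root-to-leaf walk, which is exactly the per-bit work the second drawback above refers to; table-driven decoders trade memory for fewer branches.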

The overall efficiency of a video compression system depends on the performance gain of each technical module in the system. Although both predictive coding and transform coding effectively remove redundancy from the video data, entropy coding is what actually produces the compression. In a video compression system, entropy coding always sits at the end of the pipeline: it encodes the transform coefficients, motion vectors and other information generated during coding, and assembles the final compressed bitstream.

Besides entropy coding of DCT coefficients, in a modern block-based motion-compensated video compression system entropy coding is also responsible for coding the motion vectors and other side information. In most cases, however, the transform-coefficient bits account for the bulk of the stream, so most research on video entropy coding focuses on transform-coefficient coding.