Data structure and algorithm - Huffman tree & Huffman coding

preface

In the last article, we learned some knowledge of binary trees based on clues. In this article, we learned about Huffman trees and Huffman coding.

First, let’s look at a classic question,

level	good	good	medium	Pass the	Don’t pass the
Test scores	90 <= score <= 100	80 <= score < 90	70 <= score < 80	60 <= score < 70	< = 0 score < 60

In general, we judge:

So, in a binary tree it looks like this

In general, each class has a certain probability of passing or failing

This will lead to a lot of steps in judging medium and good grades. When there are many grades, there will be efficiency problems.

So how can we optimize to solve this problem? In fact, the predecessors have solved this problem, so let’s look at it

1. Huffman tree

In the grade algorithm above,

The path of node D is 4,

The total path of the tree is 1 + 1 + 2 + 2 + 3 + 3 + 4 + 4 = 20.

The WPL of this tree is: WPL = 1 * 5 + 2 * 15 + 3 * 40 + 4 * 30 + 4 * 10 = 315

Suppose we adjust the order of judging results as follows:

So the path length of node D is 2

The total path of the tree is 1 + 2 + 3 + 3 + 2 + 1 + 2 + 2 = 16

The WPL of this tree is: WPL = 5 * 3 + 3 * 15 + 2 * 40 + 2 * 30 + 2 * 10 = 220

The second case is obviously better than the first case.

So how do we build an optimal tree? Let’s look at building Huffman trees.

Suppose: the weights of A, B, C, D and E are as follows:

A	B	C	D	E
5	15	40	30	10

First, find the two A and E with the minimum weight to form A binary tree:

At this point, the weight of N1 is equivalent to 15. Find the smallest B in turn and build a binary tree:

At this point, the weight of N2 is equal to 30, and the smallest D is successively found to build a binary tree:

At this point, the weight of N3 is equal to 60, and the smallest C is successively found to build the binary tree (the smallest one is in the left subtree position) :

WPL of the binary tree: 40 * 1 + 30 * 2 + 15 * 3 + 10 * 4 + 5 * 4 = 205

The WPL of the binary tree formed by this arrangement is obviously smaller than the above two. The binary tree arranged in this way is called Huffman tree

Summary: Firstly, the weight of the given nodes is sorted to find the two smallest nodes in order to form a binary tree, and the smallest node is located in the left subtreeCopy the code

2. Huffman coding

The hypothesis is as follows:

character	A	B	C	D	E	F
The weight	27	8	15	15	30	5
coding	000	001	010	011	100	101

As shown in the above table, the encoding digits of each character are the same, but the weight of each character is different, which will cause the space waste of characters with low weight

We can use Huffman tree to find the best binary tree arrangement, and then generate Huffman code to solve this problem.

First, generate a Huffman tree for the above characters:

Find B and F with the lowest weights and form a binary tree (smaller ones on the left)

Find C and form a binary tree (smaller ones on the left)

In turn, to find the

The final Huffman tree is as follows:

Then, begin Huffman coding

Huffman coding principle: the weight of the left subtree is replaced by 0, the weight of the right subtree is replaced by 1

Then, from the root node to each leaf node, the code of each leaf node is composed as follows:

character	A	B	C	D	E	F
The weight	27	8	15	15	30	5
coding	01	1001	101	00	11	1000

So the old and new binary encoding for the same string BADCADFEED looks like this:

Old code: 001 000 011 010 000 011 101 100 100 011

New encoding: 1001 01 00 101 01 00 1001 11 11 00

It is obvious that the new encoding (Huffman encoding) has higher storage utilization than the old encoding.

Summary: Mr. Huffman tree starts from the root node, the weight of the left subtree is 0, and the weight of the right subtree is 1. From the root node to the leaf node, the coding of the leaf node is composedCopy the code

3. Huffman tree & Implementation of Huffman coding

The nodes that can define a Huffman tree (order) are as follows:

const int MaxValue = 10000; Const int MaxBit = 4; Const int MaxN = 10; Typedef struct HaffNode{int weight; // weight int flag; // mark whether the node is merged into the Huffman tree 0: unmerged, 1: merged int parent; // Parent node subscript int leftChild; // left child subscript int rightChild; }HaffNode; Typef struct Code {int bit[MaxBit]; // array int start; // Start index int weight; }Code; Void Haffman(int weight[],int n,HaffNode *haffTree){int j, m1, m2, x1, x2; // Huffman tree initializes // n leaf nodes with (2n-1) nodesfor (int i = 0; i < 2*n - 1; i++) {
        if(i < n) { haffTree[i].weight = weight[i]; // assign weight to leaf node}else{ haffTree[i].weight = 0; } haffTree[i].parent = 0; // No parent, subscript 0 haffTree[I]. Flag = 0; HaffTree [I]. LeftChild = -1; // no left or rightChild, give special in -1 haffTree[I]. RightChild = -1; } // Build the Huffman treefor(int i = 0; i < n - 1; i++) { m1 = m2 = MaxValue; x1 = x2 = 0; // Loop to find the two smallest ownership weightsfor (j = 0; j < n + i; j++) {
            if (haffTree[j].weight < m1 && haffTree[j].flag == 0) {
                m2 = m1;
                x2 = x1;
                m1 = haffTree[j].weight;
                x1 = j;
            } else if(haffTree[j].weight < m2 && haffTree[j].flag == 0){ m2 = haffTree[j].weight; x2 = j; HaffTree [x1]. Parent = n + I; haffTree[x2].parent = n + i; haffTree[x1].flag = 1; haffTree[x2].flag = 1; HaffTree [n+ I].weight = haffTree[x1].weight + haffTree[x2].weight; HaffTree [n+ I]. LeftChild = x1; haffTree[n+i].rightChild = x2; Void HaffmanCode(HaffNode haffTree[], int n, Code haffCode[]) {// Create a node Code *cd= (Code *)malloc(sizeof(Code)); int child, parent; // Find the Huffman encoding of n nodesfor(int i = 0; i < n; I++) {// count from 0cd->start = 0; // Get the character encoding the corresponding weightcd->weight = haffTree[i].weight; // The current leaf node is I child = I; Parent = haffTree[I].parent; // From the leaf to the rootwhile(parent ! = 0) {if (haffTree[parent].leftChild == child) {
                cd->bit[cd->start] = 0; // left child node encoding 0}else {
                cd->bit[cd->start] = 1; // Right child node encoding 1} // encoding incrementcd->start++; // Change the child node to the current parent node (look up) child = parent; Parent = haffTree[child]. Parent; } // start = 3 int temp = 0;for (int j = cd->start - 1; j >= 0; j--) {
            temp = cd->start-j-1;
            haffCode[i].bit[temp] = cd->bit[j]; } / /cd// Save the starting bit and weight of haffCode; haffCode[i].start =cd->start; HaffCode [I].weight =cd->weight;
    }
}

// 调用
void show{
        int i, j, n = 4, m = 0;
        
        //权值
        int weight[] = {2,4,5,7};
        
        //初始化哈夫曼树, 哈夫曼编码
        HaffNode *myHaffTree = malloc(sizeof(HaffNode)*2*n-1);
        Code *myHaffCode = malloc(sizeof(Code)*n);
        
        //当前n > MaxN,表示超界. 无法处理.
        if (n>MaxN)
        {
            printf("Defined n out of bounds, modify MaxN!");
            exit(0); } //1. Build Haffman(weight, n, myHaffTree); HaffmanCode(myHaffTree, n, myHaffCode); / / 3.for (i = 0; i<n; i++)
        {
            printf("Weight = %d\n",myHaffCode[i].weight);
            for (j = 0; j<myHaffCode[i].start; j++)
                printf("%d",myHaffCode[i].bit[j]);
            m = m + myHaffCode[i].weight*myHaffCode[i].start;
             printf("\n");
        }
        printf("Huffman's WPS is:%d\n",m);
}


Copy the code

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Data structure and algorithm — Huffman tree & Huffman coding

preface

1. Huffman tree

2. Huffman coding

3. Huffman tree & Implementation of Huffman coding

Data structure and algorithm — Huffman tree & Huffman coding

preface

1. Huffman tree

2. Huffman coding

3. Huffman tree & Implementation of Huffman coding

Related Posts

Introduction to iOS Graphics: OpenGL ES, Metal, Core Graphics, Core Image, GPUImage, OpenCV, etc

IOS reverse (6) dynamic debugging

SimulateIDFA: The next generation iOS device AD tracking solution