Algorithms + Data Structures = Programs
1.1 Why learn data structures and algorithms
Many people think that learning data structures and algorithms is only useful for passing interviews, and that there is little chance to apply them in daily development. Some work through practice problems, only to find that many of them take enormous effort to crack. Others are put off by the very term "data structures and algorithms".
A large part of this comes from not really understanding data structures. Have you ever wondered why big companies require data structures and algorithms, or why candidates pass the technical screen but keep failing the algorithm round?
Learning data structures and algorithms is not just about memorizing the classical structures (arrays, linked lists, queues, stacks, trees, graphs, and so on), nor is it only about preparing for interviews.
In fact, data structures are an important test of how solid your fundamentals are. Most importantly, you should learn a way of thinking: how to translate a real-world problem into an expression in a computer language.
Algorithms + Data Structures = Programs
This article introduces the basic concepts of data structures. After covering the concepts, we will work through related exercises to deepen understanding of them. I hope it is helpful.
This topic will continue to grow; if any figure or description is wrong or unclear, please point it out and I will correct it.
The basic concepts of data structures are complete for now. Related algorithm problems will be added in the algorithms section. Thank you for your attention.
1.2 What are data structures and algorithms
Data structures are the way computers store and organize data. A data structure is a collection of data elements that have one or more specific relationships with each other. More often than not, a carefully chosen data structure can lead to higher running or storage efficiency. Data structures are often associated with efficient retrieval algorithms and indexing techniques.
Data structures include:
 Linear structures: linear lists (arrays, linked lists, queues, stacks, hash tables)
 Tree structures: binary tree, AVL tree, red-black tree, B-tree, heap, Trie, Huffman tree, disjoint set (union-find)
 Graph structures: adjacency matrix, adjacency list
An algorithm is an accurate and complete description of a problem-solving scheme: a finite sequence of clear instructions for solving a problem. An algorithm represents a systematic strategy for solving a class of problems.
Algorithms include: recursion, recurrence, exhaustive search, greedy algorithms, divide and conquer, dynamic programming, iterative methods, branch and bound, and backtracking.
I. Basic concepts of algorithms
The advantages and disadvantages of an algorithm can be measured by space complexity and time complexity.
Features:
 Finiteness: An algorithm must be able to terminate after a finite number of steps are performed.
 Definiteness: Each step of an algorithm must have a definite definition.
 Input: An algorithm has zero or more inputs describing the initial state of its operands. Zero inputs means the algorithm itself determines the initial conditions.
 Outputs: An algorithm has one or more outputs that reflect the results of processing the input data. An algorithm without output is meaningless.
 Effectiveness: Any computational step performed in an algorithm is one that can be decomposed into basic executable operational steps, that is, each computational step can be completed within a finite time (also known as Effectiveness).
1.1 Time Complexity
The time complexity of an algorithm is the computational effort required to run it, i.e., how the number of basic operations grows with the input.
Big O notation
Big O notation is generally used to describe complexity as a function of the data size n.

Ignore constants, coefficients, low order
| Number of executions | Complexity | Informal term |
| --- | --- | --- |
| 12 | O(1) | constant order |
| 2n+3 | O(n) | linear order |
| 4n^2+2n+6 | O(n^2) | quadratic order |
| 4log2n+25 | O(logn) | logarithmic order |
| 3n+2nlog3n+15 | O(nlogn) | nlogn order |
| 4n^3+3n^2+n+12 | O(n^3) | cubic order |
| 2^n | O(2^n) | exponential order |
O(1) < O(logn) < O(n) < O(nlogn) < O(n^2) < O(n^3) < O(2^n) < O(n!) < O(n^n)
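As a quick sanity check, this ordering can be reproduced numerically. A small Python sketch (all names here are illustrative, not from the text):

```python
import math

# Evaluate each growth function at n = 1024 to see the ordering
# O(1) < O(logn) < O(n) < O(nlogn) < O(n^2) < O(n^3)
n = 1024
growth = {
    "O(1)": 1,
    "O(logn)": math.log2(n),
    "O(n)": n,
    "O(nlogn)": n * math.log2(n),
    "O(n^2)": n ** 2,
    "O(n^3)": n ** 3,
}
# Sorting the labels by their value at n reproduces the ordering above
ordered = sorted(growth, key=growth.get)
```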
1.2 Space Complexity
Space complexity measures the temporary storage an algorithm occupies while running, denoted S(n)=O(f(n)). For example, direct insertion sort has O(n^2) time complexity and O(1) space complexity. An ordinary recursive algorithm has O(n) space complexity, because each recursive call stores information needed for its return. The quality of an algorithm is measured mainly by its execution time and its storage space.
II. Data Structure – Linear Structures
A linear list is a finite sequence of n elements of the same type (n>=0)
index: 0 1 2 3 … n-1
a1 >> a2 >> a3 >> a4 >> … >> an
a1 is the first node/element and an is the last node/element
a1 is the predecessor of a2, and a2 is the successor of a1
Common linear lists: arrays, linked lists, queues, stacks, hash tables
2.1 Array
An Array is a linear list that is stored sequentially, with contiguous memory addresses for all elements.
Related algorithm problems
2.2 Linked List
A linked list is a storage structure that is non-sequential on the physical storage units: the logical order of the data elements is realized through the pointers linking the nodes.
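A minimal singly linked list sketch in Python (the class and function names are illustrative, not from the text): each node stores a value and a pointer to the next node, so the logical order comes from the pointers rather than from physical memory layout.

```python
class ListNode:
    """A node holding a value and a pointer to the next node."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

def to_list(head):
    """Walk the pointers and collect values in logical order."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out

# Build a1 -> a2 -> a3
head = ListNode("a1", ListNode("a2", ListNode("a3")))
```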
2.2.1 Unidirectional linked lists
Leetcode – Removes nodes in the linked list
2.2.2 Circular linked lists
Leetcode – Circular linked list
2.2.3 Bidirectional linked lists
Leetcode – Bidirectional linked list
2.3 Stack
A stack is a special kind of linear table that can only be operated on at one end
 The operation of adding an element to the stack is called push
 The operation of removing the top element from the stack is called pop
 The stack follows the Last In First Out (LIFO) principle
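The push/pop behavior above can be sketched in a few lines of Python (an illustrative sketch, not a production implementation):

```python
class Stack:
    """A stack backed by a Python list; operations happen at one end only."""
    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)      # add an element on top

    def pop(self):
        return self._items.pop()   # remove and return the top element

    def is_empty(self):
        return not self._items

s = Stack()
s.push(1); s.push(2); s.push(3)
order = [s.pop(), s.pop(), s.pop()]  # last in, first out
```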
Leetcode – Stack implementation with queues
Leetcode – Implement queues with stacks
2.4 Queue
A queue is a special kind of linear list: insertion is only allowed at the rear of the list and deletion only at the front. Like a stack, a queue is a linear list with restricted operations. The end where elements are inserted is called the rear of the queue, and the end where elements are removed is called the front (head) of the queue.
 Rear: elements can only be added at the rear of the queue, usually called enQueue
 Front (head): elements can only be removed from the front of the queue, usually called deQueue
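A minimal FIFO sketch using Python's standard `collections.deque` (the method names `en_queue`/`de_queue` are illustrative, chosen to mirror the text):

```python
from collections import deque

class Queue:
    """enQueue at the rear, deQueue at the front (FIFO)."""
    def __init__(self):
        self._items = deque()

    def en_queue(self, x):
        self._items.append(x)        # insert at the rear

    def de_queue(self):
        return self._items.popleft() # remove from the front

q = Queue()
q.en_queue("a"); q.en_queue("b"); q.en_queue("c")
order = [q.de_queue(), q.de_queue()]  # first in, first out
```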
Problem set for related algorithms
2.4.1 Double-ended queue
A double-ended queue (deque) is a queue that allows adding and removing elements at both ends.
2.4.2 Circular queue
Think of the storage array as a ring connected end to end. A queue stored this way is called a circular queue: a sequential queue whose storage array is logically treated as a ring, so the end connects back to the beginning.
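The ring idea can be sketched with a fixed-size array plus a front index and a size, wrapping indices modulo the capacity (an illustrative sketch; names are not from the text):

```python
class CircularQueue:
    """Fixed-capacity queue over an array treated as a ring."""
    def __init__(self, capacity):
        self._data = [None] * capacity
        self._front = 0
        self._size = 0

    def en_queue(self, x):
        if self._size == len(self._data):
            raise OverflowError("queue full")
        rear = (self._front + self._size) % len(self._data)  # wrap around
        self._data[rear] = x
        self._size += 1

    def de_queue(self):
        if self._size == 0:
            raise IndexError("queue empty")
        x = self._data[self._front]
        self._front = (self._front + 1) % len(self._data)    # wrap around
        self._size -= 1
        return x

q = CircularQueue(3)
q.en_queue(1); q.en_queue(2); q.en_queue(3)
first = q.de_queue()   # 1 leaves; its slot is free again
q.en_queue(4)          # reuses the freed slot at the start of the ring
remaining = [q.de_queue(), q.de_queue(), q.de_queue()]
```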
2.5 Hash Table
A Hash table (also called a Hash table) is a data structure that is directly accessed based on Key values. That is, it accesses records by mapping key values to a location in the table to speed up lookups. This mapping function is called a hash function, and the array of records is called a hash table.
A hash table is essentially an array in which each element acts as a bucket holding key-value pairs. A value is retrieved from the array by index, and the index is obtained by a fixed hash function that converts the key into an index. No matter how well a hash function is designed, different keys can still produce the same hash value, so hash conflicts must be handled.
Hash function:
A hash function is a function that maps the key values of an element in a hash table to its storage location. In a common sense, it is a function that directly calculates the location of the key in the table.
The main steps of hash function implementation:
 Compute the hash of the key (must be an integer)
 Combine the key's hash with the array size to generate an index
Load factor: total number of nodes (key-value pairs) / length of the hash bucket array
The load factor measures how empty or full the hash table is, and to some extent reflects query efficiency. A larger load factor means the table is fuller, conflicts are more likely, and performance is lower. Therefore, when the load factor exceeds a constant (typically 0.75), the hash table automatically expands, usually by creating an array twice the original size. Even though the hash of a key does not change, taking it modulo the new array length gives a different result, so key-value pairs may move to new positions. This is called rehashing.
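The load-factor-driven expansion can be sketched as follows. This is an illustrative Python sketch, not any particular library's code; the 0.75 threshold and doubling match the text, and the class/method names are assumptions:

```python
class TinyMap:
    """A toy chained hash map that doubles when load factor exceeds 0.75."""
    LOAD_FACTOR = 0.75

    def __init__(self, capacity=4):
        self._buckets = [[] for _ in range(capacity)]
        self._count = 0

    def _index(self, key, capacity):
        return hash(key) % capacity       # hash combined with array size

    def put(self, key, value):
        if (self._count + 1) / len(self._buckets) > self.LOAD_FACTOR:
            self._resize()
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])
        self._count += 1

    def get(self, key):
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for k, v in bucket:
            if k == key:
                return v
        return None

    def _resize(self):
        # Rehash: same hash values, new modulus, so positions may change
        old = self._buckets
        self._buckets = [[] for _ in range(len(old) * 2)]
        for bucket in old:
            for k, v in bucket:
                self._buckets[self._index(k, len(self._buckets))].append([k, v])

m = TinyMap()
for i in range(10):
    m.put(i, i * i)   # forces at least one expansion past capacity 4
```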
2.5.1 Construction method of hash function
1. Direct addressing method:
Take the keyword, or a linear function of the keyword, as the hash address: H(key) = key or H(key) = a·key + b, where a and b are constants (such hash functions are called self-functions). If H(key) is already occupied, move to the next address until a free one is found, then store the item there.
Example: H(key) = a·key + b with a = 1/10, b = 5

| Key | Hash(Key) |
| --- | --- |
| 1630 | 168 |
| 80 | 13 |
| 4780 | 483 |
| 1820 | 187 |
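The table above can be reproduced with a tiny sketch. Integer arithmetic (`key // 10 + 5`) stands in for a = 1/10, b = 5 to avoid floating-point rounding; the function name is illustrative:

```python
def direct_hash(key):
    """Direct addressing: H(key) = key/10 + 5, truncated to an integer."""
    return key // 10 + 5

addresses = {k: direct_hash(k) for k in (1630, 80, 4780, 1820)}
```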
2. Numerical analysis method:
Analyze the data set first. For example, in a set of employee birth dates the leading digits (the years) are often the same, so using them would make conflicts very likely; but the trailing digits (month and day) differ a great deal, and using them as the hash address significantly reduces the chance of conflict. Numerical analysis means finding such regularities in the data and using the digits with the lowest probability of conflict to construct the hash address.
3. Mid-square method:
When it is impossible to determine which digits of the keyword are evenly distributed, compute the square of the keyword first and take the middle digits of the square as the hash address. The middle digits of the square depend on every digit of the keyword, so different keywords will very likely produce different hash addresses.
Example: We use the alphabetic position of an English letter as the internal code for that letter. For example, the internal code of K is 11, E is 05, Y is 25, A is 01, and B is 02. Thus, the internal code of the keyword “KEYA” is 11052501. Similarly, we can get the internal code of the keywords “KYAB”, “AKEY” and “BKEY”. After the keyword is squared, bits 7 to 9 are taken out as the hash address of the keyword, as shown in the following table:
| Keyword | Internal code | Square of the internal code | Hash address H(k) |
| --- | --- | --- | --- |
| KEYA | 11052501 | 122157778355001 | 778 |
| KYAB | 11250102 | 126564795010404 | 795 |
| AKEY | 01110525 | 001233265775625 | 265 |
| BKEY | 02110525 | 004454315775625 | 315 |
4. Folding method:
Divide the keyword into parts with the same number of digits (the last part may be shorter), then take the sum of these parts (discarding any carry) as the hash address. There are two kinds of folding: shift folding and boundary folding. In shift folding, the low-order digits of each part are aligned and the parts are added; in boundary folding, the parts are folded back and forth along the split boundaries, then aligned and added.
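A sketch of shift folding (illustrative assumptions: 3-digit parts and a 3-digit table address; the function name is hypothetical):

```python
def fold_hash(key, part_len=3, table_digits=3):
    """Shift folding: split digits into parts, sum them, drop the carry."""
    s = str(key)
    parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
    return sum(parts) % (10 ** table_digits)  # modulo discards the carry

# 12345678 -> parts 123, 456, 78 -> 123 + 456 + 78 = 657
address = fold_hash(12345678)
```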
5. Random number method:
Choose a random function and take the random value of the keyword as the hash address: H(key) = random(key), where random is a random function. This is usually used when keyword lengths differ.
6. Division and remainder method:
Take the keyword modulo a number p not larger than the hash table length m as the hash address: H(key) = key MOD p, p <= m. The modulo can be applied to the key directly, or after folding or mid-square operations. The choice of p is critical; it is usually a prime number (or m), and a poor choice of p easily produces synonyms (conflicts).
Example:
H = key % p (p <= m)
In practice it is not always that simple: the key may go through other calculations before the modulo is taken.
No matter how well a hash function is designed, it is possible for different keys to get the same hash value after being hashed, which requires handling hash conflicts.
2.5.2 Hashing Conflict Resolution
1. Separate chaining (the "zipper" method):
A combination of linked lists and arrays.
A linked list is stored at each bucket position in the hash table; when duplicate hash values occur, the new element is linked via the current element's next pointer.
In JDK 1.8, a singly linked list is used by default to chain elements. When adding elements, the singly linked list is converted to a red-black tree once the table capacity is at least 64 and the number of nodes in a bucket's list exceeds 8.
When the number of nodes in such a red-black tree becomes small enough (6), it is converted back into a singly linked list.
2. Rehashing (multiple hashing):
Design two or more hash functions; when one causes a conflict, use the next. Conflicts become less likely with each additional well-designed hash function, though they can never be ruled out entirely.
3. Open addressing method:
Open addressing uses the formula: Hi = (H(key) + di) MOD m, i = 1, 2, …, k (k <= m-1)
where m is the length of the hash table and di is the increment sequence used when a conflict occurs.

Linear probing
di takes the values 1, 2, 3, …, m-1
Probe cell by cell until an empty bucket is found, then store the data there
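Linear probing can be sketched as follows (an illustrative sketch using `key % m` as the base hash; the function name is not from the text):

```python
def insert_linear_probe(table, key):
    """Open addressing with linear probing: try h, h+1, h+2, ... (mod m)."""
    m = len(table)
    h = key % m
    for i in range(m):
        idx = (h + i) % m          # d_i = 0, 1, 2, ..., m-1
        if table[idx] is None:     # empty bucket found
            table[idx] = key
            return idx
    raise OverflowError("table full")

table = [None] * 7
# 7, 14, 21 all hash to index 0, so each collides with the previous one
slots = [insert_linear_probe(table, k) for k in (7, 14, 21)]
```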

Quadratic probing
di = a·i^2 (i <= m/2), where m is the table length and a is a constant
Probe the cell at interval i^2; if it is empty, store the element there

Pseudo-random probing
di = random(key)
The probe interval is a pseudo-random number.
2.5.3 Related algorithms
Leetcode – 1. Two Sum
Leetcode – 3. Longest Substring Without Repeating Characters
III. Data Structure – Tree Structure
A tree structure is a hierarchy of nested structures. The outer and inner layers of a tree structure have similar structures, so the structure can be recursively represented. The various tree graphs in classical data structures are typical tree structures: a tree can be simply represented as a root, a left subtree, and a right subtree. The left and right subtrees have their own subtrees. – wikipedia
The basic concept

Kinds of trees:
 Ordered tree: there is a defined order among the children of any node in the tree
 Unordered tree: there is no order among the children of any node in the tree
 Multi-way tree / N-ary tree: each node of the tree may have more than two child nodes
 Binary tree
 Full binary tree
 Complete binary tree
 Huffman tree

Node: Each element in the tree is called a node

Empty tree: A tree without any nodes is called empty

Root node: A node that has no parent (also called a tree with a single root node)

Parent node: A node that has at least one child node directly subordinate to it.

Child node: the node at the next level from the parent node

Sibling: Children of the same parent are siblings of each other

Subtree: A tree consisting of a child node and all its descendants

 Left subtree: the tree consisting of a node's left child and all of that child's descendants

 Right subtree: the tree consisting of a node's right child and all of that child's descendants

Degree of nodes: the number of subtrees

Tree degree: The maximum degree of all nodes is the tree degree

Leaf node: node whose degree is 0

Nonleaf node: a node whose degree is not 0

Level of a node: the root is on level 1 (or level 0), its children on level 2, and so on

Node depth: the total number of nodes on the path from the root node to the current node

Height: The total number of nodes along the path from the current node to the farthest leaf node

Tree depth: the maximum depth of all nodes

Tree height: the height of the root, i.e. the total number of nodes on the path from the root to the farthest leaf node

Forest: a collection of m(m>=0) trees that do not intersect each other is called a forest.
3.1 Binary Tree
A binary tree is an ordered tree in which each node has degree at most 2. It is the simplest and most important kind of tree. Recursively, a binary tree is either an empty tree, or a non-empty tree consisting of a root node and two disjoint binary trees, called the left and right subtrees of the root.
Features of binary trees:
 Each node has at most 2 subtrees (degree at most 2)
 All subtrees are ordered (left and right are distinguished)
 Even a node with only one subtree must distinguish whether it is the left or the right subtree
Properties of binary trees:

There are at most 2^(i-1) (i >= 1) nodes on the i-th level of a binary tree

A binary tree of depth h contains at most 2^h - 1 nodes

If in any binary tree, there are n0 leaf nodes and n2 nodes of degree 2, then there must be n0 = n2+1

A complete binary tree with n nodes has depth floor(log2 n) + 1, where floor(x) is the largest integer not greater than x

If the nodes of a complete binary tree with n nodes are numbered in level order (1 <= i <= n), then for the node numbered i (i >= 1):
 If i = 1, the node is the root and has no parent; if i > 1, its parent is numbered floor(i/2)
 If 2i <= n, its left child is numbered 2i; otherwise it has no left child
 If 2i+1 <= n, its right child is numbered 2i+1; otherwise it has no right child
Proper ("true") binary tree: a binary tree containing only nodes of degree 0 and degree 2 (no node of degree 1).
Full binary tree: a binary tree containing only nodes of degree 0 and degree 2, in which all degree-0 nodes are on the last level. A full binary tree is also a proper binary tree.
Complete binary tree: a binary tree of depth k with n nodes whose nodes correspond one-to-one with the nodes numbered 1 to n of the full binary tree of depth k. Leaf nodes appear only on the last two levels, the leaves on the last level are aligned to the left, and the levels above the last form a full binary tree.
Properties of complete binary trees:

 A node of degree 1 can only have a left subtree
 There is at most one node of degree 1 (either 1 or 0 such nodes)
 Among binary trees with the same number of nodes, the complete binary tree has the smallest height (an ordinary binary tree with 15 nodes may have 5 levels, but a complete binary tree with 15 nodes has exactly 4)

Assume the height of a complete binary tree is h (h >= 1). Then it has:
 at least 2^(h-1) nodes (2^0 + 2^1 + … + 2^(h-2) + 1 = 2^(h-1))
 at most 2^h - 1 nodes (2^0 + 2^1 + … + 2^(h-1) = 2^h - 1, a full binary tree)
Suppose the total number of nodes in the complete binary tree is n.

Number of leaf nodes
Let the number of leaf nodes (degree 0) be n0, the number of degree-1 nodes be n1, and the number of degree-2 nodes be n2; then n = n0 + n1 + n2.
In any binary tree, n0 = n2 + 1, so n2 = n0 - 1 and n = n0 + n1 + n0 - 1 = 2n0 + n1 - 1.
By the complete binary tree property, the number of degree-1 nodes is either 1 or 0:
 If n1 = 1, then n is even, n = 2n0, so n0 = n/2
 If n1 = 0, then n is odd, n = 2n0 - 1, so n0 = (n+1)/2
Combining the odd and even cases: n0 = floor((n+1)/2) (round down); in code: leaves = (n+1) >> 1
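The formula n0 = (n+1) >> 1 can be checked by brute force (an illustrative sketch; with 1-based level-order numbering, node i is a leaf exactly when its would-be left child 2i exceeds n):

```python
def leaf_count(n):
    """Count leaves of a complete binary tree with n nodes by brute force."""
    return sum(1 for i in range(1, n + 1) if 2 * i > n)

# Compare against the closed-form formula for many sizes
checks = all(leaf_count(n) == (n + 1) >> 1 for n in range(1, 200))
```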

Number of non-leaf nodes
The number of non-leaf nodes is n1 + n2. From the leaf count above:
 If n1 = 1, n is even and n1 + n2 = n - n/2 = n/2
 If n1 = 0, n is odd and n1 + n2 = n - (n+1)/2 = (n-1)/2
Combining both cases: n1 + n2 = floor(n/2) (round down)

The height of a complete binary tree
A complete binary tree of height h has at least 2^(h-1) nodes and at most 2^h - 1, so:
2^(h-1) <= n < 2^h
h - 1 <= log2 n < h
e.g. if log2 n = 5.5, then h - 1 <= 5.5 < h, so h = 6
Summary formula: h = floor(log2 n) + 1 (round down)


A complete binary tree with n nodes (n > 0) is numbered from top to bottom, left to right, starting from 0 or from 1. For any node i:

| | Numbering starts at 0 | Numbering starts at 1 |
| --- | --- | --- |
| i equals the start | it is the root node | it is the root node |
| parent | floor((i-1)/2) | floor(i/2) |
| left child | 2i+1, if 2i+1 <= n-1; otherwise none | 2i, if 2i <= n; otherwise none |
| right child | 2i+2, if 2i+2 <= n-1; otherwise none | 2i+1, if 2i+1 <= n; otherwise none |

Note that when numbering starts from 0 the formulas shift accordingly.
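The table above can be sketched and cross-checked in Python (function names are illustrative; the two numbering schemes describe the same tree shifted by one):

```python
def parent(i, start):
    """Parent index under 0-based or 1-based level-order numbering."""
    return (i - 1) // 2 if start == 0 else i // 2

def left(i, start):
    return 2 * i + 1 if start == 0 else 2 * i

def right(i, start):
    return 2 * i + 2 if start == 0 else 2 * i + 1

# Node i in 0-based numbering is node i+1 in 1-based numbering,
# so their parents must agree after shifting by one.
same_parent = all(parent(i, 0) == parent(i + 1, 1) - 1 for i in range(1, 50))
```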

3.2 Binary Search Tree
Binary search tree (binary search tree, binary sort tree) it is either an empty tree, or a binary tree with the following properties:
 If its left subtree is not empty, the value of any node in the left subtree is less than the value of its root node.
 If its right subtree is not empty, the value of any node in the right subtree is greater than the value of its root node.
 Its left and right subtrees are binary search trees respectively.
As a classical data structure, binary search tree has the advantages of quick insertion and deletion of linked list and quick search of array.
It is widely used in file systems and database systems for efficient sorting and retrieval operations.
Binary search tree elements must be comparable (e.g. int, double); for a custom type, a comparison method must be specified.
For example, dynamically searching data in an ordered collection such as [3, 4, 6, 7, 8, 9, 10, 11, 14, 16, 18]:
 On a sorted array, binary search takes O(logn) in the worst case, but adding and removing take O(n).
 A balanced binary search tree can search, add, and delete in O(logn); a degenerate tree can require O(n).
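A minimal BST sketch (illustrative names, not the article's code): values smaller than a node go left, larger go right, so a search follows a single root-to-leaf path.

```python
class BSTNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def bst_insert(root, val):
    """Insert val, keeping the left < node < right ordering."""
    if root is None:
        return BSTNode(val)
    if val < root.val:
        root.left = bst_insert(root.left, val)
    elif val > root.val:
        root.right = bst_insert(root.right, val)
    return root

def bst_contains(root, val):
    """Walk one path down the tree, halving the search space each step."""
    while root is not None:
        if val == root.val:
            return True
        root = root.left if val < root.val else root.right
    return False

root = None
for v in [8, 4, 13, 2, 6, 10, 16]:
    root = bst_insert(root, v)
```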
3.2.1 Preorder traversal
Root node – left subtree – right subtree
Preorder traversal – recursive and iterative method implementation
3.2.2 In-order traversal
Left subtree – root node – right subtree
In order traversal – recursive and iterative method implementation
3.2.3 Postorder traversal
Left subtree – right subtree – root node
Post – order traversal – recursive and iterative method implementation
3.2.4 Level-order traversal
Visit each node once, from top to bottom and left to right
Level-order traversal – recursive and iterative method implementation
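The four traversal orders can be sketched side by side (a self-contained illustrative sketch; the short class and function names are assumptions):

```python
from collections import deque

class T:
    """A tiny binary tree node: value, left child, right child."""
    def __init__(self, v, l=None, r=None):
        self.v, self.l, self.r = v, l, r

def preorder(n):   # root - left - right
    return [n.v] + preorder(n.l) + preorder(n.r) if n else []

def inorder(n):    # left - root - right
    return inorder(n.l) + [n.v] + inorder(n.r) if n else []

def postorder(n):  # left - right - root
    return postorder(n.l) + postorder(n.r) + [n.v] if n else []

def levelorder(n): # top to bottom, left to right, using a queue
    out, q = [], deque([n])
    while q:
        cur = q.popleft()
        if cur:
            out.append(cur.v)
            q.append(cur.l)
            q.append(cur.r)
    return out

#        1
#       / \
#      2   3
#     / \
#    4   5
t = T(1, T(2, T(4), T(5)), T(3))
```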
3.2.5 Binary search tree correlation algorithm
The height of the binary tree
Leetcode – 94. Binary Tree Inorder Traversal
3.3 AVL Tree
In computer science, the AVL tree was the first self-balancing binary search tree invented. In an AVL tree, the heights of the two subtrees of any node differ by at most 1, so it is also called a height-balanced tree. Additions and deletions may require one or more tree rotations to rebalance the tree.
The AVL tree takes its name from its inventors G. M. Adelson-Velsky and E. M. Landis, who published it in their 1962 paper "An algorithm for the organization of information".
3.3.1 Basic concepts and features
The basic concept
 Balance factor: the height difference between a node's left and right subtrees
Characteristics of AVL trees

Itself is first and foremost a binary search tree.

Balance condition: the balance factor (the height difference between the left and right subtrees) of every node can only be -1, 0, or 1; its absolute value is at most 1.

If the absolute value of any node's balance factor is greater than 1, the tree is out of balance.

Search, add, delete time complexity is O(logn)
Examples of imbalances:
The tree on the left is a balanced binary search tree. Adding a 14 under node 13
causes both node 15 and its parent 18 to become unbalanced; in the worst case, all of the ancestor nodes become unbalanced.
3.3.2 Restore balance: LL Right rotation
3.3.3 Restore balance: RR Left rotation
3.3.4 Restoring balance: LR First left and then right
3.3.5 Restoring balance: RL First right then left
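The four rebalancing cases above all rely on tree rotations. A minimal sketch of the LL case, a single right rotation, in Python (illustrative, not the article's code): grandparent g with left child p and left-left grandchild n; rotating right at g makes p the new subtree root.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def rotate_right(g):
    """LL case: rotate right around g; returns the new subtree root."""
    p = g.left
    g.left = p.right   # p's right subtree becomes g's left subtree
    p.right = g        # g becomes p's right child
    return p

# Unbalanced LL chain: 3 <- 2 <- 1
g = Node(3, Node(2, Node(1)))
root = rotate_right(g)   # 2 becomes the root, 1 and 3 its children
```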
3.3.6 AVL summary
Add
 All ancestor nodes may become unbalanced
 As long as the lowest unbalanced node is rebalanced, the whole tree returns to balance, in O(1) adjustments
Delete
 The parent or ancestor nodes may become unbalanced (only one node at a time)
 After rebalancing, higher-level ancestors may become unbalanced in turn, so up to O(logn) adjustments are needed, checking upward as far as the root
Average complexity
 Search: O(logn)
 Add: O(logn), requiring only O(1) rotation adjustments
 Delete: O(logn), requiring at most O(logn) adjustments
3.3.7 AVL related algorithms
Problem set for related algorithms
3.4 B Tree
3.4.1 Basic Concepts
To find a given keyword in a B-tree, start from the root node, which contains the keywords K1, …, Kn. Search among them (sequential or binary search may be used). If a keyword equal to the given value is found, the search succeeds; otherwise, determine that the target lies between Ki and Ki+1, take the pointer Pi to the corresponding child node, and continue searching there. The process repeats until the keyword is found, or fails when the pointer Pi is empty.
B tree is a balanced multipath search tree, which is used for file system and database implementation
 A node can store more than two elements and can have more than two child nodes
 It has some properties of binary search trees
 Balanced: all subtrees of each node have the same height
 Compared with a binary search tree, it is relatively shallow
B-tree properties (m is the order, m >= 2; x is the number of elements stored in a node):
 Number of elements x:
 Root node: 1 <= x <= m-1
 Non-root node: ceil(m/2) - 1 <= x <= m-1 (ceil(): rounded up)
 If a node has children, the number of children is y = x + 1:
 Root node: 2 <= y <= m
 Non-root node: ceil(m/2) <= y <= m
 If m = 3, then 2 <= y <= 3, so it is also called a (2,3)-tree or 2-3 tree
 If m = 4, then 2 <= y <= 4, so it is also called a (2,4)-tree or 2-3-4 tree
 …
B-trees and binary search trees are logically equivalent:
 A "super node" can be obtained by merging several generations of nodes
 Merging 2 generations gives a super node with at most 4 children (at least a 4th-order B-tree)
 Merging 3 generations gives a super node with at most 8 children (at least an 8th-order B-tree)
 Merging n generations gives a super node with at most 2^n children (at least a B-tree of order 2^n)
An m-order B-tree corresponds to merging at most log2 m generations
3.4.2 Add – Overflow
Properties:
 A new element must be added to a leaf node
 A node overflows when its element count reaches m (the order)
 Let k be the index of the middle element of the overflowed node
 Merge the element at position k upward into the parent node
 Split the elements at [0, k-1] and [k+1, m-1] into 2 child nodes
 After a split, the parent node may overflow in turn; in the worst case, the above procedure repeats up to the root node
3.4.3 Delete – Underflow

If you need to delete a leaf node, just delete it

If you want to delete the element in the nonleaf node
 Find its predecessor or successor node and overwrite the element to be deleted with it
 Then delete that predecessor or successor node

 Delete underflow – borrow an element from a sibling node
 An underflowed node has exactly ceil(m/2) - 2 elements
 If an adjacent sibling of the underflowed node has at least ceil(m/2) elements, one element can be borrowed:
 Insert the corresponding element of the parent node into the minimum position of the underflowed node
 Replace that element of the parent node with an element from the sibling node
 This is effectively a rotation

 Delete underflow – merge with a sibling node
 If the adjacent sibling nodes have only ceil(m/2) - 1 elements (nothing to borrow):
 Remove the corresponding element from the parent node and merge it with the left and right child nodes
 This operation can cause underflow to propagate all the way up
3.4.4 Animation presentation recommended
To better understand B-trees, a visualization website is recommended: www.cs.usfca.edu/~galles/vis…
3.4.5 Btree correlation algorithm
Problem set for related algorithms
3.5 Red-Black Tree
A red-black tree is a self-balancing binary search tree, a data structure used in computer science, typically to implement associative arrays. It was invented in 1972 by Rudolf Bayer, who called it a "symmetric binary B-tree"; its modern name comes from a 1978 paper by Leo J. Guibas and Robert Sedgewick. Red-black trees have a complex structure, but their operations have good worst-case running time and are efficient in practice: lookup, insert, and delete all run in O(logn) time, where n is the number of elements in the tree.
A red-black tree is equivalent to a 2-3-4 tree. In other words, for every 2-3-4 tree there exists at least one red-black tree whose data elements are in the same order.
The discussion of red-black trees below uses some B-tree concepts. If you have not read the B-tree section, scroll up first to review basic B-tree operations such as adding and deleting.
3.5.1 Properties of redblack trees
A red-black tree is a binary search tree in which each node has a color attribute, either red or black. In addition to the general requirements of a binary search tree, any valid red-black tree must satisfy the following:

Nodes are red or black

The root node must be black

All the leaf nodes are black. (Leaves are null nodes)

Both children of every red node are black.
 The parent of a red node must be black
 There cannot be two consecutive red nodes on any path from a leaf node to the root

All paths from any given node down to each of its leaf nodes contain the same number of black nodes.
All we need to do to maintain a redblack tree is satisfy these five properties.
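The two properties that drive all the maintenance work can be checked mechanically. An illustrative Python sketch (the node colors and helper names below are assumptions, not from the text): it verifies that no red node has a red child and that every root-to-nil path carries the same number of black nodes.

```python
RED, BLACK = "red", "black"

class RBNode:
    def __init__(self, val, color, left=None, right=None):
        self.val, self.color = val, color
        self.left, self.right = left, right

def check(node):
    """Return the black-height if the subtree satisfies the properties, else -1."""
    if node is None:
        return 1                       # nil leaves count as black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1              # two consecutive red nodes
    lh, rh = check(node.left), check(node.right)
    if lh == -1 or rh == -1 or lh != rh:
        return -1                      # unequal black counts on some path
    return lh + (1 if node.color == BLACK else 0)

# A valid tree: black root, red children, black grandchildren
valid = RBNode(7, BLACK,
               RBNode(3, RED, RBNode(1, BLACK), RBNode(5, BLACK)),
               RBNode(18, RED, RBNode(10, BLACK), RBNode(22, BLACK)))

# Invalid: red 3 has a red child 1 (two consecutive reds)
invalid = RBNode(7, BLACK,
                 RBNode(3, RED, RBNode(1, RED)),
                 RBNode(18, BLACK))
```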
In fact, redblack trees can also be converted to twothreefour trees, that is, fourthorder Btrees. The following operations, such as adding and deleting, will be clearer if we combine them with twothreefour trees to understand them
3.5.2 Add – Nature Maintenance
By the nature of B-trees, a new element is always added to a leaf node, and a node of a 4th-order B-tree holds 1 to 3 elements (deleting from a non-leaf node is done by overwriting it with its predecessor or successor node and then deleting that node).
We first add the node as in an ordinary binary search tree and color it red. (If it were colored black, one root-to-leaf path would gain an extra black node, which is difficult to adjust; coloring it red may produce two consecutive red nodes, which can be fixed with color flips and tree rotations.) What to do next depends on the colors of the neighboring nodes. As in a human family tree, we use the term uncle node for a sibling of a node's parent. Note:
 Properties one and three are always the same.
 Property 4 is only threatened when adding red nodes, redrawing black nodes to red, or doing rotations.
 Property 5 is only threatened when adding black nodes, repainting red nodes to black, or doing rotations.
 Color the root node black if added
The addition of redblack trees is the following13A position I will base onColor order
Before this, we need to understand the network between nodes of a binary tree, as shown in the figure below:
Pay attention to
All new nodes default to red
func afterAdd(_ node: Any) {
    guard let n = node as? TreeNode<Any> else { return } // Return if node is not a tree node
    insert_case1(n)
}
Case 1:
1. Assumption: the new node is the root of the tree and has no parent.
2. Violated property:
 Property 2: the root node must be black.
Solution: color the node black.

func insert_case1(_ node: TreeNode<Any>) {
    if node.parent == nil {
        dyed_black(node)
    } else {
        insert_case2(node)
    }
}
Case 2:
1. Assumption: the parent of the new node is black (the new node itself is red), so all red-black tree properties still hold.
2. Violated property:
 None
Solution: nothing needs to be done.

func insert_case2(_ node: TreeNode<Any>) {
    if isBlack(node.parent) {
        return
    } else {
        insert_case3(node)
    }
}
Case 3:
1. Assumption: the parent and uncle of the new node are both red (the newly inserted node is red).
The example below adds the new node 10.
2. Violated property:
 Property 4: both children of every red node must be black.
 The parent of a red node must be black.
 No root-to-leaf path may contain two consecutive red nodes.
Solution:
Color the parent and uncle black and the grandparent red.
In the equivalent B-tree, this insertion overflows a node, so the middle element (grand) must be pushed up and the node split.
That may in turn leave the grandparent violating the red-black properties, so treat grand as if it were a newly added node and run the procedure again.
If grand is the root, dyed_red(node.grand) temporarily colors the root red, and insert_case1 will then color it black again.

func insert_case3(_ node: TreeNode<Any>) {
    // The parent is known to be red here
    if isRed(node.uncle) { // Parent and uncle are both red
        dyed_black(node.parent)
        dyed_black(node.uncle)
        afterAdd(dyed_red(node.grand)) // Treat the recolored grandparent as a new node
    } else { // Uncle is black or missing
        insert_case4(node)
    }
}
Case 4:
1. Assumptions: the parent is red and the uncle is black or missing, and either
 node is the left child of its parent and the parent is the right child of grand, or
 node is the right child of its parent and the parent is the left child of grand.
2. Violated property:
 Property 4: both children of every red node must be black.
 The parent of a red node must be black.
 No root-to-leaf path may contain two consecutive red nodes.
Solution:
If node is the left child of parent and parent is the right child of grand, rotate parent to the right.
If node is the right child of parent and parent is the left child of grand, rotate parent to the left.
This reduces the situation to case 5, which is handled next.
Note: after the rotation, the node to continue with has changed (it becomes the old parent).

func insert_case4(_ node: TreeNode<Any>) {
    // Uncle is black or missing
    var tempNode = node
    if isRightChild(node) && isLeftChild(node.parent) { // LR
        rotate_left(node.parent)
        tempNode = node.left! // The old parent, now node's left child
    } else if isLeftChild(node) && isRightChild(node.parent) { // RL
        rotate_right(node.parent)
        tempNode = node.right! // The old parent, now node's right child
    }
    // Note: the node to continue with has changed for case 5
    insert_case5(tempNode)
}
Case 5:
1. Assumptions: the parent is red and the uncle is black or missing, and either
 node is the left child of its parent and the parent is the left child of grand (LL), or
 node is the right child of its parent and the parent is the right child of grand (RR).
Case 4 reduces to exactly this situation after its rotation.
2. Violated property:
 Property 4: both children of every red node must be black.
 The parent of a red node must be black.
 No root-to-leaf path may contain two consecutive red nodes.
Solution:
Color the parent black and the grandparent red.
If RR (node is the right child of parent and parent is the right child of grand), rotate grand to the left.
If LL (node is the left child of parent and parent is the left child of grand), rotate grand to the right.

func insert_case5(_ node: TreeNode<Any>) {
    // Uncle is black or missing
    dyed_black(node.parent)
    dyed_red(node.grand)
    if isLeftChild(node) && isLeftChild(node.parent) { // LL
        rotate_right(node.grand)
    } else if isRightChild(node) && isRightChild(node.parent) { // RR
        rotate_left(node.grand)
    }
}
3.5.3 Delete – Property Maintenance
Deleting a node must also preserve the five properties of red-black trees.
In a 2-3-4 tree (4th-order B-tree), deletion always ends up at a leaf node (to delete a non-leaf node, we find its predecessor or successor, substitute it, and delete that instead). Thought of this way, deleting from a red-black tree likewise always happens at a leaf node.
Note: afterRemove has been adjusted for nodes of degree 1 in the binary tree.
If afterRemove is reached from a non-leaf node, the node passed in is the predecessor or successor of the deleted node.

func remove(_ n: BSNode<Any>?) {
    guard var node = n else { return } // Return if node is empty
    size -= 1
    if node.hasTwoChild() {
        let succeeding = successor(node) // Successor node
        node.value = succeeding.value
        node = succeeding
    }
    let replacement = node.left != nil ? node.left : node.right
    if replacement != nil { // Node has degree 1 (it can only appear in the bottom two levels)
        replacement?.parent = node.parent
        // ... code
        afterRemove(replacement!)
    } else if node.parent == nil { // Node is a leaf and the root
        // ... code
        afterRemove(node) // Hook for other tree types to process
    } else { // Node is a leaf but not the root
        // ... code
        afterRemove(node)
    }
}

/// RBTree: handle all cases after deletion
func afterRemove(_ node: Any) {
    guard let n = node as? TreeNode<Any> else { return } // Return if node is not a tree node
    delete_case1(n) // Start the red-black case analysis
}
Case 1:
1. Assumption: the node passed in for the replacement is red (this covers deleting a red node, and deleting a black node whose replacement child is red).
2. Violated property:
 None
Solution: simply color the replacement black.
For example, if 25 is deleted, it is replaced by its predecessor or successor (17 or 33); both of those nodes are red, so coloring the replacement black is enough.

func delete_case1(_ node: TreeNode<Any>) {
    if isRed(node) {
        dyed_black(node)
        return
    } else {
        delete_case2(node)
    }
}
Case 2:
1. Assumption: the deleted node is the root.
2. Violated property:
 None
Solution: if the node is the root, the tree consisted only of the root, so nothing more needs to be done.

func delete_case2(_ node: TreeNode<Any>) {
    if node.parent == nil {
        return
    } else {
        delete_case3(node)
    }
}
Case 3:
1. Assumptions:
 The deleted node is black.
 The deleted node was a right child.
 The sibling of the deleted node is red.
2. Violated property:
 Property 5: all paths from any node to each leaf contain the same number of black nodes.
Solution:
First color the sibling black and the parent red.
Then rotate the parent to the right (in the figure, 88 becomes a child of 55 and 76 becomes the left child of 88).
This reduces the situation to case 4 or case 5, which is handled next.

// The deleted node is black && was a right child && its sibling is red
func delete_case3(_ node: TreeNode<Any>) {
    // The node has already been detached, so sibling cannot be read from it directly.
    // Determine whether the deleted node was the left or the right child:
    // if parent.right == nil, the deleted node was the right child.
    let isRight = node.parent?.right == nil || isRightChild(node)
    // Sibling node
    var sibling = isRight ? node.parent?.left : node.parent?.right
    if isRight { // The deleted node was on the right
        if isRed(sibling) { // The sibling is red
            dyed_black(sibling)
            dyed_red(node.parent)
            rotate_right(node.parent)
            sibling = node.parent?.left
        }
        delete_case4(node, sibling!)
    } else { // The deleted node was on the left: handle the mirrored cases
        delete_case6(node, sibling!)
    }
}
Case 4:
1. Assumptions:
 The deleted node is black.
 The deleted node was a right child.
 The sibling of the deleted node is black.
 The children of the sibling are black (or missing).
2. Violated property:
 Property 5: all paths from any node to each leaf contain the same number of black nodes.
Solution:
Color the parent black and the sibling red.
If the parent was originally black, this makes the parent underflow in the equivalent B-tree; in that case, treat the parent as a newly deleted node and repeat.

func delete_case4(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    let sibling = s
    // Both children of the sibling are black, or the sibling has no children
    if isBlack(sibling.left) && isBlack(sibling.right) {
        let parentBlack = isBlack(node.parent)
        dyed_black(node.parent)
        dyed_red(sibling)
        if parentBlack {
            afterRemove(node.parent!) // The parent underflows: treat it as newly deleted
        }
    } else { // The sibling has at least one red child
        delete_case5(node, sibling)
    }
}
Case 5:
1. Assumptions:
 The deleted node is black.
 The deleted node was a right child.
 The sibling of the deleted node is black.
 The sibling has at least one red child.
2. Violated property:
 Property 5: all paths from any node to each leaf contain the same number of black nodes.
Solution:
Think of this in terms of the equivalent 2-3-4 tree: the fix mirrors how a B-tree repairs an underflow by borrowing from a sibling.
Example 1: after the underflow, perform a single right rotation; the node rotated into the parent's position inherits the parent's color.
Example 2: after the underflow, perform a left rotation and then a right rotation; the node rotated into the parent's position inherits the parent's color.
Example 3: this situation can be handled like either example 1 or example 2; since example 1 needs only one rotation, we handle it the same way as example 1.

func delete_case5(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    var sibling = s
    if isBlack(sibling.left) { // The red child is on the sibling's right: rotate left first
        rotate_left(sibling)
        sibling = (node.parent?.left)!
    }
    // The node rotated into the parent's position inherits the parent's color
    if node.parent?.color == RED {
        dyed_red(sibling)
    } else {
        dyed_black(sibling)
    }
    dyed_black(sibling.left)
    dyed_black(node.parent)
    rotate_right(node.parent)
}
Case 6:
1. Assumptions:
 The deleted node is black.
 The deleted node was a left child.
 The sibling of the deleted node is red.
The only difference from case 3 is that a left child rather than a right child was deleted; the rotations and the sibling lookup are mirrored, so only the code is given here.
2. Violated property:
 Property 5: all paths from any node to each leaf contain the same number of black nodes.
Solution:

// The deleted node was a left child
func delete_case6(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    var sibling = s
    if isRed(sibling) { // The sibling is red
        dyed_black(sibling)
        dyed_red(node.parent)
        rotate_left(node.parent)
        sibling = (node.parent?.right)!
    }
    delete_case7(node, sibling) // Continue with the mirror of case 4
}
Case 7:
1. Assumptions:
 The deleted node is black.
 The deleted node was a left child.
 The sibling of the deleted node is black.
 The children of the sibling are black (or missing).
The only difference from case 4 is that a left child rather than a right child was deleted; the rotations and the sibling lookup are mirrored, so only the code is given here.
2. Violated property:
 Property 5: all paths from any node to each leaf contain the same number of black nodes.
Solution:

func delete_case7(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    let sibling = s
    // Both children of the black sibling are black
    if isBlack(sibling.left) && isBlack(sibling.right) {
        let parentBlack = isBlack(node.parent)
        dyed_black(node.parent)
        dyed_red(sibling)
        if parentBlack {
            afterRemove(node.parent!)
        }
    } else {
        delete_case8(node, sibling)
    }
}
Case 8:
1. Assumptions:
 The deleted node is black.
 The deleted node was a left child.
 The sibling of the deleted node is black.
 The sibling has at least one red child.
The only difference from case 5 is that a left child rather than a right child was deleted; the rotations and the sibling lookup are mirrored, so only the code is given here.
2. Violated property:
 Property 5: all paths from any node to each leaf contain the same number of black nodes.
Solution:

func delete_case8(_ node: TreeNode<Any>, _ s: TreeNode<Any>) {
    var sibling = s
    if isBlack(sibling.right) { // The red child is on the sibling's left: rotate right first
        rotate_right(sibling)
        sibling = (node.parent?.right)!
    }
    // The node rotated into the parent's position inherits the parent's color
    if node.parent?.color == RED {
        dyed_red(sibling)
    } else {
        dyed_black(sibling)
    }
    dyed_black(sibling.right)
    dyed_black(node.parent)
    rotate_left(node.parent)
}
3.5.4 Complexity analysis of red-black trees
AVL tree:
 Strict balance criterion: the height difference between the left and right subtrees is at most 1 (the balance factor).
 Maximum height: 1.44 * log2(n + 2) - 1.328 (with 1,000,000 nodes, the maximum height of an AVL tree is about 28).
 Search, add, and delete are all O(log n); adding needs only O(1) rotations, but deleting may need up to O(log n) rotations.
Red-black tree:
 Loose balance criterion: no root-to-leaf path is more than twice as long as any other.
 Maximum height: 2 * log2(n + 1) (with 1,000,000 nodes, the maximum height of a red-black tree is about 40).
 Search, add, and delete are all O(log n); both adding and deleting need only O(1) rotations.
How to choose:
 If searches far outnumber insertions and deletions, choose an AVL tree.
 If searches, insertions, and deletions occur in roughly equal numbers, choose a red-black tree.
 Compared with an AVL tree, a red-black tree relaxes the balance criterion in exchange for fewer rotations when rebalancing.
 The average performance of red-black trees is better than that of AVL trees, so they are the usual choice in practice.
3.5.5 Red-black tree related algorithms
Problem set for related algorithms
3.6 Binary Heap
A binary heap is a special kind of heap: a complete binary tree, or a nearly complete binary tree.
There are two types of binary heap: the max heap and the min heap.
Max heap: the key of a parent node is always greater than or equal to the key of any of its children.
Min heap: the key of a parent node is always less than or equal to the key of any of its children.
Typical application: finding the largest values in a data set (the TopK problem).
| Structure | Get maximum | Delete maximum | Add element |
| --- | --- | --- | --- |
| Dynamic array / doubly linked list | O(n) | O(n) | O(1) |
| Ordered dynamic array / ordered linked list | O(1) | O(1) | O(n) |
| Balanced binary search tree | O(logn) | O(logn) | O(logn) |
| Heap | O(1) | O(logn) | O(logn) |
Important properties of a heap:
 If the value of every node is greater than or equal to the values of its children, it is called a max heap (also known as a big-root heap or big-top heap).
 If the value of every node is less than or equal to the values of its children, it is called a min heap (also known as a small-root heap or small-top heap).
Because a binary heap is a complete binary tree, it is usually implemented on top of an array.
For the element at index i (with n elements in total):
 If i = 0, it is the root node.
 If i > 0, the index of its parent is floor((i - 1) / 2).
 If 2i + 1 <= n - 1, its left child has index 2i + 1; if 2i + 1 > n - 1, it has no left child.
 If 2i + 2 <= n - 1, its right child has index 2i + 2; if 2i + 2 > n - 1, it has no right child.
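These index formulas can be checked with a short Swift sketch (illustrative, not from the article):

```swift
// Index arithmetic for an array-backed complete binary tree (heap) with n elements.
func parentIndex(_ i: Int) -> Int? {
    i > 0 ? (i - 1) / 2 : nil              // i = 0 is the root and has no parent
}

func leftChildIndex(_ i: Int, count n: Int) -> Int? {
    2 * i + 1 <= n - 1 ? 2 * i + 1 : nil   // nil means "no left child"
}

func rightChildIndex(_ i: Int, count n: Int) -> Int? {
    2 * i + 2 <= n - 1 ? 2 * i + 2 : nil   // nil means "no right child"
}

// Example: the max heap [90, 80, 70, 20, 60] stored as an array.
// parentIndex(3) == 1, leftChildIndex(1, count: 5) == 3, rightChildIndex(2, count: 5) == nil
```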
Recommended reading: Data structure - how is the Weibo Top 10 hot search list calculated? (binary heap)
3.6.1 Heap-related algorithms
TopK problem
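The TopK idea can be sketched as follows (an illustrative implementation, not from the article): keep a min heap of size k, and for each element larger than the heap's minimum, replace the minimum and sift down. Swift has no built-in heap, so a tiny one is assumed here:

```swift
// Top-k largest elements via a min heap of size k (illustrative sketch).
struct MinHeap {
    private(set) var items: [Int] = []

    var min: Int? { items.first }

    mutating func insert(_ x: Int) {
        items.append(x)
        var i = items.count - 1
        while i > 0, items[(i - 1) / 2] > items[i] {   // Sift up
            items.swapAt(i, (i - 1) / 2)
            i = (i - 1) / 2
        }
    }

    mutating func replaceMin(with x: Int) {
        items[0] = x
        var i = 0
        while true {                                   // Sift down
            let l = 2 * i + 1, r = 2 * i + 2
            var smallest = i
            if l < items.count, items[l] < items[smallest] { smallest = l }
            if r < items.count, items[r] < items[smallest] { smallest = r }
            if smallest == i { break }
            items.swapAt(i, smallest)
            i = smallest
        }
    }
}

func topK(_ data: [Int], _ k: Int) -> [Int] {
    var heap = MinHeap()
    for x in data {
        if heap.items.count < k {
            heap.insert(x)
        } else if let m = heap.min, x > m {
            heap.replaceMin(with: x)                   // O(log k) per element
        }
    }
    return heap.items.sorted(by: >)
}
```

For example, topK([3, 10, 1, 8, 7, 9, 2], 3) yields [10, 9, 8]; the whole pass is O(n log k), which beats sorting when k is much smaller than n.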
3.7 Trie tree
A Trie, also known as a word search tree or prefix tree, is a tree structure and a variant of the hash tree. It is typically used to count, sort, and store large numbers of strings (though not only strings), so search-engine systems often use it for word-frequency statistics. Its advantage is that it exploits the common prefixes of strings to reduce query time and minimize unnecessary string comparisons, giving it higher query efficiency than a hash tree.
A key can be looked up in the dictionary as follows:
(1) Start the search from the root node.
(2) Take the first character of the key, choose the corresponding subtree, and continue the search in that subtree.
(3) In that subtree, take the second character of the key and again descend into the corresponding subtree.
(4) Iterate this process...
(5) When every character of the key has been consumed at some node, read the information attached to that node; the search is complete.
Other operations are similar.
Advantage of a Trie: the cost of a prefix search depends mainly on the length of the prefix.
Drawback of a Trie: it can consume a lot of memory.
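The search procedure above can be sketched in Swift (a minimal illustrative Trie, not the article's implementation):

```swift
// Minimal Trie (prefix tree) sketch: insert, exact search, prefix search.
final class TrieNode {
    var children: [Character: TrieNode] = [:]
    var isWord = false                     // Marks the end of a stored key
}

final class Trie {
    private let root = TrieNode()

    func insert(_ word: String) {
        var node = root
        for ch in word {                   // Descend per character, creating nodes as needed
            if node.children[ch] == nil { node.children[ch] = TrieNode() }
            node = node.children[ch]!
        }
        node.isWord = true
    }

    private func node(for key: String) -> TrieNode? {
        var node = root
        for ch in key {                    // Steps (1)-(4): follow one subtree per character
            guard let next = node.children[ch] else { return nil }
            node = next
        }
        return node
    }

    func contains(_ word: String) -> Bool { node(for: word)?.isWord ?? false }  // Step (5)
    func hasPrefix(_ prefix: String) -> Bool { node(for: prefix) != nil }
}
```

Both contains and hasPrefix walk at most one node per character of the query, which is why the cost depends on the key or prefix length rather than on the number of stored words.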
3.8 Huffman Tree
Given n weights as n leaf nodes, construct a binary tree; if the tree's weighted path length is minimal, it is called an optimal binary tree, also known as a Huffman tree. A Huffman tree is the tree with the shortest weighted path length, and nodes with larger weights sit closer to the root.
Example:
 Huffman trees are the basis of many modern compression algorithms.
 Suppose you want to transmit the string ABBBCCCCCCCCDDDDDDEE.
 It could be converted to ASCII (codes 65-69, i.e. 1000001-1000101 in binary), but that is very long. How can it be made shorter?
 Agree on a fixed binary code for each letter: A: 000, B: 001, C: 010, D: 011, E: 100.
 The binary encoding is then: 000 001 001 001 010 010 010 010 010 010 010 010 011 011 011 011 011 011 100 100.
 Twenty letters are converted into sixty bits.
 Using a Huffman tree, the same string can be compressed to 41 bits:
 Compute the frequency of each letter: A: 1, B: 3, C: 8, D: 6, E: 2.
 Use these frequencies as weights to construct a Huffman tree.

How is a Huffman tree built?
1. Construct n single-node binary trees from the n weights, forming a forest.
2. In the forest, pick the two trees whose root weights are smallest and merge them as the left and right subtrees of a new tree; the new root's weight is the sum of its two subtree roots.
3. Remove the two chosen trees from the forest and add the new tree to it.
4. Repeat steps 2 and 3 until the forest is a single tree.

Labelling left edges 0 and right edges 1 then yields the codes for the five letters:
A: 1110, B: 110, C: 0, D: 10, E: 1111
The Huffman encoding is: 1110 110 110 110 00000000 10 10 10 10 10 10 1111 1111 (41 bits).

Conclusions:
 A Huffman tree built from n weights has n leaf nodes (and 2n - 1 nodes in total).
 No Huffman code is a prefix of another Huffman code.
 A Huffman tree is the tree with the shortest weighted path length; nodes with larger weights are closer to the root.
 Weighted path length: the sum over all leaf nodes of each leaf's weight multiplied by the length of its path to the root; when the weights are symbol frequencies, this sum equals the total length of the final Huffman encoding.
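The construction steps above can be sketched in Swift (an illustrative implementation; for simplicity, the two smallest trees are found by sorting the forest, and ties may produce codes that differ from the article's while still being optimal):

```swift
// Build Huffman codes from character weights (illustrative sketch).
final class HuffNode {
    let weight: Int
    let char: Character?               // nil for internal (merged) nodes
    let left: HuffNode?
    let right: HuffNode?
    init(weight: Int, char: Character? = nil, left: HuffNode? = nil, right: HuffNode? = nil) {
        self.weight = weight; self.char = char; self.left = left; self.right = right
    }
}

func huffmanCodes(_ weights: [Character: Int]) -> [Character: String] {
    // Step 1: one single-node tree per weight (the "forest")
    var forest = weights.map { HuffNode(weight: $0.value, char: $0.key) }
    // Steps 2-4: repeatedly merge the two trees with the smallest root weights
    while forest.count > 1 {
        forest.sort { $0.weight < $1.weight }
        let a = forest.removeFirst(), b = forest.removeFirst()
        forest.append(HuffNode(weight: a.weight + b.weight, left: a, right: b))
    }
    // Label left edges 0 and right edges 1 to read off the codes
    var codes: [Character: String] = [:]
    func walk(_ node: HuffNode?, _ prefix: String) {
        guard let node = node else { return }
        if let ch = node.char {
            codes[ch] = prefix.isEmpty ? "0" : prefix   // Single-symbol edge case
            return
        }
        walk(node.left, prefix + "0")
        walk(node.right, prefix + "1")
    }
    walk(forest.first, "")
    return codes
}
```

For the example weights A: 1, B: 3, C: 8, D: 6, E: 2, the total encoded length of ABBBCCCCCCCCDDDDDDEE comes out to 41 bits regardless of how ties are broken, since every Huffman tree over the same weights has the same weighted path length.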

Reference:
 Love Data Structures and Algorithms, Season 1
 blog.csdn.net/yt618121/ar…
 blog.csdn.net/yt618121/ar…
 baike.baidu.com
 juejin.cn/post/684490…
 www.wikipedia.org
 www.conardli.top/docs/dataSt…
 Data structure - how is the Weibo Top 10 hot search list calculated? (binary heap)