background

Last week, a student had an interview. The process was as follows:

Interviewer: Java development, three years, which Java collections are familiar with?

Classmates: ArrayList, HashMap, TreeMap, LinkedList….. (Quite a lot).

Interviewer: Tell us the difference between HashMap in JDK7 and JDK8.

S: There are four main differences:

1, implementation: JDK7 using array + list to achieve, JDK8 using array + list + red black tree

New nodes are inserted into the list in a different order: JDK7 at the head and JDK8 at the tail

3. The HASH algorithm of JDK8 has been simplified

4. Capacity expansion mechanism has been optimized

(At this point, the student thought, fortunately, I have memorized Mr. Tian’s interview cheat sheet Java set part)

Interviewer: Since you know JDK8 has introduced red and black trees, I have black and red pens here, you draw a red and black tree on this wall (small meeting room is glass wall).

Classmate :(hell, I don’t know that, he will also ask the characteristics of red and black, red and black tree and binary tree what is the difference, red and black rotation, add, delete… , pharynx pharynx saliva to say) forehead…. That’s not so good.

There are a lot of things that you might think are different, or at most why a HashMap is not thread safe, or how a HashMap can be expanded. Have you ever had a similar experience?

Today we are going to look at the red black tree that the interviewer stumped the student. In fact, when it comes to red-black trees, we engaged in Java development more or less know something, for a large number of friends are familiar with the guy but also strange. It’s familiar because many of you have probably studied it, seen a lot of articles about red black trees, and we’ve all used them either directly or indirectly. Strange because red-black trees, like Spring’s source code, are daunting to many because of the complexity of their design and implementation.

In general, the definition of a red-black tree is fairly simple, but you need to know the basics, like what is a tree? How do trees traverse? Binary tree? Balanced binary tree? .

Red-black trees are just a little bit more self-balancing between inserts and deletions, but it’s not as scary as you might think, and we’re going to fix that today.

This paper focus on

We started with the concept of trees, went to binary trees, balanced binary trees, and finally red black trees. Actually the main difficulties with the red-black tree is self balancing adjustment in the process of insertion and deletion, insertion process adjustment is relatively simple, delete process to deal with the situation of some more, but whether inserted or deleted, all readers are advised to keep all the figure placed together to observe, to find the secret of the continuity, this article in view of the long length is no longer posted.

The tree

Trees in Life

Trees in data structures

Tree structure is a nonlinear storage structure that stores a collection of data elements with a one-to-many relationship.

As you can see, the trees in our life are exactly the same, but it looks like one is in front and the other is down. Backward knowledge for ease of understanding.

Above the tree is to use A tree structure to store A collection of {A, B, C, D, E, F, G, H, I, J, K, L, M}. For data A, relate to data B, C, and D; For data B, it is related to E and F. This is a one-to-many relationship.

The nodes of the tree

“Node” : Each data element stored using a tree structure is called a “node”. For example, in the diagram, the data element A is A node;

“Parent (parent), child, and sibling” : For nodes A, B, C, and D in the figure, A is the parent of B, C, and D (also called “parent”), while B, C, and D are all children of A (also called “child”). For B, C, and D, they all have the same parent, so they are siblings of each other.

Root node (” root “for short) : Every non-empty tree has one and only one node called the root. In this case, node A is the root of the whole tree.

“Leaf node” : If a node has no children, it is called a leaf node (leaf node). For example, in the figure, nodes K, L, F, G, M, I and J are leaf nodes of the tree.

If a node has no parent, it is the root of the whole tree.

subtree

Subtree: As shown in the figure, the root of the whole tree is node A, which is also A tree if you only look at the parts of nodes B, E, F, K and L, and node B is the root of the tree. Therefore, the tree composed of nodes B, E, F, K and L is called the sub-tree of the whole tree. Similarly, the nodes E, K, and L form a subtree, and the root is E.

Note that a single node is also a tree, except that the root node is itself. In Figure 1, nodes K, L, F, etc. are trees, and they are subtrees of the whole tree.

Once you know the concept of a subtree, you can also define a tree as consisting of a root node and several subtrees.

Empty tree

Empty tree: If the set itself is empty, the resultant tree is said to be empty. There are no nodes in an empty tree.

Binary tree

A binary tree is an ordered tree with at most two children per node. Usually the left subtree is called the left subtree and the right subtree is called the right subtree.

A simple diagram of a binary tree is as follows:

Left subtree and right subtree

Usually the left subtree is called the left subtree and the right subtree is called the right subtree. Ordered trees emphasize that the order of the left and right subtrees of a binary tree cannot be arbitrarily reversed.

Definitions in Java

public class Node {
    T data;
    Node left;
    Node right;
}

Copy the code

Definitions in C++

struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    TreeNode(int x) : val(x), left(NULL), right(NULL) {}};Copy the code

Sort binary trees

The so-called sorted binary tree, as the name implies, sorted binary tree is ordered, it is a special structure of the binary tree, we can sort and retrieve all nodes in the tree.

The nature of the
  • If its left subtree is not empty, the values of all nodes in the left subtree are less than the values of its root node.

  • If its right subtree is not empty, the values of all nodes in the right subtree are greater than the values of its root node.

  • With recursion, the left and right subtrees of sorted binary trees are also sorted binary trees.

Simple diagram of sorting binary tree:

Sorted binary trees degenerate into linked lists

The value of all nodes in the left subtree of sorted binary tree is less than the value of the root node, and the value of all nodes in the right subtree is greater than the value of the root node. When we insert a group of elements that are exactly ordered, sorted binary tree will degenerate into a linked list.

Normally, a sorted binary tree would look like this:

However, when the inserted set of elements happens to be in order, the sorted binary tree looks like this, as shown in the following figure:

Under normal circumstances, the sorting binary tree retrieval efficiency is similar to binary search, the time complexity of binary search is O(log n), but if the sorting binary tree degenerated into a linked list structure, then the retrieval efficiency becomes linear O(n), so compared with O(log n), the retrieval efficiency must be much worse.

Think about it, binary search and normal sorted binary trees are order log n in time, so why order log n?

In order to solve the problem that sorted binary trees will degenerate into linked lists under special circumstances (the retrieval efficiency of linked lists is O(n) much worse than that of normal binary trees), so someone invented balanced binary trees similar to red black trees.

Balanced binary trees

Balanced binary trees are also known as AVL (Adelson-Velsky and Landis) trees.

Definition: It is an empty tree or the absolute value of the difference in height between the left and right subtrees is no more than 1, and both subtrees are balanced binary trees.

Two conditions:

  • A balanced binary tree must be a sorted binary tree, that is, the values of all nodes in the left subtree of a balanced binary tree must be less than the values of the root node, and the values of all nodes in the right subtree must be greater than the values of the root node.

  • The absolute value of the depth difference between the left and right subtrees is no more than 1.

Red and black tree

Ha ha ha, this time, finally round I pour.

A red-black tree is a self-balancing binary tree. This means that during insertions and deletions, a red-black tree adjusts the organization of the tree to minimize the height of the tree and save searching time.

Red-black trees have the following characteristics:

  1. The nodes are red or black

  2. The root is always black

  3. Leaf nodes (NIL nodes) are all black

  4. The two immediate children of the red node are black (i.e., there are no two consecutive red nodes on any path from leaf to root).

  5. All simple paths from any node to each leaf contain the same number of black nodes

Many people, it is estimated that in the interview when holding Buddha jio, back the red and black tree of these five characteristics.

Above nature guarantees the red-black tree on the premise of meet the balanced binary tree characteristics, also can be done from the root to the leaves of the longest path most can’t more than twice the shortest path, this is mainly consider two extreme cases, 4 and 5 by nature we can know that in a red and black tree all the shortest path from root to leaf consists of black nodes, The longest node is made up of interleaved red and black nodes (always organized in a red and black order), and because the number of black nodes on the shortest path and the longest path is the same, the number of nodes on the longest path is twice as many as the number of nodes on the shortest path.

Self-balancing strategy

The basic operation of a red-black tree is nothing more than add, delete, change and search, which will not change the structure of the tree, so it is the same as the ordinary balanced binary tree operation. All that is left is adding and deleting. Insertion and deletion destroy the structure of the tree, but with a certain balancing strategy the tree can be redefined. Balancing strategies can be summarized in three simple ways: left rotation, right rotation, and color change. After inserting or deleting nodes, we can finally redefine the tree as long as we do all three operations along the path from node to root.

“Left rotation”

For the current node, if the right child is red and the left child is black, then the left rotation is performed, as shown below:

“Right rotation”

For the current node, if the left child and left grandchild are both red, the right rotation is performed, as shown below:

“Color”

For the current node, if both the left and right children are red, the color change will be performed, as shown in the figure below:

The insert

As a kind of balanced binary tree, red-black tree also needs to locate the insertion point by means of search operation. However, red-black tree stipulates that the newly inserted nodes are all red, which is mainly to simplify the self-balancing process of the tree. For a hollow tree, insert node for red will increase a color changing operation, but for the rest of the situation, if the node is inserted a black node, then 5 will inevitably destroy nature, and insert a node is likely to destroy the nature red 4, but at this point we can through the simple strategy to adjust tree definition to meet again.

We agree that X is the inserted node, P is the parent of X, G is the grandfather of X, and U is the uncle of X.

The following discusses the insertion process according to the above policies and scenarios:

1. The newly inserted node X is the root node.

Now the newly inserted node is red, contrary to property 2, so you just change it to black.

2. The parent node P of newly inserted node X is black

At this point, there are two cases according to the value of X of the newly inserted node relative to the parent node P. If it is less than, simply insert X into the left child position of P (left in the following figure); if the value of X is greater than P, insert X into the right child position of P and then perform a left rotation (right in the following figure).

3. The parent node P is red, and the existing uncle node U is also red

Since P is red, G must be black according to property 4. If the value of X is less than P, it needs to be inserted at the left sub-position of P (as shown in the figure below). After insertion, property 4 is not satisfied. So the length of the path goes down by 1, but since P and U are black, the length of the path goes up by 1, so it stays the same, but G turns red, so you keep recursing up.

If the value of X is greater than P, it needs to be inserted at the right subposition of P (as shown in the figure below). If property 4 is not satisfied after insertion, it needs to perform left rotation to change to the situation above and continue to change color.

4. The parent node P is red, while the uncle node U is black or does not exist

Since P is red, G must be black according to property 4. If the value of X is less than P, it needs to be inserted at the left subposition of P (as shown below). If the insertion does not meet property 4, it needs to perform a right rotation first. In this case, you need to perform another color change operation to meet the definition.

If the value of X is greater than P, it needs to be inserted at the right subposition of P (as shown in the figure below). After insertion, property 4 is not satisfied. In this case, we need to perform a left rotation, and then the above situation is transformed.

Delete operation

Red-black tree as a balanced binary tree, also need to use to locate operations to delete points, we need to decide before delete nodes have several children, is to be deleted if are two words we need most to find value from the left subtree of the nodes of the node, or from the right look for value is the smallest nodes in the subtree, Replace the node to be deleted with its value (as long as the value of the target node disappears from the tree, regardless of which node is deleted). The two nodes have a common, that is, at most only one child node (because it is already the largest and smallest in the range of their own, a mountain can not allow two tigers (rats)), at this time will need to change to delete only a child node node, relatively much simpler.

We agree that X is the node to be deleted, P is the parent of X, S is the child of X, B is the sibling of X, BL is the left child of B, and BR is the right child of B.

  1. If the node X to be deleted is a red node, it can be deleted directly without violating the definition.

  2. If the node X to be deleted is a black node and its child node S is red, you only need to replace X with S and change S from red to black.

  3. If the node X to be deleted is black, and its child node S is also black, this situation needs to be further discussed in different scenarios.

In the third case, we first replace X with S and rename it N, which is X’s name for the elder and the younger, and we need to be clear that what we’re actually deleting here is node X, and the length of the path through N is reduced by 1.

1.N is the new root

This case is relatively simple and does not require any further adjustments.

2.N’s parent node, sibling node B, and child node B are all black

As shown in the figure below, all you need to do is to change B to red, so that all paths through B are reduced by 1, which is exactly the same as all paths through N, but all paths through P are reduced by 1, so you need to continue to determine node P recursively upward.

3. Brother node B of N is red, and other nodes are black

In the image below, you need to perform a left rotation and then switch the colors of P and B. The path of each node does not change before and after adjustment, but because the length of the path passing through N is one unit less, it still does not meet the definition and needs to be adjusted according to the later scene.

4. The parent node P of N is red, the sibling node B, and the child node of B are black

In this case, we simply swap the colors of P and B. In this case, the path of nodes that do not pass through N is not affected, but the path of nodes that do pass through N is increased by 1, which makes up for the loss caused by the previous deletion.

5.N’s sibling B is black, B’s left child is red, and B’s right child is black

Below, we need to implement a right rotation operation, then exchange B with the color of the BL, after operation through all the nodes in the path length does not change, but let N has a new black brothers node, and the brothers node’s right child to red, which can be a scene to introduce according to the next.

Note: a white node indicates that the node can be either black or red, as is the case in subsequent illustrations.

6.N’s sibling B is black, and B’s right child is red

As shown in the figure below, we need to perform a left rotation first, swap the colors of P and B, and change the right child of B to black. After the change, the path length of all nodes except N remains unchanged, but a black node is added to the path passing through N, which just makes up for the loss caused by the previous deletion operation.

Highly recommend this site

www.zhenchao.org

“Highly recommend this site” :

www.cs.usfca.edu/~galles/vis…

A more intuitive illustration of the insert and delete process:

Make sure you like it, thank you!