= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = | |

Welcome to discuss the technology can be added to wechat: Windgs (please note juejin+ XX occupation)

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = | |

BST (binary search/sort tree) BST (binary search/sort tree) Or a binary tree with the following properties: (1) If its left subtree is not empty, then the values of all nodes in the left subtree are less than the values of its parent node; (2) If its right subtree is not empty, the value of all nodes in the right subtree is greater than that of its parent node; (3) Its left and right subtrees are also binary sorting trees respectively. Traversal: it is divided into pre-order traversal, middle-order traversal (to get an ordered set), and subsequent traversal

 

1. All non-leaf nodes have at most two sons (Left and Right);

2. All nodes store a keyword.

3. The left pointer of a non-leaf node points to a subtree smaller than its keyword, and the right pointer points to a subtree larger than its keyword.

BST tree search, starting from the root node, if the query keyword is equal to the keyword of the node, then hit;

Otherwise, if the query key is smaller than the node key, the left son is entered; If it is larger than the node key, enter

Son to the right, If the pointer to the left son or the right son is null, the report cannot find the corresponding keyword.

If the number of nodes in the left and right subtrees of all non-leaf nodes of BST tree is about the same (equilibrium), then B tree

Search performance approximates binary search; But its advantage over binary search of continuous memory space is that it changes the BST tree structure

(insert and delete nodes) without moving large chunks of memory data, often with constant overhead;

The right-hand side is also a BST tree, but its search performance is already linear; The same set of keywords may lead to different

Tree structure index; Therefore, when using BST trees, we should also consider keeping the structure of the BST tree on the left as much as possible, and avoiding the structure of the BST tree on the right

It’s a question of “balance”;

A self-balancing binary search tree is also known as an AVL tree (as distinct from AVL algorithm). It is an empty tree or the absolute value of the height difference (balance factor) between its left and right subtrees is no more than 1. And the left and right subtrees are a balanced binary tree, the balanced binary tree must be a binary search tree, not necessarily vice versa

Balance factor (balance degree) : a binary sort tree whose balance degree is 1, i.e. the balance factor of each node is 1, -1 and 0. Or a binary sort tree where the heights of the left and right subtrees of each node differ by at most 1.

The purpose of balancing binary tree is to reduce the binary search tree level and improve the search speed

The common implementation methods of balanced binary tree include AA tree, AVL tree, red-black tree, Treap tree, stretch tree, etc

AVL tree is highly balanced, frequent insertion and deletion will cause frequent reblance, leading to efficiency reduction. Red-black tree is not highly balanced, which is a compromise, insertion can be up to two rotations, deletion can be up to three rotations

AVL balanced binary search tree is defined as either an empty tree or a binary sorting tree with the following properties: (1) The absolute value of the difference between the depth of left and right subtrees is less than 1; (2) The left and right subtrees are still balanced binary trees. Equilibrium factor BF= left subtree depth – right subtree depth. The equilibrium factor of a balanced binary tree can only be 1, 0, -1. If its absolute value is greater than 1, the binary sorting tree is unbalanced. The diagram shows balanced tree and non-balanced tree:

Red Black Tree r-B Tree, full name is Red Black Tree, it is a balanced binary Tree. Red-black trees have memory bits on each node to indicate the color of the node, which can be Red or Black. The characteristics of red-black trees are as follows: (1) Each node is either black or red. (2) The root node is black. (3) Each leaf node (NIL) is black. [Note: leaf nodes are NIL or NULL leaf nodes!] (4) If a node is red, its children must be black. (5) All paths from a node to its descendants contain the same number of black nodes.

Note: Leaf nodes in feature (3) of (01) are NIL or null nodes only. (02) Feature (5) ensures that no path is twice as long as any other. Therefore, red-black trees are relatively nearly balanced binary trees

A balanced tree of Order M is a balanced m-way search tree. It is either an empty tree or a tree that satisfies the following properties: 1.

2. The number of keywords j in each non-root node was j: chrysene M /2 erres-1 <= j <= m-1;

3. The degree of all nodes except the root node (excluding the leaf node) is just the total number of keywords plus 1, so the number of internal sub-tree k meets the following requirement: chrysene m/2 gp <= k <= m;

4. All leaf nodes are in the same layer.

Features: a multipath search tree (not binary) :

1. Define that any non-leaf node has at most M sons; And M > 2;

2. The number of sons of the root node is [2, M];

3. The number of sons of non-leaf nodes except root nodes is [M/2, M];

4. Each node stores at least M/2-1 (rounded) and at most M-1 keywords; (At least 2 keywords)

5. Number of keywords of non-leaf nodes = number of Pointers to sons -1;

6. Keywords of non-leaf nodes: K[1], K[2]… [M, K – 1); And K[I] < K[I +1];

7. Pointers to non-leaf nodes: P[1], P[2]… P [M]; Where P[1] points to the keyword less than K[1]

Subtree, P[M] points to the subtree whose keyword is greater than K[m-1], and other P[I] points to the subtree whose keyword belongs to (K[i-1], K[I]);

8. All leaf nodes are located in the same layer;

M=3

B-tree has the following characteristics: 1. The keyword set is distributed in the whole tree;

2. Any keyword appears in only one node;

3. The search may end at non-leaf nodes;

4. Its search performance is equivalent to a binary search in the whole set of keywords;

5. Automatic hierarchical control;

A B+ tree is a tree data structure. It is an N-fork tree. Each node usually has multiple children. The root node may be a leaf node or a node containing two or more child nodes. Purpose: B+ trees are commonly used in databases and file systems of operating systems. File systems such as NTFS, ReiserFS, NSS, XFS, JFS, ReFS, and BFS all use B+ trees as metadata indexes. The characteristic of B+ tree is that it can keep the data stable and orderly, and its insertion and modification have relatively stable logarithmic time complexity. B+ tree elements are inserted from the bottom up.

Definition of A B+ tree A B+ tree is a variant of a B- tree required by the file system. The differences between a B-+ tree of order M and a B-tree of order M are as follows: 1. There are n keywords in the nodes of n subtrees. Each keyword does not store data, but is only used for indexing.

2. All leaf nodes contain information of all keywords and Pointers to records containing these keywords, and leaf nodes themselves are linked in large order according to the size of keywords.

3. All non-terminal nodes can be regarded as index parts, which only contain the maximum (or minimum) keyword in its sub-tree (root node). There are usually two header Pointers on a B+ tree, one to the root node and one to the leaf node with the smallest keyword.

A B+ tree is a variant of a B- tree and is also a multi-path search tree: 1. Its definition is basically the same as that of a B- tree, except that:

2. The number of subtree Pointers and keywords of non-leaf nodes is the same;

3. The subtree pointer P[I] of non-leaf node points to the subtree whose keyword value belongs to [K[I], K[I +1])

(B-tree is the open interval);

5. Add a chain pointer to all leaf nodes;

6. All keywords appear in leaf nodes;

Characteristics of B+ : 1. All keywords appear in the linked list of leaf nodes (dense index), and the keywords in the linked list are exactly

It’s ordered;

2. Impossible to hit on non-leaf nodes;

3. Non-leaf nodes are equivalent to the index of leaf nodes (sparse index), while leaf nodes are equivalent to storage

Key words: data layer of data;

4. More suitable for file indexing system;

6. B* tree: it is a variant of B+ tree. Pointers to brothers are added to non-root and non-leaf nodes of B+ tree.

B* tree defines that the number of keywords of non-leaf nodes is at least (2/3)*M, that is, the minimum usage of block is 2/3 (instead of 1/2 of B+ tree). B+ tree splitting: when a node is full, a new node is assigned and 1/2 of the original node’s data is transferred

Copy to the new node, and finally add a pointer to the new node in the parent node; Splitting a B+ tree affects only the parent and parent

Node, doesn’t affect its sibling, so it doesn’t need a pointer to its sibling;

B* tree splitting:

When one node is full, if its next sibling is not, then part of it

The data is moved to the sibling node, the keyword is inserted into the original node, and the keyword of the parent node is changed

(because the sibling’s keyword range has changed); If the sibling is also full, the parent node and the sibling node are matched

Add a new node between, and copy 1/3 of the data to the new node, and finally add a pointer to the new node in the parent node;

Therefore, the probability of B* tree allocating new nodes is lower than that of B+ tree, and the space utilization rate is higher.

A storage node can store multiple keys. 1. T-tree The T-tree has the following features:

① The difference between the left and right subtrees is less than 1

(2) Multiple keys can be stored in a storage node. The left-most and right-most keys are the minimum and maximum keys of the node, respectively. The left subtree contains only those records whose keys are less than or equal to the minimum, and the right subtree contains only those records whose keys are greater than or equal to the maximum

③ A node with both left and right subtrees is called an inner node, a node with only one subtree is called a half-leaf, and a node with no subtree is called a leaf

④ To maintain space utilization, each internal node needs to contain a minimum number of keys. It can be seen that T tree is a balanced binary tree with multiple keywords in each node. Keywords in each node are arranged in an orderly manner. The left subtree is smaller than the keyword of the root node, and the right subtree is larger than the keyword of the root node. The node structure of T tree above contains the following information: (1) balance(balance factor), whose absolute value is no more than 1, balance = right subtree height – left subtree height; (2) Left_child_ptr and Right_child_ptr represent the pointer to the left and right subtrees of the current node respectively; (3) Max_Item represents the maximum number of key values that can be contained in a node; (4)Key[0] to K[max_item-1] are the Key words stored in the node; (5)nItem is the actual number of keywords stored on the current node.

2.T tree index operation with T tree as an index to complete three main work: search, insert, delete. Both insert and delete are based on lookup. The following describes the flow of the three operations. (1) T find similar to the binary tree of the tree, the difference is that every node on the main elements of comparison is not for node values, but first check to find the target key values contained in the current node of the left and the right values determined range, if so, is used in the current node list of key/value dichotomy to look up; If the target key value is less than the left-most key value of the current node, the left child of the current node is searched similarly. If the target key value is greater than the right-most value of the current node, the right child of the current node is similarly searched. (2) T tree insertion is based on the search, and the search operation is applied to locate the target key value insertion position, and the last node encountered in the search process is recorded. If the search is successful, determine whether there is enough storage space in this node. If so, insert the target key into the node; Otherwise, insert the target key into the node, then insert the left-most key in the node into its left subtree (this is a recursive insert operation), and end; Otherwise, assign a new node and insert the target key; Then according to the relationship between the target key value and the maximum and minimum key value of the node, the newly allocated node is linked to the left or right child of the node. The tree is checked to determine whether the balance factor of T tree meets the conditions. If the balance factor does not meet the conditions, the rotation operation is performed.

(3) The deletion operation of T tree is also based on search, and the search operation is applied to locate the target key value. If the search fails, the system ends. Otherwise, let N be the node where the target key value resides, and delete the target key value from node N; After the node is deleted, if the node N is empty, the node N is deleted and the balance factor of the tree is checked to determine whether the rotation operation is needed. If the number of key values in node N is less than the minimum value, the maximum key value is removed from the left subtree of node N or the minimum value is removed from the right subtree according to the balance factor of N.

3.T tree index key technology to achieve T tree index that is to achieve T tree search, insert and delete. Which is based on the search, the maintenance of the T-tree is the rotation of the T-tree as the key. Rotation of the T-tree occurs when the tree is out of balance due to insertion or deletion of key values. Bring it back into equilibrium. In the insertion case, all nodes along the path from the newly created node to the root node need to be checked in turn until one of the following two conditions occur: the height of two subtrees of a node being checked is equal, and no rotation operation is required; If the height difference between two subtrees of a node being checked is greater than 1, only one rotation operation can be performed on the node. In the case of deletion, it is similarly necessary to check all nodes along the path from the parent node to the root node in turn. During the inspection, if the difference between the left and right subtree heights of a node is found to be out of bounds, a rotation operation is required. Unlike the insert operation, the checking process cannot be stopped after the rotation operation, but must continue until the root node is checked. It can be seen that, for the insertion operation, only one rotation operation is needed at most to restore the T-tree to the equilibrium state; A delete operation may cause an upward chain reaction that rotates the higher-level nodes and may require multiple rotations. In order to balance t-trees, rotation operation is needed. Rotation is the most critical and difficult operation in T-trees. The following is the technology of T-tree rotation.

There are four types of rotation: the rotation caused by the insertion (or deletion) of the left child’s left subtree is denoted as LL rotation, similar to LR, RR, and RL rotation. Insert is similar to delete.

R-tree is a kind of spatial index data structure. The following is a brief introduction:

(1) R-tree is an N-fork Tree, and N is called the fan of r-tree.

(2) Each node corresponds to a rectangle.

(3) The leaf node contains objects less than or equal to n, and its corresponding moment is the outsource rectangle of all objects.

(4) The rectangle of the non-leaf node is the outer rectangle of the rectangle of all the child nodes.

The definition of R-tree is very broad. When r-tree is constructed by the same set of data, different parties can obtain very different structures. What kind of structure is better? There are two criteria:

(1) The adjacent nodes at the position should be aggregated into a parent node in the tree as far as possible.

(2) The proportion of the intersecting part of each sibling node in the same layer should be as small as possible.

R tree is a data structure used to process multidimensional data, which is used to access spatial data composed of objects in two-dimensional or higher dimensional regions. R tree is a balanced tree. There are two types of nodes in a tree: leaf nodes and non-leaf nodes. Each node consists of several index entries. For leaf nodes, the Index is of the form (Index, Obj_ID). Where Index represents the smallest enclosing rectangle MBR and Obj_ID identifies a spatial data object. For a non-leaf node, its Index entries are of the form (Index, Child_Pointer). Child_Pointer points to a child of this node. Index still refers to a rectangular region enclosing the smallest rectangular region of all Index entries MBR on the child node.

2. R-tree spatial index algorithm

1. The history of R-tree – Minimum Enclosing Rectangle multidimensional indexing technology can be traced back to the mid 1970s. At that time, various indexing techniques such as Cell algorithm, quadtree and K-D tree were introduced, but their results were not satisfactory. Driven by the demand of GIS and CAD system for spatial index technology, Guttman proposed r-tree index structure in 1984 and published R-Tree: A Dynamic Index Structure for Spatial Query. R-tree is a highly balanced tree composed of intermediate nodes and page nodes, and the smallest enclosing rectangle of the actual data object is stored in the page node. An intermediate node is formed by aggregating the bounding rectangles of its lower-level nodes, containing all of these bounding rectangles. After that, people put forward different improvements for different spatial operations on this basis, and formed a prosperous index tree family, which is the popular spatial index at present.

On the basis of Guttman’s work, many varieties of R+ trees have been developed. Sellis et al proposed R+ trees. R+ trees are similar to R trees, but the main difference lies in that the space regions corresponding to the brothers in R+ trees do not overlap. So divided space to eliminate the overlap between the R tree for allowing nodes and the “dead zone” (a node does not contain the node data blank area), to reduce the number of invalid query, which greatly improve the efficiency of spatial index, but for the operation of the insert, delete, space object, is due to the operation to ensure the space area without overlapping and lower efficiency. At the same time, R+ tree is redundant for the storage of trans-regional spatial object data, and with the increase of data in the database, the redundant information will continue to grow. Greene also proposed variations of his R tree.

3 R-tree – Forced re-insertion In 1990, Beckman and Kriegel proposed the optimal dynamic r-tree variant — R-tree. R tree and R tree allow the overlap of rectangles, but in the construction algorithm R tree considers not only the “area” of the index space, but also the overlap of the index space. In this method, the algorithm of node insertion and splitting is improved, and the structure of tree is optimized by “forced re-insertion” method. However, r-tree algorithm still cannot effectively reduce the degree of spatial overlap, especially when the amount of data is large and the spatial dimension increases. R trees cannot handle dimensions higher than 20.

4 QR tree – quadtree – combined with the advantages of quadtree and R tree, is the comprehensive application of QR tree using quadtree space divided into some subspace, in the subspace using many R tree index, so as to improve the overlap of index space. QR tree combines the advantages of quadtree and R tree, is a comprehensive application of the two. Experimental results show that compared with R tree, QR tree has higher performance at slightly larger (sometimes even slightly smaller) space cost, and the more index targets, the better the overall performance of QR tree.

5 SS tree – replacing the minimum boundary moment with the minimum boundary circle SS tree improves the performance of the nearest neighbor query by the following measures: using the minimum boundary circle instead of the minimum boundary rectangle to represent the shape of the region, the performance of the nearest neighbor query is enhanced and the storage space is reduced by nearly half; SS tree improves the forced reinterpolation mechanism of R tree. When the dimension increases to 5, the overlap of boundary rectangles in R tree and its variants will reach 90%. Therefore, in the case of higher dimensions (≧5), its performance will become very poor, even worse than sequential scanning.

X tree is a hybrid of linear array and layered R tree. By introducing super nodes, the overlap between the minimum boundary rectangles is greatly reduced and the query efficiency is improved. The X tree is indexed by the boundary circle, the diameter (diagonal) of the boundary rectangle is larger than that of the boundary circle, and the SS tree divides points into small diameter regions. Because the region diameter has a great influence on the nearest neighbor query performance, the nearest neighbor query performance of SS tree is better than that of RTree; The average volume of the boundary rectangle is smaller than that of the boundary circle, RTrees divide points into small volume areas; Because large volume will produce more coverage, the boundary rectangle is better than the boundary circle in volume. SR tree adopts both minimum boundary circle (MBS) and minimum boundary rectangle (MBR). Compared with SS tree, SR tree reduces the area of regions and improves the separation between regions. Compared with R* tree, SR tree improves the performance of neighborhood query.

Summary: B tree: binary tree, each node only store a keyword, is equal to hit, less than go left node, greater than go right node;

B-tree: multi-path search tree, each node stores M/2 to M keywords, non-leaf nodes store children pointing to the keyword range; All the keywords appear only once in the whole tree, and non-leaf nodes can be hit.

B+ tree: On the basis of B-tree, a linked list pointer is added for leaf nodes, all keywords appear in leaf nodes, and non-leaf nodes serve as the index of leaf nodes. B+ trees always hit the leaves;

B* tree: on the basis of B+ tree, a sibling node linked list pointer is added for non-leaf nodes to increase the minimum utilization rate of nodes from 1/2 to 2/3.

B+/B*Tree Application: database index – The index file is separated from the data file, and the index file only holds the address of the data record. Database index – The table data file itself is an index structure organized according to B+Tree. The data field of the leaf node of this Tree holds the complete data record. The key of this index is the primary key of the table