Analysis and implementation of the PRINCIPLE and algorithm of B+Tree, details, graph + code, ο(=•ω < =)ρ⌒☆

An overview of the

B+Tree is a data structure and the main index used in the Innodb database engine in Mysql. In 2019, after implementing a red-black Tree from start to finish, I suddenly wanted to implement a B+Tree. In addition, I read a book “High-performance Mysql” in 2018, which has a great influence on my thinking of optimizing SQL. It explains optimization from the source, but I just picked the key points. I’ve always believed that in order to analyze something, you have to know what the person who created it thought. The primary method of SQL optimization is to use index lookups, so why is using indexes so efficient? Anyway, I implemented B+Tree by myself, just for this purpose.

The index type

The essence of an index is a data structure that can help a database search quickly, that is, reduce the time complexity of search.

The hash index

As the name implies, is a hash table structure index. To understand it, you just need to understand the structure and functionality of HashMap. ① A HashMap stores key-value pairs, that is, equivalent lookup. So the characteristic of hash index is suitable for equivalent search scenarios. ② When I implemented HashMap and used HashMap before, I knew that HashMap stores data in a non-sequential manner due to the hash algorithm, so HashMap cannot provide direct fetching of data by key. Then you can see that the drawback of hash indexes is that they can’t be scoped.

Binary tree index

As the name implies, is a binary tree structure index. So you need to understand binary trees as data structures. (1) The binary tree is characterized by the left node < parent node < right node, which can be searched in a range; (2) For a balanced binary tree, the time complexity of the search is equivalent to the half-search; (3) An extremely unbalanced binary tree will degenerate into a linked list, and the time complexity of the search is the largest; (4) In the case of more and more data, the depth of the binary tree is bound to become deeper and deeper, and the search will become more and more difficult.

The higher the index height is, the database can only compress and store the index in the disk. This means that the index needs to be extracted from the disk every time it is used, which increases the DISK I/O speed.

To sum up, binary tree indexes solve the problem that hash indexes cannot range lookup. The average equivalent lookup is much slower than a hash index, and even worse than a hash index in extreme cases.

BTree index

BTree is a multi-path balanced tree. The difference between BTree and ordinary binary tree is: ① a tree node can store multiple data and the difference between red black tree is: ① The balance of BTree is not achieved by rotating color, but by splitting down to achieve.

Compared with binary tree index: ① Because a tree node can store multiple data, effectively reduce the tree depth. ② Keep balance by splitting down to avoid degenerate into a linked list

In Mysql database, BTree index is used by MyISAM data engine. And, to save space, the MyISAM data engine compresses the values of the index. For example, if you have a string, the second strgda will be stored in the index as 3,gda, to save space.

B ⁺Index of the Tree

B+Tree is also a multi-path balanced Tree. The difference between BTree and B+Tree is that: (1) in structure, all leaf nodes of B+Tree are a linked list, while the leaf nodes of BTree are simple leaf nodes. (2) in data storage, the leaf node linked list of B+Tree stores all node data, sorted from smallest to largest

⁺

⁺

,

,

,
⁺

The implementation process

B⁺Basic Features of Tree

To implement a data structure, you first have to understand what it is, the basic characteristics of the structure, what properties it has. When the number of data is greater than or equal to N, the data is divided into three parts: left, middle, and right. These three parts become three tree nodes. The middle tree node becomes the parent node of the left and right parts. ③ All leaf nodes form an ordered linked list ④ All data are on the leaf node

General structure design

① A tree node is regarded as a class; ② There are multiple data element nodes in the tree node class, and the data element is regarded as a class; ③ There are Pointers to parent nodes in the tree node class. ④ There are Pointers to left and right subtrees in the data element node. My design is that within the same tree node, the first element node on the left may have tree nodes on both sides, while other element nodes only have sub-tree nodes on the right. The leaf node maintains a pointer to the front and back leaf nodes, thus forming a linked list

Define the class

(1) B+Tree is ordered, and the type of data stored is uncertain. 👉 so I needed a Comparable/ Comparable, and I chose Comparable for convenience. A generic is also required, but the specific type of the generic must implement the Comparable interface, i.e., the generic has an upper limit to facilitate comparison.

B+Tree = Root; B+Tree = Root; The head node, the entry to a linked list.

③B+Tree requires some inner classes. Tree node inner class, tree node inside a single data class.

(4) BTree splits upwards when the number of data on a tree node is greater than or equal to N. So we need a global variable N.

public class BTree <T extends Comparable<T>> {
    private int line=3;// Number of data per node
	private Node root;
	private Node head;// the leaf node chain header
	
	/** * tree node *@author Administrator
	 *
	 */
	private class Node<T>{
		// Maintain a list of leaf nodes for sequential use
		private Node pre;
		/ / the parent node
		private Node parnet;
		// A collection of element nodes in a tree node
		private List<Element<T>> eles;
		// Maintain a list of leaf nodes for sequential use
		private Node next;
		public Node(a){
			eles=new LinkedList<>();
		}
		public Node(Element ele){
			this(a); eles.add(ele); }public Node(List<Element<T>> eles){
			this.eles=new LinkedList<>();
			this.eles.addAll(eles);
		}
		@Override
		public String toString(a) {
			if(eles==null) {return null;
			}
			StringBuilder sb=new StringBuilder();
			for(Element e:eles){
				sb.append(e.data).append(",");
			}
			return "Node [eles=" + sb.toString() + "]"; }}/** * single element node *@param <T>
	 */
	private class Element<T>{
		private T data;
		// The left subtree node of the element node
		private Node left;
		// The right subtree node of the element node
		private Node right;
		public Element(T data){
			this.data=data;
		}
		@Override
		public String toString(a) {
			return "Element [data=" + data + ", left=" + left + ", right=" + right + "]"; }}}Copy the code

A constructor

There must be a parameter to pass in, several paths of balanced tree, that is, a tree node has several elements. It’s about split equilibrium.

 private int line=3;// Each tree node can contain the maximum number of element nodes
 public BTree(int line){
		this.line=line;
}
Copy the code

Add data

Visualization process

Take four-way balanced tree as an example. 1. First, when the number of data is less than four, there must be a root node

3. Split into three parts, 4/2=2, with the middle subscript as the boundary. So [67,77], [87], [87,97]. The question arises, why is the third part [87,97], because all the data has to be found in the leaf node.

Code level implementation

B+Tree Add data: ① Start from the root node and compare all data elements in the root node. Unable to find a suitable location, proceed to the next node. ② When a tree node is located, locate the insertion position in the node and insert data into the tree node. ③ Check whether the node in the tree is greater than or equal to N. A. If yes, split the tree node into three nodes to form a child tree. B. If no, insert it directly.

For the split equilibrium of B+Tree, see below.

Add (T T) method

public boolean add(T t){
        // If root is null, there is no node in the tree
		if(root==null){
			root=new Node(new Element<T>(t));
			head=root;
			return true;
		}
		BTree<T>.Element<T> ele = new Element<>(t);
		Node<T> currentNode=root;
		A:while(true) {// Get the list of data currently stored by the node in the access tree
			List<Element<T>> eles = currentNode.eles;
			/** * Find the element ele to be added, the * subscript */ of the first element node greater than ele in currentNode data element of current tree node
			int i=findSetPositInEles(ele, currentNode);
			
			Node nextNode=null;
			
			/** * I >0, indicating that the element node a at I is just larger than the node B to be inserted, so b should be found on the left of A. * B+Tree * B+Tree * B+Tree * B+Tree * B+Tree * B+Tree So the left of a is the right subtree of the node in front of a */
			if(i>0){
				nextNode = eles.get(i-1).right;
			}else{// I <=0, indicating that the node to be inserted is smaller than all the element nodes of the tree node
				// Go down to the left of the node
				nextNode = eles.get(0).left;
			}
			if(nextNode==null){
				eles.add(i, ele);
				// Determine if splitting is required
				if(eles.size()==line){
				    // Split balancing method
					spiltNode(currentNode);
				}
				break A;
			}
			currentNode=nextNode;
		}
		return true;
	}
Copy the code

findSetPositInEles

/** * find the location where the new ele should be inserted in the elements collection@param ele
	 * @param node
	 * @return* /
	public int findSetPositInEles(Element<T> ele,Node node){
		List<Element<T>> eles = node.eles;
		int i=0;
		for(; i<eles.size(); i++){// When the first element node greater than or equal to ele is found, return the corresponding index
			if(ele.data.compareTo(eles.get(i).data)<=0) {returni; }}return i;
	}
Copy the code

SpiltNode (Node Node) Split balance

Node splitting process (for splitting a tree node) : 1. Calculate the middle subscript mid 2 according to the number of element nodes len. Determine whether leaf node 2.1 is a leaf node and divide it into three nodes: [0,mid-1]→lnode, [mid]→mnode, [mid,len-1]→rnode 2.2 Non-leaf node, divide it into three nodes: [0, mid – 1] – > lnode, mid – mnode, [len – mid – 1, 1] – > rnode. 3. If a node has a parent pnode, the Mnode is integrated into the Pnode. If a node has no parent, mnode becomes a parent pnode 4. Lnode and rnode become child pNode 5. If node is originally a leaf node, you need to maintain the linked list formed by leaf nodes, and replace node with LNode → Rnode 6. Check whether a PNode needs to be split. If yes, go to Step 1

    /** * split balance */
	@SuppressWarnings("unchecked")
	private void spiltNode(BTree<T>.Node<T> node){
		int mid=line/2;
		A:while(true) {// Is there a leaf node
			booleanisLeafNode=node.next! =null||node.pre! =null||head==node;
			
			List<Element<T>> eles = node.eles;
			List<BTree<T>.Element<T>> leftEles = eles.subList(0, mid);
			// The leaf node is split into two and generates an element node with the same value as the middle element node, which is pushed up into the parent node; Non-leaf nodes are split into three, and the middle element node is pushed up into the parent nodeList<BTree<T>.Element<T>> rightEles = eles.subList(isLeafNode? mid:mid+1,eles.size());

			// Split the left node
			BTree<T>.Node<T> leftNode=new Node<T>(leftEles);
			// Split the right node
			BTree<T>.Node<T> rightNode=new Node<T>(rightEles);
			// The middle element node of the tree node
			BTree<T>.Element<T> spiltEle = eles.get(mid);
			BTree<T>.Node<T> parnet = node.parnet;
			/** * Updates the leaf node list 

 *  This list is a bidirectional list */
			// If the original split node is a leaf node, it needs to be updated
			if(isLeafNode){
				// Update the list of leaf nodes
				if(node.pre! =null){
					node.pre.next=leftNode;
				}else{
					head=leftNode;
				}
				if(node.next! =null){
					node.next.pre=rightNode;
				}
				leftNode.pre=node.pre;
				leftNode.next=rightNode;
				rightNode.pre=leftNode;
				rightNode.next=node.next;
				
				// After the leaf node is split, it contains the split element, even if the value is the same, but the object is different
// rightEles.set(0, new Element
      
       (spiltEle.data)); The trouble is that rightNode doesn't change rightNode because the eles in rightNode is not pointing to the same set in memory as rightEles
      
				rightNode.eles.set(0.new Element<T>(spiltEle.data));
			}
			/** * split */
			
			// Maintain the relationship at the next level
			// Non-leaf nodes. The right node of the split element is inherited by the remaining right half
			if(! isLeafNode){ rightNode.eles.get(0).left=spiltEle.right;
				// For the children of the original node, their parent object is also changed after splitting
				rightNode.eles.get(0).left.parnet=rightNode;
				
				List<Element<T>> foreachList=leftNode.eles;
				boolean f=false;
				int i=0;
				B:while(true){
					Element<T> temp=foreachList.get(i);
					// The first element is special and can be left or right
					if(i==0){ temp.left.parnet=! f? leftNode:rightNode;if(temp.right! =null){ temp.right.parnet=! f? leftNode:rightNode; }}else{ temp.right.parnet=! f? leftNode:rightNode; } i++;if(! f&&i==foreachList.size()){ foreachList=rightNode.eles; i=0; f=true;
					}else if(f&&i==foreachList.size()){
						breakB; }}}// Maintain the relationship at the next level
			if(parnet==null) {// Split the root node
				spiltEle.left=leftNode;
				spiltEle.right=rightNode;
				Node<T> newNode=new Node<>(spiltEle);
				leftNode.parnet=newNode;
				rightNode.parnet=newNode;
				root=newNode;
				
				return ;
			}else{
				// Inserts spiltEle into the parent node
				List<Element<T>> pEles = parnet.eles;
				int positInEles = findSetPositInEles(spiltEle, parnet);
				pEles.add(positInEles, spiltEle);
				if(positInEles==0){
					pEles.get(0).left=leftNode;
					// Empties the left node of the original 0-bit element
					pEles.get(1).left=null;
				}else{
					pEles.get(positInEles-1).right=leftNode;
				}
				pEles.get(positInEles).right=rightNode;
				leftNode.parnet=parnet;
				rightNode.parnet=parnet;
				
				// Continue to determine whether parent needs to be split
				if(pEles.size()>=line){
					node=parnet;
				}else{
					return; }}}}Copy the code