JDK8 HashMap source code analysis

A, introducing

This article is to analyze JDK8 in the source code of HashMap, JDK8, compared to JDK7, JDK8 added a kind of data structure - red black tree, I believe everyone in the university during the study of data structure will certainly feel familiar with the data structure, red black tree itself is very complex balance tree, This article will not be a very careful analysis of the red-black tree, because of course this is not a single article can solve, but I will link to the relevant information synchronized to youCopy the code

Understand HashMap as a whole

HashMap in JDK8: HashMap in JDK8: HashMap in JDK8public class HashMap<K.V> {
  Node<K,V>[] table;
  int size;
  int modCount;
  int threshold;
  floatloadFactor; } in the same way as in JDK7, a HashMap is still a table array that stores elements. However, JDK8 uses the Node class to represent an element information. Size is still the number of elements in the HashMap, modCount is still the number of changes, threshold is the threshold of the number of elements that need to be reached when the HashMap is expanded, loadFactor loadFactor, The product of table.length and the load factor to obtain the threshold expansion threshold is exactly the same as in the JDK, except that there is no definition of the hashSeed parameter, which I did not specify in the source code of JDK7. Since this parameter has been deprecated in JDK8, I think one of the main reasons is that it is used flexibly in JDK7 to reduce hash collisions (although in most cases we don't know how or how it is used), However, JDK8's changes in the storage structure (from linked list to linked list + red-black tree) guarantee the performance of HashMap even in the case of hash collisions. The element in the linked list is Node type. When the number of elements in a linked list reaches a specified threshold, it will be upgraded to a red-black tree. This operation is called tree-ization, as can be seen from the second table in the figure. A Node array can have both a red-black tree and a linked list ======= Let's look at the definition of Nodestatic class Node<K.V> implements Map.Entry<K.V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

static class Entry<K.V> extends HashMap.Node<K.V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next); }}static final class TreeNode<K.V> extends LinkedHashMap.Entry<K.V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        booleanred; } First, nodes are defined exactly like entries in JDK7. Next is used to form a LinkedHashMap. The most important thing is that TreeNode is inherited from linkedhashmap. Entry. TreeNode is the definition of a red-black TreeNode. If you look at the TreeNode node carefully, all the attributes except prev are required by a red-black TreeNode. For the HashMap add logic, get a hashCode hash from the key in the key-value that needs to be added, and then use that hash to match table.length -1Add key-value to index X. If there is no value at this location, create a Node and put it there. If there is a value at this location, check whether it is a TreeNode. If you know something about red black trees, you can easily imagine that for a data structure with a red black tree index position, the first element in the index position is the root node of the red black tree, The add, delete, modify, and query operations should be defined in TreeNodeCopy the code

Talk about red black trees

3.1 Comparison method of elements in red-black tree

We know that to add element X to a binary tree, we need to start at root, and we need to compare element X to root. If X is smaller than root, we add element X to the left subtree, and vice versa, so it's important to say, The elements added to the red-black tree need to be comparable. In general, we define an object that uses hashCode and equals methods to locate the index position when operating on a HashMap. Equals equals equals equals equals equals equals equals equals equals equals equals equals equals equals equals equals equals equals In a HashMap, we need to understand how a red-black tree compares elements that are not comparable. Let's say we add element X: <1> The first step is to compare the elements using hashCode. If hashCode can compare them, use the comparison result. If not, use 2 <2> then hashCode must be equal. If equals is not equal, then you have to keep looking for it. You don't know if you should look left or right. If the Comparable interface is implemented for element X, then use the Comparable interface. If the Comparable interface is still equal, then use <4>. If the Comparable interface is not implemented, then use <4>. If the Comparable interface returns equality, we cannot consider X to be equal to the current element because we can only determine equality through equals, and in step 2, we already determined that it is not equal. The Comparable interface is only used to determine if we should go to the left or right subtree to find the element <4> At this point, there is no other way to do it. If we were doing a GET operation, we would have to scan both the left and right subtrees to find the element. If we do put, we need to put element X into the leaf node, so we still need to determine the size. This time, we go <5> <5> to determine the size of the element X and the current element's class.name. If the Class name is the same, <6> <6> Use the memory address to determine the size of two elements!! So to summarize, the order in which the red-black tree determines the size of two elements in a HashMap looks like this: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * This operation is only seen in the add operation, the PUT method. In the next section, we will analyze red-black tree inserts firstCopy the code

3.2 Insertion operation of red-black tree

final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab, int h, K k, V v) { Class<? > kc =null;
    boolean searched = false; TreeNode<K,V> root = (parent ! =null)? root() :this;
    for(TreeNode<K,V> p = root;;) {... }} Let's assume that the element we need to add is X, map represents the corresponding HashMap object, TAB is the array of elements in the map, k is the key, v is the value, k and v form an element X, that is, the key-value pair to be added to the HashMap, H is the hash value of k(key), and kc is short for keyClass, because we may later compare the class.name of key to find out whether a subtree of the red-black tree has been searched, To prevent multiple lookups, here's why: We need to check the subtree of the current element to see if X exists. If X exists, we need to check the subtree of the current element to see if X exists. Class. Name -> element X and the key of the current element to determine the size, and determine whether to add to the left subtree or to the right subtree. Let's say we searched for the left subtree, but there may be many more elements in the left subtree, that is, we have not yet reached the leaf node, and then we have to compare the left subtree element to X, so the above operation may be repeated. With the release, I'm going to be able to reduce the subtree query, because I did the query before, and I made sure that the element doesn't exist in the red-black tree and after I explained the traversal above,forThe loop is scanning the red-black tree, in a non-recursive way, so let's look at thisforFirst p is the current node, which must be the root node in the red-black tree at the beginning:for (TreeNode<K,V> p = root;;) {
    int dir, ph; K pk;
    if ((ph = p.hash) > h)
        dir = -1;
    else if (ph < h)
        dir = 1;
    else if((pk = p.key) == k || (k ! =null && k.equals(pk)))
        return p;
    else if ((kc == null &&
              (kc = comparableClassFor(k)) == null) ||
              (dir = compareComparables(kc, k, pk)) == 0) {
        if(! searched) { TreeNode<K,V> q, ch; searched =true;
            if(((ch = p.left) ! =null&& (q = ch.find(h, k, kc)) ! =null) || ((ch = p.right) ! =null&& (q = ch.find(h, k, kc)) ! =null))
                returnq; } dir = tieBreakOrder(k, pk); }... } dir is the result of comparing element X with the currently traversed element, if1If is -, add to the right1Is added to the left if0Ph is the hash value of the current element p, pk is the key of the current element P, i.e. Pk and k(x. key) are to be compared.1) -> equals(2) -> Comparable interface (3) -> class.name(4) -> Memory address (5) If ph > h, if the key of the current element is greater than the key of element X, then element X should be added to the left subtree of the current element, i.e. dir is -1, otherwise,1, these operations correspond to the above (1), if pH = h, it means to go (2), you can see, if equals equal, then directly back to p, because the red-black tree already exists in the element, if you don't have the results, Compare k with pk using compareComparables. If k is Comparable, compare k with pk using compareComparables. So we start looking for elements in the left and right subtrees of the element P, i.eif(! Searchable){} If found, it was returned, or if not, it was searched using the tieBreakOrder() method.4) and (5) to judge, so far, must be able to compare pk and K who is large and who is small, so far, we are on the red-black tree comparison elements of the code analysis, after comparing the size, will start to go behind the code, that is:for(TreeNode<K,V> p = root;;) {... TreeNode<K,V> xp = p;if ((p = (dir <= 0)? p.left : p.right) ==null) {
        Node<K,V> xpn = xp.next;
        TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
        if (dir <= 0)
            xp.left = x;
        else
            xp.right = x;
        xp.next = x;
        x.parent = x.prev = xp;
        if(xpn ! =null)
            ((TreeNode<K,V>)xpn).prev = x;
        moveRootToFront(tab, balanceInsertion(root, x));
        return null; }} Analysis: if dir is less than or equal to0, p is updated to become the left child node of P, otherwise it is updated to become the right child node, if p is alreadynull", it indicates that the leaf node has been found. At this point, the add operation should be started. Otherwise, the add operation will continueforCreate a new node using newTreeNode, add the new node to the left or right node according to dir, update next, parent, prev, etc. The prev field and moveRootToFront will be analyzed later. By now, the new element X has been added to the red black tree, but it is possible that the whole tree does not meet the five properties of the red black tree, so we need to perform certain operations on the red black tree. The tree satisfies the red-black tree properties through the balanceInsertion() method. If you're not familiar with red-black trees, or want to know more about red-black trees, I'm going to post it on Github.//github.com/fightzhong/collection-source-code-analyzeAfter entering the lot, you can see the readme in the description, in the bottom, you can see the directory, follow the directory to see before I wrote the article, believe that will have a great help to understand the red-black tree, here the way, I above is through the red and black tree and fourth order B equivalent to understand, fourth-order tree, also known as B2-3-4Tree, there are also articles on the Internet through the third order B tree to carry out equivalent, namely2-3Tree, however, the fourth order B tree is actually more perfect a point, at the same time, the deletion operation of the red black tree is also analyzed, the deletion operation is the most complex in the red black tree, even if it is on the basis of understanding to write out is also a certain difficulty!! , and then I use red black tree to achieve a HashMap, you can also have a look, so far, we have the correlation analysis of red black tree has ended, with these basic cases, we will look at JDK8 HashMap source code is very easy ！！！！Copy the code

4, HashMap put method source analysis

4.1 Overall analysis of PUT method

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else{... } ++modCount;if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null; Hash is the hash value of the key. OnlyIfAbsent is the last element added to the HashMap. OnlyIfAbsent is the last element added to the HashMap. The evICT field is not used in HashMap. The afterNodeInsertion method is an empty method and will be used in LinkedHashMapifIf table is empty, that is, the array storing data in HashMap has not been initialized, then the resize method is called. This method completes the functions of array initialization and expansion at the same time. Finally, TAB points to the array storing data in HashMap, and n is the second length of the arrayifTo determine, use hash & (n -1) to calculate which index position the key-value should be placed in. We will later call the key-value pair to be added element X, if the silver acid position isnullCreate a Node and place it in the index. If there is a Node in the index, then goelseWe will analyze this separately later. We need to note that after this determination, p already refers to the first element of the corresponding index. After the above steps, the element has been added to the HashMap, which causes modCount to be added1, the size also1, if the size has reached the threshold after the increase, then call the resize method to expand the capacity. Next, let's look at the remaining value aboveelseBranch code: Node<K,V> e; K k;if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
    e = p;
else if (p instanceof TreeNode)
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else{... }if(e ! =null) { // existing mapping for key
    V oldValue = e.value;
    if(! onlyIfAbsent || oldValue ==null)
        e.value = value;
    afterNodeAccess(e);
    returnoldValue; If the hash of p is the same as the hash of the key of element X and equals equals, then the element already exists in the HashMapifJudge the last oneifIf X already exists in the HashMap when the element X is added, then onlyIfAbsent is used to determine whether the value needs to be replaced. AfterNodeAccess is an empty method. In this case, the priority is to determine whether the node p is a TreeNode. If it is a TreeNode, the red-black tree add node operation is called. We have analyzed this in detail beforeelseBlock:else {
    for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
            break; p = e; }} This is a list add operation, except that there is a binCount. This field is used to indicate the length of the list before the element X is addedif ((e = p.next) == null) means that the end of the list is traversed, a new node is created, and p.ext points to it, but binCount is still not added1Operation, since binCount from0Start so when binCount is zero7Is in the linked list8When binCount >=7The treeifyBin method is executed when the number of elements in the list is greater than8Plus the element X9The treeifyBin method is executed. This method is called treification, which converts a list to a red-black tree. Let's look at this methodCopy the code

4.2 treeifyBin tree

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) ! =null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while((e = e.next) ! =null);
        if((tab[index] = hd) ! =null) hd.treeify(tab); }} here we add a judgment if the array length is less than MIN_TREEIFY_CAPACITY(64), the tree will not be implemented, but the expansion operation will be performed, so we draw a conclusion that when the length of a linked list is greater than8And the number of elements in the HashMap is greater than or equal to64When the tree condition is met, the tree will be executed. Otherwise, the tree will be processed by expansionelse, e is assigned to the first node in the list and then passes through ado.whileStatement, convert all nodes of the linked list to TreeNode nodes, and complete the prev field, thus changing a one-way linked list to a two-way linked list. The prev field will be used later, and then analyzed, and finally called Treeify. The argument is TAB, while treeify is the TreeNode method, and HD represents the first element of the entire bidirectional listfinal void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    for (TreeNode<K,V> x = this, next; x ! =null; x = next) {
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;
        if (root == null) {
            x.parent = null;
            x.red = false;
            root = x;
        }
        else{... } } moveRootToFront(tab, root); } The first thing we can see is that the list we need to operate on is already a bidirectional list. The above code uses the root variable to reconstruct a red-black tree. When root is empty, it points to the first element in the listelseStatement block,elseI'm not going to go into the details of the statement block, but it's basically the same thing as the red-black tree addition operation that we did at the beginningforThis is done by moveRootToFront. The root node starts pointing to the first node in the bidirectional list, but as we add elements to the red black tree, The root node is not necessarily the first node in the bidirectional list, so we need to replace the root node with the corresponding index element. In fact, it is very simple, we need to understand that the root node is currently a red black tree, but at the same time, In other words, we're maintaining both a bidirectional list and a red-black tree. We need to take root out of the bidirectional list and put it at the head of the bidirectional list. The data under the corresponding index can be accessed in the form of red-black tree by using the left and right attributes, and in the form of bidirectional linked list by using the next and prev attributesCopy the code

4.3,

JDK8HashMap () {get (); remove () {remove (); There is also a delete operation from the red black tree is also ok, difficult is deleted after the maintenance of the nature of the red black tree section finally there is a resize expansion method, the method is not analyzed, roughly explain the expansion logic: Expansion is doubling the size of the array, and we know that the size of the array is a power of two, so let's say the original capacity was 8, 0000, 1000, and the capacity is 0001 0000, and the position 1 in this new capacity I'm going to call position X, because it's an and operation, When the hashCode of the key of the element in the linked list/red-black tree is 0 in bit X, then the hashCode and the operation with the new capacity will still be the original index position. When bit x is 1, the new index position is equal to the old capacity + the old index position, which is the core of resize method. This completes the migration operation. If a red-black tree splits into two parts, each of those parts, if less than the threshold, degenerates into a linked listCopy the code

A, introducing

Understand HashMap as a whole

Talk about red black trees

3.1 Comparison method of elements in red-black tree

3.2 Insertion operation of red-black tree

4, HashMap put method source analysis

4.1 Overall analysis of PUT method

4.2 treeifyBin tree

4.3,

Related Posts

Digest | Go RESTful API benchmark

Dry! Learn about the retry mechanism

Use pandas, Scipy, and SciKit-learn in PyODPS DataFrame custom functions