Why does HashMap use arrays and linked lists to store key values? Why does HashMap use arrays and how do arrays work? Why does HashMap use linked lists and why does Java8 change linked lists to red-black trees? With that in mind, consider the following analysis:

First, the PUT operation

Start with the code, which will be parsed in detail below

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false.true);
    }

      final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}if(e ! =null) { // existing mapping for key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

  // resize();
Copy the code

If you don’t want to find the source code, go to the HashMap source code

The source code parsing

  1. It hashes the key once, and the calculated value is the index of the key in the Node array. Therefore, when performing the GET operation, it will use this index to find the corresponding key value. The time complexity is O(1). Let’s look at putVal in more detail.
  2. If the array is empty, resize the array. To recap, resize() initializes the Node array with a load factor of 0.75(the default) and a length of 16(the default) if it is empty, otherwise it doubles the size of the current array. To clarify, the array size is 16, but its threshold is only 16*0.75 = 12(capacity * load factor = threshold). As long as the array elements exceed the threshold, resize() is performed to double the size.
if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
Copy the code
  1. Search as an index by bitwise summation of hash value and length -1, if there is no value for this position. Generates a new node to insert.
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
Copy the code
  1. So let’s take a closer look at the logic in else, if this position had a value before
  2. The hash value and key value of this node are not equal to the previous value. If both are equal, then the key is the same and the node is directly returned to the previous node => p.
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
              e = p;
Copy the code
  1. If the keys are different, then there is a hash collision (a hash collision is a case where the keys are different but the hash value is the same), then I check to see if the current node is a tree (red-black tree), and if it is a tree I put the tree node
            else if (p instanceof TreeNode)
              e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
Copy the code
  1. If the key values are not equal and the nodes are not tree nodes, then it means that the linked list is stored now, and the linked list is looped. If the nodes encountered are empty, the nodes are inserted. If the number of inserted nodes exceeds 8, the linked list will be treed. If the same key value is encountered in the loop, the old node is returned.
          else {
              for (int binCount = 0; ; ++binCount) {
                  if ((e = p.next) == null) {
                      p.next = newNode(hash, key, value, null);
                      if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                          treeifyBin(tab, hash);
                      break;
                  }
                  if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                      break; p = e; }}Copy the code
  1. After a lot of twists and turns, we finally get the location of the node in the array, but at this time, we need to check whether the location already exists data, if there is the previous data back to you, if it is empty, it means that you can insert the data you want. We’ve got the location where we’re going to insert the data, and it may or may not be empty at this point.
            if(e ! =null) { // existing mapping for key
                V oldValue = e.value;
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
Copy the code
  1. At the end of the put operation, there is a modCount, which counts the number of times the data has been modified. Why this number is needed is briefly mentioned here, because a HashMap is not thread safe. If a HashMap uses an iterator but another thread has modified the HashMap, There will be a ConcurrentModificationException, this exception is thrown because, in the generated HashMap iterator will this revision number is assigned to an iterator, when other thread to modify a HashMap and can cause data inconsistencies, So using this number of changes is an alternative thread-safe, fail-fast strategy.
  2. Size (which refers to the size of the current HashMap) is incremented because the node is inserted into the array. If the threshold is exceeded, the array is doubled. AfterNodeInsertion (EVict) afterNodeInsertion(EVict) in HashMap afterNodeInsertion(EVict) Remove the object that was first put into the Map by calling back and forth. The PUT operation ends.
  ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
Copy the code

Whew, after all this talk, I finally got to get

I’ll explain it in code first

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
      final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if((tab = table) ! =null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) ! =null) {
            if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
                return first;
            if((e = first.next) ! =null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        return e;
                } while((e = e.next) ! =null); }}return null;
    }
Copy the code
  1. Similarly, get is implemented by calling getNode, which hashes the key, passes in a function, and returns the value returned by getNode.
  2. The first if evaluates if the table is empty and returns a null value
    if((tab = table) ! =null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) ! =null) {
Copy the code
  1. If it is not empty, the hash value and key value of the node at the first position are the same. (Why to look at the first position first, because the probability of hash collision is very small. Usually, there is only one node in the linked list, so it is not necessary to loop through the whole list.
             if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
                return first;
Copy the code
  1. Of course, if you have more than one node in the list then you have to iterate over it, and if you have multiple hash collisions this will not get away. If the node is a tree node, use the tree node’s get method to fetch the number. This is where get ends.
             if((e = first.next) ! =null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        return e;
                } while((e = e.next) ! =null);
            }
Copy the code

Finally, to answer the initial question why java8 changed to red-black tree storage when linked list data exceeded 8?

This involves refused to service attack, such as some people by finding the value of your hash collision, to make you a HashMap collision occurs unceasingly, the key position of the same list will grow, when you need to the corresponding position of the HashMap when queried, would be to loop through the super large list, performance and its underground. Java8 has improved query performance from O(n) to O(logn) by using red-black trees instead of lists with more than eight nodes.