Introduction: As a container for storing data, HashMap's importance is self-evident. This article analyzes the HashMap source code by focusing on its data structure, constructors, and the put, get, and resize methods, aiming to read the hardest code in the simplest way possible.

One: The data structure of HashMap

1.1: Data structure underlying the HashMap

In JDK 1.7, HashMap is implemented with an array plus linked lists. As the amount of stored data grows, however, two problems emerge: first, the probability of hash collisions rises, so the linked lists get longer and longer; second, as a list grows, lookups through it get slower. JDK 1.8 therefore adds a red-black tree on top of the JDK 1.7 design: under certain conditions, a bucket's storage is converted from a singly linked list to a red-black tree to improve query efficiency. The basic structure of HashMap in JDK 8 is shown in the figure.

As you can see from the figure, a HashMap consists of an array whose buckets hold either a singly linked list or a red-black tree. The slots 0-15 form the hash array (16 slots long by default), and each occupied slot holds the head node of a linked list. The procedure for storing data in a HashMap is as follows: first, a hash value is computed from the key's hashCode, which selects the array slot in which to store the element. If different keys map to the same slot, a hash collision occurs and the colliding entries are stored in the same linked list. This is the situation for which JDK 1.8 introduces the red-black tree structure; however, there are certain preconditions for using it, which we will study below and summarize later.

From the above we understand the storage process; the internal implementation is straightforward to describe and will be covered in detail in the source code that follows. Once the hash has been computed from the key, the element is stored in a linked list whose elements are of type Node<K,V>, a static inner class of HashMap:

static class Node<K,V> implements Map.Entry<K,V> {
  	// hash value
    final int hash;
    final K key;
    V value;
  	// points to the next element
    Node<K,V> next;
}


In fact, a HashMap is backed by an array, but it stores data into the hash table as Node<K,V> entries. Conceptually, the storage location is determined by the formula

int index = hash(key) % table.length

In the actual JDK 8 source this is computed as (table.length - 1) & hash, which gives the same result because the table length is always a power of two.
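As a quick sanity check, here is a minimal sketch (the class name IndexDemo is ours, not from the JDK) showing that for a power-of-two length the bitwise form and the modulo form agree:

```java
public class IndexDemo {
    // For a table length n that is a power of two, (n - 1) & hash equals
    // hash % n for non-negative hashes; this is why HashMap keeps its
    // capacity at 2 to the Nth power.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[]{5, 21, 37, 53}) {
            System.out.println(hash + " -> index " + index(hash, n)
                    + " (hash % n = " + (hash % n) + ")");
        }
    }
}
```

All four hashes above land on index 5 for a table of length 16, which is exactly the collision scenario discussed next.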

Two: The concrete implementation of HashMap

2.1: Detailed introduction to member attributes in HashMap

 		// The default initial capacity is 16 and the array must be 2 to the NTH power
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

		// Set the maximum size of the array to 2 to the 30th power
    static final int MAXIMUM_CAPACITY = 1 << 30;

  	// Load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

  	// The base threshold at which lists can be converted to red-black trees (list length >=8)
    static final int TREEIFY_THRESHOLD = 8;

  	// If the number of nodes in a red-black tree is less than 6, the hash table is degraded to a linked list
    static final int UNTREEIFY_THRESHOLD = 6;

    // The hash table length must be greater than or equal to 64 before the list is converted to a red-black tree
    static final int MIN_TREEIFY_CAPACITY = 64;

    // The array of Node<K,V> elements (the vertical array in the HashMap
    // structure described above); its length must be 2 to the Nth power
    transient Node<K,V>[] table;

    // A Set view of the entries; HashMap exposes its data as a Set of
    // Map.Entry<K,V>, which is mainly used for iteration
    transient Set<Map.Entry<K,V>> entrySet;

    // The number of key-value pairs already stored, counting nodes in both
    // the array and the linked lists
    transient int size;

    // The number of internal map elements modified
    transient int modCount;

    // The threshold for capacity expansion, or the maximum number of key-value pairs that can be accommodated. When size>=threshold, resize() will be expanded.
    int threshold;

    // Customize the load factor
    final float loadFactor;

  • DEFAULT_LOAD_FACTOR = 0.75f is the default load factor, which is the basis for expanding the hash table. When the number of elements stored in the HashMap reaches the current capacity of the hash table multiplied by this load factor, the expansion mechanism is triggered and the capacity doubles. For example, if we create a HashMap without specifying the table size, the default capacity is 16; once the number of entries stored via put reaches 16 * 0.75f = 12, the table is expanded to 32.

  • TREEIFY_THRESHOLD = 8 and MIN_TREEIFY_CAPACITY = 64 constitute the conditions for converting a linked list to a red-black tree. If the hash table length is greater than or equal to 64 and the linked list length reaches 8, the list is converted to a red-black tree. Otherwise, the capacity expansion mechanism is used.

  • UNTREEIFY_THRESHOLD = 6 Indicates the threshold for a red-black tree to degenerate into a linked list. When the number of red-black tree nodes is less than or equal to 6, the red-black tree will be converted into a common linked list.

  • modCount records the number of times key-value pairs are added or deleted, or the internal structure is otherwise modified. Its main role is a consistency check during traversal: if the map is structurally modified while an iterator is in use, a ConcurrentModificationException is thrown immediately (fail-fast behavior).
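To see the fail-fast behavior backed by modCount in action, here is a small sketch (the class name ModCountDemo is our own) that structurally modifies a HashMap during iteration and catches the resulting exception:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class ModCountDemo {
    // Structurally modifies the map while iterating over it; the iterator's
    // modCount check detects the change and throws.
    static boolean failsFast() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // bumps modCount behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true; // the fail-fast check fired
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("fail-fast triggered: " + failsFast());
    }
}
```

Using iterator.remove() instead of map.remove() would keep modCount and the iterator's expected count in sync and avoid the exception.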

2.2: HashMap linked list Node<K,V>

   static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        /**
         * The hash value of each node is the XOR (^) of the key's hashCode
         * and the value's hashCode. For each bit, XOR yields 0 if the two
         * operand bits are the same and 1 if they differ.
         */
        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>) o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }
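Node.hashCode() follows the general Map.Entry contract: the key's hashCode XORed with the value's hashCode. We can check that contract through the public AbstractMap.SimpleEntry, which implements the same rule (the class name EntryHashDemo is our own):

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Objects;

public class EntryHashDemo {
    // Map.Entry specifies hashCode() as keyHash ^ valueHash, the same rule
    // HashMap.Node implements; AbstractMap.SimpleEntry follows it too.
    static boolean matchesSpec(String key, String value) {
        Map.Entry<String, String> e = new AbstractMap.SimpleEntry<>(key, value);
        return e.hashCode() == (Objects.hashCode(key) ^ Objects.hashCode(value));
    }

    public static void main(String[] args) {
        System.out.println(matchesSpec("k", "v"));
    }
}
```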

2.3: Constructor of HashMap

There are four constructors of HashMap in JDK1.8:

// Method 1:
// Initialize the capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
  			// If the initial capacity is greater than the specified maximum capacity, then the maximum capacity is assigned to the initial capacity
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
  
      // The tableSizeFor() method computes the smallest power of 2 that is
      // greater than or equal to initialCapacity
  		// For example, if 20 is passed in, since 20 lies between the 4th and
  		// 5th powers of 2, the result is 32
        this.threshold = tableSizeFor(initialCapacity);
    }

// Method 2:
 public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

// Method 3:
public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }


// Method 4: Pass in a parameter of type Map and add elements from the parameter Map to the HashMap table
public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
  			// Get map size,
        int s = m.size();
        if (s > 0) {
          	// The current hash table is empty
            if (table == null) { // pre-size
              // Calculate the appropriate capacity from the number of map
              // elements and the current load factor
              // A useful takeaway: this same formula works when choosing our
              // own initial capacity
                float ft = ((float)s / loadFactor) + 1.0F;
              	// Let the specified threshold not exceed the maximum value
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                         (int)ft : MAXIMUM_CAPACITY);
             		// If the computed size is greater than the current threshold, store the nearest power of 2 as the new threshold
                if (t > threshold)
                    threshold = tableSizeFor(t);
            }
            // If the element table is not empty and the size of the new map is greater than the old threshold, the expansion mechanism is enabled
            else if (s > threshold)
                resize();
            // Traverse the map to add elements to the current table
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict);
            }
        }
    }

// Capacity expansion mechanism
 final Node<K,V>[] resize() {
   			// Represents the current hash table (bucket)
        Node<K,V>[] oldTab = table;
   			// The length of the current hash table
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
   			// Current capacity expansion threshold
        int oldThr = threshold;
   			// Define the new hash table length and expansion threshold
        int newCap, newThr = 0;
   			// If the length of the current hash table is greater than 0
        if (oldCap > 0) {
          	// If the length of the current hash table is greater than or equal to the maximum capacity, the expansion threshold is set to the maximum value of Integer
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // If the length of the current hash table is not the maximum, twice the length of the current hash table is smaller than the maximum capacity and greater than or equal to the default initial length
          	// Then the new hash is twice as long as the old one
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // If the current hash table is empty and has an initialization threshold, the length of the new table is equal to the threshold of the old table
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
          // If the current hash table is empty, use the default arguments
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
   			// If the new hash table does not have a threshold, because there is no expansion threshold set in the above condition,
        if (newThr == 0) {
          	// Get the threshold based on the new table capacity and custom load factor
            float ft = (float)newCap * loadFactor;
          	// Ensure that the threshold is within the specified range
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
   			// Update the expansion threshold
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
   					// Create a new hash table based on the new length obtained
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
   			// Assign
        table = newTab;
   			// If there are elements in the old hash table, then the elements in the old hash table are moved to the new hash table
        if (oldTab != null) {
          	// Iterate over the old hash table
            for (int j = 0; j < oldCap; ++j) {
              	// Get the current node
                Node<K,V> e;
              	// If the current node has elements, the current list is assigned to e
                if ((e = oldTab[j]) != null) {
                  	// Set the old hash node to null
                    oldTab[j] = null;
                  	// If the list under the current hash node has only one element, then the index is computed directly into the new array
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                   // If the node is the head of a red-black tree, execute the
                   // split() method to divide the tree between the two new
                   // buckets, falling back to a linked list if a part becomes
                   // small enough
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                      /**
                       * A hash collision occurred and the bucket holds a plain
                       * linked list (fewer than 8 nodes): each node's hash
                       * decides its slot in the new hash table. Because the
                       * table has doubled in size, a node's position may change
                       * during the move.
                       * The "lo" prefix marks nodes that stay at the original
                       * bucket index; the "hi" prefix marks nodes that move to
                       * the new bucket index. Head is the head node and Tail
                       * is the tail node.
                       */
                      	// The first and last nodes of the original list
                        Node<K,V> loHead = null, loTail = null;
                      	// The first and last nodes of the new list
                        Node<K,V> hiHead = null, hiTail = null;
                      	// point to the next node of e
                        Node<K,V> next;
                        do {
                            next = e.next;
                           	// If-else: iterates through a list of elements
                            // if (e.hashes & oldCap) == 0, then store in loHead chain
                            // Do not meet the requirements of the hiHead chain, and are tail insertion, which is different from JDK7 header insertion
                          	// There are detailed explanations below
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                      // Place the loHead chain in the array
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                      // Place the hiHead chain at position (j + oldCap) in the array
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
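The tableSizeFor() method referenced in the constructor above rounds a requested capacity up to the nearest power of two. The following sketch copies the JDK 8 bit-smearing logic into a standalone class (TableSizeForDemo is our own name) so the behavior can be observed directly:

```java
public class TableSizeForDemo {
    // A copy of the JDK 8 tableSizeFor logic: smear the highest set bit of
    // (cap - 1) into every lower position, then add 1 to reach the smallest
    // power of two that is >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= (1 << 30)) ? (1 << 30) : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(20)); // 32: nearest power of two >= 20
        System.out.println(tableSizeFor(16)); // 16: already a power of two
    }
}
```

The initial `cap - 1` is what makes an exact power of two map to itself rather than the next power up.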
  • (e.hash & oldCap) == 0 decides which of the two lists each node joins, and the two lists are then placed at different positions in the new array.

  • The key's hash value is ANDed (&) with the old array length: a result bit is 1 only when both operand bits are 1. Since oldCap has a single bit set, the operation isolates the hash bit just above the old index mask, and whether that bit is 0 or 1 determines the element's position in the new array.

  • Here’s an example:

    Key/value pair   Key hash (binary)                & before expansion   Index before   & after expansion   Index after
    { 5 : A }        0000 0000 0000 0000 0000 0101    0101                 5              0 0101              5
    { 21 : B }       0000 0000 0000 0000 0001 0101    0101                 5              1 0101              21
    { 37 : C }       0000 0000 0000 0000 0010 0101    0101                 5              0 0101              5
    { 53 : D }       0000 0000 0000 0000 0011 0101    0101                 5              1 0101              21

    Look at the table above: the key's hash is ANDed with (array length - 1). The default array length is 16, so the mask is 15, binary 1111. The hashes of 5, 21, 37 and 53 all yield 0101 when ANDed with 1111, so they are stored in the same linked list at index 5. When the array is expanded to 32, the mask gains one binary bit and becomes 1 1111 (2 to the 5th power - 1 = 31). The same AND now yields either 0 0101 or 1 0101: 0 0101 is index 5 as before, while 1 0101 equals 0 0101 + 1 0000 = 5 + 16 = j + oldCap = 21, so after expansion those elements are moved to index 21. The number of mask bits grows from 4 to 5 because the expansion mechanism always doubles the capacity by powers of 2.

  • Conclusion: the array length is always a power of 2, so expansion extends the index mask by exactly one bit, and that new highest bit of the hash determines the element's index. If the bit is 0, no position adjustment is needed; if it is 1, the element moves to position j + oldCap.
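The lo/hi split described above can be reproduced outside HashMap. This sketch (SplitDemo is our own class name) recomputes the table's example: for oldCap = 16, the (hash & oldCap) bit decides whether a node stays at index j or moves to j + oldCap:

```java
public class SplitDemo {
    // During resize, the bit of the hash at position oldCap decides whether a
    // node stays at its old index j ("lo" chain) or moves to j + oldCap
    // ("hi" chain).
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                    // old bucket index
        return (hash & oldCap) == 0 ? j : j + oldCap;   // lo stays, hi moves
    }

    public static void main(String[] args) {
        int oldCap = 16;
        for (int hash : new int[]{5, 21, 37, 53}) {
            System.out.println(hash + ": " + (hash & (oldCap - 1))
                    + " -> " + newIndex(hash, oldCap));
        }
    }
}
```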

2.4: Some of the main methods in HashMap

2.4.1: The put() method in HashMap
 public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }


	/**
	 * onlyIfAbsent: when true, an existing mapping's value is replaced only
	 * if the old value is null.
	 * evict: used by LinkedHashMap's after-insertion hook; in HashMap the
	 * hook is an empty implementation, so it has no effect here.
	 */
   final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
     		// the variable TAB is the reference to the array to be operated on, p is the node to be operated on, n is the length of TAB, and I is the subscript of TAB
        Node<K,V>[] tab; Node<K,V> p; int n, i;
     		// If the TAB is not initialized, use resize() to expand and initialize the TAB
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
     		// Compute the index i = (n - 1) & hash and assign the head node of that bucket to p: p = tab[i]
        if ((p = tab[i = (n - 1) & hash]) == null)
          	// If the value is null, there is no previous element, no hash collision, and a new node is inserted
            tab[i] = newNode(hash, key, value, null);
        else {
          // when p is not null
            Node<K,V> e; K k;
          	// If the hash value of p is equal to the hash value of key, or the key of p is the same as the key passed in
            if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
             // If equal, the reference to p is assigned to e
                e = p;
            // If p is a red-black tree node
            else if (p instanceof TreeNode)
                // put it in and return it to e
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
              	// The next step is to determine whether the inserted list remains a linked list or becomes a red-black tree
              	//binCount Counts the number of linked list elements
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                      	// When the next element of p in the list is null, insert that element after p,
                        // And give a reference to the new node to p.ext
                      	// Pass the last argument to null to ensure that this node is the last one
                        p.next = newNode(hash, key, value, null);
                      	// After a successful insertion, decide whether to
                      	// convert the list to a red-black tree structure
                      	// The new element has been added but binCount has not
                      	// been incremented yet, hence the -1 in the comparison
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // If the list has reached 8 nodes and the array
                            // length has reached 64, convert it to red-black
                            // tree storage; otherwise expand the capacity
                            treeifyBin(tab, hash);
                        break;
                    }
                  	// If a node with the same key is found, assign its
                  	// reference to e for uniform handling below
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // Handle the case where the key already exists
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
              // onlyIfAbsent controls behavior when the key already exists:
              // the value is overwritten when onlyIfAbsent is false or the
              // old value is null
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
              // The end operation of linkedHashMap has no real meaning in HashMap
                afterNodeAccess(e);
              	// oldValue is returned if the key was inserted before it existed, or null if new data was added
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
     		// The end operation of linkedHashMap has no real meaning in HashMap
        afterNodeInsertion(evict);
        return null;
    }
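The return-value behavior of putVal(), including the onlyIfAbsent path taken by putIfAbsent(), can be observed through the public API (PutDemo is our own class name):

```java
import java.util.HashMap;

public class PutDemo {
    // put() returns null for a new key and the old value when overwriting;
    // putIfAbsent() goes through putVal with onlyIfAbsent = true, so an
    // existing non-null value is kept.
    static boolean behavesAsDescribed() {
        HashMap<String, Integer> map = new HashMap<>();
        boolean ok = map.put("a", 1) == null;  // new key: returns null
        ok &= map.put("a", 2) == 1;            // overwrite: returns old value 1
        ok &= map.putIfAbsent("a", 3) == 2;    // kept: returns current value 2
        ok &= map.get("a") == 2;               // value was not replaced
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(behavesAsDescribed());
    }
}
```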
2.4.2: The get() method in HashMap
 public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

// Get value based on the hash value of key
 final Node<K,V> getNode(int hash, Object key) {
    // tab: the internal array; first: the head node of the indexed bucket;
    // n: the array length; k: the key of the head node
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // Check that the table is not null, its length is greater than 0,
        // and the head node at index (n - 1) & hash is not null; if the
        // bucket exists, examine its elements
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Always check the first node: same hash and an equal key is a hit
            if (first.hash == hash &&
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // Otherwise traverse the linked list or red-black tree until the
            // next element is null
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                    // If the key differs, continue until the end of the list,
                    // i.e. e.next == null
                } while ((e = e.next) != null);
            }
        }
        // Return null if not found
        return null;
    }

To summarize getNode():

  • Check whether the hash table array is null and whether the head node of the indexed bucket is null.
  • Check whether the first node has the same hash value and a key that matches equals().
  • If it is neither the first node nor a red-black tree, traverse the linked list to find the node whose key equals the given key.
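A hands-on way to see getNode() walk a collision chain is to use two keys with identical hashCodes; "Aa" and "BB" are a well-known colliding pair (GetDemo is our own class name):

```java
import java.util.HashMap;

public class GetDemo {
    // "Aa" and "BB" have the same hashCode (2112), so both keys land in one
    // bucket and getNode() must walk the chain comparing keys with equals().
    static boolean collisionLookupWorks() {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return "Aa".hashCode() == "BB".hashCode()
                && map.get("Aa") == 1
                && map.get("BB") == 2;
    }

    public static void main(String[] args) {
        System.out.println(collisionLookupWorks());
    }
}
```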

Three: Common problems in HashMap

3.1: What is the capacity expansion mechanism of HashMap and when will capacity expansion occur?

  • The hash table is null or has a length of 0.

  • The number of key-value pairs stored in the hash table exceeds threshold.

  • Capacity expansion also occurs when a list's length reaches 8 but the hash table length has not yet reached 64 (in that case treeifyBin() resizes instead of treeifying).

3.2: HashMap is not thread-safe. Why? How to keep thread safe?

Parsing: There is such code in the put() method

 if ((p = tab[i = (n - 1) & hash]) == null) // Insert elements directly if there is no hash collision
  tab[i] = newNode(hash, key, value, null);

Suppose threads A and B both put elements whose keys hash to the same array index. Thread A enters the if condition, sees the bucket is empty, and is suspended before it inserts. Thread B then runs, passes the same check, and successfully inserts its element. When thread A resumes, it has already passed the hash-collision check, so it inserts as well and overwrites the data thread B just wrote. The result is inconsistent data, so HashMap is not thread-safe.

Where thread safety is required, this can be handled with

Collections.synchronizedMap()

which wraps the map so that every operation is synchronized.
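A minimal sketch of the synchronized wrapper (SafeMapDemo is our own class name); ConcurrentHashMap is shown alongside since it is usually the better choice under heavy concurrency:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    static int demo() {
        // Collections.synchronizedMap wraps every method of the backing
        // HashMap in a synchronized block on a single lock.
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());
        syncMap.put("a", 1);

        // ConcurrentHashMap is usually preferred under contention: JDK 8
        // uses CAS plus per-bin synchronization instead of one global lock.
        Map<String, Integer> concurrentMap = new ConcurrentHashMap<>();
        concurrentMap.put("a", 1);

        return syncMap.get("a") + concurrentMap.get("a");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```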

3.3: Why is 8 selected as the threshold value for converting linked lists to red-black trees in HashMap?

According to the Poisson distribution, with the default load factor of 0.75 the probability of a single hash bucket accumulating 8 elements is about 0.00000006, which is practically negligible. 8 is therefore a safe threshold: at 7 or fewer nodes the list is left alone, at 8 or more it is converted to a red-black tree, and when a tree shrinks to 6 or fewer nodes it is converted back to a linked list. The gap between 8 and 6 avoids frequent switching between the two structures.

At this point, HashMap source code research comes to an end, welcome to discuss and learn together.