HashMap core attributes

The core attributes of HashMap are listed in the figure below. One thing to note is how the expansion threshold is calculated:

Expansion threshold = load factor (loadFactor) × array length (capacity)
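As a quick sanity check, the formula can be sketched in a few lines (a minimal illustration, not the JDK's internal code; the class and method names are made up):

```java
public class ThresholdDemo {
    // Mirrors the formula used in resize(): threshold = capacity * loadFactor
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // A default HashMap (capacity 16, load factor 0.75) resizes
        // once the 13th entry pushes size past the threshold of 12
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```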

Insert method put()

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
```

The put() method simply calls two other methods: hash() and putVal().

Perturbation function hash()

Source:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

Analysis:

The hash() method takes a key. It returns 0 if the key is null; otherwise it returns an int obtained by XORing the key's hashCode() value h with h shifted right (unsigned) by 16 bits. This one line improves the distribution of the hash.

How does the hash() function improve the distribution?

What is the advantage of XORing the key's hash h with its own high 16 bits?

The hashCode() method of Object returns an int. In Java, an int is 4 × 8 = 32 bits. If the hash table is small, the routing algorithm hash & (length - 1) only uses the low bits of the hash. Hashes with the same low bits but different high bits therefore collide, as in:

```java
// Hash collision example (table length 16, mask 1111):
// H1: 00000000 00000000 00000000 00000101 & 1111 = 0101
// H2: 00000000 11111111 00000000 00000101 & 1111 = 0101
```

With the perturbation added, the high 16 bits of the hash are shifted right and XORed with the original hash value, mixing the high 16 bits into the low 16 bits to obtain better-distributed low bits, as in the following example:

```java
// 00000000 00000000 00000000 00000101  // H1
// 00000000 00000000 00000000 00000000  // H1 >>> 16
// 00000000 00000000 00000000 00000101  // hash1 = H1 ^ (H1 >>> 16) = 5
//
// 00000000 11111111 00000000 00000101  // H2
// 00000000 00000000 00000000 11111111  // H2 >>> 16
// 00000000 11111111 00000000 11111010  // hash2 = H2 ^ (H2 >>> 16), low 8 bits = 11111010 = 250
```

In the end, the routing computation yields different indices:

```java
// No hash collision:
// index1 = (n - 1) & hash1 = (16 - 1) & 5   = 5
// index2 = (n - 1) & hash2 = (16 - 1) & 250 = 10
```
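The two examples above can be reproduced with a small runnable sketch (the class name and sample values are illustrative; the hash() body matches the JDK 8 perturbation shown earlier):

```java
public class HashDemo {
    // Same perturbation as JDK 8's HashMap.hash() applied to a raw hashCode
    static int hash(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length
        int h1 = 0b00000000_00000000_00000000_00000101; // H1 = 5
        int h2 = 0b00000000_11111111_00000000_00000101; // H2: same low bits, different high bits
        // Without perturbation both keys land in bucket 5 (a collision):
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2)); // 5 5
        // With perturbation the high bits separate them:
        System.out.println(((n - 1) & hash(h1)) + " " + ((n - 1) & hash(h2))); // 5 10
    }
}
```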

putVal()

putVal() is the method that actually inserts the element, but it is not public. Here is how putVal() inserts an element.

First, check whether the hash table has been initialized. If not, call the expansion method resize() at this point (described below) to initialize it; this is the step that takes the table from table == null or table.length == 0 to an initialized table.

This raises a question: why is a HashMap not initialized eagerly when it is created with new, but lazily on the first call to put()?

Because the hash table takes up memory, and the user might create a HashMap and never use it; delayed initialization reduces memory usage.

If the hash table is already initialized, the bucket for the element is determined by route addressing (routing algorithm: index = (table.length - 1) & hash):

  • If no element exists at that index, the new node is stored there directly

  • If an element exists, check whether its key is the same as the key to be inserted; if so, the value is replaced

    If not, determine whether the bucket holds a linked list or a red-black tree.

    If the current position is a linked list, the list is iterated to see whether a node with the same key exists, and the value is replaced if so

    If not, the new node is inserted at the tail, and the treeification conditions are then checked

What are the conditions for treeification?
  • If the list length >= 8 and the array length >= 64, the list is converted to a red-black tree.
  • If the list length >= 8 but the array length is less than 64, the table is expanded rather than converted to a red-black tree.
  • When the number of nodes in a red-black tree drops to <= 6, it is converted back into a linked list.
Why must the array length reach 64 before the list is converted to a red-black tree?

When the array is short, say 16, a list of length 8 already represents 50% of the capacity, so the average load is close to the resize ceiling anyway. If the list were converted to a red-black tree now, the tree would only have to be split again into the new array after the next expansion, which would hurt performance rather than help it. Therefore, when the array length is less than 64, expansion is preferred.

Why is the treeification threshold 8, rather than 7 or 9?

Tree nodes are larger than ordinary nodes. For short lists, a red-black tree shows no obvious performance advantage and only wastes space, so a linked list is used instead. Mathematically (with load factor 0.75, per the Poisson distribution cited in the JDK source comments), the probability of a bucket's list reaching length 8 is about 0.00000006, i.e. vanishingly small. With 8 as the watershed, a bin that reaches 8 becomes a red-black tree, and one that shrinks back to 6 reverts to a linked list. The red-black tree exists to withstand a large number of hash collisions in extreme cases; in the normal case a linked list is more appropriate.
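The probability cited above can be checked by evaluating the Poisson distribution with λ = 0.5, the parameter quoted in the JDK 8 HashMap source comment (a quick sketch; the helper class is not part of HashMap):

```java
public class PoissonDemo {
    // P(X = k) for a Poisson distribution with lambda = 0.5
    static double poisson(int k) {
        double p = Math.exp(-0.5);
        for (int i = 1; i <= k; i++) {
            p *= 0.5 / i; // multiply in lambda^k / k! term by term
        }
        return p;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(k));
        }
        // k = 8 prints roughly 0.00000006, matching the JDK comment
        // and motivating TREEIFY_THRESHOLD = 8
    }
}
```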

Source:

```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    // tab is the current hash table, p is the node at the target bucket
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Lazy initialization: if the table is null or empty, resize() creates it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Route addressing: if the bucket at (n - 1) & hash is empty,
    // place a new node there directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // The bucket is occupied. Three cases:
        // ① the first node has the same key  ② red-black tree  ③ linked list
        // e will hold the node with the same key (if any), k is a temporary key
        Node<K,V> e; K k;
        // ① the first node's key equals the key to be inserted
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // remember the node so the value can be replaced below
            e = p;
        // ② red-black tree: insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // ③ linked list: walk to the tail
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // reached the tail: append the new node
                    p.next = newNode(hash, key, value, null);
                    // after inserting, check whether the treeification
                    // conditions are met
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // found a node in the list with the same key: stop the loop
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // p = p.next
                p = e;
            }
        }
        // e is the node whose key matches the inserted key; if e is not null,
        // a node with the same key was found, so replace its value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // replace the old value with the new one
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // expand the table if the new size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```

Expansion method resize()

The expansion method resize() is divided into two parts in the source code

  • Set the new array length newCap and the new expansion threshold
  • Start real expansion

The first part computes the new array length and the new expansion threshold; the calculation differs depending on the state of the hash table before expansion.

The second part performs the actual expansion:

  • Create a new array
  • Copy each element into the new array
    • If the bucket holds a single node, it is placed directly into the new hash table at the index computed by route addressing from its hash value
    • If the bucket holds a linked list, each node is classified into a high-order or a low-order list
      • High-order list: placed at (old array index + old array length) in the new array
      • Low-order list: placed at the same index as in the old array

Source:

```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Part 1: compute the new capacity and the new expansion threshold
    if (oldCap > 0) {
        // the table was initialized normally before this resize
        if (oldCap >= MAXIMUM_CAPACITY) {
            // already at the maximum: raise the threshold to int max
            // and stop expanding
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // set the new array length to twice the old one; if that fits and
        // the old capacity is at least the default, double the threshold too
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // oldCap == 0 but the old threshold is greater than 0: the table is not
    // yet initialized but a capacity was requested via new HashMap(initCap),
    // new HashMap(initCap, loadFactor) or new HashMap(map)
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        // compute the new threshold from the default load factor
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // install the new expansion threshold
    threshold = newThr;
    // Part 2: construct the expanded table and move every element over
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // the old hash table is not empty: migrate its contents
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null; // release the old slot
                if (e.next == null)
                    // ① single node: recompute its index in the new table
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // ② red-black tree: split it between the low and high slots
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // ③ linked list: loHead/loTail track the low-order list,
                    // hiHead/hiTail track the high-order list
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // iterate over the list
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            // low-order list: index stays the same
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            // high-order list: index moves up by oldCap
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        // cut the tail; stale references may still exist
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

In the linked-list case, how are the low-order and high-order lists distinguished?

A node with e.hash & oldCap == 0 belongs to the low-order list; otherwise it belongs to the high-order list.

Analysis:

What are high-order and low-order linked lists?

In the following figure, the array length before expansion is 16 and the length after expansion is 32. At index 15 of the pre-expansion hash table, the index is derived from hash & (n - 1) = hash & 1111, so the last four bits of the hash of every element in the linked list at index 15 must be 1111.

So there are two possibilities:

  • The bit in front of 1111 is 1, i.e. the hash value is ...1 1111

  • The bit in front of 1111 is 0, i.e. the hash value is ...0 1111

Whether that leading bit is 1 or 0 determines whether a list element belongs to the high-order or the low-order list, and therefore where it is stored in the new array.

How does e.hash & oldCap == 0 distinguish the two cases?

oldCap is the array length before expansion; here oldCap = 16, which is 1 0000 in binary.

e.hash & oldCap therefore has two possible results:

  • 1 1111 & 1 0000 = 1 0000: the fifth bit is 1, so the node goes into the high-order list

  • 0 1111 & 1 0000 = 0 0000: the fifth bit is 0, so the node goes into the low-order list
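The split rule above can be demonstrated with a small runnable sketch (the class and method names are illustrative, assuming oldCap = 16 and old bucket index 15 as in the example):

```java
public class SplitDemo {
    // Decide where a node from old bucket j lands after the table doubles:
    // bit (hash & oldCap) == 0 means the low-order list (index unchanged),
    // otherwise the high-order list (index j + oldCap)
    static int newIndex(int hash, int j, int oldCap) {
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, j = 15;
        // Hashes whose low four bits are 1111 all sit in old bucket 15
        System.out.println(newIndex(0b0_1111, j, oldCap)); // 15: low list, stays put
        System.out.println(newIndex(0b1_1111, j, oldCap)); // 31: high list, 15 + 16
        // Sanity check: full routing on the doubled table agrees
        System.out.println((2 * oldCap - 1) & 0b1_1111);   // 31
    }
}
```

This is why resize() never needs to re-run the full routing computation for list nodes: one bit of the hash already tells it which of the two possible new indices applies.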

