An illustration of the storage structure in a HashMap

This article discusses the implementation in JDK 1.8. In 1.8, a HashMap is an array + linked list + red-black tree, as shown in the figure below. The main members and methods of HashMap will be introduced later.

What is the specific type of Node in the diagram above? Node is an inner class of HashMap that implements the Map.Entry interface; it holds the elements added by our put method. The next field means a Node can form a singly linked list, which resolves hash collisions through separate chaining. When the number of elements in a bucket exceeds the threshold, the linked list is converted to a red-black tree.

// The Node in a hash bucket; implements Map.Entry
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next; // Pointer to the next node in the list
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }
    // Override Object's hashCode
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
    // The equals method
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

// The tree node class used when a bin is converted to a red-black tree
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // the parent node
    TreeNode<K,V> left;    // the left subtree
    TreeNode<K,V> right;   // the right subtree
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;           // color property
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
    // Return the root of the tree containing this node
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }
}

The above is just a brief overview of the composition of HashMap; the following sections introduce its main parameters and how the important methods are implemented.

Member variables in a HashMap and their meanings

// Default initialization capacity initialization =16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Maximum capacity = 1 << 30
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor. The critical point for expanding a HashMap is when the current size
// exceeds loadFactor * capacity; by default that is 0.75f * 16 = 12
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// When the number of nodes on a bucket in the hash bucket is greater than this value, the linked list is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// When the number of nodes on a bucket in the hash bucket is less than this value, the red-black tree becomes a linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Convert the bucket structure to the minimum size of the table corresponding to the red-black tree
static final int MIN_TREEIFY_CAPACITY = 64;
// The hash algorithm: computes the hash value of the key passed in
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
//tableSizeFor(initialCapacity) returns the smallest power of two greater than or equal to initialCapacity. Examples are given below
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
// The hash bucket array
transient Node<K,V>[] table;
// Save the cached entrySet
transient Set<Map.Entry<K,V>> entrySet;
// The actual number of key-value pairs in the map; note this is not the same as table.length
transient int size;
// Counts structural modifications to the map: changes that alter the number of mappings
// or otherwise modify the internal structure (e.g. a rehash). This field is used to make
// iterators over the collection views of the HashMap fail fast.
transient int modCount;
// The threshold value, when the actual size (cap*loadFactor) is greater than this value, will be expanded
int threshold;
// Load factor
final float loadFactor;

(1) Hash method description

// The hash algorithm
static final int hash(Object key) {
    int h;
    // key == null: return hash = 0
    // key != null:
    // (1) get the key's hashCode: h = key.hashCode()
    // (2) shift h unsigned right by 16 bits: h >>> 16
    // (3) XOR the two: h ^ (h >>> 16)
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Suppose we add an element to a map, for example map.put("fsmly", "test"), and the binary representation of the hashCode of the key "fsmly" is 0000_0000_0011_0110_0100_0100_1001_0010. Following the steps above, the hash method computes the final hash value:
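As a quick check, this disturbance step can be reproduced outside the JDK. A sketch: the hashCode value below is the one assumed in the example above, not necessarily the real hashCode of "fsmly".

```java
public class HashDemo {
    // Same spreading step as JDK 8's HashMap.hash
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // The hashCode assumed in the example above
        int h = 0b0000_0000_0011_0110_0100_0100_1001_0010; // 0x00364492
        int spread = h ^ (h >>> 16);
        // The high 16 bits are XORed into the low 16 bits:
        // 0x00364492 ^ 0x00000036 = 0x003644A4
        System.out.println(Integer.toHexString(spread)); // 3644a4
        System.out.println(hash(null));                  // 0: null keys always hash to 0
    }
}
```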

(2)tableSizeFor method Description

The function of this method is to return the smallest power of two greater than or equal to initialCapacity. An example follows:

// Suppose cap = 6, so n = cap - 1 = 5; the binary form of 5 is 0101B. The >>> operator is an unsigned right shift that fills the high bits with 0
//n |= n>>>1: (1)n=0101 | 0101>>>1; (2)n=0101 | 0010; (3)n = 0111B 
//n |= n>>>2: (1)n=0111 | 0111>>>2; (2)n=0111 | 0011; (3)n = 0111B
//n |= n>>>4: (1)n=0111 | 0111>>>4; (2)n=0111 | 0000; (3)n = 0111B
//n |= n>>>8: (1)n=0111 | 0111>>>8; (2)n=0111 | 0000; (3)n = 0111B
//n |= n>>>16:(1)n=0111 | 0111>>>16; (2)n=0111 | 0000; (3)n = 0111B
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    // n < 0: return 1
    // if n >= the maximum capacity, return the maximum capacity
    // otherwise return n + 1 (0111B + 1B = 1000B = 8)
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

Now consider the case where cap is already a power of two:

// When cap is an integer power of 2, subtracting 1 first ensures that the final result is cap itself,
// not the next power of 2 greater than cap. For example, pass cap = 16 = 10000B and calculate as above:
// n = cap - 1 = 15 = 1111B
// n |= n>>>1: n = 1111 | 0111 = 1111; every following step likewise yields n = 1111B = 15.
// So the method returns 15 + 1 = 16.
int n = cap - 1;
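The whole method can be exercised with a standalone copy. A sketch, with MAXIMUM_CAPACITY inlined from the constant above:

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // A copy of JDK 8's tableSizeFor: the OR/shift cascade smears the highest set bit
    // of (cap - 1) into all lower bits, then the final + 1 reaches the next power of two
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(6));   // 8: next power of two above 6
        System.out.println(tableSizeFor(16));  // 16: already a power of two, returned unchanged
        System.out.println(tableSizeFor(17));  // 32
    }
}
```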

The HashMap constructor

Let’s look at the four constructors provided in the HashMap source code. As you can see, the most commonly used no-argument constructor only initializes loadFactor and nothing else; allocation of the underlying array is delayed until a key-value pair is inserted, i.e. to resize. We will cover the resize method later.

// (1) The constructor is used to initialize the capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity); // threshold = smallest power of two >= initialCapacity
}
// (2) If only the initial capacity is given, then the load factor is the default: 0.75
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
// (3) The load factor is the default and no initial capacity is given; the table is allocated lazily by resize on the first put
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
// (4) Call putMapEntries to add the values in the passed Map to the new map set, where the load factor is the default
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

The constructor in (4) adds the elements of the passed-in map to this map instance. The source code is as follows:

// This method adds all elements from the passed-in map to this map instance
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        // If the table of this map instance is null, i.e. not yet initialized, pre-size it
        if (table == null) { // pre-size
            // Required capacity: ft = m.size() / loadFactor + 1
            float ft = ((float)s / loadFactor) + 1.0F;
            // Cap the computed value at the maximum capacity 1 << 30
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            // If t is greater than the current threshold, recompute the threshold as the
            // smallest power of two greater than or equal to t
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // If table != null and m.size() > threshold, expand
        else if (s > threshold)
            resize();
        // Add all elements of m to this map instance
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

How do you determine the location of an element in a bucket

Locating the index of the hash bucket array is a critical first step, regardless of adding, deleting, or finding key-value pairs, so let’s look at how the source code uses the hash() method and other code to determine the location of an element in the hash bucket.

// Compute the hash value of the key
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// This line locates the element's bucket: index = (n - 1) & hash, where n (the capacity)
// is a power of two. For such n, (n - 1) & hash is equivalent to hash % n, but the modulo
// operation is not as efficient as the bitwise AND. Because n - 1 is a run of low-order
// 1 bits, the AND simply keeps the low-order bits of the hash.
p = tab[i = (n - 1) & hash];

Let’s use an example to calculate the positioning process above, assuming that the bucket size n is now 16.

As you can see, the hash method does not use the object's original hashCode directly as the final hash value, but first does some bit manipulation. The reason is that if n - 1 is small, the value of (n - 1) & hash depends entirely on the low-order bits of the hash: with n - 1 = 0000 1111, for instance, the final index depends only on the lowest 4 bits, and the high bits of the hash are wasted. h ^ (h >>> 16) XORs the high 16 bits into the low 16 bits, which increases the randomness of the hash and reduces the dependence on the quality of the object's hashCode method.
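A small sketch (indexFor is a hypothetical helper name here, not HashMap API) shows that for a power-of-two capacity the AND matches the modulo, and why hashes that differ only in their high bits would collide without the spreading step:

```java
public class IndexDemo {
    // Sketch of how HashMap locates a bucket; n is assumed to be a power of two
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int hash = 0x003644A4;
        // For a power-of-two n, the bitwise AND equals the non-negative modulo
        System.out.println(indexFor(hash, n));      // 4
        System.out.println(Math.floorMod(hash, n)); // 4
        // Two hashes differing only in the high bits collide if the high bits are ignored
        int a = 0x10000001, b = 0x20000001;
        System.out.println(indexFor(a, n) == indexFor(b, n)); // true: both land in bucket 1
    }
}
```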

Analysis of PUT method

(1) Put method source analysis

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab;  Node<K,V> p;  int n, i;
    // Table == null or the length of the table is 0. Call the resize method to expand the capacity
    // Table initialization is delayed until new data is inserted
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash locates the bucket index for the hash computed by hash(); see the example above
    if ((p = tab[i = (n - 1) & hash]) == null)
        // If the Node in the bucket is null, create a new Node in the bucket
        // In the table array. The hash value at that location is the hash value of the key calculated by calling the hash() method
        tab[i] = newNode(hash, key, value, null);
    // The element being put collides with the first element in the bucket (same computed index). Cases:
    //(1) the key is the same but the value is different
    //(2) the index is the same but the keys differ; this is handled with a linked list or red-black tree
    else {
        Node<K,V> e; K k;
        // p = tab[i] is the first element in the bucket. The comparisons are:
        //(1) whether tab[i].hash equals the passed-in hash
        //(2) whether the passed-in key equals the key at this position
        //(3) if both match, it is the same key, and the corresponding value is replaced later on
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            // Assign the first element in the bucket to e to record the value of the first position
            e = p;
        // Not the same key, and the first node is a red-black tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); // insert into the red-black tree
        // Judge as a linked list node
        else {
            for (int binCount = 0; ; ++binCount) {
                // If the end of the list is reached
                if ((e = p.next) == null) {
                    // Insert a new node at the end
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches TREEIFY_THRESHOLD (8), convert the list to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If during traversal we find a node whose key equals the key to be inserted,
                // break out of the loop; the old value is updated below
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // e = p.next above reads the next node; p = e advances the traversal
                p = e;
            }
        }
        // Check whether the key already exists in the HashMap. If e was assigned and is
        // not null, it exists, so update the old key-value pair
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value; // Update the old value with the passed-in value
            afterNodeAccess(e);
            return oldValue; // Return the old value
        }
    }
    // Update modCount
    ++modCount;
    // If the capacity exceeds, expand the capacity
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

(2) Summary of the implementation process of put method

You can see that the put method simply delegates to putVal, passing the hash value of the key computed by the hash() method; the main logic is in putVal. The comments above walk through the method; here is a summary of its execution:

  1. First, (tab = table) == null || (n = tab.length) == 0 determines whether the hash bucket array is null or empty; if so, the resize method is called to allocate it. We’ll cover resize later.

  2. Locate the element’s position in the bucket by calculating the index I from the hash value of the key and the length of the hash bucket. If there is no element (null) at the calculated position, then create a new node and add it to the position.

  3. If table[I] is not null and there are already elements, then a hash conflict has occurred. There are three possible cases

    (1) Determine whether the key is the same, if the key is the same, then replace the old value with the new value;

    (2) If the key is not the same, then we need to determine whether the current bucket is converted to a red-black tree, if yes, construct a TreeNode node and insert it into the red-black tree;

    ③ If it is not a red-black tree, use the chain address method to deal with the conflict problem. The main thing here is to walk through the list, and if you find the same element as the key during the walk, you will replace the old value with the new value. Otherwise, it will traverse to the end of the list, where a new Node will be added and inserted into the list. After insertion, it needs to determine whether the threshold 8 for converting to a red-black tree has been exceeded. If so, it will be converted to a red-black tree.

  4. Finally, you need to change the modCount value.

  5. Check whether the new size exceeds threshold. If it does, expand the capacity.
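The externally visible effect of the steps above can be checked with plain HashMap usage: putting a duplicate key replaces the value and returns the old one, as in step 3-①.

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        // First insert: no mapping exists yet, so put returns null
        System.out.println(map.put("fsmly", "test"));  // null
        // Same key again: the old value is replaced and returned
        System.out.println(map.put("fsmly", "test2")); // test
        System.out.println(map.get("fsmly"));          // test2
        System.out.println(map.size());                // 1: replacing a key does not grow the map
    }
}
```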

Many of the areas above cover expansion, so let’s first look at expansion methods.

Resize analysis

(1)resize method source code

When the number of elements in the map exceeds loadFactor * capacity (by default DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY), the array needs to grow to accommodate more elements. The resize method replaces the existing smaller array with a larger one and remaps the existing elements into it.

// This method serves two purposes: 1. initialize the hash table (table == null); 2. expand when the current array capacity is too small
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table; //oldTab refers to the old table array
    // If oldTab is not null, oldCap is the size of the original table
    // If oldTab is null, oldCap is 0
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold; // the current threshold
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // oldCap != 0, i.e. oldCap = the original table length
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE; // If it is greater than the maximum capacity, the value is assigned to the integer maximum threshold
            return oldTab;
        }
        // If the array size is expanded to less than the maximum size and oldCap is greater than the default value of 16
        // Double the length of the extension.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double the threshold as well
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        // Here oldThr = tableSizeFor(initialCapacity); as seen in the constructors above,
        // when an initial capacity was given, threshold is always a power of two produced
        // by tableSizeFor, and it becomes the length of the new Node array
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // We get here after the no-argument constructor (table == null, threshold == 0):
        // the new capacity is the default initial capacity, and the new threshold is
        // the default load factor * the default initial capacity
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Calculate the new resize upper limit
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor; // Capacity * load factor
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Create a new Node array with the new size as the length
    // Of course, the bucket array initialization is also done here
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; 
    table = newTab;
    // If the original table is not null, remap its elements
    if (oldTab != null) {
        // Move each bucket to the new bucket array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // Only process buckets that are not null
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null; // clear the old slot to help GC
                if (e.next == null) // If there is no list at that position, that is, only that element in the array
                    // Calculate the subscript (n-1)&hash in the new table array with the new capacity
                    newTab[e.hash & (newCap - 1)] = e; 
                else if (e instanceof TreeNode) 
                    // If the node is a red-black tree, the red-black tree needs to be split when remapping
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // split the linked list while preserving node order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do { // not a red-black tree, so it is a linked list
                        next = e.next;
                        // Stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            // If loTail is null, e becomes the head of the low list
                            if (loTail == null)
                                loHead = e;
                            // Otherwise append e to the end of the low list
                            else
                                loTail.next = e;
                            // Point loTail at e so the next element is appended at the tail
                            loTail = e;
                        }
                        // Original position + old capacity
                        else {
                            // If hiTail is null, e becomes the head of the high list
                            if (hiTail == null)
                                hiHead = e;
                            // Otherwise append e after hiTail, the tail of the high list
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Map the grouped list to the new bucket
                    // Put the original index into the bucket
                    if (loTail != null) {
                        // The old list is moved to the new array with the relative order
                        // of its elements unchanged; only node references are re-linked
                        loTail.next = null; // terminate the list
                        newTab[j] = loHead; // the head of the list is stored at index j
                    }
                    }
                    // Put the "original index + oldCap" group into its bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

(2) (e.hash & oldCap) == 0 analysis

We know that the capacity is always a power of 2, so the binary form of oldCap is 10…0 (e.g. 16 = 10000B). The split is decided by whether (e.hash & oldCap) == 0.

(1) If the condition holds, the bit of the hash at oldCap's single 1 bit is 0 (1 & 0 = 0, and all other bits of oldCap are 0, so the AND is 0). That bit contributes nothing to the new index, so the element stays at its original index. Let’s take an example of how an array index changes before and after expansion:

When the bit of the hash corresponding to oldCap is 0, the index is the same before and after resize; this is the (e.hash & oldCap) == 0 case.

(2) If the condition is false, the bit of the hash at oldCap's 1 bit is 1. The new index = hash & (newCap - 1) = hash & ((16 << 1) - 1) then gains exactly that bit: if the old index was 0010B, the new index is 10010B, which is 10000B (= oldCap) more. As in the following example:

When the bit of the hash corresponding to oldCap is 1, the index changes before and after resize: (e.hash & oldCap) != 0, and the new index is exactly the original index + the original capacity.

(3) Partial code understanding

This part is essentially linked-list manipulation, similar to the separate chaining used by the put method to resolve hash conflicts.

// Stays at the original index
if ((e.hash & oldCap) == 0) {
    // If loTail is null, e becomes the head of the low list
    if (loTail == null)
        loHead = e;
    // Otherwise append e to the end of the low list
    else
        loTail.next = e;
    // Point loTail at e so the next element is appended at the tail
    loTail = e;
}
// Moves to original index + old capacity
else {
    // If hiTail is null, e becomes the head of the high list
    if (hiTail == null)
        hiHead = e;
    // Otherwise append e after hiTail, the end of the high list
    else
        hiTail.next = e;
    hiTail = e;
}

Let’s go straight to a simple diagram
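The same grouping can also be sketched over plain int hashes (the lists below are local stand-ins for loHead/hiHead, not HashMap internals). With oldCap = 16, the hashes 5, 21, 37, 53 all share old bucket 5:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = oldCap << 1;
        // Hashes that all collide into bucket 5 of the old table (index = hash & (oldCap - 1))
        int[] hashes = {5, 21, 37, 53};
        List<Integer> lo = new ArrayList<>(); // stay at index 5
        List<Integer> hi = new ArrayList<>(); // move to index 5 + 16 = 21
        for (int h : hashes) {
            if ((h & oldCap) == 0)
                lo.add(h);
            else
                hi.add(h);
        }
        System.out.println(lo); // [5, 37]  : bit 10000B is 0, new index unchanged
        System.out.println(hi); // [21, 53] : bit 10000B is 1, new index = old index + oldCap
        // Cross-check against direct recomputation with the new capacity
        for (int h : lo) assert (h & (newCap - 1)) == 5;
        for (int h : hi) assert (h & (newCap - 1)) == 21;
    }
}
```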

(4) Resize summary

The resize code is a little long, but it boils down to the following:

  • If oldTab is null, the bucket array is initialized; this is the lazy initialization after the no-argument constructor, which assigns no capacity or threshold. If it is not null, the new capacity is computed with a left shift by one bit, i.e. the capacity is doubled.
  • To expand, create an array of new capacity and iterate over the old array:
    • If the node is empty, assign the value to insert directly
    • If the node is a red-black tree, it needs to be split (I don’t understand red-black trees yet, so I won’t explain).
    • If it is a linked list, recalculate the subscript according to the hash algorithm, and split the list into groups (I believe that we basically know the general process of split the list here).
    • Attached is a diagram of resize expansion from the Internet: https://blog.csdn.net/carson_ho/article/details/79373134

Analysis of GET method

(1) Get method source code

The basic logic is: compute the hash from the key and locate the bucket index. If the first node at that index matches, return its value directly; otherwise traverse the red-black tree or linked list at that index, comparing keys until the matching node is found.

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Compute the position in the table array; the calculation was introduced above
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check the first element in the bucket
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Otherwise this position holds a red-black tree root or a linked list head
        if ((e = first.next) != null) {
            // If first is a red-black tree node, search the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Not a tree node, so traverse the linked list
            do {
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

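The observable behavior of this lookup path can be checked directly; note that a null key is legal, since hash(null) == 0 maps it to bucket 0.

```java
import java.util.HashMap;

public class GetDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("fsmly", 1);
        map.put(null, 2);                      // null key goes to bucket 0 (hash = 0)
        System.out.println(map.get("fsmly"));  // 1
        System.out.println(map.get(null));     // 2
        System.out.println(map.get("absent")); // null: no node found, getNode returns null
    }
}
```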

HashMap multithreaded problem

Refer to https://coolshell.cn/articles/9606.html