preface

HashMap, the most commonly used collection class framework in Java, is a very typical data structure in the Java language. The following blogger takes a closer look at this data structure through the source code of HashMap.

1 HashMap source code analysis

Note: the source code is based on JDK1.8, which is different from JDK1.7 and earlier versions

1.1 a hash table

Before analyzing the HashMap source, let’s take a look at the differences between arrays, linked lists, and hash tables.

An array of



Arrays, as we all know, are contiguous Spaces in memory, so reads are efficient, but deletes and inserts are inefficient, because inserts and deletes can cause other data to move left and right.

The list

Large memory usage, small space complexity, O(N) time complexity, fast insertion and deletion, high memory utilization.

The disadvantage is that the query efficiency is relatively low, have to start from the first traversal.

Hash table



Hash table is the combination of array and linked list, take the advantages and disadvantages of both to do a combination and balance.

1.2 put method

After entering the put source, the first method is as follows,

public V put(K key, V value) {
    return putVal(hash(key), key, value, false.true);
}
Copy the code

The key to this method is the hash(key), which generates the hash of the key. Public native int hashCode() public native int hashCode(); Then you take the top 16 bits of the hash and hash to get the lower 16 bits, which are more random. return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); The purpose of this method is to calculate a more hashed index, reduce the hash conflict, do not understand friends can go to the special check, the algorithm design is still very subtle, the blogger here will not repeat.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break; p = e; }}if(e ! =null) { // existing mapping for key
            V oldValue = e.value;
            if(! onlyIfAbsent || oldValue ==null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Copy the code

This method implements the core operation of HashMap PUT.



The first step is to declare and then determine whether the table needs to be initialized. The result of this initialization is a Node array of length 16.



The subscript is then computed using the hash value, and if there is no data at that subscript location, the value is added. If there is a value, the conflict is handled,



As you can see, p.next indicates that it puts the value at the end of the list, making it a one-way list. Careful people will also notice that there is an if judgment.

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
    treeifyBin(tab, hash);
Copy the code

When the length is greater than or equal to 8, the treeifyBin method is used, which has a judgment. If the length is less than 64, the capacity is expanded. Anything above 64 will turn the list into a red-black tree, increasing efficiency. This is also a change made in JDK1.8 to improve efficiency through red-black trees!

1.3 the get method

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if((tab = table) ! =null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) ! =null) {
        if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
            return first;
        if((e = first.next) ! =null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    return e;
            } while((e = e.next) ! =null); }}return null;
}
Copy the code

The hash value of the key is used to calculate the index of the array, and then the linked list of that index is found.

2 talk about the expansion mechanism of HashMap

2.1 Why Is capacity Expansion Required

This question is actually quite easy to answer. Because before or after Java8, the more key-values a HashMap stores for the same hash value, the more hash collisions cost. Therefore, capacity expansion is needed to improve efficiency.

2.2 When to expand capacity

  1. Expand when the threshold is reached. Threshold = array length x load factor
  2. When the list length reaches 8 and the value length is less than 64, the array is expanded.

2.3 EXPANSION Policy for JDK1.8

The expansion strategy for JDK1.7 and earlier versions is simple: define a newTable twice as long as the current table, and then find the hash conflicts and iterate over the linked list. Calculate the position of the element in the new array based on the hash value and pass it over.

After JDK1.8, there is a difference, here is the source code.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null)?0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the size of an array exceeds the maximum size in Java, collisions can only occur
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the size
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Update the new capacity to the expanded capacity
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // New array after expansion
    table = newTab;
    if(oldTab ! =null) {
        // Iterate over the old array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if((e = oldTab[j]) ! =null) { // There is an element in position j, put it in e
                oldTab[j] = null;
                if (e.next == null) // There is only one element
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // The red-black tree is treated separately
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // Conflicting linked lists
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Iterate over all nodes
                    do {
                        next = e.next;
                        // The hash value of the key, the length of the old array and the result of the operation determine whether the element is placed at the old index or the new index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                    // (e.hash & oldCap) == 0 to place the list in position j of the new array
                    if(loTail ! =null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // all nodes of (e.hash & oldCap) == 1 are placed in the new array (j+ the old array length)
                    if(hiTail ! =null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Copy the code

From the source code, we can see that the 1.8 calculation strategy is better. Use e. Hash & oldCap to calculate the position of the element in the new array, no need to hash again, the result is 0, the subscript position is 1, the subscript position = the original array length + the original subscript position.