1. HashMap source code analysis

1.1 Member Variables

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;
    // Node array
    transient Node<K,V>[] table;
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

1.2 Constructors

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

None of the first three constructors allocates memory for the table, which means the table is allocated lazily, during the first put operation.

1.3 put operation

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    // Perturb the hashCode: XOR the high 16 bits into the low 16 bits
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table has not been initialized, allocate it via resize()
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // Compute the bucket index; if the bucket is empty, put a new node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // If the first node's key equals the new key, remember it in e;
            // the value is overwritten further down
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If the bucket holds a red-black tree, insert into the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Walk the linked list: stop at an equal key (its value is
                // overwritten below); otherwise append a new node at the tail
                // and check whether the list should become a red-black tree
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // An equal key was found; overwrite its value
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Resize once size exceeds the threshold
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

The put flow, in short: note how the array index is computed, (n - 1) & hash, i.e. the hash value is bitwise ANDed with the table length minus 1.
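The hash perturbation and the index step can be tried in isolation. The sketch below mirrors the source's hash() and the (n - 1) & hash computation; the class and helper names are illustrative, not part of the JDK:

```java
public class HashIndexDemo {
    // Mirror of HashMap.hash(): XOR the high 16 bits into the low 16 bits,
    // so the bucket index also depends on the upper bits of hashCode
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Index step used by putVal/getNode; valid because n is a power of two
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // DEFAULT_INITIAL_CAPACITY
        int h = hash("foo");
        System.out.println("hash=" + h + " index=" + indexFor(h, n));
        // A null key hashes to 0 and always lands in bucket 0
        System.out.println("null key index=" + indexFor(hash(null), n));
    }
}
```

The index is always in range 0..n-1, since every bit of (n - 1) & hash is masked by n - 1.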

1.4 Resizing

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // If the old capacity already reached the maximum, stop growing
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Otherwise double the capacity and the threshold
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            // Walk through the old table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else {
                        // Split the chain: the lo list stays at index j,
                        // the hi list moves to index j + oldCap
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // The new high bit of the hash is 0: index unchanged
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // The new high bit is 1: index shifts by oldCap
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // Put the lo list back at the old index
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Put the hi list at old index + oldCap
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

Each resize doubles the array. When recomputing a node's index, the hash does not need to be recalculated; only the new high-order bit of the hash (hash & oldCap) needs to be checked: if it is 0 the index is unchanged, if it is 1 the node moves to old index + oldCap.
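This high-bit shortcut can be demonstrated in isolation. In the sketch below (the helper name is mine, not the JDK's), an entry at old index j either stays at j or moves to j + oldCap depending on the bit hash & oldCap:

```java
public class ResizeSplitDemo {
    // New index of an entry after the table doubles, computed from only
    // the extra high bit (hash & oldCap) instead of re-hashing
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);              // old index
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes share old index 5 (hash & 15 == 5),
        // but after doubling they split between index 5 and index 21
        for (int hash : new int[]{5, 21, 37, 53}) {
            System.out.println(hash + " -> " + newIndex(hash, oldCap));
        }
        // prints: 5 -> 5, 21 -> 21, 37 -> 5, 53 -> 21
    }
}
```

The shortcut agrees with a full recomputation: newIndex(hash, oldCap) equals hash & (2 * oldCap - 1).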

1.5 get operation

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

Locate the bucket via the hash of the key. If the bucket is empty, the key does not exist; otherwise, traverse the linked list or red-black tree in that bucket to find the matching node.
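The get/put semantics described above are observable through the public API; a quick sketch:

```java
import java.util.HashMap;

public class GetPutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.get("a"));     // null: key absent
        System.out.println(map.put("a", 1));  // null: no previous mapping
        System.out.println(map.put("a", 2));  // 1: overwrite returns old value
        System.out.println(map.get("a"));     // 2
        map.put(null, 0);                     // a single null key is allowed
        System.out.println(map.get(null));    // 0
    }
}
```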

2. Common HashMap interview questions

2.1 How is HashMap implemented?

  • JDK 1.7 is implemented with an Entry array: each array slot stores an Entry object, and each Entry holds a pointer to the next node, so the whole structure is array + linked list. If the new key equals the key already at that position, the value is overwritten; if the keys hash to the same bucket but are not equal, the conflict is resolved by chaining (the "zipper" method).
  • JDK 1.8 uses array + linked list + red-black tree. When a linked list grows beyond a certain length its drawback is magnified: lookup in a list takes O(n) time. The list is therefore converted to a red-black tree, which reduces lookup time to O(log n).

2.2 Why is the length of a HashMap a power of 2?

  • It makes index computation more efficient. Computing an index naturally suggests the modulo operation, but when the divisor is a power of 2, modulo is equivalent to a bitwise AND: when length is a power of 2, hash % length == hash & (length - 1), and the bitwise operation is cheaper than modulo.
  • It reduces hash collisions. The source computes the index as (length - 1) & hash. If length is a power of 2, the low bits of length - 1 are all ones in binary.

If length is not a power of 2, then length - 1 contains a 0 bit. In that case several different hash values map to the same index (for example, multiple hashCodes could all produce index = 21), because the low bits of h no longer determine the index uniquely, so the chance of hash collisions increases.

  • During resizing, an element's new position is either unchanged or shifted by exactly the old capacity (a power of 2), which largely avoids re-hashing data that was already well distributed.
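The mod-versus-mask equivalence from the first bullet can be checked directly; for any non-negative hash and power-of-two length, hash % length == (length - 1) & hash, while a non-power-of-two length leaves some buckets unreachable:

```java
public class PowerOfTwoDemo {
    public static void main(String[] args) {
        int length = 16; // power of two
        boolean allEqual = true;
        for (int hash = 0; hash < 1000; hash++) {
            allEqual &= (hash % length == ((length - 1) & hash));
        }
        System.out.println("mask == mod for length 16: " + allEqual); // true

        // With length 12, 12 - 1 = 0b1011: bit 2 of the mask is 0, so
        // (length - 1) & hash can never produce indexes 4, 5, 6 or 7
        int bad = 12;
        boolean[] reachable = new boolean[bad];
        for (int hash = 0; hash < 1000; hash++) {
            reachable[(bad - 1) & hash] = true;
        }
        System.out.println("bucket 4 ever used with length 12: " + reachable[4]); // false
    }
}
```

With length 12, keys pile up in the 8 reachable buckets, which is the collision increase described above.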

2.3 Why is HashMap not thread-safe?

  • In JDK 1.7, a concurrent resize can create a cycle in a linked list. The key JDK 1.7 resize code:
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;   // [1] a thread can be suspended here
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                // Head insertion: the node is linked in front of the bucket
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

The whole process is a double loop: the outer loop walks the array, the inner loop walks the chain in each slot, moving nodes to their new position by head insertion. Now suppose a HashMap has capacity 2 and load factor 0.75, so putting the second key triggers a resize. Assume two threads put at the same time and both entries belong in bucket 3 of the new table. Thread 2 is suspended at the point marked [1], while thread 1 completes the resize.

At this point thread 2 resumes and executes the remaining steps of the loop body:

    if (rehash) {
        e.hash = null == e.key ? 0 : hash(e.key);
    }
    int i = indexFor(e.hash, newCapacity);
    e.next = newTable[i];
    newTable[i] = e;
    e = next;

After this iteration, variable e points to node B. Since e is not null, the loop body runs again; afterwards the references are as follows:

Variable e now points back to node A, so the loop body executes once more. Step by step:

  1. next = e.next; Node A has no next at this point, so next points to null.
  2. e.next = newTable[i]; newTable[i] currently points to B, so A's next now points to node B. B already points to A, so A and B form a cycle.
  3. newTable[i] = e; Node A is placed at array index i.
  4. e = next; e becomes null, so the loop terminates.

Final reference relationship:

If c is a new key that hashes to array index 3 but is not present in the map, then get(c) traverses the chain at index 3 and loops forever between B and A.

  • In JDK 1.8, head insertion was replaced by tail insertion, so in theory the infinite loop no longer occurs; however, data loss is still possible.

    JDK 1.8 putVal:

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table has not been initialized, allocate it via resize()
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // Compute the bucket index; if the bucket is empty, put a new node there
        if ((p = tab[i = (n - 1) & hash]) == null)  // [2]
            tab[i] = newNode(hash, key, value, null);
        // ... omitted
    }

If thread 1 evaluates the check at [2] as true and is suspended there, thread 2 then inserts a new node at tab[i]; when thread 1 resumes, it writes its own node into tab[i], overwriting thread 2's data.
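The lost update can be replayed deterministically. The sketch below (all names are mine; this is a sequential simulation of the racy interleaving, not real concurrency) models a bucket array: both "threads" see the bucket as empty at [2], both write to it, and the second write wins:

```java
public class LostUpdateDemo {
    static String[] tab = new String[4]; // simplified bucket array

    public static void main(String[] args) {
        int i = 3;
        // Thread 1 runs the null check at [2]: the bucket looks empty
        boolean t1SeesEmpty = (tab[i] == null);
        // Thread 1 is suspended; thread 2 runs the same check and inserts
        if (tab[i] == null) tab[i] = "node-from-thread-2";
        // Thread 1 resumes with its stale check result and overwrites
        if (t1SeesEmpty) tab[i] = "node-from-thread-1";
        // Thread 2's mapping has been silently lost
        System.out.println("bucket " + i + " = " + tab[i]); // node-from-thread-1
    }
}
```

The root cause is the unsynchronized check-then-act on tab[i]; real threads hit the same window non-deterministically.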

2.4 Since HashMap is not thread-safe, are there thread-safe Maps?

ConcurrentHashMap and Hashtable are both thread-safe.
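A brief usage sketch of the two alternatives. Hashtable synchronizes every method on the whole table, while ConcurrentHashMap uses finer-grained synchronization in JDK 1.8; unlike HashMap, neither permits null keys or values:

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        // Two threads incrementing the same counter stay consistent,
        // because merge() is atomic on ConcurrentHashMap
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                map.merge("count", 1, Integer::sum);
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(map.get("count")); // prints 2000

        // Hashtable (like ConcurrentHashMap) rejects null keys
        Map<String, Integer> table = new Hashtable<>();
        try {
            table.put(null, 1);
        } catch (NullPointerException expected) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}
```

The same two-thread loop on a plain HashMap could lose increments, for the reasons shown in 2.3.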

2.5 Why is the default HashMap load factor 0.75?

0.75 is a trade-off found through extensive testing: it keeps space utilization high while keeping the rate of hash collisions (and therefore lookup cost) low.

2.6 Can the Array + Array structure implement HashMap?

It is possible, but it would not perform well. The first dimension would store keys with different hashes, and the second dimension would hold colliding entries. The second-dimension arrays cannot be too small, which would require frequent resizing, or too large, which would waste space; a linked list, by contrast, grows on demand.

3. Summary

  • To avoid frequent resizing, it is best to specify the expected size when creating a HashMap.
  • HashMap is not thread-safe; under concurrency, linked-list cycles (JDK 1.7) and data loss can occur.
  • JDK 1.8 introduced red-black trees: a chain is converted to a red-black tree once it reaches a certain length, reducing lookup time.
  • Keys in a HashMap are unique; both keys and values may be null (at most one null key).
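For the first point, a common sizing rule is initial capacity = expected size / load factor + 1, so that no resize happens while filling the map. The helper below is an illustrative sketch, not a JDK API:

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    // Illustrative helper: choose an initial capacity so that
    // expectedSize entries fit without crossing the 0.75 threshold
    static int capacityFor(int expectedSize) {
        return (int) ((float) expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        // capacityFor(100) = 134; HashMap rounds this up to the next power
        // of two (256), whose threshold 192 >= 100, so filling the map
        // with 100 entries triggers no resize
        Map<Integer, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put(i, i * i);
        }
        System.out.println("size = " + map.size()); // size = 100
    }
}
```

Passing the expected size directly (new HashMap<>(100)) would yield capacity 128 and threshold 96, forcing one resize before the 100th entry.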