00 HashMap underlying data structure

In JDK 1.7, a HashMap is an array of linked lists. JDK 1.8 added a red-black tree structure: when a bucket's list grows beyond the treeify threshold (8) and the array length is at least 64, the list is converted to a red-black tree. Its composition is shown in the figure below:

Each slot of the array in a HashMap is also called a hash bucket, and it holds key-value nodes. In JDK 1.7 the node class is called Entry; in JDK 1.8 it is called Node.

Initially, every position is null. On put, an index is computed from the hash of the key. For example, calling put("dog", 666) inserts the element "dog" into the HashMap; suppose its hash works out to index position 3. The structure is as follows: still just an array.

When a hash collision occurs, a linked list is introduced. Suppose we then call put with a different key, say ("cat", 666), whose hash also works out to index position 3. The structure is as follows: a linked list is formed.
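A minimal sketch of those two inserts (runnable, e.g., in jshell; the index values are assumptions for illustration, not what String hashing actually produces):

import java.util.HashMap;

HashMap<String, Integer> map = new HashMap<>();
map.put("dog", 666); // suppose "dog" hashes to bucket index 3
map.put("cat", 666); // if "cat" also maps to index 3, its node is chained with "dog"'s into a list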

Each hash bucket contains four fields: hash, key, value, and next, where next represents the next node in the list.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    // ...
}

The reason JDK 1.8 added red-black trees is that an overly long list seriously hurts HashMap performance. A red-black tree supports fast insertion, deletion, and lookup in O(log n) time, which effectively solves the problem of slow operations on long lists.

01 Important methods of HashMap

PS: The following source analysis is based on JDK 1.8.

1.0 Query (get method)

The source code is as follows:

public V get(Object key) {
    Node<K,V> e;
    // Hash the key
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab;
    Node<K,V> first, e;
    int n;
    K k;
    // Non-null checks on the table and on the bucket at the computed index
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Determine whether the first node is the element being queried
        // always check first node
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // The next node is not null
        if ((e = first.next) != null) {
            // If the first node is a tree node, use getTreeNode to fetch the data directly
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>) first).getTreeNode(hash, key);
            do { // Non-tree structure: loop over the nodes
                // If the hash is equal and the key is the same, return this node
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

As the comments emphasize, when hashes collide we must compare not only the hash value but also the keys for equality to determine whether an element is the one we want.
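Here is a small, runnable illustration of that point. "Aa" and "BB" are a well-known pair of distinct Java strings with the same hashCode (2112), so they collide into the same bucket, and only the equals check tells them apart:

import java.util.HashMap;

HashMap<String, Integer> map = new HashMap<>();
map.put("Aa", 1);
map.put("BB", 2);
// Both keys have hashCode 2112 and land in the same bucket;
// getNode walks the chain and uses equals to pick the right node.
System.out.println(map.get("Aa")); // 1
System.out.println(map.get("BB")); // 2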

1.1 Insertion (put method)

The source code is as follows:

public V put(K key, V value) {
    // Hash the key
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab;
    Node<K,V> p;
    int n, i;
    // Create the table if the hash table is empty
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Compute the array index i to insert at, based on the key's hash
    if ((p = tab[i = (n - 1) & hash]) == null)
        // If table[i] is null, insert directly
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e;
        K k;
        // If the key already exists, remember the node so its value can be overwritten below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Otherwise, check whether the bucket is a red-black tree
        else if (p instanceof TreeNode)
            // Red-black tree: insert the key-value pair directly into the tree
            e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);
        else {
            // Linked-list structure: loop, looking for the insertion point
            for (int binCount = 0; ; ++binCount) {
                // The next element is empty: append the new node at the tail
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Convert to a red-black tree if the list is long enough
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Key already exists: break so the value is overwritten below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // existing mapping for key
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Expand the capacity when the size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

The comments are already quite detailed, but put is the most involved of these methods, so refer to the flow chart below for the overall logic.
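As a quick sanity check of the overwrite branch above: put returns the previous value when the key already exists (runnable, e.g., in jshell):

import java.util.HashMap;

HashMap<String, Integer> map = new HashMap<>();
System.out.println(map.put("dog", 666)); // null: the key was absent
System.out.println(map.put("dog", 888)); // 666: the old value is overwritten and returned
System.out.println(map.get("dog"));      // 888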

1.2 Capacity expansion (resize method)

The source code is as follows:

final Node<K,V>[] resize() {
    // The array before expansion
    Node<K,V>[] oldTab = table;
    // The size and threshold of the array before expansion
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    // Predefine the size and threshold of the new array
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity already hit the maximum, do not expand
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity, as long as it does not exceed MAXIMUM_CAPACITY
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // The table has not been allocated yet: the initial capacity was stored in threshold
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { // zero initial threshold signifies using defaults
        // No initial value was given, so use the default initial capacity
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If the new threshold is still 0, compute it from the new capacity and load factor
    if (newThr == 0) {
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                  (int) ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({ "rawtypes", "unchecked" })
    Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
    // Start the expansion by assigning the new array to table
    table = newTab;
    // The original data is not empty: copy it into the new table
    if (oldTab != null) {
        // Loop over the old array and copy non-empty buckets into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // If the bucket holds a single node, assign it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Red-black tree: split the tree between the two buckets
                    ((TreeNode<K,V>) e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Linked-list copy: the JDK 1.8 expansion optimization.
                    // Split the list in the old bucket into two lists.
                    Node<K,V> loHead = null, loTail = null; // list that keeps the original index
                    Node<K,V> hiHead = null, hiTail = null; // list that moves to the new index
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // AND the hash with the old capacity; 0 means the node keeps its original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Otherwise the node moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the list that keeps the original index into its bucket
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the list that moves to original index + oldCap into its bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

It can be seen from the source above that expansion mainly consists of two steps:

  • Expansion: create a new, empty Node array twice the length of the original.
  • Bit operation: AND each element's hash with the original array length (e.hash & oldCap) to decide its new position.

Instead of recomputing each element's index (and possibly rehashing) as JDK 1.7 does, JDK 1.8 checks a single high-order bit with e.hash & oldCap to determine whether an element needs to move. Assume key1 has the following information:

  • key1.hash = 10; binary: 0000 1010
  • oldCap = 16; binary: 0001 0000

Here e.hash & oldCap is 0 because the high-order bit is 0; when the result is 0, the element's position does not change during expansion. Now assume key2 has the following information:

  • key2.hash = 17; binary: 0001 0001
  • oldCap = 16; binary: 0001 0000

When that bit is 1 (the AND result is non-zero), the element's position changes during expansion: the new index equals the original index + the original array length, as shown in the figure below, where key2 and key4 are the moved elements.
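A small check of the arithmetic above (runnable, e.g., in jshell), using the hash values assumed in the example:

int oldCap = 16;
int hash1 = 10; // 0000 1010
int hash2 = 17; // 0001 0001
System.out.println(hash1 & oldCap); // 0:  key1 stays at index (10 & 15) = 10
System.out.println(hash2 & oldCap); // 16: key2 moves to (17 & 15) + 16 = 17
System.out.println(hash2 & (oldCap * 2 - 1)); // 17: the new mask gives the same answer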

02 What are the attributes of a HashMap?

They are listed below; the code comments make each one clear.

// HashMap initial capacity
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Maximum capacity of a HashMap
static final int MAXIMUM_CAPACITY = 1 << 30; // 1073741824

// Default load factor (expansion factor)
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// When a list grows beyond this value (and the array length is at least 64), it is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;

// Untreeify threshold: when a tree shrinks below this many elements, it is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;

// Minimum table capacity before buckets may be treeified
static final int MIN_TREEIFY_CAPACITY = 64;

03 Why is the initialization length of a HashMap 16?

As mentioned earlier, mapping a key to a position in the HashMap array uses a hash function, for example index = hash("dog").

Note that HashMap writes its initial length as 1 << 4 rather than 16 directly. Why? In fact, this keeps everything in terms of bit operations, and a bitwise AND is much cheaper than arithmetic; the choice of 16 itself serves the algorithm that maps a key to an index.

So how do you implement a hash function that distributes keys as evenly as possible and thereby reduces HashMap collisions? The key's hashCode value is used in a bitwise operation:

hashCode(key) & (length - 1)

For example, the key "book" has a hashCode of 3029737 in decimal, which is 10 1110 0011 1010 1110 1001 in binary. The default HashMap length is 16, so length - 1 = 15; binary: 1111.

1011100011101011101001 & 1111 = 1001; binary 1001 is decimal 9, so index = 9.

In other words, the index produced by the hash algorithm depends only on the last few bits of the hashCode value.

You can try specifying a length of 10, or some other number that is not a power of two, and do the same bitwise operation; you will find that identical index values come up far more often. With a length of 16, or any other power of 2, every binary bit of length - 1 is 1. In that case the index result is exactly the value of the hashCode's last few bits, so as long as the incoming hashCodes are themselves evenly distributed, the results of the hash algorithm are uniform.
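A quick check of the "book" example (runnable, e.g., in jshell). Note this uses the article's simplified formula; the real JDK 1.8 HashMap additionally spreads the high bits with h ^ (h >>> 16) before masking, so its actual bucket can differ:

int h = "book".hashCode();        // 3029737
System.out.println(h & (16 - 1)); // 9: the low four bits, 1001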

Therefore, the default length of a HashMap is 16 to reduce the probability of hash collisions.

04 Why is the treeify threshold 8 and the untreeify threshold 6?

The average search length of a red-black tree is log(n); at length 8 that is log(8) = 3. The average search length of a linked list is n/2; at length 8 that is 4. Since the list's search length now exceeds the tree's, converting to a tree is more efficient.

At length 6, the tree's average search length is about 2.6 and the list's is 3. The tree would still be slightly faster, but treeification itself costs time, so the small gain is not worth it; below 6, the structure reverts to a linked list.

05 What is the load factor? Why is it 0.75?

The expansion mechanism was covered earlier. So when does expansion actually happen? That depends on the array length and the load factor.

If the load factor is 0.5 and the initial capacity of the HashMap is 16, then the HashMap will expand once the number of elements exceeds 16 * 0.5 = 8.
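A minimal sketch of that threshold arithmetic, using the real two-argument HashMap constructor (the resize itself is internal, so the comments state what happens rather than printing it):

import java.util.HashMap;

HashMap<Integer, Integer> map = new HashMap<>(16, 0.5f);
// threshold = 16 * 0.5 = 8
for (int i = 1; i <= 9; i++) {
    map.put(i, i); // the 9th put makes size exceed the threshold and triggers resize() to capacity 32
}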

So why is the load factor 0.75 rather than 0.5 or 1.0? It is a trade-off between capacity and performance:

  • The capacity of a HashMap must be a power of 2 for the indexing and expansion tricks above to work. With a load factor of 3/4, capacity * load factor is always an integer (e.g., 16 * 0.75 = 12).

  • With a large load factor, the expansion threshold is higher, so expansion happens less often and less space is wasted, but the probability of hash collisions rises; more complex structures are then needed to store elements, which increases per-element operation time and lowers efficiency.

  • With a small load factor, the expansion threshold is lower and more space is consumed, but elements are stored sparsely, hash collisions are less likely, and operations are fast.

Therefore 0.75, the average of 0.5 and 1.0, was chosen as a balanced load factor.

06 Is HashMap thread safe?

No, because the get and put methods are not synchronized. There is no guarantee that a value put at one moment will still be the value you get a moment later, which causes thread-safety problems.

Hashtable, by contrast, is thread safe, but its lock granularity is too coarse: at most one thread can access it at a time, so concurrency and performance are poor. In practice we normally use ConcurrentHashMap, which we'll cover later.
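A minimal sketch that often demonstrates the problem (the thread and key counts are arbitrary, and results vary by run, since the race is nondeterministic):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HashMapRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int offset = t * 10_000; // disjoint key ranges per thread
            pool.execute(() -> {
                for (int i = 0; i < 10_000; i++) {
                    map.put(offset + i, i); // unsynchronized concurrent puts
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // With no synchronization, updates are frequently lost;
        // the printed size is often less than the expected 40000.
        System.out.println(map.size());
    }
}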

07 Why do we need to override hashCode when overriding equals?

In Java, all objects inherit from the Object class, which provides two methods, equals and hashCode, that together determine whether two objects are considered equal.

Let’s start with equals:

public boolean equals(Object obj) {
    return (this == obj);
}

Without an override, equals is effectively just ==. It has the following two characteristics:

  • For primitive values, == compares the two values themselves
  • For reference objects, == compares the addresses of the two objects

A HashMap uses the key's hashCode to locate the bucket index. Keys whose hashes map to the same index form a linked list, which means two different keys with the same hashCode (such as "Aa" and "BB" above) will end up in the same position.

As the get method showed earlier, when hashes collide we compare not only the hash value but also the keys to decide whether an element is the one we want. If two keys have the same hash, how does HashMap know which one you mean? That's right: it uses equals. So the two methods must agree. If you override equals but not hashCode, two logically equal keys will usually produce different hash codes and land in different buckets, so a get can never find what you put in; and when a genuine collision happens, equals is the only way to tell the keys apart.
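A short, runnable demonstration using a hypothetical Dog key class that overrides equals but not hashCode, so a logically equal key probes the wrong bucket and the lookup fails:

import java.util.HashMap;
import java.util.Map;

public class EqualsHashCodeDemo {
    static class Dog {
        final String name;
        Dog(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Dog && ((Dog) o).name.equals(this.name);
        }
        // hashCode is deliberately NOT overridden: the inherited
        // Object.hashCode() is identity-based and differs per instance.
    }

    public static void main(String[] args) {
        Map<Dog, Integer> map = new HashMap<>();
        map.put(new Dog("dog"), 666);
        // equals says the keys are equal, but their hash codes differ,
        // so the lookup lands in the wrong bucket and returns null.
        System.out.println(map.get(new Dog("dog"))); // almost always null
    }
}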

08 HashMap Infinite Loop Analysis

The following analysis is based on JDK 1.7. The problem is mainly caused by JDK 1.7's head insertion when transferring list nodes during expansion. Assume the HashMap's capacity is 2 and it starts empty. Three threads, t1, t2, and t3, add the elements key1, key2, and key3, and a breakpoint placed just before expansion stops all three threads there. The source is as follows:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next; // Set a breakpoint here
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Head insertion: e is linked in front of the current bucket head
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

Suppose the three elements hash to the same bucket and form the list key1 → key2 → key3. At this point nothing is wrong; everything is fine.

Release the breakpoints and let the HashMap expand. The list might have stayed key1 → key2 → key3, but unfortunately, after expansion, key1 and key2 still land in the same position and again form a list. Because transfer uses head insertion, key2 is inserted in front of key1, so the order becomes key2 → key1.

When all three threads have finished, an infinite loop can appear as shown below: a subsequent get(key1) will then walk the cycle forever and never return.
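A classic interleaving that produces the cycle, traced against the transfer() code above with two colliding keys (a sketch; real schedules vary):

// Bucket before expansion: key1 -> key2
// t1 runs to the breakpoint and pauses with e = key1, next = key2.
// t2 completes its whole transfer; head insertion reverses the bucket:
//   key2 -> key1 (so key2.next = key1, key1.next = null).
// t1 resumes with its stale e and next:
//   round 1: key1.next = newTable[i] (null); newTable[i] = key1; e = key2
//   round 2: next = key2.next = key1 (set by t2!); key2.next = key1;
//            newTable[i] = key2; e = key1
//   round 3: key1.next = newTable[i] = key2
//            -> key1.next == key2 and key2.next == key1: a cycle
// Any later get() that walks this bucket loops forever.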

Of course, the root cause of the dead loop is that JDK 1.7 inserts list nodes at the head, which reverses the order of list nodes during expansion. JDK 1.8 fixes this by switching to tail insertion, which preserves the original order of list elements during expansion, so the list can no longer form a loop.

09 Summary

HashMap is a cornerstone of Java fundamentals; it comes up constantly both on the job and in interviews, so make sure you can use it fluently. This article covers all of the important points of HashMap except red-black trees, which are omitted for reasons of space and will be discussed separately later. If you find any mistakes in this article, please kindly point them out.

Well, that's Doggie's walkthrough of the HashMap source code. Thanks to all the tech-community authors for their hard work, especially Geek Time, which is excellent. If I have seen further, it is by standing on your shoulders. I hope this article helped, and see you in the next one.

10 The shoulders of giants

  • Kaiwu.lagou.com/course/cour…

Free interview questions & e-books

If you've read this far and enjoyed the article, please give it a like.

I've also collected a few hundred e-books and the latest 2021 interview materials. Search "JavaFish" on WeChat: reply "ebook" for 1000+ programming e-books, reply "interview" for interview questions, or reply "1024" for a complete set of Java video tutorials.

The interview questions come with answers, as shown below. If you need them, come and take them; absolutely free, no tricks.