preface

As a Java developer, you could call HashMap the sharpest everyday tool in business code. Nine Dragons picks this well-worn topic back up once more, to dig into the source and savor the details.

First, let’s throw out a few questions about HashMap and chase the answers through the source — think of it as a game of hide-and-seek.

1. Why use HashMap? What are the features of HashMap?

2. What are the main parameters of HashMap? What does it do?

3. What data structure is HashMap based on?

4. What happens to the initial capacity passed to the HashMap constructor? Why is it handled that way?

5. When does HashMap resize? What happens during a resize? Do 8 hash collisions always convert a list to a red-black tree?

6. What happens if you add or remove entries while foreach-ing over a HashMap?

1. Why use HashMap?

When we use a tool, it must be because some of its features meet our needs and can solve our problems quickly and accurately. So why do we use HashMap?

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.

That sentence from the source-code comment is exactly why we use HashMap: it provides constant-time (O(1)) performance for the basic operations, get and put, assuming the hash function distributes the elements appropriately among the buckets.

If you need to store and look up values fast, HashMap can do both in O(1) time, provided your keys’ hashCode values are spread out enough.

Another feature of Map is that keys cannot be duplicated. Let’s take a look at how HashMap delivers those O(1) gets and puts.
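Before we dig into the internals, here is a minimal sketch of those two operations in ordinary user code (public API only; the class name is just for illustration):

    import java.util.HashMap;
    import java.util.Map;

    public class BasicUsage {
        public static void main(String[] args) {
            Map<String, Integer> ages = new HashMap<>();
            ages.put("alice", 30);   // O(1) insert, assuming well-spread hash codes
            ages.put("bob", 25);
            ages.put("alice", 31);   // duplicate key: the old value is overwritten

            System.out.println(ages.get("alice")); // 31 -- O(1) lookup
            System.out.println(ages.get("carol")); // null -- key absent
        }
    }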

2. Chew through the main parameters of HashMap

2.1. Static Constants

    // Default initial bucket capacity; must be a power of 2
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

    // Maximum bucket capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Threshold for converting a bucket's linked list into a tree
    static final int TREEIFY_THRESHOLD = 8;

    // Threshold for converting a tree back into a linked list
    static final int UNTREEIFY_THRESHOLD = 6;

    // Minimum table capacity before a list may be treeified. If the table is
    // smaller than this, resize() instead: a too-small table is exactly what
    // makes hash collisions likely.
    static final int MIN_TREEIFY_CAPACITY = 64;


2.2. Fields

    // The hash table
    transient Node<K,V>[] table;

    // Cached entry set, used for iteration
    transient Set<Map.Entry<K,V>> entrySet;

    // Number of key-value pairs in the HashMap
    transient int size;

    // Incremented on every structural change to the HashMap, i.e. any
    // operation that adds or removes entries from the hash table.
    transient int modCount;

    // Resize threshold: threshold = capacity * load factor
    int threshold;

    // Load factor, used to compute threshold; can be specified in the constructor.
    final float loadFactor;


3. Sniff the HashMap data structure

Above we saw a Node array: transient Node<K,V>[] table.

Why use arrays?

Answer: for fast access to elements. Fine, but that begs the next question: why can arrays access elements quickly?

  1. An array finds the address of the k-th element by simply computing first address + element size * k.
  2. The CPU cache reads contiguous memory, and since an array occupies contiguous addresses, all or part of it lands in the CPU cache together.

Let’s look at the structure of Node.

    static class Node<K,V> implements Map.Entry<K,V> {

        final int hash;        // hash of the key
        final K key;           // the key object
        V value;               // the value object
        Node<K,V> next;        // next node in the chain

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }


The Node contains a reference to the next Node.

At this point, we know the underlying data structure of HashMap is array + linked list. But is that all? In JDK 1.7, yes. To speed up lookups in buckets with many hash collisions, JDK 1.8 converts a bucket's linked list into a red-black tree after 8 collisions, improving the bucket's query complexity from O(n) to O(log n).

If a HashMap is able to evenly place nodes into a table array, we can find the desired value quickly as long as we somehow know the index of the array where the Node with the specified key resides.

Next, let’s look at how HashMap locates a slot in the table array.

4. Go to the HashMap constructor

With the basics above — the field meanings and the data structure — we have enough confidence to formally read the source. I think the best way to understand a class is to start with the constructor: see what initialization happens when the object is built, and then drill down into the commonly used methods.

    public HashMap(int initialCapacity) {
        // If only the initial capacity is given, use the default load factor of 0.75
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        // Cap the initial capacity at 2^30
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        // Initialize the load factor, and stash the (rounded) capacity in threshold.
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }


As you can see, the constructor mainly initializes the load factor and the hash table capacity. You might object: isn’t this initializing threshold? Don’t be fooled — it is only temporarily stashing the table capacity in threshold. HashMap does not want an extra field just to hold the capacity: the array length can represent it, but the array has not been allocated yet, so the capacity is parked in threshold for the time being.
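You can watch this happen from the outside. Below is a small illustrative sketch (not from the HashMap source) that uses reflection to peek at the private threshold and table fields; the 16 comes from tableSizeFor, covered next. On JDK 9+ you may need to run with --add-opens java.base/java.util=ALL-UNNAMED:

    import java.lang.reflect.Field;
    import java.util.HashMap;

    public class ThresholdPeek {
        public static void main(String[] args) throws Exception {
            HashMap<String, String> map = new HashMap<>(10);

            Field threshold = HashMap.class.getDeclaredField("threshold");
            threshold.setAccessible(true);
            Field table = HashMap.class.getDeclaredField("table");
            table.setAccessible(true);

            // Before the first put: no table yet, capacity parked in threshold
            System.out.println(table.get(map));     // null
            System.out.println(threshold.get(map)); // 16

            map.put("k", "v"); // the first put allocates the table via resize()

            System.out.println(((Object[]) table.get(map)).length); // 16
            System.out.println(threshold.get(map));                 // 12, i.e. 16 * 0.75
        }
    }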

We see that the user-specified initialCapacity is passed into tableSizeFor, and its return value becomes the actual initial capacity. What on earth does this method do? Let’s unveil its mystery.

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }


Okay, no more suspense: tableSizeFor returns the smallest power of two greater than or equal to the target capacity. The trick is bit smearing: after n = cap - 1, the successive unsigned shifts and ORs copy the highest set bit into every lower position, so n becomes a solid run of 1s; n + 1 is then the next power of two. The initial - 1 is what keeps an input that is already a power of two from being doubled.

For example:

If you pass in 10, the first power of two greater than or equal to 10 is 16;

if you pass in 7, the first power of two greater than or equal to 7 is 8;

if you pass in 20, the first power of two greater than or equal to 20 is 32.
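If you’d like to convince yourself, here is a standalone copy of the method to play with (the constant is copied from HashMap; the class is just scaffolding):

    public class TableSizeForDemo {
        static final int MAXIMUM_CAPACITY = 1 << 30;

        // Verbatim logic of HashMap.tableSizeFor
        static int tableSizeFor(int cap) {
            int n = cap - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        public static void main(String[] args) {
            System.out.println(tableSizeFor(10)); // 16
            System.out.println(tableSizeFor(7));  // 8
            System.out.println(tableSizeFor(20)); // 32
            System.out.println(tableSizeFor(16)); // 16: already a power of two, unchanged
        }
    }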

At this point we have to ask ourselves: why must the hash table capacity be a power of two?

5. Dissect the main methods of HashMap

5.1. put

Whenever we new up a HashMap object, we call put to add key-value pairs. Could Nine Dragons just paste the code like everyone else? Then what would set this apart, hahaha. Let’s read the source carefully first and then sum up the flow, so it sticks a little better.

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    static final int hash(Object key) {
        int h;
        // XOR the high 16 bits of the hash into the low 16 bits
        // to reduce the probability of collisions
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

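To see what that XOR buys, consider a sketch with two made-up hash codes that differ only in their high 16 bits. Without the spread, a 16-slot table would put them in the same bucket, because (n - 1) masks everything but the low 4 bits:

    public class HashSpreadDemo {
        // The same spreading step as HashMap.hash(), applied to a raw hash code
        static int spread(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int n = 16;           // table length, a power of two
            int h1 = 0x1234_0000; // made-up hash codes that differ
            int h2 = 0x5678_0000; // only in the high 16 bits

            // Without spreading, both land in bucket 0
            System.out.println((h1 & (n - 1)) + ", " + (h2 & (n - 1))); // 0, 0

            // With spreading, the high bits leak into the index: buckets 4 and 8
            System.out.println((spread(h1) & (n - 1)) + ", " + (spread(h2) & (n - 1)));
        }
    }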

Let’s see what putVal does.

    /**
     * This method stores a (k,v) key-value pair into the HashMap.
     *
     * @param hash         hash of the key
     * @param key          the key object
     * @param value        the value object
     * @param onlyIfAbsent if true, don't overwrite an existing value
     * @param evict        if false, the table is in creation mode
     * @return the previous value, or null if none
     */


    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // Constructing a HashMap initializes the load factor and capacity, but not
        // the hash table itself; the table is first allocated here.
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // Given a hash, how do we spread entries evenly over the table? The obvious
        // answer is hash % n, where n is the table length. But % is slow, bit
        // operations are the fastest thing a computer does, and if n is guaranteed
        // to be a power of two, hash % n equals hash & (n - 1). That is why the
        // capacity must be a power of two.
        // If the target bucket is empty, build a Node and drop it in
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            // Otherwise we have a hash collision.
            Node<K,V> e; K k;
            // If the bucket head has the same hash and an equal key (== or equals),
            // the key is a duplicate: remember that node
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If the bucket head is a tree node, put the entry into the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Otherwise it is a linked list that has not been treeified yet.
                for (int binCount = 0; ; ++binCount) {
                    // If a node's next is null, we have reached the tail
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // The list has reached 9 nodes, i.e. 8 hash collisions:
                        // try to convert it into a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // If we meet the same key along the way, break out
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // e != null means the key already existed
            if (e != null) { // existing mapping for key
                // Grab the old value
                V oldValue = e.value;
                // onlyIfAbsent is false by default, so the new value overwrites the old
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                // Return the old value
                return oldValue;
            }
        }
        // A new entry went into the hash table: bump modCount
        ++modCount;
        // If the new size exceeds the threshold, resize
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }


To sum up:

  1. put first computes the hash of the key.
  2. If the hash table has not been initialized, initialize it.
  3. Compute which bucket the hash belongs to: (n - 1) & hash.
  4. If that slot is empty, insert directly.
  5. If not, check whether the head is a tree node; if so, insert into the red-black tree.
  6. Otherwise it is a linked list: insert at the tail. After inserting, if 8 hash collisions have accumulated, try converting the list into a red-black tree.
  7. If the same key is found along the way, overwrite the old value with the new one.
  8. ++modCount records that a new key-value pair was inserted; then check whether a resize is needed.
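Step 3 leans on the power-of-two trick from the long comment above. A quick sanity check (a sketch, not HashMap code) that hash % n and hash & (n - 1) agree for non-negative hashes whenever n is a power of two:

    import java.util.Random;

    public class IndexTrickDemo {
        public static void main(String[] args) {
            Random rnd = new Random();
            int n = 16; // a power of two, like every HashMap table length
            for (int i = 0; i < 1_000_000; i++) {
                int hash = rnd.nextInt(Integer.MAX_VALUE); // keep it non-negative
                if (hash % n != (hash & (n - 1)))
                    throw new AssertionError("mismatch at hash = " + hash);
            }
            System.out.println("hash % 16 == hash & 15 held for all samples");
        }
    }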

Soul torture: do 8 hash collisions really always convert the list to a red-black tree?

No. In put, treeifyBin is called once the hash collides 8 times, but it does not necessarily convert: tab.length must be at least 64 before the conversion is performed; otherwise the table is resized instead. When the table is small, collisions are mostly a symptom of the table being too small, so growing it is the better cure.
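You can observe this from the outside by forcing collisions with a deliberately bad key. The sketch below is illustrative (the key class and names are mine, and the reflection peek may need --add-opens java.base/java.util=ALL-UNNAMED on newer JDKs): it requests a 64-slot table up front so treeification is allowed; with a smaller table, treeifyBin would resize instead and the bucket would stay a linked list.

    import java.lang.reflect.Field;
    import java.util.HashMap;

    public class TreeifyDemo {
        // A deliberately terrible key: every instance hashes to the same bucket
        static final class BadKey {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public int hashCode() { return 42; }
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
        }

        public static void main(String[] args) throws Exception {
            HashMap<BadKey, Integer> map = new HashMap<>(64); // table starts at 64 slots
            for (int i = 0; i < 9; i++)
                map.put(new BadKey(i), i); // the 9th insert triggers treeifyBin

            Field tableField = HashMap.class.getDeclaredField("table");
            tableField.setAccessible(true);
            for (Object bucket : (Object[]) tableField.get(map))
                if (bucket != null)
                    System.out.println(bucket.getClass().getName()); // ...HashMap$TreeNode
        }
    }

Now the treeifyBin source itself: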

    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // If the table length is below 64, resize instead of treeifying
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // Only a table of 64 or more actually gets converted: replace each
            // Node with a TreeNode, keeping the chain linked...
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            // ...then treeify the chain in place
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }


5.2. resize()

The put method calls resize() in two places. Now let’s examine the implementation logic of resize().

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // The old table already holds data
        if (oldCap > 0) {
            // Once the table has reached the maximum capacity, stop growing it;
            // just raise the threshold to Integer.MAX_VALUE.
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Double the capacity. If the doubled capacity is below MAXIMUM_CAPACITY
            // and the old capacity is at least the default 16, double the threshold too
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // oldCap is 0 but the old threshold is positive: the capacity was parked in
        // threshold by the capacity-taking constructor, so use it as the new capacity
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // This branch corresponds to the no-arg constructor: use the defaults
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // newThr can still be 0 here (e.g. oldCap > 0 but below the default 16, or
        // the capacity came out of threshold above); compute it from newCap.
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Overwrite threshold with the new threshold
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        // Allocate the new table with newCap (twice oldCap when growing)
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // New capacity computed, new threshold computed, new table created.
        // Now migrate the old table's data into the new one.
        if (oldTab != null) {
            // Walk the old table bucket by bucket
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // If slot j holds something, grab the reference into e
                if ((e = oldTab[j]) != null) {
                    // We hold the reference in e, so null out the old slot to help GC
                    oldTab[j] = null;
                    // If e has no successor, the slot holds a single node
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // More than one node: if e is a TreeNode, use the tree migration
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        // After resizing, a node either stays at index j or moves
                        // to index j + oldCap.
                        // loHead/loTail are the head/tail of the list staying at j;
                        // hiHead/hiTail are the head/tail of the list moving to j + oldCap.
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // Does the node stay at j? (more on this below)
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    // record the head of the j list
                                    loHead = e;
                                else
                                    // link the node onto the tail
                                    loTail.next = e;
                                loTail = e;
                            }
                            // Otherwise it goes to j + oldCap
                            else {
                                if (hiTail == null)
                                    // record the head of the j + oldCap list
                                    hiHead = e;
                                else
                                    // link the node onto the tail
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            // This list stays at index j
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            // This list moves to index j + oldCap
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        // Return the new table
        return newTab;
    }


Now let’s take a closer look at e.hash & oldCap. Since oldCap is a power of two, exactly one of its bits is set, and that bit is precisely the extra bit that newCap - 1 exposes compared with oldCap - 1. So e.hash & oldCap tests the one new bit of the index: if it is 0, the index is unchanged and the node stays at j; if it is 1, the index grows by oldCap, so the node moves to j + oldCap.
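Two made-up hashes make the trick concrete; with oldCap = 16, the deciding bit is bit 4:

    public class ResizeIndexDemo {
        public static void main(String[] args) {
            int oldCap = 16, newCap = 32;
            // Both hashes share the low 4 bits (old bucket 5) but differ in bit 4
            int h1 = 0b0_0101; // bit 4 is 0
            int h2 = 0b1_0101; // bit 4 is 1

            System.out.println(h1 & (oldCap - 1)); // 5: old index
            System.out.println(h2 & (oldCap - 1)); // 5: same old index

            System.out.println(h1 & (newCap - 1)); // 5:  stays at j
            System.out.println(h2 & (newCap - 1)); // 21: moves to j + oldCap

            // e.hash & oldCap isolates exactly the deciding bit
            System.out.println(h1 & oldCap); // 0  -> lo list
            System.out.println(h2 & oldCap); // 16 -> hi list
        }
    }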

resize() also calls ((TreeNode<K,V>)e).split(this, newTab, j, oldCap). With the knowledge above, split does essentially the same thing: it splits the red-black tree into two chains, one staying at the original index and one at original + oldCap. Note, however, that if a resulting chain has 6 or fewer nodes, it is converted back into a linked list.

    final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
        TreeNode<K,V> b = this;
        // Relink into lo and hi lists, preserving order
        TreeNode<K,V> loHead = null, loTail = null;
        TreeNode<K,V> hiHead = null, hiTail = null;
        int lc = 0, hc = 0;
        for (TreeNode<K,V> e = b, next; e != null; e = next) {
            next = (TreeNode<K,V>)e.next;
            e.next = null;
            // Does the node keep its index?
            if ((e.hash & bit) == 0) {
                if ((e.prev = loTail) == null)
                    loHead = e;
                else
                    loTail.next = e;
                loTail = e;
                ++lc;
            }
            else {
                if ((e.prev = hiTail) == null)
                    hiHead = e;
                else
                    hiTail.next = e;
                hiTail = e;
                ++hc;
            }
        }

        if (loHead != null) {
            // 6 or fewer nodes: convert back to a plain linked list
            if (lc <= UNTREEIFY_THRESHOLD)
                tab[index] = loHead.untreeify(map);
            else {
                tab[index] = loHead;
                if (hiHead != null) // (else is already treeified)
                    loHead.treeify(tab);
            }
        }
        if (hiHead != null) {
            if (hc <= UNTREEIFY_THRESHOLD)
                tab[index + bit] = hiHead.untreeify(map);
            else {
                tab[index + bit] = hiHead;
                if (loHead != null)
                    hiHead.treeify(tab);
            }
        }
    }


There — Nine Dragons has chewed all the way through resize(). What a toothache.

5.3. get

Given the data structure of HashMap and how put stores and manages key-value pairs in constant time, isn’t get easy? Please have a taste of this dish. We store entries under their keys, so we fetch the value by key as well.

    public V get(Object key) {
        Node<K,V> e;
        // Hash the key to locate the bucket
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // Proceed only if the table exists and the bucket for this hash is non-empty
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Check the first node; if it matches, return it
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // The first node didn't match; is there a next?
            if ((e = first.next) != null) {
                // If the bucket head is a tree node, search the red-black tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    // Otherwise walk the linked list from head to tail
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        // Found it
                        return e;
                } while ((e = e.next) != null);
            }
        }
        // Not found
        return null;
    }


To summarize the get flow:

  1. Compute the hash of the key.
  2. Use hash & (n - 1) to locate the bucket; if the slot is empty, return null.
  3. If it has a value, check whether the first node matches (same hash, and equals returns true).
  4. If the first node does not match, check whether it is a tree node; if so, search the red-black tree.
  5. Otherwise it is a linked list, searched from head to tail.

6. A brief look at modCount

This field is not unique to Map; collections such as List and Set have it too. It exists to make iteration fail fast: if, during iteration, put, clear, remove, or any other operation changes the number of entries in the container, a ConcurrentModificationException is thrown.

A HashMap has three iterators — KeyIterator, ValueIterator, and EntryIterator — belonging to the inner classes KeySet, Values, and EntrySet respectively. Each call to the corresponding iterator() method news up a fresh iterator.

I won’t paste all of that code here — there is too much of it; go have a look if you’re interested. Here is why it fails fast.

    final class KeyIterator extends HashIterator
        implements Iterator<K> {
        public final K next() { return nextNode().key; }
    }

    final class ValueIterator extends HashIterator
        implements Iterator<V> {
        public final V next() { return nextNode().value; }
    }

    final class EntryIterator extends HashIterator
        implements Iterator<Map.Entry<K,V>> {
        public final Map.Entry<K,V> next() { return nextNode(); }
    }


Consumers can choose which iterators to use according to their own needs. Each one inherits from HashIterator, so let’s take a look.

    abstract class HashIterator {
        Node<K,V> next;        // next entry to return
        Node<K,V> current;     // current entry
        int expectedModCount;  // for fast-fail
        int index;             // current slot

        HashIterator() {
            // The key point: every time an iterator is created,
            // the current modCount is captured into expectedModCount
            expectedModCount = modCount;
            Node<K,V>[] t = table;
            current = next = null;
            index = 0;
            if (t != null && size > 0) { // advance to first entry
                do {} while (index < t.length && (next = t[index++]) == null);
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Node<K,V> nextNode() {
            Node<K,V>[] t;
            Node<K,V> e = next;
            // If modCount and expectedModCount differ, some thread (possibly this
            // one) called put, remove, etc. during iteration: fail fast
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            if (e == null)
                throw new NoSuchElementException();
            if ((next = (current = e).next) == null && (t = table) != null) {
                do {} while (index < t.length && (next = t[index++]) == null);
            }
            return e;
        }

        // To remove a node during iteration you must use the iterator's own
        // remove; it removes the node last returned by nextNode()
        public final void remove() {
            Node<K,V> p = current;
            if (p == null)
                throw new IllegalStateException();
            // The modCount check happens before removal, too
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            current = null;
            K key = p.key;
            removeNode(hash(key), key, null, false, false);
            // Re-sync expectedModCount so the iterator's own removal doesn't trip the check
            expectedModCount = modCount;
        }
    }


So, adding or removing entries while iterating over a HashMap throws a ConcurrentModificationException. Remember the question at the beginning? Yes, this is the answer. You can check the source of List and the others: modCount exists there too, implemented the same way.
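A minimal sketch of both sides of the contract (the class name is mine): modifying the map directly inside a for-each trips the check, while removing through the iterator re-syncs expectedModCount:

    import java.util.ConcurrentModificationException;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    public class FailFastDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();
            map.put("a", 1);
            map.put("b", 2);

            try {
                for (String k : map.keySet())
                    map.remove(k); // structural change behind the iterator's back
            } catch (ConcurrentModificationException e) {
                System.out.println("fail-fast triggered: " + e);
            }

            // The safe way: remove through the iterator itself
            Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
            while (it.hasNext()) {
                it.next();
                it.remove();
            }
            System.out.println("after iterator removal: " + map); // {}
        }
    }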

7. Summary

We spent a lot of time and energy on HashMap, and by now we can answer every question posed at the beginning, plus a few raised along the way. Nine Dragons did not cover concurrent scenarios, nor the JDK 1.7 infinite-loop problem when resizing linked lists concurrently — there is plenty written about that online. The more important point is that HashMap itself does not support concurrent operation, so what would you reach for instead?

If there is any mistake in this article, please point it out; questions are welcome too, and we can discuss and improve together.

If you found this article helpful, please give it a like and share it to show support; if you repost it, please credit the source. Enough said — follow along and don’t get lost.