Map

A Map is a data structure used frequently during development. Map is an abstract interface for key-value mappings that does not allow duplicate keys, i.e. each key corresponds to one value. HashMap, Hashtable, and ConcurrentHashMap are all important members of the Java Collections Framework. The Map interface provides three collection views that let you look at a mapping's contents as a set of keys (keySet()), a collection of values (values()), or a set of key-value mappings (entrySet()).

The Hash table

We know that an array occupies a fixed, contiguous block of memory, so addressing is fast (queries are O(1)); however, inserting or deleting an element requires shifting other elements of the array, so insertion and deletion are slow (O(n)). A linked list is not contiguous in memory: each element stores the memory address of the next element and reaches it through that address, so queries are slow (O(n)) while insertion and deletion are fast (O(1)). What if we want a data structure that is fast for queries as well as for insertion and deletion? That is what the hash table is for: a hash function computes a key's location in the table, called the hash address, the value is stored at that hash address, and afterwards the value can be manipulated directly through the key. Query, insert, and delete all take O(1) on average. Since the storage position is computed from the key by the hash function, the quality of the hash function directly affects the efficiency of the hash table, e.g. wasted storage space or a large number of conflicts (different keys mapping to the same storage position).

A hash function maps input of arbitrary length to output of fixed length, the hash address, so hash conflicts are unavoidable. The following two methods are commonly used to resolve them.

  1. Chaining (the "zipper" method)

    Combines an array with linked lists: a linked list is kept for each hash address in the table, and entries with the same hash address are stored in that list. The array stores the head pointer of each list, and each list node generally holds the hash address, key, value, and a pointer to the next node. The hash address is used to compute the array index, so entries with the same index end up in the same list. Chaining suits workloads with frequent insertions and deletions.
  2. Open addressing

    Open addressing is also known as the linear probing method. The basic idea: treat the hash table T[0..m-1] as a circular vector; if the initial probe address is d, the probe sequence is d, d+i, d+2i, ..., up to m-1, where i is a constant. If T[d] has a hash conflict, T[d+i] is probed next, and so on, until T[m-1] has been probed. Open addressing is prone to clustering: stored entries bunch together in the table, so hash conflicts become more likely as new elements are added.
  3. Comparison of chaining and open addressing. Chaining: conflict handling is simple, there is no clustering, and list insertion and deletion are cheap, so chaining suits frequent insertions and deletions. Open addressing: to reduce collisions, the load factor must be kept small, which can waste a lot of space when nodes are large. Also, when deleting a node under open addressing, its slot cannot simply be cleared, because that would truncate the search path of nodes stored after it (an empty slot is the condition for search failure). Deletion must therefore be logical: the node to be deleted is only marked as deleted.

Load factor = Number of elements in the hash table/array length of the hash table
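The chaining strategy from method 1 can be sketched in a few lines of Java. This is a minimal illustration, not the JDK implementation: the class name ChainedHashTable, the String-only keys, and the fixed capacity are all simplifications for the example.

```java
import java.util.LinkedList;

// A minimal sketch of chaining: an array of buckets, each bucket a linked
// list of entries whose keys hash to the same index (the hash address).
public class ChainedHashTable {
    static final class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++)
            buckets[i] = new LinkedList<>();
    }

    // Reduce the key's hashCode to a bucket index
    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public void put(String key, int value) {
        LinkedList<Entry> bucket = buckets[indexFor(key)];
        for (Entry e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; } // overwrite
        }
        bucket.add(new Entry(key, value)); // on collision, append to the chain
    }

    public Integer get(String key) {
        for (Entry e : buckets[indexFor(key)])
            if (e.key.equals(key))
                return e.value;
        return null;
    }

    public boolean remove(String key) {
        return buckets[indexFor(key)].removeIf(e -> e.key.equals(key));
    }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(4); // tiny table forces collisions
        t.put("Chinese", 1);
        t.put("Mathematics", 2);
        t.put("English", 3);
        t.put("English", 30); // same key: overwrite
        System.out.println(t.get("English"));
        t.remove("Chinese");
        System.out.println(t.get("Chinese"));
    }
}
```

Even with collisions forced by the tiny capacity, lookups stay correct because each bucket's list is searched with equals().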

HashMap

The data structure

HashMap resolves hash conflicts with the chaining (zipper) method described above. A HashMap is not thread-safe, allows null keys and values, does not guarantee ordering (such as insertion order), and does not even guarantee that the order stays the same over time (entries are migrated when the hash table doubles). Let's create a HashMap and run it:

HashMap<String, Integer> map = new HashMap<>();
map.put("Chinese", 1);
map.put("Mathematics", 2);
map.put("English", 3);
map.put("History", 4);
map.put("Political", 5);
map.put("Geography", 6);
map.put("Creatures", 7);
map.put("Chemistry", 8);



The data structure

transient Node<K,V>[] table;
transient int size;
int threshold;
final float loadFactor;
transient int modCount;

  • table: a Node<K,V>[] array (Entry[] before JDK 1.8); each Node is a singly linked list node. The hash table's key-value pairs are stored in this array, each slot corresponds to one hash address, and a slot is often called a bucket
  • size: the size of the HashMap, i.e. the number of key-value pairs stored
  • DEFAULT_INITIAL_CAPACITY: the default capacity of the HashMap (the size of the array), 16
  • MAXIMUM_CAPACITY: the maximum capacity of the HashMap (2^30). If a larger capacity is passed in, it is replaced by this maximum
  • threshold: the resize threshold, used to decide whether to grow the HashMap; threshold = capacity × load factor. When the number of stored key-value pairs reaches threshold, the HashMap doubles its capacity
  • loadFactor: the load factor
  • modCount: used to implement the fail-fast mechanism

Fail-fast mechanism: for an iterator over a non-thread-safe collection (note that it is the non-thread-safe collections that have this mechanism), if the structure or the number of elements of the collection is modified while the iterator is in use, iteration ends immediately and the iterator throws ConcurrentModificationException.
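The mechanism above is easy to trigger deliberately. Note that even a structural modification from the same thread trips it, because the iterator compares the modCount it recorded at creation with the map's current modCount:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Demonstrates fail-fast: a structural modification (putting a new key)
// while iterating bumps modCount, so the iterator's next() throws.
public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("c", 3); // structural modification mid-iteration bumps modCount
        try {
            it.next(); // the iterator detects the mismatch and fails fast
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```

Overwriting an existing key's value, by contrast, is not a structural modification and does not increment modCount.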

The constructor

HashMap has four constructors, as follows:

// No-arg constructor: the load factor defaults to 0.75 and the capacity (array size) defaults to 16
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

// Specify the capacity; the load factor defaults to 0.75
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// Specify both the capacity and the load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

// Copy constructor taking another map; the load factor defaults to 0.75
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

Why is the load factor 0.75 by default? According to the official explanation, with a load factor of 0.75 the length of a bucket's list almost never exceeds 8 (the probability of a list reaching length 8 is about 0.00000006). The goal is to keep the lists as short as possible so that HashMap queries stay as fast as possible.

Since a HashMap doubles its capacity whenever the number of entries (size) exceeds the threshold, if we know roughly how much data we will store we can specify the capacity (array size) when creating the HashMap, since most of the time it then does not need to grow repeatedly. Note that the capacity actually used is always a power of 2: even if the capacity we pass in is not a power of 2, the source code converts it to the next power of 2. For example, if we pass in 5, the final capacity is 8.
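The rounding to a power of two uses a bit-spreading trick: OR-ing (cap - 1) with its own right shifts fills every bit below the highest set bit, so adding 1 yields the smallest power of two >= cap. The method below mirrors HashMap.tableSizeFor in JDK 1.8; the wrapper class is just a harness for running it standalone:

```java
// Rounds a requested capacity up to the next power of two,
// as HashMap does for the capacity passed to its constructor.
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1; // so that an exact power of two maps to itself
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16; // all bits below the highest set bit are now 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(5));   // 8
        System.out.println(tableSizeFor(16));  // 16
        System.out.println(tableSizeFor(17));  // 32
    }
}
```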

Why must the capacity be a power of two? Since HashMap is an array + singly linked list structure, we want elements to be distributed as uniformly as possible; ideally each bucket stores only one element, which makes queries most efficient. How do we spread entries evenly across buckets? The first idea is the modulo operation: hash address % capacity. The engineers at Sun had the same idea, but implemented it with a bitwise operation instead, because bitwise operations are fast. For the bitwise operation to give the same result as the modulo operation, i.e. hash & (capacity - 1) == hash % capacity, the capacity must be a power of 2.
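The identity can be checked directly; IndexMaskDemo below is just an illustrative harness, verifying the mask against the modulo for a power-of-two capacity and showing that the trick breaks for other capacities:

```java
// For a power-of-two capacity, hash & (capacity - 1) == hash % capacity
// for non-negative hashes; for other capacities the two can differ.
public class IndexMaskDemo {
    public static void main(String[] args) {
        int capacity = 16; // power of two, like HashMap's table length
        for (int hash = 0; hash < 1000; hash++) {
            if ((hash & (capacity - 1)) != hash % capacity)
                throw new AssertionError("mismatch at " + hash);
        }
        System.out.println("bitmask == modulo for capacity 16");

        int odd = 10; // not a power of two: 37 & 9 == 1, but 37 % 10 == 7
        System.out.println((37 & (odd - 1)) == 37 % odd); // false
    }
}
```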

The put method

Before JDK 1.8, HashMap inserted new nodes at the head of the list; this article analyzes the JDK 1.8 source code, which inserts at the tail of the list.

  1. Compute the hash address of the key-value pair from the key's hashCode(); it is used to locate the index of the array slot where the pair is stored
  2. If table has not been initialized, call resize() to initialize its capacity and the value of threshold
  3. Compute the array index from the hash address with i = (n - 1) & hash; if the slot at that index is empty, create a node for the key-value pair with newNode(hash, key, value, null)

Here's a question: when the length of the table array changes, can get return the wrong value? This is analyzed below. Note that the index is computed as i = (n - 1) & hash rather than using the hash address directly, both because the hash address may exceed the array bounds and to distribute key-value pairs more evenly among the buckets. Because the capacity changes on resize, the same hash address may map to a different index before and after the table grows.
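The question can be checked numerically. When a power-of-two table doubles, a node whose hash satisfies (hash & oldCap) == 0 keeps its old index, and otherwise moves to old index + oldCap; both cases agree with recomputing hash & (newCap - 1) from scratch, which is why lookups still land in the right bucket after a resize:

```java
// Verifies the resize migration rule for a doubling from 16 to 32 buckets.
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 100, 117};
        for (int hash : hashes) {
            int oldIndex = hash & (oldCap - 1);
            // The rule applied during migration: stay, or move by oldCap
            int newIndex = ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(hash + ": " + oldIndex + " -> " + newIndex);
            if (newIndex != (hash & (newCap - 1)))
                throw new AssertionError("rule disagrees with direct computation");
        }
    }
}
```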

  1. If the slot computed from the hash address already holds a node, and that node's key equals the given key and its hash address equals the given hash address, the node's reference is assigned to e
  2. If the slot holds a node whose hash address matches but whose key differs, traverse the list: if a node with an equal key and equal hash address is found, update its value; otherwise call newNode(hash, key, value, null) to create a node for the key-value pair and append it to the tail of the list. If the list length reaches 8 after the append, the list is converted to a red-black tree
  3. If e is not null and onlyIfAbsent is true, the existing value for the same key and hash address is not overwritten
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

// If onlyIfAbsent is true, an existing value for the same key is not overwritten.
// evict is false when called during initialization (a hook for LinkedHashMap).
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the current hash table is empty, initialize it via resize()
    // and record the resulting table length in n
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the slot at the index is empty, no hash collision occurred:
    // create a new node and place it at the index.
    // The index is computed as (n - 1) & hash instead of a modulo operation
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // Otherwise a hash collision occurred
        Node<K,V> e; K k;
        // If the hash values are equal and the keys are equal, this is an
        // overwrite: assign the current node reference to e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the bucket is a red-black tree, insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise traverse the linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Reached the tail: append a new node
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches 8 after appending,
                    // convert the list to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found the node to overwrite
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // If e is not null, a node with the same key exists and is overwritten
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Empty hook, overridden by LinkedHashMap
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Reaching here means a new node was inserted, so modCount is updated;
    // size is updated and the table is resized if needed, then null is returned
    ++modCount;
    if (++size > threshold)
        resize();
    // Empty hook, overridden by LinkedHashMap
    afterNodeInsertion(evict);
    return null;
}

HashCode () is a method of the Object class. The hashCode() method returns the hash code of an Object. This method is designed to better support hash tables such as sets, hashtables, and hashMaps. What hashCode() does: If there are 1000 equals elements and you create a new one, you have to call equals 1000 times to compare each of them to the same object, which is very inefficient. Ashcode actually returns the object’s storage address. If there is no element at that location, the element is stored directly there. If there is already an element at that location, then equal is called to compare the new element.

The get method

  1. If table is not empty and its length is greater than 0, compute the hash address from the key's hashCode(), then compute the array index as (capacity - 1) & hash address. If the slot at that index is not empty (i.e., there is a list head), continue; otherwise return null
  2. If the first node's hash address and key match the given ones, return the first node
  3. Otherwise, if the first node has a next node, continue: if the first node is a tree node, call getTreeNode(hash, key) to search the tree and return the result; otherwise traverse the list and return the node whose key and hash address match
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the first node at the index matches the given hash and key, return it
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // Otherwise look at the following nodes;
            // if the bucket is a tree, search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // Search in the linked list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The remove method

  1. If table is not empty and its length is greater than 0, compute the hash address from the key's hashCode(), then compute the array index from the hash address. If the slot at that index is not empty (i.e., there is a list head), continue; otherwise go to step 6
  2. If the head node's hash address and key match the given ones, assign its reference to node and go to step 4; otherwise go to step 3
  3. If the bucket is a tree, call getTreeNode(hash, key) to find the node; otherwise traverse the list to find the node whose key and hash address match. Assign the found node's reference to node and go to step 4; if no node is found, go to step 6
  4. If node is not null (i.e., a node for the given key was found), and matchValue is false or the values are equal, go to step 5; otherwise go to step 6
  5. If the node is a tree node, call removeTreeNode(this, tab, movable) to remove it; otherwise remove it from the linked list
  6. Return null
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    // p is the predecessor of the node to be removed
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // If the hash table is not empty and there is a node at the index
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        // node is the node to be removed
        Node<K,V> node = null, e; K k; V v;
        // If the list head is the node to be removed, assign its reference to node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            // Otherwise search for the node to be removed and assign it to node
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // If the node was found, and matchValue is false or the values are equal
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                // The node to remove is the list head
                tab[index] = node.next;
            else
                // The node to remove is in the middle of the list
                p.next = node.next;
            ++modCount;  // structural modification
            --size;
            afterNodeRemoval(node); // LinkedHashMap callback
            return node;
        }
    }
    return null;
}

The containsKey method

Returns true if the specified key exists, false otherwise. containsKey calls the same getNode method that get calls.

public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}

The resize method: hash table initialization and capacity doubling

By analyzing the resize method, we can see why get still returns the correct value after the hash table's capacity changes.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    // Old capacity: 0 if the table has not been initialized yet
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // If the old capacity is greater than 0, this is not the first initialization
    if (oldCap > 0) {
        // If the old capacity has reached 2^30 (the maximum capacity), set the
        // threshold to Integer.MAX_VALUE and return the old table: it cannot
        // be expanded any further
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // New capacity = old capacity << 1, i.e. twice the old capacity.
        // If the new capacity is below the maximum and the old capacity is at
        // least the default capacity (16), the threshold is also doubled
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // First initialization: if the old threshold is greater than 0, the HashMap
    // was created with a constructor taking a capacity (and possibly a load
    // factor), which stored the computed capacity in threshold; use it here
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Otherwise the no-arg constructor was used: new capacity = default
    // capacity, new threshold = default load factor * default capacity
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If newThr is still 0 (the oldThr > 0 branch above was taken, i.e. a
    // constructor with arguments was used), compute the new threshold from
    // the new capacity and the load factor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Create a new hash table with the new capacity
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // The key part: migrate the nodes if the old table is not empty
    if (oldTab != null) {
        // Iterate over the old hash table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // If the node has no next node (only one node in the bucket),
                // place it directly into the new table
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Migrate every node of the list. Each node either keeps
                    // its old index j (the "lo" list) or moves to j + oldCap
                    // (the "hi" list), depending on (e.hash & oldCap). For
                    // example, with an old capacity of 16 and a new capacity
                    // of 32, a node at index 0 either stays at 0 or moves to
                    // 16. Because of this migration, put and get still find
                    // the correct bucket after the capacity changes; but since
                    // migration is expensive, try to estimate the size when
                    // creating a HashMap so it rarely needs to double. The
                    // migration relies on bitwise operations, which is why the
                    // capacity must be a power of 2: the bitmask then matches
                    // the modulo operation.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

We can run an example and debug it.

HashMap<String, Integer> map = new HashMap<>();
for (int i = 1; i <= 24; i ++) {
    map.put(String.valueOf(i), i);
}
for (int i = 25; i <= 80; i ++) {
    map.put(String.valueOf(i), i);
}

Here we create the HashMap with the no-arg constructor (the default hash table size is 16 and the default load factor is 0.75) and debug it.


The resources

Understand HashMap from top to bottom

The original address: https://ddnd.cn/2019/03/07/jdk1.8-hashmap/