## The background,

In Java, HashMap is a very common data structure, recently reviewed, here to analyze the source level of HashMap summary, if there is any unreasonable or question, welcome to communicate.

AbstractMap class HashMap is a Java container that implements the Map interface using AbstractMap.

In the Map family of many implementation classes, we do the most common operation is to instantiate Map, map.put(key,value), map.get(key), very simple, but a careful analysis of the source will find that these classes have a lot of interesting design. The following uses HashMap as an example to tease things out.

## 2. Macro structure of HashMap

Arrays and linked lists are the two most basic linear data structures, which are represented by Java arrays and reference variables. HashMap uses arrays and linked lists to store data. By default, there is an array with a length of 16 in HashMap, and each element of the array stores a head node of the linked list, as shown in the following figure:

(image from www.cnblogs.com/moonandstar…

When executing Map Map =new HashMap(); Initializes an empty array of type Entry in the Map instance.

If the data in the HashMap now looks like the figure above, when you put elements into the HashMap again, say map.put(” key1 “, “value1”); It will first calculate the hashCode of key1, and then calculate the coordinate position of the array containing the Entry according to the hashCode, and then determine whether there is already an element in this position. If there is not, the new element can be directly saved in the corresponding position of the array. If there is already an element, the new element can be saved in the corresponding position of the array. If the value of the existing element is equal to the key value of the new element, replace the value of the existing element with the new value. Otherwise, the next of the new element should point to the old element (the head of the current list), and then save the new element in the corresponding position of the array, which is equivalent to inserting the new element in the first place when it comes. Insert Entry1 into the list at 0 in the preceding image by pointing its next to Entry0 and storing itself in the array.

When retrieving an element in a HashMap based on the key value, the hashCode of the key is calculated first, and then the index of the array where the element is stored in the linked list is found according to the hashCode, and the element with the same key value is found through the linked list. This is why if a HashMap puts the same key multiple times with different values, the previous value is overwritten by the latest one.

Here’s a quick look at the important constants and variables in the HashMap class:

● HashMap array: TRANSIENT Entry[] table

● Default array length: static final int DEFAULT_INITIAL_CAPACITY = 1 << 4

● Array maximum length: static final int MAXIMUM_CAPACITY = 1 << 30

● Default loading factor: static final float DEFAULT_LOAD_FACTOR = 0.75f

● Capacity expansion threshold: private int threshold; (threshold = capcity * loadFactor)

Key is the key that stores the element, value is the value that stores the element, next is the “pointer” to the next node, and hash is the hashCode of the key:

`class Entry<K,V> implements Map.Entry<K,V> { final K key; V value; Entry<K,V> next; int hash; ... }Copy the code`

DEFAULT_INITIAL_CAPACITY is the default array length of 16, MAXIMUM_CAPACITY is the maximum array length of 2 ^ 30, The reason is that 2 to the 31st is greater than integer.max_value (2147483647), and all arrays must be to the power of 2 (reasons explained later), so the maximum array length can only be 2 to the power of 30.

## Third, source code analysis

Here’s a brief overview of what happens during the initialization, put, and get elements of a HashMap. Here’s a look at the source code for each step. Java7 is an example.

Initialization of the HashMap

A HashMap has four constructors:

`Public HashMap () {... {} public HashMap (int initialCapacity)... } public HashMap(int initialCapacity, float loadFactor){... } public HashMap(Map<? extends K, ? Extends V > m) {... }Copy the code`

Both the first and second methods end up calling the third constructor, the first with the default array length (DEFAULT_INITIAL_CAPACITY) and the default load factor (DEFAULT_LOAD_FACTOR), The second calls the third constructor with a custom length and a custom load factor. The third constructor code is as follows: check the validity of the defined array length and the loading factor, and then assign the loading factor. Here we can see that the array length initialCapacity is assigned to the threshold. The entire initialization process does not change the length of the array (table), so after the new Hashmap, the table is still an empty array.

`Public HashMap(int initialCapacity, float loadFactor) {// If the array length is less than 0, If (initialCapacity < 0) throw new IllegalArgumentException("Illegal Initial Capacity: "+ initialCapacity); If (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; // If (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; // If the array length is not greater than 0, or is non-numeric, Throw an exception if (loadFactor < = 0 | | Float. The isNaN (loadFactor)) throw new IllegalArgumentException (" Illegal load factor: " + loadFactor); // set the loadFactor and capacity expansion threshold this.loadFactor = loadFactor; threshold = initialCapacity; init(); }Copy the code`

2. Put operations on HashMap (put NULL, Hash, expand)

The put method of a HashMap performs the following operations: checking whether the table is empty, checking whether the key is null, calculating the hash of the key, calculating the array index based on the hash, and storing elements.

`public V put(K key, V value) { if (table == EMPTY_TABLE) { inflateTable(threshold); } if (key == null) return putForNullKey(value); int hash = hash(key); int i = indexFor(hash, table.length); for (Entry<K,V> e = table[i]; e ! = null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } modCount++; addEntry(hash, key, value, i); return null; }Copy the code`

(1) When a table is empty, use the inflateTable method to expand the size of the table inflateTable inflateTable method inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable function inflateTable Therefore, in order to ensure storage requirements and save the most space, it is necessary to calculate the power greater than or equal to and closest to 2, and then instantiate the table. The array capacity is the newly calculated length capacity.

`Private void inflateTable(int toSize) {// Returns values greater than or equal to and closest to the power of 2 int Capacity = roundUpToPowerOf2(toSize); threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1); table = new Entry[capacity]; initHashSeedAsNeeded(capacity); }Copy the code`

PutForNullKey (); putForNullKey(); putForNullKey(); putForNullKey(); putForNullKey() If no node with a null key is found, the element (key-value pair) is added directly to the list with array subscript 0.

`private V putForNullKey(V value) { for (Entry<K,V> e = table[0]; e ! = null; e = e.next) { if (e.key == null) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } modCount++; addEntry(0, null, value, 0); return null; }Copy the code`

(3) Compute the hash of the key, using the hash to compute the table below the array

Hash (key) method is to calculate according to the key value and returns an integer number, in Java 7, if the key is a String, mainly by the sun. The misc. Hashing. StringHash32 (k) (String), Otherwise, perform xor on the hashCode of the hashSeed and key, followed by an unsigned right-shift xOR, as follows:

`final int hash(Object k) { int h = hashSeed; if (0 ! = h && k instanceof String) { return sun.misc.Hashing.stringHash32((String) k); } h ^= k.hashCode(); h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }Copy the code`

In Java8, directly (h = key.hashcode ()) ^ (h >>> 16), as follows:

```
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}Copy the code
```

After calculating the hash value, you need to calculate the index of the array based on the hash value as follows:

```
static int indexFor(int h, int length) {
// assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
return h & (length-1);
}Copy the code
```

The hash value is an integer ranging from -2^31 to 2^31-1. There are more than 4 billion hash values. How to determine one of the indices of the array based on one of the 4 billion numbers?

For example, entry1(key1= “ABC”,value1= “ABC”) and entry2(key2= “def”,value1= “def”) are stored in HashMap. Assume that the hash value of key2 is 27 and the binary value is 11011.

If the array length is 2 to the power of 2, then every bit of the -1 binary will have the value of 1. The result of the & operation on the key will be the same as the key hash except for the high part of the value exceeding the array length of -1.

What happens if the array length is not a power of two, say 15? The diagram below:

When the length of the array is 15, the binary end value is 0, and the end value of the calculation result must always be 0. Therefore, the calculation result will never be 00001, 00011, 00101, 00111, 01001, 01011, 01101, 01111, etc., and the end value will never be 1. That is, arrays with subscript 1, 3, 5, 7, 9, 11, 13, and 15 will never store data, resulting in a huge waste of space, which is why the array length in a HashMap must be a power of 2.

(4) Store elements

Once the index of the array is found, we need to store the elements in two cases: 1. 2. The key of the PUT element does not exist. In the first case, the linked list stored in the space corresponding to the subscript of the number group will be traversed first. If there are nodes with this key value, the corresponding new value will be replaced with the old value. In the second case, you need to call addEntry(hash,key,value, array subscript) to create a new node element. Before creating a node element, you need to expand the array space before creating a new node.

`void addEntry(int hash, K key, V value, int bucketIndex) { if ((size >= threshold) && (null ! = table[bucketIndex])) { resize(2 * table.length); hash = (null ! = key) ? hash(key) : 0; bucketIndex = indexFor(hash, table.length); } createEntry(hash, key, value, bucketIndex); }Copy the code`

Size is the number of elements currently stored in HashMap, and the value of threshold is the expansion critical value (array length * loading factor). When size is greater than or equal to threshold and the current location of the array is not empty, Call the resize method to expand the capacity (before Java1.6, the capacity will be expanded as long as the size is greater than or equal to threshold). The expanded capacity is twice the length of the array. The core logic of expansion is in the transfer method called by resize method, which is mainly to copy the elements in the original table to the expanded newTable:

`void transfer(Entry[] newTable, boolean rehash) { int newCapacity = newTable.length; For (Entry<K,V> e: table) {// Count while(null! // Iterate through the list Entry<K,V> next = e.next; If (rehash) {e.hash = null == e.key? 0 : hash(e.key); } int i = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }}}Copy the code`

In order to intuitively understand the code of transfer method, several diagrams are drawn. Suppose there are four elements entry1, entry2, entry3 and entry4 in HashMap, and their storage positions in table are as follows:

The for loop iterates over every element in the table array, while loop iterates over every node on the list. When Entry next = e.next; After, e points to enty1, next points to entry2; Int I = indexFor(e.hash, newCapacity); After that, the value of I is the subscript of e (entry1) computed in the new table (assuming I =2); E.next = newTable[I]; After that, entry1’s next points to newTable[2], which is null (this step is important because new nodes are always inserted into the head of the list in the same chain). When newTable[I] = e; Entry1 has been moved to the new table array with subscript 2, as shown below:

The result after the second while loop is that entry2 is moved to the new list with subscript 7, as shown below:

On the third while loop, finish Entry Next = e.next; After that, e points to enty3 and next points to null. Int I = indexFor(e.hash, newCapacity); After that, the value of I is the subscript of e (entry3) computed in the new table (assuming I =2); E.ext = newTable[I]; After that, entry3’s next points to newTable[2], which is entry1 (this step is important because new nodes are always inserted into the head of the list in the same chain). NewTable [I] = e; Entry3 has been inserted into the head of the linked list with subscript 2 in the new table array, as shown below. When e = next; , the value of e is null, ending the while loop.

Here is the result after the second for loop, with entry4 inserted into the head of the new table with subscript 2:

Get operation on HashMap

Get operation is relatively simple, the source code is as follows:

```
public V get(Object key) {
if (key == null)
return getForNullKey();
Entry<K,V> entry = getEntry(key);
return null == entry ? null : entry.getValue();
}Copy the code
```

If the key is null, call getForNullKey directly to iterate over the node element with a null key on the array subscript 0 and return value, otherwise call getEntry to get value. If size==0 (no element in HashMap), return null; Otherwise, the hash value of the key value is calculated, and the index of the array where the element stores the linked list is calculated based on the hash value. Finally, the list is traversed, the element with the same key value is found and returned.

`final Entry<K,V> getEntry(Object key) { if (size == 0) { return null; } int hash = (key == null) ? 0 : hash(key); for (Entry<K,V> e = table[indexFor(hash, table.length)]; e ! = null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || (key ! = null && key.equals(k)))) return e; } return null; }Copy the code`

## Four, other

1. Performance issues

Performance of HashMap is mainly reflected in expansion operations. As can be seen from the above, when the number of elements in HashMap size is greater than or equal to the product of array capacity and loading factor, the array capacity will be doubled. It is relatively time-consuming to iterate over the elements of the original array in a HashMap and recalculate the hash value of each element’s key and the position of the new array.

Therefore, there are two main factors that affect the performance of HashMap: Capacity refers to the length of array when HashMap is initialized. LoadFactor refers to the utilization of array space before HashMap is expanded. Ideally, HashMap store only one element in every position of the array (array is full, and there is no list), the search efficiency and storage utilization rate is the highest, because by computing the hash key can directly in an array to locate the position, do not need to traverse the list, but this is the ideal state, once appear, hash collision can produce in real operation list, Therefore, when initializing a HashMap, you can estimate how many elements will be stored in the HashMap to set the initial capacity of the HashMap and the value of the load factor, avoiding the HashMap expansion operation as much as possible.

By the way, Hash collision attack can be launched by constructing a large number of key-value pairs with the same key value hashcode, which will degrade the structure based on Hash table to that based on linked list. When the data volume is large, the locating time complexity of an element is reduced from O(1) to O(n).

Detail can refer to the following articles: a senior DoS attacks – Hash collision attacks: www.jianshu.com/p/5b99ae1ba…

Hash collision denial of service attacks: ihyperwin.iteye.com/blog/152008…

HashMap Hash collision resolution and future improvements: blog.csdn.net/Luo_da/arti…

2. Thread safety

If you’re familiar with HashMap, you know that HashMap is thread unsafe.

(1) Assignment is unsafe: If two threads want to store key-value pairs in a HashMap at the same time, the key-value pair stored by the thread lock that performs the assignment first may be overwritten by the key-value pair stored by the other thread when the two threads execute the createEntry () method at the same time.

(2) When expanding HashMap in multi-threaded environment, each thread will execute the code of expanding HashMap (resize method). Finally, only one thread will assign the new array to table, and the operations of other threads will be lost. When a thread has completed capacity expansion and another thread has just started capacity expansion, it may directly use the expanded array as the original array. A large number of threads perform put operations on the same HashMap, which may result in a looped linked list during expansion, resulting in an infinite loop when getting elements.

Solution:

(1) Externally package HashMap to realize synchronization mechanism

(2) using the Map Map = Collections. SynchronizedMap (new HashMap (…). ); Synchronized is added to all methods, but does not contain any put/get/contain methods.

(3) use java.util.HashTable to lock the entire table array synchronously.

(4) using Java. Util. Concurrent. ConcurrentHashMap, relatively safe, High efficiency (Synchronous lock locks only one bucket (a single element of array table) at a time. It allows multiple threads to read and write different buckets at the same time and ensures the synchronization of put/ GET on the same bucket. Recommended)

3. Improvements to HashMap in java1.8

JDK1.8’s HashMap source code implementation is different from 1.7. It is very different, and its underlying data structure is also different, introducing a red-black tree structure, as shown in the following figure.

JDK1.8HashMap performs 15% better than JDK1.7, and even 100% better in some size areas. With the increase of size, JDK1.7 time tends to increase, while JDK1.8 time tends to decrease significantly and shows a steady logarithmic increase. When a list length is greater than 8, HashMap dynamically replaces it with a red-black tree, which reduces the time complexity from O(n) to O(logn).