## Introduction to HashMap

HashMap is the data type Java programmers most frequently reach for when they need key-value mappings. It stores entries based on the hashCode of the key, so in most cases a value can be located directly, giving fast access, but the iteration order is not guaranteed. A HashMap allows at most one null key and any number of null values, and it is not thread-safe; ConcurrentHashMap or Collections.synchronizedMap can be used when thread safety is needed. In JDK 1.8 the underlying implementation of HashMap was optimized, including the introduction of red-black trees and an improved resize algorithm. Let's take a look at JDK 1.8's HashMap and see what was optimized.
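The null-handling and thread-safety points above can be checked with a short demo (the class name and the sample keys/values here are made up for illustration):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // the single null key is overwritten, not duplicated
        map.put("a", null);
        map.put("b", null);      // any number of null values is fine
        System.out.println(map.size());    // 3: keys are {null, "a", "b"}
        System.out.println(map.get(null)); // second

        // For concurrent use, wrap the map (or prefer ConcurrentHashMap,
        // which rejects null keys and null values entirely).
        Map<String, String> safe = Collections.synchronizedMap(map);
        System.out.println(safe.get(null)); // second
    }
}
```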

## Storage structure

To understand a HashMap, you first need to know what it is, namely its storage structure; second, you need to figure out what it does and how it works. A HashMap stores data in a hash table according to the hash value of the key. However, different keys may produce the same hash value, causing collisions. A hash table can resolve collisions with either open addressing or separate chaining; Java's HashMap uses separate chaining, which is, simply put, a combination of an array and linked lists. Each array slot heads a linked list: when storing data, the array index is computed first, and on a collision the entry is appended to the list at that index. Here is a problem to consider: even with a well-designed hash function, overly long chains are unavoidable, and once a chain gets too long it seriously hurts HashMap's performance. JDK 1.8 therefore further optimizes the data structure by introducing red-black trees: when a chain grows past the treeify threshold (8 by default, and only once the table holds at least 64 buckets), the list is converted into a red-black tree.

## Important fields

A HashMap is accessed according to the hash value of the key, so its performance is directly related to the quality of the hash algorithm. The more dispersed and uniform the hash results are, the smaller the probability of collisions and the higher the access efficiency of the map. Of course, it also depends on the size of the hash array: if the array is large, even a poor hash algorithm will spread entries out, and if it is small, even a good one will collide often. So space and time costs must be balanced against each other to find a reasonable middle ground.

JDK 1.8 makes exactly this kind of trade-off between time and space. Not only was the data structure optimized, but resizing was optimized as well, greatly improving HashMap's performance. Let's look at the concrete implementation through the source code.

Let’s take a look at some of the more important attributes in the HashMap.

```
// The default initial capacity of the bucket array -- must be a power of 2.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16
// Maximum capacity of a HashMap.
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor: resize once size exceeds 0.75 * table.length.
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Resize threshold: when the map grows past it, the map is resized.
// threshold = capacity * load factor.
int threshold;
// Load factor of this HashMap, fixed in the constructor.
final float loadFactor;
// Treeify threshold: a bin is converted to a tree once its chain reaches this length.
static final int TREEIFY_THRESHOLD = 8;
// Untreeify threshold: a tree bin shrinks back to a linked list at 6 nodes.
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table length before bins are treeified (smaller tables are resized instead).
static final int MIN_TREEIFY_CAPACITY = 64;
// The bucket array holding the entries; its length is always a power of 2.
transient Node<K,V>[] table;
// Linked-list node, implementing Map.Entry.
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    // ...
}
// Red-black tree node.
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    // ...
}
```

The fields in a HashMap are relatively easy to understand. Why, though, is the default hash bucket array 16 slots long, and why must its length be a power of 2?

In both JDK 1.7 and JDK 1.8, the bucket index of a key is computed as hash & (length - 1).

The trick is that, when length is a power of 2 (and the hash is non-negative), hash % length is equivalent to hash & (length - 1).

```
Suppose the hash of a key has the following binary form (only the low bits are shown):

hashCode          0010 0011  ->  35 in decimal

35 & (16 - 1):    0010 0011 & 0000 1111 = 0011 = 3
35 % 16:                                         3
```

Why hash & (length - 1) instead of hash % length? The computer stores and computes numbers in binary at the lowest level, and & maps directly onto a single machine operation, so it is faster than the division required by %.
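A small sketch makes the equivalence concrete (note it only holds for non-negative hashes and power-of-two lengths; for negative values % would go negative while & stays in range, which is a bonus of the & form):

```java
public class MaskDemo {
    public static void main(String[] args) {
        int length = 16; // a power of two
        int[] hashes = {35, 47, 123456789};
        for (int h : hashes) {
            // For non-negative h and power-of-two length, the two agree.
            System.out.println(h % length == (h & (length - 1))); // true
        }
        // The & form stays non-negative even for negative hashes,
        // where % would return a negative remainder.
        System.out.println(-7 & (length - 1)); // 9
        System.out.println(-7 % length);       // -7
    }
}
```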

So why does length have to be a power of 2?

```
length = 16, so every bit of (length - 1) is 1:

hashCode 1   0010 0011            hashCode 2   0010 1111
&                                 &
(16-1) = 15  0000 1111            (16-1) = 15  0000 1111
------------------------          ------------------------
             0011 = 3                          1111 = 15
```

```
length = 14, so (length - 1) = 13 = 0000 1101 has a 0 bit that discards hash information:

hashCode 1   0010 1110            hashCode 2   1110 1100
&                                 &
(14-1) = 13  0000 1101            (14-1) = 13  0000 1101
------------------------          ------------------------
             1100 = 12                         1100 = 12  (collision!)
```

We can see that when the hash array length is a power of 2, every bit of length - 1 is 1, so the index depends entirely on the low bits of the hash value. When the length is not a power of 2, some bits of length - 1 are 0, those hash bits are simply discarded, and distinct hashes are forced into the same index. So as long as hash values are evenly distributed, a power-of-two length keeps the collision probability much lower.

Secondly, a power-of-two length also makes resizing convenient. JDK 1.8 optimized the resize algorithm too, in a rather clever way; we will get to that in the resize method.

## Function implementation

### Determine index position

The data structure of HashMap is a combination of array and linked lists, so naturally we want elements to be distributed across the map as evenly as possible, ideally with only one element per bucket. Then, when the hash algorithm gives us a bucket, the element there is immediately the one we want, with no need to walk a list, which greatly helps query efficiency.

tableSizeFor() ensures that the hash bucket array length is always a power of 2 when a HashMap is constructed with an explicit capacity.

```
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    /* Example: cap = 3, so n = 2 (binary 10).
       n |= n >>> 1  ->  10 | 01 = 11 (binary) = 3,
       the later shifts change nothing, and the method returns n + 1 = 4. */
}
```
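The bit-smearing can be tried out directly. The sketch below copies the method into a hypothetical demo class: ORing (cap - 1) with its own right shifts spreads the highest set bit into every lower position, producing a value of the form 2^k - 1, so n + 1 is the next power of two at or above cap.

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // A copy of JDK 1.8's tableSizeFor for experimentation.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(3));   // 4
        System.out.println(tableSizeFor(16));  // 16 (already a power of two)
        System.out.println(tableSizeFor(17));  // 32
    }
}
```

Note the `- 1` at the top: without it, an exact power of two like 16 would be rounded up to 32 instead of staying at 16.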

```
// JDK 1.8 hash algorithm
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// JDK 1.7 hash algorithm (simplified: the hashSeed handling is omitted)
final int hash(Object k) {
    int h = k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

// Index position
index = hash & (length - 1);

// JDK 1.7 perturbs with 4 shifts and 5 XORs (9 perturbations in total).
// JDK 1.8 simplifies this to 2 perturbations: 1 shift plus 1 XOR.
```

In JDK 1.8 the high-bit perturbation was optimized: hashCode() is XORed with its own high 16 bits, (h = key.hashCode()) ^ (h >>> 16). This was done with speed, efficiency, and quality in mind; compared with JDK 1.7, JDK 1.8 reduces the number of perturbation steps, which can itself be seen as an optimization of the hash algorithm. It ensures that both high and low bits take part in the index calculation even when the table is small, without incurring much overhead.

```
Suppose a key has the following hash:

hashCode           0000 0000 0011 0011 0111 1010 1000 1011
hashCode >>> 16    0000 0000 0000 0000 0000 0000 0011 0011
---------------------------------------------------- XOR ^
                   0000 0000 0011 0011 0111 1010 1011 1000
&
length - 1 = 15    0000 0000 0000 0000 0000 0000 0000 1111
----------------------------------------------------------
                   0000 0000 0000 0000 0000 0000 0000 1000  ->  8 in decimal
```
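The same worked example can be run in code. This sketch (class name made up) reproduces the JDK 1.8 perturbation on the sample hashCode above and masks it into a table of size 16:

```java
public class HashPerturbDemo {
    // The JDK 1.8 perturbation: fold the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // The sample hashCode from the diagram above.
        int h = 0b0000_0000_0011_0011_0111_1010_1000_1011;
        int perturbed = h ^ (h >>> 16);
        System.out.println(Integer.toBinaryString(perturbed));
        System.out.println(perturbed & (16 - 1)); // index in a 16-slot table -> 8
    }
}
```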

### Put method

There is a well-known flow chart for put floating around the net that lays the steps out very clearly, but the flow is just as easy to follow directly in the source, so let's walk through the code.

```
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Check whether the hash bucket array is empty.
    if ((tab = table) == null || (n = tab.length) == 0)
        // If it is, initialize it; the default bucket array size is 16.
        n = (tab = resize()).length;
    // Otherwise compute the index for the key and check whether that slot is occupied.
    if ((p = tab[i = (n - 1) & hash]) == null)
        // Not occupied: wrap the entry in a Node and place it in the bucket array.
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node has the same key: remember it so its value can be replaced.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Otherwise, if the bin is a tree, insert into the red-black tree.
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // A normal bin: walk the list and append the new node at the tail.
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the chain is now long enough, treeify the bin (or resize,
                    // if the table is still below MIN_TREEIFY_CAPACITY).
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // The key already exists somewhere in the list: stop and replace below.
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the resize threshold is exceeded, resize.
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
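The return value of putVal and its onlyIfAbsent flag are visible at the public API level: put returns the previous value (or null) and always overwrites, while putIfAbsent takes the onlyIfAbsent path. A minimal sketch (class name made up):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));         // null: the key was absent
        System.out.println(map.put("a", 2));         // 1: old value is returned...
        System.out.println(map.get("a"));            // 2: ...and replaced
        System.out.println(map.putIfAbsent("a", 9)); // 2: onlyIfAbsent, no overwrite
        System.out.println(map.get("a"));            // 2
    }
}
```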

### The get method

The get method is probably simpler than the put method and can be understood at a glance from the source. Without further ado, here is the code.

```
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The bucket array is non-empty and the slot computed from the key holds a node.
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the first node in the bucket is the one we want, return it directly.
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // If the bin is a tree, search the red-black tree directly.
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the bucket's linked list.
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```

### Capacity expansion mechanism (resize)

Personally, I think the resize algorithm is the essence of the whole HashMap; the trick it uses is very clever, and you have to admire its authors.

```
final Node<K,V>[] resize() {
    // Keep a reference to the current table as oldTab.
    Node<K,V>[] oldTab = table;
    // The current hash bucket array size.
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // The current resize threshold.
    int oldThr = threshold;
    int newCap, newThr = 0;
    // If the current hash bucket array is not empty...
    if (oldCap > 0) {
        // ...and the capacity is already at the maximum...
        if (oldCap >= MAXIMUM_CAPACITY) {
            // ...push the threshold up to 2^31 - 1 and stop growing.
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity, and the threshold with it.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0)
        // First put after new HashMap(int initialCapacity):
        // threshold temporarily holds the initial capacity.
        newCap = oldThr;
    else {
        // new HashMap() with no arguments, not yet initialized: use the defaults.
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold if it was not set above.
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Install the new threshold and allocate a table of the new capacity,
    // then move the entries from the old buckets into the new ones.
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    /* If this call only initialized the HashMap, we are done here; the code
       below moves the data from the old array into the enlarged bucket array. */
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                // Clear the old slot so the garbage collector can reclaim it.
                oldTab[j] = null;
                // A single node in this bucket: just recompute its index in the new table.
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // A red-black tree bin: split the tree across the new table.
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // A normal list bin: every node must be re-placed in the new table.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // AND the node's hash with the old capacity: if the result is 0,
                        // the node keeps the same index in the enlarged array.
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Otherwise its new index is the old index plus oldCap.
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

So how does (e.hash & oldCap) == 0 decide where a node goes? Let's work through an example.

```
Assume the array size before expansion is 16, with two keys:

key1 (hash ^ hash>>>16)   0000 0000 0011 0011 0111 1010 1011 1000
key2 (hash ^ hash>>>16)   0000 0000 0011 0011 0111 1010 1010 1000
&
length - 1 = 15           0000 0000 0000 0000 0000 0000 0000 1111
-----------------------------------------------------------------
key1:   1000  ->  8 in decimal
key2:   1000  ->  8 in decimal   (the two keys collide)

After expanding to 32:

key1                      0000 0000 0011 0011 0111 1010 1011 1000
key2                      0000 0000 0011 0011 0111 1010 1010 1000
&
length - 1 = 31           0000 0000 0000 0000 0000 0000 0001 1111
-----------------------------------------------------------------
key1: 1 1000  ->  24 = 16 + 8
key2: 0 1000  ->  8
```

Notice that the code uses hash & oldCap instead of hash & (newCap - 1). Why is that?

From the example above, the positions of the two previously colliding keys are determined entirely by the newly added index bit. So we can AND the hash with oldCap to test whether that new bit is 0 or 1, and immediately know whether the node stays at its original index or moves to original index + oldCap.

```
After expanding to 32:

key1                      0000 0000 0011 0011 0111 1010 1011 1000
key2                      0000 0000 0011 0011 0111 1010 1010 1000
&
oldCap = 16               0000 0000 0000 0000 0000 0000 0001 0000
-----------------------------------------------------------------
key1: 1 0000  (non-zero)  ->  moves to old index + oldCap = 8 + 16 = 24
key2: 0 0000  (zero)      ->  stays at old index 8
```

It follows that, after a resize, keys that previously shared a bucket either stay at their original index or move to original index + oldCap. Instead of recomputing the hash for every node as JDK 1.7 does, JDK 1.8 only needs to look at whether the new bit is 1 or 0: 0 means the same index, 1 means old index + oldCap. This also illustrates, once more, why the capacity of a HashMap must be a power of 2.
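The split rule can be checked directly against the two sample hashes above. This is a small sketch, not JDK code:

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        // The two sample hashes from the diagrams above.
        int key1 = 0b0000_0000_0011_0011_0111_1010_1011_1000;
        int key2 = 0b0000_0000_0011_0011_0111_1010_1010_1000;

        int oldIndex = key1 & (oldCap - 1);      // both collide at 8 before the resize
        System.out.println(oldIndex);            // 8
        System.out.println(key2 & (oldCap - 1)); // 8

        // After doubling, a single bit decides everything:
        // hash & oldCap tests exactly the newly added index bit.
        System.out.println((key1 & oldCap) == 0 ? oldIndex : oldIndex + oldCap); // 24
        System.out.println((key2 & oldCap) == 0 ? oldIndex : oldIndex + oldCap); // 8

        // Sanity check: the full recomputation against newCap - 1 agrees.
        System.out.println(key1 & (2 * oldCap - 1)); // 24
        System.out.println(key2 & (2 * oldCap - 1)); // 8
    }
}
```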

For those of you who know Hashtable: its default capacity is 11, and each resize grows it to length * 2 + 1. Hashtable is designed to keep the bucket count prime so that modulo hashing spreads entries as uniformly as possible. If you are interested in why taking a modulus by a prime distributes better than by a composite number, you can refer to http://blog.csdn.net/liuqiyao_01/article/details/14475159. HashMap takes a different route to reducing collisions: it drops the prime-modulus scheme entirely, and its perturbation-plus-mask hashing is arguably much better than Hashtable's.

HashMap instead ensures that the hash array size is always a power of 2. The design is indeed very clever: because the binary form of 2^n - 1 is all 1s, the index can be computed with a mask, and during a resize the time to recompute every hash is saved. The resize process also distributes previously colliding nodes evenly between the two candidate buckets of the new table.

## Conclusion

- Resizing is a very performance-intensive operation, so when using a HashMap, estimate the size of the map in advance and pass a rough initial capacity to the constructor to avoid frequent resizing.
- Do not use HashMap in a concurrent environment; use ConcurrentHashMap or Collections.synchronizedMap instead.
- The load factor can be modified, and may even be greater than 1, but it is recommended not to change it lightly.
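On the first point, a common sizing idiom is to divide the expected entry count by the load factor so the map never crosses the resize threshold while being filled; tableSizeFor then rounds the result up to a power of two. The helper below is a sketch of that idea (the class and method names are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // Size the map so that `expected` entries fit without crossing the
    // default 0.75 threshold, avoiding resizes during the initial fill.
    static <K, V> Map<K, V> withExpectedSize(int expected) {
        return new HashMap<>((int) (expected / 0.75f) + 1);
    }

    public static void main(String[] args) {
        Map<Integer, String> map = withExpectedSize(1000);
        for (int i = 0; i < 1000; i++) {
            map.put(i, "v" + i); // no resize happens while filling
        }
        System.out.println(map.size()); // 1000
    }
}
```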