HashMap is one of the most commonly asked topics in job interviews and one of the most frequently used classes in development, so I spent a lot of time researching and analyzing it for this article. The analysis is based on JDK 1.8. It’s long, but it builds up step by step; read it patiently and I believe you will get something out of it.

**1. Problem analysis**

This article hopes to solve the following problems.

(1) What is the underlying data structure of HashMap?

(2) What is the underlying implementation of the add, delete, update, and lookup operations in HashMap?

(3) How does HashMap achieve capacity expansion?

(4) How does HashMap resolve hash conflicts?

(5) Why is HashMap non-thread-safe?

Let’s take a look at HashMap with these questions in mind.

**2. Get to know HashMap**

HashMap was first introduced in JDK 1.2 and changed little up through JDK 1.7, but JDK 1.8 brought a big change. The most notable part of it:

In JDK 1.8, the storage structure is array + linked list + red-black tree.

In addition, a HashMap is non-thread-safe, meaning that if multiple threads are adding, deleting, or modifying elements of a HashMap at the same time, data consistency cannot be guaranteed.

Let’s start with a step by step analysis.

**3. Take a closer look at HashMap**

**1. Underlying data structure**

For comparison, let’s first present the storage structure diagram of JDK 1.7.

As we can see from the figure above, in JDK 1.7 elements are first placed into an array; as more and more elements are stored, conflicts produce linked lists: each slot in the array can hold the head of a linked list of elements. This is known as “chained” storage.

With use over the years, as more and more elements were stored and the linked lists grew longer and longer, the efficiency of finding an element did not improve (a linked list is fine for insertion and deletion but poor for searching); it dropped a lot. So an improvement was made to the linked list. How? Turn a long list into a tree, specifically a red-black tree. The storage structure of a HashMap then looks like this.

We’ll see that the optimization is turning the list structure into a red-black tree. JDK 1.7’s array + linked list was already efficient for insertion and deletion; in JDK 1.8, insertion and deletion remain efficient and search efficiency is improved as well.

**If the list length is at least 8 and the array length is at least 64, the list will be converted to a red-black tree.**

**Q1: What is a red-black tree?**

A red-black tree is a self-balancing binary search tree, which means its search efficiency is very high: converting a list to a red-black tree takes lookups from O(n) down to O(log n). If you haven’t seen red-black trees before, just remember that they are very efficient to search.
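If you want a feel for a red-black tree without reading HashMap’s internals, the JDK’s own java.util.TreeMap happens to be implemented as one. A small sketch (the class name is mine) demonstrating that lookups stay fast and keys stay in sorted order:

```java
import java.util.TreeMap;

public class RedBlackTreeDemo {
    public static void main(String[] args) {
        // java.util.TreeMap is a red-black tree implementation in the JDK
        TreeMap<Integer, String> tree = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            tree.put(i, "value" + i);
        }
        // Lookups walk the balanced tree in O(log n) steps instead of scanning a list
        System.out.println(tree.get(500));   // value500
        System.out.println(tree.firstKey()); // 0
        System.out.println(tree.lastKey());  // 999
    }
}
```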

**Q2: Why not turn the whole list into a red-black tree at once?**

So the question is: why wait until the list length reaches 8 before converting to a red-black tree? There are two ways to explain this:

(1) A red-black tree is more complex to build than a linked list. When a list holds only a few nodes, array + linked list + red-black tree may perform no better overall than plain array + linked list. It would be like killing a chicken with a sword.

(2) Frequent HashMap expansion would force the underlying red-black trees to be repeatedly split and rebuilt, which is very time-consuming. So converting to a red-black tree only pays off once the list is long.

OK, at this point we should have a good idea of the underlying data structure of HashMap. Now, with the structure diagram above in mind, let’s see how an element is stored.

**2. Storing an element: put**

When we store an element, we mostly use the following method.

```
import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // Store an element
        map.put("Zhang", 20);
    }
}
```

In HashMap<String, Integer>, the first type parameter is the key and the second is the value; together a key and its value are called a key-value pair. All you need to do is call the put method. How does the underlying implementation work? Again, here is a flow chart.

The flow chart may be hard to absorb at a glance. The three judgment boxes drawn in red are the turning points. Let’s spell the process out in words:

(1) Step 1: Call the PUT method to pass in the key-value pair

(2) Step 2: Calculate the hash value using the hash algorithm

(3) Step 3: Determine the storage location according to the hash value and determine whether there is a conflict with other key/value pairs

(4) Step 4: If there is no conflict, just store it in the array

(5) Step 5: If a conflict occurs, check which data structure that bucket currently holds

(6) Step 6: If the data structure is a red-black tree, insert it directly into the red-black tree

(7) Step 7: If the data structure at this time is a linked list, insert at the tail and check whether the list length has reached 8

(8) Step 8: If it has reached 8, convert the list to a red-black tree

(9) Step 9: If it has not reached 8, the element simply stays at the tail of the list.

That is the whole process of inserting data, but looking at the flow alone is not enough; we also need to dive into the source code to see how the code follows this process.

Put the cursor on the put method and jump to its definition (F3 in Eclipse, for example). Take a look:

```
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
```

That is, the put method actually calls putVal. The putVal method takes five arguments:

(1) The first parameter hash: the hash value of the key, computed by the hash method

(2) The second parameter key: the key we pass in, "Zhang" in the example

(3) The third parameter value: the value we pass in, 20 in the example

(4) The fourth parameter onlyIfAbsent: if true, do not overwrite an existing value when the key is already present; put passes false

(5) The fifth parameter evict: false means the table is in creation mode; it is generally true.
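To see onlyIfAbsent in action without reading further: put passes onlyIfAbsent = false, while putIfAbsent passes true. A quick demonstration (the class name is mine):

```java
import java.util.HashMap;

public class OnlyIfAbsentDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("Zhang", 20);
        // put calls putVal with onlyIfAbsent = false: the old value is overwritten
        map.put("Zhang", 30);
        System.out.println(map.get("Zhang")); // 30
        // putIfAbsent calls putVal with onlyIfAbsent = true: the old value is kept
        map.putIfAbsent("Zhang", 40);
        System.out.println(map.get("Zhang")); // 30
    }
}
```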

Now that we know what these five parameters mean, we go to the putVal method.

```
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Part 1
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Part 2
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // Part 3
    else {
        Node<K,V> e; K k;
        // Section 1 of Part 3
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Section 2 of Part 3
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Section 3 of Part 3
        else {
            for (int binCount = 0; ; ++binCount) {
                // Paragraph 1 of Section 3
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Paragraph 2 of Section 3
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Paragraph 3 of Section 3
                p = e;
            }
        }
        // Section 4 of Part 3
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Part 4
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```

At first glance this code does not exactly invite reading; the first time I saw it, it honestly made my head spin. But read alongside the flow chart drawn at the beginning, it becomes much more approachable. We split the code into four parts:

(1) The local variables: Node<K,V>[] tab is the underlying table, and Node<K,V> p is the node at the position being inserted

(2) Part I:

```
if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
```

If the array is empty, create a new array using the resize method. The resize method is not explained here and will be covered in the next section when expanding capacity.

(3) Part II:

```
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
```

i is the insertion position in the array, computed as (n - 1) & hash. If that slot is empty, a newNode is placed straight into the array. This corresponds to the first judgment box in the flow chart.

If the inserted key’s hash collides with an existing entry, we move on to Part 3 to handle the conflict.
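The index computation is worth trying by hand. A small sketch (names are mine) showing that for a power-of-two length n, (n - 1) & hash is equivalent to hash % n for non-negative hashes, and that different hashes can land on the same index:

```java
public class IndexDemo {
    // When n is a power of two, (n - 1) & hash is a fast equivalent of hash % n
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // default table length, a power of two
        System.out.println(indexFor(50, n)); // 2, same as 50 % 16
        System.out.println(indexFor(66, n)); // 2: different hashes can collide
    }
}
```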

(4) Part III:

```
else {
    Node<K,V> e; K k;
    // Section 1 of Part 3
    if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
        e = p;
    // Section 2 of Part 3
    else if (p instanceof TreeNode)
        e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    // Section 3 of Part 3
    else {
        for (int binCount = 0; ; ++binCount) {
            // Paragraph 1 of Section 3
            if ((e = p.next) == null) {
                p.next = newNode(hash, key, value, null);
                if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                    treeifyBin(tab, hash);
                break;
            }
            // Paragraph 2 of Section 3
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                break;
            // Paragraph 3 of Section 3
            p = e;
        }
    }
    // Section 4 of Part 3
    if (e != null) { // existing mapping for key
        V oldValue = e.value;
        if (!onlyIfAbsent || oldValue == null)
            e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
}
```

As we’ll see, conflict handling is a real hassle, but fortunately we have divided it into sections:

A) Section 1 of Part III:

```
if (p.hash == hash &&
    ((k = p.key) == key || (key != null && key.equals(k))))
    e = p;
```

If the element at table[i] has the same key as the one being inserted, the existing node p is recorded in e; Section 4 then overwrites its old value with the new one.

B) Section 2 of Part III:

```
else if (p instanceof TreeNode)
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
```

If the node is a TreeNode, that is, this bucket already holds a red-black tree, insert into the tree directly via putTreeVal. This corresponds to the second judgment box in the flowchart.

C) Section 3 of Part III:

```
// Section 3 of Part 3
else {
    for (int binCount = 0; ; ++binCount) {
        // Paragraph 1 of Section 3
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        // Paragraph 2 of Section 3
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            break;
        // Paragraph 3 of Section 3
        p = e;
    }
}
```

If the bucket holds a linked list, walk the list: if no node with the same key exists, append newNode(hash, key, value, null) at the tail; if one does exist, break out so that its old value gets replaced with the new value.

Note that binCount >= TREEIFY_THRESHOLD - 1 is checked when the element does not exist and has just been appended at the tail. In other words, it checks whether the current list length has reached the threshold 8; if so, the list is converted to a red-black tree via treeifyBin. This corresponds to the third judgment box in the flowchart.

(5) Part IV:

```
if (++size > threshold)
    resize();
afterNodeInsertion(evict);
return null;
```

After a successful insertion, check whether the actual number of key-value pairs, size, exceeds threshold. If it does, it’s time to expand.
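The numbers are easy to verify by hand. With the default capacity 16 and load factor 0.75, the threshold is 12; the 13th insertion triggers a resize, which doubles both capacity and threshold. A sketch of the arithmetic (the class name is mine; the constant names mirror the JDK’s):

```java
public class ResizeMathDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    public static void main(String[] args) {
        int cap = DEFAULT_INITIAL_CAPACITY;
        int threshold = (int) (cap * DEFAULT_LOAD_FACTOR);
        System.out.println(threshold); // 12
        // Pushing size past the threshold triggers resize():
        // both capacity and threshold are doubled by a left shift
        int newCap = cap << 1;
        int newThreshold = threshold << 1;
        System.out.println(newCap);       // 32
        System.out.println(newThreshold); // 24
    }
}
```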

**3. Capacity expansion**

Why expand? Clearly the current capacity is insufficient: too many elements have been put in. As before, let’s start with a flow chart and then analyze.

HashMap capacity expansion computes the new table capacity and the new threshold, allocates a new hash table, and remaps the old key-value pairs into it. If red-black trees were involved in the old table, mapping into the new table also involves splitting those trees. The whole thing matches what you would expect of a resize. Let’s analyze it against the flowchart and the code:

```
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Part 1: capacity expansion
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // Part 2: set thresholds
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Part 3: move the old data into the new array
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // Only one node: map it directly to its new index
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A red-black tree must be split before mapping
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // Multiple chained nodes: split the original list into two lists
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // List 1 stays at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // List 2 moves to the original index plus the old capacity
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

The amount of code is again intimidating, but let’s break it into parts:

(1) Part I:

```
// Part 1: capacity expansion
if (oldCap > 0) {
    if (oldCap >= MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // double threshold
}
```

If the old capacity has already reached MAXIMUM_CAPACITY, the threshold is pinned to the maximum integer value and the old table is returned unchanged; otherwise both the capacity and the threshold are doubled. Note that oldCap << 1 and oldThr << 1 are left shifts, i.e. multiplication by 2.

(2) Part II:

```
// Part 2: set thresholds
else if (oldThr > 0) // the threshold was already initialized
    newCap = oldThr;
else {               // no capacity or threshold yet: use the defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
}
// Set the current threshold
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
```

First, the else if branch says: if the threshold was already initialized (a constructor stored the initial capacity there), use it as the new capacity. The following else says: if nothing was initialized, fall back to the default capacity and compute the default threshold.

(3) The third part

The third part is the trickiest: copying the old data into the new array. Two cases need attention:

A: After expansion, if the element’s extra hash bit is 0, its position is its original position.

B: After expansion, if the element’s extra hash bit is 1, its position = original position + old capacity.

Which bit is the “extra” bit? Write the hash in binary: after the capacity doubles, the index computation (n - 1) & hash examines one more low-order bit of the hash. For an old capacity of 16, that is the 5th bit from the bottom.

The table is effectively split into a low half and a high half. Roughly half of the key-value pairs of each original list go to the low position and half to the high position, decided by whether e.hash & oldCap == 0. What is the advantage of this test?

For example: oldCap = 16, binary 10000, whose only 1 is the 5th bit. So whether e.hash & oldCap equals 0 depends on whether the 5th bit of e.hash is 0 or 1, which gives each element roughly a 50% chance of staying in the low half and a 50% chance of moving to the high half of the new table.
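The split is easy to check with a few concrete hashes. This sketch (names are mine) takes four hashes that all share bucket 2 in a table of length 16 and shows where e.hash & oldCap sends each of them after doubling to 32:

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16; // binary 10000
        int[] hashes = {2, 18, 34, 50}; // all land in bucket 2 when length is 16
        for (int h : hashes) {
            // (h & oldCap) tests the extra bit the larger table uses:
            // 0 -> element stays at index j; 1 -> element moves to j + oldCap
            int j = h & (oldCap - 1);
            int newIndex = ((h & oldCap) == 0) ? j : j + oldCap;
            System.out.println(h + " -> " + newIndex);
        }
        // prints: 2 -> 2, 18 -> 18, 34 -> 2, 50 -> 18
    }
}
```

Note that each new index equals h & (newCap - 1), so no hash needs to be recomputed during resize.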

OK, at this point the expansion part is essentially finished. One question remains: the storage principle is clear, and expansion is clear, but how are address conflicts handled?

**4. Resolve address conflicts**

Address conflicts occur when hash values collide. Let’s first look at how hash values are computed in a HashMap.

```
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

The code is super simple: the hash is the key’s hashCode XORed with its own high 16 bits (h >>> 16). Why XOR? Draw a picture and you’ll see:

In other words, folding in the high bits with XOR makes the resulting hashes more uniform and less prone to conflict. But when a conflict does happen, how is it resolved?
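We can reproduce the effect with two hashCodes that differ only in their high 16 bits: without the XOR they collide, with it they don’t. A sketch (the class name and sample values are mine):

```java
public class HashSpreadDemo {
    // The same computation HashMap.hash uses: fold the high 16 bits into the low 16
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // Two hashCodes that differ only in their high 16 bits...
        int h1 = 0x0001_0004;
        int h2 = 0x0002_0004;
        int n = 16;
        // ...would land in the same bucket without spreading,
        System.out.println((n - 1) & h1); // 4
        System.out.println((n - 1) & h2); // 4
        // but spreading lets the high bits influence the index
        System.out.println((n - 1) & spread(h1)); // 5
        System.out.println((n - 1) & spread(h2)); // 6
    }
}
```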

**In data structures, the common ways to handle hash conflicts are: open addressing, rehashing, separate chaining, and a public overflow area. HashMap handles hash collisions with separate chaining.**

The basic idea of separate chaining is to link all elements whose hash address is i into one singly linked list (a “synonym chain”) and store the head pointer of that list in slot i of the hash table; searching, insertion, and deletion are then carried out mainly within that chain. Separate chaining suits frequent insertions and deletions.
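To make the idea concrete, here is a deliberately tiny separate-chaining table, an illustration of the principle only, not HashMap’s real code (all names are mine):

```java
public class ChainingSketch {
    // A minimal chain node: key, value, and a link to the next synonym
    static class Node {
        final String key;
        int value;
        Node next;
        Node(String key, int value, Node next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    static final int SIZE = 8; // power of two so the mask trick works
    static Node[] table = new Node[SIZE];

    static void put(String key, int value) {
        int i = key.hashCode() & (SIZE - 1);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // same key: overwrite
        }
        table[i] = new Node(key, value, table[i]); // conflict: chain at the head
    }

    static Integer get(String key) {
        int i = key.hashCode() & (SIZE - 1);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    public static void main(String[] args) {
        put("a", 1);
        put("b", 2);
        put("a", 3);              // overwrites the first "a"
        System.out.println(get("a")); // 3
        System.out.println(get("b")); // 2
    }
}
```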

Hopefully that is easy to picture: when addresses conflict, the elements simply line up into a chain. This matches the underlying data structure of HashMap.

**5. Constructing a HashMap**

The questions raised above have all been addressed, but the constructors are overdue. Let’s take a closer look at them.

There are four constructors in all:

The first:

```
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}
```

The second:

```
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
```

The third:

```
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
```

Fourth:

```
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
```

Of the four constructors, the fourth is clearly the most involved, so let’s look at it; the other three follow from it. Two new terms appear: initialCapacity and loadFactor. Let’s take them one by one:

(1) initialCapacity, the initial capacity

HashMap wants the capacity to be a power of 2, such as 2, 4, 8, 16, and so on. But what if we accidentally pass in 20? It doesn’t matter: based on the value you pass in, the implementation finds the nearest power of 2 that is greater than or equal to it, 32 in this case, and uses that as the initial capacity.
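The rounding is done by tableSizeFor. Below is essentially the JDK 1.8 bit-twiddling (I’ve omitted the MAXIMUM_CAPACITY clamp for brevity), wrapped in a throwaway class so it can run standalone:

```java
public class TableSizeForDemo {
    // Rounds cap up to the nearest power of two by smearing the
    // highest set bit of (cap - 1) into every lower position, then adding 1
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(20)); // 32
        System.out.println(tableSizeFor(16)); // 16: already a power of two
        System.out.println(tableSizeFor(3));  // 4
    }
}
```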

(2) loadFactor, the load factor

The load factor defaults to 0.75 and describes how full the hash table is allowed to get: capacity * loadFactor = threshold, the element count at which the table resizes. The larger the load factor, the fuller the table gets before resizing, which means more elements per bucket, longer chains, and lower lookup efficiency. Conversely, the smaller the load factor, the sparser the table: some space is wasted, but lookups are fast.
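The trade-off is easy to see numerically. A sketch (the class name is mine) of threshold = capacity * loadFactor for a few factors:

```java
public class LoadFactorDemo {
    public static void main(String[] args) {
        int capacity = 16;
        // capacity * loadFactor = threshold, the size at which the table resizes
        System.out.println((int) (capacity * 0.75f)); // 12, the default
        // A smaller factor resizes earlier: sparser table, more wasted space
        System.out.println((int) (capacity * 0.5f));  // 8
        // A larger factor resizes later: denser table, longer chains
        System.out.println((int) (capacity * 1.0f));  // 16
    }
}
```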

Why is the default 0.75? Let’s grab a snippet of the JDK document:

The statistics may look intimidating at first, but the meaning comes through; note the mention of the Poisson distribution in the third line. The key takeaway:

With 0.75 as the load factor, by the time a bucket reaches eight elements the probability has become vanishingly small, so a chain at any collision location is almost never longer than eight.

**6. Why is HashMap non-thread-safe?**

The answer is simple: the source code consists entirely of non-thread-safe methods; you will not find the synchronized keyword anywhere, so data consistency cannot be guaranteed under concurrent modification. Hence ConcurrentHashMap.
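For completeness, here is a minimal sketch (the class name is mine) of reaching for ConcurrentHashMap when two threads update the same key. Its merge method performs the read-modify-write atomically, so the count comes out exact, where a plain HashMap could lose updates:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        // ConcurrentHashMap provides the thread safety HashMap lacks
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counts.get("hits")); // 2000 every time
    }
}
```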

That finally wraps up the core content. Of course, HashMap comes up in far more interview questions than one article can cover; any omissions will be filled in later. Criticism and corrections are welcome.