Problems in HashMap concurrency

The hashMap is one of the most commonly used data structures in our programming, so what might happen if we use hashMap in high concurrency?

1. The result is not what we want. (Not thread-safe)

2, expansion leads to program loop. The CPU100%; (JDK1.7 has been expanded. 1.8 does not have this problem. Serious)

Why is there an infinite loop? Let’s analyze it next. Let’s take a look at the source code for hashMap and the PUT operation.

1.1 1.7HashMap source code brief analysis

1.1.1 Constructor parsing

Now that we are familiar with the cost of hashMap, here are a few key points. The data structure of HashMap is the data structure of array + linked list, as follows

static class Entry<K,V> implements Map.Entry<K,V> { final K key; V value; Entry<K,V> next; // Store a reference to the next Entry, single linked list structure int hash; // Hash the hashcode value of the key, which is stored in Entry, / / entry (int h, K K,V V, entry <K,V> n) {value = V; next = n; key = k; hash = h; }Copy the code

When instantiating a HashMap, the system creates an array of entries with the length of Capacity, which is called Capacity. In this array, the locations where elements can be stored are called buckets. Each bucket has its own index. The system can quickly look up elements in the bucket based on the index. Each bucket stores one element, an Entry object, but each Entry object can have a reference variable that points to the next element, so it is possible to create a chain of entries within a bucket. Entry is the basic unit of a HashMap, and each Entry contains a key-value pair.

Each element in a 16-length array stores the head of a linked list. So what are the rules for how these elements are stored in the array? It is usually obtained by hash(key)%len, which is the hash of the element’s key modulo the length of the array. For example, in the hash table above, 12%16=12,28%16=12,108%16=12,140%16=12. So 12, 28, 108, and 140 are all stored at 12 subscript.

When storing a pair of values (Key — >Value pair), it is actually stored in an Entry object E, and the program calculates the storage location of the Entry object through the Key. In other words, the correspondence between Key — >Value is realized through the process of Key — Entry — Value, so the Value exists wherever the superficially known Key exists.

The parameter values in the constructor

Static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; Static final int MAXIMUM_CAPACITY = 1 << 30; Static final float DEFAULT_LOAD_FACTOR = 0.75f; // The storage structure inside the HashMap is an array, where the array is empty, i.e. the state before initialization static final Entry<? ,? >[] EMPTY_TABLE = {}; // Empty storage entity TRANSIENT Entry<K,V>[] TABLE = (Entry<K,V>[]) EMPTY_TABLE; // Number of key-value pairs stored transient int size; // Threshold. When table == {}, this value is the initial capacity (default initial capacity is 16). When the table is filled, that is, after the memory space is allocated for the table, the threshold is generally capacity*loadFactory. For HashMap capacity expansion, see threshold int threshold. // loadFactor, which represents how full the table is. The default value is 0.75. // For quick failures, since the HashMap is not thread-safe, if the structure of the HashMap changes (such as put, remove, etc.) during the iteration of the HashMap due to participation of other threads, Need to throw an exception ConcurrentModificationException transient int modCount;Copy the code

1.2.2 Put source code Analysis

1. When we put an element into a hashMap, we first determine whether the map is empty and, if so, populate the array

2. If the Key is null, store the value in table[0] or in table[0]

Hash the key to make the hash uniform, and then determine the position of the table using indexFor (h & (length-1)). It is essentially the same as the mod of the hash value divided by the table length.

4, then loop through the list to see if the key exists in the list, if so, replace the old value. Then return the old value

5. If no entry node exists, add a new entry node at the end.

Public V put(K key, V value) {// If the table array is an empty array {}, inflateTable(threshold); If (key == NULL) return putForNullKey(value); if (key == null) return putForNullKey(value); int hash = hash(key); Int I = indexFor(hash, table.length); int I = indexFor(hash, table.length); For (Entry<K,V> e = table[I]; e ! = null; E = e.ext) {// If the corresponding data already exists, perform overwrite operation. Replace the old value with the new value and return the old value Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); Return oldValue; // Return oldValue; } } modCount++; AddEntry (hash, key, value, I); addEntry(hash, key, value, I); // Add an entry return null; }Copy the code

1.2.3 addEntry and Capacity Expansion

void addEntry(int hash, K key, V value, int bucketIndex) { if ((size >= threshold) && (null ! = table[bucketIndex])) { resize(2 * table.length); // When the size exceeds the threshold and a hash conflict is about to occur, expand the capacity. The new capacity is twice the old capacity hash = (null! = key) ? hash(key) : 0; bucketIndex = indexFor(hash, table.length); CreateEntry (Hash, key, Value, bucketIndex); } void createEntry(int hash, K key, V value, int bucketIndex) {Entry<K,V> e = table[bucketIndex]; Table [bucketIndex] = new Entry<>(hash, key, value, e); // Perform the link operation so that the newly inserted element points to the original element. // this ensures that newly inserted elements are always in the list's head, size++; // Number of elements +1}Copy the code

1. After we determine the location of the elements to be inserted, we need to judge whether the current linked list needs to be expanded

2. If the current table array is larger than Threashold (default table length x load factor), expand the array twice as long as the current array length

3. When the array is expanded, the positions of all elements need to be recalculated and then inserted into the corresponding positions. Capacity expansion methods mainly include REsize and transfer method in REsize, which will not be discussed in detail

As shown in figure

Before capacity expansion, 3,7, and 5 are all at the position of 1 when the molos of array length 2 are taken.

After the expansion, the length of array table is 4, 7, and 3. After the rubbing of 3, the rubbing of 5 is 1

4. Execute createEntry to insert the generated node into the head of the corresponding position.

1.3 Analysis of HashMap infinite loop

Now that we know about the PUT and capacity expansion operations in the hashMap, we simulate a scenario. There is the original hashMap as follows

In a multi-threaded environment, thread 1 and thread 2 can be used to expand the map.

Suspended when thread 1 executes the following code

At this point, the data structure of thread 1 is as follows: table has been expanded, and the table is suspended when the position of each element is recalculated. At this point, key 3 is still in position 1, and 3. Next is 7.

In this case, thread 2 completes capacity expansion, and the data structure after capacity expansion is

When thread 1 is scheduled to execute, because thread 1 executes e.next =newTable[I]; Insert key3 into position 3 with 3.next=key7. Now e=key3, next=key7;

When e=next, e=key7, next=key7;

Then start the next round of while. Next is null when thread 2 executes, but key7.next is key3 when thread 1 executes. So when the second loop starts, next=e.next, next= key3.

Key7 is then inserted in front of key3 by head insertion. After next=e.next, e=key3, next=key3;

Then the third cycle begins. E and next are both key3. So according to the head insertion method. Key3 is inserted before key7, which causes key3.next to be key7 and key7.next to be key3

So when we go to get value, when we get to 3, there will be an infinite loop. You never get the data back.

conclusion

HashMap is expanded under concurrent cause an infinite loop, because multiple threads concurrently, because a thread with the first to complete the expansion, the original Map linked list hash again to their own table, and list into the reverse, a thread after expansion, and do their own hash, again will list into positive sequence list in reverse chronological order. The result is a circular linked list that creates an infinite loop when there are no elements in the GET table. In 1.8, linked lists were expanded to red-black trees without related problems.