Java 8 ConcurrentHashMap source analysis

The import

ConcurrentHashMap is a thread-safe implementation of HashMap. For a summary of HashMap analysis, see article Java 8 HashMap source code analysis. This article will analyze ConcurrentHashMap based on the Java 8 implementation in Java 8. From the perspective of the ConcurrentHashMap implementation in Java 8, a major change has been made in Java 8. This article will only analyze the Implementation of Java 8, but not the implementation before Java 8. To better import this article, first show the structure of ConcurrentHashMap, see the image below:

ConcurrentHashMap structure

Similar to HashMap, ConcurrentHashMap uses a table to store nodes. ConcurrentHashMap also uses the hashCode of the record’s key to find the record’s storage index, and handles hash collisions similarly. Conflicting records are stored in the same location, forming a linked list that is converted to a red-black tree when the length of the list is greater than 8, reducing the complexity of lookups from O(N) to O(lgN). The implementation of ConcurrentHashMap and how ConcurrentHashMap ensures thread-safety in concurrent environments will be examined in detail below.

ConcurrentHashMap,

The hash bucket Table was initialized

If the table has not been initialized, the initTable method will be called to initialize the table. The following shows the specific process code for initializing the table:

private final Node<K,V>[] initTable() { Node<K,V>[] tab; int sc; while ((tab = table) == null || tab.length == 0) { if ((sc = sizeCtl) < 0) Thread.yield(); // lost initialization race; just spin else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) { try { if ((tab = table) == null || tab.length == 0) { int n = (sc > 0) ? sc : DEFAULT_CAPACITY; @SuppressWarnings("unchecked") Node<K,V>[] nt = (Node<K,V>[])new Node<? ,? >[n]; table = tab = nt; sc = n - (n >>> 2); } } finally { sizeCtl = sc; } break; } } return tab; }Copy the code

SizeCtl is the sizeCtl variable. Here is its definition:


    /**
     * Table initialization and resizing control.  When negative, the
     * table is being initialized or resized: -1 for initialization,
     * else -(1 + the number of active resizing threads).  Otherwise,
     * when table is null, holds the initial table size to use upon
     * creation, or 0 for default. After initialization, holds the
     * next element count value upon which to resize the table.
     */
    private transient volatile int sizeCtl;

Copy the code

SizeCtl is a shared variable used to synchronize multiple threads. If its current value is negative, then the table is being initialized or expanded by a thread. Therefore, if a thread wants to initialize the table or expand the table, it needs to compete with the shared variable sizeCtl. The thread that got the variable is allowed to do the next operation, the thread that didn’t get the variable will keep spinning to try to get the shared variable, so the thread that got the sizeCtl variable needs to set it back after it’s done so that the other thread can spin to do the next operation. In initTable we can see that when the thread finds sizeCtl less than 0, it will give up CPU time and try again later. When it finds sizeCtl less than 0, I’m going to make the sizeCtl shared variable -1 by calling the method compareAndSwapInt to tell any other thread that’s trying to get the sizeCtl shared variable, that it’s currently being consumed by this thread, that you can rest for a while while I finish my task, and I’ll try it again later, I’ll release it when I’m done. Other threads will understand this communication when they find sizeCtl less than zero, and they will give up CPU time and wait for the next schedule to try to get sizeCtl to do their job. After completing the task of initializing the table, the thread needs to set sizeCtl to a state that allows other threads to obtain variables. Another point to note is that the thread checks sizeCtl twice before and after it is set through the U.compareAndSwapInt method. To check whether the table has been initialized, this check is necessary because in a concurrent environment, the previous thread may be initializing the table but has not initialized it successfully, i.e. the table is still null. And one thread finds that the table is null and it contests sizeCtl to initialize the table, but after the current thread completes the initialization, the thread that tried to initialize the table gets sizeCtl, but the table is already initialized, so, without checking again, It is extremely unsafe to overwrite updates from threads that later perform a PUT operation.

ConcurrentHashMap Description of the query record method

To query a record in ConcurrentHashMap, we first need to know the position of the table in which the record is stored (which can be a card slot). Each slot has a linked list or a red-black tree), which may be null. If null, the record to be queried does not already exist in ConcurrentHashMap. Otherwise, the record is searched in the linked list or red-black tree. ConcurrentHashMap get (); ConcurrentHashMap

public V get(Object key) { Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek; int h = spread(key.hashCode()); if ((tab = table) ! = null && (n = tab.length) > 0 && (e = tabAt(tab, (n - 1) & h)) ! = null) { if ((eh = e.hash) == h) { if ((ek = e.key) == key || (ek ! = null && key.equals(ek))) return e.val; } else if (eh < 0) return (p = e.find(h, key)) ! = null ? p.val : null; while ((e = e.next) ! = null) { if (e.hash == h && ((ek = e.key) == key || (ek ! = null && key.equals(ek)))) return e.val; } } return null; }Copy the code

First, calculate the hashCode of the key of the record, then obtain the index of the record in the table by using the calculation method (hashCode & (length-1)), then determine whether the position is null, if it is null, return NULL, otherwise, If the first element at that location (the head of the list or the root of the red-black tree) matches the record we are looking for first, then the value of that node is returned directly. Otherwise, if the hashCode of the node is less than 0, then the location is a red-black tree. Why a hashCode value less than 0 is a red-black tree and not a linked list depends on the following code:

static final int TREEBIN = -2; // hash for roots of trees TreeBin(TreeNode<K,V> b) { super(TREEBIN, null, null, null); . }Copy the code

If the hashCode value of the first node in the table is less than 0, the value of the first node in the table is a red black tree. If the hashCode value of the first node is less than 0, the value of the first node in the table is a red black tree. If it is a red-black tree, the Node is found by calling the find method of a Node whose find method is overridden in a subclass. In another case, the index position of the table is a linked list, so records are searched through the search method of the linked list. Finally, note that ConcurrentHashMap is a thread-safe HashMap, but we haven’t found any lock-equivalent components used to do thread synchronization during get methods. Why? For reads, it is normal to allow multiple threads to read at the same time, and ConcurrentHashMap does some tricks with Node’s implementation:

static class Node<K,V> implements Map.Entry<K,V> { final int hash; final K key; volatile V val; volatile Node<K,V> next; . } /** * The array of bins. Lazily initialized upon first insertion. * Size is always a power of two. Accessed directly by iterators. */ transient volatile Node<K,V>[] table;Copy the code

We found that the table array was volatile, which meant that we did not need to worry about thread visibility of the table array, and therefore did not need to lock to achieve concurrency.

ConcurrentHashMap Explains how to insert records

The query method of ConcurrentHashMap has been analyzed in the above article. Now let’s analyze how the insertion operation of ConcurrentHashMap is completed. Note that when performing the PUT operation, we may find that the table array is not initialized or the number of records in the table exceeds the threshold. In the former case, we need to initialize the table, while in the latter case, we need to expand the table. The process of initializing a table has been analyzed in the preceding section. The following section only analyzes table expansion operations. First, we need to calculate the hashCode of the key of the record, calculate the index that should be stored in the table array according to the hashCode, and then store it in the corresponding linked list or red-black tree. And in some cases to carry out linked list conversion red-black tree operations, as well as table expansion operations. Another important thing is to change the size of the table, which will be discussed later in this article. The following shows the process involved in a PUT operation:

public V put(K key, V value) { return putVal(key, value, false); } /** Implementation for put and putIfAbsent */ final V putVal(K key, V value, boolean onlyIfAbsent) { if (key == null || value == null) throw new NullPointerException(); int hash = spread(key.hashCode()); int binCount = 0; for (Node<K,V>[] tab = table;;) { Node<K,V> f; int n, i, fh; if (tab == null || (n = tab.length) == 0) tab = initTable(); else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) { if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null))) break; // no lock when adding to empty bin } else if ((fh = f.hash) == MOVED) tab = helpTransfer(tab, f); else { V oldVal = null; synchronized (f) { if (tabAt(tab, i) == f) { if (fh >= 0) { binCount = 1; for (Node<K,V> e = f;; ++binCount) { K ek; if (e.hash == hash && ((ek = e.key) == key || (ek ! = null && key.equals(ek)))) { oldVal = e.val; if (! onlyIfAbsent) e.val = value; break; } Node<K,V> pred = e; if ((e = e.next) == null) { pred.next = new Node<K,V>(hash, key, value, null); break; } } } else if (f instanceof TreeBin) { Node<K,V> p; binCount = 2; if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key, value)) ! = null) { oldVal = p.val; if (! onlyIfAbsent) p.val = value; } } } } if (binCount ! = 0) { if (binCount >= TREEIFY_THRESHOLD) treeifyBin(tab, i); if (oldVal ! = null) return oldVal; break; } } } addCount(1L, binCount); return null; }Copy the code

First, calculate the hashCode of the key, then calculate the index position of the table, and obtain the index value. If the position is still null, then insert the new record into the index position of the table by calling casTabAt method. Otherwise, insert the new record into the index position of the table. Use the synchronized keyword to lock the index position of the table. It should be noted that only the index position of the table is locked by the current thread, but no other positions are locked. Therefore, other threads can safely obtain other table positions for operation. This improves the concurrency of ConcurrentHashMap. If hashCode is less than 0, it is a red-black tree. If hashCode is not less than 0, it is still a linked list. If it is a linked list, it looks to see if the existing record key is the same as the current record to be inserted. If so, the effect of this put is replace, otherwise, the record is added to the list. If it is a red-black tree, the insert is performed by calling the putTreeVal method. After the insertion operation is completed, it is necessary to determine whether this operation is an update operation. If it is an update operation, the size will not change. Otherwise, if this PUT operation is an add operation, the size update operation will be required, which involves the concurrent environment, so it is complicated. In addition, the expansion operation of the table will also occur when the size is updated. If the number of records in the table reaches the threshold after the size is updated, the expansion operation needs to be performed, which is also a relatively complicated step. The difference between ConcurrentHashMap and HashMap is that a HashMap allows a key and value to be null, while ConcurrentHashMap does not allow a key and value to be null. If the key or value is found to be null, the NPE will be thrown. This also means that in ConcurrentHashMap you can use the get operation to test for a record, because as long as the GET method returns NULL, In HashMap, the get operation returns NULL, possibly because the value of the queried record is null. Therefore, the GET method cannot be used to test whether a certain record exists in the table.

ConcurrentHashMap The number of records is updated

When analyzing the PUT operation, the number of records in the table needs to be updated. If the number of records exceeds the threshold after the update, the table needs to be expanded. The following describes the context of this process. Updating the number of records is done by calling the method addCount, the details of which are as follows:

private final void addCount(long x, int check) { CounterCell[] as; long b, s; if ((as = counterCells) ! = null || ! U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) { CounterCell a; long v; int m; boolean uncontended = true; if (as == null || (m = as.length - 1) < 0 || (a = as[ThreadLocalRandom.getProbe() & m]) == null || ! (uncontended = U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) { fullAddCount(x, uncontended); return; } if (check <= 1) return; s = sumCount(); } if (check >= 0) { Node<K,V>[] tab, nt; int n, sc; while (s >= (long)(sc = sizeCtl) && (tab = table) ! = null && (n = tab.length) < MAXIMUM_CAPACITY) { int rs = resizeStamp(n); if (sc < 0) { if ((sc >>> RESIZE_STAMP_SHIFT) ! = rs || sc == rs + 1 || sc == rs + MAX_RESIZERS || (nt = nextTable) == null || transferIndex <= 0) break; if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) transfer(tab, nt); } else if (U.compareAndSwapInt(this, SIZECTL, sc, (rs << RESIZE_STAMP_SHIFT) + 2)) transfer(tab, null); s = sumCount(); }}}Copy the code

ConcurrentHashMap maintains the baseCount to represent the current number of records, which will be used later in the size method to get the number of records, and is updated by calling addCount during put and remove operations. If the array of counterCells is empty, the array of counterCells is initialized by calling the method fullAddCount, because this section is too complex to analyze at this point. When updating the number of records in the table, there is another case that needs to be considered. When the number of records reaches the threshold, the expansion operation is required. This part of the code is too complicated, and the condition of the expansion operation of ConcurrentHashMap seems to be different from that of HashMap. If it is not expanded, it needs to be expanded “. Expansion requires the transfer method to migrate the old records to the new table. For now, we need to know that ConcurrentHashMap may expand when we update the number of records in the table if “the table is too small and has not been expanded”. This part of the code will be analyzed and summarized at a suitable time in the future.

ConcurrentHashMap Deletes a record

Now let’s examine how ConcurrentHashMap performs record removal. Here is the code for calling the remove method:

public V remove(Object key) { return replaceNode(key, null, null); } final V replaceNode(Object key, V value, Object cv) { int hash = spread(key.hashCode()); for (Node<K,V>[] tab = table;;) { Node<K,V> f; int n, i, fh; if (tab == null || (n = tab.length) == 0 || (f = tabAt(tab, i = (n - 1) & hash)) == null) break; else if ((fh = f.hash) == MOVED) tab = helpTransfer(tab, f); else { V oldVal = null; boolean validated = false; synchronized (f) { if (tabAt(tab, i) == f) { if (fh >= 0) { validated = true; for (Node<K,V> e = f, pred = null;;) { K ek; if (e.hash == hash && ((ek = e.key) == key || (ek ! = null && key.equals(ek)))) { V ev = e.val; if (cv == null || cv == ev || (ev ! = null && cv.equals(ev))) { oldVal = ev; if (value ! = null) e.val = value; else if (pred ! = null) pred.next = e.next; else setTabAt(tab, i, e.next); } break; } pred = e; if ((e = e.next) == null) break; } } else if (f instanceof TreeBin) { validated = true; TreeBin<K,V> t = (TreeBin<K,V>)f; TreeNode<K,V> r, p; if ((r = t.root) ! = null && (p = r.findTreeNode(hash, key, null)) ! = null) { V pv = p.val; if (cv == null || cv == pv || (pv ! = null && cv.equals(pv))) { oldVal = pv; if (value ! = null) p.val = value; else if (t.removeTreeNode(p)) setTabAt(tab, i, untreeify(t.first)); } } } } } if (validated) { if (oldVal ! = null) { if (value == null) addCount(-1L, -1); return oldVal; } break; } } } return null; }Copy the code

The deletion operation is a write operation. Therefore, lock the index position in the table during the deletion. ConcurrentHashMap locks the index position in the table using the synchronized keyword and then deletes the table. The remove method calls the replaceNode method to perform the actual operation. The delete operation first evaluates the hashCode of the record and then evaluates the index of the table based on the hashCode. The table index is then deleted using different methods depending on whether the index position in the table is a linked list or a red-black tree. The deletion operation requires an update of the number of records (by calling the addCount method).

ConcurrentHashMap size method

The ConcurrentHashMap method is used to obtain the number of records in the table. The ConcurrentHashMap method is used to obtain the number of records.

public int size() { long n = sumCount(); return ((n < 0L) ? 0 : (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE : (int)n); } final long sumCount() { CounterCell[] as = counterCells; CounterCell a; long sum = baseCount; if (as ! = null) { for (int i = 0; i < as.length; ++i) { if ((a = as[i]) ! = null) sum += a.value; } } return sum; }Copy the code

The number of ConcurrentHashMap records can be obtained by combining baseCount and counterCells arrays. The total number of ConcurrentHashMap records can be obtained by accumulating the number of baseCount and counterCells arrays.

Implementation of ConcurrentHashMap in java7

From the previous analysis of the ConcurrentHashMap implementation in Java8, let’s examine the implementation in Java7. The java7 implementation uses the array + linked list method at the bottom. The ConcurrentHashMap data structure in Java7 is shown below:

ConcurrentHashMap structure diagram in java7

The size of the segment array determines the concurrency of ConcurrentHashMap. The default value is 16. This is because the java7 ConcurrentHashMap implementation uses a method called segmented locking, which stores records in segments. The access of different segments does not affect each other. When one thread wants to access a segment, it needs to be locked, while other threads cannot access the segment locked by other threads. This is a great improvement over the whole locking approach, and the analysis of the specific actions in java7 will not be covered here, but will be summarized in another article if necessary.

This article analyzes the implementation details of ConcurrentHashMap according to the source code in JDK 8. However, because the implementation of ConcurrentHashMap is too complex, this article only analyzes the superficial content. The significance of analyzing the source code of ConcurrentHashMap is to understand why ConcurrentHashMap is thread-safe and what ConcurrentHashMap adds to HashMap for thread-safe implementation. In this paper, by analyzing the implementation of ConcurrentHashMap and comparing the implementation of HashMap, it can be obviously found that their implementation complexity is not at the same level, and HashMap is more about playing with data structure. ConcurrentHashMap, on the other hand, needs to consider how to efficiently implement thread safety in concurrent environment. In short, ConcurrentHashMap not only implements all functions supported by HashMap, but also implements thread safety under the premise of maintaining the same efficiency as HashMap. ConcurrentHashMap is thread-safe based on CAS. CAS is a lightweight lock that does not block threads, but waits until a variable is acquired, and then performs business operations. This is a big improvement over locking that blocks threads to achieve thread-safe. Because the synchronization control based on lock need to get the thread lock, and gets the lock is blocked waiting for before, it needs another thread lock to awaken it let him again to competition, about this part of the content can refer to the article AbstractQueuedSynchronizer Java synchronization frame, AQS is Java to achieve the bottom support lock, but also for programmers to achieve thread synchronization framework, based on AQS to achieve their own thread synchronizer is very simple. Due to the complexity of ConcurrentHashMap, this paper is regarded as the beginning of the analysis of ConcurrentHashMap, which will be supplemented from time to time in the future.