preface

There is plenty of information about ConcurrentHashMap available on the web, and working through its source code is a good way to consolidate and deepen one's understanding of Java concurrency and the Map collections. This article is my own analysis and summary of ConcurrentHashMap. Given my limited experience, there may be flaws and mistakes in the analysis; I hope you will point them out so we can learn and improve together.

This analysis is expected to achieve the following objectives:

  1. Understand how ConcurrentHashMap is optimized for concurrency.
  2. Understand how those concurrency measures are combined with a Map collection.

ConcurrentHashMap structure

Here is the class inheritance diagram for ConcurrentHashMap:

It is no surprise that ConcurrentHashMap extends AbstractMap, since every Map implementation does. ConcurrentMap is the novelty: from its name we can guess that it adds concurrent operations to the Map interface. As the ConcurrentMap Javadoc says, it is an interface providing additional thread-safety and atomicity guarantees.
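As a quick illustration of those guarantees, here is a small sketch (the class and helper names are my own, purely for illustration) using two of the atomic operations the ConcurrentMap interface specifies:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicOpsDemo {
    // merge is atomic on a ConcurrentMap: no lost updates even when
    // many threads bump the same key concurrently
    static int bump(ConcurrentMap<String, Integer> map, String key) {
        return map.merge(key, 1, Integer::sum);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        bump(map, "hits");
        bump(map, "hits");
        System.out.println(map.get("hits")); // 2
        // replace(key, expected, new) is the compare-and-swap style update
        boolean swapped = map.replace("hits", 2, 10);
        System.out.println(swapped); // true
    }
}
```

A plain HashMap offers the same method signatures since Java 8, but only ConcurrentMap implementations promise they behave atomically under concurrent access.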

The source of ConcurrentHashMap runs to 6000+ lines including comments, and diving straight into it would be overwhelming. So let's start with the overall architecture and then dig into the key methods. First, a look at its internal structure.

Node types

From the figure above, we can see that ConcurrentHashMap has five node types stored in its table: Node, TreeBin, TreeNode, ForwardingNode, and ReservationNode. Node and TreeNode are easy to understand: one is a linked-list node, the other a tree node. But why is the root of a tree bin a TreeBin rather than a TreeNode, and what are ForwardingNode and ReservationNode for?

The red and black nodes of the tree in the figure above are not colored randomly: the tree structure in ConcurrentHashMap is a red-black tree. Anyone familiar with red-black trees will remember the complexity of the left- and right-rotation rebalancing operations. TreeBin acts as a proxy node that performs these operations, while TreeNode itself only provides lookup methods.

One might wonder here: why not use the ready-made TreeMap instead of TreeBin?

When TreeMap.put stores data, keys are sorted by their natural ordering by default; if a Comparator is passed to TreeMap, or the key itself implements Comparable, the keys are sorted by those custom rules instead. ConcurrentHashMap does not need this functionality, and TreeMap force-casts keys to a Comparable type:

The problem is that mixing key types then fails at runtime: when Double.compareTo is called, the existing Integer key is cast to Double, but Double and Integer have no parent-child relationship, so a ClassCastException is thrown.
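A minimal sketch of the failure just described (the class and method names are mine, purely for illustration): storing an Integer key and then a Double key in a natural-ordering TreeMap throws a ClassCastException:

```java
import java.util.TreeMap;

public class TreeMapKeyDemo {
    // Returns true if TreeMap rejects the mixed-type second key
    static boolean mixedKeysFail() {
        TreeMap<Object, String> map = new TreeMap<>();
        map.put(1, "int key");          // Integer key stored first
        try {
            // Double.compareTo is invoked with the existing Integer key,
            // which it tries to cast to Double -> ClassCastException
            map.put(2.0, "double key");
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mixedKeysFail()); // true
    }
}
```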

For the ForwardingNode node, its comment reads:

A node inserted at head of bins during transfer operations.

A node inserted at the head of a bin during transfer operations: this is tied to the expansion and shrinking of the ConcurrentHashMap table, which will be discussed later.

A ReservationNode, according to its comment, is simply a placeholder node; there is not much more to it for now.

/**
 * A place-holder node used in computeIfAbsent and compute.
 */
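A small demonstration of the placeholder in action (the class and helper names are my own; the "Recursive update" detection shown here exists on JDK 9 and later): during computeIfAbsent, the slot is temporarily occupied by a ReservationNode until the mapping function returns, and touching that slot from inside the mapping function is detected and rejected:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ReservationDemo {
    // Mutating the same bin from inside the mapping function hits the
    // ReservationNode placeholder; JDK 9+ detects this and throws
    // IllegalStateException("Recursive update") instead of corrupting the map
    static boolean recursiveUpdateDetected() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        try {
            map.computeIfAbsent("x", key -> {
                map.computeIfAbsent("x", k2 -> 42); // touches the reserved slot
                return 0;
            });
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(recursiveUpdateDetected());
    }
}
```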

We also need to look at the constant and variable fields of ConcurrentHashMap, so they are easier to recognize when we analyze the code later.

constant

/** * Maximum capacity */
private static final int MAXIMUM_CAPACITY = 1 << 30;

/** * The default capacity */
private static final int DEFAULT_CAPACITY = 16;

/**
 * The maximum array size. As with the same constant in ArrayList, the most
 * accepted explanation is that some VMs reserve header words in an array,
 * so this reduces the chance of OutOfMemoryError on those machines.
 * [https://www.zhihu.com/question/27999759]
 */
static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

/** * The default concurrency level for this hash table, for compatibility with previous versions */
private static final int DEFAULT_CONCURRENCY_LEVEL = 16;

/** * load factor */
private static final float LOAD_FACTOR = 0.75f;

/** * The threshold for list conversion to tree */
static final int TREEIFY_THRESHOLD = 8;

/** * tree conversion to the linked list threshold */
static final int UNTREEIFY_THRESHOLD = 6;

/** * The minimum table capacity before a bin's list can be converted to a tree */
static final int MIN_TREEIFY_CAPACITY = 64;

/** * The minimum number of bins each resizing thread processes per transfer step */
private static final int MIN_TRANSFER_STRIDE = 16;

/** * The number of bits used for the generation stamp in sizeCtl during resizing */
private static final int RESIZE_STAMP_BITS = 16;

/** * The maximum number of threads that can help resize */
private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;

/** * The bit shift for the resize stamp in sizeCtl */
private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;


static final int MOVED     = -1; // hash of a ForwardingNode
static final int TREEBIN   = -2; // hash of a red-black tree root (TreeBin)
static final int RESERVED  = -3; // hash of a ReservationNode
static final int HASH_BITS = 0x7fffffff; // usable bits of a normal node hash

/** The number of available CPUs, used to bound parallelism during resizing */
static final int NCPU = Runtime.getRuntime().availableProcessors();
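To see how HASH_BITS and the negative sentinel hashes fit together, here is a small standalone sketch (my own class, mirroring the spread function and index computation from the JDK source) of how a key's hash becomes a table slot:

```java
public class SpreadDemo {
    // Same value as the HASH_BITS constant above
    static final int HASH_BITS = 0x7fffffff;

    // Mirrors ConcurrentHashMap.spread(): XOR the high 16 bits into the
    // low 16 bits, then clear the sign bit so user hashes can never
    // collide with the negative sentinels MOVED/TREEBIN/RESERVED
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        int n = 16;                          // table length, always a power of two
        int h = spread("hello".hashCode());
        int index = (n - 1) & h;             // cheap replacement for h % n
        System.out.println(h >= 0);          // true: sign bit is cleared
        System.out.println(index >= 0 && index < n); // true
    }
}
```

Because user hashes are always non-negative, any node whose hash is negative must be one of the special node types, which is exactly the check `fh < 0` we will see in get and put.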

variable

/** * Node array */
transient volatile Node<K,V>[] table;

/** * The next Node array used during resizing; data is migrated to it incrementally */
private transient volatile Node<K,V>[] nextTable;

/** * The base counter, used when there is no thread contention and as a fallback during table initialization; updated via CAS */
private transient volatile long baseCount;

/** * Table initialization and resizing control:
 * >0: the capacity to initialize to, or the next resize threshold
 * 0: default initial value
 * -1: the table is being initialized
 * -(1 + n): n threads are resizing */
private transient volatile int sizeCtl;

/** * The next table index (plus one) to split during resizing */
private transient volatile int transferIndex;

/** * The spin lock when expanding or creating CounterCells. */
private transient volatile int cellsBusy;

/** * CounterCell array for splitting the hot counter, similar to LongAdder's Cell */
private transient volatile CounterCell[] counterCells;

Operation analysis

put

The put method simply delegates to putVal, so let's analyze putVal directly:

final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    // Compute the hash value based on the key
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh; K fk; V fv;
        // Initialize the array if it is empty
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // Compute the slot index: (table.length - 1) & hash
        // If the slot is empty, CAS the new node in and we are done
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value)))
                break;                   // no lock when adding to empty bin
        }
        // If the slot's hash is MOVED, the head is a ForwardingNode:
        // a resize is in progress, so help migrate the data
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        // Fast path for putIfAbsent: if the first node already matches
        // the key, return the existing value without locking
        else if (onlyIfAbsent
                 && fh == hash
                 && ((fk = f.key) == key || (fk != null && key.equals(fk)))
                 && (fv = f.val) != null)
            return fv;
        else {
            V oldVal = null;
            synchronized (f) {
                // Check that f is still the first node
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        binCount = 1;
                        // Traverse the list; if a node with the same key
                        // exists, replace its value
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key, value);
                                break;
                            }
                        }
                    }
                    // Tree bin: put the key-value into the red-black tree
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                              value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                    // ReservationNode is only a placeholder, so a nested
                    // update on it must throw
                    else if (f instanceof ReservationNode)
                        throw new IllegalStateException("Recursive update");
                }
            }
            // Decide whether to turn the list into a red-black tree
            if (binCount != 0) {
                // TREEIFY_THRESHOLD = 8
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}

putVal is a long method, but it is mostly a case-by-case analysis of the bin's state. Let's continue with the treeifyBin method:

private final void treeifyBin(Node<K,V>[] tab, int index) {
    Node<K,V> b; int n;
    if (tab != null) {
        // MIN_TREEIFY_CAPACITY = 64
        // When the table capacity is smaller than 64, only resize the table
        if ((n = tab.length) < MIN_TREEIFY_CAPACITY)
            tryPresize(n << 1);
        else if ((b = tabAt(tab, index)) != null && b.hash >= 0) {
            synchronized (b) {
                if (tabAt(tab, index) == b) {
                    TreeNode<K,V> hd = null, tl = null;
                    // Convert the list into a doubly linked list of TreeNodes
                    for (Node<K,V> e = b; e != null; e = e.next) {
                        TreeNode<K,V> p =
                            new TreeNode<K,V>(e.hash, e.key, e.val,
                                              null, null);
                        if ((p.prev = tl) == null)
                            hd = p;
                        else
                            tl.next = p;
                        tl = p;
                    }
                    // Wrap it as a TreeBin and put it in table[index]
                    setTabAt(tab, index, new TreeBin<K,V>(hd));
                }
            }
        }
    }
}

Save the initTable, helpTransfer, and putTreeVal methods for later.
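As a usage-level illustration of the two putVal entry points (the mapping of public methods onto the onlyIfAbsent flag is from the JDK source; the demo class itself is my own sketch):

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        // put calls putVal(key, value, false): always overwrites,
        // returning the previous value
        map.put("a", 1);
        Integer old = map.put("a", 2);
        // putIfAbsent calls putVal(key, value, true): keeps the existing
        // mapping and returns it (the lock-free fast path seen above)
        Integer existing = map.putIfAbsent("a", 3);
        System.out.println(old);          // 1
        System.out.println(existing);     // 2
        System.out.println(map.get("a")); // 2
    }
}
```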

get

Let’s look at the source of the get method:

public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    // Compute the hash value
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        // If the first node matches, return it directly
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        // eh < 0: delegate the lookup to the node itself
        // (TreeBin / ForwardingNode / ReservationNode)
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        // Otherwise walk the linked list
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

Traversing the linked list from the head needs no explanation, but insertion and deletion in a red-black tree can rebalance the entire structure while a reader is inside it. Let's look at TreeBin's find code:

final Node<K,V> find(int h, Object k) {
    if (k != null) {
        for (Node<K,V> e = first; e != null; ) {
            int s; K ek;
            // Fall back to a linked-list scan in two cases:
            // 1. a thread is holding the write lock
            // 2. a thread is waiting to acquire the write lock
            if (((s = lockState) & (WAITER|WRITER)) != 0) {
                if (e.hash == h &&
                    ((ek = e.key) == k || (ek != null && k.equals(ek))))
                    return e;
                e = e.next;
            }
            // Otherwise add a reader: bump the read count via CAS
            else if (U.compareAndSetInt(this, LOCKSTATE, s,
                                        s + READER)) {
                TreeNode<K,V> r, p;
                try {
                    p = ((r = root) == null ? null :
                         r.findTreeNode(h, k, null));
                } finally {
                    Thread w;
                    // If this is the last reader and a writer is blocked
                    // waiting on the read lock, wake the writer so it can
                    // try to acquire the write lock
                    if (U.getAndAddInt(this, LOCKSTATE, -READER) ==
                        (READER|WAITER) && (w = waiter) != null)
                        LockSupport.unpark(w);
                }
                return p;
            }
        }
    }
    return null;
}

TreeBin effectively implements its own read-write lock: while a writer is restructuring the red-black tree, readers simply traverse the nodes as a linked list instead.
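To make the lockState handling above more concrete, here is a small standalone sketch (my own class; only the WRITER/WAITER/READER bit encoding is taken from TreeBin) of the reader-entry CAS, using AtomicInteger in place of the Unsafe calls:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LockStateDemo {
    // Same bit encoding as ConcurrentHashMap.TreeBin's lockState
    static final int WRITER = 1; // set while a thread restructures the tree
    static final int WAITER = 2; // set while a writer waits for readers
    static final int READER = 4; // increment unit for the reader count

    // A reader may enter the tree only when no writer holds or waits for
    // the lock; otherwise it falls back to a linked-list scan
    static boolean tryReadLock(AtomicInteger lockState) {
        int s = lockState.get();
        return (s & (WAITER | WRITER)) == 0
                && lockState.compareAndSet(s, s + READER);
    }

    public static void main(String[] args) {
        AtomicInteger lockState = new AtomicInteger();
        System.out.println(tryReadLock(lockState));        // true: no writer
        System.out.println(tryReadLock(lockState));        // true: readers stack
        System.out.println(lockState.get() == 2 * READER); // true
        lockState.set(WRITER);
        System.out.println(tryReadLock(lockState));        // false: writer active
    }
}
```

Readers leave by subtracting READER, which is exactly the `U.getAndAddInt(this, LOCKSTATE, -READER)` in the finally block above.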

ForwardingNode also overrides Node’s find method. Let’s see how it works:

Node<K,V> find(int h, Object k) {
    // Loop instead of recursing on chained forwarding nodes
    outer: for (Node<K,V>[] tab = nextTable;;) {
        Node<K,V> e; int n;
        // Null key, empty table, or empty slot for this key
        if (k == null || tab == null || (n = tab.length) == 0 ||
            (e = tabAt(tab, (n - 1) & h)) == null)
            return null;
        for (;;) {
            int eh; K ek;
            if ((eh = e.hash) == h &&
                ((ek = e.key) == k || (ek != null && k.equals(ek))))
                return e;
            if (eh < 0) {
                // If e is again a ForwardingNode, follow it to the new
                // table the data is being migrated to
                if (e instanceof ForwardingNode) {
                    tab = ((ForwardingNode<K,V>)e).nextTable;
                    // Start again from the outermost loop
                    continue outer;
                }
                else
                    // e.g. a TreeBin: delegate to its find method
                    return e.find(h, k);
            }
            // Return null if not found
            if ((e = e.next) == null)
                return null;
        }
    }
}

ReservationNode is a pure placeholder with no data of its own, so its find simply returns null:

Node<K,V> find(int h, Object k) {
    return null;
}

size

After analyzing the get and put methods, most of the constants and variables from the previous section have appeared, but the CounterCell array has not been used anywhere yet. It is similar to LongAdder's Cell, so let's look at how ConcurrentHashMap handles this hot-spot counter. The main entry point is the size method:

public int size() {
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}

There is nothing special here: get the total element count, clamp it to the int range, and return it. Let's focus on the sumCount method:

final long sumCount() {
    CounterCell[] cs = counterCells;
    long sum = baseCount;
    if (cs != null) {
        for (CounterCell c : cs)
            if (c != null)
                sum += c.value;
    }
    return sum;
}

Let’s compare LongAdder’s sum method:

public long sum() {
    Cell[] cs = cells;
    long sum = base;
    if (cs != null) {
        for (Cell c : cs)
            if (c != null)
                sum += c.value;
    }
    return sum;
}

Does it feel like a copy? Both clearly use the same segmented counting approach, so let's look at how the CounterCell slots handle concurrency conflicts. Recall that putVal ends by calling addCount(1L, binCount); here is the first half of addCount:

private final void addCount(long x, int check) {
    CounterCell[] cs; long b, s;
    // Try to CAS baseCount directly; if that fails there was a concurrent
    // race, so fall through to the CounterCell array
    if ((cs = counterCells) != null ||
        !U.compareAndSetLong(this, BASECOUNT, b = baseCount, s = b + x)) {
        CounterCell c; long v; int m;
        boolean uncontended = true;
        if (cs == null || (m = cs.length - 1) < 0 ||
            // Pick a slot based on the thread's random probe value
            (c = cs[ThreadLocalRandom.getProbe() & m]) == null ||
            !(uncontended = U.compareAndSetLong(c, CELLVALUE,
                                               v = c.value, v + x))) {
            // If the update still fails, fall back to fullAddCount
            fullAddCount(x, uncontended);
            return;
        }
        if (check <= 1)
            return;
        s = sumCount();
    }
    // ... the second half, which checks whether a resize is needed, is omitted
}

The first half of addCount works as follows: if counterCells is null, no concurrent conflict has occurred before, and the value can be added directly to baseCount via CAS. Otherwise, it tries to update counterCells[i]; if that update also fails, the slot itself is contended and the array may need to grow, so fullAddCount is called, whose logic is essentially the same as LongAdder's longAccumulate. After expansion the update is retried.
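A small demonstration (my own class, using LongAdder directly rather than the map's internals) that the base-plus-cells scheme loses no updates under contention:

```java
import java.util.concurrent.atomic.LongAdder;

public class SegmentedCountDemo {
    // LongAdder uses the same base-plus-cells scheme as
    // baseCount + counterCells in ConcurrentHashMap
    static long count(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++)
                    adder.increment(); // contended updates spread across cells
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return adder.sum(); // folds base and all cells, like sumCount()
    }

    public static void main(String[] args) {
        System.out.println(count(4, 10_000)); // 40000: no lost updates
    }
}
```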

conclusion

This concludes the analysis of this article, which started from the overall structure of ConcurrentHashMap: first the roles of each node type, constant, and variable, and then its main processing logic through the three core methods get, put, and size. Next time we will start from the resizing of ConcurrentHashMap and work step by step through its elegant concurrent processing logic, method by method.