I. Foreword

The Java 7 ConcurrentHashMap (array + linked list, segment locking) is an industrial-grade hash table, but it has some obvious disadvantages, such as:

  • Once the Segment array is initialized, it can never be expanded, which is an inevitable bottleneck for improving performance under high concurrency.
  • Hash conflicts are resolved with linked lists. Searching a linked list is O(n), so performance degrades sharply when conflicts are severe and the lists grow longer and longer.
  • … …

So in Java 8, the author Doug Lea made major changes to ConcurrentHashMap, improving it in a number of ways, such as:

  • The data structure becomes array + linked list + red-black tree. The segment lock (Segment) is scrapped to further reduce lock granularity: the lock is taken directly on the node occupying the array slot. Nodes whose hashes conflict still form a linked list, but a red-black tree is added to optimize retrieval, with the linked list and the red-black tree converting into each other. Even in the extreme case, retrieval is O(log n), a big improvement over O(n).
  • When the constructor is passed an initial capacity, it pre-estimates a more reasonable initial capacity based on the load factor, avoiding unnecessary resizing.
  • The hash function is optimized with high/low-bit perturbation, further reducing hash conflicts.
  • The resizing mechanism is optimized. In Java 7, each Segment resized independently, a naturally thread-safe and isolated environment. After Java 8 deprecated Segment, resizing expands the entire array, and doing that safely and efficiently is a bit more complicated than in Java 7. The author adopts a multi-threaded assisted resizing mechanism and optimizes node migration.
  • … …

(Figure: Java 8 ConcurrentHashMap data structure)

II. Basic definitions

The source defines so many constants and attributes, and they recur so often, that they can stick like a lump in your throat if you don't know what they mean in advance. Source code should not be read word for word; alternate between skimming, skipping, and close reading. Skimming and skipping exist to grasp, as early as possible, the role of each definition and the rough logic.

1. Fundamental constants

Java 8 deprecates segment-based locking but retains the inner class Segment and some related constants, such as DEFAULT_CONCURRENCY_LEVEL and LOAD_FACTOR, for compatibility with older versions. In Java 8 the load factor is fixed at 3/4 (the resize threshold is computed as n - (n >>> 2)) and cannot be changed. Because red-black trees were added to the data structure and the resizing mechanism was optimized, some new constants are defined:
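Since the factor can never change, the JDK avoids floating-point math entirely and computes the threshold with a shift. A quick standalone check that `n - (n >>> 2)` really is 3/4 of n (the class and method names here are mine, not from the JDK):

```java
public class LoadFactorDemo {
    // n - (n >>> 2) == n - n/4 == 0.75 * n, with no floating-point math
    static int resizeThreshold(int n) {
        return n - (n >>> 2);
    }

    public static void main(String[] args) {
        assert resizeThreshold(16) == 12;                        // 16 * 0.75
        assert resizeThreshold(64) == 48;                        // 64 * 0.75
        assert resizeThreshold(1 << 30) == (int) (0.75 * (1 << 30));
        System.out.println("ok");
    }
}
```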

  • TREEIFY_THRESHOLD = 8, the threshold for converting a linked list into a red-black tree. When the number of nodes in a single list is >= TREEIFY_THRESHOLD, the list may be converted into a red-black tree; note that it may, not must.
  • UNTREEIFY_THRESHOLD = 6, the threshold at which a red-black tree degenerates back into a linked list; it applies only during resizing. When data is migrated from the old array to the new one and the number of nodes in a newly assembled red-black tree is <= UNTREEIFY_THRESHOLD, the tree degenerates into a linked list. As to why it is less than TREEIFY_THRESHOLD rather than equal to it, my guess is that this avoids frequent back-and-forth conversions between lists and trees hurting performance.
  • MIN_TREEIFY_CAPACITY = 64. When the array length is < MIN_TREEIFY_CAPACITY, a list is not converted even if it reaches the treeify threshold; the array is resized instead.
  • MIN_TRANSFER_STRIDE = 16, the minimum stride (number of array slots) assigned to a thread as a migration task during resizing.
  • RESIZE_STAMP_BITS = 16, the number of bits used to generate the stamp stored in sizeCtl during resizing; resizeStamp() also uses it (don't panic, we'll go into more detail when we cover resizing).
  • RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS, the shift used when recording the resizing stamp and the number of resizing threads in sizeCtl. (Don't panic, more on this when we cover resizing.)
  • MOVED = -1 indicates that the array is being resized and the node at this slot has been migrated to the new array; it is the hash value of a ForwardingNode.
  • TREEBIN = -2 marks the hash value of a red-black tree root node.
  • HASH_BITS = 0x7fffffff, the usable bits of a normal hash value; spread() uses it to ensure the computed hash does not exceed HASH_BITS (spread() is covered in detail later).

Some constants can only be fully understood together with the source code, so for now just form a first impression.

public class ConcurrentHashMap<K,V> extends AbstractMap<K,V>
    implements ConcurrentMap<K,V>, Serializable {
    private static final long serialVersionUID = 7249069246763182397L;

    /* ---------------- Constants -------------- */

    /** * The largest possible table capacity. This value must be * exactly 1<<30 to stay within Java array allocation and indexing * bounds for power of two table sizes, and is further required * because the top two bits of 32bit hash fields are used for * control purposes. */
    // 1 * 2^30 = 1073741824
    // Maximum capacity
    private static final int MAXIMUM_CAPACITY = 1 << 30;

    /** * The default initial table capacity. Must be a power of 2 * (i.e., At least 1) and at most MAXIMUM_CAPACITY. * Default capacity */
    private static final int DEFAULT_CAPACITY = 16;

    /** * The default concurrency level for this table. Unused but * defined for compatibility with previous versions of * this class. */
    // Not used in Java 8; kept for compatibility with older versions
    private static final int DEFAULT_CONCURRENCY_LEVEL = 16;

    /**
     * The load factor for this table. Overrides of this value in
     * constructors affect only the initial table capacity.  The
     * actual floating point value isn't normally used -- it is
     * simpler to use expressions such as {@code n - (n >>> 2)} for
     * the associated resizing threshold.
     */
    private static final float LOAD_FACTOR = 0.75f;

    /** * The bin count threshold for using a tree rather than list for a * bin. Bins are converted to trees when adding an element to a * bin with at least this many nodes. The value must be greater * than 2, and should be at least 8 to mesh with assumptions in * tree removal about conversion back to plain bins upon * Shrinkage. * Threshold for lists to become red-black trees, >= TREEIFY_THRESHOLD lists may become red-black trees */
    static final int TREEIFY_THRESHOLD = 8;

    /** * The bin count threshold for untreeifying a (split) bin during a * resize operation. Should be less than TREEIFY_THRESHOLD, and at * most 6 to mesh with shrinkage detection under removal. <= UNTREEIFY_THRESHOLD The red-black tree is converted to a linked list, which only applies during expansion */
    static final int UNTREEIFY_THRESHOLD = 6;

    /** * If the capacity is less than 64, the list will not be converted even if it reaches the threshold of red-black tree. Instead, expand * The smallest table capacity for which bins may be treeified. * (Otherwise The table is resized if too many nodes in a bin.) * The value should be at least 4 * TREEIFY_THRESHOLD to avoid * conflicts between resizing and treeification thresholds. */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /** * Minimum number of rebinnings per transfer step. Ranges are * subdivided to allow multiple resizer threads. This value * serves as a lower bound to avoid resizers encountering * excessive memory contention. The value should be at Least * DEFAULT_CAPACITY. * Minimum step size allocated to the thread for migrating array elements */
    private static final int MIN_TRANSFER_STRIDE = 16;

    /** * The number of bits used for generation stamp in sizeCtl. * Must be at least 6 for 32bit arrays. */
    // The number of bits used to generate the stamp in sizeCtl
    private static int RESIZE_STAMP_BITS = 16;

    /** * The maximum number of threads that can help resize. * Must fit in 32 - RESIZE_STAMP_BITS bits. */
    private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;

    /** * The bit shift for recording size stamp in sizeCtl. */
    private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // MOVED=-1 Indicates that the array is being expanded. Forwarding Nodes hash=-1
    // hash for forwarding nodes
    static final int MOVED     = -1;
    // Hash marking the root node of a red-black tree
    static final int TREEBIN   = -2; // hash for roots of trees
    // Hash for placeholder (reservation) nodes used by computeIfAbsent and compute
    static final int RESERVED  = -3; // hash for transient reservations
    // The usable bits of a normal hash value; used in spread() to keep the computed hash within HASH_BITS
    // 2147483647
    // 0111 1111 1111 1111 1111 1111 1111 1111 (31 ones)
    static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash

2. Basic attributes

Some basic properties also need to understand in advance, later read the source code will be more smooth:

  • table, the current array.
  • nextTable, the new array after resizing; non-null only while the array is being resized. If a get operation cannot find the node in the old table and a forwarding node sits at the corresponding slot, the get is forwarded to nextTable.
  • transferIndex records the progress of resize task allocation. It starts at n, tasks are assigned in reverse order, and it decreases step by step; once it is <= 0, all resize tasks have been assigned.
  • baseCount, the base count of array elements; when there is no contention, it is updated first via CAS.
  • CounterCell[] counterCells: a thread whose CAS update of baseCount fails updates a counting cell in the counterCells array instead. So the total number of elements is baseCount plus the sum of all counter cells.
  • cellsBusy, a simple spin lock protecting counterCells so that multiple threads can update the element count thread-safely.

sizeCtl is more complex but important, because its different values play significant roles in the different states of the array:

  • Before the array is initialized, sizeCtl holds the initial capacity to use when initializing the array.
  • While the array is being initialized, sizeCtl = -1, acting like a lock that lets only one thread in to initialize the array.
  • After initialization completes, sizeCtl holds the resize threshold used to decide when resizing is triggered.
  • While the array is being resized, sizeCtl holds a very small negative number that tracks the number of resizing threads as they join and leave, and indicates that the array is in the resizing state.
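To make that "very small negative number" concrete: during a resize, sizeCtl is built from a stamp derived from the table length. The sketch below mirrors the JDK 8 `resizeStamp()` logic; the demo class and its main are my own scaffolding:

```java
public class SizeCtlDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // Mirrors ConcurrentHashMap.resizeStamp(): a stamp unique to table length n,
    // with bit 15 set so that the shifted stamp forces sizeCtl negative
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    public static void main(String[] args) {
        int n = 16;
        int rs = resizeStamp(n);
        // The first resizing thread sets sizeCtl = (rs << SHIFT) + 2
        int sizeCtl = (rs << RESIZE_STAMP_SHIFT) + 2;
        assert sizeCtl < 0;                            // negative while resizing
        assert (sizeCtl >>> RESIZE_STAMP_SHIFT) == rs; // stamp recoverable from the high bits
        System.out.println("sizeCtl during resize of n=16: " + sizeCtl);
    }
}
```

The high 16 bits carry the stamp (so concurrent resizes of different generations can be told apart), while the low bits count participating threads.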
/* ---------------- Fields -------------- */

/** * The array of bins. Lazily initialized upon first insertion. * Size is always a power of two. Accessed directly by Iterators. * Current array */
transient volatile Node<K,V>[] table;

/** * New array after expansion, not null only when array expansion. * The next table to use; non-null only while resizing. */
private transient volatile Node<K,V>[] nextTable;

/** * Base counter value, used mainly when there is no contention, * but also as a fallback during table initialization * races. Updated via CAS. */
private transient volatile long baseCount;

/** * Table initialization and resizing control. When negative, the * table is being initialized or resized: -1 for initialization, * else -(1 + the number of active resizing threads). Otherwise, * when table is null, holds the initial table size to use upon * creation, or 0 for default. After initialization, holds the * next element count value upon which to resize the table. * sizeCtl is important because different values play an important role in different array states. */
private transient volatile int sizeCtl;

/** * transferIndex Records the expansion progress. * The next table index (plus one) to split while resizing. */
private transient volatile int transferIndex;

/** * simple Spinlock, used to control multithreaded statistics array elements * Spinlock (locked via CAS) used when resizing and/or creating CounterCells. */
private transient volatile int cellsBusy;

/** * Table of counter cells. When non-null, size is a power of 2. * If CAS fails to modify baseCount, the thread updates a counter cell instead. * So the total element count is baseCount plus the sum of all counter cells. */
private transient volatile CounterCell[] counterCells;

III. Constructor optimization

The constructor in Java 8 is much simpler than in Java 7. It does not initialize any data structures or arrays; it just records an initial capacity, but it optimizes how a reasonable initial capacity is chosen.

Java 7 divides the initialCapacity passed in by the length of the Segment array and simply rounds the average up to the nearest power of two as the initial capacity of each internal HashEntry array. Java 8 treats the passed-in initialCapacity as an estimate of the number of elements the user intends to add later (possibly many elements in a short period). If that estimate is greater than or equal to the resize threshold, resizing is easily triggered, so Java 8 derives the initial capacity from the resize threshold instead.

1. ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel)

This constructor lets loadFactor and concurrencyLevel be passed in from outside. It exists mainly for compatibility with older versions: Java 8 has no way to explicitly set the load factor (it is fixed at 3/4) and no concept of a concurrency level. The loadFactor passed in here is only used to compute the initial capacity; it does not change the fixed load factor (3/4) used for later resizing.

initialCapacity is treated as the number of elements the user intends to add in the short term. How do we find an appropriate initial capacity that avoids unnecessary resizing?

Assume the computed initial capacity is size; the resize threshold is then size * loadFactor. As long as that threshold is greater than the estimated element count initialCapacity, no resize will be triggered. As an inequality:

size * loadFactor > initialCapacity, i.e. size > initialCapacity / loadFactor

Treating the left side as just 1 larger than the right, adding 1 to the right side makes the two sides equal: size = (initialCapacity / loadFactor) + 1. That size is a reasonable initial capacity; it then still needs to be rounded up to the nearest power of two by tableSizeFor.

public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    // concurrencyLevel is unused; kept only for compatibility with Java 7
    if (initialCapacity < concurrencyLevel)   // Use at least as many bins
        initialCapacity = concurrencyLevel;   // as estimated threads
    // initialCapacity and loadFactor are both passed in from the consumer, so initialCapacity can be optimized
    long size = (long) (1.0 + (long)initialCapacity / loadFactor);
    // Round up to the smallest power of 2 >= size
    int cap = (size >= (long)MAXIMUM_CAPACITY) ?
        MAXIMUM_CAPACITY : tableSizeFor((int)size);
    // sizeCtl stores initial capacity
    this.sizeCtl = cap;
}
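A standalone sketch of this sizing arithmetic (the class is mine; tableSizeFor is re-implemented below with the same algorithm) shows how a requested capacity of 16 is padded so that inserting 16 elements never triggers a resize:

```java
public class InitialCapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= c (same algorithm as ConcurrentHashMap.tableSizeFor)
    static int tableSizeFor(int c) {
        int n = c - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // The sizing step of the three-arg constructor: size = 1 + initialCapacity/loadFactor
    static int capacityFor(int initialCapacity, float loadFactor) {
        long size = (long) (1.0 + (long) initialCapacity / loadFactor);
        return (size >= (long) MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : tableSizeFor((int) size);
    }

    public static void main(String[] args) {
        // 16 expected elements at factor 0.75: size = 1 + 16/0.75 = 22 -> capacity 32
        int cap = capacityFor(16, 0.75f);
        assert cap == 32;
        // Resize threshold 32 - 32/4 = 24 >= 16, so adding 16 elements never resizes
        assert cap - (cap >>> 2) >= 16;
        System.out.println("capacity for 16 elements: " + cap);
    }
}
```

Had the constructor used tableSizeFor(16) = 16 directly, the threshold would be 12 and the 13th insertion would already resize.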

2. ConcurrentHashMap(int initialCapacity)

This constructor takes only initialCapacity and is probably the most commonly used, along with the no-argument constructor whose default capacity is 16.

The optimization is similar to the previous constructor. With the load factor fixed at 3/4 and the initial capacity called size, the equation becomes:

size = initialCapacity * 4/3 + 1, that is, size = (initialCapacity + initialCapacity * 1/3) + 1.

But that is not what the code does. Instead it computes size = (initialCapacity + initialCapacity * 1/2) + 1, which effectively targets a threshold of 2/3 rather than 3/4.

Personal guess: it is for computational performance. 1/3 cannot be computed with a bit shift, while 1/2 can (>>> 1), and since 1/2 is slightly larger than 1/3, the computed size does not deviate much from the ideal value or waste much idle capacity.

public ConcurrentHashMap(int initialCapacity) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException();
    // Round 1.5 * initialCapacity + 1 up to a power of 2
    int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
               MAXIMUM_CAPACITY :
               tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
    // sizeCtl stores the initial capacity
    this.sizeCtl = cap;
}
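The difference between the "ideal" 4/3 estimate and the 1/2-shift shortcut can be checked directly; the sketch below (my own helper names) reproduces both calculations:

```java
public class OneArgCapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= c (same algorithm as ConcurrentHashMap.tableSizeFor)
    static int tableSizeFor(int c) {
        int n = c - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // What the one-arg constructor computes: 1.5x + 1, using only a shift
    static int jdkCap(int initialCapacity) {
        return tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1);
    }

    // The "ideal" estimate the three-arg constructor would compute at 0.75
    static int idealCap(int initialCapacity) {
        return tableSizeFor((int) (1.0 + initialCapacity / 0.75));
    }

    public static void main(String[] args) {
        assert jdkCap(16) == 32 && idealCap(16) == 32;  // usually identical
        assert jdkCap(22) == 64 && idealCap(22) == 32;  // but 1.5x can overshoot
        System.out.println("ok");
    }
}
```

For initialCapacity = 22, the shortcut computes tableSizeFor(34) = 64, one power of two above the ideal 32: the price paid for avoiding a division.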

3. tableSizeFor

tableSizeFor simply returns the smallest power of two greater than or equal to the given number. It works by propagating the leftmost 1 bit of the binary representation so that every bit to its right becomes 1, then adding 1 to obtain a power of two.

/**
 * Returns the smallest power of two greater than or equal to c.
 * @param c
 * @return
 */
private static final int tableSizeFor(int c) {
    // Why subtract 1? See below.
    int n = c - 1;
    // Shift right by 1 and OR with the old value: the top 2 bits become 1
    n |= n >>> 1;
    // Shift right by 2 and OR with the old value: the top 4 bits become 1
    n |= n >>> 2;
    // Shift right by 4 and OR with the old value: the top 8 bits become 1
    n |= n >>> 4;
    // Shift right by 8 and OR with the old value: the top 16 bits become 1
    n |= n >>> 8;
    // Shift right by 16 and OR with the old value: every bit below the leading 1 becomes 1
    n |= n >>> 16;
    // n + 1 is the smallest power of two >= c
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

The calculation process raises one question: why subtract 1 from the c that is passed in?

Suppose c = 16 (binary 10000); then c - 1 = 15 (binary 1111). After all the shifts and ORs it is still 1111, and adding 1 gives 16.

Without subtracting 1, 10000 would become 11111, and adding 1 would give 32, which is wrong: passing in a number that is already a power of two would yield exactly 2c. Subtracting 1 from every c exists precisely to handle this case.
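The effect of the c - 1 step is easy to verify; the sketch below (my own, with the negative/maximum guards omitted for brevity) compares the algorithm with and without the decrement:

```java
public class TableSizeForDemo {
    // With the c - 1 step (as in the JDK): smallest power of two >= c
    static int withDecrement(int c) {
        int n = c - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return n + 1;
    }

    // Without it: an exact power of two would be doubled
    static int withoutDecrement(int c) {
        int n = c;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return n + 1;
    }

    public static void main(String[] args) {
        assert withDecrement(16) == 16;     // 16 stays 16
        assert withDecrement(17) == 32;     // non-powers round up
        assert withoutDecrement(16) == 32;  // without -1, 16 would double to 32
        System.out.println("ok");
    }
}
```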

IV. Node definitions you must know

Before reading source logic such as put and get, you must also understand the following node definitions:

  1. Node, the plain node that makes up a linked list.
  2. TreeBin + TreeNode, which make up a red-black tree.
  3. ForwardingNode, the forwarding node.

1. Plain node: Node

Similar to the Java 7 HashEntry, it is the basic element of a linked list, with hash >= 0. In the code that follows, hash >= 0 identifies a plain node, i.e. a linked list.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    volatile V val;
    volatile Node<K,V> next;
    ...

    /** * Virtualized support for map.get(); overridden in subclasses. * Traverses the list to find the node with the same hash and key. */
    Node<K,V> find(int h, Object k) {
        Node<K,V> e = this;
        if (k != null) {
            do {
                K ek;
                if (e.hash == h &&
                    ((ek = e.key) == k || (ek != null && k.equals(ek))))
                    return e;
            } while ((e = e.next) != null);
        }
        return null;
    }
}

2. TreeBin + TreeNode

Two types of node make up a red-black tree. The TreeBin is the root: an empty node that stores no key/value. The TreeNodes are the nodes that actually store key/value pairs.

(1) TreeBin: the root node

TreeBin stores no elements itself; it manages the red-black tree: adding and removing nodes, finding nodes, and so on. How the red-black tree is constructed and how its invariants are maintained are not the focus of this article and will be studied separately later.

When a new node is added, it is first linked into a list by head insertion and only then is the red-black tree shape maintained. The main reason for keeping the linked-list structure is that while the red-black tree is running its balancing algorithm, readers can still traverse the nodes as a linked list.

TreeBin maintains a simple spin-based read/write lock internally to protect the structural invariants of the red-black tree as nodes are added and removed; the balancing process requires the lock. New nodes are linked in by head insertion, so the change does not disturb an in-progress traversal, and the next pointer is volatile, so all threads immediately see the latest value.

lockState records the lock status with three flags:

  • WRITER = 1, the lowest binary bit, marks that a writer thread holds the lock (a non-reentrant write lock).
  • WAITER = 2, the second bit, marks that a writer thread is blocked waiting.
  • READER = 4, the third bit and up, counts reader threads holding the lock. Reads are not mutually exclusive: lockState + READER means a reader acquired the lock, lockState - READER means a reader released it. When the read lock is released, if a (writer) thread is blocked waiting, it is woken up.

The hash value of a TreeBin is TREEBIN = -2, and the code uses hash < 0 && f instanceof TreeBin to determine whether a slot holds a red-black tree.
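The bit arithmetic behind these flags can be walked through in isolation; the sketch below (my own demo, reusing the same constant values) mimics the transitions the TreeBin performs:

```java
public class LockStateDemo {
    static final int WRITER = 1; // bit 0: a writer thread holds the lock
    static final int WAITER = 2; // bit 1: a writer thread is parked waiting
    static final int READER = 4; // bits 2+: each reader adds 4

    public static void main(String[] args) {
        int s = 0;
        // Two readers take the read lock: lockState += READER each time
        s += READER;
        s += READER;
        assert s == 8;
        // The write lock is only available when (lockState & ~WAITER) == 0,
        // i.e. lockState is 0 or just WAITER; with readers present it is not
        assert (s & ~WAITER) != 0;
        // A writer arrives, cannot lock, and parks: it sets the WAITER bit
        s |= WAITER;
        // The first reader leaves: one reader plus the parked writer remain
        s -= READER;
        assert s == (READER | WAITER);
        // When the last reader leaves, getAndAdd returns exactly (READER | WAITER):
        // that is the signal to unpark the waiting writer
        System.out.println("ok");
    }
}
```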

static final class TreeBin<K,V> extends Node<K,V> {
    // Root node of the red-black tree
    TreeNode<K,V> root;
    // First node for traversing the tree's nodes as a linked list
    volatile TreeNode<K,V> first;
    // The blocked waiting (writer) thread
    volatile Thread waiter;
    // Lock status
    volatile int lockState;
    // values for lockState
    // The lowest bit marks that a writer thread holds the lock (non-reentrant write lock)
    static final int WRITER = 1; // set while holding write lock
    // The second bit marks that a thread is blocked waiting for the write lock
    static final int WAITER = 2; // set when waiting for write lock
    // The third bit and up count reader threads holding the read lock
    static final int READER = 4; // increment value for setting read lock

    /** * Creates bin with initial set of nodes headed by b. * Its hash is TREEBIN, checked via hash < 0 && f instanceof TreeBin. */
    TreeBin(TreeNode<K,V> b) {
        super(TREEBIN, null, null, null);
        this.first = b;
        ...
    }

    /** * Acquires write lock for tree restructuring. */
    private final void lockRoot() {
        if (!U.compareAndSwapInt(this, LOCKSTATE, 0, WRITER))
            // Failed to acquire the write lock
            contendedLock(); // offload to separate method
    }

    /** * Releases write lock for tree restructuring. */
    private final void unlockRoot() {
        lockState = 0;
    }

    /** * Possibly blocks awaiting root lock. */
    private final void contendedLock() {
        boolean waiting = false;
        for (int s;;) {
            // ~WAITER = 11111111111111111111111111111101 (every bit except the WAITER bit)
            if (((s = lockState) & ~WAITER) == 0) {
                // lockState is 0 or WAITER: the write lock can be acquired
                if (U.compareAndSwapInt(this, LOCKSTATE, s, WRITER)) {
                    if (waiting)
                        waiter = null;
                    return;
                }
            }
            // != 0: s = 1 or 4, 8, 12... a thread holds the write lock or a read
            // lock, so the current thread must prepare to block
            else if ((s & WAITER) == 0) {
                if (U.compareAndSwapInt(this, LOCKSTATE, s, s | WAITER)) {
                    waiting = true;
                    waiter = Thread.currentThread();
                }
            }
            else if (waiting)
                LockSupport.park(this);
        }
    }

    /** * Returns matching node or null if none. Tries to search * using tree comparisons from root, but continues linear * search when lock not available. */
    final Node<K,V> find(int h, Object k) {
        if (k != null) {
            for (Node<K,V> e = first; e != null; ) {
                int s; K ek;
                // The lowest bit marks the write lock, the second bit a waiter
                if (((s = lockState) & (WAITER|WRITER)) != 0) {
                    // A thread holds the write lock or is waiting: the balancing
                    // algorithm may be running, so the red-black tree cannot be
                    // used; fall back to traversing the nodes as a plain list
                    if (e.hash == h &&
                        ((ek = e.key) == k || (ek != null && k.equals(ek))))
                        return e;
                    e = e.next;
                }
                // == 0: no thread holds or waits for the write lock, so the read
                // lock can be taken. Readers are not mutually exclusive: just add READER
                else if (U.compareAndSwapInt(this, LOCKSTATE, s,
                                             s + READER)) {
                    TreeNode<K,V> r, p;
                    try {
                        // Search the red-black tree
                        p = ((r = root) == null ? null :
                             r.findTreeNode(h, k, null));
                    } finally {
                        Thread w;
                        // Release the read lock (-READER). Only the thread that
                        // observes exactly (READER|WAITER), i.e. the last reader
                        // with a writer parked, performs the unpark
                        if (U.getAndAddInt(this, LOCKSTATE, -READER) ==
                            (READER|WAITER) && (w = waiter) != null)
                            LockSupport.unpark(w);
                    }
                    return p;
                }
            }
        }
        return null;
    }

    /** * Finds or adds a node. * @return null if added, otherwise the existing node
     */
    final TreeNode<K,V> putTreeVal(int h, K k, V v) {
        Class<?> kc = null;
        ...
    }

    /** * Removes a node; returns true if the tree is now too small and should degenerate into a linked list. * @return true if now too small, so should be untreeified
     */
    final boolean removeTreeNode(TreeNode<K,V> p) {
        ...
    }

    /* ------------------------------------------------------------ */
    // Red-black tree methods, all adapted from CLR

    /** * Left rotation during the balancing algorithm */
    static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root, TreeNode<K,V> p) {
        ...
    }

    /** * Right rotation during the balancing algorithm */
    static <K,V> TreeNode<K,V> rotateRight(TreeNode<K,V> root, TreeNode<K,V> p) {
        ...
    }

    /** * Insertion balancing: after each insert, restores the red-black tree * invariants; this process requires the write lock.
     * @param root
     * @param x
     * @param <K>
     * @param <V>
     * @return
     */
    static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root, TreeNode<K,V> x) {
        ...
    }

    static <K,V> TreeNode<K,V> balanceDeletion(TreeNode<K,V> root, TreeNode<K,V> x) {
        ...
    }
}

(2) TreeNode

The tree node that actually stores an element.

/** * Nodes for use in TreeBins */
static final class TreeNode<K,V> extends Node<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;

    TreeNode(int hash, K key, V val, Node<K,V> next,
             TreeNode<K,V> parent) {
        super(hash, key, val, next);
        this.parent = parent;
    }

    Node<K,V> find(int h, Object k) {
        return findTreeNode(h, k, null);
    }

    /** * Returns the TreeNode (or null if not found) for the given key * starting at given root. */
    final TreeNode<K,V> findTreeNode(int h, Object k, Class<?> kc) {
        // Walk the red-black tree to find the node
        ...
    }
}

3. ForwardingNode

ForwardingNode is an empty node used as a temporary placeholder. It has two main functions:

  • Placeholder marker: it indicates that the element at this array slot has been migrated, while the array is still in the resizing state.
  • Retrieval forwarding: when a find operation in the old array encounters a ForwardingNode, the search is forwarded to the new array to keep looking.

Its key, value, and next are all null, and its hash value is MOVED = -1. So hash == MOVED identifies a ForwardingNode placeholder: the elements at that slot have already been migrated.

If a get operation encounters a ForwardingNode, it forwards the search; if a put operation encounters one, it helps with the resizing.

static final class ForwardingNode<K,V> extends Node<K,V> {
    final Node<K,V>[] nextTable;
    ForwardingNode(Node<K,V>[] tab) {
        super(MOVED, null, null, null);
        this.nextTable = tab;
    }
    }
    // Forward to nextTable for further retrieval
    Node<K,V> find(int h, Object k) {
        // loop to avoid arbitrarily deep recursion on forwarding nodes
        outer: for (Node<K,V>[] tab = nextTable;;) {
            Node<K,V> e; int n;
            if (k == null || tab == null || (n = tab.length) == 0 ||
                (e = tabAt(tab, (n - 1) & h)) == null)
                // Return null if the slot of the new array map is empty
                return null;
            for (;;) {
                int eh; K ek;
                if ((eh = e.hash) == h &&
                    ((ek = e.key) == k || (ek != null && k.equals(ek))))
                    return e;
                if (eh < 0) {
                    if (e instanceof ForwardingNode) {
                        tab = ((ForwardingNode<K,V>)e).nextTable;
                        // Encountered another forwarding node: restart the outer
                        // loop and search the newer table. (Can this actually
                        // happen while still in the resizing phase? To be verified later.)
                        continue outer;
                    }
                    else
                        // A red-black tree (TreeBin): delegate to its find
                        return e.find(h, k);
                }
                if ((e = e.next) == null)
                    // Return null if not found
                    return null;
            }
        }
    }
}

V. Hash high/low-bit perturbation: spread

In Java 7, the high bits of the hash map a key into the Segment array, and the low bits map it into the HashEntry array, so one hash serves two mappings; this reduces hash conflicts to some extent.

In Java 8 there is only one array, and using only the high or only the low bits would leave part of the hash unused. So the author mixes the high 16 bits of the hash into the low 16 bits, giving the low bits some of the characteristics of the high bits and making conflicts less likely.

For example, suppose two keys have similar low 16 bits but very different high 16 bits. Mapping into the array with just the low 16 bits easily produces a conflict, while spreading the high-bit features into the low 16 bits can greatly reduce it.

So how do you do high and low perturbations?

A spread function is called to compute the hash value of a key as follows.

int h = spread(key.hashCode());
static final int spread(int h) {
    // High and low perturbations
    return (h ^ (h >>> 16)) & HASH_BITS;
}

First, h is shifted right by 16 bits and XORed with the old h. But why not use & (AND) or | (OR) instead? It comes down to probability:

  • & bitwise AND: 1&1 = 1, 1&0 = 0, 0&1 = 0, 0&0 = 0, so the probability of producing 1 is 1/4 and of producing 0 is 3/4;
  • | bitwise OR: 1|1 = 1, 1|0 = 1, 0|1 = 1, 0|0 = 0, so the probability of producing 1 is 3/4 and of producing 0 is 1/4;
  • ^ bitwise XOR: 1^1 = 0, 1^0 = 1, 0^1 = 1, 0^0 = 0, so the probabilities of 1 and 0 are both 1/2.

With XOR, 1 and 0 occur with equal probability, so the perturbed hash stays uniformly distributed.

Finally, the result is ANDed with HASH_BITS. What is that for?

HASH_BITS is the largest positive 32-bit integer, binary 0111 1111 1111 1111 1111 1111 1111 1111; setting the top bit would make the number negative. ANDing the perturbed hash with HASH_BITS clears the sign bit, guaranteeing the final hash is never negative.
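Putting the pieces together, here is a standalone sketch of the perturbation (spread() mirrors the JDK source; the example hash values and the demo class are mine):

```java
public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff;

    // Mirrors ConcurrentHashMap.spread(): fold the high 16 bits into the low 16
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        // Two hashes with identical low 16 bits but different high 16 bits...
        int h1 = 0x0001_1234, h2 = 0x00AB_1234;
        int n = 16; // table length (power of two); slot index = hash & (n - 1)
        // ...collide on the raw hash but land in different slots after spreading
        assert (h1 & (n - 1)) == (h2 & (n - 1));
        assert (spread(h1) & (n - 1)) != (spread(h2) & (n - 1));
        // The HASH_BITS mask clears the sign bit, so the result is never negative
        assert spread(-1) >= 0;
        System.out.println("ok");
    }
}
```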

VI. Summary

Reading source code is not easy, but reading it word for word is inadvisable: you get nowhere and give up halfway. Not every function needs to be understood, and functions you have never used or rarely use do not need to be read. That said, the basic optimization points of ConcurrentHashMap and the critical parts such as resizing are well worth the effort. A few points to remember:

  1. The Java 8 ConcurrentHashMap data structure: array + linked list + red-black tree.
  2. Java 8 deprecated Segment; the concurrency level, load factor, and related definitions are kept only for compatibility with older versions. The load factor is fixed at 0.75 and cannot be modified.
  3. sizeCtl is very important, once more: before the array is initialized, sizeCtl > 0 holds the initial capacity; during initialization, sizeCtl = -1 acts as a lock; after initialization, sizeCtl = n - (n >>> 2) holds the resize threshold; during resizing, sizeCtl < 0 records the joining and leaving of resizing threads.
  4. Constructor optimization: the passed-in initial capacity is pre-estimated against the resizing mechanism ((initialCapacity / loadFactor) + 1) to arrive at a more scientific initial capacity.
  5. Node identification: hash >= 0 is a plain linked-list node; hash == MOVED (-1) is a ForwardingNode, which tells a put to help resize and a get to forward its search; hash < 0 && f instanceof TreeBin is a red-black tree root.
  6. The high and low bits of a key's hash are perturbed so the low bits inherit the characteristics of the high bits, reducing hash conflicts.

PS: If there is any misunderstanding in this article, criticism and corrections are welcome, and I look forward to your comments, likes, and favorites. I am Xu, happy to make progress together with you!

The sky-blue glaze waits for the misty rain, and I am waiting for you. Search WeChat public accounts for "Xu students ah", where I keep publishing solid content; come follow me and study together with me~