1. Background

ConcurrentHashMap is a widely used data structure in concurrent programming.

Advantages:

  • Thread safety
  • Higher efficiency than Hashtable and Collections.synchronizedMap(), achieved with segment locking (lock striping)
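To see the thread-safety advantage concretely, here is a minimal sketch (the class name and thread counts are my own choices, not from the original text) in which several threads insert distinct keys concurrently. With a plain HashMap the final size would be unreliable; with ConcurrentHashMap no update is lost:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    // Each thread inserts distinct keys; because ConcurrentHashMap is
    // thread-safe, no update is lost and the final size is exact.
    public static int concurrentPut(int threads, int perThread) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    map.put(id + ":" + i, i); // keys are unique per thread
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return map.size();
    }
}
```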

2. ConcurrentHashMap data structure

  • Segment

Segment extends ReentrantLock, so it is both a container and a lock: tryLock(), lock(), and unlock() can all be called directly on a segment.
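The "container that is also a lock" idea can be sketched in isolation. MiniSegment below is a hypothetical stand-in of my own, not the JDK class; it only shows how extending ReentrantLock lets an object guard its own data:

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// A simplified Segment-style container: by extending ReentrantLock,
// the object can protect its own state with the inherited lock()/unlock().
public class MiniSegment<V> extends ReentrantLock {
    private Object[] slots = new Object[8]; // toy storage
    private int count;

    public void add(V value) {
        lock();                 // inherited from ReentrantLock
        try {
            if (count == slots.length)
                slots = Arrays.copyOf(slots, count * 2);
            slots[count++] = value;
        } finally {
            unlock();           // always release, even on exception
        }
    }

    public int size() {
        return count;
    }
}
```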

  • HashEntry
    static final class HashEntry<K,V> {
        final int hash;
        final K key;
        volatile V value;
        volatile HashEntry<K,V> next;
    }

A node in a singly linked list; it holds the key, the value, and the key's hash.
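The bucket list's behavior can be illustrated with a simplified, hypothetical node of my own (BucketList.Node stands in for HashEntry; in the real class, value and next are volatile). New entries are prepended, so the head is the most recently inserted node, and lookup walks the chain comparing the cached hash before calling equals():

```java
public class BucketList {
    // Hypothetical HashEntry-style node for illustration only.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;          // volatile in the real HashEntry
        Node<K, V> next;  // volatile in the real HashEntry

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key;
            this.value = value; this.next = next;
        }
    }

    // New nodes are linked in front of the current head.
    public static <K, V> Node<K, V> prepend(Node<K, V> head, K key, V value) {
        return new Node<>(key.hashCode(), key, value, head);
    }

    // Walk the chain: cheap hash check first, equals() only on a hash match.
    public static <K, V> V find(Node<K, V> head, K key) {
        for (Node<K, V> e = head; e != null; e = e.next)
            if (e.key == key || (e.hash == key.hashCode() && key.equals(e.key)))
                return e.value;
        return null;
    }
}
```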

3. Key methods and implementation principles

3.1 Initialization Method

    /**
     * @param initialCapacity  the initial capacity
     * @param loadFactor       the load factor; controls when a segment is resized
     * @param concurrencyLevel the estimated number of concurrently updating threads
     */
    public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        // Validate the parameters
        if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (concurrencyLevel > MAX_SEGMENTS)
            concurrencyLevel = MAX_SEGMENTS;
        // Round the segments[] size up to a power of two;
        // e.g. a concurrencyLevel of 17 gives ssize = 32
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        // Segment offset used when indexing into segments[]
        this.segmentShift = 32 - sshift;
        // segmentMask: e.g. binary 11111, used to locate the segments[] index
        this.segmentMask = ssize - 1;
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // Compute the size of the HashEntry[] array inside each segment
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity)
            ++c;
        int cap = MIN_SEGMENT_TABLE_CAPACITY;
        while (cap < c)
            cap <<= 1;
        // Create segments and segments[0]
        Segment<K,V> s0 =
            new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                             (HashEntry<K,V>[])new HashEntry[cap]);
        Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
        UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
        this.segments = ss;
    }
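The sizing loop above can be exercised on its own. This helper (the name and array return type are mine, not the JDK's) replicates the power-of-two rounding and derives segmentShift and segmentMask:

```java
public class SegmentSizing {
    // Replicates the constructor's loop: finds the smallest power of two
    // >= concurrencyLevel and the derived shift/mask values.
    public static int[] size(int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        int segmentShift = 32 - sshift;  // bits to shift the hash right
        int segmentMask = ssize - 1;     // low-bit mask into segments[]
        return new int[] { ssize, segmentShift, segmentMask };
    }
}
```

For concurrencyLevel = 17 this yields ssize = 32, matching the comment in the constructor.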

3.2 The put() method

    public V put(K key, V value) {
        Segment<K,V> s;
        if (value == null)
            throw new NullPointerException();
        // Compute the key's hash
        int hash = hash(key);
        // Shift the hash right by segmentShift and mask with segmentMask
        // (e.g. 31, binary 11111), so j is an index between 0 and 31
        int j = (hash >>> segmentShift) & segmentMask;
        // Fetch the segment at index j; if absent, create it via the CAS
        // in ensureSegment, so concurrent creators cannot clash
        if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
             (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
            s = ensureSegment(j);
        return s.put(key, hash, value, false);
    }

    private Segment<K,V> ensureSegment(int k) {
        final Segment<K,V>[] ss = this.segments;
        long u = (k << SSHIFT) + SBASE; // raw offset
        Segment<K,V> seg;
        // Volatile read keeps the observed segment up to date
        if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u)) == null) {
            Segment<K,V> proto = ss[0]; // use segment 0 as prototype
            int cap = proto.table.length;
            float lf = proto.loadFactor;
            int threshold = (int)(cap * lf);
            HashEntry<K,V>[] tab = (HashEntry<K,V>[])new HashEntry[cap];
            if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
                == null) { // recheck
                Segment<K,V> s = new Segment<K,V>(lf, threshold, tab);
                while ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
                       == null) {
                    // CAS: only one thread succeeds in installing the segment
                    if (UNSAFE.compareAndSwapObject(ss, u, null, seg = s))
                        break;
                }
            }
        }
        return seg;
    }

    /** The Segment's put() */
    final V put(K key, int hash, V value, boolean onlyIfAbsent) {
        // Try to acquire the lock; on failure, scanAndLockForPut() spins,
        // pre-creating the node while it waits (a bit like a spinlock)
        HashEntry<K,V> node = tryLock() ? null :
            scanAndLockForPut(key, hash, value);
        V oldValue;
        try {
            HashEntry<K,V>[] tab = table;
            // Compute this key's index in the HashEntry[] array
            int index = (tab.length - 1) & hash;
            HashEntry<K,V> first = entryAt(tab, index);
            // Walk the linked list at that bucket
            for (HashEntry<K,V> e = first;;) {
                if (e != null) {
                    K k;
                    if ((k = e.key) == key ||
                        (e.hash == hash && key.equals(k))) {
                        oldValue = e.value;
                        if (!onlyIfAbsent) {
                            e.value = value;
                            ++modCount;
                        }
                        break;
                    }
                    e = e.next;
                }
                else {
                    if (node != null)
                        node.setNext(first);
                    else
                        node = new HashEntry<K,V>(hash, key, value, first);
                    int c = count + 1;
                    if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                        rehash(node); // over the threshold: grow the table
                    else
                        setEntryAt(tab, index, node);
                    ++modCount;
                    count = c;
                    oldValue = null;
                    break;
                }
            }
        } finally {
            unlock();
        }
        return oldValue;
    }

    /**
     * Resize method. After doubling, each node's index either stays the
     * same or becomes (old index + old capacity), so the trailing run of
     * nodes that already map to the same new bucket can be reused as-is;
     * only the nodes before that run need to be cloned.
     */
    private void rehash(HashEntry<K,V> node) {
        HashEntry<K,V>[] oldTable = table;
        int oldCapacity = oldTable.length;
        // The new capacity is double the old one
        int newCapacity = oldCapacity << 1;
        threshold = (int)(newCapacity * loadFactor);
        HashEntry<K,V>[] newTable =
            (HashEntry<K,V>[]) new HashEntry[newCapacity];
        int sizeMask = newCapacity - 1;
        for (int i = 0; i < oldCapacity; i++) {
            HashEntry<K,V> e = oldTable[i];
            if (e != null) {
                HashEntry<K,V> next = e.next;
                int idx = e.hash & sizeMask;
                if (next == null)   //  Single node on list
                    newTable[idx] = e;
                else { // Reuse consecutive sequence at same slot
                    HashEntry<K,V> lastRun = e;
                    int lastIdx = idx;
                    for (HashEntry<K,V> last = next;
                         last != null;
                         last = last.next) {
                        int k = last.hash & sizeMask;
                        if (k != lastIdx) {
                            lastIdx = k;
                            lastRun = last;
                        }
                    }
                    newTable[lastIdx] = lastRun;
                    // Clone remaining nodes
                    for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                        V v = p.value;
                        int h = p.hash;
                        int k = h & sizeMask;
                        HashEntry<K,V> n = newTable[k];
                        newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
                    }
                }
            }
        }
        int nodeIndex = node.hash & sizeMask; // add the new node
        node.setNext(newTable[nodeIndex]);
        newTable[nodeIndex] = node;
        table = newTable;
    }
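The rehash() comment relies on a bit trick: because capacities are powers of two, doubling the table from n to 2n means h & (2n - 1) equals either h & (n - 1) or h & (n - 1) + n. A quick check of that property (my own illustration, not JDK code):

```java
public class RehashIndex {
    // After doubling a power-of-two table from oldCapacity to
    // 2 * oldCapacity, a hash's new bucket index is either its old
    // index or old index + oldCapacity, depending on one extra hash bit.
    public static boolean staysOrShifts(int hash, int oldCapacity) {
        int oldIdx = hash & (oldCapacity - 1);
        int newIdx = hash & (2 * oldCapacity - 1);
        return newIdx == oldIdx || newIdx == oldIdx + oldCapacity;
    }
}
```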

3.3 The get(Object key) method

    // get() needs no lock: the shared variables it touches are volatile,
    // which guarantees memory visibility, so it never reads stale data.
    public V get(Object key) {
        Segment<K,V> s; // manually integrate access methods to reduce overhead
        HashEntry<K,V>[] tab;
        int h = hash(key);
        long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
        if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
            (tab = s.table) != null) {
            for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                     (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
                 e != null; e = e.next) {
                K k;
                if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                    return e.value;
            }
        }
        return null;
    }

3.4 The remove(Object key) method

remove() deletes a key-value pair. It acquires the Segment's exclusive lock before deleting and releases the lock afterwards. It first locates the bucket in the Segment's HashEntry[] array from the key's hash; as the data structure above shows, each bucket is a singly linked list of HashEntry nodes. It then traverses that list, finds the node matching the key, and unlinks it.
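The unlink step on the bucket's singly linked list can be sketched like this. This is a plain-Java illustration using a hypothetical Node of my own; the real remove() does this on HashEntry nodes while holding the Segment's lock:

```java
public class BucketRemove {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Removes the first node with the given key and returns the new head.
    // In the real remove(), this runs under the Segment's exclusive lock.
    public static Node remove(Node head, String key) {
        Node pred = null;
        for (Node e = head; e != null; pred = e, e = e.next) {
            if (e.key.equals(key)) {
                if (pred == null)
                    return e.next;      // removed the head node
                pred.next = e.next;     // unlink a mid-list node
                return head;
            }
        }
        return head; // key not present: list unchanged
    }
}
```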

3.5 The size() method

To count the elements in the whole ConcurrentHashMap, we must sum the element counts of all the Segments. Each Segment's count field is volatile, so in principle we could just add up every count. The problem is that a count read earlier may have changed by the time the later counts are read, so the total can be stale even though each individual read was up to date. The safest approach would be to lock the put, remove, and clean methods of every Segment while computing the size, but that is obviously very inefficient.

In practice it is rare for the accumulated counts to change while they are being summed, so ConcurrentHashMap first tries to compute the size of every Segment twice without locking. Only if the container is modified during the count does it fall back to locking all the Segments and counting again.

So how does ConcurrentHashMap determine that the container changed during the count? Each Segment increments its modCount variable before elements are modified in the put, remove, and clean methods; size() compares the summed modCount before and after a counting pass to see whether the container changed.
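The try-without-lock-then-compare-modCounts idea generalizes to any set of striped counters. Below is a hedged sketch of the pattern over an array of toy counters (all names are mine; the real size() locks every Segment in the fallback path, which this sketch only notes in a comment):

```java
public class OptimisticSize {
    // Toy counter with a count and a modification stamp, mimicking a Segment.
    public static final class Counter {
        public volatile int count;
        public volatile int modCount;
    }

    /**
     * Sum counts optimistically: if the total modCount is unchanged
     * between two passes, the snapshot is consistent and the sum is
     * trusted. The real size() falls back to locking every segment
     * after its unlocked retries fail.
     */
    public static int size(Counter[] counters, int retries) {
        for (int attempt = 0; attempt < retries; attempt++) {
            int sum = 0, mcBefore = 0, mcAfter = 0;
            for (Counter c : counters) mcBefore += c.modCount;
            for (Counter c : counters) sum += c.count;
            for (Counter c : counters) mcAfter += c.modCount;
            if (mcBefore == mcAfter)
                return sum; // no modification observed: trust the sum
        }
        // Fallback: the real implementation locks all segments here.
        int sum = 0;
        for (Counter c : counters) sum += c.count;
        return sum;
    }
}
```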

4. Summary

ConcurrentHashMap is a thread-safe hash table implemented with segment locking. Both put and remove must acquire the Segment's exclusive lock. Reads need no lock because they rely on volatile: the HashEntry[] array reference is read with volatile semantics, and the value and next fields of each HashEntry are volatile, which guarantees visibility and ordering. This combination is what makes ConcurrentHashMap thread safe.