Java containers, such as ArrayList, HashMap, and HashSet, are used too often in common projects, but it is important to note that these three containers are not thread-safe in a concurrent environment.

So if we want to use containers in a concurrent environment, the java.util.concurrent package provides us with some thread-safe containers that do not suffer from poor thread-safe performance. For example, ConcurrentHashMap, CopyOnWriteArrayList, and CopyOnWriteArraySet.

1. The HahMap thread is not safe

Solution: (1Use Hashtable instead of HashMap (2) Collections. SynchronizedMap (),3) Use ConcurrentHashMap instead of HashMap (recommended)Copy the code

Data loss problem

First look at the source code:

If (p = TAB [I = (n-1) & hash]) == null), the corresponding position is assumed to be null.

This means that two threads can enter the code block contained in an IF at the same time (thread context switch), just when the two PUTS are in the same position (such as equal keys or hash conflicts, which should normally be resolved by zipping). The result is that data from the next thread overwrites data from the previous thread.

Data loss.

Visibility problem

Visibility: When one thread operates on the container, the operation needs to be visible to other threads, meaning that other threads are aware of the operation.

HashMap can’t do that.

Death chain

In JDK 1.7, the linked list header method was used. In this case, the expansion caused the linked list to loop endlessly.

JDK 1.7:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null! = e) { Entry<K,V> next = e.next;if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            inti = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }}}Copy the code
  • E and next are local variables that point to the current node and the next node
  • The temporary variables E and Next of thread 1 (green) just referenced these two nodes, but before moving the nodes, a thread switch occurred, and thread 2 (blue) completed the expansion and migration

Thread 2 has completed capacity expansion, and the linked list is out of order due to head insertion. However, thread 1’s temporary variables E and Next also reference these two nodes, which need to be migrated again

First cycle

  • The loop then runs before a thread switch, noting that e points to node A and next points to node B
  • E is inserted into node A, notice that there are two copies of node A in the diagram, but there is only one.
  • When the loop ends, e will point to next, which is node B

Second cycle

  • Next points to node A
  • E Insert node B
  • When the loop ends, E points to next, which is node A

Third cycle

  • Next points to NULL
  • E head node A, A next points to B (a.ext was always null), B next points to A, dead link is complete
  • When the loop ends, e points to next, which is null, so the fourth loop exits normally

2. ConcurrentHashMap analysis

What is the structure of ConcurrentHashMap in JDK 1.7 and JDK 1.8? What are the similarities and differences between them?

JDK 1.7

Overall structure drawing:

As can be seen, segments are segmented inside ConcurrentHashMap. Segments inherit ReentrantLock, which can be understood as a lock. Each Segment is locked independently of each other and does not affect each other.

Compared to the need to lock the entire object for every operation of Hashtable, this greatly improves the concurrency efficiency. Because its locks are independent of each other, rather than having only one lock for the entire object.

The underlying data structure of each Segment is similar to that of a HashMap, which is still a zipper structure of arrays and linked lists. There are 16 segments by default, so you can support up to 16 concurrent operations on different segments at the same time. 16 The default value can be set to another value during initialization, but cannot be expanded once the initialization is confirmed.

JDK 1.8

Overall structure drawing:

There are three types of nodes in the diagram:

  • The first is the simplest, with an empty position indicating that there are no elements to fill yet.
  • The second is the sumHashMapA very similar zipper structure, in each slot the first node will be filled first, but if the subsequent calculation of the sameHashThe value is then extended as a linked list.
  • The third structure is the red-black tree structureJDK 1.7ConcurrentHashMapThe structure that is not present in theHashMapYou should know that.

When the list length is greater than 8 and the array length >= 64, the list is converted to a red-black tree.

Red black tree features:

The red-black tree is each node binary search trees with a color attribute, color is red or black, red and black tree of binary search trees is the nature of a balance of BST strategy, we can understand as a balanced binary search trees, look for high efficiency, automatically balance, to prevent the extreme imbalance which affects the search efficiency.

The search time is O(logN), because if the list is too long in some extreme cases, the search time is O(N), which is why red-black trees were introduced, mainly for performance.

Node color rules:

  • Each node is either red or black, but the root node is always black.
  • The red node cannot be continuous, that is, neither the children nor the parents of the red node can be red.
  • The path from any node to each of its leaf nodes contains the same number of black nodes.

JDK 1.8 source code analysis

The Node Node

ConcurrentHashMap Node

static class Node<K.V> implements Map.Entry<K.V> {
    final int hash;
    final K key;
    volatile V val;
    volatile Node<K,V> next;
}
Copy the code

(2) HashMap Node source:

static class Node<K.V> implements Map.Entry<K.V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
Copy the code

Obviously, the volatile keyword was added to the value and next attributes. To ensure visibility.

Put method

The source code is as follows:

// The putVal method is called
public V put(K key, V value) {
    return putVal(key, value, false);
}

// Real add element logic implementation
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    // Calculate the hash value, the source of the method is written below, which is a little different from the HashMap
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        // If the hash bucket is empty, initialize it accordingly
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // Find the hash bucket subscript corresponding to the hash value
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // Add the new value as CAS
            if (casTabAt(tab, i, null.new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        // If the hash value is equal to MOVED, the system is expanding
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        // The corresponding hash bucket index has a value
        else {
            V oldVal = null;
            Synchronized locks the current location to ensure concurrency security. Synchronized was optimized after JDK 1.6
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    // If it is a linked list
                    if (fh >= 0) {
                        binCount = 1;
                        // Iterate over the list
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if(e.hash == hash && ((ek = e.key) == key || (ek ! =null && key.equals(ek)))) {
                                oldVal = e.val;
                                if(! onlyIfAbsent) e.val = value;break;
                            }
                            Node<K,V> pred = e;
                            // Add the value to the end of the list because it did not exist before
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break; }}}// If it is a red-black tree
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        // Call putTreeVal to add data to the red-black tree
                        if((p = ((TreeBin<K,V>)f).putTreeVal(hash, key, value)) ! =null) {
                            oldVal = p.val;
                            if(! onlyIfAbsent) p.val = value; }}}}if(binCount ! =0) {
                // Check if the criteria are met and convert the list to a red-black tree. The default TREEIFY_THRESHOLD is 8
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                // putVal returns the old value before adding, so oldVal is returned
                if(oldVal ! =null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}

static final int spread(int h) {
    return (h ^ (h >>> 16)) & HASH_BITS;
}

static final <K,V> boolean casTabAt(Node<K,V>[] tab, int i,
                                    Node<K,V> c, Node<K,V> v) {
    return U.compareAndSwapObject(tab, ((long)i << ASHIFT) + ABASE, c, v);
}
Copy the code

The get method

public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if((tab = table) ! =null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) ! =null) {
        if ((eh = e.hash) == h) {
            if((ek = e.key) == key || (ek ! =null && key.equals(ek)))
                return e.val;
        }
        else if (eh < 0)
            return(p = e.find(h, key)) ! =null ? p.val : null;
        while((e = e.next) ! =null) {
            if(e.hash == h && ((ek = e.key) == key || (ek ! =null && key.equals(ek))))
                returne.val; }}return null;
}
Copy the code

Contrast JDK 1.7 with JDK 1.8

The data structure

JDK 1.7: Segment locking

JDK 1.8: Array + linked list + red-black tree

Concurrent degree

JDK 1.7: Lock each Segment independently. The maximum number of concurrent segments is 16.

JDK 1.8: The lock granularity is finer, ideally the number of elements in the table array (i.e. the length of the array) is the maximum number of concurrent elements it can support, which is better than before.

Ensure concurrency security

JDK 1.7: Segment locking is used for security, and segments are inherited from ReentrantLock.

JDK 1.8: Use Node + CAS + synchronized to ensure thread safety.

Encounter a Hash collision

JDK 1.7: For Hash collisions, use the zip method, which is the form of a linked list.

JDK 1.8: Use the zipper method first, when the list length is greater than 8, array length >= 64, the list will be converted to a red-black tree, to improve the efficiency of the search.

Query time Complexity

JDK 1.7: Time complexity for traversing a list is O(N), where N is the list length.

JDK 1.8: If it becomes an traversal red-black tree, the time complexity is reduced to O(logN), where n is the number of nodes in the tree.

Differences from Hashtable

Thread-safety is implemented in different ways

Hashtable uses synchronized for thread safety.

ConcurrentHashMap uses CAS + synchronized + volatile to ensure thread safety.

Performance of different

Whereas Hashtable locks the entire object, ConcurrentHashMap locks only a single node, so concurrency performance is better than Hashtable.

Changes are different during iteration

Hashtable (including a HashMap) are not allowed to modify the content during the iteration, otherwise it will throw ConcurrentModificationException, its principle is to detect modCount variables, Hashtable next () method of the source code is as follows:

public T next(a) {
    if(modCount ! = expectedModCount)throw new ConcurrentModificationException();
    return nextElement();
}
Copy the code

ConcurrentHashMap even modify the content during the iteration, ConcurrentModificationException.

CopyOnWriteArrayList analysis

Since JDK 1.5, Java has provided a concurrent container CopyOnWriteArrayList that uses CopyOnWrite as the main concurrent List. The concurrent collection of CopyOnWrite also includes CopyOnWriteArraySet, whose underlying implementation is based on CopyOnWriteArrayList.

The core is read and write separation.

Applicable scenario

  • Reading can be as fast as possible, while writing can be slower

For example, system-level information that needs to be loaded or modified only a few times is frequently accessed by all modules in the system.

  • Read more to write less

Reading and writing rules

  • Rules for reading and writing locks

The idea of read/write locks is: read shared, everything else mutually exclusive (write mutually exclusive, read/write mutually exclusive, write/read mutually exclusive).

  • Update read/write lock rules

The idea of CopyOnWriteArrayList goes one step further than the idea of read-write locks. To maximize read performance, CopyOnWriteArrayList reads without locking at all, and even better, writes don’t block reads, which means you can read at the same time as writing. Only writes and writes need to be synchronized, meaning that multiple writes can’t happen at the same time. However, a read is allowed to occur simultaneously while a write occurs. As a result, the performance of read operations is greatly improved.

The characteristics of

  • The meaning of CopyOnWrite

CopyOnWrite means that when the container needs to be modified, it does not directly modify the current container, but first Copy the current container, Copy a new container, and then modify the new container, and then reference the original container to the new container. This completes the modification process.

CopyOnWriteArrayList takes advantage of the immutability principle, because each change to the container creates a new copy, so the old container is actually immutable and thread-safe without further synchronization. So the CopyOnWrite container can be read concurrently without locking, because the current container does not add any elements or make changes.

The core idea is to separate reading and writing.

  • The contents of the collection can be modified during iteration

ArrayList if modify the contents of the collection during the iteration, will throw ConcurrentModificationException. Determine if modCount has changed.

Once a CopyOnWriteArrayList iterator has been created, adding new elements to a previous CopyOnWriteArrayList object will not be displayed in the iterator, nor will errors be reported.

disadvantages

  • Memory usage problem

Due to CopyOnWrite’s copy-on-write mechanism, the memory of both objects will be stationed in memory during the write operation, which will occupy extra memory space.

  • In the case of large or complex elements, replication is expensive

The replication process not only consumes double memory, but also consumes CPU resources, which reduces overall performance.

  • Data consistency issues

Since changes to the CopyOnWrite container are made to the copy first, the changes are not visible to other threads in real time, but only after the changes are made. If you want to write data that is immediately visible to other threads, the CopyOnWrite container is not suitable.

Source code analysis

The data structure

// Unfair lock
final transient ReentrantLock lock = new ReentrantLock();
// The underlying array is volatile to ensure visibility
private transient volatile Object[] array;

// Get the array
final Object[] getArray() {
    return array;
}

// Set the array
final void setArray(Object[] a) {
    array = a;
}

// Empty constructor
public CopyOnWriteArrayList(a) {
    setArray(new Object[0]);
}
Copy the code

The add method

public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        // Get the original array
        Object[] elements = getArray();
        // Get the length of the original array
        int len = elements.length;
        // Copy a new array
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        // Add the new element to the new array
        newElements[len] = e;
        // Set the new array as a reference to array
        setArray(newElements);
        return true;
    } finally{ lock.unlock(); }}Copy the code

The idea of CopyOnWrite is that the write is done on a copy of the original container and the list is not locked while reading data. Also, you can see that if a new reader thread comes in during a copy operation on the container, the old data will still be read because the object’s reference has not been changed at that point.

The get method

private E get(Object[] a, int index) {
    return (E) a[index];
}

public E get(int index) {
    return get(getArray(), index);
}
Copy the code

As you can see, the GET operation is not locked.

Iterator COWIterator class

private COWIterator(Object[] elements, int initialCursor) {
    // Cursor for iterator
    cursor = initialCursor;
    // Array snapshot
    snapshot = elements;
}
Copy the code

All subsequent iterators perform operations based on the snapshot array, such as:

public E next(a) {
    if (! hasNext())
        throw new NoSuchElementException();
    return (E) snapshot[cursor++];
}
Copy the code