Preface

If you have studied data structures, you probably already know what a HashMap is. At its core it is simple: an array plus linked lists. What is the difference between a Hashtable and a HashMap? One is thread-safe and the other is not, and Hashtable does not allow null keys or values. How does HashMap optimize performance? How would you answer? If you know red-black trees, you can answer it; if you don't, the interview goes cold, because you won't even get the chance to talk about ConcurrentHashMap!

Some of the figures below are taken from material on the JDK 1.7 HashMap circular linked list problem.

HashMap source code guide

In fact, the underlying ideas are basically the same across these classes, so let's analyze just HashMap here. First, a few of its most common uses.

HashMap hashMap = new HashMap();
hashMap.put(key, value);
hashMap.get(key);

The whole workflow of HashMap will be analyzed mainly from these entry points.

HashMap()

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        // Guard: the capacity cannot exceed MAXIMUM_CAPACITY (1 << 30)
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

In fact, looking at the constructors, we never see the so-called array being initialized; they only initialize the load factor, the 0.75f we always talk about. Why 0.75f? It can only be called an empirical value: 0.5f wastes space, while 1.0f easily pushes things to the extreme (lots of collisions before any resize). Of course it was not picked arbitrarily; the designers certainly did plenty of testing, but in the end it is an experience-based compromise.
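
As a quick sanity check of what the load factor controls, here is a tiny illustrative fragment (my own arithmetic, not HashMap source): with the default capacity of 16 and a load factor of 0.75f, the resize threshold works out to 12 entries.

int capacity = 16;                              // DEFAULT_INITIAL_CAPACITY
float loadFactor = 0.75f;                       // DEFAULT_LOAD_FACTOR
int threshold = (int) (capacity * loadFactor);  // 12: adding the 13th entry triggers a resize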

Going back to the earlier question: we learned that a HashMap is an array plus linked lists, so the first thing to wonder about is why that array initialization is missing. Keep that question in mind and read on.

If you look at the constructor where you supply the initial capacity yourself, you will find that it eventually calls the tableSizeFor() method below.

static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

It essentially takes the requested capacity and rounds it up to a power of two, which has the advantage of making index computation cheap (a bit mask instead of a modulo). But the same observation holds: there is still no array initialization here, only a capacity value. The question remains.
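
To see the bit-smearing at work, here is the same trick written out on its own (my own sketch; the bounds checks of the real method are omitted), together with a trace for cap = 10.

// cap = 10: n = 9 = 0b1001
// n |= n >>> 1  -> 0b1101
// n |= n >>> 2  -> 0b1111   (the remaining shifts change nothing)
// n + 1         -> 16
static int nextPowerOfTwo(int cap) {
    int n = cap - 1;    // the -1 keeps exact powers of two unchanged (16 stays 16 instead of becoming 32)
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return n + 1;       // 10 -> 16, 16 -> 16, 17 -> 32
}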

put(key, value)

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true); // 1
    }
// The method putVal() called directly from comment 1
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // On the first put, tab is obviously null (the constructors never create the table), so resize() must be called here
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length; // 2: the table is not initialized yet, so resize() must run
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize(); // 2
        afterNodeInsertion(evict);
        return null;
    }
// The resize() method called from comment 2
// It is invoked in several situations:
// 1. The table has not been initialized yet
// 2. The number of entries exceeds the threshold (capacity * load factor)
// 3. A bin is long enough to treeify, but the table is still smaller than MIN_TREEIFY_CAPACITY, so it is grown instead
final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        //...
        // Set the initial capacity to 16
        //...
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        // Here the table array is actually created
        // The table is an array of Node
        // Node is a linked-list node: each Node holds a reference to the next one
        @SuppressWarnings({"rawtypes"."unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        //...
    }

So now we know where the array actually gets created, which answers the first question. But that raises the second question: why delay it? Here is my answer: what if you construct the map but never use it? Eagerly allocating the table would cost at least 16 slots of memory up front, so creating it lazily is a mechanism that protects memory.

The third question: why convert the list into a tree (which has a nice name, the red-black tree)? Converting between structures always comes down to the same trade-offs: lookup efficiency versus space. A linked list has O(n) lookup, while a red-black tree has O(log n) lookup. On the other hand, a tree node must store both left and right child references while a singly linked list node stores only one next reference, so the tree costs more memory. Building a red-black tree could fill an entire blog post by itself, so I won't cover it here.
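
For reference, these are the constants that govern when the conversion happens (the values are the JDK 8 constants; the comments are my own summary):

static final int TREEIFY_THRESHOLD = 8;     // a bin is treeified once it grows past this many nodes
static final int UNTREEIFY_THRESHOLD = 6;   // on resize, a tree bin this small is turned back into a list
static final int MIN_TREEIFY_CAPACITY = 64; // below this table length, a long bin triggers a resize instead of treeification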

get(key)

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

To find our corresponding node using the hash value, we need to first look at how the hash is computed.

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

The answer is straightforward: take hashCode() and XOR it with itself shifted right by 16 bits, which folds the high 16 bits of hashCode() into its low 16 bits. Once we have this hash value, we can go and find our node.
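
A quick worked example of that computation (the hashCode value is made up):

int h = 0x12345678;            // pretend this is key.hashCode()
int hash = h ^ (h >>> 16);     // 0x12345678 ^ 0x00001234 = 0x1234444C
int index = (16 - 1) & hash;   // with a table of length 16, the bucket index is 0xC = 12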

final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                // Find the data in the tree
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // Find the data in the linked list
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

Then the node we need is found and handed back to us. Wow, the whole thing really is that simple. (Simple to analyze, anyway; writing it is another matter.)

Why does hash() XOR the high and low 16 bits?

Have you ever thought about this? The situation it guards against probably comes up in your code more often than you expect.

Premise: key1 and key2 are different keys, and their hashCodes differ in the high bits but share the same low bits.

Think about it: in a real project a single HashMap is rarely very large; in my own project the biggest one holds maybe 32 entries. A hashCode spans 32 bits, but a table of length 32 only uses its lowest 5 bits as the index, far fewer than the full hash, so keys that differ only in their high bits will collide far more often than you would expect.

So the JDK engineers came up with their own fix: XOR the high 16 bits into the low 16 bits. It even has a nice name: the perturbation function.

The operation essentially lets the other bits of the hash join in the computation. After it, key1's hash might change from 0x123A0000 to 0x123A123A and key2's from 0xBCD00000 to 0xBCD0BCD0, so masking with 0x0000000F now gives results very different from before, where both keys would have landed in the same bucket.
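
Here is that scenario as a small runnable sketch (the class name and hash codes are invented for the example):

public class PerturbDemo {
    public static void main(String[] args) {
        int h1 = 0xABCD0001;    // pretend key1.hashCode(): high bits differ,
        int h2 = 0x12340001;    // pretend key2.hashCode(): low 16 bits identical
        int mask = 16 - 1;      // table of length 16

        System.out.println(mask & h1);                  // 1  -> same bucket
        System.out.println(mask & h2);                  // 1  -> collision without perturbation

        System.out.println(mask & (h1 ^ (h1 >>> 16)));  // 12 -> the high bits now influence the index
        System.out.println(mask & (h2 ^ (h2 >>> 16)));  // 5  -> no collision
    }
}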

Let's look at the collision situation after the perturbation function is applied.

This is an experiment from the column "An Introduction to Optimising a Hashing Strategy": the author took 352 strings at random, with no two of their hash values conflicting, masked them, and used the results as array indexes.

With a HashMap of length 512 (so the mask keeps only the lowest 9 bits), applying the perturbation reduces the number of conflicting indexes from 103 to 92, an improvement of nearly 10%.

Of course, you might ask: why not perturb more than once?

Perturbing multiple times might indeed give a slightly better distribution, but it costs extra work on every single hash. Computing is all about efficiency, and a single perturbation is the accepted compromise: it already pays off well enough.

What’s the difference between HashMap and Hashtable

Now that we know how the whole HashMap is put together, the next class to understand is Hashtable. Let's take a look at its constructors.

// The no-argument constructor uses a capacity of 11 and a load factor of 0.75
public Hashtable() {
        this(11, 0.75f);
    }
// Note that the table itself is created right here, in the constructor that all the others delegate to
// This is the first difference from HashMap
public Hashtable(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal Load: "+loadFactor);

        if (initialCapacity==0)
            initialCapacity = 1;
        this.loadFactor = loadFactor;
        table = new Entry<?,?>[initialCapacity];
        threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
    }

That was the first difference, but there is another one. Have you spotted it? It is the capacity. HashMap always rounds its capacity to a power of two, while Hashtable makes no such choice and starts at 11; the default load factors, however, are exactly the same 0.75.
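
That capacity choice shows up directly in how each class computes a bucket index. A side-by-side fragment (h stands for key.hashCode() and table for the backing array; both are assumed to exist):

// HashMap: capacity is a power of two, so a bit mask is enough
int hmIndex = (table.length - 1) & (h ^ (h >>> 16));

// Hashtable: capacity starts at 11 and is not a power of two, so it clears the sign bit and takes a modulo
int htIndex = (h & 0x7FFFFFFF) % table.length;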

Now look at the put(key, value) method of Hashtable.

public synchronized V put(K key, V value) {
        // Hashtable rejects null values here (and a null key would fail below at key.hashCode()); HashMap allows both
        if (value == null) {
            throw new NullPointerException();
        }

        // Makes sure the key is not already in the hashtable.
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        @SuppressWarnings("unchecked")
        Entry<K,V> entry = (Entry<K,V>)tab[index];
        for (; entry != null; entry = entry.next) {
            if ((entry.hash == hash) && entry.key.equals(key)) {
                V old = entry.value;
                entry.value = value;
                return old;
            }
        }

        addEntry(hash, key, value, index);
        return null;
    }

We often say that Hashtable is a thread-safe class, and here we can see why: it simply marks its methods synchronized. But thinking it over raises a question: did you spot a resize() call anywhere? Right, Hashtable has no resize() method at all.

So let's keep going: put() ends by calling addEntry(), so let's see whether the expansion mechanism lives in there.

private void addEntry(int hash, K key, V value, int index) {
        modCount++;
        Entry<?,?> tab[] = table;
        if (count >= threshold) {
            // Rehash the table if the threshold is exceeded
            rehash();

            tab = table;
            hash = key.hashCode();
            index = (hash & 0x7FFFFFFF) % tab.length;
        }

        // Creates the new entry.
        @SuppressWarnings("unchecked")
        Entry<K,V> e = (Entry<K,V>) tab[index];
        tab[index] = new Entry<>(hash, key, value, e);
        count++;
    }

In the addEntry() method we can see that the growth function is instead called rehash(). The idea is the same as HashMap's resize (grow the array and redistribute the entries), although Hashtable grows to roughly twice the old capacity plus one rather than to the next power of two. And in terms of efficiency, since Hashtable is only an array plus linked lists with no tree fallback, its lookups are worse than HashMap's even before the synchronization overhead is counted.

How ConcurrentHashMap optimizes thread-safe performance

ConcurrentHashMap itself is quite different before and after JDK 1.8.

You can find plenty of material online about the pre-1.8 mechanism: segment locking, essentially a combination of multiple small Hashtables. From 1.8 onward, individual slots (bins) are locked instead. A more detailed analysis follows below.

Since we call it a performance optimization, we should be able to point at the specific optimizations.

(1) Like HashMap, it uses array + linked list + red-black tree, which gives better lookup performance than Hashtable's plain array + linked list (the tree only kicks in once a single bin grows past 8 nodes).

(2) Segment locking (before 1.8) / bin locking (1.8 onward): instead of locking the entire table, only one or a few linked lists or red-black trees are locked at a time, so operations on different hashes can run simultaneously.

Since I’m using JDK1.8 locally, let’s look at what JDK1.8 does.

final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0) {
                // ... lazily initialize the table
            }
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                // The CAS mechanism is introduced
                if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED) {
                // ... the table is being resized; help move entries
            }
            else {
                V oldVal = null;
                synchronized (f) {
                    //...
                }
                //...
            }
        }
        addCount(1L, binCount);
        return null;
    }

The key line is synchronized (f): f is the head node of the bin the hash maps to, so the lock covers only that one linked list or red-black tree. On top of that, if the target bin is empty, the new node is installed with a CAS rather than a lock. This CAS-plus-synchronized combination is the mechanism JDK 1.8 introduced.
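
As a minimal usage sketch of what this buys us, here are several threads writing disjoint keys into one ConcurrentHashMap with no external locking (the class name and thread/key counts are my own choices):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int offset = t * 10_000;          // each thread writes its own key range
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    map.put(offset + i, i);         // different bins can be updated in parallel
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(map.size());             // 40000: no updates are lost
    }
}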

Now let's take a look at JDK 1.7, using a diagram to get a feel for it.

In 1.7, locks are assigned per Segment, each covering a group of bins, so locating an entry means hashing twice (once to pick the Segment, once within it) and lookups take a little longer. It still performs better than Hashtable, though: locking one Segment does not block another, so different Segments can be updated at the same time.

Compared with 1.8, however, the 1.7 design falls short. First, 1.8 introduces the red-black tree; second, maintaining Segments is a rather troublesome process. 1.8 narrows the locking unit to a single Node (bin), which shrinks the scope of every adjustment and brings two benefits: it is easier to manage, and more operations can run simultaneously.

Circular linked lists in a HashMap

This is a bonus topic, again split between 1.7 and 1.8. The cause is, of course, nothing other than high concurrency; it cannot happen with single-threaded use.

Let's talk about the 1.7 circular linked list problem and why it happens, starting from the 1.7 transfer code below.

void transfer(Entry[] newTable)
{
    Entry[] src = table;
    int newCapacity = newTable.length;
    // Essentially this is just a copy job: move every bucket into the new table
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

The whole transfer uses head insertion: every node is re-inserted at the head of its new bucket.
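
To make the head-insertion point concrete, here is a toy version of that loop on a stand-alone singly linked list (my own sketch, not JDK code; the re-hashing into different buckets is left out). Note how it reverses the order of a bucket, which is exactly what opens the door to a cycle under concurrency.

class ToyNode {
    int key;
    ToyNode next;
    ToyNode(int key, ToyNode next) { this.key = key; this.next = next; }
}

public class HeadInsertDemo {
    // Mirrors the do/while above, minus the re-hashing to different buckets.
    static ToyNode transfer(ToyNode oldHead) {
        ToyNode newHead = null;
        ToyNode e = oldHead;
        while (e != null) {
            ToyNode next = e.next;  // remember the rest of the old bucket
            e.next = newHead;       // insert e at the head of the new bucket
            newHead = e;
            e = next;
        }
        return newHead;
    }

    public static void main(String[] args) {
        ToyNode bucket = new ToyNode(3, new ToyNode(7, null)); // old bucket: 3 -> 7
        ToyNode moved = transfer(bucket);                      // new bucket: 7 -> 3
        for (ToyNode n = moved; n != null; n = n.next) {
            System.out.println(n.key);                         // prints 7, then 3
        }
    }
}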

Single thread case

High concurrency

Step one:

Thread 1 has read e = 3 and next = 7 and is then suspended, while thread 2 runs to completion and finishes the whole transfer.

Step 2:

Now thread 1 resumes and performs its insertion, but the table has already been rewritten: the references it holds are stale. It still works with the old nodes, and node 7, which used to come after 3, has already become the head of the new bucket and points back at 3.

Step 3:

Thread 1 then re-links the nodes, producing 3 -> 7 -> 3 -> 7 ... : the bucket now contains a cycle, and any subsequent traversal of it spins forever.

JDK 1.8 improved on this (resize now uses tail insertion instead of head insertion), but circular reference problems can reportedly still occur.

Plenty of blogs say that 1.8 put an end to this problem, so why bring it up? Because it was later found that the problem can still appear, only no longer from head insertion: it shows up during the conversion between the tree and linked-list structures. I won't go into the details here.

To be fair, the lesson is easy to state: nobody should be using a plain HashMap in a highly concurrent scenario in the first place, because the danger is well known.
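
If a map really does have to be shared across threads, the usual alternatives look like this (a fragment; pick one):

Map<String, Integer> preferred = new ConcurrentHashMap<>();                    // bin-level locking + CAS
Map<String, Integer> fallback = Collections.synchronizedMap(new HashMap<>()); // coarse locking, similar in spirit to Hashtable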

Conclusion

In concurrent scenarios, ConcurrentHashMap > Hashtable in performance. It is for these reasons that we now use ConcurrentHashMap instead of Hashtable.

The red-black tree is also one of the performance optimization strategies here, although building one is genuinely fiddly and requires a certain grounding in the underlying logic.