Preface

Key-value structures such as maps are classic constructs in software development and are often used to store data in memory.

This article focuses on ConcurrentHashMap, the JDK's concurrent map container. Before we get started, though, we need to talk about HashMap, without which there would be no ConcurrentHashMap.

HashMap

It is well known that HashMap is based on an array plus linked lists underneath, but the implementation differs slightly between JDK 1.7 and 1.8.

The 1.7 implementation

Data structure diagram in 1.7:

Let’s take a look at the implementation in 1.7.

These are the core member variables of HashMap; what do they mean?

table is the array that actually stores the entries.

size is the number of key-value mappings currently held in the Map (see the abridged fields below).
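For reference, a sketch of those fields as they appear in the JDK 1.7 source (abridged and quoted from memory, so minor details may differ):

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;  // default bucket count: 16
static final int MAXIMUM_CAPACITY = 1 << 30;          // maximum capacity
static final float DEFAULT_LOAD_FACTOR = 0.75f;       // default load factor

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE; // the array that actually stores the data
transient int size;      // number of key-value mappings in the Map
int threshold;           // resize threshold = capacity * load factor
final float loadFactor;  // the load factor in use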

Focus on the load factor:

The capacity of a HashMap is fixed when it is constructed, for example with the default initialization:

public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

Given the default capacity of 16 and the load factor of 0.75: as data is continuously put into the Map, once the number of entries reaches 16 x 0.75 = 12, the current capacity of 16 has to be expanded. Expansion involves rehashing and copying data, which is expensive.

Therefore, it is recommended to estimate the size of the HashMap in advance to minimize performance loss caused by capacity expansion.
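For example, if you expect to store roughly 1000 entries, a minimal sketch of pre-sizing (assuming the default load factor of 0.75) looks like this:

// Pre-size the map so the expected number of entries stays below the
// resize threshold (capacity * loadFactor), avoiding rehashing as it fills.
int expected = 1000;
Map<String, String> map = new HashMap<>((int) (expected / 0.75f) + 1);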

As you can see from the code, what really stores the data is:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

So how is the Entry that this array holds defined?

Entry is an inner class of HashMap, as can easily be seen from its member variables:
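A sketch of that inner class, abridged from the JDK 1.7 source (only the fields are shown; details may differ slightly):

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;        // the key being written
    V value;            // the value
    Entry<K,V> next;    // reference to the next entry, forming the linked list
    int hash;           // cached hashcode of the key
    // constructor and Map.Entry methods omitted
}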

key is the key being written.

value is the corresponding value.

next is what implements the linked-list structure; as mentioned at the start, HashMap is made up of an array plus linked lists.

hash stores the hashcode of the current key.

Knowing the basic structure, let’s take a look at the important write and fetch functions:

Put method

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

Determine whether the current array needs to be initialized.

If the key is null, store the value under the null key (putForNullKey).

Calculate the hashcode based on the key.

Locate the bucket based on the calculated hashcode.

If the bucket already holds a linked list, traverse it and check whether the hashcode and key of each entry equal the incoming key. If they are equal, overwrite the value and return the original value.

If the bucket is empty, no data is stored at the current location yet, so add a new Entry object at the current location.

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

When calling addEntry to write Entry, you need to determine whether to expand the capacity.

If so, double the capacity, then rehash the current key and locate its new position.

createEntry passes the entry currently at that position into the newly created Entry as its next reference, so if the bucket already has a value, a linked list forms at that position (the new entry is inserted at the head).

The get method

Let’s look at the get function:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

First, the hashcode is computed from the key to locate the bucket; the bucket is then traversed, and an entry is returned only when both the key and the key's hashcode match. If nothing matches, null is returned.

The 1.8 implementation

Do you see any points that need to be optimized?

In fact, one obvious place is:

When Hash conflicts are serious, the linked list formed on the bucket becomes longer and longer, resulting in lower query efficiency. Time complexity is O(N).

So 1.8 focuses on optimizing this query efficiency.

1.8 HashMap Structure:

Let’s start with a few core member variables:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

static final int TREEIFY_THRESHOLD = 8;

transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

It’s basically the same as 1.7, with a few important differences:

TREEIFY_THRESHOLD is the threshold used to decide when a bucket's linked list should be converted into a red-black tree, and HashEntry is replaced by Node.

In fact, the core composition of Node is the same as HashEntry in 1.7: it stores key, value, hashcode, next and other data.
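A sketch of Node, abridged from the JDK 1.8 source (only the fields are shown; details may differ slightly):

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;     // cached hashcode of the key
    final K key;
    V value;
    Node<K,V> next;     // next node in the bucket's linked list
    // constructor and Map.Entry methods omitted
}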

Now look at the core method.


Put method

It seems to be more complicated than 1.7, so let's break it down step by step (an abridged sketch of the corresponding putVal source follows the list):

1. Check whether the current bucket array is empty; if it is empty, initialize it (resize() determines whether initialization is needed).

2. Locate the bucket according to the hashcode of the current key and check whether it is empty. If it is empty, there is no Hash conflict, and a new Node is created directly at the current position.

3. If the current bucket has a value (a Hash conflict), compare the key in the bucket, and that key's hashcode, with the key being written. If they are equal, assign the existing node to e; the value assignment and return are performed uniformly in step 8.

4. If the current bucket is a red-black tree, write the data in red-black tree fashion.

5. If it is a linked list, encapsulate the current key and value into a new Node and append it to the end of the bucket's list.

6. Then determine whether the length of the current list is greater than the preset threshold (TREEIFY_THRESHOLD); if so, convert it to a red-black tree.

7. If the same key is found during the traversal, exit the traversal directly.

8. If e != null, an entry with the same key already exists and its value needs to be overwritten.

9. Finally, determine whether capacity expansion is required.
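For reference, here is putVal abridged from the JDK 1.8 source (quoted from memory, so minor details may differ); the numbered comments map to the steps above:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;                    // 1. initialize if empty
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);       // 2. no conflict: new node
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;                                      // 3. same key at the head
        else if (p instanceof TreeNode)                 // 4. red-black tree write
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {      // 5. append to the list
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);          // 6. treeify if too long
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;                              // 7. same key found
                p = e;
            }
        }
        if (e != null) {                                // 8. overwrite existing value
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();                                       // 9. expand if needed
    return null;
}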

The get method

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

The get method looks a lot simpler.

The key is first hashed to locate the bucket.

If the bucket is empty, null is returned.

Otherwise, it checks whether the node in the first position of the bucket (whether a linked list or a red-black tree) matches the queried key; if so, its value is returned directly.

If the first node does not match, it determines whether the rest is a red-black tree or a linked list.

For a red-black tree, the value is returned via a tree lookup.

Otherwise, the linked list is traversed until a match is found.

From these two core methods (get/put) we can see that 1.8 optimizes long linked lists: after conversion to a red-black tree, query efficiency improves to O(logn).

However, HashMap's original problems remain, such as its tendency to fall into an infinite loop when used in concurrent scenarios.

final HashMap<String, String> map = new HashMap<String, String>();
for (int i = 0; i < 1000; i++) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            map.put(UUID.randomUUID().toString(), "");
        }
    }).start();
}

But why? Just a quick analysis.

When the HashMap is expanded, the resize() method is called, and concurrent resizing can easily form a circular linked list in a bucket. Then, when a get is performed for a key that does not exist and whose index lands on that circular list, the traversal never terminates and an infinite loop occurs.

The diagram below:

Traversal

It is also worth noting that HashMap traversal is usually done in one of the following ways:

Iterator<Map.Entry<String, String>> entryIterator = map.entrySet().iterator();
while (entryIterator.hasNext()) {
    Map.Entry<String, String> next = entryIterator.next();
    System.out.println("key=" + next.getKey() + " value=" + next.getValue());
}

Iterator<String> iterator = map.keySet().iterator();
while (iterator.hasNext()) {
    String key = iterator.next();
    System.out.println("key=" + key + " value=" + map.get(key));
}

The first approach, traversing the entrySet, is strongly recommended.

The first method fetches the key and value at the same time, while the second has to look the value up by key a second time, which is less efficient.
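Since Java 8, Map also offers forEach, which iterates the entries directly and is just as efficient as the entrySet approach:

// Java 8+: traverse key and value together without an extra lookup
map.forEach((key, value) -> System.out.println("key=" + key + " value=" + value));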

A quick summary of HashMap: whether in 1.7 or 1.8, you can see that the JDK does nothing to synchronize it, which causes problems under concurrency and can even produce an infinite loop that makes the system unusable.

Hence the JDK’s specialized ConcurrentHashMap, under the java.util.concurrent package, dedicated to concurrency issues.

Those of you who have persisted this far have already laid the groundwork for ConcurrentHashMap.

ConcurrentHashMap

ConcurrentHashMap is also available in versions 1.7 and 1.8, with slightly different implementations.

The 1.7 implementation

Let’s take a look at the 1.7 implementation. Here’s how it looks:

As shown in the figure, it consists of a Segment array and HashEntry linked lists.

Its core member variables:

/**
 * Segment array; data is stored in a specific Segment.
 */
final Segment<K,V>[] segments;

transient Set<K> keySet;
transient Set<Map.Entry<K,V>> entrySet;

The Segment is an internal class of ConcurrentHashMap.

static final class Segment<K,V> extends ReentrantLock implements Serializable {

    private static final long serialVersionUID = 2249069246763182397L;

    // Just like the Entry array in HashMap, this is the bucket array that actually holds the data
    transient volatile HashEntry<K,V>[] table;

    transient int count;
    transient int modCount;
    transient int threshold;
    final float loadFactor;
}

Take a look at the composition of HashEntry:
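A sketch of HashEntry, abridged from the JDK 1.7 ConcurrentHashMap source (only the fields are shown; details may differ slightly):

static final class HashEntry<K,V> {
    final int hash;
    final K key;
    volatile V value;                 // volatile guarantees visibility of the value
    volatile HashEntry<K,V> next;     // volatile link to the next entry in the bucket
    // constructor and helper methods such as setNext() omitted
}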

It is much like the Entry in HashMap; the only difference is that the core data such as value, as well as the linked-list pointer next, are volatile, which guarantees visibility when reading.

In principle, ConcurrentHashMap uses segment locking, where Segment inherits from ReentrantLock. ConcurrentHashMap supports concurrency up to the concurrency level (the number of Segments): whenever a thread holds the lock on one Segment, the other Segments are not affected.

Let's also look at the core put and get methods.

Put method

public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key);
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
         (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
        s = ensureSegment(j);
    return s.put(key, hash, value, false);
}

First, locate the Segment by the key, and then perform the actual put within that Segment.

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            else {
                if (node != null)
                    node.setNext(first);
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
                else
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}

Although the value in HashEntry is volatile, that does not guarantee atomicity under concurrency, so the put operation still needs to lock.

The first step is to try to acquire the lock. If the acquisition fails, another thread is clearly competing, so scanAndLockForPut() spins to acquire the lock. If the number of retries reaches MAX_SCAN_RETRIES, it switches to a blocking lock acquisition to guarantee success.

Combine this with the diagram to see the flow of put.

Use the hashcode of the key to locate a HashEntry in the current Segment's table.

Traverse that HashEntry: if it is not empty, determine whether the passed-in key equals the key currently being traversed; if so, overwrite the old value.

If it is empty, create a new HashEntry and add it to the Segment, first determining whether the Segment needs to be expanded.

Finally, release the lock on the current Segment that was acquired at the beginning.

The get method

public V get(Object key) {
    Segment<K,V> s; // manually integrate access methods to reduce overhead
    HashEntry<K,V>[] tab;
    int h = hash(key);
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
             e != null; e = e.next) {
            K k;
            if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                return e.value;
        }
    }
    return null;
}

Get logic is simple:

The key is hashed once to locate the Segment, and hashed again to locate the specific element within that Segment's table.

Because the value attribute in HashEntry is marked with the volatile keyword, memory visibility is guaranteed, so the latest value is read each time.

The get method of ConcurrentHashMap is very efficient because the whole process requires no locking.

The 1.8 implementation

1.7 has solved the concurrency problem and can support concurrency across N Segments, but it still shares a problem with the 1.7 HashMap:

Query traversal is inefficient.

So 1.8 has made some changes to the data structure.

First, let’s take a look at the underlying structure:

Does this look similar to the 1.8 HashMap structure?

The original Segment-based segmented locking is abandoned; CAS + synchronized is now used to ensure concurrency safety.

The HashEntry that held the data in 1.7 is also changed to Node, although it serves the same purpose.

val and next are both volatile to ensure visibility.

Put method

Let's focus on the put function; its flow is roughly as follows (an abridged sketch of the structure follows the list):

Compute the hashcode from the key and determine whether the table needs to be initialized.

f is the Node located by the current key. If it is empty, data can be written at the current position; a CAS write is attempted, spinning on failure until it succeeds.

If the hashcode of the current position == MOVED == -1, the table is being expanded, so the current thread helps with the resize.

If none of the above applies, the data is written under a synchronized lock on the head node of the bucket.

If the length of the linked list then exceeds TREEIFY_THRESHOLD, the list is converted to a red-black tree.
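A conceptual sketch of that CAS + synchronized pattern (abridged and simplified, not the full JDK source):

// Inside the put loop: f is the head node of the bucket located by the hash
Node<K,V> f = tabAt(tab, i = (n - 1) & hash);
if (f == null) {
    // Empty bucket: try to install the new node with CAS; spin and retry on failure
    if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
        break;
} else if (f.hash == MOVED) {
    // The table is being resized; help transfer buckets before retrying
    tab = helpTransfer(tab, f);
} else {
    synchronized (f) {
        // Write into the linked list or red-black tree under the head-node lock,
        // then treeify if the list length exceeds TREEIFY_THRESHOLD
    }
}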

The get method

The hashcode is computed and used to locate the bucket; if the value sits right at the head of the bucket, it is returned directly.

If it is a red-black tree, the value is fetched via a tree lookup.

Otherwise, the linked list is traversed to fetch the value. The abridged source is shown below.
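For reference, the 1.8 get as it roughly appears in the JDK source (abridged and quoted from memory, so minor details may differ):

public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        if ((eh = e.hash) == h) {                      // value is at the head of the bucket
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        else if (eh < 0)                               // tree node or forwarding node
            return (p = e.find(h, key)) != null ? p.val : null;
        while ((e = e.next) != null) {                 // otherwise traverse the linked list
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}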

1.8 makes major changes to the 1.7 data structure, adopting red-black trees to guarantee query efficiency (O(logn)), and even drops ReentrantLock in favor of synchronized. This shows how well synchronized has been optimized in newer versions of the JDK.

Conclusion

After looking at the different implementations of HashMap and ConcurrentHashMap in 1.7 and 1.8, you should have a better understanding of them.

In fact, this is also the focus of the interview, the usual routine is:

Talk about your understanding of HashMap, and walk through the get/put process.

What optimizations have been made in 1.8?

Is it thread safe?

What problems does that lack of thread safety cause?

How to solve it? Is there a thread-safe concurrency container?

How is ConcurrentHashMap implemented? What are the differences between the 1.7 and 1.8 implementations? Why were those changes made?

I believe that after reading this carefully you will be able to answer this list of questions for the interviewer.

Beyond interview questions, these ideas show up in many everyday applications; for example, the Cache implementation in Guava is based on the ideas behind ConcurrentHashMap.

You can also learn about optimization ideas and concurrency solutions from JDK authors.

In fact, the idea for this article came from an issue on GitHub, and I hope you will take part in maintaining that project together.