LinkedHashMap Java 8 source code detailed analysis

From Tian Xiaobo’s blog

1. An overview of the

LinkedHashMap inherits from HashMap, which solves the problem that HashMap cannot always maintain the same traversal order and insert order by maintaining a bidirectional linked list. In addition, LinkedHashMap provides support for access order. This feature is useful in some scenarios, such as caching. In terms of implementation, many methods of LinkedHashMap are directly inherited from HashMap, and only some methods are overridden for maintaining bidirectional linked lists. Therefore, to understand the source of LinkedHashMap, you need to understand the source of HashMap. The source code analysis of HashMap is not covered in this article. You can refer to my previous article “HashMap Source Code Analysis (JDK1.8)”. In that article, I illustrated more than a dozen images to help you learn the HashMap source code.

This article is structured differently from my previous two source analysis articles on Java collection classes (collection frameworks). Instead of analyzing the basic operations of collection classes (find, traverse, insert, delete), this article will focus on the maintenance of bidirectional linked lists. It includes the establishment process of linked list, the process of deleting nodes, and the process of maintaining access sequence. All right, let’s move on to analysis.

Principle 2.

The previous chapter said that LinkedHashMap inherits from HashMap, so its underlying structure is still based on a pulled hash structure. The structure consists of an array and a linked list or a red-black tree. The schematic diagram of the structure is roughly as follows:

LinkedHashMap adds a two-way list to the LinkedHashMap structure, which preserves the insertion order of key-value pairs. At the same time, the related logic of access order is realized by the operation of linked list. Its structure may look like this:

In the figure above, the light blue arrow represents the precursor reference and the red arrow represents the successor reference. Whenever a new key-value pair is inserted, the new node ends up following the node pointed to by the tail reference. The tail reference is moved to the new node, and a bidirectional list is established.

The above structure is not too hard to understand, although the introduction of red-black trees makes the structure look slightly more complex. But you can ignore the red-black tree and just focus on the list structure itself. All right, let’s get into the details.

3. Source code analysis

3.1 Inheritance system of Entry

Before we dive into the core, here’s a quick look at the key-value node inheritance system. Let’s start with the inheritance architecture diagram:

At first glance, the inheritance system is a bit complicated and confusing. TreeNode, the inner class of HashMap, does not inherit from its Node inner class, but from Node’s subclass LinkedHashMap inner class Entry. There’s a reason for doing that, but I’m not going to do that. First, a brief explanation of the inheritance system above. The LinkedHashMap inner class Entry inherits from the HashMap inner class Node with two new references, before and After. The purpose of these two references is not hard to understand, which is to maintain bidirectional linked lists. At the same time, TreeNode inherits Entry from the LinkedHashMap inner class and has the ability to form a linked list with other entries. But there’s one thing you need to think about here. When we use HashMap, TreeNode does not need to be able to form linked lists. If you inherited the LinkedHashMap inner class Entry, TreeNode would have two more references that are not needed. Wouldn’t that be a waste of space? To recap the problem (limited and not guaranteed), this is a waste of space, but it’s worth it compared to TreeNode’s ability to form linked lists through inheritance. In the comment for the design idea of HashMap, there is a paragraph like this:

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used.

The TreeNode object is about twice the size of a normal Node object, and we only use it when the bucket contains enough nodes. When the number of nodes in the bucket decreases (depending on deletion or expansion), the TreeNode is converted to Node. When user-implemented hashCode methods are well distributed, tree-type buckets are rarely used.

From the above notes, we can see. In general, as long as hashCode isn’t badly implemented, Node lists are rarely turned into red-black trees of Treenodes. That means TreeNode isn’t used very much, and wasting that space is acceptable. If TreeNode is inherited from the Node class, Node needs to inherit Entry, the inner class of LinkedHashMap, in order for it to have the ability to form linked lists. At this point, the gain is not worth the loss, wasting a lot of space to acquire capabilities that are not necessarily needed.

At this point, you should be able to understand the inheritance system of node types. Here alone out to say, for the following analysis to pave the way. The narrative is a little long, forgive me.

3.1 Establishment process of linked list

The process of creating a LinkedHashMap starts when key-value pairs are inserted. Initially, the LinkedHashMap is created by having both the head and tail references point to the new node. As new nodes are inserted, the list is updated by appending new nodes to the tail reference pointing to the node.

The collection class of Map type inserts key-value pairs through the PUT (K,V) method. LinkedHashMap itself does not override the parent class’s PUT method, but uses the parent class’s implementation directly. In HashMap, however, the PUT method inserts nodes of type Node of the HashMap inner class, which do not have the ability to form a linked list with Entry and its subtypes. So how does LinkedHashMap build the list? Before expanding the description, let’s look at the code for the LinkedHashMap insert operation:

Public V put(K key, V value) {return putVal(hash(key), key, value, false, true); } // Implement final V putVal(int hash, K key, V value, Boolean onlyIfAbsent, Boolean evict) {Node<K,V>[] TAB; Node<K,V> p; int n, i; if ((tab = table) == null || (n = tab.length) == 0) {... If ((p = TAB [I = (n-1) & hash]) == null) {... } else { Node<K,V> e; K k; if (p.hash == hash && ((k = p.key) == key || (key ! = null && key.equals(k)))) e = p; else if (p instanceof TreeNode) {... } else {for (int binCount = 0; ; ++binCount) {// If ((e = p.ext) == null) {p.ext = newNode(hash, key, value, null); if (binCount >= TREEIFY_THRESHOLD - 1) {... } break; } / / insert node already exists in the singly linked lists the if (e.h ash = = hash && ((k = e.k ey) = = key | | (key! = null && key.equals(k)))) break; p = e; } } if (e ! = null) { // existing mapping for key V oldValue = e.value; if (! onlyIfAbsent || oldValue == null) {... } afterNodeAccess(e); // Return oldValue; } } ++modCount; if (++size > threshold) {... } afterNodeInsertion(evict); // return null; } // implement Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {return newNode <>(hash, key, value, next); } // LinkedHashMap overwrite Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) { LinkedHashMap.Entry<K,V> p = new LinkedHashMap.Entry<K,V>(hash, key, value, e); // attach the Entry to the end of the bidirectional list linkNodeLast(p); return p; } private void linkNodeLast(linkedhashmap. Entry<K,V> p) {linkedhashmap. Entry<K,V> last = tail; tail = p; If (last == null) head = p; Else {// attach the new node p to the end of the list. last.after = p; }}Copy the code

Above is the source code associated with the LinkedHashMap insertion, leaving out some of the non-critical code. From the code above, I can see how the LinkedHashMap insert operation is called. As follows:

I’ve highlighted the newNode method in red, which is the key step. LinkedHashMap overrides this method. In this method, the LinkedHashMap creates an Entry and attaches the Entry to the end of a bidirectional list using the linkNodeLast method. Once the bidirectional list is set up, we can walk through the LinkedHashMap in insert order, and you can write your own test code to verify the insert order.

That is the correlation analysis of the insertion order maintained by LinkedHashMap. At the end of this section, a few additional things are added. If you look closely at the code above, you’ll notice that there are two methods that start with after that are not mentioned above. In the JDK 1.8 HashMap source, there are three related methods:

// Callbacks to allow LinkedHashMap post-actions
void afterNodeAccess(Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(Node<K,V> p) { }
Copy the code

As you can see from the comments on these three methods, the purpose of these methods is to give LinkedHashMap a chance to do some post-action via callbacks after adding, deleting, and so on. The specific implementation of the above three methods is in the LinkedHashMap. This section will not analyze the implementation, and relevant analysis will be carried out in the subsequent chapters.

3.2 Deletion process of linked list nodes

As with the insert operation, the code associated with the LinkedHashMap delete operation is implemented directly in the parent class. When a node is deleted, the deletion logic of the parent class does not fix the bidirectional linked list maintained by LinkedHashMap, which is not its job. So after the deletion and node, how to remove the deleted node from the double-linked list? Of course, there are options. The last section ended with the three callback methods in HashMap that run LinkedHashMap in response to some operation. So, after the node was deleted, the callback method vide-deremoval was called; LinkedHashMap overrides this method and completes the removal of the deleted node in this method. The relevant source code is as follows:

Public V remove(Object key) {Node<K,V> e; return (e = removeNode(hash(key), key, null, false, true)) == null ? null : e.value; } // Final Node<K,V> removeNode(int hash, Object key, Object value, Boolean matchValue, boolean movable) { Node<K,V>[] tab; Node<K,V> p; int n, index; if ((tab = table) ! = null && (n = tab.length) > 0 && (p = tab[index = (n - 1) & hash]) ! = null) { Node<K,V> node = null, e; K k; V v; if (p.hash == hash && ((k = p.key) == key || (key ! = null && key.equals(k)))) node = p; else if ((e = p.next) ! = null) { if (p instanceof TreeNode) {... } else {/ / traverse singly linked lists, look for to delete nodes, and assign a value to the node variables do {the if (e.h ash = = hash && ((k = e.k ey) = = key | | (key! = null && key.equals(k)))) { node = e; break; } p = e; } while ((e = e.next) ! = null); } } if (node ! = null && (! matchValue || (v = node.value) == value || (value ! = null && value.equals(v)))) { if (node instanceof TreeNode) {... } else if (node == p) TAB [index] = node.next; else p.next = node.next; ++modCount; --size; afterNodeRemoval(node); // Call the delete callback method for subsequent operations return node; } } return null; } // unlink linkedhashmap.entry <K,V> p = void void visting deremoval (Node<K,V> e) {// unlink linkedhashmap.entry <K,V> p = (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after; P.before = p.after = null; If (b == null) head = a; else b.after = a; If (a == null) tail = b; // if (a == null) tail = b; else a.before = b; }Copy the code

The deletion process is not complicated, and all this code does is do three things:

Locate the bucket based on the hash
Iterate through a linked list or call a red-black tree-related delete method
Remove the node to be deleted from the double-linked list maintained by LinkedHashMap

As an example, let’s say we want to delete the key 3 node in the figure below.

Locate the node to bucket 3 based on the hash, and iterate over the single-linked list saved by bucket 3. Once you find the node to delete, remove it from the singly linked list. As follows:

Then remove the node from the bidirectional list:

The deletion and related repair process is not complicated and should be easy to understand with the above pictures, so I won’t go into details here.

3.3 Maintenance procedure of access sequence

In this section, we will talk about the access order. By default, LinkedHashMap maintains the linked list in insertion order. However, we can initialize the LinkedHashMap by specifying the accessOrder parameter true to maintain the list in accessOrder. Access order is not complicated in principle. When we call methods like GET /getOrDefault/replace, we simply move the nodes accessed by these methods to the end of the list. The corresponding source code is as follows:

Public V get(Object key) {Node<K,V> e; if ((e = getNode(hash(key), key)) == null) return null; If (accessOrder) afterNodeAccess(e); if (accessOrder) afterNodeAccess(e); return e.value; Void afterNodeAccess(Node<K,V> e) {// move Node to last linkedhashmap.entry <K,V> last; if (accessOrder && (last = tail) ! = e) { LinkedHashMap.Entry<K,V> p = (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after; p.after = null; If (b == null) head = a; // If (b == null) head = a; else b.after = a; if (a ! = null) a.before = b; /* * last = b; /* * last = b; /* * last = b; if (last == null) head = p; Else {// place p at the end of the list. last.after = p; } tail = p; ++modCount; }}Copy the code

The above is access sequence implementation code, not complex. Here are some examples to help you understand. Suppose we access the node whose key value is 3 as shown in the following figure. The pre-access structure is:

Once accessed, the node with key value 3 is moved to the end of the bidirectional list, and its precursors and successors are updated. The structure after the visit is as follows:

3.4 Cache based on LinkedHashMap

How LinkedHashMap maintains insert and access order should give you some idea of how LinkedHashMap works. In this section, we’ll write some code to put things into practice, implementing a simple LRU policy cache by inheriting LinkedHashMap. Before we write the code, let’s cover the basics.

In section 3.1, I deliberately neglected to analyze some of the source code. In this section, we will fill in the omitted parts and look at the source code first:

void afterNodeInsertion(boolean evict) { // possibly remove eldest LinkedHashMap.Entry<K,V> first; If (evict && (first = head)! = null && removeEldestEntry(first)) { K key = first.key; removeNode(hash(key), key, null, false, true); }} protected Boolean removeEldestEntry(map. Entry<K,V> younger) {return false; }Copy the code

The core logic of the source code above is not normally executed, so it was not analyzed previously. The above code does a relatively simple job of deciding whether to remove the least recently accessed node using some criteria. From here, you should know what these two methods are for. When we implement caching based on LinkedHashMap, we can implement LRU caching for custom policies by overriding the removeEldestEntry method. For example, we can determine whether to remove the least recently accessed node according to the number of nodes, or whether to remove the node according to the survival time of the node. The cache implemented in this section is based on a strategy to determine whether the number of nodes exceeds the threshold. When constructing the cache object, the maximum number of nodes is passed. When the number of nodes inserted exceeds the maximum number of nodes, the least recently accessed node is removed. The implementation code is as follows:

public class SimpleCache<K, V> extends LinkedHashMap<K, V> { private static final int MAX_NODE_NUM = 100; private int limit; public SimpleCache() { this(MAX_NODE_NUM); } public SimpleCache(int limit) {super(limit, 0.75f, true); this.limit = limit; } public V save(K key, V val) { return put(key, val); } public V getOne(K key) { return get(key); } public boolean exists(K key) { return containsKey(key); } /** * whether the number of nodes exceeds the limit * @param eldest * @return Return false */ @override protected Boolean removeEldestEntry(map.entry <K, V> younger) {return size() > limit; }}Copy the code

The test code is as follows:

public class SimpleCacheTest { @Test public void test() throws Exception { SimpleCache<Integer, Integer> cache = new SimpleCache<>(3); for (int i = 0; i < 10; i++) { cache.save(i, i * i); } system.out.println (" After inserting 10 key-value pairs, cache contents: "); System.out.println(cache + "\n"); System.out.println(" Cache contents after accessing node 7: "); cache.getOne(7); System.out.println(cache + "\n"); System.out.println(" Cache contents after insert key = 1: "); cache.save(1, 1); System.out.println(cache); }}Copy the code

The test results are as follows:

In the test code, set the cache size to 3. After 10 key-value pairs were inserted into the cache, only the last three were saved and the others were removed. The node with key value 7 is then moved to the end of the bidirectional list by accessing it. When we insert a key pair again, the node with the key value of 7 is not removed.

This section complements the previous one with a brief introduction to the other uses of LinkedHashMap. The content of this section and the related code is not difficult to understand, so I won’t repeat it here.

4. To summarize

This paper analyzes the source of LinkedHashMap from the point of view of maintaining a bidirectional linked list of LinkedHashMap, and at the end of the article based on LinkedHashMap to achieve a simple Cache. LinkedHashMap is not used as often as HashMap in daily development, but it is an important implementation. In the Java collection framework, HashMap, LinkedHashMap and TreeMap map classes are based on different data structures and realize different functions. The underlying HashMap is based on a pull-down hash structure, and red-black trees were introduced in JDK 1.8 to optimize long lists. Based on this structure, HashMap provides efficient add, delete, change and query operations. LinkedHashMap, on top of this, implements an ordered traversal of hash data structures by maintaining a bidirectional linked list. The bottom layer of TreeMap is based on red-black tree, which uses the properties of red-black tree to implement key-value pair sorting. In my previous articles, I examined HashMap and TreeMap and the red-black trees they both use in detail, for those interested.

At this point, this article is finished, thank you for reading!