The original

Java collections are our most frequently used tool and a hot topic for interviews, but our understanding of them is limited to usage, and in most cases we haven’t considered their usage specifications. This series of articles will follow the source code thinking, analyzing every detail of the implementation, in order to avoid various irregularities in use. Here, we are amazed by the great design of our developers, grateful for the hard work of our predecessors, and more importantly, aware of how to make fewer mistakes and write great code.

Many people have a violent understanding of collection classes, using ArrayList when you need to save objects, HashMap when you need to save key-value pairs, HashSet when you need to be non-repeatable, and so on. And the way of use is relatively simple:

List<String> list = new ArrayList<>();

Map<String, String> map = new HashMap<>();

Set<String> set = new HashSet<>();

// ...
Copy the code

Here we won’t consider multithreading security problems, this problem usually have special class implements, or can be Collections. SynchronizedXXX methods to solve. Besides, can we really use collections so simply?

If the data is only a few hundred or a few thousand, it doesn’t make much difference how you implement it. But when we need to process large orders of magnitude of data, the efficiency of different approaches can vary by a hundred times or more, and performance becomes particularly important. For example, for the 1 million pieces of data stored in ArrayList and LinkedList, fetching the element at position I can be done instantaneously in ArrayList and may take seconds in LinkedList. At this point, which collection class to use and how to use it reasonably are the skills we must master.

Why read this series

If you use collections like this, or if you don’t know how to optimize collections, you should read this series. If you only have a few points that are not clear, you can also find the answer here. Or if you just don’t want to read the boring source code, but are curious about the principles, you can read this series. If you just want to handle the interview, I think you’ll feel like it’s not that important when you stick to reading these articles.

This series of articles is based on an in-depth understanding of the principles and implementation of Java collections. By the end of these articles you will have gained the following knowledge:

  • Extensive knowledge of data structures.

  • ArrayList has so many constructors, does it make a difference to use different constructors?

  • How does ArrayList scale up?

  • How does LinkedList provide the ability to retrieve data by location, and is it really very inefficient to query?

  • Can you implement queues with arrays?

  • What are the factors that affect the performance of a HashMap?

  • How is a complex red-black tree implemented?

  • What is the underlying principle of LRUCache?

Overview of Basic Knowledge

The operation of data is to add, delete, change, search, and obtain data according to location at some time, sometimes it may also need to sort. Change and search can be understood as consistent operations, because to modify a piece of data, you need to find it and then replace it. Next, we will briefly analyze several widely used data structures from the three points of add, delete and check.

An array of


An array occupies a contiguous chunk of memory in which all data is sequentially arranged. Its size is fixed, which makes arrays unfriendly for insertion operations, as we’ll see when we analyze arrayLists. But arrays are extremely location-friendly, and they support the so-called RandomAccess feature, which allows location-based operations to be completed quickly in O(1) time. The order of array data is the same as the order of insertion, so the query operation needs to traverse, and its time complexity is O(n).

So the biggest advantage of arrays is location-based access, which is weak in extensibility.

The list


Unlike arrays, linked lists represent the location of data through pointer fields, so the complexity of inserting data into the head or tail of a linked list is only O(1). Linked lists don’t have RandomAccess, so they don’t provide location-based access. Its query operation must also be traversed from beginning to end, with a complexity of O(n).

So the biggest advantage of linked lists is insertion, while the performance of queries is mediocre.

Is there a structure that combines the advantages of arrays and linked lists so that both queries and inserts perform well? The answer is yes, this is a hash table.

Hash table


A HashTable is a HashTable that stores data in a key-value format. HashMap and HashTable are based on it.

Arrays and linked lists perform poorly in queries because they do not remember the location of data, so they can only be compared with the data to be queried and stored. Hash tables use a clever way to reduce or even avoid this sort of alignment by converting any key to an int through a function that can be quickly located only once per lookup. Is this process like looking up a dictionary?

Hash tables are not perfect because there is no single function that guarantees that all key conversions will be different, which is called hash collisions, and it must depend on other data structures, as we’ll learn more about in future articles.

A well-designed hash table can make the time complexity of add, delete, and search operations O(1).

Binary sort tree


Binary sort trees are another solution to the query problem. If the data is ordered at insertion time, dichotomy can be used at query time. The principle of dichotomy is very simple, such as guessing a number between 0 and 100, the first guess 50 can directly exclude half of the data, each time according to this rule can quickly get the correct answer. The time complexity of dichotomy is O(lg n).

The structure of a tree has natural support for dichotomy (but this is not the most important use of trees). Binary sort trees sacrifice some insertion time, but improve query speed, and ordered data can be done with other operations. We should consider this structure if the operational importance of the query exceeds that of the insert. Binary sorting trees also have some problems of efficiency loss caused by imbalance, so with AVL tree, red black tree, B tree for database index, B+ tree and other concepts, knowledge about binary sorting trees will also be introduced in the following articles.

The analysis process

Above introduced the data structure of knowledge is the foundation of our understanding of Java collection classes, master the core principles, we analysis the collection class source code won’t difficulty, we will start with these data structures are briefly introduced, and the other does not involve the concept of this series of articles has nothing to do, you can consult relevant professional books to carry on the system study.

Because the source code of the collection class is very large, from the interface abstract design to the concrete implementation involves dozens of classes, we can not analyze every line of code, some points analyzed in the previous part will also be skipped in the follow-up part, but for the points we should pay attention to will be detailed interpretation. Some of the code is too complex and is visually illustrated with diagrams to help you understand the mechanics.

There will inevitably be a lot of source code pasted into the article, but all sections will be annotated in detailed Chinese. In addition, the pasted code is not cut (or deleted if it’s not necessary), so it’s easy to understand and you don’t have to look for a line of code in the source code.

Learning the realization of the source code is only one of our purposes, we should master the author’s excellent programming ideas, understand the original intention of doing so, stand in a higher Angle of thinking.

This series of article source code all based on JDK1.8, different versions of the implementation of the code may be slightly different, but the core idea is the same, I hope you do not be specific implementation with the way.

The Java Collection class is divided into two main parts: Collection and Map. Collection is mainly composed of List, Queue and Set modules. This series of articles will build on that structure as well, taking a look at some of the data structures used and then demythologize the collection step by step from interface abstraction to implementation.

Because the structure of a Set is exactly the same as that of a Map, and the interior of a Set is based on the corresponding Map, so you only need to know what a Set is, its specific implementation if interested in reading the source code.

This series of articles does not consider the issue of multithreaded security; the issue of multithreading is very complex and will be explored in the future.

This series of more than 20 articles will take some patience to finish, but I’m sure I’ll have a better understanding of data structures and collections and a better idea of what to pay attention to when using them.

In addition, due to my limited ability, if there is any unclear expression or wrong explanation in the article, I hope you can give criticism and correction.

The directory structure

This series of articles will be structured as follows:

  • The data structure

  • Iterable overview

  • Description of the Collection

  • List series analysis

  • Queue series analysis

  • Map overview and series analysis

  • Introduction of the Set

Here are the links to the full article:

Java collection source code analysis foundation (a) : arrays and linked lists

Java set source code analysis of the foundation (two) : hash table

Java collection source code analysis foundation (three) : tree and binary tree

Java set source code analysis of the basis (four) : binary sort tree

Java set source code analysis foundation (five) : balanced binary Tree (AVL Tree)

Java collection source code analysis foundation (six) : red black Tree (RB Tree)

Iterable overview of Java collection source code analysis

Java Collection source code analysis super interface: Collection

Java collection source code analysis of the List (a) : superinterface List

Java collection source code analysis of List (two) : ArrayList

Java collection source code analysis of Queue (a) : the super interface Queue

Java collection source code analysis Queue (2) : interface Deque

Java Collection source analysis Queue (iii) : ArrayDeque

Java collection source code analysis LinkedList

Java collection source code analysis Map (a) : super interface Map

Java collection source analysis of Map (two) : SortedMap interface

Java collection source code analysis Map (three) : interface NavigableMap

Java collection source code analysis Map (4) : TreeMap

Java collection source code analysis Map (five) : HashMap

Java collection source code analysis Map (six) : LinkedHashMap

Set overview of Java collection source code analysis

This series of articles has been updated. Thank you for your attention

This is the end of this article, if you like my article, you can follow my wechat official account: Big Paper Plane

Or scan the qr code below to add directly:

You can also pay attention to my github:https://github.com/LtLei/articles

The path of programming is long and difficult. However, the road ahead is long, I see no end, I will search high and low.