1. Map collection container

In the source code of the collection framework, the design of HashMap is the most classic part of the Map family, touching on data structures, programming ideas, hash calculation, and more. It is worth borrowing some of these source-code ideas in daily development.

  • Basics: adding and removing elements, container information;
  • Advanced: storage structure, capacity expansion, hashing;

The API system

Among all the Map and Set APIs, the most important thing is how HashMap is implemented:

  • HashMap: manages elements based on a hash table;
  • LinkedHashMap: based on HashMap plus a doubly linked list;
  • HashSet: maintains a HashMap structure underneath;
  • LinkedHashSet: extends HashSet, backed by a doubly linked list;

So across the Map and Set families, apart from a few special APIs, the underlying principle is that they rely on HashMap, with each implementation applying different element-management characteristics on top of it.
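
To make that reuse concrete, here is a simplified sketch in the spirit of the JDK's HashSet (the field names mirror the JDK, but this is a paraphrase, not the real class): the set stores its elements as the keys of an internal HashMap.

import java.util.HashMap;

// Simplified sketch: HashSet-style reuse of HashMap
class SimpleHashSet<E> {
    // each set element becomes a key; PRESENT is a shared dummy value
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    // returns true if the element was not already present
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }
}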

2. Data structure

Before you look at HashMap, understand one data structure: the array + linked list structure.

The array manages the positions of the elements, while the elements themselves are stored as linked lists. Since a linked list is involved, it may be singly or doubly linked, which has to be analyzed against the specific API (a brief sketch follows the list below). From this structure, several key points follow:

  • Capacity expansion: being array-based, the structure has to deal with resizing;
  • Linked list: the mechanism by which the list structure is formed;
  • Hash: hash value calculation and collision handling;
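
To make the single versus double direction point concrete, here is an illustrative sketch (hypothetical classes, not the JDK ones; LinkedHashMap's real entry adds before/after links on top of HashMap's Node):

// Hypothetical node flavors, for illustration only
class SinglyLinkedNode<K, V> {
    K key;
    V value;
    SinglyLinkedNode<K, V> next;               // HashMap buckets chain in one direction
}

class DoublyLinkedNode<K, V> {
    K key;
    V value;
    DoublyLinkedNode<K, V> next;               // still chained within the bucket
    DoublyLinkedNode<K, V> before, after;      // LinkedHashMap also links entries in insertion order
}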

3. HashMap in detail

1. Structure encapsulation

Given the brief description of the array + linked list structure above, here is how the source code encapsulates it:

transient Node<K,V>[] table;

The array variable in HashMap is named table, and its elements are of type Node.

Node:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;        // cached hash of the key
    final K key;           // immutable key
    V value;               // current mapped value
    Node<K,V> next;        // next node in the same bucket (collision chain)
}

It implements the Map.Entry interface and defines the node's structural fields, along with the node's own related methods.

2. Constructors

Once you know the underlying structure of HashMap, you can look at its constructors and see which variables get initialized:

No-argument constructor

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}
  • static final float DEFAULT_LOAD_FACTOR = 0.75f;
  • The no-argument constructor does nothing more than assign this default load factor.

There is one more core parameter to pay attention to:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16

That is, the default initial capacity of the array is 16 and the load factor is 0.75, so the array will be resized once the number of elements reaches 16 * 0.75 = 12.

Parameterized constructor

The parameterized constructor also lets you set two parameters: the initial capacity and the load factor that determines the resize threshold:

public HashMap(int initialCapacity, float loadFactor)

The source code of both constructors shows that creating a new HashMap does not immediately initialize the hash array; it only allows the key variables to be customized.
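
As a small usage example of those constructors (behavior as described above for JDK 8: the internal table is only allocated on the first insertion):

import java.util.HashMap;

public class ConstructorDemo {
    public static void main(String[] args) {
        // custom initial capacity and load factor; the internal table is not allocated yet
        HashMap<String, Integer> map = new HashMap<>(32, 0.75f);

        // the first put() lazily initializes the table
        map.put("key", 1);
        System.out.println(map.get("key"));   // 1
    }
}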

3. Adding elements

Following typical HashMap usage, look at how an element is added:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

The put() method does little work directly; it calls two key methods:

  • hash(): calculates the hash value of the key;
  • putVal(): carries out the element addition process;

Here we must look at one key method, the hash calculation:

static final int hash(Object key) {
    int h;
    // XOR the high 16 bits of hashCode() into the low 16 bits
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Rather than using the return value of the key's hashCode() directly, the method takes the hashCode, shifts it unsigned-right by 16 bits, and XORs the two values, folding the high bits into the low bits and reducing the probability of hash collisions.
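
The reason the high bits matter is that the bucket index only uses the low bits: in the JDK the index is computed as (n - 1) & hash, where n is the table length (a power of two). A small demonstration using the same spreading logic (the key value is arbitrary):

public class HashSpreadDemo {
    // same spreading logic as the hash() method shown above
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                            // default table length
        String key = "example";
        int index = (n - 1) & hash(key);       // only the low 4 bits select the bucket
        System.out.println("bucket index = " + index);
    }
}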

Looking at the putVal() method, this is a pretty neat operation:

Summary of the core steps (a simplified sketch follows the list):

  • On the first call, check whether the table is empty and initialize the underlying array;
  • Add the element to the bucket determined by the hash calculation;
  • After the element is added, check the size against the threshold to decide whether to resize;
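
A simplified, hypothetical sketch of that flow (call it SimpleMap; it ignores tree-ification and is not the JDK implementation):

// Hypothetical SimpleMap, illustrating the put flow only
class SimpleMap<K, V> {
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    Node<K, V>[] table;
    int size;
    int threshold = 12;                               // 16 * 0.75, mirroring the defaults

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    @SuppressWarnings("unchecked")
    public V put(K key, V value) {
        if (table == null) {
            table = (Node<K, V>[]) new Node[16];      // step 1: first call initializes the array
        }
        int hash = hash(key);
        int index = (table.length - 1) & hash;        // step 2: locate the bucket from the hash
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            if (n.hash == hash && (n.key == key || (key != null && key.equals(n.key)))) {
                V old = n.value;                      // existing key: replace the value
                n.value = value;
                return old;
            }
        }
        Node<K, V> node = new Node<>(hash, key, value);
        node.next = table[index];                     // new key: add to the bucket's chain
        table[index] = node;
        if (++size > threshold) {
            // step 3: a real implementation would resize() here (sketched in the capacity expansion section)
        }
        return null;
    }
}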

One more thing to note here:

HashMap also relies on a red-black tree to cope with hash collisions. If there are too many collisions in one bucket, the O(n) query performance of a linked list suffers badly. When the number of colliding elements in a bucket's linked list reaches 8 and the array length reaches 64, the linked list is converted to a red-black tree to address the query-performance problem of hash collisions. For the tree structure itself, see the earlier article on trees.
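
The two numbers quoted above correspond to constants defined in HashMap (JDK 8); the constant for converting a tree back into a list is shown as well:

static final int TREEIFY_THRESHOLD = 8;        // bucket chain length at which treeification may happen
static final int MIN_TREEIFY_CAPACITY = 64;    // the table must be at least this long before treeifying
static final int UNTREEIFY_THRESHOLD = 6;      // a tree shrinks back to a linked list below this size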

4. Automatic capacity expansion

The core mechanism that lets a container keep accepting elements beyond a fixed boundary is capacity expansion. HashMap follows a principle of minimal allocation, and automatic resizing is triggered only when the element count reaches the threshold.

Threshold: threshold = capacity * loadFactor, which is 16 * 0.75 = 12 by default.

Core method: resize();

Summary of the core steps (a sketch follows the list):

  • Determine the boundary parameter for resizing: the threshold;
  • Calculate the new core parameters: capacity and threshold;
  • Create a new empty array based on the new capacity;
  • If the original array is null, the procedure amounts to initialization;
  • If the original array is not null, expand and migrate the data.
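
A naive, hypothetical version of those steps, written as a resize() method that could slot into the SimpleMap sketch from the put section (the fields table, size, threshold and the Node class are assumed from that sketch; the real resize() is more refined and splits each chain into a low and a high half instead of rehashing node by node):

@SuppressWarnings("unchecked")
private void resize() {
    Node<K, V>[] oldTable = table;
    int newCapacity = (oldTable == null) ? 16 : oldTable.length * 2;  // double the capacity
    threshold = (int) (newCapacity * 0.75f);                          // recompute the threshold
    Node<K, V>[] newTable = (Node<K, V>[]) new Node[newCapacity];     // create a new empty array
    table = newTable;
    if (oldTable == null) {
        return;                                        // first call: plain initialization
    }
    for (Node<K, V> head : oldTable) {                 // migrate every chain into the new array
        while (head != null) {
            Node<K, V> next = head.next;
            int index = (newCapacity - 1) & head.hash; // recompute the bucket in the larger table
            head.next = newTable[index];
            newTable[index] = head;
            head = next;
        }
    }
}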

Array resizing is clearly expensive, so in daily development you can estimate the size of the HashMap in advance, make sure the threshold stays above the number of elements to be stored, and avoid repeated resize operations as much as possible.
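
A small example of that advice, assuming roughly 100 entries are expected (the capacity formula below is a common rule of thumb, not something required by the JDK):

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedSize = 100;
        // pick a capacity so that expectedSize stays below capacity * 0.75
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;   // 134 here, rounded up internally
        Map<String, Integer> map = new HashMap<>(initialCapacity);

        for (int i = 0; i < expectedSize; i++) {
            map.put("key-" + i, i);          // no resize is triggered while loading 100 entries
        }
        System.out.println(map.size());      // 100
    }
}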

5. Query elements

The getNode() lookup method first computes the hash value, then traverses the array slot and, depending on what it finds there, the red-black tree or the linked list:
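
In the same spirit, a linked-list-only lookup could be added to the hypothetical SimpleMap sketch above (the real getNode() additionally checks whether the first node is a tree node):

public V get(Object key) {
    if (table == null) {
        return null;                                  // nothing stored yet
    }
    int hash = hash(key);
    int index = (table.length - 1) & hash;            // locate the bucket from the hash
    for (Node<K, V> n = table[index]; n != null; n = n.next) {
        if (n.hash == hash && (n.key == key || (key != null && key.equals(n.key)))) {
            return n.value;                           // first node whose key matches
        }
    }
    return null;                                      // no match in this bucket
}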

6. Delete elements

The removeNode() deletion method first computes the hash value to find the node to delete, then checks whether the index location holds a red-black tree or a linked list and runs the corresponding removal process:
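
And a linked-list-only removal for the same hypothetical SimpleMap sketch (the real removeNode() also dispatches to the red-black tree branch):

public V remove(Object key) {
    if (table == null) {
        return null;
    }
    int hash = hash(key);
    int index = (table.length - 1) & hash;            // find the bucket first
    Node<K, V> prev = null;
    for (Node<K, V> n = table[index]; n != null; prev = n, n = n.next) {
        if (n.hash == hash && (n.key == key || (key != null && key.equals(n.key)))) {
            if (prev == null) {
                table[index] = n.next;                // removing the head of the chain
            } else {
                prev.next = n.next;                   // unlink from the middle of the chain
            }
            size--;
            return n.value;
        }
    }
    return null;                                      // key not present
}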

7. Supplementary Notes

Here's a quick overview of two methods: hashCode() and equals(). Generally speaking, you need to override hashCode() whenever you override equals().

Both methods can be used to compare whether two objects are equal. However, the hash values of two distinct objects may collide; in that case equals() decides whether the objects are actually equal. As for ==, it compares primitive values directly and, for objects, compares references (addresses).

Within a HashMap, when nodes in a bucket's linked list share the same hash value, the equals() method is used to check whether the keys are really the same and to locate the corresponding entry.
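
A typical illustration of that contract, with a hypothetical key class; if only equals() were overridden, two logically equal keys could land in different buckets and behave as different keys:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type, used only to illustrate the equals()/hashCode() contract
final class UserKey {
    private final String id;

    UserKey(String id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserKey)) return false;
        return Objects.equals(id, ((UserKey) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);                      // must be consistent with equals()
    }
}

public class KeyContractDemo {
    public static void main(String[] args) {
        Map<UserKey, String> map = new HashMap<>();
        map.put(new UserKey("42"), "value");
        // succeeds because equal keys also produce equal hash codes
        System.out.println(map.get(new UserKey("42")));   // value
    }
}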

4. Source code address

GitHub address: https://github.com/cicadasmile/java-base-parent
Gitee address: https://gitee.com/cicadasmile/java-base-parent
