1. An overview of HashMap

HashMap is one of the most commonly used classes in daily Java development. It is a classic piece of Java design: its ingenious design and implementation, and the data structures and algorithms behind it, are well worth studying in depth.

A HashMap is a hash-table-based implementation of the Map interface. It stores key-value mappings and permits null keys and values (at most one null key).

1.1 Notes

① Data is stored according to the key's hashCode. (String, Long, Double, and the other wrapper classes override the hashCode method: String computes it from its characters' code values, Double uses its own algorithm. Keys and values in a HashMap cannot be primitive types: on the one hand, primitives have no hashCode method; on the other, HashMap is a generic structure, and generics require object types, which primitives are not.)

② The storage unit of HashMap is the Node<K,V>, which can be thought of as one node of the structure.

③ Expansion in a HashMap is driven by size (the total number of nodes inside), not by the number of occupied array slots. For example, with an initial capacity of 16, putting the thirteenth node triggers expansion regardless of how the first twelve are distributed across the array.

1.2 Features of HashMap

| Feature | Explanation |
| --- | --- |
| Duplicate data | A duplicate key overwrites the existing value |
| Ordering | Unordered: iterating over a HashMap usually does not return the insertion order |
| Thread safety | Not thread-safe; the internal implementation is unsynchronized. Use ConcurrentHashMap if you need thread safety |
| Null keys/values | Both keys and values may be null, but only one key can be null |

2. Some concepts

2.1 Bit operations

Bitwise operations operate on the binary bits of an integer in memory.

In Java, >> is a (signed) right shift: if the number is positive, the vacated high bits are filled with 0; if it is negative, they are filled with 1.

<< is the opposite, a left shift; regardless of sign, the vacated low bits are filled with zeros.

For example, the binary of 20 is 0001 0100, so 20 >> 2 is 0000 0101, which is 5 (high bits on the left, low bits on the right).

20 << 2 is 0101 0000, which is 80.

Difference between >>> and >> in Java

>>> is an unsigned right shift, also known as a logical right shift. Whether the number is positive or negative, the vacated high bits are filled with 0.

There are many uses of bitwise operations in the hashMap source code. Such as:

```java
// 1 << 4 is used instead of the literal 16: 0000 0001 -> 0001 0000, i.e. 16.
// Shift operations are more efficient on the JVM.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // initial capacity
```

Note: there is no <<< operator (left shift has no unsigned variant).
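A minimal, self-contained sketch tying the three shift operators together (the outputs are easy to verify by hand):

```java
public class ShiftDemo {
    public static void main(String[] args) {
        System.out.println(20 >> 2);   // 5:  0001 0100 -> 0000 0101
        System.out.println(20 << 2);   // 80: 0001 0100 -> 0101 0000
        System.out.println(-20 >> 2);  // -5: arithmetic shift copies the sign bit into the high bits
        System.out.println(-20 >>> 2); // 1073741819: logical shift fills the high bits with 0
    }
}
```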

2.2 Bitwise operators: AND (&), NOT (~), OR (|), XOR (^)

① AND operation (&)

In Java, & is bitwise AND, where the bits are binary digits: a result bit is 1 only if both operand bits are 1, otherwise it is 0. A simple example:

```java
System.out.println(9 & 8); // 1&1=1; 1&0, 0&1, 0&0 are all 0; so 1001 & 1000 -> 1000, which is 8
```
② NOT operation (~)

To compute a bitwise NOT by hand: write the value in two's-complement form, invert every bit, and if the result is negative (high bit 1), read it back by the usual two's-complement rules (invert and add 1 to get its absolute value).

In Java, all integer data is represented in two's-complement form. Unless otherwise stated, the default integer type is int, which is 4 bytes, i.e. 32 bits.

For example, the binary of 5 is 0101.

As a 32-bit int: 00000000 00000000 00000000 00000101

Inverting every bit: 11111111 11111111 11111111 11111010

The high bit is 1, so the result is negative. A negative two's-complement value can be read by subtracting 1 and then inverting, which yields its absolute value:

After subtracting 1: 11111111 11111111 11111111 11111001

Inverted: 00000000 00000000 00000000 00000110, which is 6, so the result is -6.

```java
System.out.println(~5); // outputs -6
```
③ OR operation (|)

A result bit is 1 if at least one operand bit is 1, otherwise it is 0.

```java
System.out.println(5 | 15); // outputs 15: 0101 | 1111 = 1111
```
④ XOR operation (^)

Same bits give 0 (false); different bits give 1 (true).

```java
System.out.println(5 ^ 15); // outputs 10: 0101 ^ 1111 = 1010
```

2.3 hashCode

A hash code is an int that the JDK computes for an object, by default from the object's identity (its address), or from its contents for types like String and the numeric wrappers. The Object class declares hashCode() as a native method; some classes override it, such as String.

Why override it? For consistency: whenever a class overrides equals(), it should also override hashCode().

In short, hashCode() and equals() must be consistent: if equals() returns true for two objects, their hashCode() values must be equal.

Otherwise this might happen.

If a class overrides equals() so that objects with equal attributes compare equal, but does not override hashCode(), then hashCode() still reflects the two objects' memory addresses. That violates the hashCode contract and causes all sorts of problems.

Therefore, when the key of a HashMap is a custom class, you should override both the equals() and hashCode() methods inherited from Object.
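For illustration, a minimal sketch (the Point class here is hypothetical) of a custom key type that keeps equals() and hashCode() consistent:

```java
import java.util.Objects;

final class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // equal points must produce equal hash codes
        return Objects.hash(x, y);
    }
}
```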

2.4 Hash bucket

"Hash bucket" is a fuzzy term. My personal understanding: a bucket is one slot of the internal array together with the singly linked list hanging below it. In a HashMap, the head of that list sits in the array slot; if the list grows too long (past 8 nodes), the bucket may be converted into a red-black tree (provided the array length has reached 64).

2.5 Hash function

In programming, a hash function maps an object to an integer through some algorithm or transformation.

In HashMap, its own hash() function mixes the bits of the key's hashCode, mainly to reduce conflicts.

2.6 Hash table

Also called a hash map, this is a data structure that locates values from keys via the keys' integer hash codes.

Hash conflict

In a HashMap, the hash is computed from the key's hashCode, so equal hashCodes always yield equal hashes. But there are far more possible objects than hash values, and different objects can produce the same hash. That is a hash conflict.

For example

```java
HashMap<String,String> map = new HashMap<String,String>();
map.put("Aa", "haha");
map.put("BB", "heihei");
System.out.println("Aa".hashCode()); // 2112
System.out.println("BB".hashCode()); // 2112
```

"Aa" and "BB" are Strings, and String overrides hashCode (computing it from the characters' code values with a specific algorithm; clever as it is, collisions between different objects are still unavoidable). Since their hashCodes are equal, hash(key) returns the same value for these two different keys, producing a hash conflict:

```java
static final int hash(Object key) { // compute the key's hash
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

Any number below 2^16 becomes 0 when shifted right by 16 places, and x ^ 0 == x, so the hash only differs from the raw hashCode once hashCode reaches 2^16. Choosing 16 is clever because it is exactly half of an int's 32 bits: it folds the high half into the low half.

2.7 Binary search trees and red-black trees

A red-black tree is a self-balancing binary search tree, a data structure also known as a symmetric binary B-tree. Since a red-black tree is essentially a binary search tree, let's first understand binary search trees.

2.7.1 Binary Search tree

A binary search tree, also known as an ordered or sorted binary tree, has three properties:
1. The values of all nodes in the left subtree are less than or equal to the root's value.
2. The values of all nodes in the right subtree are greater than or equal to the root's value.
3. The left and right subtrees are themselves binary search trees.

[Figure: binary search tree]

2.7.2 Red-Black Tree (RBTree)

Because a binary search tree can degenerate into a nearly linear shape that is hard to keep balanced, the red-black tree was introduced. As the name implies, a red-black tree is a binary tree with only red and black nodes. Its five properties:
1. Every node is red or black.
2. The root node is black.
3. Every leaf node is a black empty node (NIL node).
4. Both children of every red node are black (no path from a leaf to the root contains two consecutive red nodes).
5. Every path from any node to each of its leaves contains the same number of black nodes.

Simply put, a red-black tree is a self-balancing binary search tree. Its structure is more complex than an ordinary binary search tree's, but through self-balancing operations (recoloring, left and right rotations) it maintains good performance even in bad cases.

For a very graphic set of comics about red black trees, see here

Address 1 and address 2 for online simulation of red-black tree adding and deleting

The time complexity of red-black tree operations is O(log N). Its height is bounded by roughly [log N, 2 log N] (in theory the height of an RBTree can reach 2 log N, but that is hard to hit in practice).

Also, by design, any imbalance is resolved within three rotations.

Red-black tree vs. AVL tree (the earliest self-balancing binary tree): AVL is more strictly balanced, so its queries are slightly faster, but red-black trees beat AVL trees on insertion and deletion. Since a HashMap also inserts and deletes frequently, the red-black tree was chosen.

Summary: a red-black tree is a self-balancing binary search tree that keeps its balance through recoloring and rotations. For a HashMap, its benefit is that when many elements share the same array index, lookup drops from traversing a list, O(n), to O(log N).

2.8 Complexity

Algorithm complexity includes time complexity and space complexity.

Time complexity: the amount of computation needed to run an algorithm.
Space complexity: the amount of memory needed to run an algorithm.
Both time and space measure computer resources; an algorithm's complexity reflects the resources required to run it.

I’m going to focus on time complexity

(1) Time frequency. T(n) denotes the time an algorithm consumes. It cannot be computed theoretically; it could be measured by running tests, but testing every algorithm on a machine is neither possible nor necessary. We only need to know which algorithm takes more time and which takes less, and the time an algorithm takes is proportional to how many statements it executes. The number of statement executions in an algorithm is called its time frequency (or statement frequency), written T(n), where n is the problem size. Don't worry about what T is for the moment; just treat it as a function.

(2) Time complexity, written O(f(n)). As n changes, the time frequency T(n) changes too, but it is an irregular function, and we want to know the law it follows. This is where time complexity comes in. T(n), the number of times the basic operation repeats, is some function of the problem size n. Suppose there is an auxiliary function f(n) such that, as n approaches infinity, the limit of T(n)/f(n) is a nonzero constant. Then f(n) and T(n) are functions of the same order of magnitude, written T(n) = O(f(n)), and O(f(n)) is called the asymptotic time complexity of the algorithm, or simply its time complexity. f(n) is not prescribed, but one usually takes the simplest function possible: O(2n² + n + 1) = O(3n² + n + 3) = O(7n² + n) = O(n²), dropping the coefficients and keeping only the highest-order term. Different time frequencies can share one time complexity: T(n) = n² + 3n + 4 and T(n) = 4n² + 2n + 1 differ, yet both are O(n²).

To sum up the relationship: time complexity is a wrapper around the time-frequency function. Big-O notation ① drops coefficients (treats them as 1) and ② keeps only the highest-order term. If T(n) is a tree, O(f(n)) cares only about its trunk.

The time complexity of common algorithms is from small to large

[Figure: growth-rate comparison of common complexities]

Concrete steps for working out an algorithm's time complexity:
① Find the basic statement executed the most times, usually the innermost loop body.
② Work out the order of magnitude of that statement's execution count.
③ Put that order of magnitude into big-O notation.

Just a couple of examples

O(1), also called constant order: typically the algorithm has no loop, so the number of executions is a constant and the time complexity is O(1). For example:

```java
int sum = 0, n = 100;     // executed once
sum = (1 + n) * n / 2;    // executed once
System.out.println(sum);  // executed once
// f(n) = 3, so by big-O notation the time complexity is O(1)
```

Why O(log N) does not specify a base

For example, the search complexity of a red-black tree is O(log N).

You might object that writing the complexity as O(log N) without saying what the base is seems imprecise (the base is usually 2).

In fact the notation is sound. Logarithmic complexity comes from the divide-and-conquer idea, and the base is determined directly by how the problem is divided. But as n approaches infinity, logarithms to different bases differ only by a constant factor, so O(log N) denotes logarithmic complexity regardless of base:

$$\lim_{n \rightarrow +\infty} \frac{O(\log_x n)}{O(\log_y n)} = C$$
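The constant C follows from the change-of-base identity for logarithms, a one-line derivation:

$$\log_x n = \frac{\log_y n}{\log_y x} \quad\Rightarrow\quad \frac{\log_x n}{\log_y n} = \frac{1}{\log_y x} = C$$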

Other simple examples

| Description | Order of growth | Typical code | Notes |
| --- | --- | --- | --- |
| Constant order | 1 | a = b + c | Common simple operations |
| Logarithmic order | log N | Binary search in a sorted structure | Halving strategy |
| Linear order | N | for (int i = 0; i < n; i++) { … } | Common single-layer loop |
| Quadratic order | N² | for (int i = 0; i < n; i++) { for (int j = 0; j < n; j++) { … } } | Two-level loops, e.g. bubble sort |
| Exponential order | 2^N | For a knapsack of a given size, find all item combinations that fit. With 3 items A, B, C there are 8 combinations: A, B, C, AB, AC, BC, ABC, plus the empty set (knapsack too small for even one) | Exhaustive search (knapsack problem: www.cnblogs.com/tinaluo/p/5…) |

3. Internal implementation of HashMap (based on JDK 1.8)

When I first read the HashMap source I wasn't sure where to start, so let's begin with its data structure.

Unlike a typical collection class, HashMap's structure is: array + linked list + red-black tree (the tree was added in 1.8 and greatly improves HashMap's worst-case performance). An array (as in ArrayList) indexes quickly; a doubly linked list (as in LinkedList) is slow to search, since it must be traversed, but fast to add and delete. For example, you can delete an element just by re-pointing the references of the elements before and after it.

3.1 Arrays and linked lists

[Figure: arrays vs. linked lists]

3.2 HashMap data structure (Array + linked list + red-black tree)

Before JDK 8, with frequent collisions the lookup cost was O(1) + O(n), which hurts badly when n is large. JDK 8 introduces red-black trees, bringing this down to O(1) + O(log N).

[Figure: HashMap structure, array + linked list + red-black tree]

General train of thought

① Arrays query quickly, linked lists add and delete quickly, and red-black trees query well. HashMap's storage combines their strengths, so an entry may live directly in the array, in a linked list under an array slot, or in a red-black tree.

② We already know that a HashMap stores key-value pairs, of arbitrary types. Its minimal storage unit is the Node, which holds the key, the value, the hash used to locate its array index, and a pointer for the linked list:

```java
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    ...
}
```

③ Where does a new Node go?

```java
HashMap<String, String> map = new HashMap<String, String>();
map.put("1", "first");
```

put() first hashes the key, then computes the array index from the hash; if the number of elements afterwards exceeds the load factor times the current capacity, it resizes. Note the hash function:

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

The String "1" has hashCode 49, which exceeds the HashMap's initial length of 16, so where does this key go? It is placed by the clever index computation analysed below: p = tab[i = (n - 1) & hash], where p is a Node<K,V> object and n is the current bucket array length. Since this is the first element inserted there is no need to expand, the length is 16, and 49 & 15 = 1: the second slot.

④ When does expansion start? As elements keep being inserted, the hash function and index computation scatter them across the 16 slots. When the number of elements reaches 16 * 0.75, i.e. 12, inserting the next element starts an expansion. Note this counts elements, not occupied slots: for example, with Integer keys 3, 19, 12, 28, 44, the nodes 3 and 19 share one slot while 12, 28, 44 share another, so five nodes occupy only two slots.

The resize logic, roughly: 1. If the node array is empty, create a new array; the default capacity of the bucket array is 16 (16 buckets by default), and the default threshold is the default capacity times the default load factor, i.e. 12. If the capacity already exceeds the maximum, do not expand. 2. If the old array is not empty, iterate over its buckets and remap each key-value pair into the new bucket array.

4. Source code analysis

4.1 Basic Storage Unit Node

```java
static class Node<K,V> implements Map.Entry<K,V> { // implements the Entry interface: stores one key-value mapping
    final int hash;      // the key's hash, used to locate the bucket
    final K key;         // used for matching
    V value;             // the value
    Node<K,V> next;      // the next node of the singly linked list, used to resolve hash conflicts (chaining)
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }
    public final V setValue(V newValue) { // assign the new value, return the old one
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```

4.2 Some important implementations of HashMap: the hash function, put, get, and resize

```java
// put
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // if the table is empty, call resize() to create it (default capacity 16)
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // compute the index with (n - 1) & hash: the clever trick analysed below
    if ((p = tab[i = (n - 1) & hash]) == null)
        // tab[i] is null: create a new node and place it there
        tab[i] = newNode(hash, key, value, null);
    else {
        // a node already sits at this index: a hash conflict
        Node<K,V> e; K k;
        // if the new key equals the head node's key, remember the node so its value is overwritten
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // if the head is a red-black tree node, insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // otherwise walk the linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // if the list has reached TREEIFY_THRESHOLD (8) nodes, try to treeify it
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // found an equal key while traversing: break so its value is overwritten
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // if size exceeds the threshold (e.g. the 13th element with 16 buckets), expand
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```

The get method

```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // locate the bucket that would hold this key-value pair
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // if the bucket holds a red-black tree, use the tree lookup
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do { // otherwise search the linked list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```

4.3 resize()

Expansion means redefining the capacity. If you keep putting elements into a HashMap and its internal array can no longer hold them, the array must be enlarged. Note:

① With the default initial capacity of 16, when does expansion happen? The threshold is 16 * 0.75 = 12. That 12 is compared against the HashMap's size (a field incremented by every put and decremented by every remove). When size exceeds 12 (i.e. reaches 13), resize() expands the map.

② Java arrays cannot grow automatically; expansion replaces the small array with a new, larger one, like fetching a bigger bucket when the small one can't hold enough water and pouring the water across.

③ After expansion, every node, on plain linked lists and in red-black trees alike, has to be remapped.

For a HashMap, the bucket array is replaced when the threshold is reached, and the new array is twice the size of the old one. The capacity is capped, though: a HashMap can hold at most 2^30 buckets (in practice it's hard to ever need that much).

```java
// the table field: transient Node<K,V>[] table;
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) { // the old capacity is greater than 0
        // if the old capacity already reached the maximum, stop expanding and just let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // otherwise double the capacity, and double the threshold with it
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // old capacity is 0 but a threshold was set (a capacity was passed to the constructor):
    // the new capacity becomes the old threshold
    else if (oldThr > 0)
        newCap = oldThr;
    else {
        // zero initial threshold signifies using defaults (capacity 16, threshold 12)
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"}) // mask unnecessary warnings
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // if the old array is not empty, migrate it
    if (oldTab != null) {
        // iterate over the bucket array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // the node in this slot is not empty
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // only one node in the bucket (no list, or a single list node):
                // recompute its slot directly via e.hash & (newCap - 1)
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // a red-black tree node: split the tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // a linked list: split it into a "low" and a "high" list, preserving order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // if e.hash & oldCap is 0 the node stays at the original index,
                        // otherwise it moves to old index + oldCap
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // migrate the old lists to the new table
                    if (loTail != null) {
                        loTail.next = null; // set next to null
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

Why p = tab[i = (n - 1) & hash]?

```java
p = tab[i = (n - 1) & hash]
```

When hashCode is less than 65536 (2^16), the hash is regular: the index is basically determined by the hashCode itself.

That is because shifting any number below 2^16 right by 16 places gives 0, and XOR with 0 leaves the hashCode unchanged, so the hash equals the hashCode.

65536 is the special boundary value. In binary it is 00000000 00000001 00000000 00000000; shifted right by 16 it becomes 00000000 00000000 00000000 00000001, which is no longer all zeros.

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

Suppose the key's hashCode is 65536:

```
h = key.hashCode()     00000000 00000001 00000000 00000000
h >>> 16               00000000 00000000 00000000 00000001
hash = h ^ (h >>> 16)  00000000 00000001 00000000 00000001
```

Computing the index with n - 1 = 15:

```
hash                   00000000 00000001 00000000 00000001
n - 1                  00000000 00000000 00000000 00001111
hash & (n - 1)         00000000 00000000 00000000 00000001   -> index 1
```

But 65536 % 16 = 0: without the spreading step, the index would have been 0.

Now suppose the key's hashCode is 17 (recall XOR: same bits give 0, different bits give 1):

```
h = key.hashCode()     00000000 00000000 00000000 00010001
h >>> 16               00000000 00000000 00000000 00000000
hash = h ^ (h >>> 16)  00000000 00000000 00000000 00010001
n - 1                  00000000 00000000 00000000 00001111
hash & (n - 1)         00000000 00000000 00000000 00000001   -> index 1
```

Let's do a little test, assuming the number of buckets is 16. Here's the code:

```java
for (int key = 65533; key < 65543; key++) { // values become a bit "special" from 65536 on
    System.out.println("The key is: " + key + ", index position: " + ((key ^ (key >>> 16)) & 15));
}
// Output: from 65536 on, adjacent index positions pair up and swap (1,0  3,2  5,4 ...):
// key 65533 -> 13
// key 65534 -> 14
// key 65535 -> 15
// key 65536 -> 1
// key 65537 -> 0
// key 65538 -> 3
// key 65539 -> 2
// key 65540 -> 5
// key 65541 -> 4
// key 65542 -> 7
```

This code computes the index position. The length of HashMap's underlying array is always a power of two.

When length is a power of two, h & (length - 1) is a very clever design:

| Hash value h | length (assume 16) | h & (length - 1) |
| --- | --- | --- |
| 5 | 16 | 5 |
| 6 | 16 | 6 |
| 15 | 16 | 15 |
| 16 | 16 | 0 |
| 17 | 16 | 1 |

You can see that the computed index always falls within the bounds of the table array, and the distribution is usually fairly even.
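A minimal sketch checking the equivalence behind the trick: for a power-of-two length and non-negative h, masking with length - 1 gives the same result as h % length (the sample hash values are arbitrary):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int length = 16; // always a power of two in HashMap
        for (int h : new int[]{5, 6, 15, 16, 17, 49}) {
            // the mask keeps only the low bits, which is exactly the remainder mod length
            System.out.println(h + " -> " + (h & (length - 1)) + " (h % length = " + (h % length) + ")");
        }
    }
}
```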

4.4 Tree-forming treeifyBin()


In JDK 1.8, if the number of elements in one bucket exceeds TREEIFY_THRESHOLD (8), the linked list is replaced with a red-black tree to improve speed (mainly lookup speed).

```java
// change all linked-list nodes in the bucket into red-black tree nodes
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // if the table is empty or its length is below MIN_TREEIFY_CAPACITY (default 64),
    // don't convert the structure; resize() instead, which may redistribute the list's elements
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    // length is at least 64: turn the list into tree nodes
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null; // head and tail of the new list
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null); // plain node -> tree node
            if (tl == null)   // no tail yet, so no head either: p becomes the head (root)
                hd = p;
            else {            // the tail is not empty: link p after it, doubly
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // the Node objects are now TreeNodes and the singly linked list is doubly linked;
        // build the actual tree
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
```

5. Questions to think about

1. What is the difference between HashMap and HashTable

Both HashMap and Hashtable implement the Map interface

A HashMap is functionally almost equivalent to a Hashtable, except that HashMap is non-synchronized and accepts null (HashMap allows null keys and values; Hashtable does not). Because Hashtable is thread-safe and synchronized, it is slower than HashMap in a single-threaded environment; if you don't need synchronization, HashMap performs better. A HashMap also does not guarantee that the iteration order of its elements stays constant over time.

Hashtable is rarely used today because of its performance and because it handles hash collisions far less well than HashMap. But owing to thread safety and its use in older projects, Sun kept it and never deprecated it.

An excerpt from the Hashtable source:

If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use java.util.concurrent.ConcurrentHashMap in place of Hashtable.

In short: if you don't need thread safety, use HashMap; if you do, use ConcurrentHashMap.

2. Why is HashMap not thread-safe, and what to do if you need thread safety

If two threads put at the same time, size may not be computed correctly. Worse, before JDK 1.8 a multi-threaded put during resize could even create a closed loop in a list. 1.8 no longer has that particular problem, but thread-safety issues remain.

The closed-loop (infinite-loop) problem before JDK 8:

This problem does not exist in single-threaded scenarios, but in multi-threaded scenarios it can cause an infinite loop that leads to excessive CPU usage.

With heavy hash conflicts, multiple nodes on the same list make this problem easy to trigger. See the old classic: the HashMap infinite loop.

1. Use ConcurrentHashMap (a thread-safe hash map).
2. Use Collections.synchronizedMap(Map<K,V> m) to wrap the HashMap in a thread-safe Map.
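A minimal sketch of both options:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMaps {
    public static void main(String[] args) {
        // preferred: fine-grained internal locking, better concurrency
        Map<String, String> concurrent = new ConcurrentHashMap<>();
        // simpler but slower: every call synchronizes on a single mutex
        Map<String, String> synced = Collections.synchronizedMap(new HashMap<>());
        concurrent.put("k", "v");
        synced.put("k", "v");
    }
}
```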

3. How does HashMap resolve Hash conflicts

In practice, no matter how the hash function is constructed, conflicts cannot be avoided entirely. HashMap resolves them by chaining: conflicting elements are linked into a list under the bucket, and within a list the key's equals() method tells elements apart. In JDK 8, if a list's length exceeds 8 and the array length exceeds 64, all nodes in the list are converted into a red-black tree, with the tree's root sitting in the array slot.
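A small sketch showing that the colliding keys "Aa" and "BB" from the hash-conflict example earlier still both work, because equals() distinguishes them inside the bucket:

```java
import java.util.HashMap;

public class CollisionDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("Aa", "haha");   // "Aa".hashCode() == 2112
        map.put("BB", "heihei"); // "BB".hashCode() == 2112, so both land in the same bucket
        System.out.println(map.get("Aa")); // haha
        System.out.println(map.get("BB")); // heihei, resolved by equals() within the chain
    }
}
```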

4. How is HashMap expanded

Let's write an example to observe how a HashMap expands.

```java
public static void main(String[] args) throws NoSuchMethodException,
        InvocationTargetException, IllegalAccessException {
    // With no initial capacity the default is 16; here we pass 1.
    // put expands when, after insertion, size exceeds the threshold (load factor * capacity):
    //     if (++size > threshold)
    //         resize();
    HashMap<Integer, String> map = new HashMap<>(1);
    map.put(1, "一");
    // capacity() is not public, so read it via reflection
    Class<?> mapType = map.getClass();
    Method capacity = mapType.getDeclaredMethod("capacity");
    capacity.setAccessible(true);
    System.out.println("capacity : " + capacity.invoke(map)); // capacity : 2

    map.put(2, "二");
    System.out.println("capacity : " + capacity.invoke(map)); // capacity : 4 (size 2 > 2 * 3/4, so it expanded)

    map.put(3, "三");
    System.out.println("capacity : " + capacity.invoke(map)); // capacity : 4 (size 3 = 4 * 3/4, no expansion yet)

    map.put(4, "四");
    System.out.println("capacity : " + capacity.invoke(map)); // capacity : 8 (size 4 > 4 * 3/4, expanded again)
}
```

The example above shows that a HashMap really does expand during put. Its expansion mechanism differs from other collections': it doubles the number of hash buckets each time.

HashMap expands via the resize() method.

Suppose the old table holds keys whose hashes are 15, 7, 4, 5, 8, 1, and the HashMap's bucket array has an initial capacity of 8. The storage locations are:

| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| hash | 8 | 1 | | | 4 | 5 | | 7, 15 |

When a new element with hash 9 is put (with the default load factor 0.75, the element count now exceeds the threshold 8 * 0.75 = 6), the new storage layout becomes:

| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| hash | | 1 | | | 4 | 5 | | 7 | 8 | 9 | | | | | | 15 |

You can see that after the expansion, 8 has moved to index 8 and 15 to index 15. The old buckets for 8, 1, 4, 5 each held a single node, and single-node buckets are simply re-placed at e.hash & (newCap - 1): for 1, 4 and 5 that is the old index, so they stay put, while 8 moves from index 0 to index 8.

But when a bucket's list has more than one node, like slot 7 above holding 7 and 15, e.hash & oldCap is used to decide whether each element moves in the array.

7 & 8 = 0111 & 1000 = 0, while 15 & 8 = 1111 & 1000 = 1000, which is non-zero. The rule looks at the new high-order bit: if that bit of the hash is 1 (as for 15), the AND is non-zero.

When e.hash & oldCap == 0, the node keeps its position in the new table.

When e.hash & oldCap != 0, the node moves to old index + oldCap: for 15, the new index is 7 + 8 = 15.
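A minimal sketch of the split rule applied to the two nodes above (oldCap = 8):

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 8;
        for (int h : new int[]{7, 15}) {
            int oldIndex = h & (oldCap - 1);            // both hash to slot 7 in the old table
            int newIndex = (h & oldCap) == 0
                    ? oldIndex                          // high bit 0: stay put
                    : oldIndex + oldCap;                // high bit 1: move up by oldCap
            System.out.println("hash " + h + ": old index " + oldIndex + ", new index " + newIndex);
        }
        // hash 7:  old index 7, new index 7
        // hash 15: old index 7, new index 15
    }
}
```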

The resize() in JDK 1.8 improves on JDK 1.7: 1.7's rehash recomputed the bucket index for every element, whereas JDK 8 splits each conflicting list into just two groups via e.hash & oldCap (result 0 stays, non-zero moves), spreading the nodes evenly into the new buckets more efficiently.

See the code in [4.3 resize()].

5. Why is the loadFactor 0.75f

The load factor measures how full a hash table may get before its capacity is automatically increased, i.e. how well the table's space is used. A larger load factor means the table gets fuller before growing, and vice versa. Simply put: if the load factor is too small, space utilization is low and expansion triggers too easily, which is unfriendly to performance; if it is too large, expansion comes too late and the collision probability rises, increasing lookup cost. So 0.75 is a very suitable value. Experimentally, under ideal conditions with random hash codes, the number of nodes per bucket follows a Poisson distribution (probabilities peak around the mean and fall off symmetrically on both sides). See "Why the default load factor in HashMap is 0.75".

6. What types of elements are commonly used as keys in a hashMap, and why?

Keys such as String and Integer are used mainly because they are immutable. String and the primitive wrapper classes override hashCode() and equals() per the contract. Being immutable, they are inherently thread-safe, and their hash values can be cached to avoid recomputation. With a mutable key type, an entry you put may become impossible to find. If you want a custom type as a key, override equals() and hashCode(), keep the class effectively immutable, and do not mutate the object after inserting it into the map.

Can a HashMap key be a mutable object?
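A sketch of the mutable-key pitfall (MutableKey is a hypothetical class whose hashCode depends on a mutable field):

```java
import java.util.HashMap;
import java.util.Objects;

public class MutableKeyDemo {
    static class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        HashMap<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2;                       // mutating the key changes its hash
        System.out.println(map.get(key)); // null: the lookup probes the wrong bucket
    }
}
```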

7. Why the bucket array table is declared transient

```java
transient Node<K,V>[] table;
```

In Java, variables decorated with the TRANSIENT keyword are not serialized by the default serialization mechanism.

HashMap implements the Serializable interface and customizes its serialized content by implementing the readObject/writeObject pair; needless to say, anything derivable from the size can be computed directly without being serialized.

Why not serialize the table? Two reasons:

1. The table is rarely full. If the bucket array's size is 16 and only one element has been put, serializing the array would serialize the unused slots too. Wasteful.

2. The same key-value pair could land in different buckets under different JVMs, so deserializing the raw table could go wrong across JVMs. (HashMap's get/put/remove all start by hashing the key to a bucket index, i.e. an array subscript. If the key's class does not override hashCode(), Object's hashCode() is called, which is a native method whose int result is derived from the object's memory address. How it is computed is not specified, and different JVMs may implement it differently and produce different hashes.)

8. If the key of a HashMap is null, how can I find the value

Look again at the index computation p = tab[i = (n - 1) & hash]: when the key is null, hash() returns 0, and (n - 1) & 0 = 0, so a null key is stored in, and looked up from, the first bucket (index 0).

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
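A two-line sketch of the behaviour:

```java
HashMap<String, String> map = new HashMap<>();
map.put(null, "nullKey");          // hash(null) == 0, so the entry lands in bucket 0
System.out.println(map.get(null)); // nullKey: get(null) probes bucket 0 the same way
```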

6. Usage suggestions

1. By default a HashMap's capacity is 16, but if you pass a number to the constructor, HashMap selects a power of two that covers it as the capacity (1 -> 2, 7 -> 8, 9 -> 16).

When initializing a HashMap, specify its size whenever you can, especially if you know roughly how many elements the map will hold (Alibaba Java Development Manual); see the sizing sketch below.
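A sketch of that sizing guideline (expectedSize is a hypothetical element count); dividing by the load factor keeps the threshold above the element count, so the map never resizes while being filled:

```java
int expectedSize = 1000; // assumed number of entries
// threshold = capacity * 0.75 must stay above expectedSize
HashMap<String, String> map = new HashMap<>((int) (expectedSize / 0.75f) + 1);
```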

Here are the four constructors of HashMap. Constructor 3 is the usual choice, but if you already know the element count, constructor 2 is recommended. (The default load factor 0.75 is appropriate; overriding it via constructor 1 is not recommended.)

```java
// 1: capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
// 2: capacity only, with the default load factor
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
// 3: the most commonly used no-arg constructor
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
// 4: build from another Map
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
```

So let’s talk about tableSizeFor. Briefly describe what this method does:

If you pass a capacity yourself (constructor 1 or 2), tableSizeFor turns it into the smallest power of two greater than or equal to the argument, which becomes the initial capacity. For example, 10 returns 16 and 75 returns 128.

Disadvantages of not doing so

Suppose the HashMap needs to hold 1024 elements. If no initial capacity is set, the map is forced to expand seven times as the elements keep arriving, and each resize rebuilds the hash table, which has a significant impact on performance.

```java
/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    // cap - 1 guards the case where cap is already a power of two:
    // without the -1, the shifts below would return double the value.
    // If cap is 0, n becomes -1 and the final check returns a capacity of 1.
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```

Explain this code

In Java, |= is the bitwise-OR-and-assign operator: a |= b computes a | b and assigns the result back to a.

Taking 10 as an example, the overall process is as follows

[Figure: the bit-smearing steps of tableSizeFor, using 10 as an example]

In simple terms, the cascade of shifts and ORs smears the highest set bit rightward until every lower bit is 1. Take 250, binary 1111 1010: after the ORs above it becomes 1111 1111, and adding 1 then yields the smallest power of two greater than the original number.
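A hand trace of tableSizeFor(10) under the implementation above:

```java
// n = cap - 1 = 9   -> 0000 1001
// n |= n >>> 1      -> 0000 1101
// n |= n >>> 2      -> 0000 1111
// n |= n >>> 4      -> 0000 1111 (no further change: all lower bits are already 1)
// n |= n >>> 8      -> 0000 1111
// n |= n >>> 16     -> 0000 1111 (= 15)
// return n + 1      -> 16
```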

7. To summarize

The design and implementation of HashMap is genuinely clever, and JDK 8 brought many improvements. Before writing this post my understanding of HashMap had only scratched the surface; reading the source showed me how much knowledge is packed inside. My level is limited, and although I spent a lot of time on this, many details (such as the red-black tree implementation) still escape me, so some descriptions may be wrong or fall short. If you spot a mistake, please correct me so I can fix it and keep learning.

8. Reference links

HashMap (JDK1.8)

HashMap resize

What is the hash method of HashMap in JDK source code

HashMap infinite loop problem

Why are threads unsafe in JDK8

yq.aliyun.com/articles/65…