preface
Opening diagram
The main points of

Java8 makes changes to the HashMap of Java7, the biggest difference being the use of redblack trees.

In The Java7 structure, when looking for data, we quickly locate the index of the array based on the hash value. The speed of the query depends on the length of the linked list, and the time complexity is O(n).

To reduce the problems in 2, in Java8, lists are converted to redblack trees when the number of lists is greater than 8. So when you look up data in a redblack tree, the time complexity changes order logN.
structure
chart
describe

Arrays hold nodes.

Node if it’s a linked list, TreeNode if it’s a redblack tree.

In Node, keyt, Value, Hash, and next are the same as in Java7.

We determine whether it is a redblack tree or a linked list based on the type of nodes stored in the array
A constructor
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0  Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);
}
Copy the code
This constructor initializes the threshold and load factor.
In the constructor, we do not specify the size of the HashMap. Even if we use the HashMap(int initCapacity) constructor, the result of the number operation is only the initialization threshold, and the internal array is not immediately constructed. The initialization threshold is used to facilitate initialization of arrays during subsequent PUT. The details are left to the reader below, but the inner array is not initialized at constructor execution.
tableSizeFor
The function simply takes the incoming number up to the nearest power of 2 (including equal). For example, if 15 is passed in, 16 is returned; Pass in 18, return 32. Pass in 32, return 32. Let’s see how he does it!
static final int tableSizeFor(int cap) {
int n = cap  1;
n = n >>> 1;
n = n >>> 2;
n = n >>> 4;
n = n >>> 8;
n = n >>> 16;
return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code
explain
Before explaining the source code, let me say that passing in a cap, which may not be a power of 2, is to find the smallest power of 2 greater than or equal to cap
How do I find it? Let’s look at the example we started with (181 first, and we’ll talk about this 1 later).
0000000000000000, 00000000, 0001 0001 Decimal system: 17
00000000000000000000, 00000000, 0010 0000 Decimal: 32
Now let’s see what 32 minus 1 is, compared to 32 and 17
00000000000000000000, 00000000, 0010 0000 Decimal: 32
0000000000000000, 00000000, 0001 1111 Decimal system: 31
0000000000000000, 00000000, 0001 0001 Decimal system: 17
We can see that if we change the last five bits of 17 to 1, it becomes binary 31, and if we add 1 to it, it becomes binary 32, and that’s what we want! In short, the purpose of this function is to start from the left 1 and go all the way to the right, and then +1 will do what we want!
process
n=n>>>1
Since n is greater than 0, there must be a 1 at the top of the binary, so if you move an unsigned 1 to the right, you must be closer to the original number and the next number becomes 1. Such as 10 XXXX, then the n  = n > > > 1, must be 11 XXXX.
n=n>>>2
In the example above, n has changed from 10xxxx to 11xxxx. So we’re going to make the following XXXX continue to be 1, so we’re going to make the first two digits of XXXX continue to be 1, so we’re going to make the first two digits of XXXX continue to be equal to 11. So if you move two unsigned Spaces to the right and align with yourself, you go from 11xxxx to 1111xx.
So now we have four ones, so we’re going to move by four. If you change it to eight ones, you move it eight bits….
And so on.
To change the 32 bits to all 1s, change the first 16 bits to all 1s, and then move the next 16 bits to the right to make all 32 bits to all 1s.
MAXIMUM_CAPACITY(1<<30); MAXIMUM_CAPACITY(1<<30);
Let me show you a picture, maybe it will be clearer. In fact, it is very clear!
summary
So here’s why you have to subtract 1 when it comes in. As we all know,
You should understand the above explanations and examples. If you go from 1 to the left of binary to the right, it’s going to be 1. If you pass in exactly two to the m, then the binary of n will be one plus m ones, and when you return plus one, it will be one plus m plus one zero. So it’s twice the number that came in. For example, if you pass in 32, the binary number is 100000. the 1 is followed by five zeros. If n is changed to 111111 and +1 is returned, it returns 1000000, which is 64 in decimal. This is clearly against our logic.
And anything else that’s not a power of two, whether you subtract 1 or not, as long as the 1 on the left doesn’t change, whatever’s on the right is going to be greater than or equal to the smallest power of two that’s passed in.
To sum up, the cap1 passed in is to prevent the cap passed in from being an exact power of 2, so as to avoid doubling when it is returned later.
Put
public V put(K key, V value) {
// For hash functions, we'll see later
return putVal(hash(key), key, value, false.true);
}
//putVal
//onlyIfAbsent True indicates that the key does not exist and the key does not exist
// false: the key exists and is stored, but overwrites and returns the old value
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
// The first put executes the resize() in the following if
// The first resize is the equivalent of initialization, usually set to 16.
if ((tab = table) == null  (n = tab.length) == 0)
n = (tab = resize()).length;
// For the first expansion, (n1) &hash is the same as modulo n of hash
// If you take the hash value modulo 15, you can get a random index
// If there is no value for this position, then simply initialize Node and place it where the hash is mapped
if ((p = tab[i = (n  1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
// The hash mapped data already has nodes
else {
Node<K,V> e; K k;
// If the first node is the one we are looking for, then e executes the first node
if(p.hash == hash && ((k = p.key) == key  (key ! =null && key.equals(k))))
e = p;
// If the first node is the root of a redblack tree, the redblack tree put operation is called
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
// At this point, the desired key may be placed after the first node or the key may not exist in the list
else {
// The binCount count is used to record whether eight nodes have been reached to transform into a redblack tree
for (int binCount = 0; ; ++binCount) {
// If the last node is reached, the key does not exist
if ((e = p.next) == null) {
// The new key node is placed after the last node in the list, and e is already null
p.next = newNode(hash, key, value, null);
// If the number of nodes reaches 7, then the number of nodes becomes 8, and then the tree is transformed into a redblack tree
if (binCount >= TREEIFY_THRESHOLD  1) // 1 for 1st
treeifyBin(tab, hash);
// There are two cases: 1 is successfully added to the end of the list and the total number does not exceed 8, 2 is added to the end of the list and the total number reaches 8, then it becomes a redblack tree and exits the loop
break;
}
// If the key already exists in the list, exit the list and overwrite it later
if(e.hash == hash && ((k = e.key) == key  (key ! =null && key.equals(k))))
break;
// Here is the operation that has not been found, traversing the listp = e; }}// If e is not null, the put key is already in the list.
// If it is null, it is not in the previous list and has been added to the end of the previous list
if(e ! =null) {
V oldValue = e.value;
//1 onlyIfAbsent, as mentioned earlier, is false to override the old value
//2 or no value before
//1 or 2 execute the following if
if(! onlyIfAbsent  oldValue ==null)
e.value = value;
// This function is only used in LinkedHashMap, which is empty
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
// If the threshold is exceeded after the node is added, the capacity is expanded
if (++size > threshold)
resize();
// This function is only used in LinkedHashMap, which is empty
afterNodeInsertion(evict);
return null;
}
Copy the code
The hash method
static final int hash(Object key) {
int h;
return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code
In Java, a hash function is a native method defined in the Object class, so all objects inherit it.
public native int hashCode(a);
Copy the code
Because this is a native method, the exact implementation is uncertain, but as you can see from the function signature, the method maps any object to an integer value. Calling this method completes the mapping of Object > int
We see this in the hash implementation

If the key is null, the value is placed at the first position in the array.

If the key is not null, then the key’s hashCode is moved 16 bits to the right and then xor with itself.
You may have some questions about point 2
That is, by making the high and low bits of HashCode xor, the high bits interfere with the low bits. The goal is to make the array subscripts of hashCode maps more even. The following paragraph is quoted from the forum anonymous user’s explanation, I think the explanation is very detailed
Author: Anonymous user
Link: www.zhihu.com/question/20…
Source: Zhihu
We create a HashMap with an entry array of a default size of 16. Now there is a pair of keys and values that needs to be stored in the hashmap. The hashcode of the key is 0ABC0000 (eight hexadecimal numbers, total 32 bits). If the hashcode is not processed by the hash function, This pair will later be stored at subscript 0 in the Entry array. Subscript = ABCD0000&(161) = 0. And then we’re going to store another pair whose key hashcode is 0DEF0000, and we’re going to get array subscript 0 again. As you can already see, this is a poorly implemented hash algorithm because the 1 bit of HashCode is concentrated in the first 16 bits, resulting in an array with a constant index of 0. Therefore, pairs with very different keys are stored in the same linked list, resulting in slower query in the future. For example, 0ABC0000 becomes A02188B, 0DEF0000 becomes D2AFC70, and their array subscripts are no longer uniform zeros. What a hash function does is it’s easy to draw a graph that affects the value of each place by the value of the other place.
In the source code we see h&(n1) operation, in fact, this is the same as h % n. It is just a very big modulus of the time will affect the efficiency, but through the bit operation is a lot faster!
resize

Resize () is used for initialization and array expansion of the HashMap.

When you expand an array, it’s twice as big

Migrate data
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
// oldTab must be null if it is initialized and not null if it is initialized
int oldCap = (oldTab == null)?0 : oldTab.length;
// Record the previous threshold
int oldThr = threshold;
// Define a new capacity and a new threshold
int newCap, newThr = 0;
// The capacity is already available
if (oldCap > 0) {
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
// At this point, the capacity of the front has doubled
// The threshold is also doubled
newThr = oldThr << 1; // double threshold
}
// Call new HashMap(initCapacity), put for the first time
else if (oldThr > 0)
// If the capacity is specified, for example, initCapacity is 22, newCap is 32
newCap = oldThr;
// Call new HashMap(), put for the first time
else {
// The capacity is the capacity specified inside the default class, which is 16
newCap = DEFAULT_INITIAL_CAPACITY;
The default loading factor is 0.75, so the threshold is 16*0.75 = 12
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// Call new HashMap(initCapacity) or
// New HashMap(initCapacity,loadFactor)
// Because both of the above constructors initialize loadFactor
// Initializes the new threshold based on the new capacity
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
// Create a new array and assign it to table
@SuppressWarnings({"rawtypes"."unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if(oldTab ! =null) {
// Iterate over the array for data transfer
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
// Get the node of the corresponding array position, if null, there is no node
// If not, transfer
if((e = oldTab[j]) ! =null) {
oldTab[j] = null;
if (e.next == null)
// There is only one node in the corresponding position
// This node modulates the length of the new array
newTab[e.hash & (newCap  1)] = e;
// In the case of a redblack tree
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else {
// Two linked lists are defined here, lo and hi
// e is the first node of the array
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
// The following dowhile is a list of the current position of the list, and then
// According to some rules, put the linked list nodes in different places in the expanded array
do {
// Get the next node of e
next = e.next;
// The following explanation may be a bit difficult to understand, but we will look at the following explanation and then look at here
//1 If the node has the same hash position, the lo list is used to store the node
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
//2 then the new location is stored in the hi list
else {
if (hiTail == null)
hiHead = e;
elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
// The lo list is not empty, so the old place of the array is the head of LO
if(loTail ! =null) {
loTail.next = null;
newTab[j] = loHead;
}
// If the hi list is not empty, put the hi header in the old position of the array
if(hiTail ! =null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
Copy the code
The following diagram illustrates the process of data transfer.
Capacity to explain
Now to explain the resize source code in question
(e.hash & oldCap) == 0.
n = (tab = resize()).length;
if ((p = tab[i = (n  1) & hash]) == null)
We know from these two statements of put that we’re using the hash and the length of the array 1 to figure out where the node is mapped to.
A:
It’s pretty simple. N is the length of the array, and we all know that the length of the array in a HashMap must be m squared of 2. If the length of the array is 16, 16 is 2 to the fourth power. Then 161 = 15 is 4 all ones.

Before the expansion, the hash value obtained by using the hash (key) is combined with (n1) to obtain the position. In the above example, the lower four bits of the hash value are extracted and the result is A. (The result must be between 0 and 15)

If the size of the array is doubled, then the length of the array is 32, and the binary value is 100,000, then the same hash value is used to get the position of (321). So if you take the bottom five bits of the hash value, you get b. (The result must be between 0 and 31)
2:
We know from 1 and 2 that the binary of B has one more bit than the binary of A, and the first four bits are the same. And in binary, it’s either a 0 or a 1.
 If the extra 1 bit is 0, then b and A are the same. For example, if a is 1001 and B is 01001, then a and B are equal, that is, the position of the same node in the new array is the original position of the old array.
 If the extra 1 bit is 1, then b is 2 to the m more than a. For example, if A is also 1001 in decimal, 9, b 11001, and 25 in decimal, then B is 2 to the fourth more than A, and this 2 to the fourth power is exactly the length of the original array oldCap. In other words, the location of the same node in the new array is the original location of the old array plus 2 to the fourth power (the length of the unexpanded array (oldCap)).
So we conclude that as long as we can determine whether the bit where B has more than A is 1 or 0(in this case, bit 5), we can figure out where the same node is in the new array, in a nutshell. After the array is expanded, the same node is either in its original location or added to its original location by the length of the unexpanded array (oldCap).
So how do we figure out what is the digit where B is more than A? It’s easy to hash oldCap with oldCap. For example, if oldCap is 16, then binary is 1 followed by m zeros, i.e. 10000, that is, the m+1 bit is 1. Using the hash value and oldCap, we can find the 5th bit of the hash value, and then we can determine the node’s position in the new array according to the previous “two”.
Readers see here, can continue to go back to the source of 1 to see, this time should suddenly see!
Capacity summary
During capacity expansion, nodes in the original table will be rehash to the new table, but the positions of nodes in the new table are related to each other: either the subscripts are the same or the oldCap is different (the size of the original table).
conclusion
 This article mainly explains the HashMap of some important methods of source code parsing, but also let oneself have a further indepth understanding of HashMap.
 Since HashMap is a threadsafe class, it can be used if it is threadsafe
CocurrentHashMap
orCollections.synchronizedMap
Encapsulate the HashMap object as threadsafe.  The key value of a HashMap is allowed to be null, and it puts the element in the first position of the array it maintains.
 HashMap introduced redblack trees after Java8, which were not there before 1.8. They were simply linked lists, so the 1.8 version is more flexible.
 Through the analysis of the source code to understand a variety of bit operations, but also strengthen their own basic knowledge.
reference
 Blog.csdn.net/fan2012huan…
 www.javadoop.com/post/hashma…