Analyze the implementation principle of HashMap in depth

Public parameters

Load factor: 0.75

Why 0.75?

The tradeoff between time and space. If it is 1, it increases hash collisions, increasing the complexity of the red-black tree. If it is 0.5, hash collisions are reduced and more space is wasted.Copy the code

Source code says that when the load factor is 0.75, the space utilization is relatively high, and avoids quite a lot of Hash conflicts, making the bottom linked list or red black tree height is relatively low, improve the space efficiency.

Initial capacity: 16

If you specify the capacity, it becomes his power of two. (For the sake of performance, try to estimate the size in advance, and consider that the actual element size should be less than HashMap calculation 2 exponential *0.75, otherwise it is easy to trigger the expansion mechanism)

Why exponential capacity of 2, and double capacity?

Index calculation: Num & (length-1) = num % length is valid when length is a power of 2, and bit operation is more efficient

Lazy loading (delayed loading)

The put() call checks whether the array is empty and initializes if it is.

JDK1.7 and before

Insert mode: head insert

Why the head? In the case of general use without expansion, the head plug is convenient and there is no need to traverse the linked list.

Hidden danger: concurrent occurrence of circular linked lists

Data structure: array + linked list
Nodes: Entry

hash()

High-low perturbation computations reduce the probability of hash conflicts.

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
Copy the code

What’s this? Index = hash&length-1, all hashes have the same low value, but different high values result in hash conflicts, which guarantees performance.
Put () procedure:

1. Check whether the current array needs to be initialized. 2. If the key is empty, put a null value into it. 3. Calculate the Hashcode based on the key. 4. Locate the bucket based on the calculated HashCode. 5. If the bucket is a linked list, check whether the hashcode and key in the bucket are equal to the incoming key. If so, overwrite the bucket and return the original value. 6. If the bucket is empty, no data is stored in the current position. Add an Entry object to the current location. 7. Determine whether to expand the capacity when calling addEntry to write Entry. Double the size if necessary, and rehash the current key and position it. CreateEntry passes the bucket with the current location to the new bucket, and if the bucket has a value, it forms a linked list of locations.

Get () process:

The HashCode is first calculated based on the key, and then located in the specific bucket. Determines whether the location is a linked list. Returns values based on keys and whether their hashcode is equal if they are not linked. For a linked list, you iterate until the key and hashcode are equal and return the value. Return null without retrieving anything.Copy the code

Expansion Opportunity:

Judge expansion before insertion.

Why this order? If the JDK7 header is inserted first and then expanded, it is not necessary to insert the element and rearrange it during expansion.

(size>=threshold)&&(null ! =table[bucketIndex]) 1. When storing the new value, the number of existing elements must be greater than or equal to the threshold. 2.
Rehash: This definition refers to the recalculation of indexes during expansion

When capacity expansion occurs, the original objects are re-allocated with the hash value and added to the new bucket.

Int I = indexFor(e.hash,newCapacity);

Objective: To solve the problem that some linked lists are too long and time complexity O(n)=n due to the increase in number

Starting with JDK1.8
- Insert mode: tail insert
Why the tail? When resize() is used, the header insertion mode changes the order of recalculating index positions of elements in the same Entry chain, resulting in concurrency problems and the formation of circular linked lists. Tail insertion, the expansion will keep the original order of the linked list elements, there will be no linked list ring problem.

However, put and GET do not have locking mechanism, still cannot guarantee the security of multithreading.
Data structure: array + linked list + red-black tree

Red-black tree time complexity O(logn)
Node, the Node
Calculate the hash:

First of all, in terms of high order perturbation, it’s just a simple h = h ^ (h >>> 16), without doing that much more perturbation, and you get the hash value. Second, we removed indexFor from the index function, and instead used put,get, and so on. You can see that there are two lines in all of these functions. In my own understanding, because red-black trees are optimized for a lot of conflicts and long chains, there is no need to do as much high-low perturbation. Conflicts can be handled.
Put () procedure:

1. Check whether the current bucket is empty and initialize it.

2. Calculate the hashcode of the key and locate the bucket. If the value is null, there is no hash conflict and a new bucket is created.

3. If the bucket is not empty (hash conflict), compare the key in the bucket and whether the hashcode of the key is equal to the written key. If the key is equal, assign the value to e.

4. If no, if the bucket is a red-black tree, data is written in red-black tree mode.

5. If the bucket is a linked list, encapsulate the current key and value into a new node and write it to the end of the bucket

6. Then determine whether the size of the current list is greater than the preset threshold. If so, it will be converted to a red-black tree.

7. If the same key is found during the traversal, exit the traversal. 8. If E! = null equals the existence of the same key, which needs to be overwritten. 9. Determine whether to expand the capacity. (Inserted size> threshold)

Get () process:

The key is first hash and the bucket is retrieved. If the bucket is empty, null is returned. Otherwise, it checks whether the key in the first position of the bucket (such as a linked list or red-black tree) is the query key and returns value directly. If the first one does not match, it is determined whether the next one is a red-black tree or a linked list. The red-black tree returns the value as the tree looks up. Otherwise, the match returns are iterated through in a linked list.Copy the code

Expansion Opportunity:

Insert the disk first and expand the disk later

Why this order? Because JDK8 tail, if the first expansion, and insert, but also through the element, expansion again, there is no need to go through twice.

Capacity expansion in two cases: 1. 2. The size after insertion is greater than the threshold.
Rehash:

Instead of recalculating the hash, we use the hash value and the length of the array:

E.hash &oldCap if the result is equal to 0, the position is equal to the original index + the original array length
Tree change mechanism

Threshold: The length of the current list is greater than 8

Why eight? The source code says that tree nodes are rarely used in conjunction with the well-distributed hashCode. In addition, under ideal conditions, affected by randomly distributed hashCode, the nodes in the linked list follow the Poisson distribution, and according to statistics, the probability that the number of nodes in the linked list is 8 is close to one in 1,000, and the performance of the linked list is already very poor. That’s why it’s rare and extreme to turn a list into a red-black tree. Because the conversion of linked list to red black tree also needs to consume performance, special case special treatment, in order to save performance, under the balance, use red black tree, improve performance. In most cases, the HashMap is still a linked list. If the distribution is ideal and the number of nodes is less than 8, the HashMap will automatically expand.

Condition: Check whether the length of the table is greater than 64 and the length of the linked list exceeds the threshold
Tree degradation mechanism

Threshold: The number of nodes in the current tree is less than 6

Why 6? Avoid switching back and forth.

Because tree nodes take up twice as much space as regular nodes, they are used only when there are enough nodes. In other words, when there are fewer nodes, although red-black trees are better than linked lists in terms of time complexity, red-black trees take up more space. Considering comprehensively, it is believed that red-black trees can only be used when there are too many nodes and the disadvantage of red-black trees taking up more space is not obvious.

Condition:

1.remove():

If the root node of a red-black tree is empty or the right node of root, the left node of root, and the left node of the root left node are empty, the tree is small

2.resize():

When the red-black tree node element is less than or equal to 6 (only resize() uses this 6)

HashMap is different from HashTable

The parent class is different

HashTable: Inherited from Dictionary (deprecated)

HashMap: Inherits from AbstractMap class

However, they both implement the map, Cloneable, and Serializable interfaces simultaneously.

Hashtable provides two more elments() and contains() methods than HashMap.
```
The elments() method inherits from Hashtable's parent class Dictionnary. The elements() method is used to return an enumeration of the values in this Hashtable.Copy the code
```
The contains() method determines whether the Hashtable contains the value passed in. It serves the same purpose as containsValue(). In fact, contansValue() simply calls the contains() method.
Null value problem

HashTable: Cannot have null values for null keys

HashMap: Can have a null value and supports null keys.

When the get() method returns a null value, the key may not exist in the HashMap, or the value corresponding to the key may be null. Therefore, instead of using the get() method to determine the presence of a key in a HashMap, use the containsKey() method.
Thread safety

HashTable: Thread-safe, with the Synchronize method added to each method.

But basically due to performance problems, has been deprecated. ConcurrentHashMap Because ConcurrentHashMap uses a piecewise lock, it does not lock the entire data.

HashMap: Single thread performance is better, multithreading is unsafe and can cause deadlocks.
The traversal is different
Different initial capacity

The initial length of the Hashtable is 11, and each subsequent expansion is 2n+1 (n is the last length).

The initial length of a HashMap is 16, and it doubles that size each time it expands

When created, if given an initial capacity value, the Hashtable will use the size you gave it, and the HashMap will expand it to a power of two.
Different hash methods are used to calculate the hash value

To get the position of an element, you first need to compute a hash value based on the element’s KEY, and then use that hash value to compute the final position

Hashtable uses the object’s hashCode directly. HashCode is a value of type int computed by the JDK from the address of the object or from a string or number. The final position is then obtained using the divisor residue. But division is very time consuming. Efficiency is very low

In order to improve the efficiency of the calculation, the size of the hash table is fixed to a power of 2, so that in the modulus budget, there is no need to do division, only need to do bitwise operation. Bit operations are much more efficient than division.

The principle of ConcurrentHashMap

jdk1.7

HashMap consists of an array of segments and a HashEntry list.

Segment lock technology, where Segment inherits from ReentrantLock. ConcurrentHashMap supports concurrent CurrencyLevel (number of segments). Every time a thread holds a lock on one Segment, it does not affect the other segments.
jdk1.8

Abandoned the original Segment Segment lock, Node into Node, array plus linked list + red black tree, and CAS + synchronized to ensure concurrency security.

Analyze the implementation principle of HashMap in depth

Public parameters

JDK1.7 and before

Starting with JDK1.8

HashMap is different from HashTable

The principle of ConcurrentHashMap

Related Posts

JDK1.8 hashMap optimization

Silently read the design pattern-adapter pattern

Python coroutines in detail