The design and optimization of the HashMap | topic writing

This article has participated in the third “Topic writing” track of the Nuggets Creators Training Camp, check out the details:Digg project | Creator Boot Camp phase 3 is underway, “write” to make a personal impact.

📖 preface

With fireworks to make a living with poetry to love

Whether I go to record or not, in the past and future infinite years, there will still be someone to rise, someone to sink, someone to become a hero, someone to play a clown, someone to stand up, someone lost to sink. But you only need — “self-discipline in prosperity, self-healing in abjuration, do not abandon conscience on the way to make a living, do not lose on the way to love the most strict, this time I stand in the wind, let you fog everywhere.”

🚀 `HashMap`Design principle of

1. Design idea of HashMap:Copy the code

A Map is a container that stores data in key-value pairsHashMapI’m using the keyKey 的 hashcodeOrganize storage by value, making it very fast and efficient to work by key valuekeyAccess to data.
For key-value pairs,HashMapInternally it will be encapsulated into a correspondingEntryObjects, that is,EntryObject is an organization of key-value pairs;
For each object,JVMI’m going to generate one for ithashcodeValue.HashMapStoring key-value pairsEntryWill be based onKey 的 hashcodeValue, in some sort of mapping, determines how the key-value pair should be pairedEntryStored in theHashMapAt what position in;
When usingKeyValue when taking data, and then according toKeyThe value of thehashcode 和 Internal mapping condition, directly locate toKeyThe correspondingValueWhere can values be stored so efficientlyValueValue.

💖 `HashMap`Design of hash function

The hash function takes the hashcode of the key, which is a 32-bit int, and then xor the high and low 16 bits of the Hashcode

Also known as the perturbation function, and there are two reasons for this design

Be sure to keep it as low as possiblehashCollision, the more scattered the better;
The algorithm has to be as efficient as possible, because it’s high-frequency, so it’s bitwise;

👏 What is Hash?

A Hash is a Hash algorithm that transforms an input of arbitrary length into an output of fixed length, which is the Hash value. Ok, this definition might be a bit abstract. Let's explain it sentence by sentence with the example of plan TWO mentioned above.Copy the code

For example, a person’s name is an arbitrary length of input. Z – zhang SAN 11; L- Li Si 22; W – fifty and 33
And then, go straight to the Z category and find it. But if there are a lot of people whose names start with Z, such as Z-Zhao Liu44, Z-Zhou Qi55, when I save more phones, I also need to find several times. I do a sorting by the uppercase of the first letter of the name in a way we can think of as a hash algorithm.
Finally, I divided the book into initials and recorded phone numbers — fixed-length output, meaning I had all the phones in 26 categories. So these 26 letters are hashes.

🐱 🏍 `HashMap`Data structures for JDK 1.8

This article is aboutJDK1.8Version, internal use array + linked list red black tree

inJdk1.8 中 HashMapSome changes have been made to the implementation of the, but the basic idea remains the same, just in a few placesTo optimize theHere’s a look at some of the changes:Data structures are stored by array + linked listChange toArray + linked list + red-black tree storage, when the list length exceedsThreshold (8)When willLists are converted to red-black trees. Performance has been further improved.
Array: The array storage interval is continuous, occupying a large amount of memory, so the space complexity is very large. But the binary search time complexity of array is O(1). The characteristics of arrays are: easy to address, insert and delete difficult; Space complexity: refers to the memory space required to execute the algorithm. Time complexity: The computational effort required to execute the algorithm
Linked list: Linked list storage interval is discrete, occupies relatively loose memory, so the space complexity is small, but the time complexity is large, up to O (N). Linked lists are difficult to address and easy to insert and delete.
Red and black tree:

😎 `JDK1.8` 对 `HashMap`What are the major optimizations?

Array + list changed to array + list or red-black tree;
The insertion method of the linked list is changed from ab initio insertion method to tail insertion method. Simply speaking, when inserting, if there are already elements in the array position, 1.7 place the new element into the array, and the original node is the successor node of the new node. 1.8 traverse the linked list and place the element to the end of the linked list.
For expansion, 1.7 needs to rehash the elements in the original array to the new location. 1.8 adopts simpler logical judgment, with unchanged position or index + old capacity.
1.7 Determine whether to expand the capacity and then insert the disk. 1.8 Determine whether to expand the capacity after the disk is inserted.

✨ why do these points of optimization

Reduce the time complexity from O(n) to O(logn) to prevent hash conflicts.
Because in the expansion of 1.7 header insertion method, the header insertion method will reverse the linked list and generate rings in multi-threaded environment; Thread A is inserting node B, and thread B is inserting as well. When the capacity is insufficient, thread B begins to expand, hash again, place elements, adopt the head insertion method, and put the head of node B after traversing, thus forming A ring, as shown in the following figure:

The transfer code is called for expansion in 1.7, as shown below:

void transfer(Entry[] newTable, boolean rehash) { int newCapacity = newTable.length; for (Entry<K,V> e : table) { while(null ! = e) { Entry<K,V> next = e.next; if (rehash) { e.hash = null == e.key ? 0 : hash(e.key); } int i = indexFor(e.hash, newCapacity); e.next = newTable[i]; NewTable [I] = e; newTable[I] = e; e = next; }}}Copy the code

Why does 1.8 not need to be reconfigured for expansionhashYou can directly locate the original node in the new data position
- This is because the expansion is twice the size of the original array, and the mask used to calculate the location of the array is only an extra 1. How to solve this problem?
- Before expansion, the length is 16, and the binary n-1 used to compute (n-1) & hash is 0000 1111. After expansion to 32, the binary is one higher, 0001 1111.
- Since it is an ampersand, 1 and any number ampersand is itself, there are two cases, as shown in the following figure: the case of the original hashcode where the 4th bit is 0 and the 4th bit is 1;
- The fourth bit is 0, the rehash value remains the same, and the fourth bit is 1, the rehash value is 16 larger than the original value.

🎉 finally

For more references, see here:Chen Yongjia’s blog
Like the small partner of the blogger can add a concern, a thumbs-up oh, continue to update hey hey!

The design and optimization of the HashMap | topic writing

📖 preface

🚀 HashMapDesign principle of

💖 HashMapDesign of hash function

👏 What is Hash?

🐱 🏍 HashMapData structures for JDK 1.8

😎 JDK1.8 对 HashMapWhat are the major optimizations?

✨ why do these points of optimization

🎉 finally

Related Posts

Eclipse connection to MySql database summary

J.U.C ConcurrentHashMap red-black tree conversion

Linux file systems and inodes

🚀 `HashMap`Design principle of

💖 `HashMap`Design of hash function

🐱 🏍 `HashMap`Data structures for JDK 1.8

😎 `JDK1.8` 对 `HashMap`What are the major optimizations?