General interview questions

HashMap is the most commonly used Java collection class in day-to-day development, and questions about it come up in almost every interview.

  1. What is the underlying storage structure of a HashMap?
  2. Is HashMap thread-safe? If not, why not?
  3. What are the differences between the JDK 1.7 and 1.8 versions of HashMap? What hidden dangers does 1.7 have, and why?
  4. Is a hashCode unique? How are elements compared on insertion?
  5. What’s the difference between Hashtable and ConcurrentHashMap?

Most of these questions can be answered if you’ve read a few blogs or skimmed the source code. I’ve been through quite a few interviews, and I’ve rarely stumbled on HashMap.

It turns out I was a little naive. Sometimes you can answer not because you know a lot, but because the questions don’t go deep enough. If you haven’t read the source code and thought it through carefully, a question asked from a slightly different angle can leave you stumped.

Hence the title: why is the HashMap treeification threshold 8? To be honest, although I knew the threshold was 8, I had never really thought about why that number is 8. Taking this opportunity, I combed through the HashMap source code, and this article summarizes some new points worth thinking about.

The basics of HashMap

HashMap is a typical key-value data structure. Its underlying storage is an array of buckets, each holding a linked list of Node<K, V> entries (as named in the HashMap source). When a new element is added, its hash value determines which bucket it falls into. If a bucket’s linked list grows too long, it is converted into a red-black tree (since JDK 1.8).
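For reference, here is the bucket entry type, abridged from the JDK 1.8 HashMap source (constructor and the Map.Entry methods are omitted here):

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;   // cached hash of the key
    final K key;
    V value;
    Node<K,V> next;   // next entry in the same bucket's list
}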

Why convert to a red-black tree at all?

As we all know, searching a linked list means traversing from the head node, which is O(N). A red-black tree, being a balanced binary search tree, finds an element in O(logN). So when a bucket holds many elements, switching to a red-black tree improves lookup efficiency.

Since red-black trees are so efficient, why not use red-black trees for storage in the first place? The JDK source code explains this:

* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins.  

A TreeNode takes up roughly twice as much space as a regular Node, so a bucket is converted to a tree only when it contains enough nodes to be worth it. “Enough” is defined by TREEIFY_THRESHOLD (default 8). When the number of nodes in a bucket drops because of removal or resizing, the red-black tree is converted back into a regular linked list; that threshold is UNTREEIFY_THRESHOLD (default 6).

/**
* The bin count threshold for using a tree rather than list for a
* bin.  Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;

/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;

Seeing this, it’s not hard to understand: although a red-black tree offers better query efficiency than a linked list, its nodes take up more space. Treeification only makes sense once a bucket reaches a certain size, and that is a trade-off between time and space.

Why the treeification threshold is eight

As for why the treeification threshold is 8, there is a long comment in the source code right after the one quoted above, and we can find the answer there:

* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
* more: less than 1 in ten million

If hashCodes are well distributed, red-black trees are rarely used, because the values spread evenly and the lists rarely grow long. Under ideal conditions the linked list length follows a Poisson distribution, with the probability of each length decreasing in turn. The comment gives the exact probabilities for lengths 0 through 8: at length 8 the probability is just 0.00000006, so small that treeification will almost never happen in an ordinary HashMap. After all, we don’t usually store that much data in one; would you put millions of entries in a HashMap?
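To verify those numbers, here is a minimal sketch of my own (not from the JDK) that reproduces the table by computing the Poisson probability exp(-0.5) * pow(0.5, k) / factorial(k) for k = 0 to 8:

public class PoissonCheck {
    public static void main(String[] args) {
        double lambda = 0.5;   // average bucket load at the default 0.75 load factor
        double factorial = 1;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;  // k! built up incrementally
            double p = Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            System.out.printf("%d: %.8f%n", k, p);
        }
    }
}

Running it prints the same values as the comment, ending with 8: 0.00000006.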

Of course, that’s the ideal case. If some user’s hashCode distribution turns out to be badly dispersed, converting to a red-black tree is a good fallback strategy.

As for what circumstances lead to such a scenario, you can think it through yourself or look it up online; I won’t go through all of them here.

In a HashMap, it is the object’s hashCode that determines which bucket the object falls into. The JDK cannot stop users from implementing their own hash algorithm, so if a user overrides hashCode with a poorly implemented algorithm, the HashMap’s linked lists are likely to become very long. Like this:

import java.util.HashMap;
import java.util.Map;

public class HashMapTest {
    public static void main(String[] args) {
        Map<User, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new User("user" + i), i);
        }
    }

    static class User {
        private String name;

        public User(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 1; // every User hashes to the same bucket
        }
    }
}

We designed a User class whose hashCode is always 1, so every User object stored in the HashMap lands in the same bucket. Queries would undoubtedly be very inefficient, and this is one reason HashMap converts long linked lists into red-black trees: it effectively protects against a badly implemented hash algorithm producing a very long list.
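For contrast, here is a minimal sketch of a better-behaved User (my addition, not part of the original example): hashCode is derived from the same field as equals(), so distinct names spread across different buckets.

import java.util.Objects;

class User {
    private final String name;

    User(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof User && Objects.equals(name, ((User) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name); // varies with name, unlike the constant 1 above
    }
}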

The hash method

Speaking of hashing algorithms, let’s expand on one more thing, which I think is one of the really cool designs in HashMap.

In the HashMap source, the hash of a stored key is computed by the hash() method, one of HashMap’s core functions. When data is stored, the key is passed into hash() to compute its hash value, which determines where the key lands in the bucket array. Here is the method’s source:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

As the code shows, hash() takes the key’s hashCode, XORs it with itself shifted right by 16 bits (unsigned), and returns the result as the key’s hash. This is done to reduce collisions: the bucket index is computed from only the low-order bits of the hash, so hashCodes that differ only in their high bits would always collide if left untreated.
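A quick sketch (mine, not JDK code) of why the mixing matters: two hashCodes that differ only in their high 16 bits would land in the same bucket of a default 16-slot table unless the high bits are XORed in.

public class SpreadDemo {
    static int spread(int h) {
        return h ^ (h >>> 16); // the same mixing step as HashMap.hash()
    }

    public static void main(String[] args) {
        int mask = 16 - 1;       // default table size is 16
        int h1 = 0x00010001;     // differ from h2 only in the high half
        int h2 = 0x00020001;
        System.out.println(mask & h1);          // 1
        System.out.println(mask & h2);          // 1 -> same bucket
        System.out.println(mask & spread(h1));  // 0
        System.out.println(mask & spread(h2));  // 3 -> spread apart
    }
}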

Besides the 16-bit shift, when adding an element HashMap also ANDs the hash with table.length - 1, where table.length is the size of the bucket array. The result is the array index of the bucket (and its linked list) the element belongs to. The source looks like this:

// n = tab.length
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

If the bucket at that index is empty, a new node is created as the head of its linked list. But why i = (n - 1) & hash?

This is actually an optimization. Because the array size is always a power of two, after an expansion the new index of an element is either its original index or its original index plus the capacity before the expansion. The neat thing about the & operation is that it only looks at the significant bits of n - 1 (where n is the array length). When the table doubles, n gains exactly one more significant bit (which is why the array length must always stay a power of two). So the new index can be determined by checking whether the hash has a 0 or a 1 at the position of that new bit: if 0, the index is unchanged; if 1, the new index is the old index plus the old capacity.

[Diagram from the original post: elements redistributing between index i and i + oldCap after resizing]
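A small sketch of my own (not JDK code) makes the rule concrete: these hashes all share the same low four bits, so in a 16-slot table they collide at index 5; after doubling to 32 slots, each either stays at 5 or moves to 5 + 16.

public class ResizeDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53}; // 5 + k * 16
        for (int h : hashes) {
            int oldIdx = (oldCap - 1) & h;
            int newIdx = (newCap - 1) & h;
            // (h & oldCap) == 0 -> index unchanged, else index + oldCap
            System.out.printf("hash=%d old=%d new=%d%n", h, oldIdx, newIdx);
        }
    }
}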

Thanks to this trick, there is no need to recalculate every element’s position from scratch on each expansion, which saves a lot of time. And since the new significant bit is effectively random (0 or 1), two entries that collided before the expansion may be spread apart by it, giving a better-dispersed distribution. You have to admire the designers’ skill: a few lines of seemingly simple code contain this much thought.

Why is the threshold for degradation to a linked list 6

As mentioned above, a linked list becomes a red-black tree when its length reaches the threshold of 8, but the tree degenerates back into a linked list only at a threshold of 6. Why not degenerate as soon as the length drops below 8, say at 7, instead of waiting until it is 6 or less?

The gap exists mainly to avoid frequent conversions between linked list and red-black tree. If the degeneration threshold were 7, deleting one element could untreeify a bucket and adding one back would treeify it again; converting the structure back and forth like that would undoubtedly degrade performance, so the two thresholds are deliberately kept apart.

Final thoughts

There are many more HashMap knowledge points. I strongly encourage you to read the source code, not just for interviews but to gain a clearer understanding and use HashMap better; it is so common that using it badly easily produces bugs. I also think the JDK source has a lot worth studying in depth. Take HashMap: the amount of code is small, yet it is very efficient, and most importantly it keeps being optimized in every release, with every line carefully crafted. Reading it, I kept thinking: if I could write code this good, would getting into a company like JD.com even be a question?

One last thing

Welcome to follow my WeChat official account [calm as code], where plenty of Java articles and curated learning materials will be posted.

If you think it’s written well, click a “like” and add a follow! Point attention, do not get lost, continue to update!!