Preface

In last week's technical sharing session, a colleague walked through the HashMap source code and touched on why several of its constants are defined the way they are. This article explains the reasoning behind those constants; I hope you find something useful in it.

Why is the default initial capacity of HashMap 1 << 4 (16)

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; 

The default initial capacity of HashMap is 16: a power of two, and specifically 16 rather than 8 or 32.

Why is the default initial capacity defined as a power of two?

 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);

We know that the underlying data structure of HashMap is an array of buckets, each holding a linked list (or, since JDK 8, a red-black tree). From the method above, the bucket index is computed as i = (n - 1) & hash. When the table size n is a power of two, (n - 1) & hash is equivalent to hash % n. So why not use the modulo operator directly?

  • Because bitwise AND (&) is more efficient than modulo (%).
  • a % b has to be evaluated via a division, roughly a - (a / b) * b.
  • A bitwise AND is a single, cheap machine instruction.

Therefore, the default capacity is defined as a power of two so that buckets can be located with the faster AND operation.
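To make the equivalence concrete, here is a small standalone demo (my own illustration, not part of the HashMap source) comparing (n - 1) & hash with hash % n for a power-of-two table size; the two agree for any non-negative hash.

public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;                              // a power of two, like DEFAULT_INITIAL_CAPACITY
        int[] hashes = {1, 17, 33, 12345, Integer.MAX_VALUE};
        for (int hash : hashes) {
            int byAnd = (n - 1) & hash;          // the index computation used in putVal
            int byMod = hash % n;                // the equivalent modulo for non-negative hashes
            System.out.println(hash + " -> and: " + byAnd + ", mod: " + byMod);
        }
    }
}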

Why is the default initial capacity 16 and not 8 or 32?

If it were too small, say 4 or 8, the map would expand more frequently; if it were too large, say 32 or 64 or more, it would waste memory.

As an analogy, suppose you run a coffee shop for couples. Usually seven or eight couples come in for coffee, and at peak times there are only about ten. Setting up 8 tables is a sensible default, and you can add more if it ever gets crowded. With only 4 tables you would constantly be short of seats and adding tables; with 10 or more, much of the space would sit idle.

Why is the default load factor 0.75

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

The load factor describes how full the hash table is allowed to become before it is resized. Why is it not 0.5 or 1?

If it were 0.5, expansion would start when the table is only half full, which leads to frequent resizing and poor space utilization. If it were 1, the table would not expand until it is completely full; space utilization improves, but the chance of hash collisions rises sharply. Take a look at the explanation in the source documentation:

 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.

Translation:

As a general rule, the default load factor (0.75) offers a good trade-off between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

In short, a load factor of 0.75 is the outcome of a trade-off between collision probability and space utilization, and it is also an empirical value settled on through experience.
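To make the resize point concrete, here is a tiny standalone sketch (the class name is my own; it simply mirrors the fact that HashMap resizes once its size exceeds capacity * loadFactor):

public class LoadFactorDemo {
    public static void main(String[] args) {
        float loadFactor = 0.75f;                    // same value as DEFAULT_LOAD_FACTOR
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            int threshold = (int) (capacity * loadFactor);
            System.out.println("capacity " + capacity + " -> resize after " + threshold + " entries");
        }
    }
}

With the default settings, a 16-slot table is resized once it holds more than 12 entries.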

What is the significance of load factor in HashMap?

The answer linked in the references derives a theoretical value of about log 2 ≈ 0.693 for this trade-off. In the end 0.75 was chosen, perhaps because it is a convenient round number close to 0.693, and it makes the default threshold an integer: 16 * 0.75 = 12.
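As a rough sketch of where the 0.693 figure comes from (my restatement of the usual argument, not a quote from that answer): with n buckets and m uniformly distributed entries, a given bucket is still empty with probability (1 - 1/n)^m. Asking at what fill level a bucket becomes as likely occupied as not gives:

\[
  \left(1 - \tfrac{1}{n}\right)^{m} = \tfrac{1}{2}
  \quad\Longrightarrow\quad
  m \approx n \ln 2
  \quad\Longrightarrow\quad
  \frac{m}{n} \approx \ln 2 \approx 0.693
\]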

Why is the threshold for converting a linked list to a red-black tree 8

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes.  The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

Red-black trees were introduced into the underlying data structure of HashMap in JDK 8. When an element is added to a bucket whose linked list already holds 8 nodes, the list is converted to a red-black tree. So why is the threshold 8? Take a look at the HashMap source code:

 * Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million

In other words: ideally, with random hash codes and the default load factor of 0.75, the number of nodes in a bucket follows a Poisson distribution with a parameter of about 0.5, although resizing granularity produces a large variance.

From the table above, the probability that a bucket's list reaches 8 elements is vanishingly small (about 6 in 100 million), which is why 8 was chosen as the treeification threshold.
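The probabilities in that comment can be recomputed directly from the Poisson formula P(k) = exp(-0.5) * 0.5^k / k!. A small standalone sketch (illustrative only, not part of the JDK):

public class PoissonDemo {
    public static void main(String[] args) {
        double lambda = 0.5;
        double p = Math.exp(-lambda);        // P(0) = e^-0.5
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p = p * lambda / (k + 1);        // P(k+1) = P(k) * lambda / (k + 1)
        }
    }
}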

Why is the threshold for converting a tree back to a list 6

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation.  Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

From the previous section we know the treeification threshold is 8, so why is a tree converted back to a list at 6 rather than 7? The gap is there to prevent frequent conversions between lists and trees. If the threshold were 7, a HashMap that keeps inserting and deleting elements, with a bucket's node count hovering around 8, would constantly flip between a linked list and a tree, which is very inefficient.
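A hypothetical illustration of this hysteresis (the class and the flip logic are simplified for demonstration; the real conversions happen inside putVal, resize and removal):

public class HysteresisDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;

    public static void main(String[] args) {
        boolean isTree = false;
        int[] binSizes = {7, 8, 7, 8, 7, 6, 5, 8};       // bucket size after each insert/remove
        for (int size : binSizes) {
            if (!isTree && size >= TREEIFY_THRESHOLD) isTree = true;          // grow into a tree at 8
            else if (isTree && size <= UNTREEIFY_THRESHOLD) isTree = false;   // shrink back to a list at 6
            System.out.println("size " + size + " -> " + (isTree ? "tree" : "list"));
        }
    }
}

Because the two thresholds differ by 2, a bucket hovering at 7 or 8 nodes stays a tree instead of converting on every operation.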

Why is the maximum capacity 1 << 30

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

Why must the capacity of a HashMap be a power of two?

As analyzed in the first section (why the default initial capacity of HashMap is 1 << 4), the capacity of a HashMap must be a power of two so that buckets can be located with a bitwise AND, which is faster than modulo; (n - 1) & hash only equals hash % n when n is a power of two.

tab[i = (n - 1) & hash]
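For reference, here is a standalone helper written in the spirit of HashMap's internal tableSizeFor method (a sketch under that assumption, not a verbatim copy): it rounds a requested capacity up to the next power of two and caps the result at 1 << 30.

public class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smears the highest set bit of (cap - 1) downwards, so adding 1 yields a power of two.
    static int roundUpToPowerOfTwo(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(17));            // 32
        System.out.println(roundUpToPowerOfTwo(16));            // 16
        System.out.println(roundUpToPowerOfTwo((1 << 30) + 1)); // capped at 1073741824
    }
}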

Why not 2 to the 31st?

We know that an int is four bytes and a byte is eight bits, so an int is a 32-bit integer. Logically, then, the largest power of two you could form is 1 shifted left by 31 places, i.e. 2^31. So why isn't the maximum capacity 2^31?

Because the leftmost bit of a (two's-complement) binary integer is the sign bit, used to distinguish positive from negative values, so 1 << 31 overflows into it. Let's look at the demo code:

        System.out.println(1<<30);   // 1073741824
        System.out.println(1<<31);   // -2147483648: the bit lands in the sign bit (Integer.MIN_VALUE)
        System.out.println(1<<32);   // 1: int shift distances are taken mod 32, so this is 1<<0
        System.out.println(1<<33);   // 2: effectively 1<<1
        System.out.println(1<<34);   // 4: effectively 1<<2

Output:

1073741824
-2147483648
1
2
4

Therefore, the maximum capacity of a HashMap is 1 << 30.

Why is the minimum table capacity for treeification 64

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

This is because when the table capacity is below 64, hash collisions are relatively likely, so long lists tend to appear simply because the table is small. For lists that grow long for that reason, expansion should take priority: resizing spreads the entries across more buckets and avoids an unnecessary conversion to a red-black tree.
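As a simplified, standalone illustration of how the thresholds interact (hypothetical helper names; this is not the JDK's actual treeifyBin code): when a bucket grows long, HashMap only treeifies it if the table already has at least 64 slots, otherwise it resizes.

public class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    static String decide(int tableCapacity, int binLength) {
        if (binLength < TREEIFY_THRESHOLD) return "keep linked list";
        if (tableCapacity < MIN_TREEIFY_CAPACITY) return "resize the table";
        return "treeify the bin";
    }

    public static void main(String[] args) {
        System.out.println(decide(16, 9));   // resize the table
        System.out.println(decide(64, 9));   // treeify the bin
        System.out.println(decide(64, 5));   // keep linked list
    }
}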

References and thanks

  • Why is the loadFactor of HashMap 0.75?
  • Why is the loading factor in Java Hashmap 0.75 by default
  • Why is a HashMap with a list length greater than 8 converted to a red-black tree
  • What is the significance of load factor in HashMap?
  • Why is the maximum size of a HashMap 2 to the 30th power
  • Java programmers should understand the Java8 HashMap

Personal public account

  • If you enjoy learning, you are welcome to follow my public account and study and discuss with me.
  • If you think anything in this article is wrong, feel free to leave a comment, or follow my public account and message me privately, and we can learn and improve together.