(1) HashMap series: load factor 0.75

(2) HashMap series: treeify threshold 8, untreeify threshold 6

(3) HashMap series: power-of-two expansion

(4) HashMap series: putting elements (miss it and regret it forever!)

Red-black tree series:

(1) “Algorithm – Simple”: introduction to N-ary trees

(2) “Algorithm – Simple”: red-black tree rotations

(3) “Algorithm – Simple”: red-black tree insertion

(4) “Algorithm – Simple”: red-black tree deletion

1. Foreword

HashMap in JDK 7 differs from JDK 8 in the data structure used after a hash collision:

  • JDK 7: array + singly linked list;
  • JDK 8: array + singly linked list + red-black tree.

The topic of this article is the load factor (0.75), which applies to the table array: why 0.75?

2. The role of the load factor

In a HashMap, all entries are stored in an array named table whose element type is Node:

transient Node<K,V>[] table;

The initial capacity of the array is 16.
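
For reference, the relevant defaults look roughly like this in the JDK 8 source (a simplified excerpt); the resize threshold is derived from capacity * loadFactor:

// Simplified excerpt from java.util.HashMap (JDK 8).
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16 buckets
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// The table is resized once size exceeds this value:
// threshold = capacity * loadFactor (16 * 0.75 = 12 with the defaults).
int threshold;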

When hash(key) maps a new entry to a slot that is already occupied, the new node is appended after the existing one, forming a linked list.

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;   // cached hash of the key
    final K key;
    V value;
    Node<K,V> next;   // next node in the bucket's linked list (null if none)
}

Each Node starts with next set to null; only when a collision occurs does next point to a new Node that landed in the same bucket but has a different key. When the length of a bucket's list reaches 8, it is converted to a red-black tree; when the number of nodes in a tree drops below 6, it is converted back to a linked list.
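
These thresholds are constants in the JDK 8 source (again a simplified excerpt). Note that a bucket is only treeified when the table itself has at least 64 slots; below that, the table is resized instead:

static final int TREEIFY_THRESHOLD = 8;     // list -> tree when a bucket reaches this length
static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list when it shrinks below this during resize
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size before any bucket is treeified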

As more data is added to the table, collisions become more frequent, which hurts the HashMap's performance in both time and space. How so?

2.1. Load factor = 1.0

When the load factor is 1.0, the table is expanded only when it is completely full. As the number of elements grows, a large number of collisions occur:

  • If a bucket holds a linked list, lookups are O(n), i.e. linear; inserting or deleting a node also requires traversing the list.
  • If a bucket has already become a red-black tree, lookups are O(log n), but insertion and deletion are considerably more complex, because the tree must be rebalanced every time.

Thus, a load factor of 1.0 sacrifices time to maximize space utilization.
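
A minimal sketch of this behaviour, using the standard two-argument constructor (the capacity of 16 is just an example value): with a load factor of 1.0 the resize threshold equals the capacity, so all 16 slots can fill up, and collide, before any expansion happens.

import java.util.HashMap;

public class LoadFactorOne {
    public static void main(String[] args) {
        // 16 buckets, load factor 1.0: threshold = 16 * 1.0 = 16,
        // so no resize occurs until more than 16 entries are present.
        HashMap<String, Integer> map = new HashMap<>(16, 1.0f);
        for (int i = 0; i < 16; i++) {
            map.put("key-" + i, i); // entries pile into 16 buckets; collisions likely, no expansion
        }
    }
}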

2.2. Load factor = 0.5

When the load factor is 0.5, the table doubles in capacity as soon as the number of elements reaches half its size:

  • Hash collisions are reduced;
  • Lists stay short;
  • Even if a list grows past 8 and becomes a red-black tree, the tree is not very tall.

Query efficiency improves, but a lot of memory is wasted, because the number of elements in the array never exceeds half of its length. Thus, a load factor of 0.5 sacrifices space to ensure time efficiency.
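
The counterpart sketch (same caveat: the two-argument constructor is standard API, the numbers are only illustrative): with a load factor of 0.5 the table grows before it can become half full, trading space for short buckets.

import java.util.HashMap;

public class LoadFactorHalf {
    public static void main(String[] args) {
        // 16 buckets, load factor 0.5: threshold = 16 * 0.5 = 8,
        // so inserting the 9th entry already doubles the table to 32 buckets.
        HashMap<String, Integer> map = new HashMap<>(16, 0.5f);
        for (int i = 0; i < 9; i++) {
            map.put("key-" + i, i); // after the 9th put: 32 buckets, only 9 entries
        }
    }
}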

2.3. Load factor = 0.75

From the above analysis we can see that 0.5 and 1.0 are the two extremes of time and space respectively, so we need a balance point, which is a reasonable load factor value. The source code explains:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

A load factor of 0.75 strikes a good balance between time and space costs; a higher value would save space but increase the lookup cost.
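
A practical corollary of the last sentence in that comment: if you know roughly how many entries the map will hold, choose an initial capacity of at least expected / 0.75 and no rehash will ever occur. A minimal sketch (the expected count of 100 is just an example; HashMap rounds the requested capacity up to the next power of two internally):

import java.util.HashMap;

public class PresizedMap {
    public static void main(String[] args) {
        int expectedEntries = 100; // example value
        // Pick a capacity so that expectedEntries <= capacity * 0.75.
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1; // 134 here
        HashMap<String, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key-" + i, i); // never triggers a rehash
        }
    }
}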