This post isn’t a spur-of-the-moment decision to write about HashMap. Recently I began re-learning data structures and algorithms, and along the way I gained a new understanding of HashMap and how to apply it. I had used HashMap in these ways before, but only after learning the underlying methodology did I see why they pay off.

The last time I wrote about HashMap was back before JDK8; now we are on JDK15. Let's take a look at the source and understand data structures through HashMap.

HashMap in JDK8

HashMap was optimized in several ways in JDK8 and later.

Hash method changes

Hash algorithm after JDK8:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Hash algorithm for JDK7:

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

As you can see, the JDK8 version uses a ternary operator and computes the hash in just two steps: one right shift and one XOR.

The JDK7 version performs four right shifts and four XOR perturbations, so JDK8 improves the performance of the hash computation.
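To see concretely why folding the high 16 bits into the low 16 bits matters, here is a small demonstration of my own (the class name HashDemo is made up, not from the JDK). HashMap computes a bucket index as (n - 1) & hash, so with a small table only the low bits of the hash participate unless they are perturbed first:

public class HashDemo {
    // Same perturbation as JDK8's HashMap.hash(): XOR the high 16 bits
    // of the hash code into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table capacity, a power of two
        // Two hash codes that differ only above bit 15:
        int h1 = 0x00010001;
        int h2 = 0x00020001;
        // Without perturbation both land in bucket 1 (a collision):
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2)); // 1 1
        // With perturbation the high bits now influence the index:
        System.out.println(((n - 1) & hash(h1)) + " "
                + ((n - 1) & hash(h2)));                           // 0 3
    }
}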

Changes to the storage data structure

Before JDK8, entries whose hashes collided on the same bucket were all stored in a linked list.

What changed in JDK8

When a single bucket accumulates 8 entries, its storage structure is converted from a linked list into a red-black tree. (Strictly, treeification in the source also requires the table capacity to be at least MIN_TREEIFY_CAPACITY, 64; below that, the table is resized instead.)

Then what if a bucket's entry count reaches 8 and it becomes a tree, and afterwards the map shrinks so that the bucket holds fewer than 8 entries: is the bucket's storage structure still a red-black tree?

  • A: It may still be a red-black tree, or it may degenerate back into a linked list, because the degradation threshold is not 8 but 6.

The HashMap source below shows that a red-black tree is converted back to a linked list only when the bucket holds 6 or fewer entries.

static final int TREEIFY_THRESHOLD = 8;

static final int UNTREEIFY_THRESHOLD = 6;

if (loHead != null) {
    if (lc <= UNTREEIFY_THRESHOLD)
        tab[index] = loHead.untreeify(map);
    else {
        tab[index] = loHead;
        if (hiHead != null) // (else is already treeified)
            loHead.treeify(tab);
    }
}

Why is the degradation threshold 6 instead of 8?

  • A: In terms of query efficiency, a linked list is O(n) while a red-black tree is O(log n). But with only a handful of entries the tree's depth advantage is negligible, so its query cost is effectively no better than the list's, and since a TreeNode occupies roughly twice the memory of an ordinary Node, the tree is strictly worse in space. As for why the degradation threshold (6) is lower than the treeify threshold (8): the gap prevents a bucket whose size oscillates around a single value from converting back and forth repeatedly. (This hysteresis trick from the source code is instructive and can be borrowed in our business code to avoid repeated conversions; a sketch follows below.)
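Here is a minimal sketch of my own of that hysteresis idea applied to business code (FastLookup and its thresholds are hypothetical, not from the JDK): a collection that upgrades from a list to a HashSet at one threshold and only downgrades at a lower one, so a size hovering around the boundary does not cause constant conversions.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FastLookup {
    private static final int UPGRADE_THRESHOLD = 8;   // list -> set
    private static final int DOWNGRADE_THRESHOLD = 6; // set -> list

    private List<String> list = new ArrayList<>();
    private Set<String> set = null; // non-null when in "set mode"

    public void add(String value) {
        if (set != null) {
            set.add(value);
        } else {
            list.add(value);
            if (list.size() >= UPGRADE_THRESHOLD) {
                set = new HashSet<>(list); // upgrade once, at 8
                list = null;
            }
        }
    }

    public void remove(String value) {
        if (set != null) {
            set.remove(value);
            // The gap between 8 and 6 prevents flip-flopping when the
            // size oscillates around a single boundary value.
            if (set.size() <= DOWNGRADE_THRESHOLD) {
                list = new ArrayList<>(set);
                set = null;
            }
        } else {
            list.remove(value);
        }
    }

    public boolean contains(String value) {
        return set != null ? set.contains(value) : list.contains(value);
    }
}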

During resizing, HashMap changed from head insertion to tail insertion

Why this change? In versions before JDK8, operating on a HashMap from multiple threads could put it into an infinite loop. The cause: during resizing, JDK7 used head insertion, placing each transferred entry at the head of its new bucket, which reverses the order of the entries in the list. With two threads resizing at the same time, that reversal can produce a circular linked list, and any later traversal of the bucket spins forever.

Since JDK8, resizing uses tail insertion instead, so the original order of entries in each list is preserved and the situation above can no longer occur.

But even so, HashMap is not thread-safe. It cannot guarantee that the value of the last put is what the next get returns, because put and get take no locks.
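A quick way to see this (a demo of my own; the exact outcome varies from run to run and machine to machine) is to let two threads put disjoint keys into a shared HashMap without synchronization and check the final size:

import java.util.HashMap;
import java.util.Map;

public class UnsafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();
        // Two threads each put 10,000 distinct keys with no locking.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) map.put(i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 10_000; i < 20_000; i++) map.put(i, i);
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 20000, but unsynchronized puts can lose entries,
        // so the printed size often comes up short.
        System.out.println(map.size());
    }
}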

Use HashMap correctly

We often use HashMaps to cache data in projects, and the Alibaba Java development manual states that when creating a HashMap you should specify its size, preferably a power of two.

Why a power of two?

  • A: With a power-of-two capacity n, the bucket index can be computed with a cheap bit operation, (n - 1) & hash, rather than the slower modulo, and the mask uses all the low bits of the hash, which reduces hash collisions and distributes data evenly (see the check below).
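A small check of my own (IndexDemo is a made-up name) showing that for a power-of-two capacity n and a non-negative hash h, the bit mask picks the same bucket as the modulo operator:

public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // power-of-two capacity; (n - 1) is 0b1111, a bit mask
        for (int h : new int[] { 1, 17, 33, 12345 }) {
            // Both expressions select the same bucket for non-negative h.
            System.out.println((h % n) + " == " + ((n - 1) & h));
        }
    }
}

Note that even if you pass a non-power-of-two initial capacity, HashMap rounds it up to the next power of two internally, so the mask trick always applies.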

Use HashMap to cache data

This is a commonly used optimization. The main thing to watch is the thread safety of HashMap in multi-threaded situations, where ConcurrentHashMap is recommended instead.
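As a minimal sketch (UserCache, User, and loadUser are placeholders of my own, not a prescribed API), a thread-safe cache might look like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UserCache {
    static class User {
        final long id;
        User(long id) { this.id = id; }
    }

    private final Map<Long, User> cache = new ConcurrentHashMap<>();

    // computeIfAbsent is atomic per key: concurrent callers asking for
    // the same id trigger a single load instead of racing put calls.
    public User get(long id) {
        return cache.computeIfAbsent(id, this::loadUser);
    }

    private User loadUser(long id) {
        // Placeholder: load from a DB or remote service in real code.
        return new User(id);
    }
}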

Use HashMap to reduce nested for loops

Let’s look at an algorithm:

Given the array a = [1, 2, 3, 4, 5, 5, 6], find the number that occurs most often.

The first solution:

public void s1() {
    int[] a = { 1, 2, 3, 4, 5, 5, 6 };
    int val_max = -1;
    int time_max = 0;
    int time_tmp = 0;
    for (int i = 0; i < a.length; i++) {
        time_tmp = 0;
        for (int j = 0; j < a.length; j++) {
            if (a[i] == a[j]) {
                time_tmp += 1;
            }
            if (time_tmp > time_max) {
                time_max = time_tmp;
                val_max = a[i];
            }
        }
    }
    System.out.println(val_max);
}

As you can see, this is the intuitive approach: a nested double loop over the array, for O(n²) time complexity.

We can use HashMap to record the number of occurrences of each element. The solution is as follows:

public void s2() {
    int[] a = { 1, 2, 3, 4, 5, 5, 6 };
    Map<Integer, Integer> d = new HashMap<>();
    for (int i = 0; i < a.length; i++) {
        if (d.containsKey(a[i])) {
            d.put(a[i], d.get(a[i]) + 1);
        } else {
            d.put(a[i], 1);
        }
    }
    int val_max = -1;
    int time_max = 0;
    for (Integer key : d.keySet()) {
        if (d.get(key) > time_max) {
            time_max = d.get(key);
            val_max = key;
        }
    }
    System.out.println(val_max);
}

The second solution also uses two for loops, but they are not nested, so it performs roughly 2n operations. Since constant coefficients are dropped from time complexity, the solution is O(n).

In most business scenarios with nested loops, this approach can be used to cut the time complexity. There is one precondition, though: the number of iterations of the outer or inner loop must be known to be small. If the data volume is too large, the HashMap may overflow memory; and with enough data, hash collisions occur, the storage structure becomes array + linked list or even red-black tree, and HashMap queries degrade to O(n) or O(log n).

HashMap rewards proper use, but don't overuse it. At a company I once worked for, someone read 100,000 records into a HashMap for conversion, consuming a large amount of memory; the process ran out of memory at around 40,000 records.

A HashMap is stored entirely in memory, so when using one it is important to avoid running out of it.

HashMap capacity and expansion

The HashMap expansion mechanism itself has not changed much.

This article doesn't dig into the JDK8 expansion mechanism; it mainly discusses the optimization points of the JDK8 HashMap and how to use HashMap to optimize our code.

There is one more question related to the expansion mechanism: why is the HashMap load factor 0.75?

  • A: Suppose the load factor were 1. A HashMap with the default capacity of 16 would expand only when all 16 slots of the table were occupied, so hash collisions would increase and the underlying linked lists and red-black trees would grow complex: time is sacrificed to save space. On the other hand, if the load factor is too small, the map expands prematurely: hash conflicts drop, but space is wasted, sacrificing space for time. 0.75 is a neutral trade-off between the two; the arithmetic is illustrated below.
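The resize threshold is capacity × load factor. A short illustration of my own of that arithmetic, and of how to size a map up front so it never resizes:

import java.util.HashMap;
import java.util.Map;

public class ThresholdDemo {
    public static void main(String[] args) {
        // Default capacity 16 with load factor 0.75: the resize
        // threshold is 16 * 0.75 = 12, so the 13th put doubles the table.
        Map<Integer, Integer> defaults = new HashMap<>();

        // To hold 100 entries with no resizing, request at least
        // 100 / 0.75 + 1 = 134 slots; HashMap rounds that up to the next
        // power of two (256), whose threshold 256 * 0.75 = 192 >= 100.
        Map<Integer, Integer> sized = new HashMap<>((int) (100 / 0.75f) + 1);

        System.out.println(defaults.size() + " " + sized.size()); // 0 0
    }
}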

Finally

Next time, we’ll look at how ConcurrentHashMap achieves thread safety, at other Map implementations, and at the special maps in Google's open-source Guava project.


Off-topic: I set up a personal blog years ago but had been putting off writing anything, which is a shame. So from now on I will commit to publishing two articles a week and push myself to keep learning.