Set the initial capacity of the HashMap

Setting the initial capacity of the HashMap is just the beginning of optimization.

In order to reduce the overhead of HashMap resize (resize), Java programmers should set an initial capacity for HashMap. For example, like this code in my relationship with (zi) and (ji) :

@Test
public void longLongAGo(a) {
    int count = 1000000;

    System.out.println("---------------- do not set initial hashMap capacity ------------");
    long start = System.currentTimeMillis();
    HashMap<Integer, Object> map = new HashMap<>();
    for (int i = 0; i < count; i++) {
        map.put(i, UUID.randomUUID());
    }
    long end = System.currentTimeMillis();
    System.out.println("Time to add 1,000,000 elements: + (end - start));

    System.out.println("-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- setting a hashMap initial capacity -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --");
    long start1 = System.currentTimeMillis();
    HashMap<Integer, Object> map1 = new HashMap<>(count);
    for (int i = 0; i < count; i++) {
        map1.put(i, UUID.randomUUID());
    }
    long end1 = System.currentTimeMillis();
    System.out.println("Time to add 1,000,000 elements: + (end1 - start1));
}
Copy the code

My colleague said that he set the size of the map during initialization, so that it would not automatically expand during the process of adding elements, which greatly improved the performance, and it did!

Therefore, when a collection is initialized, specifying the size of its initial value can improve performance.

However, with a skeptical attitude, I compared the expansion times of hashMap when the initial capacity is set and not set. When the initial capacity is set to 1000000, the container is not not expanded as imagined, but also expanded once:

@SneakyThrows
@Test
public void testing(a) {
    int count = 1000000;

    System.out.println("-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- the initialization hashMap capacity of 1000000 -- -- -- -- -- -- -- -- -- -- -- --");
    int resizeCount = 0;
    HashMap<Integer, Object> map = new HashMap<>(count);
    Method capacityMethod = map.getClass().getDeclaredMethod("capacity");
    capacityMethod.setAccessible(true);
    int capacity = (int) capacityMethod.invoke(map);
    System.out.println("Initial capacity:" + capacity);
    for (int i = 0; i < count; i++) {
        map.put(i, UUID.randomUUID());
        int curCapacity = (int) capacityMethod.invoke(map);
        if (curCapacity > capacity) {
            System.out.println("Current capacity:" + curCapacity);
            resizeCount++;
            capacity = curCapacity;
        }
    }
    System.out.println("HashMap expansion times:" + resizeCount);

    System.out.println("-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - do not initialize the hashMap capacity -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --");
    resizeCount = 0;
    HashMap<Integer, Object> map1 = new HashMap<>();
    Method capacityMethod1 = map1.getClass().getDeclaredMethod("capacity");
    capacityMethod1.setAccessible(true);
    int capacity1 = (int) capacityMethod1.invoke(map1);
    System.out.println("Initial capacity:" + capacity1);
    for (int i = 0; i < count; i++) {
        map1.put(i, UUID.randomUUID());
        int curCapacity = (int) capacityMethod1.invoke(map1);
        if (curCapacity > capacity1) {
            System.out.println("Current capacity:" + curCapacity);
            resizeCount++;
            capacity1 = curCapacity;
        }
    }
    System.out.println("Capacity expansion times:" + resizeCount);
}
Copy the code

Since we cannot call the capacity() method of the hashMap directly, we use reflection to monitor the number of times the hashMap expands by looking at its capacity change each time an element is added.

// Using reflection, call the capacity() method of hashMap
Method capacityMethod = map.getClass().getDeclaredMethod("capacity");
capacityMethod.setAccessible(true);
int capacity = (int) capacityMethod.invoke(map);
Copy the code

To get a sense of how reflection works, welcome to one of Java’s most powerful technologies: reflection.

Back to the execution result of the above program:

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - initializes the hashMap capacity of 1000000 -- -- -- -- -- -- -- -- -- -- -- -- initial capacity: 1048576 current capacity: 2097152 hashMap expansion number: 1 ---------------- Uninitialized hashMap capacity ------------------- Initial capacity: 16 Current capacity: 32 Current capacity: 64 Current capacity: 128 Current capacity: 256 Current capacity: 512 Current capacity: 1024 Current capacity: 2048 Current Capacity: 4096 Current Capacity: 8192 Current Capacity: 16384 Current Capacity: 32768 Current capacity: 65536 Current capacity: 131072 Current capacity: 262144 Current capacity: 524288 Current capacity: 1048576 Current capacity: None. 2097152 Capacity Expansion times: 17Copy the code

Through the running results, it is found that:

  • The initial size of the hashMap is not 1000000 as I specified, but 1048576 (2^20)
  • The capacity of a hashMap is not fixed. It will be expanded from 16 to 32, 64, 128… (Hash selects the first power of 2 greater than the current capacity as the capacity)
  • Even if the initial capacity is specified and the initial capacity is 1048576, the hashMap is expanded once after the execution is complete by adding 1000000 elements (1000000 is less than 1048576)

Why is that? With these three findings in mind, take a look at the HashMap scaling mechanism.

Expansion mechanism of HashMap

Let’s take a look at a few member variables of the HashMap:

  • DEFAULT_INITIAL_CAPACITY: The default initial capacity is 2^4=16
  • DEFAULT_LOAD_FACTOR: The default load factor is 0.75, which measures how full the HashMap is
  • Transient int size: Number of K and V pairs in map
  • Final float loadFactor: loadFactor, which defaults to 0.75
  • Int threshold: the next size value to be resized (capacity x load factor). When the number of k and V exceeds the threshold, the HashMap will expand the capacity

Capacity () :

final int capacity(a) {
    return(table ! =null)? table.length : (threshold >0)? threshold : DEFAULT_INITIAL_CAPACITY; }Copy the code

What is this? Didn’t we already define a size variable?

Capacity can be thought of as the volume of the HashMap bucket (which can grow), and size is how many things the bucket currently holds.

The capacity of a bucket is defined by threshold, and the default capacity is 2 to the fourth power, i.e. 16.

/** * The default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
Copy the code

1 < 4 means we moved 4 places to the left, which is 2^4=16.

So when does it expand? When the number of pairs of k and V in the bucket is approaching capacity, the bucket will be expanded.

As shown in the previous example, the hashMap does not wait until the size reaches capacity. Instead, the hashMap is expanded when a certain value of Capacity is reached. This value is threshold.

Part of the source code has been folded, mainly display and capacity related parts.

When size increases to a value greater than threshold, hashMap resize(), and threshold = loadFactor * capacity, so that we know when hashMap bucket automatically increases its size.

Really avoid HashMap expansion

According to the previous analysis, when size > threshold, hashMap will be expanded. Using the formula threshold = loadFactor * capacity, we will have a direction during initialization.

LoadFactor * capacity loadFactor * capacity loadFactor * capacity loadFactor * capacity loadFactor * capacity loadFactor * capacity loadFactor * capacity loadFactor * capacity

int initCapacity = 1 + (int) (count / 0.75);
HashMap<Integer, Object> map = new HashMap<>(initCapacity);
Copy the code

1 + (int) (count / 0.75)

/** * Returns a power of two size for the given target capacity. */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code

This piece of code is out of this world! Its purpose is: according to the incoming capacity value CAP, through a series of fairy operations calculation, get the first power of 2 larger than it and return.

These are binary bit operations that shift the number to the right and then sum with the original value. You can just take any number and plug it into the code, and you get the first power of 2 that’s higher!

Why do so, perhaps it is because the unsigned right shift > > >, or | is fast!

results

The formula for calculating capacity has been worked out before, now let’s verify that it is correct:

@SneakyThrows
@Test
public void perfect(a) {
    int count = 1000000;

    int initCapacity = 1 + (int) (count / 0.75);
    HashMap<Integer, Object> map = new HashMap<>(initCapacity);
    Method capacityMethod = map.getClass().getDeclaredMethod("capacity");
    capacityMethod.setAccessible(true);
    int capacity = (int) capacityMethod.invoke(map);
    System.out.println("jdk hashMap default capacity:" + capacity);
    int resizeCount = 0;
    for (int i = 0; i < count; i++) {
        map.put(i, UUID.randomUUID());
        int curCapacity = (int) capacityMethod.invoke(map);
        if (curCapacity > capacity) {
            System.out.println("Current capacity:" + curCapacity);
            resizeCount++;
            capacity = curCapacity;
        }
    }
    System.out.println("HashMap expansion times:" + resizeCount);
Copy the code

Running results:

The number of expansion times is 0, Perfect!

TableSizeFor = 2097152=2^21 (tableSizeFor)

Do not want to calculate the initial capacity – there is still another way

Guava is an open-source Java library that contains many of the core libraries that Google is using for many of their projects. This library is designed to facilitate coding and reduce coding errors. This library provides utility methods for collections, caching, support primitives, concurrency, common annotations, string handling, I/O, and validation.

Guava has a way of initializing a HashMap that doesn’t require us to calculate initCapacity, so let’s test it out.

First introduce the Guava package:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>29.0-jre</version>
</dependency>
Copy the code

Testing:

@SneakyThrows
@Test
public void perfectWithGuava(a) {
    int count = 1000000;

    HashMap<Integer, Object> map = Maps.newHashMapWithExpectedSize(count);
    Method capacityMethod = map.getClass().getDeclaredMethod("capacity");
    capacityMethod.setAccessible(true);
    int capacity = (int) capacityMethod.invoke(map);
    System.out.println("guava hashMap default capacity:" + capacity);
    int resizeCount = 0;
    for (int i = 0; i < count; i++) {
        map.put(i, UUID.randomUUID());
        int curCapacity = (int) capacityMethod.invoke(map);
        if (curCapacity > capacity) {
            System.out.println("Current capacity:" + curCapacity);
            resizeCount++;
            capacity = curCapacity;
        }
    }
    System.out.println("HashMap expansion times:" + resizeCount);
}
Copy the code

Running results:

HashMap can also be used without expansion!

Take a look at the key code:

. = Maps.newHashMapWithExpectedSize(count);Copy the code

I guess this newHashMapWithExpectedSize (int) in the source must be in accordance with the return similar to HashMap (n < 0)? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1; This method is calculated, let’s see:

Congratulations! You got it all right!

summary

  • A hashMap with an initial capacity set. The actual initial capacity is not necessarily the specified value, but is calculated internally in the hashMap
  • The capacity of a hashMap is not fixed. It will be expanded from 16 to 32, 64, 128… (Hash selects the first power of 2 greater than the current capacity as the capacity)
  • Do not assume that if you specify the initial capacity, the hashMap will not be expanded
  • The way to avoid hashMap scaling is to pass in one1 + (int) (count / 0.75)The initial value calculated
  • You can also use Guava’snewHashMapWithExpectedSize(int count)

Recommended reading

  • 【 Elegant escape pit 】 Your money is wrong! Why is 0.1+0.2 not equal to 0.3! ?
  • Do not use == to compare two Integer values
  • From optimization of captcha generation code to JVM stacks and heaps
  • One of Java’s most powerful technologies: reflection

Your praise, my motivation!

Wechat search public account line 100 li ER, get more articles.

Github: github.com/xblzer/Java…