“This is the third day of my participation in the November Gwen Challenge. See details of the event: The last Gwen Challenge 2021”.

You know that an ArrayList is an array, and the data is limited in length; You need to expand the array at the right time.

When we initialize an ArrayList of length 2 and write three pieces of data into it, the ArrayList has to expand, that is, copy the old data into a new array of length 3.

Here is the expanded source code, the reason is 3, because the new length = the old length * 1.5.

private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    // The new capacity is 1.5 times the old capacity
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    // If the new capacity is found to be smaller than the required capacity, the required capacity prevails
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // If the new capacity exceeds the maximum capacity, use the maximum capacity
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // Copy a new array with the new capacity
    elementData = Arrays.copyOf(elementData, newCapacity);
}
Copy the code

The default length of an ArrayList is 10.

/** * Default capacity */
private static final int DEFAULT_CAPACITY = 10;
Copy the code

The array DEFAULT_CAPACITY = 10 is not created at initialization. Instead, it expands to 10 when you add the first data.

private static int calculateCapacity(Object[] elementData, int minCapacity) {
    // If the array DEFAULTCAPACITY_EMPTY_ELEMENTDATA is empty, initialize to default size 10
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    if (minCapacity - elementData.length > 0)
        / / capacity
        grow(minCapacity);
}
Copy the code

Now that we know that the default length is 10, it will be expanded to 10 * 1.5 = 15 once the ninth element is written. This step is array copy, that is, to create a new memory space to hold the 15 arrays. Once we write frequently and in large numbers, we can cause a lot of array copying, which is extremely inefficient.

But we can avoid this problem in advance if we anticipate how many pieces of data are likely to be written.

For example, if we write 1000W pieces of data into it, there is a huge difference in performance between the array size given at initialization and the default size of 10.

Verify this with the JMH benchmark.

Referring to JMH, it is estimated that some partners do not know, the following first to explain JMH.

JMH is an abbreviation of Java Microbenchmark Harness. The Chinese translation is roughly “JAVA microbenchmark suite”. Start by understanding what a “benchmark” is.

Baidu Encyclopedia gives the following definition:

Benchmark testing refers to the quantitative and comparable testing of a certain performance index of a class of test objects through the design of scientific testing methods, testing tools and testing systems.

It can be simply likened to master Lu, which is commonly used in our computers, or performance testing software such as An Tutu, which is commonly used in mobile phones. They test the performance of an object, such as a graphics card, IO, CPU, etc., against a certain benchmark or under certain conditions.

Why use JMH

Benchmarks have the following characteristics:

Repeatability: Repeatable tests can be performed to compare the test results and obtain the long-term performance trend, which can be used as a reference for system tuning and capacity planning before going online.

Observability: Understand and analyze what is happening in the test process through comprehensive monitoring (from start to end of the test, execution machine, server, database).

Displayability: Relevant personnel can intuitively understand the test results (in the form of Web interface, dashboard, line chart tree chart, etc.).

Authenticity: The test results reflect the real situation experienced by the customer (real and accurate business scenario + consistent configuration with production + reasonable and correct test method).

Executable: It can be quickly tested, validated, and tuned (locatable and analyzed).

So it’s very tedious and very difficult to do a benchmark test that matches a trait. External factors can easily affect the final test results. Especially for Benchmarks in JAVA.

You can get different results depending on how many times you run and how long you run, so it’s hard to get a stable result.

One solution to this is a lot of repeated calls, and some warm-up before the actual test to make the results as accurate as possible.

In addition to this, we need a good presentation of the results that we can use to judge performance.

And JMH has all of these! 😊

How do I use JMH

JMH comes with JDK9, if you are an earlier version of JDK9 you can also import openJDK

<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.19</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.19</version>
</dependency>
Copy the code

As an example, there is a huge difference in performance between an ArrayList being initialized with a given array length and using the default length of 10.

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
public class CollectionsTest
{
    private static final int TEN_MILLION = 10000000;

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void arrayList(a)
    {
        List<String> array = new ArrayList<>();
        for (int i = 0; i < TEN_MILLION; i++)
        {
            array.add("123"); }}@Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void arrayListSize(a)
    {
        List<String> array = new ArrayList<>(TEN_MILLION);
        for (int i = 0; i < TEN_MILLION; i++)
        {
            array.add("123"); }}public static void main(String[] args) throws RunnerException
    {
        Options opt = new OptionsBuilder()
                .include(CollectionsTest.class.getSimpleName())
                .forks(1)
                .build();
        newRunner(opt).run(); }}Copy the code

Running results:

# Run complete. Total time: 00:00:23

Benchmark                      Mode  Cnt      Score       Error  Units
CollectionsTest.arrayList      avgt    5  50264.850 ±  6723.299  us/op
CollectionsTest.arrayListSize  avgt    5  38389.625 ± 14797.446  us/op
Copy the code

The results show that the preset length is much more efficient than using the default length (Score here refers to the time it takes to complete the function).

So it’s highly recommended that you initialize the specified length when a lot of data is written to an ArrayList.

Do not use add(int index, E element) to write data to the specified location.

public void add(int index, E element) {
    // Check for overbounds
    rangeCheckForAdd(index);
    // Check whether the capacity needs to be expanded
    ensureCapacityInternal(size + 1);
    // Move inex and the element after it one bit back to leave the index position empty
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    // Insert the element at the index position
    elementData[index] = element;
    // The size increases by 1
    size++;
}
Copy the code

We can see from the source code, each write will be index after the data moved back again, in fact, is also to copy the array; But instead of writing data to the end of an array, it copies the array every time, which is extremely inefficient.

summary

High-performance applications are built up little by little, like the pit of ArrayList mentioned here. Everyday use is not a big problem, but once the amount of data is added, all the small problems become big problems.

  • When using ArrayList, if you can predict the size of the data in advance, you must specify the length if it is large.
  • Avoid using if possibleadd(index,e)API, which causes arrays to be copied, reducing efficiency.
  • One more thing, another one that we use a lotMapThe containerHashMapIt is also recommended to initialize the length to avoid expansion.