
This article is packed with useful material for Java engineers who have spent a long time as a small cog in the machine and want to level up quickly. If anything is unclear or wrong, please point it out 👏

The story background

Our company runs JDK 1.7. We recently launched a new service, but it kept getting stuck. After some analysis, a terrible truth surfaced……

Haha, hello. I’m Fuyao. The headline is a little clickbait, but the content is solid. In what follows, I’ll take you through the Map collection and explain what caused the carnage.

When working with the Java collections framework, you will inevitably use Map. So here is the question: how deep is your understanding of Java’s Map collections?

What is a Map

If you want to know what a Map collection is, let me take you through it right now. 🤪

Time complexity

First of all, if you have taken computer science courses, you will have come across an academic term called time complexity.

In computer science, the time complexity of an algorithm is a function that quantitatively describes the running time of the algorithm as a function of the size of its input. Time complexity is commonly expressed in big O notation, which drops lower-order terms and leading constant coefficients. In this sense, time complexity is asymptotic: it describes the behavior as the input size grows toward infinity.

— An excerpt from Wikipedia

To put it bluntly, time complexity is a rough description of how long an algorithm takes to run from start to finish, expressed in big O notation (that is, T(n) = O(f(n))). Take a common loop as an example:

for (int i = 1; i <= n; i++) {   // first line: the initialization i = 1 runs once
    arr[i] = i;                  // second line: runs n times
}

How do we figure out the time complexity of this?

  • The first line of code is executed only once. It takes 1.
  • The second line of code is going to be executed n times. It takes n.

So the total cost is 1 plus n. Plugging this into the formula, the time complexity is T(n) = O(1 + n). Since the running time grows with n and the constant term becomes negligible, this simplifies to T(n) = O(n).

HashMap

HashMap is Java’s standard hash table implementation: a data structure whose average insertion, lookup, and deletion time is O(1).

Hash table structure

A hash table is implemented as an array of hash buckets plus linked lists.

What is a HashMap

In simple terms, the key you pass in is hashed, the hash is used to locate the corresponding bucket, and the value is stored there. If the bucket already holds other entries, they are chained together as a linked list, as shown in the figure above. A minimal sketch of this idea follows.
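
Here is a toy sketch of the bucket-plus-chain idea, just to make the mechanics concrete. It is not the real JDK implementation: there is no resizing, no null handling, and no treeification, and TinyHashMap is a made-up name.

class TinyHashMap<K, V> {

    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];   // 16 fixed buckets, no resizing

    public void put(K key, V value) {
        int i = (buckets.length - 1) & key.hashCode();                  // pick a bucket from the hash
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; }         // same key: overwrite the value
        }
        buckets[i] = new Node<>(key, value, buckets[i]);                // otherwise chain a new node at the head
    }

    public V get(K key) {
        int i = (buckets.length - 1) & key.hashCode();                  // same bucket calculation on lookup
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}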

How important is the hash table?

Because insertion, lookup, and deletion on this structure all have low time complexity, it is used everywhere. How important is it exactly?

Just look at Object, the ancestor of every class in the Java world.

Object has only a handful of methods, and two of them, hashCode() and equals(), exist specifically to support hash tables.

Therefore, it is very important to learn it well 👍

Equals & hashCode convention

First, let’s explain what the equals & hashCode convention is, which I covered in detail in the previous article. I’ll mention it briefly here.

  1. As long as the information used in equals comparisons is not modified, an object’s hashCode must consistently return the same value.

  2. If object A and object B are not equal according to equals, they are still allowed to return the same hashCode. However, this is not recommended, because it hurts the lookup performance of the internal hash table.

  3. If object A and object B are equal according to equals, then their hashCode calls must return the same value. (A reference implementation that satisfies all three clauses is sketched below.)
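
For reference, here is a sketch of an equals/hashCode pair that respects all three clauses. Person is just an illustrative class, not something from the article’s project.

import java.util.Objects;

class Person {

    private final String name;
    private final String nickName;

    Person(String name, String nickName) {
        this.name = name;
        this.nickName = nickName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Person p = (Person) o;
        // equals and hashCode use exactly the same fields...
        return Objects.equals(name, p.name) && Objects.equals(nickName, p.nickName);
    }

    @Override
    public int hashCode() {
        // ...so equal objects always produce equal hash codes
        return Objects.hash(name, nickName);
    }
}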

How do I compute the hash value

In the Java world, when you put data into a HashMap, the map calls the key’s hashCode method to compute its hash value. Normally nothing strange happens. But there is always someone who likes to get creative and ends up violating the contract above.

So what’s the problem with breaking the agreement?

Let’s look at them separately.

The mystery of the disappearing object

At this point, I remember a joke.

For a software engineer, losing an object is no big deal. You can always just new yourself another one.

— Bad joke (lest everyone fall asleep)

If you break clause 1 above, you will find that your object has mysteriously disappeared.

Example code:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class Scratch {

    private String name;
    private String nickName;

    public Scratch(String name, String nickName) {
        this.name = name;
        this.nickName = nickName;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setNickName(String nickName) {
        this.nickName = nickName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Scratch scratch = (Scratch) o;
        return Objects.equals(name, scratch.name) && Objects.equals(nickName, scratch.nickName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name + nickName);  // The wrong way to write 👎
    }

    public static void main(String[] args) {
        Scratch numberOne = new Scratch("Technician 1", "Ah Shui");
        Scratch numberTwo = new Scratch("Technician 2", "Ngawang");
        Map<Scratch, String> testMap = new HashMap<>();
        testMap.put(numberOne, "I'm Technician number one.");
        testMap.put(numberTwo, "I'm Technician number two.");
        System.out.println("------ speak now -------");
        System.out.println(testMap.get(numberOne));
        System.out.println(testMap.get(numberTwo));
        numberTwo.setNickName("Bill");
        System.out.println("------ now for the second round of talks -------"); System.out.println(testMap.get(numberOne)); System.out.println(testMap.get(numberTwo)); }}Copy the code

Guess what it prints back?

Unfortunately, technician number two is missing.

“Why”

HashMap calls the key’s hashCode method both when the entry is put in and when it is looked up. Because our hashCode depends on mutable fields and we modified one of those fields after insertion, the lookup computes a different hash value, lands in a different bucket, and can no longer find the entry we want. A stripped-down demo of the same failure follows.
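
To see the failure in isolation, here is a tiny, self-contained demo. MutableKey is a hypothetical class invented for the illustration, not part of the article’s project.

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class DisappearDemo {

    // A key class whose hashCode depends on a mutable field.
    static class MutableKey {
        String nickName;
        MutableKey(String nickName) { this.nickName = nickName; }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && Objects.equals(nickName, ((MutableKey) o).nickName);
        }

        @Override
        public int hashCode() { return Objects.hash(nickName); }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey("Ngawang");
        map.put(key, "hello");
        System.out.println(map.get(key)); // hello

        key.nickName = "Bill";            // the key's hash changes after insertion...
        System.out.println(map.get(key)); // ...so the lookup probes the wrong bucket and prints null
    }
}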

You’re too slow!

If you break clause 2, it’s not a big deal: the code still works. However, the performance benefit of the Map is lost, as the following code shows.

class Scratch {

    @Override
    public int hashCode() {
        return 1;  // The wrong way to write 👎
    }
}

If every object has a hashCode of 1, a hash collision occurs on every insertion and all entries fall into the same bucket. The HashMap then degenerates into a linked list, and the time complexity of a lookup rises to O(n).

As shown in the figure:
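
If you want to feel the slowdown yourself, here is a rough, machine-dependent experiment. BadKey is a hypothetical class; on JDK 1.7 the lookup walks one long chain, while JDK 1.8 softens the blow by turning long chains into red-black trees.

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {

    // A key class whose hashCode is always 1, so every instance lands in the same bucket.
    static class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) { return o instanceof BadKey && ((BadKey) o).id == id; }

        @Override
        public int hashCode() { return 1; } // every key collides
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i);      // each put walks the ever-growing chain
        }

        long start = System.nanoTime();
        map.get(new BadKey(9_999));         // on JDK 1.7 this walks a 10,000-node chain
        System.out.println((System.nanoTime() - start) / 1_000 + " µs");
    }
}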

I’m a duplicate

If you pull some other “magic trick”, such as two objects with identical content and identical hashCode values that nevertheless compare as unequal through equals, well, all I can say is: good luck.

import java.util.*;

class Scratch {

    private String name;
    private String nickName;

    public Scratch(String name, String nickName) {
        this.name = name;
        this.nickName = nickName;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setNickName(String nickName) {
        this.nickName = nickName;
    }

    @Override
    public boolean equals(Object o) {
        Scratch o1 = (Scratch) o;
        return (name + nickName).equals(o1.name + o1.nickName); 
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, nickName);
    }

    public static void main(String[] args) {
        Map<Scratch, String> testMap = new HashMap<>();
        Scratch numberOne = new Scratch("Technician 1", "Ah Shui");
        testMap.put(numberOne, "I'm Technician number one.");
        Scratch sameKey = new Scratch("Technician 1", "Ah Shui");
        testMap.put(sameKey, "I'm Technician number two.");
        System.out.println(testMap);
    }
}

It runs as follows:

At this point the result is exactly what we want: the two keys are equal and hash to the same bucket, so the second put simply updates the value. But! Once I modify the code, you will discover a surprising secret……

    @Override
    public boolean equals(Object o) {
        Scratch o1 = (Scratch) o;
        return (name + nickName + System.currentTimeMillis()).equals(o1.name + o1.nickName); // The wrong way to write 🙅
    }

It runs as follows:

What’s going on? Why is there an extra element? Let me answer that question. (Conan theme plays in the background)

The following is the procedure for adding an element to a HashMap:

  1. Call the key’s hashCode method to get its hash value, then locate the bucket.
  2. If there is no element at that location, the entry is simply stored.
  3. If there is already an element at that location, the two nodes are compared with == and equals; if they represent the same key, the value is updated, otherwise the new entry is appended (see the small demo below).
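
To make the three steps concrete, here is a tiny demo using plain String keys (iteration order is not guaranteed):

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("key", "first");   // steps 1 + 2: locate the bucket, nothing there yet, store it
        m.put("key", "second");  // step 3: same hash, == / equals say it is the same key, so the value is replaced
        m.put("yek", "third");   // a different key is simply added as a new entry
        System.out.println(m);   // e.g. {key=second, yek=third}
    }
}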

So the reason is that we implemented equals incorrectly: it never returns true, so every put is treated as a brand-new key and keeps getting appended, which produces this counter-intuitive result.

(End of Background music by Conan)

Why does the capacity have to be a power of two? (I know the question feels pointless, but interviewers love asking it.)

Capacity is a power of two, right? Who decreed that?

We all know that Java’s collections manage their own storage and grow automatically. Most of them define internal parameters such as MAXIMUM_CAPACITY and DEFAULT_LOAD_FACTOR; once the number of entries exceeds capacity × load factor, the collection expands automatically.

But if you know roughly how much data there will be in advance, you can specify the size through the constructor and avoid repeated expansion (which is an expensive operation, after all).
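
For example, if you expect roughly 1000 entries, a common rule of thumb (not something the JDK mandates) is to size the map as expected / loadFactor + 1, so the resize threshold stays above the expected size:

import java.util.HashMap;
import java.util.Map;

public class PreSizeDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;

        // expected / 0.75 + 1 keeps the threshold above the expected size,
        // so no resize happens while the 1000 entries are inserted
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1;

        Map<String, String> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key" + i, "value" + i);
        }
        System.out.println(map.size()); // 1000
    }
}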

Its source method signature is as follows:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity)

But what you may not know is that Java quietly rounds your number up to a power of two internally.

The source code is as follows:


/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

What a slick piece of work: for maximum performance it is done entirely with bit operations, which CPUs handle very quickly. We won’t dissect the method here; just know that it rounds whatever you pass in up to a power of two.

So why do we have to make it a power of two?

Because a hash value has to be mapped to a bucket, and when the number of buckets is a power of two the bucket index can be computed with a single bitwise AND, (capacity - 1) & hash, instead of a more expensive modulo operation.
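
A quick sanity check of that claim (a throwaway demo with made-up hash values): when n is a power of two, (n - 1) & hash produces the same bucket index as hash % n.

public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length, a power of two

        // For non-negative hashes, (n - 1) & hash gives exactly the same
        // bucket index as hash % n, but with a single cheap AND instruction.
        int[] hashes = {7, 23, 1024, 123456789, 987654321};
        for (int h : hashes) {
            System.out.println(((n - 1) & h) + " == " + (h % n));
        }
    }
}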

See the source code:

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This method takes the key’s hashCode and XORs it with its own unsigned right shift by 16 bits, folding the high bits into the low bits. The bucket index is then taken from the low bits via (capacity - 1) & hash, and the folding reduces collisions between hashes that differ only in their high bits.
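
A small illustration of why the high bits matter (the constants are made up for the demo): two hash codes that differ only above bit 16 would land in the same slot of a 16-bucket table without the XOR, but are separated after it.

public class SpreadDemo {

    // Same folding as the hash() method above: XOR the high 16 bits into the low 16 bits.
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int a = 0x0001_0005; // two hash codes that differ only above bit 16
        int b = 0x0ABC_0005;
        int mask = 16 - 1;   // a small 16-slot table only looks at the low 4 bits

        System.out.println((a & mask) + " vs " + (b & mask));                 // 5 vs 5: guaranteed collision
        System.out.println((spread(a) & mask) + " vs " + (spread(b) & mask)); // 4 vs 9: spread apart
    }
}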

Java 1.7 issues

Ah… after all this preparation, let’s talk about the Java 1.7 HashMap massacre.

Java HashMap infinite loop

“Why”

A colleague (let’s call him Little A) accidentally used HashMap in a multi-threaded part of the project, and the CPU stayed pinned at 100%. After careful investigation, this turned out to be the problem. The reasons are as follows:

First, HashMap is not a thread-safe container, so its internal implementation makes no guarantees under concurrent access. In JDK 1.7 this can put the map into an infinite-loop state. The steps are as follows:

Start with an ordinary HashMap. When inserting a new node pushes the size past the threshold, the resize is triggered and every entry is reassigned to a bucket in the new table.

The rehash works by traversing every node and recomputing its bucket index, but it does not preserve the order of the chains. A sketch of that transfer loop follows.
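
For reference, the JDK 1.7 transfer loop looks roughly like the following. It is paraphrased from the JDK source, so treat it as a sketch rather than an exact quote.

// Roughly what JDK 1.7's HashMap.transfer() does: walk every chain and
// head-insert each node into the new table. Head insertion is what reverses
// the order of a chain after a resize, and under concurrent resizes it is
// what lets two threads weave nodes into a ring.
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;               // a suspended thread holds stale e/next here
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);  // new bucket index
            e.next = newTable[i];                   // head insertion
            newTable[i] = e;
            e = next;
        }
    }
}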

  1. Step 1: Two threads trigger a resize at the same time and start walking the same chain. Thread 2 has just recorded the first node, element 1, and its next pointer, element 2, when it is suspended. Meanwhile, within its own time slice, thread 1 reallocates all of the elements, as shown below.

    Note that the order of the chain ends up reversed, because nodes are inserted at the head of each bucket.

  2. Step 2: Before thread 1 can install its reallocated chains as the live table, it loses the CPU time slice. Now it is thread 2’s turn to resume its iteration and continue with the same steps.

  3. Step 3: Thread 2 picks up element 1, hangs it on bucket 4, and then moves on to the next node it recorded earlier, element 2. But after thread 1’s work, element 2 already points back to element 1, so element 1 gets head-inserted after element 2 again, and the two nodes end up pointing at each other. That is the cycle. The result looks like this:

If thread 2’s table is the one that finally gets installed, element 3 may also be lost.

From now on, any get that walks into this chain spins forever in the cycle, and the CPU shoots up to 100%.

【 Solution 】

The solution is simple. Say it three times:

HashMap is not thread safe!!

HashMap is not thread safe!!

HashMap is not thread safe!!

If you need a Map collection in a multi-threaded environment, just reach for ConcurrentHashMap, which is a thread-safe container. We’ll cover it in detail later when we get to multithreading.
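
As a minimal sketch of the thread-safe alternative (the sizes and class name are arbitrary), two threads can write to a ConcurrentHashMap concurrently without the resize race described above:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        final Map<Integer, Integer> map = new ConcurrentHashMap<>();

        Runnable writer = new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 10_000; i++) {
                    map.put(i, i);   // concurrent puts are safe here
                }
            }
        };

        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(map.size()); // 10000, and no CPU spinning at 100%
    }
}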

Conclusion

This article spent a lot of time on time complexity, the structure of a HashMap, the consequences of breaking the equals/hashCode contract, the interview question “why must the capacity be a power of two”, and the “blood case” of maxing out the CPU by misusing HashMap.

In the next part, we’ll look at the improvements Java 1.8 made to HashMap, as well as the humble Set collections.