How does HashMap work?

This topic usually comes up as the following series of questions:

  • Have you read the HashMap source code, and do you know how it works?
  • Why array plus linked list?
  • What else do you know about resolving hash conflicts?
  • Can I use a LinkedList instead of the array structure?
  • Since a LinkedList would work, why does HashMap use an array instead?

1. Have you read the HashMap source code and know how it works?

To answer this question you do, of course, have to read the HashMap source code. The principle is as follows:


A HashMap uses an Entry array to store key-value pairs. Each key-value pair forms an Entry object (named Node in JDK 1.8). The Entry class is actually a singly linked list node: its next pointer links to the next Entry in the same bucket.

In JDK 1.8, if the list in a bucket grows longer than 8 nodes, the list is converted into a red-black tree (provided the array has at least 64 slots; otherwise the array is resized instead).

2. Why array + linked list?

The array determines the bucket location: the index is the hash value of the element’s key modulo the length of the array.

The linked list resolves hash conflicts. When several keys map to the same index, a linked list is formed at that position in the array. PS: The hash value here is not the raw hashCode, but the hashCode with its high 16 bits XORed into its low 16 bits. For the reason, read on.

3. What else do you know about resolving hash conflicts?

There are four well-known ones: (1) open addressing, (2) separate chaining, (3) rehashing, and (4) a common overflow area.

PS: If you are interested in the details, look them up yourself; they are not expanded on here.

4. Can I use a LinkedList instead of the array structure?

Let me explain what this question means. The source code looks like this:

Entry[] table = new Entry[capacity];


Ps: Entry is a linked list node.

Suppose I write it like this instead:

List<Entry> table = new LinkedList<Entry>();  


Is it feasible?

The obvious answer is yes.

5. Since a LinkedList would work, why does HashMap use an array instead?

Because arrays are the most efficient!

In a HashMap, the bucket position is found by taking the hash of the element’s key modulo the array length. Once the position is known, an array reaches that bucket in O(1) by index, whereas a LinkedList has to walk node by node to get there. Clearly the array is more efficient than a LinkedList.

ArrayList is also backed by an array and is just as quick to index, so why not use ArrayList?

(Having written this far, I can’t help feeling I am really onto something. I nearly asked myself into a corner; fortunately, the answer suddenly came to me.)

Because with a basic array, HashMap can define its own expansion mechanism. HashMap always expands its array to a power of 2, which makes the modulo calculation highly efficient.

ArrayList expands by 1.5 times. Why it expands by 1.5 times is not explained in this article.


Under what conditions does HashMap expand?

This topic usually comes up as the following series of questions:

  • Under what conditions does HashMap expand?
  • Why is the capacity always 2 to the nth power?
  • Why XOR the high 16 bits with the low 16 bits before taking the modulo?

1. Under what conditions does HashMap expand?

When the number of entries exceeds load factor * current capacity, a resize is required.

The default load factor is 0.75, a trade-off between wasting space and suffering more hash collisions.

Current Capacity indicates the current array size.
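
As a rough illustration of the rule above, the resize threshold is simply capacity times load factor. The class and method names below are made up for this example and are not part of the JDK; 16 and 0.75 are HashMap's documented defaults.

```java
class ResizeThreshold {
    // Hypothetical helper: the entry count above which a table of the
    // given capacity would be resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;          // HashMap's default initial capacity
        float loadFactor = 0.75f;   // HashMap's default load factor
        // With the defaults, the 13th entry pushes the size past 12.
        System.out.println(threshold(capacity, loadFactor));     // 12
        // After doubling, the threshold doubles as well.
        System.out.println(threshold(capacity * 2, loadFactor)); // 24
    }
}
```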

2. Why is the capacity a power of 2 after expansion?

To access a HashMap efficiently, collisions must be minimized, that is, the data should be distributed as evenly as possible so that each linked list has roughly the same length. This depends on the algorithm that decides which linked list an entry goes into, and that algorithm is essentially a modulo: hash % length.

But a direct modulo operation is not as fast as a bitwise operation.

Therefore, the source code optimizes it to hash & (length - 1).

When length is a power of 2 (and hash is non-negative), hash % length == hash & (length - 1).

So why is it 2 to the nth power?

Because 2 to the nth power in binary is a 1 followed by n zeros, and 2 to the nth power minus 1 is n ones.

For example, if the length is 8, 3&(8-1)=3 and 2&(8-1)=2; they land in different positions and do not collide.

At length 5, however, 3&(5-1)=0 and 2&(5-1)=0; both land at position 0, so they collide.

So keeping the capacity at 2 to the nth power ensures that every bit of (length-1) is 1, that is, the hash is ANDed with 1111...1111111, and every low bit of the hash takes part in choosing the bucket.
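
The arithmetic above can be checked with a few lines of Java. This is a minimal sketch; the class name IndexMask and the helper indexFor are made up for illustration.

```java
class IndexMask {
    // Bucket index via bit mask; equivalent to % only when length is a power of 2.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        // Power-of-2 length: the mask agrees with % for every non-negative hash tried.
        for (int hash = 0; hash < 1000; hash++) {
            if (indexFor(hash, 8) != hash % 8) {
                throw new AssertionError("mismatch at " + hash);
            }
        }
        // Length 8: the mask is 0b111, so 3 and 2 keep distinct indexes.
        System.out.println(indexFor(3, 8) + " " + indexFor(2, 8)); // 3 2
        // Length 5: the mask is 0b100, so 3 and 2 both land in bucket 0.
        System.out.println(indexFor(3, 5) + " " + indexFor(2, 5)); // 0 0
    }
}
```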

3. Why XOR the high 16 bits with the low 16 bits before taking the modulo?

The hash method in JDK 1.8 returns (h = key.hashCode()) ^ (h >>> 16), or 0 for a null key. The 1.7 version is more complicated, so I will not go through it.
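
For reference, the JDK 1.8 hash method can be sketched as follows. The wrapper class HashSpread and the sample keys are assumptions added for this example; only the body of hash mirrors the JDK logic.

```java
class HashSpread {
    // Mirrors the logic of HashMap.hash(Object) in JDK 1.8:
    // XOR the high 16 bits of the hashCode into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, which makes the effect easy to see.
        Integer a = 0x10000, b = 0x20000;   // both have all-zero low 16 bits
        // Without spreading, both map to bucket 0 in a 16-slot table.
        System.out.println((a.hashCode() & 15) + " " + (b.hashCode() & 15)); // 0 0
        // With spreading, the high bits influence the index.
        System.out.println((hash(a) & 15) + " " + (hash(b) & 15));           // 1 2
    }
}
```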


HashMap does this only to reduce the chance of hash collisions.

For example, when the length is 16, the hashCode (say, the hashCode of the string key “abcabcabcabcabc”) is ANDed with (16-1). For hashCodes generated by many different keys, as long as the last four bits of the hashCode are 0, the result is 0 no matter how the high bits change.

Without the perturbation, the index is computed from the raw hashCode. After adding the “perturbation function” that XORs the high 16 bits with the low 16 bits, the result is as follows:

  • Before the perturbation function: 1954974080 % 16 = 1954974080 & (16-1) = 0
  • After the perturbation function: 1955003654 % 16 = 1955003654 & (16-1) = 6

Obviously, the probability of collision is reduced.
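
The two numbers quoted above can be reproduced directly. This is a small sketch that starts from the hashCode value given in the text rather than recomputing it from the string; the class and method names are hypothetical.

```java
class PerturbationDemo {
    // The perturbation step: XOR the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = 1954974080;      // hashCode value quoted in the text
        int s = spread(h);
        // Binary forms show that only the low 16 bits change.
        System.out.println(Integer.toBinaryString(h));
        System.out.println(Integer.toBinaryString(s));
        System.out.println(s);              // 1955003654
        System.out.println(h & (16 - 1));   // 0
        System.out.println(s & (16 - 1));   // 6
    }
}
```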


How do get and put work in HashMap?

This topic usually comes up as the following series of questions:

  • Do you know how an element is put into a HashMap?
  • Do you know how an element is retrieved from a HashMap?
  • What other hash algorithms do you know?
  • How is hashCode implemented in String? (Many big companies ask this question)

1. Do you know how an element is put into a HashMap?

Hash the key’s hashCode() and compute the index;

If there is no collision, put the entry directly into the bucket;

If there is a collision, append the entry to the bucket’s linked list;

If collisions make the list too long (its length reaches TREEIFY_THRESHOLD), convert the list to a red-black tree (added in JDK 1.8);

If the key already exists, replace the old value;

If the map is full (the size exceeds load factor * current capacity), resize.

2. Do you know how an element is retrieved from a HashMap?

Hash the key’s hashCode() and compute the index;

If the first node in the bucket matches, return it directly;

If there is a conflict, use key.equals(k) to find the corresponding Entry:

  • If the bucket is a tree, search the tree with key.equals(k), O(log n);
  • If the bucket is a linked list, traverse the list with key.equals(k), O(n).
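
The put and get steps can be sketched in a deliberately simplified map. This is not the JDK implementation: SimpleHashMap is a made-up class, it has no red-black trees, and it inserts at the head of a bucket for brevity (JDK 1.8 appends at the tail).

```java
import java.util.Objects;

class SimpleHashMap<K, V> {
    // Linked-list node, playing the role of HashMap's Entry/Node.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private Node<K, V>[] table = newTable(16);
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }

    private static int hash(Object key) {
        int h;
        return key == null ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = h & (table.length - 1);            // compute the index
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value;                   // key exists: replace the value
                e.value = value;
                return old;
            }
        }
        // Empty bucket or collision: link the new node into the bucket.
        table[i] = new Node<>(h, key, value, table[i]);
        if (++size > table.length * 0.75) {        // over the threshold: resize
            resize();
        }
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[h & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;                    // O(n) scan within the bucket
            }
        }
        return null;
    }

    private void resize() {
        Node<K, V>[] old = table;
        table = newTable(old.length * 2);          // doubling keeps a power of 2
        for (Node<K, V> head : old) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int i = e.hash & (table.length - 1);
                e.next = table[i];
                table[i] = e;
                e = next;
            }
        }
    }

    public int size() { return size; }
}
```

Putting 100 distinct keys triggers several resizes along the way, and every key should still be retrievable afterwards.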

3. What other hash algorithms do you know?

First, what a hash algorithm does: a hash function maps a large range onto a small range, usually to save space and to make data easy to store and look up.

Well-known examples include MurmurHash, MD4, MD5, and so on.

4. What is the implementation of hashCode in String? (This question is very frequent)

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

The hashCode calculation in the String class is relatively simple: it sums the code value of each character (its ASCII value for ASCII text) with 31 as the weight, and relies on natural int overflow in place of an explicit modulo.

The hash formula is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

So why the prime 31?

Mainly because 31 is an odd prime and 31*i = 32*i - i = (i<<5) - i, a combination of a shift and a subtraction that can be computed faster than a general multiplication.
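
To see that the shift-and-subtract form really is the same polynomial hash, here is a small sketch. The class StringHashDemo and the method polyHash are made up for this example; the result is compared against the real String.hashCode().

```java
class StringHashDemo {
    // Polynomial hash s[0]*31^(n-1) + ... + s[n-1], relying on int overflow.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i);   // same as 31 * h + s.charAt(i)
        }
        return h;
    }

    public static void main(String[] args) {
        // 97*31^2 + 98*31 + 99
        System.out.println(polyHash("abc"));                       // 96354
        System.out.println(polyHash("abc") == "abc".hashCode());   // true
    }
}
```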


Why does HashMap convert a linked list to a red-black tree when it has more than 8 elements?

This topic usually comes up as the following series of questions:

  • Do you know what changed in HashMap in JDK 1.8?
  • Why not use red-black trees from the start to resolve hash conflicts, instead of starting with a linked list and switching later?
  • Can a binary search tree be used instead of a red-black tree?
  • So why is the threshold 8?
  • When does a red-black tree degenerate back into a linked list?


1. Do you know what changed in HashMap in JDK 1.8?

  • The structure changed from array + linked list to array + linked list + red-black tree.
  • The hash algorithm was optimized to mix in the high-order bits: h^(h>>>16)
  • After expansion, an element either stays at its original index or moves to its original index plus the old capacity (a power of 2), and the relative order within the list is preserved.

The last point is important: because the list order is no longer reversed during expansion, the infinite loop that concurrent expansion could cause in 1.7 no longer occurs in 1.8.
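
The "same index, or same index plus old capacity" rule follows from the bit mask gaining exactly one extra bit when the capacity doubles. Here is a minimal sketch; the class name and sample hashes are chosen for illustration only.

```java
class ResizeSplitDemo {
    // Index of a hash in a table of the given power-of-2 capacity.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // All four hashes share index 5 while the capacity is 16.
        for (int hash : new int[] {5, 21, 37, 53}) {
            int oldIdx = indexFor(hash, oldCap);
            int newIdx = indexFor(hash, newCap);
            // The extra mask bit (hash & oldCap) decides whether the entry stays or moves.
            boolean stays = (hash & oldCap) == 0;
            System.out.println(hash + ": " + oldIdx + " -> " + newIdx
                    + (stays ? " (stays)" : " (moves by oldCap)"));
        }
    }
}
```

With these inputs, 5 and 37 stay at index 5 while 21 and 53 move to index 21, which is 5 + 16.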

2. Why not use red-black trees from the start to resolve hash conflicts, instead of starting with a linked list and switching later?

Because a red-black tree needs left rotations, right rotations, and recoloring to stay balanced, whereas a singly linked list does not.

When a bucket holds fewer than 8 elements, a linked list is enough to guarantee query performance. When it holds more than 8, a red-black tree speeds up queries, but inserting new nodes becomes slower.

Therefore, starting with a red-black tree when there are few elements would make insertion slower for no benefit and simply waste performance.

3. Can I use a binary search tree instead of a red-black tree?

You can. But in special cases a binary search tree degenerates into a linear structure (the same as the original linked list, with the same depth problem), and lookups by traversal become very slow.

4. So why is the threshold 8?

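I don’t know; we will have to wait for the JDK authors to answer.

All the answers you can find on the Internet are nonsense.

Here is a random answer I picked from Niuke (originally posted as a picture); it claims that the lookup costs of the tree and the list intersect at 6.64.

See the bug? The intersection is 6.64? If the costs being compared are log2(n) for the tree and n/2 for the list, the intersection is definitely 4:

log2(4) = 2, 4/2 = 2.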

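The JDK authors chose 8, presumably after rigorous calculation, having decided that at length 8 it is better to convert to a red-black tree and pay the overhead of keeping it balanced than to keep paying the lookup overhead of the linked list. The implementation notes in the HashMap source add a statistical argument: with well-distributed hashCodes, bucket sizes roughly follow a Poisson distribution, and the probability of a bucket reaching 8 nodes is about 0.00000006, so trees are rarely needed in practice.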

5. When does a red-black tree degenerate back into a linked list?

When the number of nodes drops to 6, the tree is converted back to a linked list. The value 7 in between acts as a buffer that prevents frequent conversions between list and tree. Suppose instead there were a single threshold of 8: a bucket with more than 8 nodes would become a tree, and one with fewer than 8 would become a list again. If a HashMap kept inserting and deleting elements with the bucket size hovering around 8, it would constantly convert tree to list and list to tree, and efficiency would be very low.
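
The effect of the gap can be illustrated with a toy model. This is not how the JDK decides conversions internally; the class, the method, and the exact comparison rules below are assumptions made for the sketch, which only counts how often a bucket would flip between the two forms.

```java
class ThresholdHysteresis {
    // Simplified model: become a tree when size >= treeifyAt,
    // fall back to a list when size <= untreeifyAt.
    static int conversions(int[] sizes, int treeifyAt, int untreeifyAt) {
        boolean tree = false;
        int count = 0;
        for (int size : sizes) {
            if (!tree && size >= treeifyAt) {
                tree = true;
                count++;
            } else if (tree && size <= untreeifyAt) {
                tree = false;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] sizes = {7, 8, 7, 8, 7, 8, 7, 8};   // bucket size hovering around 8
        // Adjacent thresholds: every change of size flips the structure.
        System.out.println(conversions(sizes, 8, 7)); // 7
        // A buffer value in between: one conversion, then it stays a tree.
        System.out.println(conversions(sizes, 8, 6)); // 1
    }
}
```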


HashMap concurrency problems?

This topic usually comes up as the following series of questions:

  • What’s wrong with HashMap in a concurrent programming environment?
  • Are these problems still present in JDK 1.8?
  • How do you usually solve these problems?

1. What’s wrong with HashMap in a concurrent programming environment?

  • (1) Concurrent expansion by multiple threads can cause an infinite loop
  • (2) Concurrent put by multiple threads can cause elements to be lost
  • (3) After a non-null element is put, get may return null

2. Are these problems still present in JDK 1.8?

In JDK 1.8, the infinite-loop problem has been solved. The other two problems remain.

3. How do you usually solve these problems?

Use thread-safe collection classes such as ConcurrentHashMap, Hashtable, etc.
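
As a small illustration of the thread-safe option, the sketch below has two threads insert disjoint key ranges into one map. The class and method names are made up; with ConcurrentHashMap no insert is lost, whereas passing a plain HashMap to the same method would be unsafe and could lose entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentPutDemo {
    // Two threads insert disjoint key ranges into the same map; returns the final size.
    static int fill(Map<Integer, Integer> map, int perThread) {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < perThread; i++) map.put(i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = perThread; i < 2 * perThread; i++) map.put(i, i);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Every put is applied, so the size is always 2 * perThread.
        System.out.println(fill(new ConcurrentHashMap<>(), 10_000)); // 20000
    }
}
```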


What do you usually use as a HashMap key?

This topic usually comes up as the following series of questions:

  • Can a key be null?
  • What do you usually use as a HashMap key?
  • What’s wrong with using a mutable class as a HashMap key?
  • How would you implement a custom class to use as a HashMap key?

1. Can a key be null?

Yes, it can. When the key is null, the hash method returns 0, so the entry goes to the first position in the array.
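
A quick way to confirm this, and to contrast it with Hashtable, which rejects null keys. The class and helper method names are made up for this example.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Returns true if the map accepts a null key, false if it throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return "value".equals(map.get(null));
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<>())); // false
    }
}
```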


2. What do you usually use as a HashMap key?

Immutable classes such as Integer and String are commonly used as HashMap keys, and String is the most common.

  • (1) Because a String is immutable, its hashCode is cached once it has been computed and never needs to be recalculated. This makes strings well suited as Map keys and faster to process than other key objects. This is why keys in a HashMap tend to be strings.
  • (2) Because equals() and hashCode() are used when retrieving objects, it is important that the key class overrides both correctly. These classes already override hashCode() and equals() in a standard, well-behaved way.


3. What’s wrong with using mutable classes as HashMap keys?

The hashCode can change, so a value that was put in can no longer be retrieved, as shown below:

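// requires java.util.ArrayList, java.util.HashMap, java.util.List
HashMap<List<String>, Object> changeMap = new HashMap<>();
List<String> list = new ArrayList<>();
list.add("hello");
Object objectValue = new Object();
changeMap.put(list, objectValue);
System.out.println(changeMap.get(list));   // found: the key's hashCode is unchanged since put
list.add("hello world");                   // mutating the key changes its hashCode
System.out.println(changeMap.get(list));   // null: the lookup uses the new hashCode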

The output values are as follows

java.lang.Object@74a14482
null


4. How would you implement a custom class as the key of a HashMap?

This question examines two points of knowledge:

  • What should you pay attention to when overriding hashCode and equals?
  • How do you design an immutable class?

For the first point, remember the following four principles:

(1) If two objects are equal, their hashCodes must be equal

(2) If two objects are not equal, their hashCodes are not necessarily different

(3) If two hashCodes are equal, the objects are not necessarily equal

(4) If two hashCodes are not equal, the objects must not be equal
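
One way to satisfy these principles is to build equals and hashCode from the same fields. PointKey below is a hypothetical key class written for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class PointKey {
    private final int x;
    private final int y;

    PointKey(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointKey)) return false;
        PointKey other = (PointKey) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);   // built from the same fields as equals
    }

    public static void main(String[] args) {
        Map<PointKey, String> map = new HashMap<>();
        map.put(new PointKey(1, 2), "found");
        // A distinct but equal instance finds the entry.
        System.out.println(map.get(new PointKey(1, 2))); // found
    }
}
```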

For the second point, remember how to write an immutable class:

(1) Add the final modifier to the class so that it cannot be inherited.

If a class can be inherited, its immutability can be broken: a subclass can override the parent’s methods and change the values of member variables, so once a subclass instance is used where the parent type is expected, there is no guarantee that the object is immutable.

(2) Make all member variables private and final

This guarantees that the member variables cannot be reassigned. However, this step alone is not enough, because if a member is itself a mutable object, its contents can still be changed from outside. Point 4 makes up for that.

(3) Do not provide any method that changes member variables, including setters

Do not let the value of a member variable be changed through any other interface, as that would break immutability.

(4) Initialize all members through the constructor, making a deep copy of mutable arguments.

If an object passed to the constructor is assigned directly to a member variable, later changes to the passed-in object still change the value of the internal variable. For example:

public final class ImmutableDemo {  
    private final int[] myArray;  
    public ImmutableDemo(int[] array) {  
        this.myArray = array; // wrong  
    }  
}

This approach does not guarantee immutability. myArray and array point to the same memory address, so users can change the contents of myArray from outside ImmutableDemo by modifying the array object they passed in.

To ensure that the internal values cannot be modified, make a deep copy so that newly allocated memory holds the incoming values. The correct way:

public final class MyImmutableDemo {
    private final int[] myArray;
    public MyImmutableDemo(int[] array) {
        this.myArray = array.clone(); // defensive copy
    }
}

(5) In getter methods, do not return the member object itself; clone it and return the copy

This also prevents the object from leaking: otherwise a caller could obtain the internal mutable member through the getter and modify it directly, changing the member variable.
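
Points (4) and (5) together might look like the following sketch. ImmutableArrayHolder and getMyArray are names made up for this example; for an int[] a clone is a full copy, while an array of mutable objects would need its elements copied as well.

```java
import java.util.Arrays;

final class ImmutableArrayHolder {
    private final int[] myArray;

    ImmutableArrayHolder(int[] array) {
        this.myArray = array.clone();   // copy in: detach from the caller's array
    }

    int[] getMyArray() {
        return myArray.clone();         // copy out: never expose the internal array
    }

    public static void main(String[] args) {
        int[] source = {1, 2, 3};
        ImmutableArrayHolder holder = new ImmutableArrayHolder(source);
        source[0] = 99;                 // changing the caller's array has no effect
        holder.getMyArray()[1] = 99;    // changing a returned copy has no effect
        System.out.println(Arrays.toString(holder.getMyArray())); // [1, 2, 3]
    }
}
```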
