preface

Today, I went to the interview, and the interviewer suddenly asked me if I would write HashMap by hand. I wrote a simple HashMap by hand before my deathless interview, so it took me only 5 minutes to create it and the interviewer called me an expert.

I believe that many of you have read the source code for many times. Most of you can say that you are very familiar with HashMap, but if the interviewer suddenly gives you such a hand, I believe that many people will still be confused. So let’s get this straight for everyone. After mastering hand-written HashMap, you will tear the interviewer by hand.

In addition to hashmap, I also sorted out a lot of classic interview questions and made it into a PDF book. The content is not much, but it is all dry goods. If you need it, you can click here to get it.Internet company Java interview core knowledge

HashMap is one of the most common data structures in Java, and it’s basically a “must have” question in an interview. It realizes efficient access of key-value pairs based on “K-V” form. Before JDK1.7, HashMap was implemented based on array + linked list. After 1.8, red-black tree was added to the underlying implementation of HashMap to improve search efficiency.

The HashMap calculates the index, the location in the array, based on the key stored in the key-value pair. When a hash conflict occurs, that is, when different keys compute the same index, the HashMap generates a linked list at the corresponding location. When the length of the list exceeds 8, the list is converted to a red-black tree.

Before writing a HashMap by hand, let’s discuss a trivia question: how many comparisons do we have to make, on average, between arrays, linked lists, and red-black trees when looking for a value by key in a HashMap?

When looking in an array, we can directly calculate the key’s position in the array using its Hashcode, with a comparison count of 1

When searching in a linked list, the keys of each node are compared according to the next reference, and the average number of list nodes of length N is N /2

When looking up in a red-black tree, because of the property of red-black trees, the average number of comparisons in a red-black tree with n nodes is log(n).

We mentioned earlier that the list is treify when the length exceeds 8, precisely because n=8, which is the threshold of log(n) < n/2. When n<6, log(n) > n/2, the red-black tree is untreify. In addition, we can see that the most important thing to improve the efficiency of HashMap is to minimize the generation of linked lists, or minimize the length of linked lists, avoid hash conflicts, and reduce the number of key comparisons.

Write a HashMap

Define a Map interface

You can also use java.util.map in Java

public interface MyMap<K,V> {

    V put(K k, V v);

    V get(K k);

    int size();

    V remove(K k);

    boolean isEmpty();

    void clear();
}
Copy the code

Then write a MyHashMap class that implements the interface and implements the methods inside.

Member variables

    final static int DEFAULT_CAPACITY = 16;
    final static float DEFAULT_LOAD_FACTOR = 0.75f;

    int capacity;
    float loadFactor;
    int size = 0;

    Entry<K,V>[] table;
Copy the code
class Entry<K, V>{ K k; V v; Entry<K,V> next; public Entry(K k, V v, Entry<K, V> next){ this.k = k; this.v = v; this.next = next; }}Copy the code

Table is the underlying array. The Entry class holds the “K-V” data. The next field indicates that it might be a linked list node.

A constructor

public MyHashMap(){
    this(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR);
}

public MyHashMap(int capacity, float loadFactor){
    this.capacity = upperMinPowerOf2(capacity);
    this.loadFactor = loadFactor;
    this.table = new Entry[capacity];
}
Copy the code

UpperMinPowerOf2 is used to get the lowest power greater than capacity. In HashMap, the developers use a more sophisticated bit operation to accomplish this function, which is much more efficient.

private static int upperMinPowerOf2(int n){
    int power = 1;
    while(power <= n){
        power *= 2;
    }
    return power;
}
Copy the code

Why does HashMap capacity have to be a power of 2? This is to facilitate rehashing of existing elements when arrays in the HashMap are expanded.

Put method

@override public V put(K K, V V) {hashCode hash int index = k. hashcode () % table.length; Entry<K, V> current = table[index]; // If (current! = null){// If there is an equal key in the list, replace it and return the old value. = null){ if(current.k == k){ V oldValue = current.v; current.v = v; return oldValue; } current = current.next; } table[index] = new Entry<K, V>(K, V, table[index]); size++; return null; } // table[index] = new Entry<K, V>(K, V, null); size++; return null; }Copy the code

In the PUT method, we construct an Entry object from the incoming k-v value, and then determine where it should be placed in the array. Recall our earlier assertion:

The most important thing to do to improve the efficiency of a HashMap is to avoid creating linked lists as much as possible, or to minimize their length

To do this, we need Entry objects to be spread out as evenly as possible in the array table, and the index should not exceed the length of the table. HashMap also uses a more efficient method of hashing keys with the & operation, which is available in the source code for HashMap.

If an element exists at table[index], a linked list will be formed. We first iterate through the list (length 1 is also considered a list). If there is a key that is equal to the one we saved, we replace it and return the old value. If not, the new node is inserted into the linked list. There are two ways to insert linked lists: head and tail. With tail interpolation, we need to traverse the list and insert the new node at the end; With header interpolation, we simply point the reference to table[index] to the new node, and then the next reference to the new node to the node where table[index] was, as in the HashMap.

If table[index] is empty, insert a new Entry directly.

The get method

@Override public V get(K k) { int index = k.hashCode() % table.length; Entry<K, V> current = table[index]; // iterate over the list while(current! = null){ if(current.k == k){ return current.v; } current = current.next; } return null; }Copy the code

When we call get, we calculate the index of the key based on the hashcode of the key, and then go directly to the corresponding position in the table. If there is a linked list, we iterate over it.

The remove method

@Override public V remove(K k) { int index = k.hashCode() % table.length; Entry<K, V> current = table[index]; If (current. K == k){table[index] = null; size--; return current.v; } // Delete nodes from the list while(current. Next! = null){ if(current.next.k == k){ V oldValue = current.next.v; current.next = current.next.next; size--; return oldValue; } current = current.next; } return null; }Copy the code

When a node is removed, if the index corresponding to the key does not form a linked list, set it to NULL. If a linked list exists, we need to point the next reference of the precursor node of the target node to the successor node of the target node. Since our Entry node has no previous reference, we operate based on the precursors of the target node, namely:

current.next = current.next.next;
Copy the code

Current represents the precursor node of the node we want to delete.

There are some simple size(), isEmpty() methods that are too simple to go over here. At this point, our custom MyHashMap is almost ready to use.

The last

There are a few points we haven’t addressed regarding the implementation of HashMap:

  1. Capacity expansion problem. In a HashMap, when the number of stored elements exceeds a threshold (threshold = capacity * loadFactor), the HashMap resize (resize) occurs and all internal elements are rehash to minimize hash collisions. In our MyHashMap, although the load factor is defined, it is not used. Capacity is fixed. Although data can still be stored all the time due to the existence of linked lists, the query efficiency will decrease sharply when the amount of data increases.
  2. Treeify. As we mentioned earlier, when the number of linked list nodes exceeds 8, the list is converted to a red-black tree for higher query efficiency. But this is not implemented in our code.
  3. Null value determination. In a HashMap, null-valued keys are allowed. When a key is null, the hash() method in the HashMap returns a fixed 0, that is, a value with a null key is fixed at table[0]. This is easy to implement. If null is stored in MyHashMap, it will be reported directlyNullPointerExceptionThe exception.
  4. Some other questions.

I believe that after you have completed the implementation of HashMap, you will have a deeper understanding of its principle. If there are any mistakes or loose points in this article, you are welcome to point them out. We will solve these problems step by step.


Previous hot articles:

  • Summary of Java basic knowledge
  • Performance Tuning Series topics (JVM, MySQL, Nginx, and Tomcat)
  • From being kicked out to 5 offers of 30K+, I have come a long way
  • 100 Java project parsing, with source code and learning documentation!

end