Write a HashMap? Oh, that’s so hard. That’s the level of the interview?

I first saw this interview question in an article by an anonymous offer harvester:

This…… We all know that the data structure of HashMap is array + linked list + red-black tree.

Later, while sorting through interview write-ups, I found that this question comes up fairly often in Kuaishou interviews, so it has presumably made it into Kuaishou's question bank. And since it is asked so often, it can't possibly require hand-writing a red-black tree; I suspect most interviewers couldn't write one off the top of their head either. So an implementation that skips the red-black tree should still be helpful. Read on.

Get to know hash tables

A HashMap is a Java implementation of a hash table in a data structure.

Hash table nature

A hash table is also called a hash map; the two names refer to the same thing. Let's first look at the definition of a hash table:

A hash table is a data structure that is accessed directly based on the value of the key.

It's like someone coming to the company to find Old San: the receptionist just points to the desk in the corner, and that's him.

Simply put, a hash table consists of two elements: a bucket array and a hash function.

  • Bucket array: the row of work desks
  • Hash function: the receptionist who points you to Old San in the corner

Bucket array

As we know, there is a basic class of data structures called linear lists, and linear lists come in two forms: arrays and linked lists.

In a hash table, the structure that stores the elements is an array, and each cell of the array can be thought of as a bucket.

Suppose we assign desks to a group of programmers: Zhang One, Li Two, Wang Three, Zhao Four. We notice their names are distinctive: the last character is a number. So we can take that number as the key and assign each person directly to the desk with the matching number, as long as that desk is still free.

So in this case, what is the time complexity of our search, insert, and delete? Obviously, all O(1).
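
As a toy illustration (the names, the array size, and the DeskDemo class are invented for this example, not part of the implementation later in the article), direct addressing by the trailing number looks roughly like this:

    // Toy direct addressing: the number at the end of a name is the key, used directly as the array index.
    public class DeskDemo {
        public static void main(String[] args) {
            String[] desks = new String[8];   // a row of 8 work positions (the bucket array)
            desks[1] = "Zhang One";
            desks[2] = "Li Two";
            desks[3] = "Wang Three";

            // Search, insert, and delete are each a single array access: O(1)
            System.out.println(desks[3]);     // search: who sits at desk 3?
            desks[4] = "Zhao Four";           // insert
            desks[2] = null;                  // delete
        }
    }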

But we are not the Calabash Brothers, so people can't all be named One, Two, Three, Four, Five, Six, Seven. If a new colleague named Nangong Daniel joins, how do we assign him a desk?

This brings us to our second key element, the hash function.

The hash function

We need to create a mapping between the elements and the bucket array, and this mapping is called a hash function (also known as a hashing function).

For example, we have a bunch of random names: Zhuge Steel, Liu Huaqiang, Wang Situ, Zhang Quandian… We have to use the hash function to figure out which desk each of these names should be assigned to.

Hash function construction

If the key of our data element is an integer, or can be converted to an integer, we can use these common methods to compute the mapped address:

  • Direct addressing

    Map directly to the corresponding array location based on the key, for example, 1232 to the subscript 1232.

  • Digit analysis method

    Take certain digits of the key (such as the tens and hundreds digits) as the mapped location

  • Mid-square method

    Square the key and take the middle digits as the mapped location

  • Folding method

    Split the key into segments with the same number of digits, then use their sum as the mapped location

  • Divisor remainder method

    H(key) = key % p (p ≤ N): the remainder of the key divided by a positive integer p no larger than the hash table length is the hash address. This is the most widely used way to construct a hash function (see the sketch below).
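
To make the divisor remainder method concrete, here is a minimal sketch; the table length of 16, the sample keys, and the DivisionHashDemo class are invented purely for illustration:

    // A minimal sketch of the divisor remainder method: H(key) = key % p.
    // Here p is simply the table length; the keys and length are made up for the demo.
    public class DivisionHashDemo {
        static int hash(int key, int tableLength) {
            return key % tableLength;
        }

        public static void main(String[] args) {
            int tableLength = 16;
            int[] keys = {1232, 15, 31, 48};
            for (int key : keys) {
                System.out.println(key + " -> bucket " + hash(key, tableLength));
            }
            // Output: 1232 -> bucket 0, 15 -> bucket 15, 31 -> bucket 15, 48 -> bucket 0
            // 15 and 31 land in the same bucket, and so do 1232 and 48: those are hash conflicts.
        }
    }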

In Java, the Object class provides a default hashCode() method that returns a 32-bit int, which by default is derived from the object's identity (historically, its memory address).

However, this integer still has to be processed. Direct addressing, for one, is ruled out from the methods above, because we cannot build a bucket array that large.

And the hash address we finally compute should fall within the length of the bucket array as much as possible, so we choose the divisor remainder method.

Hash conflict

Ideally, each data element is hashed into its own bucket.

The reality, however, is rarely that kind. Our space is limited, and no hash function, however well designed, can completely avoid hash conflicts. A hash conflict (also called a collision) is when different keys, after going through the hash function, land on the same index.

Since conflicts are unavoidable, we have to find a way to resolve them. The common solutions are:

Chain address method

Also known as the zipper method (separate chaining): it looks as if a linked list is pulled out of each cell of the bucket array. Elements that hash to the same index are put into the same linked list, and a lookup simply walks that list until it finds the matching key.

Open address method

The open address method (open addressing) simply finds another free position in the bucket array for the conflicting element.

There are many ways to find a free spot:

  • Linear probing: starting from the conflicting position, check whether the next position is free, and keep going until a free position is found (see the sketch after this list)
  • Quadratic probing: starting from the conflicting position x, probe x + 1², then x + 2², and so on, until a free position is found
  • Double hashing: use a second hash function to compute the probe step when a conflict occurs
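
Here is a minimal linear probing sketch, just to show the idea of open addressing. The fixed-size int table, the non-negative keys, and the use of -1 as the "empty" marker are assumptions made for this demo; the ThirdHashMap below uses chaining instead:

    // A toy linear-probing insert (open addressing). Assumptions: keys are non-negative ints
    // and -1 marks an empty slot. Only for illustration; ThirdHashMap resolves conflicts by chaining.
    public class LinearProbingDemo {
        private final int[] table;

        public LinearProbingDemo(int capacity) {
            table = new int[capacity];
            java.util.Arrays.fill(table, -1);
        }

        public void insert(int key) {
            int start = key % table.length;
            // Starting from the conflicting position, keep checking the next slot until a free one is found
            for (int i = 0; i < table.length; i++) {
                int probe = (start + i) % table.length;
                if (table[probe] == -1 || table[probe] == key) {
                    table[probe] = key;
                    return;
                }
            }
            throw new IllegalStateException("table is full");
        }
    }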

Rehash method

Construct multiple hash functions; when a conflict occurs, switch to the next hash function and recompute, until a free position is found.

Establish public overflow areas

Set up a public overflow area and store the conflicting data elements there.

For our implementation, obviously, we are going to use the chain address method.

OK, that's all for hash tables. I believe you now have a solid grasp of their nature. Next, it's coding time.

A HashMap implementation

Our simple HashMap implementation is named ThirdHashMap. First, the overall design:

  • Hash function: hashCode()+ divisor remainder method
  • Conflict resolution: chain address method

The overall structure is as follows:

Inner node class

We need to define a node class to carry the actual data. It not only holds the key-value pair but also serves as a node of a singly linked list:

    /**
     * Node class
     *
     * @param <K>
     * @param <V>
     */
    class Node<K, V> {
        // Key-value pair
        private K key;
        private V value;

        // Next node in the linked list
        private Node<K, V> next;

        public Node(K key, V value) {
            this.key = key;
            this.value = value;
        }

        public Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

Member variables

There are four main member variables; the bucket array is the structure that actually holds the data elements:

    // Default capacity
    final int DEFAULT_CAPACITY = 16;
    // Load factor
    final float LOAD_FACTOR = 0.75f;
    // The size of the HashMap
    private int size;
    // Bucket array
    Node<K, V>[] buckets;

Constructors

There are two constructors: the no-argument constructor creates the bucket array with the default capacity, and the parameterized one lets the caller specify the bucket array capacity.

    /**
     * No-argument constructor: creates the bucket array with the default capacity
     */
    public ThirdHashMap() {
        buckets = new Node[DEFAULT_CAPACITY];
        size = 0;
    }

    /**
     * Parameterized constructor: specifies the bucket array capacity
     *
     * @param capacity
     */
    public ThirdHashMap(int capacity) {
        buckets = new Node[capacity];
        size = 0;
    }

The hash function

The hash function is exactly what we described above: hashCode() modulo the bucket array length.

    /**
     * Hash function: get the bucket address for a key
     *
     * @param key
     * @return
     */
    private int getIndex(K key, int length) {
        // Get the hash code
        int hashCode = key.hashCode();
        // Mod the bucket array length; hashCode may be negative, so take the absolute value
        int index = hashCode % length;
        return Math.abs(index);
    }

Put method

I use a putVal method to carry the actual logic, because the same method is also needed when resizing.

General logic:

  • Get the insertion position for the element
  • If the position is empty, insert directly
  • If the position is not empty, a conflict has occurred: traverse the linked list
  • If a node with the same key is found, overwrite its value; otherwise insert a new node at the head of the list

    /**
     * put method
     *
     * @param key
     * @param value
     */
    public void put(K key, V value) {
        // Determine whether capacity expansion is required
        if (size >= buckets.length * LOAD_FACTOR) resize();
        putVal(key, value, buckets);
    }

    /**
     * Stores the element in the specified Node array
     *
     * @param key
     * @param value
     * @param table
     */
    private void putVal(K key, V value, Node<K, V>[] table) {
        // Get the location
        int index = getIndex(key, table.length);
        Node node = table[index];
        // Insert position is empty
        if (node == null) {
            table[index] = new Node<>(key, value);
            size++;
            return;
        }
        // Insert position is not empty, indicating a conflict: use the chain address method and traverse the list
        while (node != null) {
            // If the key is the same, overwrite the value
            if ((node.key.hashCode() == key.hashCode())
                    && (node.key == key || node.key.equals(key))) {
                node.value = value;
                return;
            }
            node = node.next;
        }
        // The current key is not in the list, insert at the head of the list
        Node newNode = new Node(key, value, table[index]);
        table[index] = newNode;
        size++;
    }

Expansion method

The general process of expansion:

  • Create a new array with twice the capacity
  • Rehash the elements of the current bucket array into the new array
  • Set the new array as the map's bucket array

    /**
     * Expansion
     */
    private void resize() {
        // Create a new bucket array with twice the capacity
        Node<K, V>[] newBuckets = new Node[buckets.length * 2];
        // Rehash the current elements into the new bucket array
        rehash(newBuckets);
        buckets = newBuckets;
    }

    /**
     * Rehashes the current elements
     *
     * @param newBuckets
     */
    private void rehash(Node<K, V>[] newBuckets) {
        // The map size is recalculated
        size = 0;
        // Move all elements of the old bucket array into the new bucket array
        for (int i = 0; i < buckets.length; i++) {
            // If the bucket is empty, skip it
            if (buckets[i] == null) {
                continue;
            }
            Node<K, V> node = buckets[i];
            while (node != null) {
                // Put the element into the new array
                putVal(node.key, node.value, newBuckets);
                node = node.next;
            }
        }
    }

The get method

The get method is relatively simple: use the hash function to get the address, then walk the linked list at that bucket to find the matching key.

    /**
     * Gets an element
     *
     * @param key
     * @return
     */
    public V get(K key) {
        // Get the address of the key
        int index = getIndex(key, buckets.length);
        if (buckets[index] == null) return null;
        Node<K, V> node = buckets[index];
        // Walk the linked list
        while (node != null) {
            if ((node.key.hashCode() == key.hashCode())
                    && (node.key == key || node.key.equals(key))) {
                return node.value;
            }
            node = node.next;
        }
        return null;
    }

Complete code: the full ThirdHashMap class is simply all of the snippets above assembled into one file.

Test

The test code is as follows:

    @Test
    void test0() {
        ThirdHashMap map = new ThirdHashMap();
        for (int i = 0; i < 100; i++) {
            map.put("Liu Huaqiang" + i, "Is your melon ripe?" + i);
        }
        System.out.println(map.size());
        for (int i = 0; i < 100; i++) {
            System.out.println(map.get("Liu Huaqiang" + i));
        }
    }

    @Test
    void test1() {
        ThirdHashMap map = new ThirdHashMap();
        map.put("Liu Huaqiang 1", "Dude, is this melon of yours ripe?");
        map.put("Liu Huaqiang 1", "If this melon is ripe, I'll definitely take it!");
        System.out.println(map.get("Liu Huaqiang 1"));
    }

You can run it yourself and check the results.

Conclusion

Well, there we have it: a simple HashMap implementation. Now a Kuaishou interviewer asking for a handwritten HashMap no longer scares us.

Interviewer: Really? I don't believe it. I want you to write a red-black tree…

Of course, we can also see that HashMap's O(1) operations only hold when conflicts are few; a simple hashCode-then-mod is certainly not the optimal hash function; once conflicts pile up, an overly long linked list also hurts performance; and our resize and put have thread-safety issues…
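
As a small preview of how the real thing does better, here is roughly how java.util.HashMap in JDK 8 tackles the plain "hashCode then mod" weakness: it XORs the high 16 bits of hashCode() into the low bits, then indexes with a bitwise AND against a power-of-two capacity. This is a simplified sketch of that idea, not part of our ThirdHashMap:

    // Simplified from JDK 8's java.util.HashMap: spread the high bits of the hash,
    // then index with (capacity - 1) & hash, which equals hash % capacity when capacity is a power of two.
    static int spreadHash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }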

In practice, though, we don't have to worry about all of that, because Li Master has already written it for us; we just call it.

In the next article, we will walk through Li Master's HashMap in the form of interview questions!

Like and follow so you don't get lost. See you next time!





This article was first published on my personal public account: Three Points Evil. Welcome to follow!