Abstract

HashMap questions are common in interviews, and there are plenty of explanations available online. Still, my personal feeling is that reading other people's write-ups is no substitute for writing a solid summary of your own.

Let's start with what a hash table is. A hash table is a data structure that accesses data stored in memory based on a key: it computes a function on the key that maps the desired record to a location in a table, which speeds up lookups. This mapping function is called a hash function, and the array of records is called a hash table. The hashCode() method of the String class shows how such a hash is computed:

public int hashCode() {
        int h = hash;                       // cached value; 0 means not computed yet
        if (h == 0 && value.length > 0) {
            char val[] = value;

            // h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;                       // cache the result
        }
        return h;
    }
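As a quick sanity check of the polynomial formula h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], here is a small example of my own:

public class StringHashDemo {
    public static void main(String[] args) {
        // "Ab": h = 'A' * 31 + 'b' = 65 * 31 + 98 = 2113
        System.out.println('A' * 31 + 'b');  // 2113
        System.out.println("Ab".hashCode()); // 2113
    }
}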

Note that hash algorithms have two important properties: irreversibility and collision resistance.

  • Irreversible: given X, you can compute Hash(X); but given Hash(X), you cannot recover X.
  • Collision-resistant: given X, it is computationally infeasible to find a different Y such that Hash(X) equals Hash(Y).

Talking about hashCode brings me to two of the biggest conventions in the Java world: equals and hashCode. Here is the Javadoc of equals():

* Indicates whether some other object is "equal to" this one.
     * <p>
     * The {@code equals} method implements an equivalence relation
     * on non-null object references:
     * <ul>
     * <li>It is <i>reflexive</i>: for any non-null reference value
     *     {@code x}, {@code x.equals(x)} should return
     *     {@code true}.
     * <li>It is <i>symmetric</i>: for any non-null reference values
     *     {@code x} and {@code y}, {@code x.equals(y)}
     *     should return {@code true} if and only if
     *     {@code y.equals(x)} returns {@code true}.
     * <li>It is <i>transitive</i>: for any non-null reference values
     *     {@code x}, {@code y}, and {@code z}, if
     *     {@code x.equals(y)} returns {@code true} and
     *     {@code y.equals(z)} returns {@code true}, then
     *     {@code x.equals(z)} should return {@code true}.
     * <li>It is <i>consistent</i>: for any non-null reference values
     *     {@code x} and {@code y}, multiple invocations of
     *     {@code x.equals(y)} consistently return {@code true}
     *     or consistently return {@code false}, provided no
     *     information used in {@code equals} comparisons on the
     *     objects is modified.
     * <li>For any non-null reference value {@code x},
     *     {@code x.equals(null)} should return {@code false}.
     * </ul>
     * <p>

This hardly needs translation and can be read directly: equals must be reflexive, symmetric, transitive, and consistent, and x.equals(null) must return false.

The hashCode contract is the second great convention in the Java world:

<p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     * <p>

Translation:

  • The same object must always return the same hashCode (within one execution), provided nothing used in equals comparisons changes.
  • If equals returns true for two objects, they must return the same hashCode; this is why hashCode must be overridden whenever equals is.
  • Two objects that are not equal may still return the same hashCode, though distinct results improve hash table performance.
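For example, here is a minimal sketch of keeping the two contracts in sync; the Point class is made up for illustration:

import java.util.Objects;

final class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // equal objects always produce equal hash codes
    }
}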

The third important convention is the compareTo convention

  • The compareTo() method returns an int computed by comparing the char (ASCII) values at corresponding positions of the two strings.
  • At the first position where the characters differ, the difference of the two char values is returned: a negative value means the former is smaller, a positive value means it is greater, and equal strings return zero.
  • If one string is a prefix of the other, the comparison runs until the shorter one is exhausted, and the difference of the two lengths is returned.
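A quick illustration of these rules (my own examples):

public class CompareToDemo {
    public static void main(String[] args) {
        System.out.println("abc".compareTo("abd")); // 'c' - 'd' = -1
        System.out.println("abc".compareTo("ab"));  // common prefix, so 3 - 2 = 1
        System.out.println("abc".compareTo("abc")); // equal strings: 0
    }
}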

So ask the first question:

1. How did the structure of HashMap change from Java 7 to Java 8?

In Java 7, HashMap is implemented as array + linked list. In the ideal case this gives O(1) search, insert, and delete, but the fatal defect is hash bucket collision: many different keys can map to the same bucket, as in this example:

public static void main(String[] args) {
        HashMap<String, String> hashMap = new HashMap<>();
        List<String> list = Arrays.asList("Aa", "BB", "C#");
        for (String s : list) {
            hashMap.put(s, s);
            System.out.println(s.hashCode());
        }
    }

Results:

2112
2112
2112

Process finished with exit code 0

All three strings hash to 2112 and land in the same bucket: the hash table degenerates into a linked list and lookup performance degrades rapidly.

So starting with Java 8, HashMap is implemented as array + linked list + red-black tree: when a chain reaches a certain length, it is converted into a red-black tree. The specific process is described in detail later.

2. Why does the initial capacity of a HashMap have to be a power of 2?

The Javadoc of HashMap's initial capacity contains this sentence:

/**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

So what happens if we construct a HashMap whose initial capacity is not a power of two?

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

Note the last line: the constructor passes the requested capacity through the tableSizeFor() method:

static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

At first glance this is hard to follow, so work through an example to see what the method does. Take cap = 9 (>>> is the unsigned right shift; n >>> k shifts n right by k bits, filling with zeros):

n = 9 - 1 = 8      // binary 1000
n |= n >>> 1;      // 1000 | 0100 = 1100 (12)
n |= n >>> 2;      // 1100 | 0011 = 1111 (15)
n |= n >>> 4;      // 1111 | 0000 = 1111 (15)
n |= n >>> 8;      // still 1111 (15)
n |= n >>> 16;     // still 1111 (15)
return n + 1;      // 16

n ends up as 15, and 15 + 1 = 16: the smallest power of two greater than or equal to 9. That is exactly what this method computes.

So why does a HashMap insist on a power-of-two capacity? Start by asking yourself: the range of int is -2^31 to 2^31 - 1, so how do you map hash values into an array of, say, a few dozen buckets? The easiest thing to think of is the modulo operation, but there are problems: a hash value can be negative, and array indices cannot be; and at the CPU level the modulo operation is far more expensive than a bitwise AND. So how does HashMap actually do it? Look at the put method in JDK 1.7:

public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        // ... (the rest of put is shown in full under question 5)
    }

    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        // A bitwise AND computes the array index
        return h & (length - 1);
    }

To demonstrate this process, take an arbitrary hash value, say 50 (binary 110010), with a HashMap of size 32:

50 & (32 - 1) = 110010 & 011111 = 010010 = 18

What if the size is 30?

50 & (30 - 1) = 110010 & 011101 = 010000 = 16

You can see that with size 30, the second-lowest bit of the index is always erased: no matter what the hash value is, that bit is zero, so buckets whose index has that bit set (2, 3, 6, 7, and so on) can never be used, and elements cannot spread evenly over the table. To make every bucket reachable, every bit of size - 1 must be 1, and therefore the size of the HashMap must be a power of 2.

3. What is the treeification threshold for linked lists?

First of all, we know that if we keep putting elements into the map, then once the number of elements in one bucket exceeds a certain number, the list turns into a red-black tree. To find that threshold, look at the put method of HashMap (also widely discussed online). Straight to the answer; here is a small excerpt of the put method:

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
 /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

As the Javadoc above says, treeifyBin() is reached when an element is added to a bucket that already holds at least TREEIFY_THRESHOLD (8) nodes. But treeifyBin() itself checks one more constant:

/**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

As you can see, it is not enough for a list to reach the treeify threshold of 8; the table must also have at least 64 buckets before the conversion from list to red-black tree occurs. Otherwise the table is simply resized.
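For reference, that extra condition sits at the top of treeifyBin() in the JDK 1.8 source (elided here to the relevant check):

final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // Table still small: resize to spread entries out instead of treeifying
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // ... replace the nodes of this bin with TreeNodes ...
        }
    }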

4. Following the above questions: why 8, why 64, and why a load factor of 0.75?

The source code provides an answer to the question why 8 is used:

     * Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

Under random hash codes, the number of elements in a bucket follows a Poisson distribution with a parameter of about 0.5. By the figures above, the probability of a bucket holding 8 elements is about 0.00000006, roughly one in ten million, which is tiny. So if a bucket grows to 8, an unusual number of collisions has occurred, and converting the list to a red-black tree heads off the performance degradation in time. Another way to see it: the average search cost of a linked list is n/2, while that of a red-black tree is log2(n). At n = 8, log2(8) = 3 < 8/2 = 4, so the tree starts to pay off; below that the conversion is not worth it. That reasoning also seems quite sensible.

As for why the other threshold is 64, the reasons are as follows:

In essence, treeification is an optimization to improve search efficiency and save search time. If the table itself is still small, there is no need to treeify; resizing the table spreads the entries out and resolves the collisions more cheaply. Why exactly 64? I have not yet seen a definitive analysis, and I will supplement this when I find one.

Why 0.75?

An answer to this was found on Stack Overflow:

In general, the default load factor (0.75) offers a good compromise between time and space costs. A higher value reduces space overhead but increases lookup cost (reflected in most HashMap operations, including get and put), because buckets fill up and collide before the table grows. A value that is too small makes resizing very frequent and wastes memory on a sparsely filled table.
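Concretely, the resize threshold is capacity × load factor. A quick check with the defaults:

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12, so the 13th put triggers a resize
    }
}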

5. How is the expansion of HashMap realized?

In the put method of HashMap (the JDK 1.7 version is the more straightforward one) you can see:

 public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

When no entry with the same key is found, addEntry() is called to add the element to a bucket.

void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

In this method you can see that when size reaches the threshold (load factor × capacity) and the target bucket is occupied, a new table twice the size of the original is created.

void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

From this resize method we can see that the expansion mechanism is to pour the old entries into a new table twice the old size, recomputing each index (note: this transfer process is the source of a fatal problem, as question 6 shows).
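The re-indexing itself happens in transfer(); here is the JDK 1.7 version (comments mine). Note the head insertion, which reverses each list's order; that detail matters for the next question:

void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];   // head insertion: reverses the list order
                newTable[i] = e;
                e = next;
            }
        }
    }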

6. Why isn’t HashMap thread safe?

For this part of the process I recommend the article at coolshell.cn/articles/96… , which explains it very clearly.

In short: JDK 1.7's transfer() re-inserts each node at the head of its new bucket, reversing each list's order. If two threads resize at the same time, one thread can work from the other thread's half-moved list, and the next pointers can end up pointing at each other, forming a circular linked list. A subsequent get() on that bucket then loops forever, driving the CPU to 100%. Concurrent puts can also silently lose entries. (The original article walks through each step with diagrams.)

7. What is a red-black tree?

First of all, a red-black tree is a self-balancing binary search tree. A plain binary search tree can, in unfavorable cases (such as inserting keys in sorted order), degenerate into something close to a linked list, which hurts efficiency; the red-black tree was introduced as a substitute.

Features of red-black trees (from Wikipedia):

A red-black tree is a binary search tree in which each node has a color attribute, red or black. On top of the general requirements for binary search trees, any valid red-black tree must also satisfy the following:

  • 1. The node is red or black.
  • 2. The roots are black.
  • 3. All leaves are black (leaves are NIL nodes).
  • 4. Each red node must have two black children. (There cannot be two consecutive red nodes on all paths from each leaf to the root.)
  • 5. All simple paths from any node to each of its leaves contain the same number of black nodes.

(The original post shows a diagram of a concrete red-black tree here.)

Rebalancing after insertions and deletions is done with recoloring plus two rotation operations, left rotation and right rotation, both illustrated with diagrams in the original post.
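A minimal sketch of a left rotation, using a standalone node class for illustration (not HashMap's actual TreeNode); the right rotation is the mirror image:

class RBTree {
    static class RBNode {
        int key;
        boolean red;
        RBNode left, right, parent;
    }

    // Left-rotate around x: x's right child y takes x's place,
    // x becomes y's left child, and y's old left subtree moves under x.
    // In-order key order is preserved.
    static RBNode rotateLeft(RBNode root, RBNode x) {
        RBNode y = x.right;
        x.right = y.left;
        if (y.left != null)
            y.left.parent = x;
        y.parent = x.parent;
        if (x.parent == null)
            root = y;                 // x was the root
        else if (x == x.parent.left)
            x.parent.left = y;
        else
            x.parent.right = y;
        y.left = x;
        x.parent = y;
        return root;                  // the (possibly new) root
    }
}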


8. What is the difference between HashMap, LinkedHashMap, Hashtable, and TreeMap?

First, each of the four data structures implements the Map interface, but each has its own differences.

  • HashMap: not thread-safe and does not guarantee any order; allows at most one record with a null key (and allows null values). Because it locates data by hash value, lookups are very fast.
  • LinkedHashMap: allows null keys and values, and iterates entries in the order they were inserted, so what went in first is traversed first. Maintaining that order costs a little extra; it is likewise not thread-safe.
  • Hashtable: cannot store null keys or values. If you look at its methods you will find they are basically all marked synchronized, so only one thread can access a Hashtable at a time. That guarantees safety under multithreading but makes writes slow.
  • TreeMap: unlike HashMap it is ordered; keys passed to a TreeMap must implement the Comparable interface (or a Comparator must be supplied) to define the sort order, or an error is thrown. Keys cannot be null, and it does not ensure safety in multithreaded environments either.
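To make the ordering differences concrete, a small demo (my own example):

import java.util.*;

public class MapOrderDemo {
    public static void main(String[] args) {
        List<Map<String, Integer>> maps = Arrays.asList(
                new HashMap<>(), new LinkedHashMap<>(), new TreeMap<>());
        for (Map<String, Integer> map : maps) {
            map.put("banana", 2);
            map.put("apple", 1);
            map.put("cherry", 3);
            // HashMap: unspecified order; LinkedHashMap: insertion order
            // (banana, apple, cherry); TreeMap: sorted order (apple, banana, cherry)
            System.out.println(map.getClass().getSimpleName() + ": " + map.keySet());
        }
    }
}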

9. Can you briefly describe the get(), put() methods of HashMap?

  • get() method: see the source code:
public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

It computes the key's hash and delegates to getNode(). An excerpt of that method:

        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }

Locate the bucket from the hash; if the first node is the one sought, return it; otherwise continue the search through the linked list or red-black tree.

  • put() method: enter the put method first.

Note: onlyIfAbsent controls what happens when a key that already exists is put again: true keeps the original value, false overwrites it with the new one. evict is false only when the table is in creation mode; it is passed on to the afterNodeInsertion hook that LinkedHashMap uses.
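For reference, put() itself simply delegates to putVal() with those defaults:

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

Now let's analyze the body of putVal() piece by piece: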

if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;

If the table is empty, initialize one.

if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);

If the target position is empty, simply create a new Node at that position.

        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;

If the first node of the bucket matches the inserted key, record it in e; its value is overwritten further down (subject to onlyIfAbsent).

 else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

If it is a tree node, the tree node is inserted.

else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }

If it is a linked list node, insert it as a linked list node.

10. How does HashMap reduce hash collisions, and what other methods can be used to solve hash collisions?

In JDK1.7, the hash value is computed as follows:

 h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);

At first glance, you might not know what this code is doing, but try two random numbers and simulate the process yourself:

private static int cacIndex(int h, int size) {
        return h & (size - 1);
    }

    public static int cacHashCode(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

Use two random numbers that would logically produce a hash collision:

public static void main(String[] args) {
        System.out.println(cacIndex(212123371, 4));
        System.out.println(cacIndex(121311123, 4));

    }

The results are as follows:

3
3

Process finished with exit code 0

Call the perturbation method:

    public static void main(String[] args) {
        System.out.println(cacIndex(cacHashCode(212123371), 4));
        System.out.println(cacIndex(cacHashCode(121311123), 4));

    }

Results:

0
1

Process finished with exit code 0

Now the conclusion:

HashMap reduces hash collisions with this perturbation function, which XORs the high bits of hashCode into the low bits so that both take part in indexing. When the capacity is small, only the low bits determine the bucket, so hash codes that differ only in their high bits would otherwise collide.

This method was simplified in JDK 1.8 to:

return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);

Other ways to resolve hash collisions:

  • 1. Open addressing: when a collision occurs, probe the slots that follow until an empty address is found and store the value there (see the sketch after this list).
  • 2. Chaining: what HashMap does; when a collision occurs, the colliding position grows into a linked list, and later colliding values are inserted into that list.
  • 3. Rehashing: as the name suggests, when a collision occurs, another hash function is used to compute a new value, repeating until no collision occurs.
  • 4. Public overflow area: instead of inserting colliding values into the table, create a separate overflow table and put the colliding values there.
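As a sketch of the first approach, a minimal linear-probing table (illustrative only: fixed capacity, no resizing or deletion, and it assumes the table never fills up):

class LinearProbingMap {
    private final Object[] keys = new Object[16];
    private final Object[] values = new Object[16];

    void put(Object key, Object value) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        // On a collision, step forward until a free slot or the same key is found
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + 1) % keys.length;
        keys[i] = key;
        values[i] = value;
    }

    Object get(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        while (keys[i] != null) {
            if (keys[i].equals(key))
                return values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }
}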

11. How to create HashMap in multi-threaded situations, and how does it achieve thread safety?

We all know that you can use ConcurrentHashMap to create a thread-safe HashMap in multithreaded situations, but why is it thread-safe?

Take a look at the source code in JDK1.7:

/**
     * Segments are specialized versions of hash tables.  This
     * subclasses from ReentrantLock opportunistically, just to
     * simplify some locking and avoid separate construction.
     */
    static final class Segment<K,V> extends ReentrantLock implements Serializable 

In JDK 1.7, ConcurrentHashMap is split into Segments, each of which is a specialized hash table inheriting from ReentrantLock; a segment's put method first tries to acquire that lock.

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
            HashEntry<K,V> node = tryLock() ? null :
                scanAndLockForPut(key, hash, value);
            V oldValue;

And here is the outer put() method, which locates the segment:

public V put(K key, V value) {
        Segment<K,V> s;
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key);
        int j = (hash >>> segmentShift) & segmentMask;
        if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
             (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
            s = ensureSegment(j);
        return s.put(key, hash, value, false);
    }

HashEntry also declares its value and next fields volatile, which guarantees that writes are visible across threads:

 static final class HashEntry<K,V> {
        final int hash;
        final K key;
        volatile V value;
        volatile HashEntry<K,V> next;

        HashEntry(int hash, K key, V value, HashEntry<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

So in Java8, things are different again:

Java 8 abandons the segment-lock design and instead achieves thread safety with the synchronized keyword plus the CAS algorithm. What is CAS? Compare-and-swap (CAS) is an atomic operation used in multithreaded programming to exchange data without interruption, avoiding the inconsistency that arises when several threads rewrite the same data in an unpredictable order. The operation compares a value in memory with an expected value and, only if they are the same, replaces the memory value with a new one. (From Wikipedia)
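A small illustration of CAS semantics using AtomicInteger (my own example, not ConcurrentHashMap internals):

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(5);
        // Succeeds: the value in memory matches the expected value 5
        System.out.println(counter.compareAndSet(5, 6)); // true
        // Fails: the value is now 6, not the expected 5
        System.out.println(counter.compareAndSet(5, 7)); // false
        System.out.println(counter.get());               // 6
    }
}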

Enter the ConcurrentHashMap source code:

Extract some of the source code from the put() method:

synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                                  value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }

As you can see, insertion into an existing bucket is performed under synchronized, so only one thread can modify that bucket at a time. For placing a node into an empty slot, CAS is used instead:

static final <K,V> boolean casTabAt(Node<K,V>[] tab, int i,
                                        Node<K,V> c, Node<K,V> v) {
        return U.compareAndSwapObject(tab, ((long)i << ASHIFT) + ABASE, c, v);
    }

casTabAt() compares and then swaps atomically, which makes the insertion safe when the slot is currently empty, without taking a lock.


References:

  • 1. HashMap put method source code (JDK 1.8)
  • 2. Juejin: "MarkDown Inserts a Large Set of Mathematical Formula Experiments"
  • 3. "Vaccines: The Endless Loop of JAVA HASHMAP"
  • 4. Zhihu: reading the HashMap source code
  • 5. Wikipedia: ASCII
  • 6. Wikipedia: Red-black tree
  • 7. Comics: "What is a Red-Black Tree?"
  • 8. Jianshu: "In-depth Understanding of the CAS Algorithm"
  • 9. "HashMap? Interview? Who Am I? Where Am I?"
  • 10. Juejin: "Understanding the volatile Keyword Once and for All"
  • 11. The hash implementation in HashMap (source of the perturbation code above)
  • 12. Hollins: "The Most Thorough hash() Analysis of Maps on the Web"