This article is authorized for reprint by the official WeChat account guolin_blog.

This article analyzes the HashMap source code in Android.

The source code for HashMap analyzed in this article is based on the Android SDK (version 28).

Note that Android SDK 28, like JDK 1.8, optimizes the underlying implementation of HashMap, for example by introducing the red-black tree data structure and by optimizing capacity expansion.

Overview

The UML class diagram for HashMap looks like this:

HashMap is a hash-table-based implementation of the Map interface. This implementation provides all of the optional map operations and permits null keys and values; note that at most one entry may have a null key, while multiple entries may have null values. It does not guarantee that the order of the map remains constant over time.

HashMap provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. The time required to iterate over the collection views is proportional to the capacity of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). So if iteration performance matters, do not set the initial capacity too high (or the load factor too low).

A HashMap instance has two parameters that affect its performance: the initial capacity and the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity when the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries exceeds the product of the current capacity and the load factor, the hash table is rehashed (that is, its internal data structure is rebuilt); this is called capacity expansion, and it roughly doubles the number of buckets.

In general, the default load factor (0.75) offers a good trade-off between time and space costs, and changing it is not recommended in most cases. A higher value reduces the space overhead but increases the lookup cost (reflected in most operations of the HashMap class, including get and put). threshold is the maximum number of nodes the HashMap can hold before expanding. When setting the initial capacity, the expected number of entries in the map and its load factor should be taken into account so as to minimize the number of expansions. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operation will ever occur; that is, no expansion will occur.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large initial capacity stores the mappings more efficiently than letting it rehash automatically to grow the table as needed. Note that using many keys with the same hashCode is a sure way to degrade the performance of any hash table. To ameliorate the impact, when the keys are Comparable, this class may use the comparison order among keys to break ties.
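A minimal sketch of pre-sizing (the class name and sizes are illustrative, not from the HashMap source):

// PresizeDemo.java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        // Expecting about 1000 entries: an initial capacity of at least
        // expectedSize / loadFactor avoids intermediate rehashing
        int expectedSize = 1000;
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        Map<String, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedSize; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 1000, with no resize along the way
    }
}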

Note that HashMap is not thread-safe. If multiple threads access a HashMap concurrently and at least one of them modifies it structurally (a structural modification adds or removes one or more mappings; merely changing the value associated with a key the instance already contains is not structural), it must be synchronized externally. Collections.synchronizedMap can be used to wrap a HashMap and make it thread-safe, or ConcurrentHashMap can be used instead.
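A minimal sketch of the two options just mentioned:

// ThreadSafeMaps.java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMaps {
    public static void main(String[] args) {
        // Option 1: wrap a HashMap; every method is guarded by a single mutex
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());
        syncMap.put("a", 1);

        // Option 2: use ConcurrentHashMap, which is designed for higher concurrency
        Map<String, Integer> concurrentMap = new ConcurrentHashMap<>();
        concurrentMap.put("a", 1);
    }
}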

All collection view methods of the HashMap class return fail-fast iterators: if the map is structurally modified at any time after an iterator is created, in any way except through the iterator's own remove method, the iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of iterators cannot be guaranteed: generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It would therefore be wrong to write a program that depends on this exception for its correctness; the fail-fast behavior of iterators should be used only to detect bugs.
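A minimal demonstration of the fail-fast behavior (since the exception is thrown on a best-effort basis, this shows typical rather than guaranteed behavior):

// FailFastDemo.java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<String> it = map.keySet().iterator();
        map.put("c", 3); // structural modification after the iterator was created
        it.next();       // typically throws ConcurrentModificationException
    }
}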

Source code analysis

Let's analyze the source code of HashMap.

Fields and Nodes (Node)

The fields of HashMap, source code:

// HashMap.java
// Serialized version number
private static final long serialVersionUID = 362498820763181265L;

// The default initial capacity (which must be a power of 2) is 1 shifted left 4 bits, that is, 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

// The maximum capacity, used if a larger value is implicitly specified by either constructor with arguments; must be a power of 2 no larger than 1 shifted left 30 bits, that is, 1073741824
static final int MAXIMUM_CAPACITY = 1 << 30;

// Load factor used when none is specified in the constructor; its value is 0.75f
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// A bin is converted to a tree when an element is added to a bin holding at least this many nodes. The value must be greater than 2 and should be at least 8 to mesh with the assumption that a tree converts back to a plain bin when it shrinks
static final int TREEIFY_THRESHOLD = 8;

// The bin count threshold for untreeifying (splitting) a bin during a resize operation; should be less than TREEIFY_THRESHOLD, and at most 6 to mesh with shrinkage detection during removal
static final int UNTREEIFY_THRESHOLD = 6;

// The smallest table capacity for which bins may be treeified (otherwise the table is resized if a bin holds too many nodes). Its value is 64, which should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between resizing and treeification thresholds
static final int MIN_TREEIFY_CAPACITY = 64;

// The table, initialized on first use and resized as necessary; when allocated, its length is always a power of 2 (in some operations a length of zero is also tolerated, to allow bootstrapping mechanics that are currently not needed)
transient Node<K,V>[] table;

// Holds the cached entrySet(). Note that the AbstractMap fields are used for keySet() and values()
transient Set<Map.Entry<K,V>> entrySet;

// The number of key-value mappings contained in this map
transient int size;

// This field is used to make iterators over collection views of the HashMap fail fast. Structural modifications are those that change the number of mappings in the HashMap or otherwise modify its internal structure (e.g., rehash)
transient int modCount;

// The next size value at which to resize (capacity multiplied by load factor), that is, the maximum number of nodes the HashMap can hold before expanding. Per the Javadoc this is correct upon serialization; additionally, if the table array has not been allocated, this field holds the initial array capacity, or zero signifying DEFAULT_INITIAL_CAPACITY
int threshold;

// The load factor of the hash table
final float loadFactor;

The important field here is the table array of type Node<K,V>[]. Its initial length is 16, and the default load factor is 0.75f. threshold is the maximum number of nodes the HashMap can hold before expanding; the formula is threshold = loadFactor * length, where loadFactor is the load factor and length is the array length. In other words, once the array length is fixed, a larger loadFactor allows more nodes to be loaded. As mentioned above, when the number of nodes exceeds this value, the HashMap expands, doubling its capacity.
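A quick worked example with the defaults: threshold = 0.75f * 16 = 12, so the 13th inserted entry pushes size past threshold and triggers a resize, after which the capacity is 32 and the new threshold is 0.75f * 32 = 24.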

The length of the table array must be a power of 2, which means it is always a composite number. This is an unconventional design; the conventional approach is to make the bucket count a prime number.

Let the hash function be H(x) = x % n. Suppose n is a composite number, such as a power of 2, say 2 to the third power, which is 8. In the following example the primitive type is int, that is, 32 bits:

4 (decimal) = 00000000 00000000 00000000 00000100 (binary)

12 (decimal) = 00000000 00000000 00000000 00001100 (binary)

20 (decimal) = 00000000 00000000 00000000 00010100 (binary)

28 (decimal) = 00000000 00000000 00000000 00011100 (binary)

Call hash function H(x) :

H(4) = 4 % 8 = 4

H(12) = 12 % 8 = 4

H(20) = 20 % 8 = 4

H(28) = 28 % 8 = 4

Notice that H(x) yields the same value regardless of the bits at or above the fourth position (counting from the right); those higher bits do not participate in H(x) at all, so they cannot reflect the characteristics of x. This increases the probability of conflicts. In other words, a composite modulus increases the probability of collisions.

We can try taking a prime number, such as 3, and call H(x) with the aforementioned 4, 12, 20, and 28 respectively:

H(4) = 4 % 3 = 1

H(12) = 12 % 3 = 0

H(20) = 20 % 3 = 2

H(28) = 28 % 3 = 1

We can see that H(x) now yields three distinct values for the four keys (only 4 and 28 still collide), which illustrates that a prime modulus lets more bits of the key participate and reduces the chance of collisions.
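A small sketch that reproduces the arithmetic above:

// ModulusDemo.java
public class ModulusDemo {
    public static void main(String[] args) {
        int[] keys = {4, 12, 20, 28};
        // Composite modulus (a power of 2): every key lands in bucket 4
        for (int x : keys) {
            System.out.println("H(" + x + ") = " + x + " % 8 = " + (x % 8));
        }
        // Prime modulus: the keys spread across buckets 1, 0, 2, 1
        for (int x : keys) {
            System.out.println("H(" + x + ") = " + x + " % 3 = " + (x % 3));
        }
    }
}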

An example of a prime bucket size is Hashtable, whose initial bucket count is 11; however, primality cannot be guaranteed after expansion. The main purpose of HashMap's unconventional design is to optimize the modulo operation and expansion; to reduce collisions, HashMap also mixes the high bits into the computation when determining the bucket index (see the hash method below).

The static inner class Node, source code:

// HashMap.java
static class Node<K,V> implements Map.Entry<K,V> {
    // The hash used to locate the array index
    final int hash;
    // key
    final K key;
    // value
    V value;
    // The next node in the list
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            // Check whether key and value are equal
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                // Return true if key and value are equal
                return true;
        }
        // If key or value is not equal, return false
        return false;
    }
}

Node is a static inner class of HashMap that implements the Map.Entry<K,V> interface and is essentially a mapping (key-value pair).

A HashMap uses a hash table to store data. There are four common ways for a hash table to resolve hash conflicts, explained in more detail later in this article; HashMap uses the chained address method, which in simple terms combines arrays and linked lists. First the key's hashCode method is called to obtain a hash value (a method every Java object has), and then the last two steps of the hashing algorithm (high-bit mixing and the modulo operation) locate the storage position of the key-value pair. If two keys map to the same position, a hash collision has occurred. The more evenly the hash algorithm distributes its results, the lower the probability of collisions and the higher the access efficiency of the Map.

If the hash bucket array is large, collisions are relatively rare even with a poor hash algorithm; if it is very small, collisions are relatively frequent even with a good one. So a trade-off between time and space cost is needed: a good hash algorithm combined with a sound expansion mechanism keeps the bucket array small while keeping the collision probability low.

Constructors

The HashMap constructors, source code:

// HashMap.java
// Construct an empty HashMap with the specified initial capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        // If the specified initial capacity is less than 0, throw an IllegalArgumentException
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        // If the specified initial capacity is greater than the maximum capacity, take the maximum capacity
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        // Throw an IllegalArgumentException if the load factor is less than or equal to zero, or is NaN (not a number)
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // Assign the specified loadFactor to the loadFactor member variable
    this.loadFactor = loadFactor;
    // Call tableSizeFor with the specified initial capacity and assign the resulting value to the member variable threshold
    this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(int initialCapacity) {
    // Call the previous constructor, passing in the specified initial capacity and the default load factor
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    // Assign the default load factor to the member variable loadFactor; all other fields keep their default values
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}


public HashMap(Map<? extends K, ? extends V> m) {
    // Assign the default load factor to the loadFactor member variable
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    // Call the putMapEntries method, which is also called by the putAll method
    putMapEntries(m, false);
}

Note the tableSizeFor method called above; its source code:

// HashMap.java
// Returns a power of 2 for the given target capacity
static final int tableSizeFor(int cap) {
    int n = cap - 1; // Step 1: subtract 1 from the given target capacity and assign the value to n
    n |= n >>> 1; // Step 2: unsigned-right-shift n by 1 bit, OR it with n, and assign the result to n
    n |= n >>> 2; // Step 3: unsigned-right-shift n by 2 bits, OR it with n, and assign the result to n
    n |= n >>> 4; // Step 4: unsigned-right-shift n by 4 bits, OR it with n, and assign the result to n
    n |= n >>> 8; // Step 5: unsigned-right-shift n by 8 bits, OR it with n, and assign the result to n
    n |= n >>> 16; // Step 6: unsigned-right-shift n by 16 bits, OR it with n, and assign the result to n
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1; // If n is less than 0, return 1; if n is greater than or equal to the maximum capacity 1073741824, return the maximum capacity; otherwise return n + 1
}

Suppose the value of cap passed in is 30. After the first step, n is 29. Since 29 is positive, its two's-complement representation matches its plain binary form, as shown below:

00000000 00000000 00000000 00011101

Step 2 is performed: 29 is unsigned-right-shifted 1 bit, giving:

00000000 00000000 00000000 00001110

This is then OR-ed with the value from step 1, giving the following, which is 31 in decimal:

00000000 00000000 00000000 00011111

Step 3 is performed: 31 is unsigned-right-shifted 2 bits, giving:

00000000 00000000 00000000 00000111

This is then OR-ed with the value from step 2, giving the following, which is 31 in decimal:

00000000 00000000 00000000 00011111

Step 4 is performed: 31 is unsigned-right-shifted 4 bits, giving:

00000000 00000000 00000000 00000001

This is then OR-ed with the value from step 3, giving the following, which is 31 in decimal:

00000000 00000000 00000000 00011111

Step 5 is performed: 31 is unsigned-right-shifted 8 bits, giving:

00000000 00000000 00000000 00000000

This is then OR-ed with the value from step 4, giving the following, which is 31 in decimal:

00000000 00000000 00000000 00011111

Step 6 is performed: 31 is unsigned-right-shifted 16 bits, giving:

00000000 00000000 00000000 00000000

This is then OR-ed with the value from step 5, giving the following, which is 31 in decimal:

00000000 00000000 00000000 00011111

Since 31 is greater than 0 and smaller than the maximum capacity 1073741824, the final branch executes:

31 + 1 = 32

The final value returned is 32, which is a power of 2 (2 to the fifth power).
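A standalone sketch of the same routine, convenient for experimenting with other inputs (the class name is illustrative):

// TableSizeForDemo.java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same logic as HashMap.tableSizeFor: the smallest power of 2 >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(30)); // 32
        System.out.println(tableSizeFor(16)); // 16: the initial -1 keeps an exact power of 2 from doubling
        System.out.println(tableSizeFor(17)); // 32
    }
}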

Add

HashMap's add methods, source code:

// HashMap.java
// Add a Map to the HashMap, which is called by both the HashMap constructor and the putAll method
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        if (table == null) {
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                // Call tableSizeFor, which returns a power of 2 for t, and assign the result to the member variable threshold
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        // Traverse the Map
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            // Call putVal
            putVal(hash(key), key, value, false, evict);
        }
    }
}

// Associate the specified key in the HashMap with the specified value, replacing the old value with the new value if the same key already exists
public V put(K key, V value) {
    // Call the putVal method. Note the hash method here, which is explained in more detail later
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        // If the table array is not initialized, call the resize method to initialize the array
        n = (tab = resize()).length;
    // Get the index of the element to be inserted
    if ((p = tab[i = (n - 1) & hash]) == null)
        // If the index is empty, create a node and pass the data to it
        tab[i] = newNode(hash, key, value, null);
    else {
        // If the index is on an element whose data is not empty, that is, a hash collision occurred, the following logic is performed
        Node<K,V> e; K k;
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            // If the key of the element in the index is the same as the key of the element to be inserted, then e is assigned
            e = p;
        else if (p instanceof TreeNode)
            // If the index element's data structure is a tree, putTreeVal is called to insert data using a red-black tree and assign the value to e
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // If the index element's data structure is a linked list, the following logic is performed
            // Execute the loop
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // If the end of the list is reached without finding the key, create a node and append it to the list
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        // If binCount is greater than or equal to TREEIFY_THRESHOLD minus 1, that is, binCount is greater than or equal to 7, and the list length is greater than or equal to 8, then the list is converted to a red-black tree
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    // If the element to insert already exists in the list, break out of the loop
                    break;
                // Assign e to p and continue the loop
                p = e;
            }
        }
        if (e != null) {
            // If e is not null, the key to insert already exists in the HashMap
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // Overwrite the old value with the new value
                e.value = value;
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    // modCount increments by 1
    ++modCount;
    // the value of size is increased by 1
    if (++size > threshold)
        // If the size is greater than the maximum number of nodes in the HashMap, call resize
        resize();
    afterNodeInsertion(evict);
    return null;
}

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the table array is already initialized, perform the following logic
        if (oldCap >= MAXIMUM_CAPACITY) {
            // If the old capacity is greater than or equal to the maximum capacity, set threshold to Integer.MAX_VALUE and return the old table
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // If the doubled capacity is less than the maximum capacity and the old capacity is greater than or equal to the default initial capacity, double the threshold as well
            newThr = oldThr << 1;
    }
    }
    else if (oldThr > 0)
        // If threshold is greater than 0, the initial capacity was stored in threshold, so assign it to newCap
        newCap = oldThr;
    else {
        // If the initial capacity and the maximum number of nodes that a HashMap can hold are both 0, it is the first initialization
        newCap = DEFAULT_INITIAL_CAPACITY;
        // The formula for threshold is the load factor multiplied by the array length
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // The formula for ft is the load factor times the array length
        float ft = (float)newCap * loadFactor;
        // If the new capacity is less than the maximum capacity and ft is less than the maximum capacity, use ft; otherwise use Integer.MAX_VALUE
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Update the value of threshold
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // Update the table array
    table = newTab;
    if (oldTab != null) {
        // If the old array already has data, the indices must be recalculated because the capacity has changed
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                // If the slot holds a node, clear the old slot so it can be garbage-collected
                oldTab[j] = null;
                if (e.next == null)
                    // If there is only one element in the index, place the element in the recalculated index
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // If the index element's data structure is a tree, the split operation is performed
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // If the index element's data structure is a linked list, the index is recalculated and regrouped
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

// Convert the list to a red-black tree
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        // If the table array is null, or its length is less than the minimum treeify capacity (64), expand instead of treeifying
        resize();
    // Get the index of the element from the hash via the mask operation
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // If the table array is non-null and its length is at least the minimum treeify capacity (64), perform the following logic
        TreeNode<K,V> hd = null, tl = null;
        do {
            // Convert this node to a tree node
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            // Convert the linked list to a red-black tree
            hd.treeify(tab);
    }
}

// Copies all mappings in the specified map to this map, replacing the old value with the new value if the same key already exists
public void putAll(Map<? extends K, ? extends V> m) {
    // Call the putMapEntries method, as analyzed earlier
    putMapEntries(m, true);
}

To summarize, the logic for adding a single element is:

  1. Check whether the table array has been initialized; if not, call resize to initialize it.
  2. Compute the index of the element to insert from the key's hash via the modulo (mask) operation; if the slot at that index is empty, create a node, store the data in it, and go to step 6.
  3. If the data structure of the element at the index is a tree, call putTreeVal to insert the data into the red-black tree, then go to step 5.
  4. If the data structure of the element at the index is a linked list, loop over the list: if the key to insert is not present, create a node, append it to the list, and convert the list to a red-black tree once its length reaches 8; if the key is present, break out of the loop. Then go to step 5.
  5. If a node with the same key was found, overwrite its value as appropriate and return the old value.
  6. Increment modCount and size; if size exceeds threshold (the maximum number of nodes the HashMap can hold before expanding), call resize to expand; finally return null.

To summarize the logic of capacity expansion:

  1. Check whether the table array has been initialized. If it has, expand it to double its original capacity; if not, initialize it (updating the value of threshold and the table array).
  2. Iterate over the old array, applying steps 3, 4, and 5 in order to each slot.
  3. If the slot holds only one element, place that element at its recalculated index.
  4. If the element's data structure is a tree, perform the split operation.
  5. If the element's data structure is a linked list, recalculate the indices and regroup the nodes into a low list and a high list (see the sketch after this list).
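During the linked-list regrouping, e.hash & oldCap decides whether a node stays at index j (result 0) or moves to index j + oldCap. A small sketch of that arithmetic, using plain ints instead of real nodes:

// ResizeSplitDemo.java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;            // old capacity
        int[] hashes = {5, 21, 37}; // all map to index 5 while the capacity is 16

        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            // (h & oldCap) == 0 -> stays at oldIndex; otherwise moves to oldIndex + oldCap
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println("hash " + h + ": index " + oldIndex + " -> " + newIndex);
        }
        // Output: 5 -> 5, 21 -> 21, 37 -> 5, matching (newCap - 1) & h with newCap = 32
    }
}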

The hash method

Let’s take a look at the hash method, also known as the perturbation function. The source code is as follows:

// HashMap.java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

First check whether key is null; if so, return 0. If not, call the key's hashCode method (every Java object has this method; if it is not overridden, the value derives from the object's identity, typically related to its memory address), unsigned-right-shift the hash value by 16 bits, and XOR the shifted value with the original hash. The idea is to have both the high and low bits participate, making the hash distribution more even.
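A quick sketch of the perturbation on a concrete key (the key string is arbitrary):

// HashDemo.java
public class HashDemo {
    // Same perturbation as HashMap.hash: XOR the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        String key = "example";
        System.out.println(Integer.toBinaryString(key.hashCode()));
        System.out.println(Integer.toBinaryString(hash(key)));
        // With a small table (say n = 16) only the low 4 bits select the bucket,
        // so folding in the high bits reduces collisions among keys that differ
        // only in their high bits
    }
}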

Modulus operation

The following logic appears repeatedly in the source code:

// HashMap.java
p = tab[i = (n - 1) & hash]

It performs the modulo operation on the hash value calculated from the key to obtain the index of the element, so that elements are distributed as evenly as possible. HashMap does not use hash % n for the modulo operation: because its capacity is always a power of 2, **(n - 1) & hash** is equivalent to **hash % n**, and **&** is more efficient than **%**, so HashMap uses **(n - 1) & hash** instead.

Let's say the value of n is 2 and the value of hash is 4 (2 to the 2nd power); then hash % n is 0. What is (n - 1) & hash? First compute n - 1, which is 1; then AND 1 with 4. Since both are positive, their binary forms are as shown below:

1 (decimal) = 00000000 00000000 00000000 00000001 (binary)

4 (decimal) = 00000000 00000000 00000000 00000100 (binary)

After the operation of and, the complement is as follows:

00000000 00000000 00000000 00000000

In decimal, this is 0, the same result as hash % n.
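A small sketch verifying the equivalence for power-of-2 capacities (the sample values are arbitrary):

// MaskVsModDemo.java
public class MaskVsModDemo {
    public static void main(String[] args) {
        int[] capacities = {2, 16, 32};  // powers of 2, as in HashMap
        int[] hashes = {4, 21, 37, 100};

        for (int n : capacities) {
            for (int h : hashes) {
                // For a power-of-2 n, (n - 1) & h equals h % n
                System.out.printf("n=%d h=%d -> %d == %d%n", n, h, (n - 1) & h, h % n);
            }
        }
    }
}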

Delete

HashMap's delete methods, source code:

// HashMap.java
// Delete the key-value pairs specified in the mapping by key (if any)
public V remove(Object key) {
    Node<K,V> e;
    // Call the removeNode method
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        // If the table array is not empty and has elements, compute the index of the element to delete from the key's hash via the mask operation; the element at that index is not null
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            // If the key of the element in the index is the same as the key of the element to be deleted, it is assigned to node
            node = p;
        else if ((e = p.next) != null) {
            // If the element of the index has a next node, the following logic is performed
            if (p instanceof TreeNode)
                // If the node's data structure is a tree, the getTreeNode method is called
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // If the node's data structure is a linked list, perform the following logic
                do {
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                        // Assign to node if it can be found in the list
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // If you can find the node to delete, perform the following logic
            if (node instanceof TreeNode)
                // If the node's data structure is a tree, call the removeTreeNode method
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                // If this node is the first in the list, the index of the array points to the next location
                tab[index] = node.next;
            else
                // If the node is not the first node in the list, it is removed from the list
                p.next = node.next;
            // modCount increment by 1
            ++modCount;
            // the value of size decreases by 1
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

Lookup

HashMap lookup method, source code:

// HashMap.java
// Returns the value of the key specified in the mapping, or null if none exists
public V get(Object key) {
    Node<K,V> e;
    // Call getNode
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the table array is not empty and has elements, compute the index of the element to look up from the key's hash via the mask operation; the element at that index is not null
        if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
            // If the key of the element at the index is the same as the key being looked up, return the element
            return first;
        if ((e = first.next) != null) {
            // If the element of the index has a next node, the following logic is performed
            if (first instanceof TreeNode)
                // If the node's data structure is a tree, the getTreeNode method is called
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // If the node's data structure is a linked list, perform the following logic
            do {
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    // Return the node if it can be found in the list
                    return e;
            } while ((e = e.next) != null);
        }
    }
    // If the element does not exist, return null
    return null;
}

Digression

Common Map implementation classes

Map is an interface that maps keys to values. A map cannot contain duplicate keys, and each key can map to at most one value. Common Map implementation classes include HashMap, ConcurrentHashMap, Hashtable, LinkedHashMap, and TreeMap. Their UML class diagram looks like this:

ConcurrentHashMap

ConcurrentHashMap is a thread-safe HashMap. Prior to JDK 1.8, ConcurrentHashMap used segmented locking: data is stored in segments, and each segment has its own lock. When one thread holds the lock while accessing one segment, other threads can still access the other segments, which improves efficiency. Since JDK 1.8, segmented locks have been abandoned in favor of the built-in lock synchronized plus CAS (Compare-And-Swap) to ensure thread safety.

Hashtable

Hashtable is a legacy class. It is similar to HashMap except that it inherits from the Dictionary class and is thread-safe; however, its concurrency is not as good as ConcurrentHashMap's. Prefer ConcurrentHashMap instead.

LinkedHashMap

LinkedHashMap is a subclass of HashMap that maintains a doubly linked list to guarantee iteration order. The order is controlled by the boolean accessOrder field; the default is insertion order. It can be used to implement an LRU (Least Recently Used) cache, as sketched below.
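A minimal sketch of an LRU cache built on LinkedHashMap (the class name LruCache and the size limit are illustrative):

// LruCache.java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public LruCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: iterate in access order
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the size limit is exceeded
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used
        cache.put("c", 3); // evicts "b", the least recently used
        System.out.println(cache.keySet()); // [a, c]
    }
}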

TreeMap

TreeMap is a red-black-tree-based NavigableMap implementation. It sorts entries by the natural ordering of the keys, or by a Comparator supplied when the map is created, depending on which constructor is used.
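A short example of the two ordering options:

// TreeMapDemo.java
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        // Natural ordering of the keys
        TreeMap<String, Integer> natural = new TreeMap<>();
        natural.put("banana", 2);
        natural.put("apple", 1);
        System.out.println(natural.firstKey()); // apple

        // Custom ordering via a Comparator supplied at creation time
        TreeMap<String, Integer> reversed = new TreeMap<>(Comparator.reverseOrder());
        reversed.putAll(natural);
        System.out.println(reversed.firstKey()); // banana
    }
}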

Several ways to resolve hash conflicts

There are four ways to resolve hash conflicts:

Open address method

The open address method means that when an address conflict occurs, other storage cells of the hash table are probed in some sequence until an empty cell is found. The formula is Hi(key) = (H(key) + di) mod m, where H(key) is the hash address of the key, di is the probe increment, and m is the length of the hash table.

The increment di can be chosen in different ways, which give the methods their names:

Linear probing

In linear probing the increment di is 1, 2, 3, ..., k (k <= m - 1). When an address conflict occurs, the following storage positions are probed sequentially until an empty one is found.

Quadratic probing

In quadratic probing the increment di is 1^2, -1^2, 2^2, -2^2, ..., k^2, -k^2 (k <= m / 2). When an address conflict occurs, probing jumps back and forth on both sides of the original position until an empty one is found.

Random probing

In random probing the increment di is produced by a (pseudo-)random function. When an address conflict occurs, positions in the hash table are probed in that pseudo-random order until an empty one is found.
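A minimal sketch of open addressing using the formula above with linear probing (a toy insert routine, not how ThreadLocalMap or any real class implements it):

// LinearProbingDemo.java
import java.util.Arrays;

public class LinearProbingDemo {
    static final int M = 8; // hash table length
    static final Integer[] table = new Integer[M];

    // Probe the home slot H(key) first, then Hi(key) = (H(key) + di) mod m with di = 1, 2, ...
    static void insert(int key) {
        int h = key % M;
        for (int di = 0; di < M; di++) {
            int i = (h + di) % M;
            if (table[i] == null) {
                table[i] = key;
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        insert(4);
        insert(12); // collides with 4 at index 4, probes to index 5
        insert(20); // probes past indices 4 and 5 to index 6
        System.out.println(Arrays.toString(table));
        // [null, null, null, null, 4, 12, 20, null]
    }
}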

It’s worth noting that ThreadLocal’s internal class, ThreadLocalMap, uses the open address method to resolve hash conflicts.

Chained address method

The chained address method links all records with the same hash address into one linked list. Conflict handling is simple and there is no clustering, meaning non-synonyms never conflict; since the nodes of each list are allocated dynamically, this method suits situations where the length of the hash table is hard to determine in advance.

Rehash method

The rehash method constructs multiple different hash functions; when a conflict occurs with one hash function, another is used, and so on until no conflict occurs. This method is less prone to clustering but increases computation time.

Public overflow area

This approach splits the hash table into a base table and an overflow table; all elements that conflict with the base table are placed in the overflow table. The overflow table can use the same hash function, and the method is easy to implement.

My GitHub: TanJiaJunBeyond

Common Android Framework: a common Android framework

My Juejin: Tan Jiajun

My Jianshu: Tan Jiajun

My CSDN: Tan Jiajun