A few words up front

This project started from a goal I set for myself at the end of 2020. After several years of study, I found on looking back that I understood many topics in a new light. So, while preparing for internship interviews, I combined my accumulated notes to create this open-source Java knowledge project, aimed mainly at fresh graduates and beginners. It focuses on Java back-end interview questions with analysis, key knowledge details, and selected articles. I hope it can accompany you, and that we keep making progress together!

Note: The content of this project draws on many bloggers' posts (sources indicated), reference materials, and a number of books, combined with my own understanding, with diagrams redrawn and the language reorganized. An individual effort inevitably has weak spots and shortcomings, but the project will keep being updated and improved. Every Star is an encouragement to me. I hope you enjoy it!

Note: none of the images use third-party image hosting; they are open-sourced together with the articles.

Project name: Java-Ideal-interview

Github address: Java-Ideal-interview-github

Gitee address: Java-Ideal-interview-Gitee

The project is continuously updated, and online reading will be provided later. If you find reading on Gitee or GitHub inconvenient, you can clone the repository locally and read it comfortably in an editor such as Typora.

If cloning from GitHub is too slow, you can use the Gitee repository instead (hosted in China).

  • Part Three: HashMap Source Code Analysis
    • 1. Pre-knowledge
      • 1.1 What is Map
        • 1.1.1 Overview
        • 1.1.2 Difference between Map Collection and Collection Collection
      • 1.2 What is a hash table
        • 1.2.1 Why hash tables are used
        • 1.2.2 How hash tables work
        • 1.2.3 How to Resolve Hash Conflicts
          • 1.2.3.1 JDK 1.7
          • 1.2.3.2 JDK 1.8
      • 1.3 What is a red-black tree
    • 2. Source code analysis
      • 2.1 Class members
      • 2.2 Two Nodes
        • 2.2.1 Node
        • 2.2.2 TreeNode
      • 2.3 Construction method
      • 2.4 Adding Methods
        • 2.4.1 put()
        • 2.4.2 putVal()
      • 2.5 Obtaining Methods
        • 2.5.1 get()
        • 2.5.2 getNode()
      • 2.6 Removal Methods
        • 2.6.1 remove()
        • 2.6.2 removeNode()
      • 2.7 Capacity Expansion Method
        • 2.7.1 resize()
    • 3. Focus on analysis
      • 3.1 How to resolve hash conflicts with a perturbation function in hash() ※

Part Three: HashMap Source Code Analysis

1. Pre-knowledge

1.1 What is Map

In practice we often face this kind of problem: among a large amount of data, find a particular record by its number so that we can view or modify it, for example, looking up a student's information by student number. The Map collection introduced here helps us meet exactly this need.

1.1.1 Overview

A Map is a collection of paired elements (called keys and values, or key-value pairs) that maps keys to values. A map cannot contain duplicate keys, and each key can map to at most one value.

How should you understand that?

Key: the identifier of the value you want to store. Value: the data you want to store.

You can roughly think of keys as subscripts: values are stored and retrieved by key, each key has its own value, and the two correspond one-to-one.

The difference is that an array subscript must be an integer, whereas a Map key can be an object of any type.
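To make the key-value idea concrete, here is a minimal sketch using java.util.HashMap with the student-number example from above (the numbers and names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class StudentLookup {
    public static void main(String[] args) {
        // Key: student number, Value: student name
        Map<Integer, String> students = new HashMap<>();
        students.put(1001, "Zhang San");
        students.put(1002, "Li Si");

        // Look up by key, like querying student info by student number
        System.out.println(students.get(1001)); // Zhang San

        // Keys are unique: putting the same key again overwrites the old value
        students.put(1001, "Wang Wu");
        System.out.println(students.get(1001)); // Wang Wu
        System.out.println(students.size());    // 2 (still two keys)
    }
}
```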

1.1.2 Difference between Map Collection and Collection Collection

  • A Map collection stores elements in pairs; its keys are unique and its values may repeat
  • A Collection stores elements individually; its Set subinterface holds unique elements, while List allows duplicates
  • The data structure of a Map is organized around keys, not values, whereas the data structure of a Collection is organized around the elements themselves

1.2 What is a hash table

A hash table (also called a hash map) is a data structure that is accessed directly by key. It maps a key to a position in a table to speed up lookups. The mapping is called a hash function, and the array that holds the records is called a hash table.

A common example: to find someone's number in a phone book, you can build an alphabetical index (a function from names to their initial letters). Looking up Wang's phone number under the letter W is obviously much faster than scanning the whole book. Here the name is the key, and "take the initial" is the rule of the hash function in this example. The table holding the initials corresponds to the hash table. The keys and the function rule can, in theory, be chosen arbitrarily. — Wikipedia

1.2.1 Why hash tables are used

A hash table is essentially an extension of an array: it builds on the fact that an array can randomly access data by subscript. Let's look at it step by step.

First, create an array. Treat each storage slot of the array as a box or a bucket, and store some key-value data, such as [Zhang San, 20], [Li Si, 30], [Wang Wu, 40], [Zhao Liu, 50], [Sun Qi, 60], placing them into the array in order.

With a plain sequential lookup you would have to compare from the beginning of the array, so the more data there is, the longer the search takes. With a lot of data, that clearly does not scale.

There are data structures that do not care about element order yet can still find elements quickly. One of them is the hash table.

So what makes hash tables so efficient?

1.2.2 How hash tables work

HashMap uses a hash function to compute a hash value for the key, for example the string "Zhang San". Suppose it returns 5372; we then take the remainder, with the array length as the divisor: 5372 mod 5 = 2, so the entry is placed at index 2. If the second entry's key hashes to 6386, then 6386 mod 5 = 1 and it is placed at index 1, and so on.

Now suppose we store the third entry [Wang Wu, 40] and the hash function returns 5277. Since 5277 mod 5 = 2, it should go to index 2, but [Zhang San, 20] is already stored there. This situation, where storage locations repeat, is called a hash conflict.
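The remainder-based placement described above can be sketched like this (the hash values 5372, 6386 and 5277 are taken directly from the example, not actually computed from the names):

```java
public class HashIndexDemo {
    // Map a hash value to a bucket index by taking the remainder (illustrative)
    static int indexFor(int hash, int tableLength) {
        return hash % tableLength;
    }

    public static void main(String[] args) {
        int length = 5; // an array of 5 buckets, as in the example above
        System.out.println(indexFor(5372, length)); // 2 -> [Zhang San, 20]
        System.out.println(indexFor(6386, length)); // 1 -> [Li Si, 30]
        System.out.println(indexFor(5277, length)); // 2 -> collides with index 2
    }
}
```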

1.2.3 How to Resolve Hash Conflicts

1.2.3.1 JDK 1.7

Prior to JDK 1.8, the underlying structure of a HashMap was an array plus linked lists, so hash conflicts were resolved with the chaining method (the "zipper" method).

In chaining, each slot (bucket) of the array heads a linked list. For example, the bucket at index 2 already holds [Zhang San, 20]; if more entries hash to index 2 after the mod, they are simply appended to the linked list at index 2.
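A minimal, hypothetical sketch of chaining — the hash value is passed in externally so the bucket indexes match the fixed example values from the text; this is an illustration, not the JDK implementation:

```java
public class ChainedTable {
    static class Node {
        final String key;
        int value;
        Node next;
        Node(String key, int value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    final Node[] buckets = new Node[5];

    // hash is supplied by the caller so the example matches the text's fixed values
    void put(String key, int value, int hash) {
        int i = hash % buckets.length;
        for (Node n = buckets[i]; n != null; n = n.next)
            if (n.key.equals(key)) { n.value = value; return; } // same key: overwrite
        // collision or empty bucket: prepend to the list (JDK 8 actually appends at the tail)
        buckets[i] = new Node(key, value, buckets[i]);
    }

    Integer get(String key, int hash) {
        for (Node n = buckets[hash % buckets.length]; n != null; n = n.next)
            if (n.key.equals(key)) return n.value;
        return null;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable();
        t.put("Zhang San", 20, 5372); // 5372 mod 5 = 2
        t.put("Wang Wu", 40, 5277);   // 5277 mod 5 = 2 -> chained at index 2
        System.out.println(t.get("Zhang San", 5372)); // 20
        System.out.println(t.get("Wang Wu", 5277));   // 40
    }
}
```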

1.2.3.2 JDK 1.8

JDK 8 made a major change: when the linked list in a bucket grows longer than a threshold (8 by default), it is converted into a red-black tree, which significantly reduces search time.

Moreover, when the hash table is nearly full, a rehashing mechanism kicks in; it is analyzed in depth in the source code below.

1.3 What is a red-black tree

The red-black tree is a complex tree structure, and I will not go into much detail here, only the basic idea. For a deeper understanding, the Nuggets article (Nuggets – Comics: What is a Red Black Tree? @programmer Grey) is very good!

The red-black tree evolved from structures such as the binary search tree and the 2-3 tree in order to prevent extreme cases of binary trees, such as degenerating into a line or becoming badly unbalanced. Its main purpose is to keep the tree balanced, ensuring that the left and right branches stay roughly even.

The concrete evolution of the data structure is quite involved. This article focuses on HashMap; if necessary, a later article will cover Java implementations of common data structures.

2. Source code analysis

2.1 Class members

// Serialization ID, auto-generated to verify version consistency during serialization
private static final long serialVersionUID = 362498820763181265L;

// Default initial capacity: 1 * 2^4 = 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

// Maximum capacity: 1 * 2^30
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default load factor: 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// Treeify threshold: if a bucket holds more nodes than this value (8 by default),
// its linked list is converted into a red-black tree
static final int TREEIFY_THRESHOLD = 8;

// Untreeify threshold: if a bucket's node count drops below this value,
// the red-black tree is converted back into a linked list
static final int UNTREEIFY_THRESHOLD = 6;

// Minimum table capacity for treeification: lists are only treeified when the
// table capacity is at least this value; otherwise, if a bucket holds too many
// elements, the table is resized instead of treeified.
// To avoid conflicts between resizing and treeification, this value must be
// at least 4 * TREEIFY_THRESHOLD
static final int MIN_TREEIFY_CAPACITY = 64;

// The array that stores the elements; its length is always a power of 2
transient Node<K,V>[] table;

// Holds the cached set of entries
transient Set<Map.Entry<K,V>> entrySet;

// Number of key-value mappings (not the array length)
transient int size;

// Modification count, used to detect structural changes (e.g. by iterators)
transient int modCount;

// Threshold: when the actual size exceeds it (capacity x load factor), the table is resized
int threshold;

// Load factor
final float loadFactor;

There are a few members worth emphasizing.

threshold

  • When the actual number of elements exceeds the threshold (capacity x load factor, i.e. threshold = capacity x loadFactor), the array is resized.

loadFactor

  • When the number of elements in the table exceeds capacity x load factor, the hash table is automatically expanded, usually to double its size. This behavior is called rehashing.
  • The larger the load factor, the more elements can be added before resizing, so space utilization improves, but the probability of hash conflicts undoubtedly rises. Conversely, a smaller load factor wastes space but reduces conflicts. We therefore want a balance between space utilization and an acceptable conflict rate; after testing, the default was fixed at 0.75f.
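A quick sanity check of the threshold arithmetic (default capacity 16, load factor 0.75):

```java
public class ThresholdDemo {
    // threshold = capacity x load factor, as described above
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default HashMap: capacity 16, load factor 0.75 -> resize after the 12th element
        System.out.println(threshold(16, 0.75f)); // 12
        // After one doubling: capacity 32 -> threshold 24
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```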

2.2 Two Nodes

Because the data result will be converted to red-black tree under certain conditions, there are tree nodes (TreeNode nodes) in addition to ordinary nodes.

2.2.1 Node

static class Node<K,V> implements Map.Entry<K,V> {
    // Hash value, used to locate the bucket and to compare elements
    final int hash;
    // key
    final K key;
    // value
    V value;
    // Pointer to the next node in the list
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    // Override hashCode; ^ is the bitwise XOR operator
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Override equals()
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

2.2.2 TreeNode

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    // parent node
    TreeNode<K,V> parent;
    // left child
    TreeNode<K,V> left;
    // right child
    TreeNode<K,V> right;
    // needed to unlink next upon deletion
    TreeNode<K,V> prev;
    // node color; a new node is red by default
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
    // Return the root of the tree containing this node
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }
    // ... (other tree operations omitted)
}

2.3 Construction method

// a constructor that specifies a specific capacity size and load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

// a constructor that specifies a specific capacity size
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// The default constructor takes no arguments
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; 
}

// The map constructor is specified
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

tableSizeFor

/**
 * Returns the smallest power of two greater than or equal to cap.
 * This only initializes threshold; the real capacity is set when the table is created.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
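A small driver around the tableSizeFor logic above shows how initial capacities are rounded up (the method body is copied from the excerpt; the sample inputs are arbitrary):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the JDK source above: smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(tableSizeFor(1));  // 1
    }
}
```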

putMapEntries

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    // Number of entries in the given map
    int s = m.size();
    if (s > 0) {
        // Check whether the table that stores the actual data is initialized
        if (table == null) { // pre-size
            // Estimate the capacity needed to hold s elements
            float ft = ((float)s / loadFactor) + 1.0F;
            // Cap the estimate at MAXIMUM_CAPACITY
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                    (int)ft : MAXIMUM_CAPACITY);
            // If larger than the current threshold, recompute the threshold
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // The table is initialized and m has more elements than the threshold
        else if (s > threshold)
            resize();
        // Add every entry of the given map m to this HashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            // putVal is covered below under the adding methods
            putVal(hash(key), key, value, false, evict);
        }
    }
}

2.4 Adding Methods

2.4.1 put()

For a HashMap, put(K key, V value) is the public method exposed for adding entries; the related internal methods are invoked through it.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

Before going through each parameter and detail of putVal, let's look at its first argument, hash(key), and how the hash value is computed in a HashMap.

See Section 3.1, "How to resolve hash conflicts with a perturbation function in hash() ※", below.

2.4.2 putVal()

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is uninitialized (null) or empty, call resize() to create it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash determines which bucket the element belongs to;
    // if that bucket is empty, there is no collision
    if ((p = tab[i = (n - 1) & hash]) == null)
        // Place the new node directly into the bucket (array slot)
        tab[i] = newNode(hash, key, value, null);
    // Otherwise the bucket already holds at least one element
    else {
        Node<K,V> e; K k;
        // Compare the first node's hash and key with the key to be inserted
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Same key: remember the node so its value can be overwritten below
            e = p;
        // If the first node is a red-black tree node
        else if (p instanceof TreeNode)
            // Insert into the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise the bucket holds a linked list
        else {
            // Walk the list looking for the tail (or an equal key)
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list
                if ((e = p.next) == null) {
                    // Append the new node at the tail
                    p.next = newNode(hash, key, value, null);
                    // If the list length reaches the threshold, treeify the bucket
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // Break the loop
                    break;
                }
                // An equal key was found along the list; stop so its value can be overwritten
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Advance to the next node in the bucket's list
                p = e;
            }
        }
        // e != null means a node with the same key and hash already exists
        if (e != null) {
            // Record the old value
            V oldValue = e.value;
            // Overwrite unless onlyIfAbsent is set and an old value exists
            if (!onlyIfAbsent || oldValue == null)
                // Replace the old value with the new value
                e.value = value;
            // Post-access callback (used by LinkedHashMap)
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    // Structural modification
    ++modCount;
    // Resize when the size exceeds the threshold
    if (++size > threshold)
        resize();
    // Post-insertion callback (used by LinkedHashMap)
    afterNodeInsertion(evict);
    return null;
}

To summarize the general process:

  • Compute the array position for the key; call it A
  • If there is no element at A
    • Simply insert the new node
  • If there is an element p at A, compare its key with the key to be inserted
    • If the keys are the same, overwrite the value
    • If the keys differ, check whether p is a tree node
      • If so, call putTreeVal to insert into the red-black tree
      • If not, traverse the linked list and insert at the tail (tail insertion)
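The overwrite branch of this process can be observed from the outside, since put() returns the previous value for the key (or null if there was none):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // First insert: no mapping yet, so put() returns null
        System.out.println(map.put("Zhang San", 20)); // null
        // Same key again: the value is overwritten and the OLD value is returned
        System.out.println(map.put("Zhang San", 25)); // 20
        System.out.println(map.get("Zhang San"));     // 25
        System.out.println(map.size());               // 1
    }
}
```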

2.5 Obtaining Methods

2.5.1 get()

Similarly, the get method uses the hash method to compute the key's hash value. Again, you can jump ahead to Section 3.1 ("How to resolve hash conflicts with a perturbation function in hash() ※") or read it at the end.

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

2.5.2 getNode()

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Make sure the table exists and the bucket for this hash is non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the first node in the bucket matches, return it directly
        // (the bucket has only one element, or the match is at the head)
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // The bucket holds more than one node
        if ((e = first.next) != null) {
            // Search the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Search the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

2.6 Removal Methods

2.6.1 remove()

Similarly, the remove method uses the hash method to compute the key's hash value. Again, you can jump ahead to Section 3.1 ("How to resolve hash conflicts with a perturbation function in hash() ※") or read it at the end.

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

2.6.2 removeNode()

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // The table is non-empty and the bucket for this hash holds a node
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // If the element is found at the head of the bucket, record it
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // Otherwise search the red-black tree or the linked list
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // If the node was found (and, when matchValue is set, the value matches),
        // remove it in one of three ways: from the tree, from the bucket head,
        // or from the middle of the linked list
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
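From the caller's perspective, removeNode's matchValue parameter corresponds to the two public remove overloads: remove(key) returns the removed value (or null), while JDK 8's remove(key, value) only removes on an exact value match and returns a boolean:

```java
import java.util.HashMap;

public class RemoveDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("Zhang San", 20);
        // remove(key) returns the removed value, or null if the key is absent
        System.out.println(map.remove("Zhang San")); // 20
        System.out.println(map.remove("Li Si"));     // null

        // remove(key, value) passes matchValue = true: only removes on an exact match
        map.put("Wang Wu", 40);
        System.out.println(map.remove("Wang Wu", 99)); // false, value differs
        System.out.println(map.remove("Wang Wu", 40)); // true
    }
}
```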

2.7 Capacity Expansion Method

2.7.1 resize ()

resize() is very time-consuming, so try to avoid triggering it (for example, by giving the map an appropriate initial capacity).

  • It recomputes the bucket index for every element, iterating over the whole hash table

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the maximum capacity has been reached, stop growing
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity (and the threshold with it)
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        // During initialization, threshold temporarily holds the initial capacity
        newCap = oldThr;
    else {
        // Zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move the old key-value pairs into the new bucket array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // No chain: the bucket holds a single node with no next
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // Split the red-black tree into two sub-lists, re-treeifying as needed
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // Split the list into two sub-lists, preserving order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the low list at the original index in the new table
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the high list at original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
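The lo/hi split can be checked with a few hand-picked hashes: (e.hash & oldCap) == 0 keeps a node at its old index, otherwise it moves to old index + oldCap, which agrees with recomputing h & (newCap - 1) directly:

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37}; // 21 = 5 + 16, 37 = 5 + 32 (sample values)

        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            // The single bit tested by (h & oldCap) decides which half the node goes to
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            // Sanity check: identical to recomputing against the new capacity
            System.out.println(h + ": " + oldIndex + " -> " + newIndex
                    + " (direct: " + (h & (newCap - 1)) + ")");
        }
        // Prints: 5: 5 -> 5 (direct: 5), 21: 5 -> 21 (direct: 21), 37: 5 -> 5 (direct: 5)
    }
}
```

This is why resize() never recomputes the full hash: doubling the capacity exposes exactly one new bit of the hash, so each list splits cleanly into two.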

3. Focus on analysis

3.1 How to resolve hash conflicts with a perturbation function in hash() ※

First, recall the put() method of HashMap:

  // HashMap source excerpt - JDK 8
  public V put(K key, V value) {
      return putVal(hash(key), key, value, false, true);
  }

The key is first passed through the HashMap's hash method before putVal is called.

Then locate the source of the hash method:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The body of the hash method is a ternary expression: if the key is null, it returns 0; otherwise it returns the value of the following expression:

(h = key.hashCode()) ^ (h >>> 16)

In JDK 8's HashMap, this expression in the hash method is known as the "perturbation function".

Let’s break it down:

hashCode() is a method on Object and can be overridden by its subclasses. Take Integer's hashCode() as an example:

  /**
   * Returns a hash code for this {@code Integer}.
   *
   * @return  a hash code value for this object, equal to the
   *          primitive {@code int} value represented by this
   *          {@code Integer} object.
   */
  @Override
  public int hashCode() {
      return Integer.hashCode(value);
  }

  /**
   * Returns a hash code for a {@code int} value; compatible with
   * {@code Integer.hashCode()}.
   *
   * @param value the value to hash
   * @since 1.8
   *
   * @return a hash code value for a {@code int} value.
   */
  public static int hashCode(int value) {
      return value;
  }

The return value of the hashCode method in Integer is the number itself

Note: The value of an integer is a good enough hash because it is as unique as the integer itself

So, when the key is an Integer, the following expressions A and B are equivalent:

  // Note: key is the parameter of hash(Object key)
  A: (h = key.hashCode()) ^ (h >>> 16)
  B: key ^ (key >>> 16)

At this point, all that is left to analyze is bit manipulation. Let's not rush; we'll work through it step by step.

A HashMap uses a hash table (an array combined with linked lists). When a key is stored, some computation is performed on it to determine its position in the array.

Integer's hashCode method returns a hash equal to the value itself, but given the range of int (-2147483648 to 2147483647), these hash values obviously cannot be used directly as array indexes: memory could not hold an array of roughly 4 billion slots. So the hash must first be mapped onto the array. In JDK 7 this was done by indexFor(); in JDK 8 the same logic is inlined:

  // JDK 8
  (tab.length - 1) & hash;

  // JDK 7
  bucketIndex = indexFor(hash, table.length);

  static int indexFor(int h, int length) {
      return h & (length - 1);
  }

Incidentally, the reason & is used instead of % is that when the length is a power of two, h & (length - 1) produces the same result as h % length, and a bitwise AND operates directly on the binary representation without a division, so it is considerably faster than the modulo operation.
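A quick check of this equivalence (for non-negative hashes and a power-of-two length; the sample hash values reuse the earlier examples):

```java
public class AndVsModDemo {
    public static void main(String[] args) {
        int length = 16; // a power of two, as HashMap capacities always are
        int[] hashes = {5372, 6386, 5277};
        for (int h : hashes) {
            // For power-of-two lengths, h & (length - 1) equals h % length
            System.out.println((h & (length - 1)) + " == " + (h % length));
        }
        // Prints: 12 == 12, 2 == 2, 13 == 13
    }
}
```

Note that for negative values & and % would differ, but HashMap's index computation always yields a non-negative result precisely because it masks with length - 1.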

The hash method and indexFor() together determine the key's index.

(The index computation should follow JDK 8, but for convenience we keep the JDK 7 name indexFor in the examples below; consider this declared in advance.)

But if we look at the AND operation (&) alone, a problem appears. Let's take an example:

In a HashMap with the default initial length 16, length - 1 = 15, whose binary representation is 00000000 00000000 00000000 00001111.

In an AND, a result bit is 1 only if both input bits are 1. Pick an arbitrary hash code as the key's hash:

          1111 1111 1010 0101 1111 0000 0011 1100
  &       0000 0000 0000 0000 0000 0000 0000 1111
  ----------------------------------------------------
          0000 0000 0000 0000 0000 0000 0000 1100

Split the 32 bits in half: call the left 16 bits the high bits and the right 16 bits the low bits. As you can see, the AND zeroes out everything except the last four low bits. The problem: with the default length of 16, only those last four bits decide the index, so any hash codes that share the same low four bits are ANDed to the same index regardless of their high bits, which leads to hash conflicts.

A simple way to understand a hash conflict: you plan to insert an object into the hash table, but find the position already occupied by another object.

In this example, two different hash codes produce the same index, meaning both entries would have to be placed in the same bucket.

In general, if the data is widely distributed and the backing array is large, hash conflicts are rare; otherwise they are frequent.

However, if only the last few bits are used, as in the example above, then even widely scattered data will still collide badly.

Remember, we have not yet applied the perturbation function; this is where it shows its power. Using the same two colliding hash values, this time apply the perturbation function before the AND.

>>> is the unsigned right shift operator: the bits of the left operand move right by the amount given by the right operand, and the vacated high bits are filled with zeros. ^ is the bitwise XOR: equal bits give 0, different bits give 1.

As you can see, after the perturbation the two previously colliding hash values no longer map to the same bucket, so the conflict is avoided.

In fact, by shifting the hash code right by 16 bits and XOR-ing, the perturbation function mixes the high and low halves of the hash code. This solves the problem above, where the high bits were zeroed out and the index depended only on the low bits: the characteristics of the high bits now also affect the low bits, strengthening the randomness of the low bits so they avoid conflicts better.
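To see the perturbation at work, here is a sketch with two hypothetical hash codes that differ only in their high 16 bits: without the perturbation they collide, with it they do not.

```java
public class PerturbDemo {
    // The JDK 8 perturbation: XOR the hash code with its own high 16 bits
    static int perturb(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length
        // Two hash codes that differ only in bits 16 and above (hypothetical values)
        int h1 = 0x00010000, h2 = 0x00020000;

        // Without perturbation the high bits are masked away: both land in bucket 0
        System.out.println((h1 & (n - 1)) + ", " + (h2 & (n - 1)));                   // 0, 0
        // With perturbation the high bits reach the low bits: buckets 1 and 2
        System.out.println((perturb(h1) & (n - 1)) + ", " + (perturb(h2) & (n - 1))); // 1, 2
    }
}
```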