Introduction to the HashMap

The internal structure of HashMap is an array plus linked lists/red-black trees; in other words, it is a combination of multiple data structures.
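For a concrete picture, here is a trimmed-down sketch of the bucket node (modeled on java.util.HashMap.Node from JDK 1.8; the Map.Entry plumbing is omitted):

    // Modeled on java.util.HashMap.Node (JDK 1.8), heavily trimmed.
    // Each slot of the internal array holds a chain of these nodes; a chain
    // that grows past the treeify threshold is converted to red-black TreeNodes.
    class Node<K, V> {
        final int hash;  // cached, perturbed hash of the key
        final K key;
        V value;
        Node<K, V> next; // next node in this bucket's linked list

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }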

Let’s take a look at the basic operations of HashMap:

new HashMap(n);

First of all, if we pass in n, is the capacity of the constructed HashMap n?

The answer is: not necessarily.
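A quick way to see this (my own throwaway demo; it peeks at HashMap's private table field via reflection, which on JDK 9+ may need --add-opens java.base/java.util=ALL-UNNAMED):

    import java.lang.reflect.Field;
    import java.util.HashMap;
    import java.util.Map;

    public class CapacityDemo {
        public static void main(String[] args) throws Exception {
            Map<String, Integer> map = new HashMap<>(10);
            map.put("a", 1); // the table is only allocated on the first put

            Field tableField = HashMap.class.getDeclaredField("table");
            tableField.setAccessible(true);
            Object[] table = (Object[]) tableField.get(map);
            System.out.println(table.length); // prints 16, not 10
        }
    }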

    public HashMap(int initialCapacity, float loadFactor) {
        this.loadFactor = loadFactor; // default load factor is 0.75
        // set the capacity
        this.threshold = tableSizeFor(initialCapacity);
    }

The tableSizeFor call does one thing: it returns the smallest power of two (2^k) that is greater than or equal to the requested capacity. For example, if you initialize with 10, it gives you 16.

Take the initial value 50 as an example to explain the implementation principle:

  static final int tableSizeFor(int cap) {
      int n = cap - 1;      // e.g. cap = 50 -> n = 49 = 11_0001
      n |= n >>> 1;         // 11_0001 | 01_1000 = 11_1001
      n |= n >>> 2;         // 11_1001 | 00_1110 = 11_1111
      n |= n >>> 4;         // every bit below the highest 1 is now 1
      n |= n >>> 8;
      n |= n >>> 16;        // enough shifts to cover all 32 bits of an int
      return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
  }

The algorithm keeps shifting the binary representation right and ORing it with itself, so that every bit below the highest set bit becomes 1. For 50: n = 49 = 110001 becomes 111111, and 111111 + 1 = 1000000 = 2^6 = 64.
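To check the result, here is a throwaway test class (mine; tableSizeFor is copied from the JDK, since the real method is package-private):

    public class TableSizeForDemo {
        static final int MAXIMUM_CAPACITY = 1 << 30;

        // Copied from java.util.HashMap (JDK 1.8).
        static int tableSizeFor(int cap) {
            int n = cap - 1;
            n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        public static void main(String[] args) {
            System.out.println(tableSizeFor(10)); // 16
            System.out.println(tableSizeFor(50)); // 64
            System.out.println(tableSizeFor(64)); // 64 (already a power of two)
        }
    }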

Second point: to answer the opening question, why is the hash function designed this way?

The hash of a HashMap is computed from the key, and the design has two goals: hash collisions should be as rare as possible, with values scattered evenly across the buckets; and the computation must be as cheap as possible, because it runs on every access. Take a look at this code again:

static final int hash(Object key) {
    int h;
    // XOR the high 16 bits of hashCode() into the low 16 bits
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This code is called a perturbation function. What would go wrong if the hash function simply used key.hashCode() as the hash value?

key.hashCode() returns the key's raw hash code, a full 32-bit int. If the HashMap array length is 16, the bucket index is (n - 1) & hash = 0000 1111 & hash, so only the lowest 4 bits of the hash are actually used. Using only the low bits gives a much higher chance of collisions: keys whose hash codes differ only in the high bits all land in the same bucket.

The designers, balancing performance against collision reduction, chose to XOR the high 16 bits with the low 16 bits to form the final hash, so that the high bits also get a say in the bucket index.
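A small demo of the effect (my own code; perturb mirrors the 1.8 hash for a non-null key): two hash codes that differ only in their high 16 bits collide without the perturbation, but usually separate with it:

    public class PerturbDemo {
        // Mirrors JDK 1.8's hash() for a non-null key's hashCode.
        static int perturb(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int n = 16;            // table length; index mask is n - 1 = 0b1111
            int h1 = 0x0000_0005;  // low bits 0101
            int h2 = 0x7FFF_0005;  // same low bits, different high bits

            // Raw hashCode: only the low 4 bits count, so both land in bucket 5.
            System.out.println((h1 & (n - 1)) + " vs " + (h2 & (n - 1)));   // 5 vs 5

            // Perturbed: the high 16 bits are folded in, separating them.
            System.out.println((perturb(h1) & (n - 1)) + " vs "
                             + (perturb(h2) & (n - 1)));                    // 5 vs 10
        }
    }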

Third point: what optimizations does JDK 1.8 have compared with 1.7?

1. 1.7 uses head insertion; 1.8 uses tail insertion.
2. 1.7's hash function uses four shift operations + five XOR operations; 1.8's uses one shift + one XOR.
3. 1.7 uses an array + linked list structure; 1.8 uses array + linked list + red-black tree.
4. On expansion, 1.7 rehashes every element with hash & (len - 1); 1.8 computes each element's new position as either its original position or its original position + the old capacity.

Let's go through each of the points above 👆:

Point 1: before 1.8, head insertion was used. The authors reasoned that freshly inserted data is hot and most likely to be accessed again right away, so putting it at the head of the list avoids a traversal.

So why does 1.8 use tail insertion? With head insertion, in a multithreaded environment the following problem can occur: thread A is in the middle of inserting a node when thread B, also inserting, hits the capacity threshold and triggers an expansion. B rehashes and re-places the elements; because head insertion reverses the order of each list, when A resumes with its stale next pointers the links can form a ring, i.e. an infinite loop.
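For reference, this is roughly the 1.7 transfer step where that happens (a sketch close to, but not a verbatim copy of, the JDK 1.7 source):

    // Sketch of JDK 1.7's HashMap.transfer (not a verbatim copy).
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (e != null) {
                Entry<K,V> next = e.next;      // another thread can interleave here
                if (rehash) {
                    e.hash = (e.key == null) ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];          // head insertion: reverses the chain
                newTable[i] = e;
                e = next;
            }
        }
    }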

Point 2: the hash function in 1.7 looks like this:

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}


Four unsigned right shifts and five XOR operations in total (counting the initial XOR of the key's hashCode in the full 1.7 source), compared with a single shift and a single XOR in 1.8.

Point 3: the insertion flow. Note 4 points (a sketch of the whole flow follows the probability table below):

1. 1.8 adds the new node first and then expands if needed (1.7 checks the capacity first and expands before inserting).
2. 1.8 first checks whether the bucket is a red-black tree; whether a list should be turned into a red-black tree is only judged after the tail insertion.
3. The threshold for a red-black tree to revert to a linked list is 6, not 8: if the length kept hovering around 8, the bucket would flip back and forth between the two structures, wasting resources.
4. The treeify threshold is 8 because, with a reasonable hash function, the authors calculated the probability of a bucket reaching length 8 to be about one in ten million.

// From the JDK authors: the probability that a bucket's list reaches each length is:
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
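Since the original flow chart is an image, here is a comment-style sketch of the 1.8 put flow (simplified; TREEIFY_THRESHOLD = 8 and MIN_TREEIFY_CAPACITY = 64 are the real JDK constants, but this is not the literal putVal source):

    // Sketch of JDK 1.8's put flow (simplified, not the literal putVal source):
    // 1. hash = (key == null) ? 0 : h ^ (h >>> 16)       // perturbed hash
    // 2. if the table is null or empty -> resize()        // lazy initialization
    // 3. i = (n - 1) & hash                               // bucket index
    // 4. if table[i] is empty           -> place a new Node there
    //    else if table[i] is a TreeNode -> insert into the red-black tree
    //    else traverse the list and tail-insert; if the list length reaches
    //         TREEIFY_THRESHOLD (8), call treeifyBin(), which only builds a
    //         tree when the table length is at least MIN_TREEIFY_CAPACITY (64)
    // 5. if ++size > threshold -> resize()                // add first, then expand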

Point 4: after an expansion, 1.7 uses hash & (len - 1) to recalculate the position of every array element, but 1.8 locates the new position in a simpler and quicker way: each element goes either to its original position or to its original position + the old capacity.

How do I understand this? There are two cases:

Case 1: the array length is 16 and the element's hash is 0000 0101. The index is 0000 0101 & 0000 1111 = 0101. After expanding to 32, the mask gains one bit, but that bit of the hash is 0: 0000 0101 & 0001 1111 = 0000 0101, the same index, so the element can go straight to its original position.

Case 2: the array length is 16 and the hash is 0001 0101. The old index is 0001 0101 & 0000 1111 = 0101. After expanding to 32, 0001 0101 & 0001 1111 = 0001 0101, which is exactly 16 (the old capacity) larger than the old index. Interesting! The longer you chew on it, the more elegant it gets. Here is an excerpt of the expansion code:

final Node<K,V>[] resize() {
    // ... (preceding logic omitted)
    if (loTail != null) {
        loTail.next = null;
        newTab[j] = loHead;            // the "low" list stays at the original index
    }
    if (hiTail != null) {
        hiTail.next = null;
        newTab[j + oldCap] = hiHead;   // the "high" list moves to index + old capacity
    }
    // ...
}
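A tiny demo of the underlying bit trick (my own code, not from the JDK): whether an element stays or moves is decided by the single newly unmasked bit, i.e. hash & oldCap:

    public class ResizeIndexDemo {
        public static void main(String[] args) {
            int oldCap = 16, newCap = 32;
            int[] hashes = {0b0000_0101, 0b0001_0101}; // the two cases above

            for (int h : hashes) {
                int oldIndex = h & (oldCap - 1);
                int newIndex = h & (newCap - 1);
                // (h & oldCap) == 0 means the new bit of the hash is 0,
                // so the index is unchanged; otherwise it is oldIndex + oldCap.
                System.out.printf("hash=%s old=%d new=%d stays=%b%n",
                        Integer.toBinaryString(h), oldIndex, newIndex,
                        (h & oldCap) == 0);
            }
        }
    }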