HashMap

HashMap stores key-value pairs. Before JDK 1.8 it was based on an array plus linked lists; JDK 1.8 added a red-black tree structure to improve lookup efficiency. (Thread safety and capacity expansion are discussed in more detail below.)

  • What’s an array for? What does a linked list do?

The array stores the key-value entries (the buckets), and linked lists are the way hash collisions are handled.

  • When do you create an array?

The array is not allocated when the map object is constructed; it is created lazily, when data is first put.

  • When is a linked list formed?

A linked list forms when a hash collision occurs, i.e. when two different keys map to the same array position. Looking up an element in the list takes O(n) time.

  • Why create arrays at put?

An array is a contiguous block of memory and can take up a significant amount of it. If the array were allocated as soon as the map object is created, that memory would be wasted whenever the map is never used.

  • Resizing the array

As the array fills up, the chance of forming linked lists increases greatly, which is where enlarging the array comes in.

Therefore, when the number of entries exceeds 16 * 0.75 = 12, the array is enlarged to reduce hash collisions and improve lookup efficiency. Linked lists may still exist after the capacity is expanded.
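
A quick way to see that default threshold in action; a minimal sketch, assuming the JDK defaults of capacity 16 and load factor 0.75 (so threshold 12):

    import java.util.HashMap;
    import java.util.Map;

    public class ResizeThresholdDemo {
        public static void main(String[] args) {
            // Default capacity 16, load factor 0.75 -> resize threshold 16 * 0.75 = 12.
            Map<Integer, String> map = new HashMap<>();
            for (int i = 1; i <= 13; i++) {
                map.put(i, "v" + i);
            }
            // After the 13th put, the size (13) exceeds the threshold (12),
            // so the table is expected to have doubled from 16 to 32 buckets.
            System.out.println(map.size()); // 13
        }
    }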

Red-black tree

  • Why were red-black trees introduced?

    1. In JDK 1.7, if a linked list grows too long, lookup degrades from constant time to linear time, so the red-black tree was introduced to improve lookup efficiency.
    2. Even in a thread-safe scenario, HashMap can be used to mount a DoS (denial of service) attack on a server. HTTP request parameters such as ?name=hce&sex=1… are stored in a HashMap; if the hash values of the keys are equal, they collide and form a linked list. A malicious attacker can craft parameters that produce an extremely long linked list, and when the server reads the parameters it has to traverse that list, which is extremely inefficient and consumes a lot of CPU.

      For example, the strings “Aa”, “BB”, and “C#” all have a hash value of 2112. String overrides hashCode, and its calculation rule is openly documented, so strings with the same hash value can be derived in reverse (a quick check follows at the end of this list).

  • Why a red-black tree specifically?

    • Why not an ordinary binary sort tree? In a binary sort tree larger keys go to the right, so if the data is inserted in increasing order every node ends up as a right child and the tree degenerates into a linked list!
    • Compromise: the red-black tree, which is essentially a self-balancing binary sort tree. Its defining property guarantees that the longest path is at most twice the shortest (max <= 2 * min), and the rotations in a red-black tree keep its height down, ensuring O(log n) even in the worst case.

  • Time complexity of a red-black tree: both lookup and insertion are O(log n)
  • When would a red-black tree be used?

    When the array size is at least 64 and a bucket's linked list grows longer than 8, that list is converted into a red-black tree. When the number of nodes in a tree drops to 6, the tree is converted back into a linked list.
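
The hash-collision example from point 2 above can be checked directly; a minimal check in plain Java, nothing HashMap-specific:

    public class CollidingStrings {
        public static void main(String[] args) {
            // String.hashCode() is documented as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1],
            // so strings with identical hash codes can be constructed deliberately.
            System.out.println("Aa".hashCode()); // 2112
            System.out.println("BB".hashCode()); // 2112
            System.out.println("C#".hashCode()); // 2112
        }
    }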

Algorithms in HashMap

The flow of the Hash algorithm

To add a key-value pair to a HashMap, the hash value is computed from the key and the array index is then derived from that hash value. If two keys land on the same index, a linked list is formed.

Hash computation (JDK 1.8)

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // The same calculation, step by step:
    // int h = key.hashCode(); // the 32-bit hash code
    // int temp = h >>> 16;    // the high 16 bits, shifted down
    // int newHash = h ^ temp; // XOR the high and low halves
How HashMap improves on the key's hashCode

Why use XOR?

Among the bitwise operations, only XOR produces 0 and 1 with equal probability (AND is biased toward 0, OR toward 1). Mixing the high bits into the low bits this way increases randomness and reduces the chance of collisions.
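
A small illustration of what this spreading buys, using made-up hash codes that differ only in their high 16 bits:

    public class HashSpreadDemo {
        // The same spreading step HashMap applies in JDK 8.
        static int spread(int h) {
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int n = 16; // table size
            int[] hashes = {0x10000, 0x20000, 0x30000, 0x40000};
            for (int h : hashes) {
                int rawIndex = (n - 1) & h;            // without spreading: all land in bucket 0
                int spreadIndex = (n - 1) & spread(h); // with spreading: different buckets
                System.out.println(Integer.toHexString(h)
                        + " -> raw " + rawIndex + ", spread " + spreadIndex);
            }
        }
    }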

Array index calculation

  • Array indexes are computed as i = (n - 1) & hash, where n is the length of the array. With the default array size of 16, the maximum index is n - 1 = 15.

    Why use AND instead of modulo? The AND is a single cheap operation, whereas the modulo operation takes several, so AND is more efficient (a small check follows this list).

  • And since the array length is always a power of two, the low bits of (n - 1) are all 1s. Because this is an AND operation, any bit that was 0 in (n - 1) would force that bit of the index to 0 no matter what the hash is, leaving part of the array unused and increasing the chance of hash collisions.
  • In short, both the hash algorithm and the index calculation are designed to reduce hash collisions
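
The check mentioned above: for a power-of-two table length n and a non-negative hash, (n - 1) & hash gives the same result as hash % n:

    public class IndexDemo {
        public static void main(String[] args) {
            int n = 16; // table length, always a power of two
            int[] hashes = {1, 17, 2112, 12345};
            for (int h : hashes) {
                // For power-of-two n and non-negative h, (n - 1) & h == h % n.
                System.out.println(h + ": (n-1)&h = " + ((n - 1) & h) + ", h % n = " + (h % n));
            }
        }
    }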

HashMap's parameters in depth

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16, the default table size
    static final int MAXIMUM_CAPACITY = 1 << 30;         // maximum table size
    static final float DEFAULT_LOAD_FACTOR = 0.75f;      // default load factor
    static final int TREEIFY_THRESHOLD = 8;              // list length above which a bucket is treeified
    static final int UNTREEIFY_THRESHOLD = 6;            // tree size at which a bucket reverts to a list
    static final int MIN_TREEIFY_CAPACITY = 64;          // if the array is not larger than this, the list is not converted; the array is resized instead

Why is the load factor 0.75?

  • The load factor describes how full the hash table is allowed to become; it determines both the effective size of the hash table and the likelihood of hash collisions.
  • The larger the load factor, the more elements are packed in and the higher the space utilization, but the greater the chance of collisions. So 0.75 is a balance, a compromise between "the chance of collisions" and "the utilization of space".

Why is the treeify threshold 8?

  • If hashCode is well distributed, linked lists rarely get long and red-black trees are rarely used. Under ideal conditions, list lengths follow a Poisson distribution, and the probability of a list reaching length 8 is vanishingly small. So, in general, the list-to-tree conversion simply does not happen.
  • If we do find red-black tree structures inside a HashMap or ConcurrentHashMap during normal development, it usually means something is wrong with our hash algorithm and the hashCode() method should be improved.

Analysis of the key methods

The process of finding an element

First, compute the hash value from the key's hashCode, then use (n - 1) & hash to find its position in the array; check whether that position holds a linked list or a red-black tree, and then search the list or the tree accordingly.
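
A simplified, list-only sketch of that flow over a hand-rolled bucket array (not the JDK code; Node here is a hypothetical minimal entry class, and tree bins are ignored):

    // Minimal entry class for the sketch.
    class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    class SimpleGet {
        static <K, V> V get(Node<K, V>[] table, K key) {
            int h = key.hashCode();
            int hash = h ^ (h >>> 16);              // spread the high bits, as JDK 8 does
            int index = (table.length - 1) & hash;  // bucket index
            for (Node<K, V> node = table[index]; node != null; node = node.next) {
                if (node.hash == hash && (node.key == key || node.key.equals(key))) {
                    return node.value;              // found in this bucket's list
                }
            }
            return null; // key not present
        }
    }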

The process of adding elements (PUT operation)

  • First, compute the hash value from the key's hashCode and use (n - 1) & hash to calculate its position in the array. If that array slot is empty, insert the node directly.
  • If the slot is not empty, first check whether the key already exists. If it does, replace the value and return the original value (see the small demo after this list).
  • If the bucket holds a red-black tree, call putTreeVal() to insert the node into the tree
  • If the bucket holds a linked list, append the new node to the end of the list. After insertion, check whether the list length exceeds the threshold; if it does, the list needs to be converted into a red-black tree.
  • Finally, check whether the number of entries exceeds the resize threshold and decide whether the array needs to be expanded.
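
One observable consequence of the second bullet is put's return value; a minimal demo:

    import java.util.HashMap;

    public class PutReturnDemo {
        public static void main(String[] args) {
            HashMap<String, Integer> map = new HashMap<>();
            System.out.println(map.put("a", 1)); // null: the key was not present before
            System.out.println(map.put("a", 2)); // 1: the old value is returned and replaced
            System.out.println(map.get("a"));    // 2
        }
    }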

Capacity expansion mechanism (resize)

Resizing creates a new array twice the size of the old one and recalculates the storage location of every node from the old array. A node can land in only one of two places in the new array: its original index, or its original index plus the old array's length.

  • After expansion each node is repositioned: either at its original index, or at its original index plus the original array length.

    Because the capacity doubles, (n - 1) gains one more high-order bit, and the index AND now takes that bit of the hash into account. The new position depends on whether that bit is 0 or 1: if it is 1, the old array length is added to the index.
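
A sketch of that bit test (mirroring the hash & oldCap check the JDK 8 resize uses to split each bucket):

    public class ResizeIndexDemo {
        public static void main(String[] args) {
            int oldCap = 16;
            int newCap = oldCap << 1; // 32
            int[] hashes = {5, 21, 2112, 12345};
            for (int hash : hashes) {
                int oldIndex = (oldCap - 1) & hash;
                int newIndex = (newCap - 1) & hash;
                // The extra bit decides everything: if (hash & oldCap) == 0 the node
                // stays at oldIndex, otherwise it moves to oldIndex + oldCap.
                boolean stays = (hash & oldCap) == 0;
                System.out.println(hash + ": old " + oldIndex + " -> new " + newIndex
                        + (stays ? " (unchanged)" : " (old + " + oldCap + ")"));
            }
        }
    }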

Storing 10,000 entries in a HashMap constructed with an initial capacity of 10000: will it trigger a resize?

    // Pass 10000 as the initial capacity to avoid resizing.
    HashMap<Integer, Integer> map = new HashMap<>(10000);
    for (int i = 0; i < 10000; i++) {
        map.put(i, i);
    }

Let's first look at what happens when we initialize a HashMap with a specified initial capacity.

  • HashMap provides a constructor for specifying the initial capacity, HashMap(int initialCapacity). It ultimately calls another HashMap constructor, passing the default load factor of 0.75f as the loadFactor parameter.
  • The member variable threshold stores the value that triggers resizing: once the amount of data stored in the HashMap reaches threshold, a resize is triggered.

    this.threshold = tableSizeFor(initialCapacity);
  • As the constructor logic shows, HashMap does not use the externally supplied initialCapacity directly; the value is first processed by the tableSizeFor() method and the result is assigned to threshold.

So when we pass in 10,000, tableSizeFor() turns it into 2^14 = 16384, and with the load factor of 0.75f the map can hold 16384 * 0.75 = 12288 entries before resizing, so storing 10,000 entries never actually triggers an expansion.
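
For reference, the rounding step can be reproduced with the bit-smearing trick used by the JDK 8 source; a standalone sketch:

    public class TableSizeForDemo {
        static final int MAXIMUM_CAPACITY = 1 << 30;

        // Rounds cap up to the next power of two, as HashMap.tableSizeFor does in JDK 8.
        static int tableSizeFor(int cap) {
            int n = cap - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        public static void main(String[] args) {
            int capacity = tableSizeFor(10000);
            System.out.println(capacity);                 // 16384
            System.out.println((int) (capacity * 0.75f)); // 12288, the effective threshold
        }
    }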

HashMap is not thread-safe

HashMap is not thread-safe. In a multi-threaded environment it can lose data or fail to return the latest data, for example when thread A puts a value and thread B immediately gets it. If we need thread safety, we can use ConcurrentHashMap instead.
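
A minimal sketch of the thread-safe alternative in use (the thread count, key range, and iteration counts here are arbitrary):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ConcurrentPutDemo {
        public static void main(String[] args) throws InterruptedException {
            // ConcurrentHashMap tolerates concurrent writers; a plain HashMap here
            // could lose updates or corrupt its internal structure.
            Map<Integer, Integer> counts = new ConcurrentHashMap<>();
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int t = 0; t < 4; t++) {
                pool.submit(() -> {
                    for (int i = 0; i < 1000; i++) {
                        counts.merge(i % 10, 1, Integer::sum); // atomic per-key update
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.println(counts); // each of the 10 keys should total 400
        }
    }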

Differences between HashMap in JDK 7 and JDK 8

  • Different data structures
    • The HashMap in 1.7 is an array + linked list structure
    • HashMap in 1.8 is an array + linked list + red-black tree structure
  • Linked lists are inserted differently
    • 1.7 uses head insertion, which has thread-safety problems during resizing and can produce a cycle (an infinite loop) in the linked list
    • Tail insertion was used in 1.8
  • The way indexes are recomputed after resizing is different
    • 1.7 recomputes each index from scratch by redoing the hash/index calculation against the enlarged capacity
    • 1.8 only checks the newly added bit of the hash: if it is 0 the index stays the same; if it is 1, the old array length is added to the current index

ConcurrentHashMap

Hashtable is also thread-safe, but it relies on the synchronized keyword for every operation, which makes it less efficient.

ConcurrentHashMap is a thread-safe Map implementation class under the JUC (java.util.concurrent) package.

JDK7 ConcurrentHashMap principle

ConcurrentHashMap exists to solve HashMap's thread-safety problem. Hashtable is inefficient because all threads must compete for one and the same lock. If the container holds multiple locks instead, each guarding a portion of its data, then threads accessing different segments of data do not contend with each other, which effectively improves concurrency. This is ConcurrentHashMap's lock segmentation technique.

The data is first divided into segments, and each Segment is given its own lock. While one thread holds a lock to access one Segment's data, the data in the other segments can still be accessed by other threads.
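
To illustrate the idea only (this is a toy striped-lock wrapper around plain HashMaps, not the JDK 7 Segment implementation):

    import java.util.HashMap;
    import java.util.Map;

    // One lock (and one plain map) per segment, so writes to different
    // segments do not block each other.
    public class StripedMap<K, V> {
        private final Map<K, V>[] segments;
        private final Object[] locks;

        @SuppressWarnings("unchecked")
        public StripedMap(int segmentCount) {
            segments = new Map[segmentCount];
            locks = new Object[segmentCount];
            for (int i = 0; i < segmentCount; i++) {
                segments[i] = new HashMap<>();
                locks[i] = new Object();
            }
        }

        private int segmentFor(Object key) {
            return (key.hashCode() & 0x7fffffff) % segments.length;
        }

        public V put(K key, V value) {
            int i = segmentFor(key);
            synchronized (locks[i]) { // only this segment is locked
                return segments[i].put(key, value);
            }
        }

        public V get(Object key) {
            int i = segmentFor(key);
            synchronized (locks[i]) {
                return segments[i].get(key);
            }
        }
    }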

JDK8 ConcurrentHashMap principle

There are three main changes compared with JDK 7:

① The segment locking mechanism is removed in favour of finer-grained locking, further reducing the probability of lock conflicts.

② The red-black tree structure is introduced: when the number of elements in the same hash bucket exceeds a threshold, the singly linked list is converted into a red-black tree.

③ A more optimized way of counting the elements in the collection is used: updating and reading the total element count in the put, resize, and size paths all avoid locking, using CAS instead.

get requires no synchronization either. During put, if there is no hash collision, CAS is used to add the element; otherwise synchronized is used to add it.

  • synchronized locks only the head node of the current linked list or red-black tree, so as long as hashes do not collide there is no contention, which improves efficiency
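
The counting optimization in ③ uses roughly the same striped-CAS idea as java.util.concurrent.atomic.LongAdder (a base count plus per-contention counter cells); a minimal sketch of that idea in isolation:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.LongAdder;

    public class StripedCountDemo {
        public static void main(String[] args) throws InterruptedException {
            // LongAdder spreads increments over internal cells with CAS instead of
            // blocking on a single lock, which keeps counting cheap under contention.
            LongAdder size = new LongAdder();
            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (int t = 0; t < 8; t++) {
                pool.submit(() -> {
                    for (int i = 0; i < 100_000; i++) {
                        size.increment(); // lock-free accumulation
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.println(size.sum()); // 800000
        }
    }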