HashMap basics review

HashMap official Javadoc

Before JDK 1.8

Before JDK 1.8, the underlying implementation of HashMap was based on an array + linked list. Here are some important fields:

    // Default initial capacity - must be a power of 2
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // Shared empty table instance
    static final Entry<?,?>[] EMPTY_TABLE = {};
    // The table, resized as necessary
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
    // The number of key-value mappings in this map
    transient int size;
    // The next size value at which to resize (capacity * load factor).
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold;
    // loadFactor
    final float loadFactor;
    // The number of times the structure of the HashMap is modified, used by iterators
    transient int modCount;

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
    }

The constructor

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                    initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                    loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }
    public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        inflateTable(threshold);

        putAllForCreate(m);
    }

Put method

    public V put(K key, V value) {
        // If table still refers to EMPTY_TABLE, initialize the HashMap
        // (set capacity and threshold, create the Entry array)
        if (table == EMPTY_TABLE) {
            // threshold was assigned in the constructor: threshold = initialCapacity
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

Analyzing the source code, the put method steps are as follows:

  1. If the table reference points to the EMPTY_TABLE member variable, the inflateTable method is called to initialize it, as shown below
    private void inflateTable(int toSize) {
        // Round the requested size up to a power of 2
        int capacity = roundUpToPowerOf2(toSize);
        // Reassign the threshold to capacity * loadFactor
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // Initialize the table: create an Entry array of that capacity
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }
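The rounding step can be sketched outside the JDK: for n > 1, `Integer.highestOneBit((n - 1) << 1)` is the smallest power of two that is >= n. The class name below is illustrative, but the expression mirrors JDK 7's roundUpToPowerOf2:

```java
public class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Equivalent of JDK 7's roundUpToPowerOf2: smallest power of two >= number
    public static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(13)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```

So `new HashMap(13)` ends up with a table of 16 buckets once the table is inflated.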
  2. If the key is null, putForNullKey is called: it walks the linked list at bucket 0; if an entry with a null key already exists, its value is replaced and the old value is returned; otherwise a new entry is added at bucket 0.
    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }
  3. If the key is not null, the hash() function computes the key's hash by XOR-ing and unsigned-right-shifting the key's hashCode. This perturbation prevents collisions between hashCodes that differ only in their high bits, since the table index only uses the low bits. Put simply, high-order features are mixed into the low-order bits so that a change in any bit of the hashCode can affect the final index, reducing the probability of collisions.
    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
  4. The indexFor() method computes the array index by bitwise AND-ing the hash from the previous step with the array length minus 1. A generic hash table would distribute elements with a modulo, i.e. hashCode % length, but since data is binary in a computer, & is faster than %. Because a HashMap's array length is always a power of 2, (h & (length - 1)) == (h % length) holds for non-negative h, and the mask uses the hash's low bits in full, so every bucket is reachable.
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);
    }
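To see why the power-of-2 constraint matters, the claim (h & (length - 1)) == (h % length) can be checked directly (class name here is illustrative):

```java
public class IndexForDemo {
    // JDK 7's indexFor: hash & (length - 1), valid when length is a power of 2
    public static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // power of 2: binary 10000, so length - 1 is 01111
        for (int h : new int[]{0, 5, 16, 21, 12345}) {
            // For non-negative h and power-of-2 length, & agrees with %
            System.out.println(indexFor(h, length) == h % length); // true
        }
    }
}
```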
  5. If an entry with the same key already exists at the computed index, its value is replaced and the old value is returned.
  6. If there is no entry at the computed index, or no entry with the same key exists there, modCount is incremented and addEntry() is called.
    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Resize when size has reached the threshold and the bucket is occupied
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }
        createEntry(hash, key, value, bucketIndex);
    }

6.1 Check whether size has reached the threshold and whether the current bucket already holds an element. If both conditions hold, call resize() to double the capacity.

    void resize(int newCapacity) {
        // Keep a reference to the old Entry array
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        // Create a new array and transfer the old entries into it
        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

6.1.1 Transfer the data from the old array into the new one. Analyzing the source code shows that each bucket's linked list is reversed during the transfer.

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;
                // Recompute the hash if required
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                // Head insertion into the new bucket: this reverses the list
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

6.1.2 Call createEntry() to place the element at the bucket. As the code shows, JDK 1.7 adds elements to a linked list with head insertion: every new element becomes the head of the list. Combined with the transfer analysis above, this means that in 1.7 the list is reversed on every resize, and under concurrent access this can produce a cycle in the linked list (the classic infinite-loop bug).

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }
    Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
    }
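The head-insertion behavior and the reversal during transfer can be simulated with a stripped-down, single-bucket sketch (this is not JDK code; Node, put, and transfer below are simplified stand-ins):

```java
import java.util.ArrayList;
import java.util.List;

public class HeadInsertDemo {
    // Minimal singly linked node standing in for JDK 7's Entry
    public static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // JDK 7 style: every new entry becomes the head of the bucket
    public static Node put(Node head, String key) {
        return new Node(key, head);
    }

    // JDK 7 transfer loop for one bucket: head insertion into the new bucket
    public static Node transfer(Node head) {
        Node newHead = null;
        for (Node e = head; e != null; ) {
            Node next = e.next;
            e.next = newHead;
            newHead = e;
            e = next;
        }
        return newHead;
    }

    public static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) out.add(e.key);
        return out;
    }

    public static void main(String[] args) {
        Node bucket = null;
        bucket = put(bucket, "A");
        bucket = put(bucket, "B");
        bucket = put(bucket, "C");
        System.out.println(keys(bucket));           // [C, B, A] - head insertion
        System.out.println(keys(transfer(bucket))); // [A, B, C] - reversed again
    }
}
```

Each transfer flips the list; two threads resizing at once can therefore leave entries pointing at each other.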

The get method

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

Analyzing the source code, the get steps are as follows:

  1. If the key is null, call getForNullKey() and return.
  2. Call getEntry(key) to get the corresponding Entry object:
    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }
        // Get the hash of the key, same as in the put method
        int hash = (key == null) ? 0 : hash(key);
        // Walk the bucket's linked list to find the matching Entry
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

JDK 1.8 and after

Since JDK 1.8, the underlying implementation is array + linked list + red-black tree. Here are some important fields:

    // Convert a bucket's list to a tree when it reaches this length
    static final int TREEIFY_THRESHOLD = 8;
    // Convert a tree back to a list when it shrinks to this length
    static final int UNTREEIFY_THRESHOLD = 6;
    // Only treeify when the table has at least this many buckets
    static final int MIN_TREEIFY_CAPACITY = 64;
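Treeification is invisible to callers: even when every key collides, the map behaves the same. A quick sketch with a deliberately bad key class (illustrative, not from the JDK) shows lookups still working well past TREEIFY_THRESHOLD:

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeysDemo {
    // A key whose hashCode is constant, forcing every entry into one bucket
    public static class BadKey {
        final int id;
        public BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        // Well past TREEIFY_THRESHOLD (8); the bucket becomes a red-black tree
        // internally once the table is large enough, but behavior is unchanged
        for (int i = 0; i < n; i++) map.put(new BadKey(i), i);
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(20);
        System.out.println(map.size());             // 20
        System.out.println(map.get(new BadKey(7))); // 7
    }
}
```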

Put method

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

Analyzing the source code, the 1.8 put method steps are as follows:

  1. hash(key) differs from the 1.7 hash method, but the essence is the same: spread the high bits of hashCode into the low bits to reduce collisions.
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
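The effect of the spread can be demonstrated with two hashCodes that differ only in their high 16 bits: without perturbation both land in bucket 0 of a 16-bucket table; with perturbation they separate (class name illustrative; hash() copies the JDK 8 formula above):

```java
public class HashSpreadDemo {
    // JDK 8's perturbation: XOR the high 16 bits into the low 16 bits
    public static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int a = 0x10000;   // 65536: differs from b only in the high bits
        int b = 0x20000;   // 131072
        int mask = 16 - 1; // a 16-bucket table keeps only the low 4 bits
        // Raw hashCodes collide: both land in bucket 0
        System.out.println((a & mask) + " " + (b & mask));             // 0 0
        // After spreading, the high bits influence the bucket index
        System.out.println((hash(a) & mask) + " " + (hash(b) & mask)); // 1 2
    }
}
```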
  2. Call the putVal method. The 1.8 put flow is described through the inline comments below; some of the key points involved are examined afterwards.
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // 2.1 If the current array is null or its length is 0, resize (initialize)
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the bucket at (n - 1) & hash is empty, place a new node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // The bucket is not empty
        else {
            Node<K,V> e; K k;
            // Same key as the first node: remember it so the value can be replaced
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // The bucket is already a tree: insert via putTreeVal
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Walk the linked list
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // Reached the tail: append the new node (tail insertion)
                        p.next = newNode(hash, key, value, null);
                        // 2.3 Treeify when the list reaches TREEIFY_THRESHOLD
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Found a node with the same key
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // e != null means an existing mapping for the key was found
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Resize if the new size exceeds the threshold
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

2.1 resize()

    final Node<K,V>[] resize() {
        // Keep a reference to the old array
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // The old array already holds elements
        if (oldCap > 0) {
            // Already at the maximum capacity: stop resizing, return the old array
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Otherwise double both the capacity and the threshold
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // The map was constructed with an initial capacity, stored in threshold
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // The map was created with the no-arg constructor: use the defaults
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // Compute the new threshold if it has not been set yet
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        // Allocate the new hash bucket array and point table at it
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // The original array is not empty
        if (oldTab != null) {
            // Iterate over the old array
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // A single node: recompute its bucket in the new array
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // A tree bin: split it between the two new buckets
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // A linked list: split it while preserving order
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // oldCap has exactly one bit set (e.g. 16 is 10000).
                            // (e.hash & oldCap) == 0 means that bit of the hash
                            // is 0, so the new index (hash & (newCap - 1)) equals
                            // the old index: the node stays in the "low" list at j.
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null) // empty low list so far
                                    loHead = e;
                                else                // hang e at the tail
                                    loTail.next = e;
                                loTail = e;
                            }
                            // Otherwise that bit is 1, so the new index is the
                            // old index plus oldCap: the node goes to the "high"
                            // list. E.g. with oldCap 16, hash 21 maps to bucket 5
                            // in the old table and bucket 21 = 5 + 16 in the new.
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // The low list stays at index j
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // The high list moves to index j + oldCap
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
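The low/high split can be checked in isolation: two hashes that share a bucket at capacity 16 are separated by the (hash & oldCap) test when the table doubles (staysInPlace is an illustrative helper, not a JDK method):

```java
public class SplitDemo {
    // (hash & oldCap) == 0 means the node keeps its old index after doubling
    public static boolean staysInPlace(int hash, int oldCap) {
        return (hash & oldCap) == 0;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int h1 = 5;  // binary 0 0101: the oldCap bit (16) is 0
        int h2 = 21; // binary 1 0101: the oldCap bit (16) is 1
        // Both share bucket 5 in the old table
        System.out.println(h1 & (oldCap - 1));     // 5
        System.out.println(h2 & (oldCap - 1));     // 5
        // After doubling, (hash & oldCap) decides where each node goes
        System.out.println(staysInPlace(h1, oldCap)); // true  -> stays at 5
        System.out.println(staysInPlace(h2, oldCap)); // false -> moves to 5 + 16
        System.out.println(h1 & (newCap - 1));     // 5
        System.out.println(h2 & (newCap - 1));     // 21
    }
}
```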

The get method

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // The table is non-empty and the bucket at (n - 1) & hash is non-null
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // The first node of the bucket has the same hash and the same key
            // (same reference or equals): return it
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // A tree bin: search the red-black tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // A plain linked list: walk it
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

Differences between the two versions

  1. The underlying data structures differ: red-black trees were introduced in 1.8.
  2. In 1.8 the Entry class was replaced by Node, which is still essentially a singly linked list node.
  3. Null keys: 1.7 handles them in a dedicated putForNullKey method; 1.8 handles them directly in hash(), which maps a null key to hash 0.
  4. The hash() method differs: 1.7 perturbs the hashCode with four shift/XOR rounds, 1.8 with a single (h = key.hashCode()) ^ (h >>> 16).
  5. When inserting into a linked list, 1.7 uses head insertion: every new element becomes the list head, so the transfer method reverses the list during resizing, which can create a linked-list cycle under concurrency. 1.8 appends at the tail, avoiding that particular problem.
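Point 3 can be observed directly from the public API: a HashMap accepts exactly one null key, and in 1.8 hash(null) is simply 0, so the entry lands in bucket 0:

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key: the value is replaced
        System.out.println(map.size());    // 1
        System.out.println(map.get(null)); // second
    }
}
```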

Red-black tree

Reference article: Meituan technical team (tech.meituan.com/2016/12/02/…)

Frequently asked interview questions

  1. The data structure of HashMap?
  2. How does HashMap work?
  3. What happens when two objects have the same hashCode?
  4. Do you know the implementation of Hash? Why is it implemented this way?
  5. Why use xOR operators?
  6. How to determine the size of a HashMap table? What is loadFactor? How does this capacity change? What problems will this change bring?
  7. The procedure for the put method in HashMap?
  8. Array expansion process?
  9. Why not use a binary search tree instead of a red-black tree? Why not always use red black trees?
In special cases a binary search tree degenerates into a linear structure (effectively a linked list), and lookups that traverse it become slow. A red-black tree may need left rotations, right rotations, and recoloring after an insertion to stay balanced; it is introduced to speed up lookups and solve the query-depth problem of a long list. Compared with a strictly balanced binary tree, a red-black tree pays a smaller price to maintain "balance", and that price is less than the cost of traversing a long linear list. Hence a bucket is converted to a red-black tree only when its list grows past 8 elements; for a short list the tree's maintenance overhead is unnecessary and would actually make it slower.
  1. What is a red-black tree?
  2. What changes have been made to HashMap in JDK8?
  3. What is the difference between HashMap, LinkedHashMap, and TreeMap?
  4. How to use HashMap & TreeMap & LinkedHashMap?
  5. The difference between HashMap and ConcurrentHashMap?
  6. What is the difference between HashMap and HashTable?
  7. What are the common hash algorithms?
  8. Fast failure and safe failure?
Collection classes in java.util are fail-fast: their iterators compare modCount with expectedModCount and throw a ConcurrentModificationException as soon as the collection is structurally modified during iteration. Classes in java.util.concurrent are fail-safe: they iterate over a weakly consistent view and do not throw on concurrent modification...
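A minimal sketch of both behaviors (hashMapThrows and chmSizeAfterIteration are illustrative helper names):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FailFastDemo {
    // Fail-fast: structurally modifying a HashMap while iterating trips the
    // iterator's modCount != expectedModCount check on the next next() call
    public static boolean hashMapThrows() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String k : map.keySet()) {
                map.put("c", 3); // structural modification: a new key
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Fail-safe: ConcurrentHashMap's iterator is weakly consistent and
    // tolerates modification during iteration without throwing
    public static int chmSizeAfterIteration() {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        for (String k : map.keySet()) {
            map.put("c", 3);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(hashMapThrows());         // true
        System.out.println(chmSizeAfterIteration()); // 3
    }
}
```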

This article only analyzes put and get. Much is still missing, such as traversal, remove, treeification, etc., which needs to be added…