HashMap is a common interview topic. Most of us just know that HashMap allows null keys and values and is not thread-safe, which can be problematic in a multi-threaded environment. That answer is usually enough in an interview, but follow-up questions like "why are nulls allowed?" and "why isn't it thread-safe?" are hard to answer without analyzing the source code. So let's look at the source and see why.


HashMap first appeared in JDK 1.2, and its implementation is based on a hash algorithm. It allows null keys and values and is not thread-safe. Let's take a look at the HashMap data structure in JDK 1.8.
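Before diving into the internals, the null behavior is easy to verify directly. Below is a minimal sketch (the class name `NullKeyDemo` is mine, not from the JDK) showing that HashMap accepts one null key and any number of null values; internally, `hash(null)` is simply defined as 0, so the null key always lands in bucket 0.

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    // Builds a map exercising HashMap's null support: one null key, null values.
    static Map<String, String> buildMap() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value-for-null-key"); // null key: hash(null) is defined as 0, bucket 0
        map.put("key", null);                // null values are allowed for any key
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = buildMap();
        System.out.println(map.get(null));  // prints value-for-null-key
        System.out.println(map.get("key")); // prints null
    }
}
```

Note that `get` returning null is ambiguous: the key may be absent, or it may be mapped to null, which is why `containsKey` exists.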

The structure of a HashMap is illustrated below.

The bucket array is the body of the HashMap, and the linked lists exist to resolve hash collisions. What many people miss, however, is that HashMap also contains a tree structure, and there is a subtlety to it: when does the red-black tree appear? The source code explains that by default a list is converted to a tree once its length exceeds the threshold of 8 (and, as we will see below, only when the table capacity has reached 64). Let's look at the source.
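As a side note, the treeify behavior is observable from the outside: even if every key collides into a single bucket, lookups stay fast in JDK 1.8 because the bin turns into a red-black tree. The sketch below (`CollidingKey` and `TreeifyDemo` are hypothetical names of mine) forces all keys into one bucket with a constant `hashCode`; implementing `Comparable` lets the tree order the keys directly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type whose hashCode is constant, so every instance
// collides into the same bucket.
final class CollidingKey implements Comparable<CollidingKey> {
    final int id;
    CollidingKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; }
    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }
    @Override public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }
}

public class TreeifyDemo {
    // Inserts n colliding keys; past TREEIFY_THRESHOLD (8) with table
    // capacity >= MIN_TREEIFY_CAPACITY (64), the bin becomes a red-black tree.
    static Map<CollidingKey, Integer> build(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new CollidingKey(i), i);
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = build(100);
        System.out.println(map.get(new CollidingKey(57))); // prints 57; O(log n) lookup, not O(n)
    }
}
```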

The Node structure

```java
/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    // constructor
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```

Then there’s the tree structure

TreeNode is a red-black tree data structure.

```java
/**
 * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
 * extends Node) so can be used as extension of either regular or
 * linked node.
 */
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    /**
     * Returns root of tree containing this node.
     */
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }
    // ... (rotation and balancing methods omitted)
}
```

Let's look at the class definition.

```java
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {
```


HashMap extends AbstractMap, implements the Map interface, and also implements Cloneable and Serializable.

There are also basic variables in the class

Variables

```java
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 * Default load factor used to calculate the threshold.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 * If the table is smaller than this, a bin with too many hash
 * collisions is not converted to a red-black tree; resize() is
 * called instead.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
 * The table, initialized on first use, and resized as
 * necessary.  When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 * Holds the array of Node<K,V> buckets.
 */
transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map,
 * i.e. the number of elements currently stored.
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified.
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash). This field is used to make iterators on Collection-views of
 * the HashMap fail-fast. (See ConcurrentModificationException).
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor).
 * When the actual size exceeds this threshold, expansion is performed.
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;
```


Let’s look at the constructor

Constructors

```java
/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity cannot be negative, otherwise an error is reported
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // The initial capacity cannot be greater than the maximum capacity
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be positive and cannot be NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    // threshold temporarily holds the rounded-up capacity until the table is allocated
    this.threshold = tableSizeFor(initialCapacity);
}

/**
 * Returns a power of two size for the given target capacity,
 * i.e. the smallest power of two greater than or equal to cap.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
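To see what tableSizeFor actually computes, here is the same method lifted into a standalone class (the wrapper class `TableSize` is my name; the method body follows the source above). It rounds any requested capacity up to the next power of two by smearing the highest set bit downward.

```java
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing trick as HashMap.tableSizeFor: returns the smallest
    // power of two >= cap (cap - 1 first, so exact powers of two map to themselves).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

This is why `new HashMap<>(10)` actually allocates a table of length 16.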


In this source code, loadFactor is a very important parameter because it reflects how fully the HashMap's bucket array is used, and it directly affects the time complexity of HashMap operations.


When the load factor is low, the HashMap holds fewer key-value pairs before expanding. After expansion, the key-value pairs are redistributed into the bucket array, collisions between keys decrease, and the linked lists become shorter accordingly.


Conversely, if you increase the load factor (it can even be greater than 1), the HashMap holds more key-value pairs before expanding, which increases the number of collisions and therefore the length of the linked lists. Normally the load factor is left unchanged; the default is 0.75.
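The threshold arithmetic behind this trade-off is just capacity * loadFactor, mirroring how resize() computes newThr; a resize is triggered once size exceeds that value. A tiny sketch (`ThresholdDemo` is my name, not a JDK class):

```java
public class ThresholdDemo {
    // threshold = capacity * loadFactor; resize() runs once size exceeds it.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: the 13th insertion triggers a resize
        System.out.println(threshold(16, 0.5f));  // 8: lower factor -> earlier, more frequent resizes
        System.out.println(threshold(16, 1.0f));  // 16: higher factor -> fewer resizes, more collisions
    }
}
```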

Expansion mechanism

resize() is the method that recalculates the capacity. Let's look at the source code:


```java
/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    // Keep a reference to the old table
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity has already reached the maximum (2^30),
        // set the threshold to Integer.MAX_VALUE (2^31 - 1) so that
        // no further expansion ever happens
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity; if the old capacity is at least
        // the default initial capacity (16), also shift the threshold
        // left by 1 (multiply by 2)
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        // Table is empty but a constructor set a threshold:
        // the new capacity equals the old threshold
        newCap = oldThr;
    else {
        // Zero initial threshold signifies using defaults:
        // new capacity = default capacity,
        // new threshold = load factor * default capacity
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // If the new threshold has not been computed yet, derive it
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Update the threshold
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Create the new array
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // If the old table is not empty, traverse it and map each key-value
    // pair into the new table. Note the difference from 1.7 here:
    // 1.7 has no red-black tree branch, but 1.8 does.
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // If the bin is a red-black tree, split it
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // If it is not a red-black tree, it is still a linked list:
                    // split it into a "low" list that stays at index j and a
                    // "high" list that moves to index j + oldCap
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Iterate through the list and group the nodes, keeping order
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Map the grouped lists into the new table
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```


So after a resize, an element either stays at the same index or moves by a power-of-two offset (oldCap). The comment on the source can be translated as well:


```java
/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() { ... }
```

In other words: initialize the table or double its size. If the table is null, allocate it according to the initial capacity target held in the threshold field. Otherwise, because expansion is by powers of two, each element in a bin must either stay at the same index or move by a power-of-two offset in the new table.
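The stay-or-move rule in that comment can be sketched directly (`SplitDemo` and `newIndex` are my names, not HashMap internals): since index = hash & (capacity - 1) and the capacity doubles, the new index differs from the old one only by the single bit tested with hash & oldCap.

```java
public class SplitDemo {
    // During a resize from oldCap to 2 * oldCap, the node at index j either
    // stays at j (when (hash & oldCap) == 0) or moves to j + oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);              // old index
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        System.out.println(newIndex(5, oldCap));   // 5  (bit 16 clear: stays put)
        System.out.println(newIndex(21, oldCap));  // 21 (bit 16 set: moves to 5 + 16)
        System.out.println(21 & (2 * oldCap - 1)); // 21, same as recomputing hash & (newCap - 1)
    }
}
```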


The interesting part is how resize() decides the new sizes: there are three different branches.


When the HashMap is first initialized with the default constructor, it starts with an empty table and a threshold of 0, so the first expansion uses the default capacity of 16 and sets threshold = DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY = 16 * 0.75 = 12.


If an initial capacity was specified in the constructor, threshold initially holds tableSizeFor(initialCapacity). On the first resize, that value becomes the new capacity, and the threshold is then recomputed as capacity * loadFactor.


If this is not the first expansion and the HashMap has already been resized before, then on each subsequent resize both the table capacity and the threshold are doubled.
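A quick sketch of that doubling sequence, starting from the defaults (`GrowthDemo` is my name; the shifts mirror newCap = oldCap << 1 and newThr = oldThr << 1 in resize()):

```java
public class GrowthDemo {
    // Returns {capacity, threshold} after the given number of doublings,
    // starting from DEFAULT_INITIAL_CAPACITY (16) and threshold 16 * 0.75 = 12.
    static int[] after(int resizes) {
        int capacity = 16;
        int threshold = (int) (capacity * 0.75f);
        for (int i = 0; i < resizes; i++) {
            capacity <<= 1;   // newCap = oldCap << 1
            threshold <<= 1;  // newThr = oldThr << 1
        }
        return new int[] { capacity, threshold };
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            int[] ct = after(i);
            System.out.println("capacity=" + ct[0] + " threshold=" + ct[1]);
        }
        // capacity=16 threshold=12
        // capacity=32 threshold=24
        // capacity=64 threshold=48
    }
}
```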


Looking back at the 1.7 source, there is no red-black tree; 1.8 added that optimization. Expansion still doubles the length to the next power of two, but when expanding, JDK 1.8 does not need to recalculate each key's hash the way JDK 1.7 does: it only inspects the single bit tested by hash & oldCap, saving the time that rehashing would cost.


Having read through this source code and translated a few of its English comments, we now have a rough understanding of how HashMap works. If there is anything you would like to discuss, let's discuss it together. Thank you for reading.

The Java Geek Technology public account is run by a group of technologists who love Java development and focus on sharing original, high-quality Java articles. If you think our articles are good, please like, read, and forward them to support us and encourage us to share better articles.

Follow the public account and reply "nuggets" in the background to get the author's Java knowledge system and must-read interview materials.