A HashMap overview

If you don’t have time to dig into this article, you can go straight to this HashMap overview to get a sense of what HashMaps are.

HashMap is an implementation of the Map interface that allows null keys and values. It can be seen as an enhanced version of Hashtable, but unlike Hashtable it is not thread-safe; if you need a thread-safe Map, consider ConcurrentHashMap. A HashMap is unordered: it makes no guarantee about the order of the key-value pairs stored internally.

The underlying data structure of a HashMap is an array plus linked lists; the array slots are also called buckets. The time required to traverse a HashMap is proportional to the number of buckets plus the number of key-value mappings. Therefore, if iteration performance matters, do not set the initial capacity too high or the load factor too low.

The initial capacity is the number of hash buckets. The load factor is a measure of how full the hash table is allowed to get. When the number of entries exceeds the product of the load factor and the current capacity, the hash table is rehashed and its internal data structure rebuilt.

Note that HashMap is not thread-safe: if multiple threads access a HashMap concurrently and at least one of them modifies its structure, access to the HashMap must be synchronized. You can use Collections.synchronizedMap(new HashMap<>()) to create a thread-safe Map.
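As a minimal sketch of that wrapper (the class and method names below are invented for illustration; Collections.synchronizedMap itself is the standard JDK API):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncMapDemo {
    // Wrap a plain HashMap in a synchronized view: every method of the
    // returned map is guarded by a single internal mutex.
    public static int putAndGet() {
        Map<String, Integer> map =
            Collections.synchronizedMap(new HashMap<String, Integer>());
        map.put("a", 1);
        return map.get("a");
    }

    public static void main(String[] args) {
        System.out.println(putAndGet()); // 1
    }
}
```

Note that compound operations (check-then-put, iteration) still need external synchronization even on the wrapped map.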

Any structural modification made from outside the iterator, rather than through the iterator’s own remove method, triggers fail-fast behavior, so prefer the iterator’s own remove. If the structure of the map is modified while an iterator is in use, a ConcurrentModificationException is thrown.
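A small sketch of both cases (class and method names are invented for illustration; the fail-fast behavior is standard HashMap semantics):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    // Removing through the iterator keeps modCount and expectedModCount
    // in sync, so the iteration completes normally.
    public static boolean removeViaIteratorSucceeds() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey().equals("a")) {
                it.remove(); // the safe way to remove during iteration
            }
        }
        return map.size() == 1;
    }

    // Removing through the map itself while iterating is a structural
    // change outside the iterator and trips the fail-fast check.
    public static boolean removeViaMapThrows() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // modCount changes behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(removeViaIteratorSucceeds()); // true
        System.out.println(removeViaMapThrows());        // true
    }
}
```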

Let’s talk about the details of HashMap. Let’s start with the interview questions.

Differences between HashMap and Hashtable

We introduced HashMap above; now let’s introduce Hashtable.

Similarities

Both HashMap and Hashtable are implemented on top of hash tables, where each element is a key-value pair, and both implement the Map, Cloneable, and Serializable interfaces.

Differences

  • Different parent classes: HashMap extends the AbstractMap class, while Hashtable extends the Dictionary class.

  • Null handling: HashMap allows null keys and values; Hashtable does not. A HashMap treats a null key like any other key, and only one null key is allowed.

  • Thread safety: HashMap is not thread-safe. If multiple threads modify the structure of the HashMap concurrently, for example by adding or removing entries, synchronization is required (changing the value mapped to an existing key is not a structural change). You can construct a thread-safe Map with Collections.synchronizedMap or use ConcurrentHashMap. Hashtable, by contrast, is itself a thread-safe container.

  • Performance: both resolve collisions with singly linked lists, and HashMap can achieve constant-time performance for put and get operations. The put and get operations of Hashtable are marked synchronized, so its efficiency is worse.

  • Different initial sizes and growth: the initial size of a Hashtable is 11, and each expansion grows it to 2n + 1 (where n is the previous length).

    The initial capacity of a HashMap is 16, and it doubles each time it expands. If an initial capacity is given at construction time, Hashtable uses exactly the size you gave it, while HashMap rounds it up to a power of two.
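The null-handling difference can be demonstrated directly (class and method names are invented for illustration; the behavior is standard JDK semantics):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    // HashMap stores a null key like any other key; putting null twice
    // just overwrites the single null-key entry.
    public static boolean hashMapAcceptsNull() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // only one null key: the value is replaced
        return map.size() == 1 && "second".equals(map.get(null));
    }

    // Hashtable rejects null keys (and null values) outright.
    public static boolean hashtableRejectsNull() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "boom");
        } catch (NullPointerException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hashMapAcceptsNull());   // true
        System.out.println(hashtableRejectsNull()); // true
    }
}
```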

The difference between HashMap and HashSet

The difference between a HashMap and a HashSet is also a frequent interview question.

HashSet extends AbstractSet and implements the Set, Cloneable, and java.io.Serializable interfaces. A HashSet does not allow duplicate elements. A HashSet is essentially backed by a HashMap, so all operations on a HashSet are operations on a HashMap; consequently, a HashSet does not guarantee iteration order.
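A quick sketch of the consequence (class and method names are invented for illustration): since HashSet.add delegates to the backing HashMap’s put, adding a duplicate is simply an overwrite and add returns false.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetDemo {
    // add() returns whether the element was newly inserted; a duplicate
    // add just overwrites the map entry behind the scenes.
    public static boolean duplicateAddRejected() {
        Set<String> set = new HashSet<>();
        boolean first = set.add("x");
        boolean second = set.add("x");
        return first && !second && set.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(duplicateAddRejected()); // true
    }
}
```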

Underlying structure of HashMap

To understand the structure of a class, take a look at the structure of a HashMap:

The three main classes (interfaces) are HashMap, AbstractMap, and Map. HashMap was briefly introduced in the overview above; AbstractMap is introduced next.

AbstractMap class

This abstract class is the backbone implementation of the Map interface and exists to minimize the work needed to implement Map. To implement an immutable map, a programmer only needs to extend this class and provide an implementation of the entrySet method, which returns a set view of the map’s entries. Typically, the returned set is itself implemented on top of AbstractSet. This set should not support the add or remove methods, and its iterator should not support remove.

To implement a modifiable map, the programmer must additionally override the put method (otherwise it throws an UnsupportedOperationException), and the iterator returned by entrySet().iterator() must implement remove().

The Map interface

The Map interface defines the contract for key-value storage. A Map cannot contain duplicate keys, and each key maps to at most one value. This interface replaces the Dictionary class, which was an abstract class rather than an interface.

The Map interface provides three collection views, which allow the contents of a map to be viewed as a set of keys, a collection of values, or a set of key-value mappings. The order of a map is defined as the order in which the iterators of these views return their elements. Some map implementations, such as TreeMap, guarantee an order; others, such as HashMap, do not.

Important inner classes and interfaces

The Node class

Node is the class used to store each entry of a HashMap, and it implements the Map.Entry interface. Let’s first look at the definition of the Entry interface inside Map.

Map.Entry

// The Map.entrySet() method returns a collection view whose elements are of this class.
// The only way to obtain a reference to a map entry is from the iterator of this
// view; these Map.Entry objects are only valid for the duration of the iteration.
interface Entry<K,V> {
  K getKey();
  V getValue();
  V setValue(V value);
  boolean equals(Object o);
  int hashCode();
}

Node stores four attributes, hash, key, value, and references to the next Node

// hash value
final int hash;
// key
final K key;
// value
V value;
// reference to the next Node
Node<K,V> next;

Like Map.Entry, Node instances are linked into an entry chain. When a new Node is constructed, these four attributes are passed in:

Node(int hash, K key, V value, Node<K,V> next) {
  this.hash = hash;
  this.key = key;
  this.value = value;
  this.next = next;
}

Since Node implements the Map.Entry interface, it must implement its methods, so Node includes the five methods above.

KeySet inner class

The KeySet class extends the AbstractSet abstract class. An instance of it is created by the keySet() method of HashMap, and it operates on the keys of the HashMap.

Map.keySet() returns a Set view. keySet() is declared in the Map interface but implemented by HashMap; a look at the source makes this clear.

// Returns a set view that contains the map's keys.
public Set<K> keySet() {
  // keySet is the keySet field in AbstractMap
  Set<K> ks = keySet;
  if (ks == null) {
    // If ks is null, create a KeySet object
    // and assign it to ks.
    ks = new KeySet();
    keySet = ks;
  }
  return ks;
}

So the KeySet class operates on keys in the Map:

The Values inner class

The Values class is created similar to the KeySet class, except that the KeySet class operates on keys in a Map and the Values class operates on Values in a key-value pair.

Loop through the values in the Map and see what the values() method ends up creating:

public Collection<V> values() {
  // values is actually the values field in AbstractMap
  Collection<V> vs = values;
  if (vs == null) {
    vs = new Values();
    values = vs;
  }
  return vs;
}

The values field itself lives in AbstractMap, and the Values class implements the Collection view of the map’s values.

In fact, it is the same operation as key

EntrySet inner class

The entrySet() method creates an EntrySet instance that operates on the key-value pairs in a HashMap.

Clicking on entrySet() shows that this method is also defined in the Map interface and overridden by HashMap

// Return a set view containing the key-value pairs in the map
public Set<Map.Entry<K,V>> entrySet() {
  Set<Map.Entry<K,V>> es;
  return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}

If es is null, a new EntrySet instance is created. The EntrySet class mainly consists of methods that operate on the map’s key-value pairs, as follows.

The underlying structure of HashMap 1.7

In JDK 1.7, HashMap is implemented with an array of buckets plus linked lists; that is, linked lists are used to handle hash collisions, and entries whose hashes map to the same bucket are chained together. However, when a bucket contains many elements, i.e. many entries collide into the same bucket, searching by key degrades to a linear scan. Its data structure looks like this:

The underlying data structure of a HashMap is an Entry array, which is the basic component of a HashMap. Each Entry contains a key-value pair.

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

Each Entry contains hash, key, value, and next attributes. Entry is an inner class of HashMap:

static class Entry<K,V> implements Map.Entry<K,V> {
  final K key;
  V value;
  Entry<K,V> next;
  int hash;

  Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
  }
  ...
}

So the overall structure of a HashMap looks like this

The underlying structure of HashMap 1.8

Compared with JDK 1.7, JDK 1.8 changes the underlying structure: when a bucket holds more than 8 elements, the linked list is converted to a red-black tree to improve query efficiency. JDK 1.8 also rewrites the resize() method.

HashMap Important attributes

Initial capacity

The default initial capacity of a HashMap is managed by the DEFAULT_INITIAL_CAPACITY property.

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

The default initial capacity of HashMap is therefore 1 << 4 = 16. << is the left-shift operator, so this is equivalent to 2^4.
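A tiny sketch of the shift arithmetic (the class name is invented for illustration):

```java
public class ShiftDemo {
    public static void main(String[] args) {
        // 1 << 4 shifts the bit pattern 1 four places left: 10000 in binary
        System.out.println(1 << 4);  // 16, the default initial capacity
        System.out.println(1 << 30); // 1073741824, i.e. MAXIMUM_CAPACITY
    }
}
```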

The maximum capacity

The maximum capacity of a HashMap is

static final int MAXIMUM_CAPACITY = 1 << 30;

You might wonder: an int occupies four bytes, so shouldn’t the maximum capacity be 1 shifted left 31 bits? In numeric representation, the highest (leftmost) bit is the sign bit: 0 means positive and 1 means negative. Since a capacity cannot be negative, the largest power of two a HashMap can use is 2^30.

Default load factor

The default load factor for a HashMap is

static final float DEFAULT_LOAD_FACTOR = 0.75f;

The load factor drives the expansion mechanism, which is discussed in detail later. In short, when the number of entries stored in a HashMap exceeds capacity * load factor, the capacity of the HashMap is doubled.

So the first expansion of a default HashMap occurs when its size exceeds DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR = 12.
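The threshold arithmetic can be sketched directly (the class and method names are invented for illustration; the formula is the one just described):

```java
public class ThresholdDemo {
    // threshold = capacity * load factor; a resize happens once the
    // number of entries exceeds this value.
    public static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12, the first resize point
        System.out.println(threshold(32, 0.75f)); // 24, after the first doubling
    }
}
```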

Tree threshold

The treification threshold for HashMap is

static final int TREEIFY_THRESHOLD = 8;

When an element is added and the number of elements stored in a bucket exceeds 8, the bucket is converted to a red-black tree (a JDK 1.8 feature).

Linked-list threshold

The linked list threshold for a HashMap is

static final int UNTREEIFY_THRESHOLD = 6;

If the number of elements stored in a tree bucket drops below 6, the bucket is automatically converted back to a linked list.

Minimum treeification capacity

static final int MIN_TREEIFY_CAPACITY = 64;

This value means that while the capacity of the bucket array is smaller than 64, expanding the bucket array is preferred over treeifying a bucket.

Nodes in the array

The node array in a HashMap is a Node array (an Entry array in JDK 1.7); it is the array in the HashMap array-plus-linked-list data structure.

transient Node<K,V>[] table;

The Node array is allocated the first time it is used and resized as necessary, doubling in size each time.

Number of key-value pairs

In a HashMap, size is used to indicate the number of key-value pairs in the HashMap.

Modify the number of times

In a HashMap, modCount records the number of structural modifications; it is mainly used for the fail-fast mechanism that detects concurrent changes to the HashMap.

Capacity threshold

In a HashMap, threshold is used to represent the capacity expansion threshold, which is the initial capacity x load factor value.

The initial threshold is computed by the tableSizeFor method. Let’s look at its source code first and then explain it.

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

The code uses the operator |=, the bitwise-or assignment. What does it mean? You know that a += b means a = a + b; in the same way, a |= b means a = a | b: both sides are treated as binary and or-ed together, as shown in the figure below.

Suppose we request a large capacity: as the figure shows, a value above 2^29 is rounded up to 2^30 after the series of or operations. In general, tableSizeFor rounds the requested capacity up to the next power of two.
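The rounding behavior can be checked by copying the method into a standalone class (the class name is invented for illustration; the method body is the one shown above):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of HashMap.tableSizeFor: smallest power of two >= cap.
    // The or-shift cascade smears the highest set bit into every lower
    // position, so n becomes a run of ones; n + 1 is then a power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10));            // 16
        System.out.println(tableSizeFor(16));            // 16 (already a power of two)
        System.out.println(tableSizeFor((1 << 29) + 1)); // 1073741824 = 2^30
    }
}
```

The initial `cap - 1` is what makes an exact power of two map to itself instead of the next power up.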

The load factor

loadFactor is the load factor, a measure of how densely the HashMap is filled.

HashMap constructor

In the HashMap source code there are four constructors; let’s describe them one by one.

  • Constructor with initialCapacity and loadFactor
public HashMap(int initialCapacity, float loadFactor) {
  if (initialCapacity < 0)
    throw new IllegalArgumentException("Illegal initial capacity: " +
                                       initialCapacity);
  if (initialCapacity > MAXIMUM_CAPACITY)
    initialCapacity = MAXIMUM_CAPACITY;
  if (loadFactor <= 0 || Float.isNaN(loadFactor))
    throw new IllegalArgumentException("Illegal load factor: " +
                                       loadFactor);
  this.loadFactor = loadFactor;
  // Capacity expansion threshold
  this.threshold = tableSizeFor(initialCapacity);
}

The initial capacity cannot be negative, so passing an initial capacity < 0 throws an IllegalArgumentException. If the initial capacity passed in exceeds the maximum capacity, it is clamped to the maximum. The load factor must be positive and not NaN. Finally the expansion threshold is computed with tableSizeFor, which is also very important and which we discuss later.

  • A constructor with initialCapacity only
public HashMap(int initialCapacity) {
  this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

This delegates to the constructor above, passing HashMap’s default load factor of 0.75f.

  • A constructor with no arguments
public HashMap() {
  this.loadFactor = DEFAULT_LOAD_FACTOR;
}

The default load factor is 0.75F

  • Constructor with map
public HashMap(Map<? extends K, ? extends V> m) {
  this.loadFactor = DEFAULT_LOAD_FACTOR;
  putMapEntries(m, false);
}

A constructor with a Map directly batches external elements into a HashMap.
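A quick sketch of that constructor in use (the class and method names are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class CopyConstructorDemo {
    public static int copySize() {
        Map<String, Integer> src = new HashMap<>();
        src.put("a", 1);
        src.put("b", 2);
        // The Map constructor bulk-inserts every entry of src
        // (internally via putMapEntries).
        Map<String, Integer> copy = new HashMap<>(src);
        return copy.size();
    }

    public static void main(String[] args) {
        System.out.println(copySize()); // 2
    }
}
```

Note this is a shallow copy: both maps share the same key and value objects.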

Let’s talk about HashMap Put

I remember interviewing in Beijing just a year after graduation, when a company asked me to walk through the HashMap put process. I hesitated and couldn’t answer, so I resolved to master it. The analysis below is based on JDK 1.8. Here is the whole method first, followed by a line-by-line analysis.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
  Node<K,V>[] tab; Node<K,V> p; int n, i;
  // If the table is null or no memory has been allocated for it, resize once
  if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
  // (n - 1) & hash is the actual index in the table
  if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
  // If the bucket is not empty
  else {
    Node<K,V> e; K k;
    // Compare the node already in the table with the key and hash to be inserted
    if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
      e = p;
    // If they differ and the current node is already a TreeNode
    else if (p instanceof TreeNode)
      // Use red-black tree storage
      e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    // Otherwise walk the linked list until p.next == null
    else {
      for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
          // Insert at the tail of the list
          p.next = newNode(hash, key, value, null);
          // If the chain length reaches the threshold, let treeifyBin() check again
          if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
          break;
        }
        // If a node with the same hash and key is found, exit the loop
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
          break;
        // Advance p to the next node
        p = e;
      }
    }
    // The map already contains the key: return the old value
    if (e != null) { // existing mapping for key
      V oldValue = e.value;
      if (!onlyIfAbsent || oldValue == null)
        e.value = value;
      afterNodeAccess(e);
      return oldValue;
    }
  }
  // Structural modification count + 1
  ++modCount;
  // If the number of key-value pairs exceeds the threshold, expand
  if (++size > threshold)
    resize();
  afterNodeInsertion(evict);
  return null;
}

The putVal method is final, so if you subclass HashMap you cannot override it. putVal takes five parameters:

  • hash -> the hash of the key, computed by the hash function before insertion and used to locate the bucket
  • key -> the key to insert
  • value -> the value to insert
  • onlyIfAbsent -> whether to leave an existing value in place, i.e. whether to skip replacing the value
  • evict -> if false, the table is in creation mode (the HashMap is being constructed)

Before putVal is called, the hash function first computes the hash used to decide where to insert:

public V put(K key, V value) {
  return putVal(hash(key), key, value, false, true);
}

The source code for the hash function is as follows

static final int hash(Object key) {
  int h;
  return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Let’s first understand the rules for calculating hash functions

The Hash function

The hash function first computes the key’s hashCode, then shifts the hashCode 16 bits to the right unsigned, and finally xors (^) the result with the original hashCode.

>>> is the unsigned right shift, also called the logical right shift: whether the number is positive or negative, the vacated high-order bits are filled with 0.
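The spreading step can be copied into a standalone class to check it (the class name is invented for illustration; the method body is the JDK’s hash function shown above):

```java
public class HashSpreadDemo {
    // Copy of HashMap.hash: xor the high 16 bits of hashCode into the low
    // 16 bits, so that small tables (which only use the low bits when
    // indexing) still feel the influence of the high bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(hash(null)); // 0: the null key always maps to bucket 0
        System.out.println(hash("1"));  // 49: "1".hashCode() is 49, high bits are zero
    }
}
```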

After the hash value is obtained, the put process is performed.

If the HashMap was just created and this is the first insertion, the array is resized first; the resize() mechanism also appears in the source analysis and is described later. After allocation, the storage index is computed as (n - 1) & hash.

That index is then used as the element’s position in the array. If the bucket is not empty, the node already there is compared with the key and hash to be inserted. If the hashes collide but the keys differ, the code checks whether the bucket holds a tree; if so, the entry is inserted into the tree, and if not, it is appended at the tail of the entry chain.

A bucket is a linked list or a red-black tree depending on how many elements it holds. Finally, if the number of key-value pairs exceeds the threshold, the map is expanded.

Expansion mechanism

In Java, arrays have a fixed length, which means they can store only a fixed amount of data. But during development we often don’t know in advance how big an array should be. The good news is that HashMap is an automatically expanding data structure, and in such array-based structures the expansion mechanism is important.

In a HashMap, the threshold is the product of the bucket array length and the load factor. When the number of key-value pairs exceeds the threshold, the map is expanded. The expansion mechanism is implemented by the resize() method; let’s walk through it.

final Node<K,V>[] resize() {
  Node<K,V>[] oldTab = table;
  // Record the size of the old table
  int oldCap = (oldTab == null) ? 0 : oldTab.length;
  // Record the old expansion threshold
  int oldThr = threshold;
  int newCap, newThr = 0;
  if (oldCap > 0) {
    // If the old table has already reached the maximum capacity, pin the threshold at the maximum
    if (oldCap >= MAXIMUM_CAPACITY) {
      threshold = Integer.MAX_VALUE;
      return oldTab;
    }
    // Shift left by one, i.e. double the capacity
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
      // Double the threshold as well
      newThr = oldThr << 1; // double threshold
  }
  // If oldCap == 0 but oldThr > 0
  else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;
  // If both the old capacity and the old threshold are 0
  else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
  }
  // If the new threshold is still 0
  if (newThr == 0) {
    // The expansion threshold is capacity x load factor
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
  }
  // Store the new threshold
  threshold = newThr;
  // Allocate the expanded array
  @SuppressWarnings({"rawtypes","unchecked"})
  Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
  table = newTab;
  // On first initialization the old table is null and the code below is skipped.
  // After a real expansion, the existing nodes must be moved into the new array.
  if (oldTab != null) {
    for (int j = 0; j < oldCap; ++j) {
      Node<K,V> e;
      if ((e = oldTab[j]) != null) {
        oldTab[j] = null;
        if (e.next == null)
          newTab[e.hash & (newCap - 1)] = e;
        else if (e instanceof TreeNode)
          // When remapping, a red-black tree may need to be split
          ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
        else { // preserve order
          Node<K,V> loHead = null, loTail = null;
          Node<K,V> hiHead = null, hiTail = null;
          Node<K,V> next;
          // Iterate through the list and group the nodes, preserving their order
          do {
            next = e.next;
            if ((e.hash & oldCap) == 0) {
              if (loTail == null)
                loHead = e;
              else
                loTail.next = e;
              loTail = e;
            }
            else {
              if (hiTail == null)
                hiHead = e;
              else
                hiTail.next = e;
              hiTail = e;
            }
          } while ((e = next) != null);
          // Map the grouped lists into the new buckets
          if (loTail != null) {
            loTail.next = null;
            newTab[j] = loHead;
          }
          if (hiTail != null) {
            hiTail.next = null;
            newTab[j + oldCap] = hiHead;
          }
        }
      }
    }
  }
  return newTab;
}

The resize source is relatively long, so let’s split it up patiently.

Following the if… else if… else logic, the code above mainly does these things:

  • Determine the length of the old array, oldTab.length. If it is greater than 0, check whether it has already reached the maximum capacity 2^30; if not, shift left by one bit (<< 1) to double both the capacity and the threshold.

  • If the old array length is 0, check whether the expansion threshold threshold is greater than 0, i.e. whether an initial capacity was supplied externally. The check oldThr > 0 really compares threshold, since oldThr = threshold. The constructors that accept a capacity all delegate to HashMap(initialCapacity, loadFactor), which has already converted the requested capacity via this.threshold = tableSizeFor(initialCapacity); when no initialCapacity was specified, the default of 16 ends up being used.

  • Otherwise, use the default initial capacity and the default expansion threshold. This else branch runs when a table created with the no-argument constructor is initialized.

A note on newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY): I used to wonder how newThr could ever still be 0 when constants are being multiplied. In fact the zero does not come from this line at all; it comes from the doubling in the branches above, where the left shift can overflow.

An example of such an overflow: with oldCap = 2^28 and a threshold in a similar range, the doubled value can exceed 2^31, the upper bound of int, and the result is no longer meaningful. The newThr == 0 branch then recomputes the threshold from float ft = (float)newCap * loadFactor;.

After expansion, the nodes must be placed into the newly expanded array, which involves three steps:

  • Check whether oldTab[j] is empty. If it is, move on to the next bucket; if it is not, its contents must be remapped into the new bucket array.

  • If the bucket is not empty, check whether it holds a tree. If so, it is split according to the tree structure, in the split method.

  • If it is not a tree, traverse the linked list and group its nodes, preserving their original order.

Let’s talk about the get method

Let’s take a look at the get method.

public V get(Object key) {
  Node<K,V> e;
  return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
  Node<K,V>[] tab; Node<K,V> first, e; int n; K k;

  // Locate the bucket the element lives in
  if ((tab = table) != null && (n = tab.length) > 0 &&
      (first = tab[(n - 1) & hash]) != null) {
    // Always check the first node
    if (first.hash == hash && // always check first node
        ((k = first.key) == key || (key != null && key.equals(k))))
      return first;

    // Not the first node, and the next node is not empty
    if ((e = first.next) != null) {

      // If the first node is a TreeNode, fetch directly with TreeNode.getTreeNode
      if (first instanceof TreeNode)
        return ((TreeNode<K,V>)first).getTreeNode(hash, key);

      // Otherwise walk the linked list until the element is found
      do {
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
          return e;
      } while ((e = e.next) != null);
    }
  }
  return null;
}


In brief: first check that the table is not empty, then locate the bucket for the key’s hash. Check the first node of the bucket; if it matches, return it directly. Otherwise, if the next node is null, return null; if it is not null, check whether the first node is a TreeNode instance. If it is, retrieve the node with TreeNode.getTreeNode; otherwise loop through the list until the matching node is found or the end is reached.

Note the expression (n - 1) & hash that appears in the lookup above. n is the number of buckets in the HashMap, so (n - 1) & hash means (bucket capacity - 1) & hash.

// Why does HashMap index with (table.size - 1) & hash?
public static void main(String[] args) {

  Map<String, Object> map = new HashMap<>();

  // Debugging shows that the hash of "1" is 49
  map.put("1", "cxuan");
  // Debugging shows that the hash of "2" is 50
  map.put("2", "cxuan");
  // Debugging shows that the hash of "3" is 51
  map.put("3", "cxuan");

}

Computing (n - 1) & hash for each of those hash values yields the bucket index used by tab[(n - 1) & hash].
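Under the default capacity of 16 and the hashes 49–51 from the debug session above, a tiny sketch (the class and method names are invented for illustration):

```java
public class IndexDemo {
    // With a power-of-two capacity n, (n - 1) & hash keeps only the low
    // bits of the hash; it is equivalent to hash % n but cheaper.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(49, 16)); // 1: "1" lands in bucket 1
        System.out.println(indexFor(50, 16)); // 2: "2" lands in bucket 2
        System.out.println(indexFor(51, 16)); // 3: "3" lands in bucket 3
    }
}
```

This trick only works because the capacity is always a power of two, which is exactly what tableSizeFor guarantees.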

Traversal of a HashMap

HashMap traversal is also a particularly frequently used operation
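Before looking at the internals, here is the idiomatic way to traverse (the class and method names are invented for illustration; entrySet, keySet, and values are the standard views):

```java
import java.util.HashMap;
import java.util.Map;

public class TraversalDemo {
    public static int sumValues() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // entrySet() walks each bucket once and gives both key and value;
        // keySet() and values() use the same HashIterator underneath.
        int sum = 0;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumValues()); // 6
    }
}
```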

The base class for HashMap traversal is HashIterator, an abstract hash iterator defined inside HashMap. Its structure is relatively simple, with three methods: hasNext, remove, and nextNode. The nextNode method is used by three concrete iterators.

These three iterators are:

  • KeyIterator, which traverses keys
  • ValueIterator, which traverses values
  • EntryIterator, which traverses the Entry chain

Although there are several iterators, they all iterate in the same order and are constructed very simply on top of the nextNode method in HashIterator:

final class KeyIterator extends HashIterator
    implements Iterator<K> {
  public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
  public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
  public final Map.Entry<K,V> next() { return nextNode(); }
}

Traversal in HashIterator

abstract class HashIterator {
  Node<K,V> next;        // next entry node
  Node<K,V> current;     // current entry node
  int expectedModCount;  // fail-fast marker
  int index;             // current slot

  HashIterator() {
    expectedModCount = modCount;
    Node<K,V>[] t = table;
    current = next = null;
    index = 0;
    if (t != null && size > 0) { // advance to first entry
      do {} while (index < t.length && (next = t[index++]) == null);
    }
  }

  public final boolean hasNext() {
    return next != null;
  }

  final Node<K,V> nextNode() {
    Node<K,V>[] t;
    Node<K,V> e = next;
    if (modCount != expectedModCount)
      throw new ConcurrentModificationException();
    if (e == null)
      throw new NoSuchElementException();
    if ((next = (current = e).next) == null && (t = table) != null) {
      do {} while (index < t.length && (next = t[index++]) == null);
    }
    return e;
  }

  public final void remove() { ... }
}

next and current hold the next node and the current node respectively, and the HashIterator constructor advances to the first non-empty bucket. Let’s use a diagram to show the traversal order.

You’ll notice that nextNode() advances the same way as the constructor does. The difference is that the constructor simply scans buckets until it finds the first non-null one, while nextNode first follows the current node’s linked list, moving on to the next non-null bucket only when the chain is exhausted.

Remove method in HashMap

The removal method in HashMap is also relatively simple, the source code is as follows

public V remove(Object key) {
  Node<K,V> e;
  return (e = removeNode(hash(key), key, null, false, true)) == null ?
    null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
  Node<K,V>[] tab; Node<K,V> p; int n, index;
  if ((tab = table) != null && (n = tab.length) > 0 &&
      (p = tab[index = (n - 1) & hash]) != null) {
    Node<K,V> node = null, e; K k; V v;
    if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
      node = p;
    else if ((e = p.next) != null) {
      if (p instanceof TreeNode)
        node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
      else {
        do {
          if (e.hash == hash &&
              ((k = e.key) == key || (key != null && key.equals(k)))) {
            node = e;
            break;
          }
          p = e;
        } while ((e = e.next) != null);
      }
    }
    if (node != null && (!matchValue || (v = node.value) == value ||
                         (value != null && value.equals(v)))) {
      if (node instanceof TreeNode)
        ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
      else if (node == p)
        tab[index] = node.next;
      else
        p.next = node.next;
      ++modCount;
      --size;
      afterNodeRemoval(node);
      return node;
    }
  }
  return null;
}

The public remove(Object key) method simply delegates to removeNode, which does the actual removal; the two-argument remove(key, value) maps to removeNode with matchValue set to true.

The key is first hashed to locate the corresponding bucket; the bucket's linked list (or tree) is then traversed to find the node with a matching key, and that node is unlinked.
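From the caller's side the behavior is simple: remove returns the node's previous value, or null when the key is absent. A small sketch (the helper name is my own):

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveDemo {
    // Returns the old value for the key, or null if it was not mapped.
    public static Integer removeAndReport(Map<String, Integer> map, String key) {
        return map.remove(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("x", 10);
        System.out.println(removeAndReport(m, "x")); // 10
        System.out.println(removeAndReport(m, "x")); // null
    }
}
```

Because HashMap allows null values, a null return is ambiguous; the conditional remove(key, value) can be used when you need to delete only a specific mapping.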

Interview questions about HashMap

The data structure of the HashMap

In JDK 1.7, HashMap is implemented as an array of buckets plus linked lists; that is, conflicts are handled by chaining: entries whose hashes map to the same slot are stored in a linked list at that slot of the array. However, when a bucket holds many elements, that is, many entries share the same slot, lookup by key degrades to a linear scan and becomes inefficient.

So, compared with JDK 1.7, JDK 1.8 changes the underlying structure: when a bucket holds more than 8 elements (and the table has at least 64 buckets; below that capacity the table is resized instead of treeified), the linked list is converted to a red-black tree to optimize query efficiency.

The put procedure of a HashMap

The general flow is as follows. First, hash() computes the key's hash code, and (n - 1) & hash determines the bucket. If the bucket has no Node, the new node is placed there directly. If the bucket already has nodes, the chain is examined: a node with an equal key has its value replaced; otherwise the new node is added to the chain (JDK 1.7 and earlier used head insertion; since JDK 1.8 tail insertion is used). If the chain length then reaches the treeify threshold of 8, the list is converted to a red-black tree and the entries are stored in that tree. Finally, if the new size exceeds the resize threshold, the table is expanded.
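The put flow above (minus treeification and resizing) can be sketched with a toy separate-chaining map. This is a simplified illustration, not the JDK implementation; the TinyMap and Node names are my own, and the capacity must be a power of 2 for the (length - 1) & hash indexing to work:

```java
import java.util.Objects;

// A minimal separate-chaining map sketching HashMap's put flow:
// locate bucket, replace on equal key, otherwise tail-insert.
public class TinyMap<K, V> {
    static final class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    public TinyMap(int powerOfTwoCapacity) {
        table = new Node[powerOfTwoCapacity];
    }

    public V put(K key, V value) {
        int index = (table.length - 1) & Objects.hashCode(key); // bucket slot
        for (Node<K, V> e = table[index]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {   // existing key: replace value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        Node<K, V> node = new Node<>(key, value); // new key: tail insertion
        if (table[index] == null) {
            table[index] = node;
        } else {
            Node<K, V> tail = table[index];
            while (tail.next != null) tail = tail.next;
            tail.next = node;
        }
        return null;
    }

    public V get(K key) {
        int index = (table.length - 1) & Objects.hashCode(key);
        for (Node<K, V> e = table[index]; e != null; e = e.next)
            if (Objects.equals(e.key, key)) return e.value;
        return null;
    }

    public static void main(String[] args) {
        TinyMap<String, Integer> m = new TinyMap<>(16);
        m.put("a", 1);
        System.out.println(m.get("a")); // 1
    }
}
```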

Why HashMap thread is not safe

HashMap is not a thread-safe container; concurrent puts are unsafe. Suppose threads A and B both want to insert a key-value pair. A computes the bucket position, and then its time slice runs out before the write happens. B then runs, performs the same computation, and successfully inserts its pair into that same bucket. When A resumes, it writes its node into the slot it computed earlier, overwriting B's record and causing data loss.

Another point: in JDK 1.7, a concurrent resize could link the nodes of a bucket into a cycle, so that a later lookup on that bucket spins forever and drives CPU usage to 100%. (JDK 1.8's switch to tail insertion removes this particular failure mode, but HashMap is still not thread-safe.)

How does HashMap handle hash collisions

The bottom layer of HashMap is an array of buckets plus linked lists. The bucket index determines where an element is inserted, and the index is derived from the key's hash. When multiple keys produce the same bucket index, HashMap links the Node elements in that bucket into a list. This approach to hash collisions is known as separate chaining (the chain-address method).

Other ways to handle hash collisions include open addressing, rehashing (probing with a second hash function), and setting up a public overflow area.
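Two keys with equal hash codes always land in the same bucket and get chained. A quick check, using the well-known colliding pair "Aa"/"BB" (these two strings really do share a String.hashCode of 2112) and JDK 8's hash-spreading step; the class and method names here are my own:

```java
public class CollisionDemo {
    // JDK 8 HashMap's hash spreader: XOR the high 16 bits into the low 16.
    public static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a hash in a table of the given power-of-2 length.
    public static int indexFor(int hash, int length) {
        return (length - 1) & spread(hash);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode());      // true
        System.out.println(indexFor("Aa".hashCode(), 16)
                        == indexFor("BB".hashCode(), 16));           // true
    }
}
```

Equal hashes guarantee equal bucket indices, so "Aa" and "BB" will always share a bucket and form a chain.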

How does a HashMap get elements

It first checks that the table is non-empty, then hashes the key to locate the bucket. If the first node of the bucket matches the key, that record is returned directly. Otherwise, if there is a next node: when the first node is a TreeNode instance, the lookup is delegated to TreeNode.getTreeNode; otherwise the linked list is walked until a matching node is found or the list ends, in which case null is returned.
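One practical consequence of that final "return null": since HashMap permits null values, a null from get does not prove the key is absent; containsKey is needed to tell the two cases apart. A small sketch (the describe helper and its strings are my own):

```java
import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    // Distinguishes "key missing" from "key mapped to null".
    public static String describe(Map<String, String> map, String key) {
        if (!map.containsKey(key)) return "absent";
        String v = map.get(key);
        return v == null ? "present, null value" : v;
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("k", null);
        System.out.println(describe(m, "k"));     // present, null value
        System.out.println(describe(m, "other")); // absent
    }
}
```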

What is the difference between HashMap and HashTable

See above

The difference between HashMap and HashSet

See above

How is HashMap expanded

There are two very important fields in HashMap: loadFactor and threshold. loadFactor is the load factor, and threshold is the size at which the next resize is triggered, computed as threshold = loadFactor * capacity. When size exceeds threshold, the bucket array is doubled in length, the map is resized, and the existing entries are redistributed into the new bucket array.
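With the JDK's defaults (initial capacity 16, load factor 0.75), the first resize therefore happens when the 13th entry is added. A minimal sketch of the arithmetic; the class and method names are my own, only the two constants come from the JDK source:

```java
public class ResizeDemo {
    // Default values from the JDK source.
    static final int DEFAULT_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Resize threshold for a given capacity: a resize happens
    // once size exceeds this value, and capacity then doubles.
    public static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR)); // 12
        System.out.println(threshold(32, DEFAULT_LOAD_FACTOR));               // 24
    }
}
```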

Why is the length of a HashMap a power of 2

I thought about this question for a few days. When I asked my group members why hash % length == (n - 1) & hash, they said the two are equal only when length is a power of 2, and I asked: isn't length always a power of 2? I had the causality backwards: it is precisely because the length of a HashMap is kept a power of 2 that the cheap bitwise AND can replace the remainder operation when computing the bucket index. If length were not a power of 2, some index bits would be masked out. For example, if the length were 9, then 9 - 1 = 8 is 1000 in binary, so 3 & (9 - 1) = 0 and 2 & (9 - 1) = 0: both keys land in slot 0 and collide.

This would greatly increase the chance of collisions in the HashMap.
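The equivalence and its breakdown are easy to verify. A small sketch (class and method names are my own): for a power-of-2 length, masking and the remainder agree on every non-negative hash, but for length 9 they diverge exactly as above:

```java
public class PowerOfTwoDemo {
    // With a power-of-2 length, masking equals taking the remainder.
    public static int byMask(int hash, int length) {
        return (length - 1) & hash;
    }

    public static int byModulo(int hash, int length) {
        return hash % length;
    }

    public static void main(String[] args) {
        for (int h = 0; h < 100; h++) {
            // holds for every non-negative hash when length is a power of 2
            if (byMask(h, 16) != byModulo(h, 16)) throw new AssertionError();
        }
        // with a non-power-of-2 length the two disagree:
        System.out.println(byModulo(3, 9)); // 3
        System.out.println(byMask(3, 9));   // 0
    }
}
```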

What are the thread-safe implementations of HashMap

Since HashMap is not a thread-safe container, ConcurrentHashMap is recommended for concurrent scenarios. Alternatively, you can obtain a thread-safe view of a HashMap using the wrapper in the Collections class, for example

Collections.synchronizedMap(new HashMap());

You can also use Hashtable, which is likewise a thread-safe key-value container. HashMap is often compared with Hashtable because the two have similar data structures.
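A minimal side-by-side of the three options (the factory methods are my own naming): the synchronized wrapper and Hashtable lock the whole map on every call, while ConcurrentHashMap uses much finer-grained locking, which is why it scales better under contention:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMaps {
    public static Map<String, Integer> synchronizedWrapper() {
        return Collections.synchronizedMap(new HashMap<>()); // one lock per call
    }

    public static Map<String, Integer> concurrent() {
        return new ConcurrentHashMap<>(); // CAS + per-bin locks in JDK 8
    }

    public static Map<String, Integer> legacy() {
        return new Hashtable<>(); // every method synchronized; rejects null keys/values
    }

    public static void main(String[] args) {
        Map<String, Integer> m = concurrent();
        m.put("hits", 0);
        m.merge("hits", 1, Integer::sum); // atomic read-modify-write on ConcurrentHashMap
        System.out.println(m.get("hits")); // 1
    }
}
```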

The most efficient of the above is ConcurrentHashMap.

Afterword

This article did not say much about the construction of the red-black tree, including insertion, deletion, and treeification. Partly this is beyond my current depth, and partly a proper description of red-black trees would take too much space here; it is a large topic in its own right, so I plan to cover it in a later article.