The principle of the Map

1. What is a Map

A Map stores data with mapping relationships. A Map holds two sets of values: one set holds the keys of the Map, and the other holds the values.


2. What is HashMap

HashMap is a collection of key-value pairs implemented with a hash table. It extends AbstractMap and implements the Map interface.
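As a quick illustration of that contract (the class and variable names here are invented for the example):

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapBasics {
    static Map<String, Integer> build() {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);
        ages.put("bob", 25);
        ages.put("alice", 31); // same key: the old value is replaced
        return ages;
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = build();
        System.out.println(ages.get("alice")); // 31: the latest value wins
        System.out.println(ages.size());       // 2: "alice" is counted once
        System.out.println(ages.get("carol")); // null: absent key
    }
}
```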

HashMap's storage structure means that fetching an element first requires a hash operation to locate its slot in the table, followed by only a small number of comparisons to find it, which is what makes HashMap lookups fast.

When a hash collision occurs, HashMap uses separate chaining (the "zipper" method), so the underlying implementation of a HashMap is an array plus linked lists.

An overview of the HashMap implementation

  • Keys and values may be null, but at most one key can be null
  • Not thread-safe: changes made by multiple threads operating on the same HashMap instance are not synchronized between threads
  • Traversal does not guarantee any order, regardless of the order in which elements were inserted or accessed
  • Calling HashMap's remove(Object key) or put(K key, V value) during traversal fails fast and throws a ConcurrentModificationException; removing elements while traversing can only be done through the Iterator's own remove() method
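The fail-fast behavior in the last bullet can be demonstrated with a small sketch (the class name and helper methods are invented for this example):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    // Returns true if a structural modification during for-each fails fast.
    static boolean removeDuringForEachThrows() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural change made outside the Iterator
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // fail-fast triggered on the next iteration step
        }
    }

    // Removing through the Iterator itself is the supported way.
    static int removeAllViaIterator() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove(); // safe: keeps the iterator's bookkeeping consistent
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(removeDuringForEachThrows()); // true
        System.out.println(removeAllViaIterator());      // 0
    }
}
```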
There are two important concepts in a HashMap

Because resizing a HashMap is expensive (allocating a new array, rehashing, transferring entries, and so on), two factors govern when it happens:

  • Capacity: the current length of the HashMap's internal array
  • Load factor: the fraction of the capacity that may be filled before resizing, default 0.75f

Trade-offs: a capacity that is too small forces frequent, expensive rehashes as the map grows, hurting performance; a load factor that is too large increases collisions, reducing lookup efficiency.

Capacity is the number of buckets in the HashMap; it is always a power of two, 2^n, for resizing and indexing purposes. The load factor is used to compute the threshold at which resizing happens. The default load factor of 0.75 balances time and space across operations such as get and put, and the default value is recommended. If the capacity is 256 and the load factor is 0.75, the map is expanded once the total number of keys exceeds 192: the capacity is doubled to 512, and the stored keys are redistributed by recomputing their hashes.
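The threshold arithmetic above can be checked with a tiny helper (a hypothetical method, not part of HashMap's API):

```java
public class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to int as HashMap does
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(256, 0.75f)); // 192: resize to 512 once size exceeds this
        System.out.println(threshold(16, 0.75f));  // 12: the default HashMap threshold
    }
}
```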

Constants

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
static final int MAXIMUM_CAPACITY = 1 << 30;
static final float DEFAULT_LOAD_FACTOR = 0.75f;
static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;
Member variables
// The table, allocated on first use; its length is always a power of two
transient Node<K,V>[] table;
// Cached entrySet() view
transient Set<Map.Entry<K,V>> entrySet;
// The number of key-value mappings contained in this map
transient int size;
// The number of structural modifications, used for fail-fast iteration
transient int modCount;
// The next size at which to resize (capacity * load factor)
int threshold;
// The load factor for the hash table
final float loadFactor;
How HashMap is implemented
/* ---------------- Public operations -------------- */

/**
 * Constructs an empty HashMap with the specified initial
 * capacity and load factor.
 *
 * @param initialCapacity the initial capacity
 * @param loadFactor the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/**
 * Constructs an empty HashMap with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty HashMap with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

// Copies all entries of the given map into this one
// (a shallow copy: the keys and values themselves are not cloned)
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        if (table == null) { // pre-size
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
The get method
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implements Map.get and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Always check the first node: same hash and an equal key
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Collisions are chained: walk the list to find the key we need
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
Put method

On the first put, the HashMap lazily initializes its array: the array length is the smallest power of two greater than or equal to the capacity the user specified, and the resize threshold is updated accordingly.
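That "nearest power of two" rounding is done by HashMap's tableSizeFor. The following is essentially the JDK 8 version of that bit-twiddling routine, wrapped in a throwaway class so it can run standalone:

```java
public class TableSizeFor {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap: smear the highest set bit of (cap - 1)
    // into every lower bit, then add one.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16: already a power of two
        System.out.println(tableSizeFor(17)); // 32
    }
}
```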

  1. First check whether the table has been initialized
  2. If the key is null, the entry is stored at table[0] (there can be only one null key)
  3. If the key is not null, hash it and compute the bucket index by AND-ing the hash with (array length - 1)
  4. If that bucket is empty, create a new node there; if it is not empty, a hash collision exists
  5. To resolve the collision, HashMap traverses the bucket's list, updating the value if an equal key already exists; otherwise it creates a node and adds it to the list (at the head in Java 7, at the tail in Java 8)
  6. After adding, check whether the number of stored nodes has reached the threshold; if so, expand the capacity
  7. Expansion allocates a new array twice the size of the old one and transfers the old entries into it
  8. After resizing, a node being inserted is placed according to its recomputed index
/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, resize() also performs the initial allocation
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Empty bucket: place a new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the first node has the same key, remember it so the value can be replaced
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Walk the list, checking for an existing node with the same key
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Key not found: append a new node at the tail of the list
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Same key found: e now points at the matching node
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // onlyIfAbsent == true means "do not update an existing value"
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
How to calculate the hash code

^ and >>>


The ^ symbol is the XOR operator: bits that are the same in the two operands yield 0, and bits that differ yield 1. The >>> symbol is the unsigned right shift, which shifts bits right and fills the vacated high bits with zeros.


So 2 ^ 5 equals 7, because 0010 ^ 0101 = 0111.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
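To see the spread function and the XOR example in action (the wrapper class is invented for the demo):

```java
public class HashSpread {
    // Mirrors HashMap.hash: XOR the high 16 bits of hashCode into the low 16,
    // so high bits still influence the bucket index for small tables.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(2 ^ 5);          // 7 (0010 ^ 0101 = 0111)
        System.out.println(hash(null));     // 0: the null key always hashes to 0
        System.out.println(hash("a") & 15); // bucket index in a table of length 16
    }
}
```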
The role of modCount

You can see that modCount is incremented by structural modifications such as put, remove, and clear. It is then compared against a saved copy in the iterators and forEach methods of the KeySet, Values, and EntrySet views.


If the value of modCount has changed by the end of the loop, the internal structure of the HashMap was modified during traversal, which is not safe. An exception is thrown to implement the fail-fast mechanism.

public final void forEach(Consumer<? super V> action) {
    Node<K,V>[] tab;
    if (action == null)
        throw new NullPointerException();
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next)
                action.accept(e.value);
        }
        // Fail fast if the map was structurally modified during traversal
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}

Common problems with HashMap

What is the difference between HashMap 1.7 and 1.8?

The biggest difference between HashMap 1.7 and 1.8 is that 1.8 uses an "array + linked list + red-black tree" data structure. When a bucket's linked list grows past 8 nodes (and the table has at least 64 buckets; otherwise the map simply resizes), the list is converted into a red-black tree, solving the problem of HashMap lookups slowing down as lists grow long.


The underlying node type is Entry in 1.7 and Node in 1.8, but both are essentially implementations of Map.Entry.


In addition, 1.8 adds tree-aware lookup, update, and insert paths for the red-black tree structure, and appends to lists with tail insertion, which avoids producing a circular linked list.

Why did HashMap change from head insertion to tail insertion?

Before Java 8, HashMap used head insertion: a new node was placed at the head of the bucket's linked list, pushing the existing nodes down. The authors reasoned that recently inserted entries were more likely to be looked up, so putting them first would make lookups faster.

What are the scaling mechanisms for HashMap?

The mechanism works like this: the array backing a HashMap has a fixed size, so after repeated inserts, once the number of entries reaches a certain point, the map must be expanded — this is resize.


For example, with a capacity of 100 and the default load factor of 0.75, the map decides it needs to resize when the number of entries reaches 76 (the threshold is 100 × 0.75 = 75).

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
What are the conditions for scaling HashMap?

Enlargement: Creates a new empty Entry array that is twice as long as the previous one.


ReHash: Traverse through the original Entry array, hashing all the entries into the new array.

Why recalculate the Hash? Wouldn’t it be nice to just copy it?

> index = hashCode(key) & (length - 1)

With the original length of 8, a hash might map to index 2; with the new length of 16, the same hash can map to an entirely different index, so indexes must be recomputed after resizing.
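A small sketch makes the index shift concrete (the helper name is made up for the example):

```java
public class ResizeIndexDemo {
    // HashMap's index rule: mask the hash with (length - 1)
    static int index(int hashCode, int length) {
        return hashCode & (length - 1);
    }

    public static void main(String[] args) {
        int h = 10; // example hash (binary 1010)
        System.out.println(index(h, 8));  // 2  (1010 & 0111 = 0010)
        System.out.println(index(h, 16)); // 10 (1010 & 1111 = 1010): the bucket moved
    }
}
```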

If we create a HashMap with an initial capacity of 2 and a load factor of 0.75 and then put two values, will it resize on the second put?

Yes. The threshold is (int)(2 × 0.75) = 1, so inserting the second entry pushes the size past the threshold and triggers a resize.

Because Java 7's resize transfer uses head insertion into singly linked lists, elements moved into the same bucket are always placed at the head, reversing the chain's order; and entries that shared a chain in the old array may land in different buckets in the new array once their indexes are recalculated.

If two threads resize concurrently at this point, the reversed links can form a circular list, and a later lookup on that bucket spins forever — an infinite loop.

Head insertion changes the order of the list, but with tail insertion the original order of the elements is preserved during expansion, so the linked-list ring problem cannot arise.

So a chain that was A -> B before resizing is still A -> B afterwards.

Java 7 may fall into an infinite loop when multiple threads manipulate a HashMap, because the linked list is reversed during the resize transfer, changing the reference relationships of the original list.

Java 8 does not loop under the same conditions, because the list order is unchanged after the resize transfer, preserving each node's original reference relationships.

But put/get are not synchronized, so under multithreading there is no guarantee that a value put at one moment can be read back the next; HashMap is still not thread-safe.

Why does HashMap's capacity start at 16?

A bitwise AND is much cheaper than arithmetic such as modulo. The initial value of 16 serves the algorithm that maps a key's hash to an array index.


Because when the length is a power of 2, the value of length - 1 has every low-order binary bit set to 1, so the index is simply the low bits of the hashCode. As long as the incoming hashCodes are themselves evenly distributed, the resulting bucket indexes are uniform.


This is done to achieve a uniform distribution.

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
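A quick check of the masking trick (the class and method names are illustrative):

```java
public class PowerOfTwoIndex {
    // For a power-of-two length n, masking with (n - 1) equals modulo
    // for non-negative hashes, but uses a single AND instruction.
    static boolean maskEqualsMod(int h, int n) {
        return (h & (n - 1)) == (h % n);
    }

    public static void main(String[] args) {
        int n = 16; // n - 1 = 0b1111: keeps only the low four bits of the hash
        for (int h : new int[]{7, 23, 97, 1024}) {
            System.out.println(h + " -> " + (h & (n - 1)) + ", same as % n: " + maskEqualsMod(h, n));
        }
    }
}
```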
How do equals and hashCode relate to HashMap?

In Java, all objects inherit from the Object class, which defines two relevant methods: equals and hashCode. equals tests whether two objects are equal, and hashCode returns an int hash code used to locate an object's bucket.


By default, Object.equals compares the memory addresses of two objects, but many classes (String, for example) override it to compare contents. Classes that override equals must override hashCode consistently, because equal objects are required to produce equal hash codes.

For primitive values, == compares the values themselves; for references, it compares the addresses of the two objects.
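A short demo of the difference (nothing here is HashMap-specific; it is plain Object-contract behavior):

```java
public class EqualsDemo {
    public static void main(String[] args) {
        String a = new String("key");
        String b = new String("key");
        System.out.println(a == b);                       // false: two distinct objects
        System.out.println(a.equals(b));                  // true: String compares contents
        System.out.println(a.hashCode() == b.hashCode()); // true: equal objects hash equally
    }
}
```

This is exactly why HashMap can find `b`'s entry after `a` was used as the key: it locates the bucket via hashCode, then confirms the match via equals.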

HashMap is thread-unsafe. What is the main reason?

In JDK 1.7, concurrent expansion can produce circular linked lists or lose data in a multithreaded environment.


In JDK 1.8, data overwriting occurs in a multithreaded environment.

Common HashMap interview questions
  • The underlying data structure of the HashMap?
  • How do HashMap's put and get work?
  • What’s the difference between Java7 and Java8?
  • Why are threads unsafe?
  • Are there any thread-safe classes instead?
  • What is the default initial capacity? Why that value? Why must capacities be powers of two?
  • How does HashMap resize? What is the load factor? Why 0.75?
  • What are the main parameters of a HashMap?
  • How does HashMap handle hash collisions?
  • What are the computational rules for hashes?

For thread safety, use ConcurrentHashMap instead: https://www.jianshu.com/p/e69…