These notes are collected in the GitHub project JavaKeeper, an arsenal of essential skills for Internet developers.

As an occasional interviewer I always ask about Java collections during interviews, and as a candidate I have been asked about them just as often, so here are some of the typical questions:

What are the common collections?

Does a HashMap key need to override hashCode() and equals()?

Can keys and values be null in a HashMap? How many nulls are allowed?

Is HashMap thread safe? What is the difference between ConcurrentHashMap and Hashtable?

What is the difference between List and Set? Given an ArrayList, how do I sort all of its elements by some attribute? (see the sketch after this list)

Difference between an ArrayList and a Vector

Can you remove elements from a list? Can you remove them while iterating over it? Why?
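One common answer to the sorting question above, as a minimal sketch using Comparator; the User class and its age field are made up for illustration (Collections.sort(list, comparator) works the same way):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByAttributeDemo {
    // Hypothetical element type with an "age" attribute
    static class User {
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<>();
        users.add(new User("Tom", 30));
        users.add(new User("Amy", 25));
        // Sort in place by the age attribute
        users.sort(Comparator.comparingInt(u -> u.age));
        users.forEach(u -> System.out.println(u.name + " " + u.age)); // Amy 25, Tom 30
    }
}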

Object-oriented languages represent things as objects. To make it convenient to operate on many objects at once, those objects need to be stored somewhere, and collections are the most common way to store groups of objects; collections are also called containers.

As you can see from the collection framework diagram above, the Java Collections framework consists of two main types of containers

  • One is a Collection, which stores a Collection of elements
  • The other is a Map, which stores key/value pair mappings.

The Collection interface has three subtypes, List, Set, and Queue, followed by some abstract classes, and finally the concrete implementation classes. Common examples are ArrayList, LinkedList, HashSet, LinkedHashSet, HashMap, LinkedHashMap, and so on.

The collection framework is a unified framework for representing and manipulating collections. All collections frameworks contain the following:

  • Interface: Is an abstract data type that represents a collection. For example, Collection, List, Set, and Map. Multiple interfaces are defined to manipulate collection objects in different ways

  • Implementation (class) : Is a concrete implementation of the collection interface. In essence, they are reusable data structures, such as ArrayList, LinkedList, HashSet, HashMap.

  • Algorithms: Useful calculations performed by methods in objects that implement the collection interface, such as searching and sorting. These algorithms are called polymorphic because the same method can have different implementations on similar interfaces.


So what are the common collections?

The Map and Collection interfaces are the parent interfaces of all collections frameworks:

  1. The subinterfaces of the Collection interface include Set, List, and Queue
  2. List is an ordered Collection that allows repeating elements. The main implementation classes are ArrayList, LinkedList, Stack, and Vector
  3. Set is an unordered Collection without repeating elements. The main implementation classes are: HashSet, TreeSet, LinkedHashSet, etc
  4. Map does not inherit the Collection interface. Map provides key-to-value mapping. The main implementation classes are: HashMap, TreeMap, Hashtable, ConcurrentHashMap, Properties, etc
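As a quick, hedged illustration of how these four families behave in practice (the class name and values are made up for this example):

import java.util.*;

public class CollectionOverviewDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "a"));
        Set<String> set = new HashSet<>(list);        // duplicates removed, no order guarantee
        Queue<String> queue = new LinkedList<>(list); // FIFO view backed by a linked list
        Map<String, Integer> map = new HashMap<>();   // key/value mapping, not a Collection
        map.put("a", 1);

        System.out.println(list);          // [a, b, a] -- ordered, duplicates allowed
        System.out.println(set.size());    // 2 -- duplicates removed
        System.out.println(queue.poll());  // a -- FIFO
        System.out.println(map.get("a"));  // 1
    }
}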

Difference between an ArrayList and a Vector

Similarities:

  • ArrayList and Vector both inherit the same parent class and implement the same interfaces (both are List implementations: ordered, allow duplicate elements, and allow null)

    extends AbstractList<E>
            implements List<E>, RandomAccess, Cloneable, java.io.Serializable
  • Both are backed by an Object[] array at the bottom

  • The initial default length is 10

Difference:

  • Synchronization: Most of the public methods in Vector add the synchronized keyword to ensure that methods are synchronized, making Vector thread safe and ArrayList thread unsafe

  • Performance: Vector performs relatively worse because of the overhead of acquiring, waiting for, and releasing the synchronized lock

  • Growth: when the backing array runs out of space, ArrayList grows by 0.5x (to 1.5 times its size), while Vector doubles (grows by 1x) by default

    The expansion mechanism: expanding actually means creating a new array and copying the elements of the old array into it. In both classes the underlying expansion logic lives in grow() (based on JDK 8).

    • ArrayList's grow(): when the expansion condition is met, the capacity grows to 1.5 times the old capacity (oldCapacity >> 1 is a right shift, equivalent to dividing by 2, i.e. 0.5 * oldCapacity, which is added to the old capacity).

      private void grow(int minCapacity) {
          // overflow-conscious code
          int oldCapacity = elementData.length;
          // newCapacity = oldCapacity + 0.5 * oldCapacity, i.e. grow the capacity by 0.5x
          int newCapacity = oldCapacity + (oldCapacity >> 1); 
          if (newCapacity - minCapacity < 0)
              newCapacity = minCapacity;
          if (newCapacity - MAX_ARRAY_SIZE > 0)
              newCapacity = hugeCapacity(minCapacity);
          // minCapacity is usually close to size, so this is a win:
          elementData = Arrays.copyOf(elementData, newCapacity);
      }
    • Vector's grow(): Vector has one more field than ArrayList, capacityIncrement, which controls how much the Vector grows. If capacityIncrement is greater than 0, the new array length is the old length plus capacityIncrement; otherwise the new array length is twice the old length.

      private void grow(int minCapacity) {
          // overflow-conscious code
          int oldCapacity = elementData.length;
          // Grow by capacityIncrement if it is positive, otherwise double the capacity
          int newCapacity = oldCapacity + ((capacityIncrement > 0) ?
                                           capacityIncrement : oldCapacity);
          if (newCapacity - minCapacity < 0)
              newCapacity = minCapacity;
          if (newCapacity - MAX_ARRAY_SIZE > 0)
              newCapacity = hugeCapacity(minCapacity);
          elementData = Arrays.copyOf(elementData, newCapacity);
      }

Difference between an ArrayList and a LinkedList

  • Thread safety: neither ArrayList nor LinkedList is synchronized, i.e. neither is thread-safe;
  • Underlying data structures: ArrayList is backed by an Object array; LinkedList is backed by a doubly linked list (circular before JDK 1.7, non-circular from JDK 1.7 on);
  • Whether insertion and deletion are affected by element position:
    • ArrayList is backed by an array, so the cost of inserting and deleting depends on the position. For example, add(E e) appends the element at the end of the list, which is O(1); but inserting or deleting at position i with add(int index, E element) costs O(n-i), because the (n-i) elements after position i all have to be shifted back or forward by one.
    • LinkedList is backed by a linked list, so inserting or deleting an element (once the node is located) does not depend on its position and is roughly O(1), whereas for the array-backed list it is roughly O(n).
    • ArrayList is generally preferred when reads dominate and inserts/deletes are rare; when inserts and deletes dominate, LinkedList is recommended.
  • Whether fast random access is supported: LinkedList does not support efficient random access, whereas ArrayList does and implements the RandomAccess marker interface. Fast random access means retrieving an element directly by its index (the get(int index) method).
  • Memory footprint: ArrayList's waste mainly comes from the capacity reserved at the end of the backing array, while LinkedList costs more per element, because each node stores references to its predecessor and successor in addition to the data.

A senior engineer can hardly skip the source code, so let's analyze it in detail:

  • Every ArrayList instance has a capacity (10 by default if no initial capacity is specified). Taking the add method as an example: if no initial capacity was given, add first checks whether the current backing array is empty; if it is, the array is allocated with the minimum size, which defaults to 10. Before adding a large number of elements, the capacity can be grown first to make the additions more efficient.

  • LinkedList is an ordered collection that allows duplicate elements. It is based on a doubly linked list in which each node holds references to both its successor and its predecessor. The list has no capacity limit, but the doubly linked list itself uses more space per element and needs extra pointer manipulation. Accessing an element by index with get(i)/set(i, e) has to walk the list to that position (starting from the tail if i is past the midpoint), which is painfully slow for long lists. Operations at the ends, such as addFirst(), addLast(), removeFirst(), removeLast(), or add()/remove() through an iterator(), avoid that traversal. In addition, LinkedList implements the Deque interface (which extends Queue), so it can be used as a queue or deque, as the sketch below shows.
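A minimal sketch of using LinkedList through the Deque and Queue interfaces (the class name is made up for illustration):

import java.util.Deque;
import java.util.LinkedList;
import java.util.Queue;

public class LinkedListAsDequeDemo {
    public static void main(String[] args) {
        Deque<Integer> deque = new LinkedList<>();
        deque.addFirst(1);                      // O(1): only head pointers are relinked
        deque.addLast(2);                       // O(1): only tail pointers are relinked
        System.out.println(deque.pollFirst());  // 1
        System.out.println(deque.pollLast());   // 2

        Queue<String> queue = new LinkedList<>();
        queue.offer("a");                       // enqueue at the tail
        queue.offer("b");
        System.out.println(queue.poll());       // a -- FIFO order
    }
}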

Not every method is covered below; this is just for learning and to record the main ideas.

ArrayList and LinkedList both implement the List interface

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
public class LinkedList<E>
    extends AbstractSequentialList<E>
    implements List<E>, Deque<E>, Cloneable, java.io.Serializable

Constructors

ArrayList provides three constructors: a no-argument one, one that takes an initial capacity, and one that takes a Collection.

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
   
public ArrayList(int initialCapacity) {
   if (initialCapacity > 0) {
       // Create an array of initial capacity
     this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
     this.elementData = EMPTY_ELEMENTDATA;
    } else {
       throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    }
}

public ArrayList() {
  // Default is empty array
  this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
    
public ArrayList(Collection<? extends E> c) { // ... }
}

LinkedList provides two constructors. Since LinkedList is linked-list based, there is no initial size and no expansion mechanism; elements are simply linked in at the front or the back.

public LinkedList() {
}

public LinkedList(Collection<? extends E> c) {
    this();
    addAll(c);
}

// A LinkedList is built from Node objects
private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}

Insert

ArrayList:

public boolean add(E e) {
    // Make sure the array is large enough to add the element
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    // Put the element into the array
    elementData[size++] = e;
    return true;
}
private void ensureCapacityInternal(int minCapacity) {
    // If the array is empty, the array is initialized
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        // DEFAULT_CAPACITY is 10, so the default array capacity is 10 when initialized by calling the default ArrayList constructor without arguments
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // Make sure the array has enough capacity. If not, call the grow method to expand it
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}
// Specific method of capacity expansion
private void grow(int minCapacity) {
    // The capacity of the current array
    int oldCapacity = elementData.length;
    // The new array is expanded to 1.5 times its original capacity
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    // If the size of the new array is still smaller than the minimum required, set the size to the minimum required
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    // Copy the elements into the new array
    elementData = Arrays.copyOf(elementData, newCapacity);
}

Of course you can insert it at a given location, and there’s an overloaded method add(int index, E element)

public void add(int index, E element) {
    // Check whether the index exceeds the range of the index
    rangeCheckForAdd(index);
    // This is the same operation as before, just to ensure that the array size is sufficient
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    // Move the specified position and the data following it back one bit
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    // Adds the element to the specified array location
    elementData[index] = element;
    // The size of the ArrayList changes
    size++;
}

You can see that inserting at a specified position is less efficient, because the elements after that position have to be shifted every time.

Now let's look at how insertion works in LinkedList:

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}
public boolean add(E e) {
    // Add elements directly to the end of the queue
    linkLast(e);
    return true;
}

void linkLast(E e) {
    // Save the end of the list. Last is the global variable used to represent the end of the list
    final Node<E> l = last;
    // Create a new node for the element e
    final Node<E> newNode = new Node<>(l, e, null);
    // Set the new node to the end of the queue
    last = newNode;
    // If the end of the queue is empty, then the entire list is empty, and the new node is assigned to the head node
    if (l == null)
        first = newNode;
    else
    // A new node is generated after the old one
        l.next = newNode;
    // Number of nodes +1
    size++;
    modCount++;
}

public void add(int index, E element) {
    // Check whether the index exceeds the index range
    checkPositionIndex(index);
    // If you append to the end, it is the same as add(E E)
    if (index == size)
        linkLast(element);
    else
    // If not, insert it in another position
     linkBefore(element, node(index));
}

// The node method is called in the linkBefore method, similar to the binary lookup optimization
Node<E> node(int index) {
    // assert isElementIndex(index);
    // If index is in the first half, go back to get node
    if (index < (size >> 1)) {
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {
        // If index is in the last half, traverse backwards to get node
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}

void linkBefore(E e, Node<E> succ) {
    // assert succ != null;
    // Save the front node of the index node
    final Node<E> pred = succ.prev;
    // Create a new destination node
    final Node<E> newNode = new Node<>(pred, e, succ);
    succ.prev = newNode;
    // If it is inserted at the beginning
    if (pred == null)
        first = newNode;
    else
        pred.next = newNode;
    size++;
    modCount++;
}

Get

ArrayList’s get() method is very efficient because it simply returns the element at the specified location in the array

public E get(int index) {
    // Check whether the index exceeds the range of the index
    rangeCheck(index);
    // Returns the element at the specified position
    return elementData(index);
}

LinkedList's get() method internally calls the node() method seen above, which checks whether the index falls in the first or the second half of the list and traverses from the nearer end.

public E get(int index) {
    checkElementIndex(index);
    return node(index).item;
}

The underlying implementation of HashMap

When is a HashMap used? What are its characteristics?

Do you know how HashMap works?

Do you know how get and put work? What do equals() and hashCode() do?

Do you know the implementation of Hash? Why is it implemented this way?

What if the size of the HashMap exceeds the capacity defined by the Load factor?

HashMap is implemented slightly differently in JDK 7 and JDK8. Record them separately.

Before diving into HashMap, you need to understand a few concepts:

  1. initialCapacity: the initial capacity, i.e. the capacity of the HashMap when it is created. It can be specified in the constructor; if it isn't, it defaults to 16. Note that the capacity is always a power of two. (In 1.7, when the number of key-value pairs to be stored is known in advance, setting a reasonable initial capacity can noticeably improve performance.)

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

  2. size: the number of key-value pairs currently stored in the HashMap, i.e. what size() returns.

  3. loadFactor: the load factor. When the ratio of stored entries to total capacity reaches this value, the HashMap expands. It can also be specified in the constructor and defaults to 0.75. For example, with an initial capacity of 16 the expansion threshold is 0.75 * 16 = 12; that is, before storing the 13th entry, the HashMap expands first.

  4. threshold: the expansion threshold, i.e. threshold = total capacity * load factor. Expansion is performed when the number of stored entries reaches this threshold, and the expanded capacity is twice the current total capacity. For example, if the current total capacity is 16, it becomes 32 after expansion. (A small sketch of how these numbers interact follows below.)

  5. table: the Entry array. Internally a HashMap stores its key-value pairs as Entry objects, and table is the array of those entries (the buckets).
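A small sketch of how initial capacity, load factor, and threshold interact, assuming the default 16 / 0.75 settings (the threshold field itself is package-private, so this snippet only reasons about it from the outside):

import java.util.HashMap;
import java.util.Map;

public class HashMapThresholdDemo {
    public static void main(String[] args) {
        int initialCapacity = 16;
        float loadFactor = 0.75f;
        // threshold = capacity * loadFactor = 12: storing the 13th entry triggers a resize to 32
        int threshold = (int) (initialCapacity * loadFactor);
        System.out.println("threshold = " + threshold);  // 12

        Map<String, Integer> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < 13; i++) {
            map.put("key" + i, i);  // the 13th put grows the internal table from 16 to 32
        }
        System.out.println(map.size());  // 13
    }
}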

JDK1.7 implementation

In JDK 1.7, a HashMap consists of an array plus linked lists. The array is the body of the HashMap, and the linked lists exist mainly to resolve hash collisions. If the located array slot contains no linked list (the current entry's next points to null), operations such as lookup and insertion are quick, requiring only a single array access. If the located slot does contain a linked list, insertion is still O(1), because the newest Entry is inserted at the head of the list with a simple change of references; lookup, however, has to traverse the list and compare keys one by one using the key's equals method.

The so-called “zipper method” is a combination of linked lists and arrays. That is, create an array of linked lists, where each cell is a linked list. If a hash conflict is encountered, add the value of the conflict to the linked list.

Source code analysis

Constructors

The Alibaba Java Development Manual recommends specifying the initial capacity when creating a collection (note: initialize a HashMap with HashMap(int initialCapacity)). The reasoning is discussed at: www.zhihu.com/question/31… A hedged sketch of the idea follows below.
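A sketch of that recommendation: if you roughly know how many entries will be stored, size the map as expectedSize / loadFactor + 1 so it never has to resize (HashMap rounds the capacity up to a power of two internally); the numbers below are assumptions for illustration:

import java.util.HashMap;
import java.util.Map;

public class PresizedHashMapDemo {
    public static void main(String[] args) {
        int expectedSize = 100;
        // Choose a capacity large enough that expectedSize entries stay below the 0.75 threshold
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;  // 134, rounded up to 256 internally
        Map<Integer, String> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedSize; i++) {
            map.put(i, "v" + i);  // no resize happens while filling the map
        }
        System.out.println(map.size());  // 100
    }
}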

// The default constructor uses the default initial capacity and load factor
// DEFAULT_INITIAL_CAPACITY = 16, DEFAULT_LOAD_FACTOR = 0.75F
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

// You can specify the initial capacity and use the default load factor
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    // Determine the initial capacity value
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // Set the load factor
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    // empty hook method, overridden by subclasses such as LinkedHashMap
    init();
}

public HashMap(Map<? extends K, ? extends V> m) {
  this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
  inflateTable(threshold);
  putAllForCreate(m);
}

The first three constructors of a HashMap all end up calling HashMap(int initialCapacity, float loadFactor). Set the initial capacity and load factor inside it. The final init() method is empty and is implemented mainly for subclasses such as LinkedHashMap.

The put() method
public V put(K key, V value) {
    // If the table array is empty, create the array first and set the expansion threshold
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // If the key is null, call putForNullKey special processing
    if (key == null)
        return putForNullKey(value);
    // Calculate the hash value of key
    int hash = hash(key);
    // Calculate the index in the array based on the calculated hash value and the current array length
    int i = indexFor(hash, table.length);
    // Iterate through the entire list under the array index
    // If the key is already stored in the HashMap, just replace the corresponding value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
       // Check whether the hash value is the same. If so, check whether the key is the same. Different objects may have the same hash value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // If the key has not been stored before, the addEntry method is entered
    addEntry(hash, key, value, i);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Capacity expansion is performed when the current capacity is greater than or equal to the capacity expansion threshold
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the original capacity
        resize(2 * table.length);
        // Recalculate the hash value
        hash = (null != key) ? hash(key) : 0;
        // Retrieve the index in the new array
        bucketIndex = indexFor(hash, table.length);
    }
    // Create a node
    createEntry(hash, key, value, bucketIndex);
}

// create a new array, copy all the data, and assign a reference to the new array to the table
void resize(int newCapacity) {
    Entry[] oldTable = table;  // Reference the Entry array before expansion
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) { // The size of the array before the expansion has reached the maximum (2^30)
        threshold = Integer.MAX_VALUE;  // Change the threshold to the maximum value of int (2^31-1) so that the capacity will not be expanded later
        return;
    }
    // Create a new entry array
    Entry[] newTable = new Entry[newCapacity];
    // Copy the data from the old entry array to the new entry array
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    // Assign a reference to the new array to table
    table = newTable;
    // Calculate the new capacity expansion threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
     Entry[] src = table;                   // src references the old Entry array
     int newCapacity = newTable.length;
     for (int j = 0; j < src.length; j++) { // Iterate over the old Entry array
         Entry<K,V> e = src[j];             // Get each element of the old Entry array
         if (e != null) {
             src[j] = null; // Release the reference held by the old Entry array (after the loop the old array no longer references any objects)
             do {
                 Entry<K,V> next = e.next;
                 if (rehash) {
                     // Only recompute the hash when a new hash seed is in use
                     e.hash = (null == e.key) ? 0 : hash(e.key);
                 }
                 int i = indexFor(e.hash, newCapacity); // !!! Recalculate the position of each element in the new array
                 e.next = newTable[i];  // mark [1]
                 newTable[i] = e;       // Put the element into the new array
                 e = next;              // Move to the next Entry in the chain
             } while (e != null);
         }
     }
}

void createEntry(int hash, K key, V value, int bucketIndex) {
   // Retrieve the Entry with bucketIndex subscript in the table
    Entry<K,V> e = table[bucketIndex];
   // Create a new Entry using key and value
   // And the Entry stored at table[bucketIndex] is the next of the new Entry
   // Place the newly created Entry in table[bucketIndex]
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    // The capacity of the current HashMap is increased by 1
    size++;
}

The createEntry() method at the end shows that when a hash collision occurs, chaining (the "zipper" method) is used to resolve it, and the new element is inserted at the head of the singly linked list.

The get() method
public V get(Object key) {
    // If the key is empty, the getForNullKey method is called for special handling
    if (key == null)
        return getForNullKey();
    // Get the entry corresponding to the key
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

// Find the array index corresponding to the key, and then traverse the list to find it
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    // Calculate the hash value of key
    int hash = (key == null) ? 0 : hash(key);
    // Get the index of the array and iterate through the list to see if there are any entries with the same key
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    // If not, return null
    return null;
}

JDK1.8 implementation

In JDK 1.7, if there are too many hash collisions the chains become too long; in the extreme case all values fall into the same bucket and the structure degenerates into a single linked list. Looking up a key then requires traversing that list, which is inefficient. JDK 1.8 made a major change to collision handling: when a list grows beyond the threshold (8 by default), it is converted into a red-black tree to reduce search time.

TreeMap, TreeSet, and HashMap after JDK1.8 all use red-black trees at their underlying level. Red-black trees are designed to solve the problem of binary search trees, which degenerate into a linear structure in some cases.

Source code analysis

Constructors

The JDK8 constructor hasn’t changed much

public HashMap() {
  this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(int initialCapacity) {
  this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
  if (initialCapacity < 0)
    throw new IllegalArgumentException("Illegal initial capacity: " +
                                       initialCapacity);
  if (initialCapacity > MAXIMUM_CAPACITY)
    initialCapacity = MAXIMUM_CAPACITY;
  if (loadFactor <= 0 || Float.isNaN(loadFactor))
    throw new IllegalArgumentException("Illegal load factor: " +
                                       loadFactor);
  this.loadFactor = loadFactor;
  this.threshold = tableSizeFor(initialCapacity);
}

public HashMap(Map<? extends K, ? extends V> m) {
  this.loadFactor = DEFAULT_LOAD_FACTOR;
  putMapEntries(m, false);
}
Determine the hash bucket array index position (implementation of hash function)
// Method 1:
static final int hash(Object key) { // JDK 1.8 & JDK 1.7
 int h;
 // Step 1: h = key.hashCode() takes the hashCode value
 // Step 2: h ^ (h >>> 16) folds the high 16 bits into the low 16 bits
 return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// Method 2:
static int indexFor(int h, int length) { // JDK 1.7 only; JDK 1.8 has no such method and inlines it as tab[i = (n - 1) & hash]
 return h & (length - 1); // Step 3: the modulo (masking) operation
}
}

How the hash method maps a key to an array index directly determines how well the hash is dispersed. Essentially, the hash algorithm has three steps: taking the key's hashCode, mixing in the high bits, and the indexing (modulo) operation.

Why is that?

Why is the length of a HashMap a power of 2?

The purpose of course is to reduce hash collisions, so that the table data distribution is more uniform.

  1. In a HashMap the bucket array length is always a power of 2. In that case h & (table.length - 1) is equivalent to the modulo h % length, but the modulo operation is less efficient than the bit operation, so this is an optimization. However, if h = 185 and table.length - 1 = 15 (binary 1111), only the low 4 bits of the hash take part in the index, so collisions happen easily.

  2. The hash is produced from the key's hashCode. When taking the remainder with a small n, only the low bits of the hash participate in the calculation and the high bits are effectively ignored, so the result depends only on the low-order information. To fix this flaw, we can XOR the high 4 bits with the low 4 bits, i.e. hash ^ (hash >>> 4), so that the low bits gain randomness from the high bits and the high-order data indirectly takes part in the calculation. The same idea is applied as follows:

    In Java, hashCode() produces an int hash, 32 bits wide: the first 16 bits are the high half and the last 16 bits are the low half. So the hash is shifted right by 16 bits and XORed with itself, hash ^ (hash >>> 16). This increases the randomness of the low bits and therefore improves the hash distribution. A small worked example follows.
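A small worked example of the perturbation; the hash value 0x12345678 is arbitrary, chosen just to show the high 16 bits influencing the bucket index:

public class HashPerturbationDemo {
    public static void main(String[] args) {
        int h = 0x12345678;                // pretend this is some key's hashCode()
        int perturbed = h ^ (h >>> 16);    // same disturbance as HashMap.hash() in JDK 1.8
        System.out.printf("%08x -> %08x%n", h, perturbed); // 12345678 -> 1234444c

        // Only the low bits take part in (n - 1) & hash, so folding the high bits in matters:
        int n = 16;                        // table length (a power of two)
        System.out.println((n - 1) & h);         // 8  -- raw low bits only
        System.out.println((n - 1) & perturbed); // 12 -- high bits now influence the bucket
    }
}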

Why is the length of a HashMap a power of 2?

To make HashMap access efficient and minimize collisions, i.e. to distribute the data evenly, note that the hash value ranges from -2147483648 to 2147483647, roughly 4 billion possible values. As long as the hash function maps keys fairly uniformly and loosely, collisions are hard to produce in ordinary applications. The problem is that an array with 4 billion slots does not fit in memory, so the hash value cannot be used directly; it first has to be reduced modulo the array length to get the index at which to store the entry. That index is computed as (n - 1) & hash, where n is the array length.

How should this algorithm be designed?

The first thing we might think of is the % (modulo) operation. But here's the key point:

For the modulo operation, if the divisor is a power of 2, taking the remainder is equivalent to ANDing with the divisor minus one, i.e. hash % length == hash & (length - 1), provided length is 2^n; and the & operation is more efficient than %. This explains why the length of a HashMap is always a power of two. A quick check of the equivalence follows.
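A quick check of that equivalence for a few arbitrary non-negative hash values (nothing here is HashMap-specific):

public class PowerOfTwoMaskDemo {
    public static void main(String[] args) {
        int length = 16;                       // a power of two, as HashMap guarantees
        int[] hashes = {5, 21, 185, 1024, Integer.MAX_VALUE};
        for (int h : hashes) {
            int byMod = h % length;            // modulo: relatively expensive
            int byMask = h & (length - 1);     // bit mask: a single AND
            System.out.println(h + ": " + byMod + " == " + byMask); // always equal
        }
    }
}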

The get() method
public V get(Object key) {
  Node<K,V> e;
  return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
  Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
  // Locate the bucket where the key-value pair resides
  if ((tab = table) != null && (n = tab.length) > 0 &&
      (first = tab[(n - 1) & hash]) != null) {
    // Direct hit
    if (first.hash == hash && // always check first node
        ((k = first.key) == key || (key != null && key.equals(k))))
      return first;
    // Not a direct hit on the first node
    if ((e = first.next) != null) {
      // If first is of type TreeNode, the black mangrove lookup method is called
      if (first instanceof TreeNode)
        return ((TreeNode<K,V>)first).getTreeNode(hash, key);
      do {
        // Look it up in the list
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
          return e;
      } while ((e = e.next) != null);
    }
  }
  return null;
}
The put() method

public V put(K key, V value) {
  // Hash key hashCode()
  return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
  Node<K,V>[] tab; Node<K,V> p; int n, i;
  // if TAB is empty, it will be created
  if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
  // Compute index and handle null
  if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
  else {
    Node<K,V> e; K k;
    // The node key exists, overwriting value directly
    if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
      e = p;
    // Determine the chain to be a red-black tree
    else if (p instanceof TreeNode)
      e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    else {
      // The chain is a linked list
      for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
          p.next = newNode(hash, key, value, null);
          // List length greater than 8 is converted to red black tree for processing
          if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
          break;
        }
        // Key already exists overwriting value directly
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
          break;
        p = e;
      }
    }
    if (e != null) { // existing mapping for key
      V oldValue = e.value;
      if (!onlyIfAbsent || oldValue == null)
        e.value = value;
      afterNodeAccess(e);
      return oldValue;
    }
  }
  ++modCount;
  // Expand the capacity when the capacity exceeds the maximum
  if (++size > threshold)
    resize();
  afterNodeInsertion(evict);
  return null;
}

What does the flow of HashMap's put function look like?

①. Check whether the table array is empty or null; if it is, call resize() to create it;

②. Compute the array index i from the key's hash; if table[i] == null, create a new node and insert it directly, then go to ⑥; if table[i] is not empty, go to ③;

③. Check whether the first element of table[i] has the same key as the one being inserted (same hashCode and equals); if so, overwrite its value directly, otherwise go to ④;

④. Check whether table[i] is a TreeNode, i.e. whether the bucket is a red-black tree; if it is, insert the key-value pair into the tree directly, otherwise go to ⑤;

⑤. Traverse table[i]; if the list length exceeds 8, convert the list into a red-black tree and perform the insertion there, otherwise insert into the linked list; if the key is found to exist during the traversal, just overwrite its value;

⑥. After a successful insertion, check whether the actual number of key-value pairs (size) exceeds the expansion threshold; if it does, expand the table.

The resize() expansion
final Node<K,V>[] resize() {
  Node<K,V>[] oldTab = table;
  int oldCap = (oldTab == null) ? 0 : oldTab.length;
  int oldThr = threshold;
  int newCap, newThr = 0;
  if (oldCap > 0) {
    // If it exceeds the maximum value, it will no longer be expanded
    if (oldCap >= MAXIMUM_CAPACITY) {
      // Change the threshold to the maximum value of int (2^31-1) so that the capacity will not be expanded later
      threshold = Integer.MAX_VALUE;
      return oldTab;
    }
     // If the value is not higher than the maximum value, the value is doubled
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
      newThr = oldThr << 1; // double threshold
  }
  else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;
  else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
  }
  // Calculate the new resize upper limit
  if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
  }
  threshold = newThr;
  // Start copying to the new array
  @SuppressWarnings({"rawtypes","unchecked"})
  Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
  table = newTab;
  // Move each bucket to the new buckets
  if (oldTab != null) {
    // Loop over the old table array
    for (int j = 0; j < oldCap; ++j) {
      Node<K,V> e;
      if ((e = oldTab[j]) != null) {
        oldTab[j] = null;
        // If the bucket has only one element, copy it directly
        if (e.next == null)
          newTab[e.hash & (newCap - 1)] = e;
        // If it is a red-black tree, then split the red-black tree
        else if (e instanceof TreeNode)
          ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
         // Walk through the list and divide the original list into LO and HI
         // Where lo stands for the old bucket position, hi stands for the new bucket position
        else { // The preserve Order list optimizes the code block for rehash
          Node<K,V> loHead = null, loTail = null;
          Node<K,V> hiHead = null, hiTail = null;
          Node<K,V> next;
          do {
            / / the original indexes
            next = e.next;
            if ((e.hash & oldCap) == 0) {
              if (loTail == null)
                loHead = e;
              else
                loTail.next = e;
              loTail = e;
            }
            // Old index +oldCap
            else {
              if (hiTail == null)
                hiHead = e;
              else
                hiTail.next = e;
              hiTail = e;
            }
          } while ((e = next) != null);
          // Put the original index in the bucket
          if (loTail != null) {
            loTail.next = null;
            newTab[j] = loHead;
          }
          // Add oldCap to bucket
          if (hiTail != null) {
            hiTail.next = null;
            newTab[j + oldCap] = hiHead;
          }
        }
      }
    }
  }
  return newTab;
}

How is the HashMap expansion operation implemented?

  1. In JDK1.8, the resize method is called when the key pair in the HashMap exceeds the threshold or during initialization.
  2. Every time it expands, it’s twice as big;
  3. After expansion, a Node either stays at the same index or moves to its original index plus the old capacity.

In putVal() we can see that resize() is called in two places: when the table is first initialized, and when the actual number of entries exceeds the expansion threshold (0.75 * 16 = 12 the first time). JDK 1.8 also optimizes how entries are redistributed: in 1.7, every entry's hash had to be recomputed and the entry re-bucketed after expansion, while in 1.8 only the bit (e.hash & oldCap) is checked; if it is 0 the element stays at its original position, otherwise it moves to the original position plus the old capacity. A small illustration follows.
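A small illustration of that rule; the sample hash values are arbitrary and chosen so they all land in bucket 5 of the old 16-slot table:

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53};          // arbitrary sample hash values
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int newIndex = h & (newCap - 1);
            // (h & oldCap) == 0 -> stays at oldIndex, otherwise moves to oldIndex + oldCap
            boolean stays = (h & oldCap) == 0;
            System.out.println(h + ": " + oldIndex + " -> " + newIndex
                    + (stays ? " (same)" : " (old + " + oldCap + ")"));
        }
    }
}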

Linked list treeification

When the list length in a bucket exceeds TREEIFY_THRESHOLD (8 by default), the treeifyBin method is called to treeify it. This does not always produce a tree, however: treeifyBin also checks whether the HashMap's capacity is at least 64 (MIN_TREEIFY_CAPACITY). Only if the capacity is greater than or equal to 64 is the list treeified; otherwise the table is expanded first.

Why check whether the total capacity is at least 64 before treeifying?

When the bucket array capacity is small, key hash collisions are more likely, which leads to long lists and poor query performance. At that point there are two options: expand the table to lower the collision rate, or treeify the bucket to speed up queries.

If we expand, we only need to relocate the linked-list data; if we treeify, we have to turn the list into a red-black tree. Up to this point there doesn't seem to be much difference, but let the scenario play out: as more and more data is inserted, more and more lists would need to be converted into trees.

At some capacity we eventually have to expand anyway. By then there would be many treeified buckets, and during expansion many red-black trees would have to be split back into linked lists, which is a considerable cost. If we expand while the capacity is still small, fewer buckets need to be treeified and the cost of expansion stays low.

Let's see how list treeification works:

    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // 1. If the capacity is smaller than MIN_TREEIFY_CAPACITY, capacity expansion is preferred
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        // 2. If the bucket is not empty, tree it
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
             // 2.1 First convert the linked list to TreeNode's bidirectional list
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            // 2.2 Tree the bidirectional linked list represented by TreeNode
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

With this, the overall idea of treeifying a list becomes clearer: the list is first turned into a doubly linked list of TreeNode objects, and then treeify() is called on it. So let's look at the implementation of treeify().

final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    // 1. Traverse the bidirectional TreeNode linked list and insert TreeNode nodes one by one
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;
        // 2. If the root node is empty, set node X directly as the root node
        if (root == null) {
            x.parent = null;
            x.red = false;
            root = x;
        }
        else {
            K k = x.key;
            int h = x.hash;
            Class<?> kc = null;
            // 3. If the root node is not empty, compare and insert it in the appropriate place
            for (TreeNode<K,V> p = root;;) {
                int dir, ph;
                K pk = p.key;
                // 4. Calculate the dir value, -1 means to find the insertion point from the left subtree, 1 means to find the insertion point from the right subtree
                if ((ph = p.hash) > h)
                    dir = -1;
                else if (ph < h)
                    dir = 1;
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);

                TreeNode<K,V> xp = p;
                // 5. If a point p is found, this point is a leaf node
                // Then this is the insertion position
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp;   
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    // Perform dynamic balancing after insertion
                    root = balanceInsertion(root, x);
                    break;
                }
            }
        }
    }
    // 6. Move the root node to the list head
    moveRootToFront(tab, root);
}

As the code above shows, the treeify() method simply turns the doubly linked TreeNode list into a red-black tree. The general steps are:

  • The TreeNode bidirectional linked list is traversed and the TreeNode nodes are inserted one by one
  • If the root node is empty, then the red-black tree is now empty and the node is directly used as the root node. Otherwise, you need to find an appropriate position to insert the TreeNode.
  • Starting from the root, keep comparing to find the right position; when the comparison reaches a null child, that parent node xp becomes the parent of the inserted node: x.parent is pointed at xp, and xp's left or right child is pointed at x.
  • Then balanceInsertion() is called to rebalance the tree.
  • Finally, the moveRootToFront method is called to move the root node to the head of the list.

balanceInsertion() performs the dynamic rebalancing of the red-black tree after the insertion. Finally, let's look at the moveRootToFront() method.

static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
    int n;
    if (root != null && tab != null && (n = tab.length) > 0) {
        int index = (n - 1) & root.hash;
        TreeNode<K,V> first = (TreeNode<K,V>)tab[index];
        // If the root node inserted into the red-black tree is not the first element in the list
        // Insert the root node before the first node
        if (root != first) {
            Node<K,V> rn;
            tab[index] = root;
            TreeNode<K,V> rp = root.prev;
            // The next two if statements take the root node out of its current position in the list:
            // make the predecessor of the root node point to the successor node
            // make the successor of the root node point to its predecessor
            if ((rn = root.next) != null)
                ((TreeNode<K,V>)rn).prev = rp;
            if (rp != null)
                rp.next = rn;
            // Insert the root node directly before the first node
            if (first != null)
                first.prev = root;
            root.next = first;
            root.prev = null;
        }
        assert checkInvariants(root);
    }
}
Red black tree split

After expansion, ordinary nodes have to be remapped, and red-black tree nodes are no exception. The obvious approach would be to first turn the red-black tree back into a linked list and then remap that list. However, because the TreeNode objects keep the linked-list order from when they were inserted, the tree can be read directly in that order and split without a separate tree-to-list conversion, which quietly improves efficiency.

final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    int lc = 0, hc = 0;
    // 1. Think of a red-black tree as a bidirectional list of Treenodes
    // Add them to the linked list starting with loHead and hiHead
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        next = (TreeNode<K,V>)e.next;
        e.next = null;
        // 1.1. The node is added to the loHead list in the same position after expansion
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            ++lc;
        }
        // 1.2 After expansion, the position is changed and added to the hiHead linked list
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            ++hc;
        }
    }
    // 2. Treeify or untreeify the loHead and hiHead lists
    if (loHead != null) {
        // If the list length is no more than the untreeify threshold, keep it as a plain list, otherwise treeify it
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab);
        }
    }
    if (hiHead != null) {
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            tab[index + bit] = hiHead;
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}

From the code above we can see that splitting a red-black tree during expansion works just like transferring a linked list; the difference is that after the nodes have been separated into the hiHead and loHead lists, the length of each list decides whether it is untreeified back into a linked list or kept as a red-black tree.

Turning a red-black tree back into a linked list

When we talked about red-black tree splitting, we said that if the red-black tree structure is below the threshold when it expands, it will be converted to a linked list. Its implementation code is as follows:

final Node<K,V> untreeify(HashMap<K,V> map) {
    Node<K,V> hd = null, tl = null;
    for (Node<K,V> q = this; q != null; q = q.next) {
        Node<K,V> p = map.replacementNode(q, null);
        if (tl == null)
            hd = p;
        else
            tl.next = p;
        tl = p;
    }
    return hd;
}

Because the red-black tree contains the order in which the elements are inserted, when we split the red-black tree into two linked lists, hiHead and loHead, the order remains the same. So all we need to do is loop through and replace TreeNode with Node.

HashMap: JDK1.7 VS JDK1.8

JDK1.8 mainly addresses or optimizes the following issues:

  • Resize (capacity expansion) optimization
  • Red-black trees are introduced to avoid the drop in query efficiency caused by overly long lists
  • The multi-threaded infinite-loop problem during resize is fixed, but HashMap is still not thread-safe and may lose data when modified concurrently
The main differences between JDK 1.7 and JDK 1.8:

  • Storage structure: JDK 1.7 uses array + linked list; JDK 1.8 uses array + linked list + red-black tree
  • Initialization: JDK 1.7 has a dedicated function, inflateTable(); JDK 1.8 folds it into the expansion function resize()
  • Hash calculation: JDK 1.7 uses 9 perturbations (4 shift operations + 5 XOR operations); JDK 1.8 uses 2 perturbations (1 shift operation + 1 XOR operation)
  • Storage rule: JDK 1.7 stores in the array when there is no collision and in a linked list on collision; JDK 1.8 stores in the array when there is no collision, in a singly linked list when there is a collision and the list length is < 8, and treeifies into a red-black tree when the list length reaches 8
  • Insertion: JDK 1.7 uses head insertion (existing entries are shifted back one position, then the new entry is placed at the head); JDK 1.8 uses tail insertion (appended to the end of the list / red-black tree)
  • Index after expansion: JDK 1.7 recomputes everything the original way (hashCode -> perturbation -> (h & (length - 1))); JDK 1.8 uses the rule that the new position is either the original position or the original position + old capacity

Hashtable

Hashtable and HashMap are both hash tables implemented with the chaining ("zipper") method, and like the JDK 7 HashMap they store data in Entry objects. The difference is that almost all public methods in Hashtable are synchronized (some are implemented with synchronized blocks internally), so efficiency inevitably suffers. Also, its put() method allows neither null keys nor null values.

Difference between HashMap and Hashtable

  1. Thread safety: HashMap is not thread-safe, Hashtable is; the methods inside Hashtable are basically all synchronized. (Use ConcurrentHashMap if you need thread safety!);

  2. Efficiency: because it does no locking, HashMap is somewhat more efficient than Hashtable. Hashtable is also largely obsolete, so don't use it in new code;

  3. Support for null keys and null values: in a HashMap, null can be used as a key (there can be only one such key) and one or more keys may map to null values. In a Hashtable, a NullPointerException is thrown as soon as a null key or value is put in (see the sketch after this list);

  4. The difference between the initial capacity and each expansion capacity:

    (1) If you do not specify an initial capacity when creating a Hashtable, the default initial size of the Hashtable is 11. After each expansion, the original capacity is 2n+1. The default initialization size of a HashMap is 16. After each expansion, the capacity is doubled.

    (2) If you do create a Hashtable with an initial capacity, the Hashtable uses the given size as-is, whereas a HashMap rounds it up to a power of 2. That is, a HashMap always uses a power of two as its hash table size, for the reasons explained earlier.

  5. Underlying data structure: HashMap since JDK1.8 has a major change in resolving hash collisions, converting lists to red-black trees to reduce search time when the list length is greater than the threshold (8 by default). Hashtable has no such mechanism.

  6. HashMap's iterators are fail-fast, while Hashtable's Enumeration is not. If another thread structurally adds or removes elements while you are iterating, the HashMap iterator throws a ConcurrentModificationException, although removing elements via the iterator's own remove method does not. This is also one of the differences between Enumeration and Iterator.
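A tiny sketch illustrating point 3 above (null key and value handling); the class name is made up:

import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyValueDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "value");   // fine: one null key is allowed
        hashMap.put("key", null);     // fine: any number of null values
        System.out.println(hashMap.get(null)); // "value"

        Map<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "value");   // Hashtable rejects null keys ...
        } catch (NullPointerException e) {
            System.out.println("Hashtable: null key not allowed");
        }
        try {
            hashtable.put("key", null);     // ... and null values
        } catch (NullPointerException e) {
            System.out.println("Hashtable: null value not allowed");
        }
    }
}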

ConcurrentHashMap

When a HashMap is used by multiple threads, a put that exceeds the threshold (capacity x load factor) triggers an expansion, i.e. a rehash that redistributes the contents of the old array into the new, larger array. In JDK 1.7, two threads rehashing at the same time can link entries that hash to the same bucket into a circular list, which then causes an infinite loop during get. HashMap is therefore not thread safe. (See: www.jianshu.com/p/e2f75c8cc…)

Hashtable is thread-safe. It uses the synchronized keyword to lock the entire table, which means that all threads are competing for a lock. In a multi-threaded environment, Hashtable is safe, but undoubtedly inefficient.

JDK1.7 implementation

The reason Hashtable is inefficient in a highly contended concurrent environment is that every thread accessing it must compete for the same lock. If the container instead holds multiple locks, each guarding one portion of its data, then threads accessing different portions do not contend with each other. This is the lock segmentation (segment locking) technique used by ConcurrentHashMap.

In JDK 1.7, the data structure of ConcurrentHashMap consists of a Segment array and multiple HashEntry arrays. The Segment array splits one large table into several smaller tables that can be locked independently, and each Segment element holds a HashEntry array plus linked lists, i.e. the same storage structure as a HashMap.

The ConcurrentHashMap class contains two static inner classes, HashEntry and Segment. HashEntry is used to encapsulate key-value pairs of the mapping table. Segment acts as a lock. Each Segment object guards buckets of the entire hash mapping table. Each bucket is a linked list of several HashEntry objects. An instance of ConcurrentHashMap contains an array of Segment objects. Each Segment guards an element in a HashEntry array. When modifying the HashEntry array, you must first acquire its Segment lock.

Segment class

The Segment class inherits from the ReentrantLock class, enabling the Segment object to act as a ReentrantLock. A Segment is a subhash table, and a HashEntry array is maintained in the Segment. In a concurrent environment, there is no need to consider lock contention for different segments.

As you can see from the source code, the Segment inner class is very similar to the HashMap we saw above. There are load factors, thresholds, all sorts of things.

static final class Segment<K,V> extends ReentrantLock implements Serializable {
  
  static final int MAX_SCAN_RETRIES =
    Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;
  transient volatile HashEntry<K,V>[] table;
  transient int count;
  transient int modCount;  // Record the number of changes
  transient int threshold;
  final float loadFactor;

  Segment(float lf, int threshold, HashEntry<K,V>[] tab) {
    this.loadFactor = lf;
    this.threshold = threshold;
    this.table = tab;
  }

  // put acquires the lock before modifying the segment
  final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    HashEntry<K,V> node = tryLock() ? null :
                scanAndLockForPut(key, hash, value);
    // ...
  }

  @SuppressWarnings("unchecked")
  private void rehash(HashEntry<K,V> node) {
    // ...
  }

  private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
 		/ /...
  }

  private void scanAndLock(Object key, int hash) {
 		/ /...
  }

  final V remove(Object key, int hash, Object value) {
    / /...
  }

  final boolean replace(K key, int hash, V oldValue, V newValue) {
  	/ /...
  }

  final V replace(K key, int hash, V value) {
    / /...
  }

  final void clear(a) {
		/ /...}}Copy the code

HashEntry class

HashEntry is currently the smallest logical processing unit. A ConcurrentHashMap maintains an array of segments, and a Segment maintains a HashEntry array.

static final class HashEntry<K,V> {
  final int hash;
  final K key;
  volatile V value;   // value is volatile, which guarantees visibility across threads
  volatile HashEntry<K,V> next;
  // ...
}

ConcurrentHashMap class

By default, a ConcurrentHashMap creates 16 Segments for concurrent access. Each Segment contains one hash table (a HashEntry array), and each hash chain in that table is made up of HashEntry nodes.

public class ConcurrentHashMap<K,V> extends AbstractMap<K,V>
        implements ConcurrentMap<K,V>, Serializable {
  // The default capacity is 16, that is, 16 buckets by default
  static final int DEFAULT_INITIAL_CAPACITY = 16;
  static final float DEFAULT_LOAD_FACTOR = 0.75f;
  // The default concurrency level is 16; an estimate of the number of concurrently updating threads
  static final int DEFAULT_CONCURRENCY_LEVEL = 16;

  static final int MAXIMUM_CAPACITY = 1 << 30;
  static final int MIN_SEGMENT_TABLE_CAPACITY = 2;
  static final int MAX_SEGMENTS = 1 << 16; // slightly conservative
  static final int RETRIES_BEFORE_LOCK = 2;
  final int segmentMask;  // Segment mask, used to locate a segment
  final int segmentShift;
  final Segment<K,V>[] segments;   // The backbone: the array of segment locks

  // Constructor
  public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        // MAX_SEGMENTS = 1 << 16 = 65536, i.e. the maximum concurrency level is 65536
        if (concurrencyLevel > MAX_SEGMENTS)
            concurrencyLevel = MAX_SEGMENTS;
        // ssize = 2^sshift, e.g. ssize = 16 gives sshift = 4; ssize = 32 gives sshift = 5
        int sshift = 0;
        // ssize is the length of the segments array, computed from concurrencyLevel
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        this.segmentShift = 32 - sshift;
        this.segmentMask = ssize - 1;
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity)
            ++c;
        int cap = MIN_SEGMENT_TABLE_CAPACITY;
        while (cap < c)
            cap <<= 1;
        // Create the segments array and initialize segments[0]; the remaining segments are initialized lazily
        Segment<K,V> s0 =
            new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                             (HashEntry<K,V>[])new HashEntry[cap]);
        Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
        UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
        this.segments = ss;
  }
}

put() method

  1. Locate the Segment and ensure that the Segment is initialized
  2. Call the Segment's put method
public V put(K key, V value) {
  Segment<K,V> s;
  // In ConcurrentHashMap neither key nor value may be null
  if (value == null)
    throw new NullPointerException();
  // The hash function re-hashes the key's hashCode to defend against poor hashCode implementations and spread entries evenly
  int hash = hash(key);
  // Unsigned-right-shift the hash by segmentShift and AND it with the segment mask to locate the segment
  int j = (hash >>> segmentShift) & segmentMask;
  if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
       (segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment
    s = ensureSegment(j);
  return s.put(key, hash, value, false);
}
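
As a worked example (my own, not JDK code): with the default concurrencyLevel of 16, ssize = 16 and sshift = 4, so segmentShift = 32 - 4 = 28 and segmentMask = 16 - 1 = 15, and the top 4 bits of the re-hashed key select the segment:

public class SegmentIndexDemo {
    public static void main(String[] args) {
        int segmentShift = 28;
        int segmentMask = 15;
        int h = 0xA3C96E57;                         // an arbitrary re-hashed key hash
        int j = (h >>> segmentShift) & segmentMask; // the top 4 bits select the segment
        System.out.println("segment index = " + j); // prints 10 (0xA)
    }
}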

get() method

The get method does not need to be locked, because the shared variables involved are volatile, which guarantees memory visibility and therefore does not read stale data

public V get(Object key) {
  Segment<K,V> s; // manually integrate access methods to reduce overhead
  HashEntry<K,V>[] tab;
  int h = hash(key);
  long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
  if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
      (tab = s.table) != null) {
    for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
         (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
         e != null; e = e.next) {
      K k;
      if ((k = e.key) == key || (e.hash == h && key.equals(k)))
        return e.value;
    }
  }
  return null;
}

JDK1.8 implementation

ConcurrentHashMap was changed substantially in JDK 8: the class alone grew from roughly 1,000 lines of code to over 6,000. JDK 8 abandons the Segment concept and instead uses CAS + synchronized to guarantee concurrency safety.

As you can see, it is very similar to the JDK 8 HashMap: the underlying data structure becomes array + linked list + red-black tree.

Similarities with HashMap in JDK 1.8

  • The underlying data structure is the same
  • Initialization is lazy: the table is created on the first put, not in the constructor
  • The underlying array length is always a power of 2
  • The default treeify threshold is 8 and the untreeify threshold is 6 (at or below this threshold a red-black tree reverts to a linked list)
  • The hash algorithm is similar, but there is an extra & HASH_BITS step, which clears the sign bit of the hash; negative hash values have a special meaning in ConcurrentHashMap (a bin being moved during a resize, or a tree bin), as the spread() method below shows
static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash

static final int spread(int h) {
    return (h ^ (h >>> 16)) & HASH_BITS;
}
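
A quick worked example (my own, not from the JDK) showing that spread() never produces a negative hash, so ordinary nodes never collide with the reserved negative hashes (MOVED, TREEBIN, RESERVED):

public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff;
    static int spread(int h) { return (h ^ (h >>> 16)) & HASH_BITS; }

    public static void main(String[] args) {
        // spread(-1): 0xFFFFFFFF ^ 0x0000FFFF = 0xFFFF0000, then & 0x7FFFFFFF = 0x7FFF0000
        System.out.println(Integer.toHexString(spread(-1)));   // 7fff0000
        System.out.println(spread("anyKey".hashCode()) >= 0);  // always true
    }
}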

Some key attributes

private static final int MAXIMUM_CAPACITY = 1 << 30; // Maximum table size, the same as HashMap

private static final int DEFAULT_CAPACITY = 16; // Default table size

static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8; // Largest possible array size, used by the toArray() methods

private static final int DEFAULT_CONCURRENCY_LEVEL = 16; // Default concurrency level; unused in JDK 8, kept for compatibility with older versions

private static final float LOAD_FACTOR = 0.75f; // Default load factor; the threshold is actually computed as n - (n >>> 2), which is more efficient

static final int TREEIFY_THRESHOLD = 8; // Bin length at which a linked list is converted to a red-black tree

static final int UNTREEIFY_THRESHOLD = 6; // Bin length at or below which a tree is converted back to a list; untreeify only happens when bins are split during a transfer (resize)

static final int MIN_TREEIFY_CAPACITY = 64; // Minimum table capacity before any bin is treeified

private static final int MIN_TRANSFER_STRIDE = 16; // Minimum number of bins each thread handles during a resize transfer

private static int RESIZE_STAMP_BITS = 16; // Number of bits used for the generation stamp in sizeCtl

private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1; // 2^16 - 1, the maximum number of threads that can help resize

private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS; // 32 - 16 = 16, the bit shift for recording the resize stamp in sizeCtl

static final int MOVED = -1; // Hash of a ForwardingNode (a bin that has already been moved during a resize)

static final int TREEBIN = -2; // Hash of a TreeBin (the root holder of a red-black tree bin)

static final int RESERVED = -3; // Hash of a ReservationNode

static final int HASH_BITS = 0x7fffffff; // Usable bits of a normal node hash; ANDed in to clear the sign bit

static final int NCPU = Runtime.getRuntime().availableProcessors(); // Number of available processors

/* ---------------- Fields -------------- */

transient volatile Node<K,V>[] table; // The Node array that backs the map; lazily created on the first insertion, and its size is always a power of 2

private transient volatile Node<K,V>[] nextTable; // The next table to use during a resize; non-null only while a resize is in progress

/**
 * Base counter of the number of elements, updated via CAS.
 * On its own it does not give the current size of the map; the counterCells array must be added in as well.
 */
private transient volatile long baseCount;

/**
 * Controls table initialization and resizing.
 * When negative: -1 means the table is being initialized, and -N means N-1 threads are resizing.
 * When positive: if the table is still null, it holds the initial table size to create;
 * after initialization it holds the next resize threshold (the array length times the load factor),
 * i.e. the point past which a resize is needed.
 * When 0, the default initial capacity will be used.
 */
private transient volatile int sizeCtl;

put() method

  1. First check whether the key or the value is null; if so, throw a NullPointerException
  2. Call spread() to compute the hash and reduce hash collisions
  3. Check whether the table array has been initialized; if not, call initTable() to initialize it
  4. If the target bucket is empty, try to insert the new node directly into the table array with CAS
  5. Check whether a resize is in progress: a head-node hash of MOVED (-1) means ConcurrentHashMap is resizing, in which case the current thread helps with the transfer
  6. When table[i] is the head of a linked list, the new value is inserted into the list under synchronized (f) to keep the operation thread safe
    1. If a node in the list has the same key as the key being inserted, its value is overwritten
    2. Otherwise, the new key-value pair is appended to the end of the list
  7. When table[i] is the root of a red-black tree, the new value is inserted into the tree (or the old value is overwritten)
  8. If the number of nodes in the bin reaches the treeify threshold (8), treeifyBin() is called to convert table[i] (the i-th hash bucket) from a linked list into a red-black tree
  9. Finally the element count is updated; if it exceeds the threshold (capacity x load factor), a resize is triggered
final V putVal(K key, V value, boolean onlyIfAbsent) {
    // Neither key nor value may be null
    if (key == null || value == null) throw new NullPointerException();
    // Compute the hash value from the key
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        // Initialize the table if necessary
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // f is the head node of the bucket the key maps to. If the bucket is empty,
        // try to write the new node with CAS; the surrounding loop retries (spins) until success
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        // If the head node's hash is MOVED (-1), a resize is in progress, so help with the transfer
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        // Otherwise write the data under a synchronized lock
        else {
            // Insert into a linked list or a red-black tree
            V oldVal = null;
            // Use the built-in monitor lock on the head node for concurrency control
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    // fh = f.hash >= 0 means this bin is a linked list; insert the key-value pair into the list
                    if (fh >= 0) {
                        binCount = 1;
                        // If a node with the same key is found, replace its value; otherwise append a new node to the end of the list
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key || (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    // Insert the new key-value pair into the red-black tree
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key, value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            // Convert the bin to a red-black tree if it has grown long enough
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    // Update the element count; if it exceeds the threshold (capacity x load factor), a resize is triggered
    addCount(1L, binCount);
    return null;
}

We can see that the JDK 8 implementation follows the same idea but locks an individual bin head Node rather than a whole Segment as in JDK 7. The operations performed before the lock is taken are lock-free yet thread safe, relying on the volatile reads and CAS operations mentioned earlier.
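
As a rough analogy (my own sketch, not JDK code; the real tabAt/casTabAt helpers use Unsafe volatile array access), the lock-free "read the bin head, CAS in a new head if the bin is empty" fast path can be pictured like this:

import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasBinSketch {
    static final AtomicReferenceArray<String> table = new AtomicReferenceArray<>(16);

    static boolean putIfBinEmpty(int i, String node) {
        if (table.get(i) != null)                   // tabAt analogue: volatile read of the bin head
            return false;                           // bin occupied: the real code falls through to synchronized
        return table.compareAndSet(i, null, node);  // casTabAt analogue: succeeds only if the bin is still empty
    }

    public static void main(String[] args) {
        System.out.println(putIfBinEmpty(3, "first"));  // true  - the bin was empty
        System.out.println(putIfBinEmpty(3, "second")); // false - this path loses and would fall back to locking
    }
}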

get() method

The get method does not need to be locked, because the shared variables involved are volatile, which guarantees memory visibility and therefore does not read stale data

public V get(Object key) {
  Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
  int h = spread(key.hashCode());
  // Check whether the table is non-empty and the target bucket has a head node
  if ((tab = table) != null && (n = tab.length) > 0 &&
      (e = tabAt(tab, (n - 1) & h)) != null) {
    // Check whether the head node of the bucket is the node we are looking for
    if ((eh = e.hash) == h) {
      if ((ek = e.key) == key || (ek != null && key.equals(ek)))
        return e.val;
    }
    // If the hash is negative, this is a special node (TreeBin or ForwardingNode); delegate to its find()
    else if (eh < 0)
      return (p = e.find(h, key)) != null ? p.val : null;
    // Otherwise it is a plain linked list; walk it
    while ((e = e.next) != null) {
      if (e.hash == h &&
          ((ek = e.key) == key || (ek != null && key.equals(ek))))
        return e.val;
    }
  }
  return null;
}

The difference between Hashtable and ConcurrentHashMap

The main difference between ConcurrentHashMap and Hashtable is in the way thread-safe is implemented.

  • Data structure: in JDK 1.7 ConcurrentHashMap uses a segmented array (a Segment array whose elements each hold a HashEntry array) plus linked lists. In JDK 1.8 its structure is the same as the JDK 8 HashMap: array + linked list / red-black tree. Hashtable, like HashMap before JDK 1.8, uses array + linked list, where the array is the body and the linked lists resolve hash collisions.
  • Thread-safe approach (important):
    • In JDK 1.7, ConcurrentHashMap segments the whole bucket array; each lock guards only a portion of the container's data, so threads accessing different segments do not contend, which raises concurrent throughput (with the default of 16 segments it can in theory support 16 times the concurrency of Hashtable). In JDK 1.8 the Segment concept is abandoned: the Node array + linked list / red-black tree structure is used directly and concurrency is controlled with synchronized and CAS, so the whole thing looks like an optimized, thread-safe HashMap. The Segment type still appears in the JDK 1.8 source, but only in a simplified form kept for compatibility with older versions.
    • Hashtable (one lock for the whole table): achieving thread safety with synchronized is very inefficient. While one thread is inside a synchronized method, other threads calling synchronized methods block or poll; for example, while one thread is executing put, no other thread can put or even get.

The difference between fail-fast and fail-safe in Java

Fail-fast

If the contents of a collection are structurally modified (elements added or removed) while it is being traversed with an iterator, a ConcurrentModificationException is thrown.

How it works: the iterator accesses the collection's contents directly during traversal and keeps an expectedModCount copy of the collection's modCount variable. If the collection changes during traversal, modCount changes. Each time the iterator moves to the next element via hasNext()/next(), it checks whether modCount still equals expectedModCount; if it does, traversal continues, otherwise a ConcurrentModificationException is thrown and traversal stops.

Note: the exception is thrown when the check modCount != expectedModCount is true. If the collection is modified in a way that leaves modCount equal to expectedModCount at the moment of the check, the exception is not thrown. Program logic therefore must not depend on whether this exception appears; it is only meant to help detect concurrent-modification bugs.

Scenario: the collection classes under the java.util package are fail-fast and cannot be concurrently modified by multiple threads while being iterated.
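
A minimal demonstration of the fail-fast behaviour (my own example): modifying the list through the collection itself while iterating trips the modCount check.

import java.util.ArrayList;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        // The enhanced for loop uses an Iterator under the hood. Removing through the list
        // itself changes modCount without updating the iterator's expectedModCount,
        // so the next call to next() throws ConcurrentModificationException.
        for (String s : list) {
            if ("a".equals(s)) {
                list.remove(s);
            }
        }
    }
}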

Fail-safe

A fail-safe collection container does not traverse the collection's contents directly; instead, it first copies the original contents and iterates over the copy.

Principle: since the iteration works on a copy of the original collection, modifications made to the original collection during iteration are invisible to the iterator, so no ConcurrentModificationException is triggered.

Disadvantages: copying avoids ConcurrentModificationException, but for the same reason the iterator cannot see modifications; it traverses the copy taken when the iteration started and is unaware of any changes made to the original collection afterwards.

Scenario: the containers under the java.util.concurrent package are fail-safe and can be used and modified concurrently by multiple threads.
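
For contrast, a small sketch with CopyOnWriteArrayList (my own example): the iterator works on a snapshot, so modifying the list during iteration neither throws nor shows up in that traversal.

import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class FailSafeDemo {
    public static void main(String[] args) {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator(); // snapshot of [a, b]
        list.add("c");                         // modification after the iterator was created
        while (it.hasNext()) {
            System.out.println(it.next());     // prints only a and b, no exception
        }
        System.out.println(list);              // [a, b, c] - the change is visible in the list itself
    }
}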

Fast failures and safe failures are for iterators.

Fail-fast: while iterating over a collection, if another thread (or the current thread, outside the iterator) modifies the collection, a ConcurrentModificationException is thrown. Collections under java.util are fail-fast.

Fail-safe: a copy of the collection is made for the iteration, so modifications to the original collection do not affect the copy being traversed. Collections under java.util.concurrent are fail-safe.

How to avoid fail-fast

  • If you need to remove elements while traversing in a single thread, call the remove method of the Iterator/ListIterator instead of the collection's own remove method. Looking at the source of the iterator's remove method in ArrayList below, this method cannot delete an arbitrary element; it only removes the element the iterator is currently positioned on, and it resets expectedModCount so the traversal can continue (a usage sketch follows the list below).
public void remove() {
    if (lastRet < 0)
        throw new IllegalStateException();
    checkForComodification();

    try {
        SubList.this.remove(lastRet);
        cursor = lastRet;
        lastRet = -1;
        expectedModCount = ArrayList.this.modCount;  // keep the iterator's expected count in sync
    } catch (IndexOutOfBoundsException ex) {
        throw new ConcurrentModificationException();
    }
}
  • Use the classes in the concurrent package (java.util.concurrent) instead of ArrayList and HashMap
    • CopyOnWriteArrayList instead of ArrayList
    • ConcurrentHashMap instead of HashMap
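
A short usage sketch (my own example) of the iterator-based removal described above:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorRemoveDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if ("b".equals(it.next())) {
                it.remove(); // removes the current element and updates expectedModCount, so no exception
            }
        }
        System.out.println(list); // [a, c]
    }
}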

The difference between Iterator and Enumeration

In Java collections, we typically iterate through collections by “Iterator” or “Enumeration”.

public interface Enumeration<E> {
    boolean hasMoreElements();
    E nextElement();
}
public interface Iterator<E> {
    boolean hasNext();
    E next();
    void remove();
}
  • Their methods differ. Enumeration has only two methods, so with an Enumeration we can only read a collection's data, not modify it. Iterator has three methods, so an Iterator can both read data from a collection and delete elements.
  • Iterator supports fail-fast, but Enumeration does not. Enumeration is an interface added in JDK 1.0; the classes that use it, such as Vector and Hashtable, were also added in JDK 1.0, and its purpose was to provide a traversal interface for them. Enumeration itself does not add synchronization, but the Vector and Hashtable implementations behind it are synchronized. Iterator was added in JDK 1.2 to provide a traversal interface for collections such as HashMap and ArrayList, and it supports fail-fast: if the contents of the same collection are modified during iteration, a fail-fast event is raised (see the sketch below).
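
A small sketch (my own example) contrasting the two traversal interfaces:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

public class EnumVsIterDemo {
    public static void main(String[] args) {
        Vector<String> v = new Vector<>(Arrays.asList("a", "b", "c"));
        Enumeration<String> en = v.elements();      // read-only traversal
        while (en.hasMoreElements()) {
            System.out.println(en.nextElement());
        }

        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = list.iterator();      // supports remove()
        while (it.hasNext()) {
            if ("b".equals(it.next())) {
                it.remove();
            }
        }
        System.out.println(list);                   // [a, c]
    }
}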

What is the difference between the Comparable and Comparator interfaces?

There are two ways to sort collection objects or arrays in Java:

  • Object implements the Comparable interface

    • Comparable, under the java.lang package, is an interface with only one internal method compareTo()

      public interface Comparable<T> {
          public int compareTo(T o);
      }
    • Comparable allows the objects of the class implementing it to be compared according to the rules in the compareTo method. This order is called the natural order.

    • Lists or arrays whose elements implement the Comparable interface can be sorted with the Collections.sort() or Arrays.sort() methods

  • Define the Comparator and implement the Comparator interface

    • Comparator is an interface in the java.util package. Before JDK 1.8 it had only two methods:

      public interface Comparator<T> {
          public int compare(T lhs, T rhs);
          public boolean equals(Object object);
      }

Comparable is equivalent to an internal comparator, while Comparator is equivalent to an external comparator.

The difference between:

  • Comparator is under the java.util package, while Comparable is under the java.lang package

  • Comparable is implemented inside the class whose objects are being compared (internal comparison), while Comparator is implemented outside that class (external comparison).

  • A class that implements Comparable overrides the compareTo method and puts the comparison logic there; its objects can then be sorted directly with Collections.sort(list) or Arrays.sort(arr), and Collections.sort(list, Collections.reverseOrder()) sorts the list in reverse order.

  • To sort with a Comparator, you implement the compare method and pass the comparator to the sort call (see the sketch below).
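
A small combined sketch (my own example, using a hypothetical Person class) showing both approaches:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class: natural order by age via Comparable
class Person implements Comparable<Person> {
    final String name;
    final int age;
    Person(String name, int age) { this.name = name; this.age = age; }

    @Override
    public int compareTo(Person o) { return Integer.compare(this.age, o.age); }

    @Override
    public String toString() { return name + "(" + age + ")"; }
}

public class SortDemo {
    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Zed", 25));
        people.add(new Person("Amy", 30));

        Collections.sort(people);          // natural order by age (Comparable)
        System.out.println(people);        // [Zed(25), Amy(30)]

        // External order (Comparator): sort by name instead
        Collections.sort(people, new Comparator<Person>() {
            @Override
            public int compare(Person a, Person b) { return a.name.compareTo(b.name); }
        });
        System.out.println(people);        // [Amy(30), Zed(25)]
    }
}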

HashSet

A HashSet is a collection class that stores no duplicate elements, and it is unordered. The internal implementation of HashSet is based on HashMap and implements the Set interface.

As the constructors provided by HashSet show, each constructor simply creates a HashMap internally and does nothing else. The last constructor is not public, so it is not exposed to callers (it is the package-private constructor used by LinkedHashSet).

How does a HashSet check for duplicates

The bottom layer of a HashSet is in fact a HashMap: the Set element is stored as the key (K), while the value (V) is always the same dummy object. HashMap keys cannot repeat, and when a put uses an existing key the new value simply overwrites the old one and the old value is returned, so adding a duplicate element leaves the set unchanged.
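
This is visible in the HashSet source (an abridged sketch of the JDK 8 class, not a complete listing): add() delegates to the backing map with a shared PRESENT object as the value, and reports whether the key was new.

// Abridged from java.util.HashSet (JDK 8)
private transient HashMap<E,Object> map;

// Dummy value shared by every key in the backing map
private static final Object PRESENT = new Object();

public boolean add(E e) {
    // put returns null if the key was absent, so add returns true only for new elements
    return map.put(e, PRESENT) == null;
}

public boolean contains(Object o) {
    return map.containsKey(o);
}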

What is the difference between Iterator and ListIterator?

  • Iterator can be used to iterate over both Sets and Lists, whereas ListIterator can only iterate over Lists
  • ListIterator has an add() method for inserting elements into a List, whereas Iterator does not
  • Both ListIterator and Iterator have hasNext() and next() for forward traversal, but ListIterator also has hasPrevious() and previous() for backward traversal, which Iterator lacks
  • ListIterator can report the current position via nextIndex() and previousIndex(); Iterator cannot
  • Both can remove elements, but ListIterator can also modify the current element via set(), while Iterator can only traverse and remove, not modify (see the sketch below)
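
A small usage sketch of ListIterator (my own example):

import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

public class ListIteratorDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");

        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            int idx = it.nextIndex();      // position before moving forward
            String value = it.next();
            if ("b".equals(value)) {
                it.set("B");               // modify the current element in place
            }
            System.out.println(idx + " -> " + value);
        }
        while (it.hasPrevious()) {         // traverse backwards from the end
            System.out.println(it.previous());
        }
        System.out.println(list);          // [a, B]
    }
}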

Reference and thanks

All content is based on reading the source code and consolidating notes from earlier study.

www.javatpoint.com/java-arrayl…

www.runoob.com/java/java-c…

www.javazhiyin.com/21717.html

yuqirong.me/2018/01/31/…

youzhixueyuan.com/the-underly…

Detailed analysis of the HashMap source www.tianxiaobo.com/2018/01/18/…

“Analysis of ConcurrentHashMap1.7 source” www.cnblogs.com/chengxiao/p…

www.justdojava.com/2019/12/18/…