Java Collections Framework Internals

Authors

Introduction

There are many books and materials about the C++ Standard Template Library (STL), but surprisingly little about the Java Collections Framework (JCF). It is hard to find even a single book dedicated to it, which can be a real headache for Java learners, and I am puzzled as to why. Although the design of the JCF was influenced by the STL, it is positioned not as a Java version of the STL but as a compact container framework, and an introduction to the STL is no substitute for an introduction to the JCF.

This series of articles analyzes List, Set, Map, Stack, Queue and the other typical containers in the JCF at the level of data structures and algorithms, combining illustrations with source code, to help readers build a clear and in-depth understanding of the Java collections framework. It does not specifically cover features of the Java language, but explains them briefly where necessary.

Contents

The specific arrangement is as follows:

  1. Overview gives a basic introduction to the Java Collections Framework and relevant features of the Java language.
  2. ArrayList explains ArrayList with its source code.
  3. LinkedList explains LinkedList with its source code.
  4. Stack and Queue uses ArrayDeque as an example to describe stacks and queues.
  5. TreeSet and TreeMap explains TreeSet and TreeMap with their source code.
  6. HashSet and HashMap explains HashSet and HashMap with their source code.
  7. LinkedHashSet and LinkedHashMap explains LinkedHashSet and LinkedHashMap with their source code.
  8. PriorityQueue explains PriorityQueue with its source code.
  9. WeakHashMap gives a basic introduction to WeakHashMap.

Overview

Containers are objects that can hold other Java objects. The Java Collections Framework (JCF), available since JDK 1.2, provides common container implementations for Java developers, with the following benefits:

  • Reduce programming difficulty
  • Improve program performance
  • Improve interoperability between APIs
  • Reduce the difficulty of learning
  • Reduce the difficulty of designing and implementing related APIs
  • Increase program reuse

Java containers can only hold objects. Primitive types (int, long, float, double, etc.) need to be wrapped in their object types (Integer, Long, Float, Double, etc.) before they can be placed in a container. In many cases boxing and unboxing happen automatically. This introduces some extra performance and space overhead, but simplifies design and programming.
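As a small illustration (not from the original article's listings), the following sketch stores primitive int values in a container and lets the compiler box and unbox them automatically:

import java.util.ArrayList;
import java.util.List;

public class AutoboxingDemo {
    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(1);             // the int 1 is autoboxed to an Integer
        numbers.add(2);
        int first = numbers.get(0); // the Integer is automatically unboxed back to int
        System.out.println(first + numbers.get(1)); // prints 3
    }
}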

Generics

On the surface, a Java container can hold objects of any type thanks to the generics mechanism. Java generics are nothing magical: they are just "syntactic sugar" provided by the compiler. Generics need no support from the Java virtual machine; type parameters are simply erased at compile time. In essence, Java's single-rooted inheritance hierarchy is what makes this possible: because every object is a subclass of Object, a container only needs to be able to store Object references. Internally, all containers do store Objects; generics simply simplify programming by letting the compiler insert the casts for us. JDK 1.4 and earlier versions do not support generics, so type conversions had to be written explicitly by the programmer.

// JDK 1.4 or before
ArrayList list = new ArrayList();
list.add(new String("Monday"));
list.add(new String("Tuesday"));
list.add(new String("Wednesday"));
for(int i = 0; i < list.size(); i++){
    String weekday = (String)list.get(i);// Explicit type conversion
    System.out.println(weekday.toUpperCase());
}
// JDK 1.5 or later
ArrayList<String> list = new ArrayList<String>();// Parameterized type
list.add(new String("Monday"));
list.add(new String("Tuesday"));
list.add(new String("Wednesday"));
for(int i = 0; i < list.size(); i++){
    String weekday = list.get(i);// Implicit type conversion, done automatically by the compiler
    System.out.println(weekday.toUpperCase());
}

Memory management

Unlike C++ with its complex memory management, the Java GC handles everything automatically, so Java programs do not have to deal with memory headaches and the JCF does not need a dedicated allocator the way the C++ STL does. In addition, since Java objects live on the heap and can only be accessed by reference, a container actually holds references to objects rather than the objects themselves, so the copying problems of C++ containers do not arise.

Interfaces and Implementations

Interfaces

To standardize the behavior and design of containers, JCF defines 14 collection interfaces. Their relationships are shown in the following figure:

(Figure: the JCF interface hierarchy, rooted at Collection and Map, including sub-types such as Queue and Deque, plus the legacy Stack class.)

Implementations

The general implementation of the above interfaces is shown in the following table:

Interfaces    Hash Table    Resizable Array    Balanced Tree    Linked List    Hash Table + Linked List
Set           HashSet                          TreeSet                         LinkedHashSet
List                        ArrayList                           LinkedList
Deque                       ArrayDeque                          LinkedList
Map           HashMap                          TreeMap                         LinkedHashMap

In the following sections, the data structures of the containers in the above table and the algorithms used are described one by one.

Iterators

Like the C++ STL, JCF’s Iterator gives us a way to traverse the elements of a container. Only the container itself knows how the elements in the container are organized, so iterators can only be obtained from the container itself. Each container implements its own iterators in the form of inner classes. JCF iterators are easier to use than STL iterators.

//visit a list with iterator
ArrayList<String> list = new ArrayList<String>();
list.add(new String("Monday"));
list.add(new String("Tuesday"));
list.add(new String("Wednesday"));
Iterator<String> it = list.iterator();// Get the iterator
while(it.hasNext()){
    String weekday = it.next();// Access elements
    System.out.println(weekday.toUpperCase());
}

JDK 1.5 introduced enhanced for loops to simplify writing when iterating over containers.

// Use enhanced for iteration
ArrayList<String> list = new ArrayList<String>();
list.add(new String("Monday"));
list.add(new String("Tuesday"));
list.add(new String("Wednesday"));
for(String weekday : list){//enhanced for statement
    System.out.println(weekday.toUpperCase());
}

The source code

The src.zip file in the JDK installation directory contains the source code of the Java core APIs; this article uses the JDK 1.7u79 source code.

ArrayList

General introduction

ArrayList implements the List interface. It is a sequential container, that is, elements are kept in the order in which they were inserted, and null elements are allowed. This class is roughly equivalent to Vector, except that it is not synchronized. Each ArrayList has a capacity, which is the actual size of the underlying array; the number of elements stored in the container cannot exceed it. When elements are added and the capacity is insufficient, the container automatically enlarges the underlying array. As mentioned earlier, Java generics are only syntactic sugar provided by the compiler, so the underlying array is an Object[] in order to hold objects of any type.
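If the eventual number of elements is roughly known, repeated expansion can be avoided by pre-sizing the list; a minimal sketch (the size 10_000 is arbitrary):

import java.util.ArrayList;
import java.util.List;

public class PreSizingDemo {
    public static void main(String[] args) {
        // Allocate the backing array large enough up front to avoid repeated grow() calls
        List<Integer> squares = new ArrayList<Integer>(10_000);
        for (int i = 0; i < 10_000; i++) {
            squares.add(i * i);
        }
        System.out.println(squares.size()); // 10000
    }
}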

The size(), isEmpty(), get() and set() methods all run in constant time. The cost of add() depends on the insertion position, and the cost of addAll() is proportional to the number of elements added. Most of the remaining methods run in linear time.

For efficiency, ArrayList is not synchronized. If multiple threads need to access it concurrently, the caller must either synchronize manually (for example by wrapping the list with Collections.synchronizedList()) or use Vector instead.
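A minimal sketch of the wrapping approach, using Collections.synchronizedList; note that iteration must still be guarded manually while other threads may modify the list:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedListDemo {
    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<String>());
        list.add("Monday");   // individual calls are synchronized by the wrapper
        list.add("Tuesday");

        // Iteration must be guarded explicitly, per the Collections documentation
        synchronized (list) {
            for (String day : list) {
                System.out.println(day.toUpperCase());
            }
        }
    }
}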

Method analysis

set()

Since the underlying storage of ArrayList is an array, set() is trivial: it simply assigns the value to the specified position of the array.

public E set(int index, E element) {
    rangeCheck(index);// subscript out of bounds check
    E oldValue = elementData(index);
    elementData[index] = element;// Assign to the specified location and copy only the reference
    return oldValue;
}

get()

The get() method is also simple; the only thing to note is that, since the underlying array is an Object[], the element needs to be cast to the parameterized type.

public E get(int index) {
    rangeCheck(index);
    return (E) elementData[index];// Note the type conversion
}

add()

Unlike C++'s vector, ArrayList has no push_back(); the corresponding method is add(E e). It also has no insert(); the corresponding method is add(int index, E element). Both methods add a new element to the container, which may cause the capacity to run out, so before adding an element the remaining space is checked and the capacity is automatically expanded if necessary. The expansion is ultimately done by the grow() method.

private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);// 1.5 times as much as before
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    elementData = Arrays.copyOf(elementData, newCapacity);// Expand space and copy
}

Since the Java GC manages memory automatically, there is no need to free the original array manually here.

Once the problem of space is solved, the insertion process is very simple.

add(int index, E element) needs to shift the elements after the insertion point before storing the new one, which gives the method linear time complexity.
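A sketch of what add(int index, E element) looks like, modeled on the JDK 7 source (helper names such as rangeCheckForAdd and ensureCapacityInternal may differ slightly between JDK updates):

public void add(int index, E element) {
    rangeCheckForAdd(index);               // index may equal size here (append at the end)
    ensureCapacityInternal(size + 1);      // grow() is called from here if needed
    // Shift everything from index one slot to the right -- the linear-time part
    System.arraycopy(elementData, index, elementData, index + 1, size - index);
    elementData[index] = element;
    size++;
}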

addAll()

The addAll() method can add several elements at a time. It has two versions depending on the insertion position: addAll(Collection<? extends E> c) appends at the end, and addAll(int index, Collection<? extends E> c) inserts from the specified position. Like add(), space is checked before insertion and the array is automatically expanded if necessary; when inserting at a specified position, existing elements also have to be shifted. The time complexity of addAll() therefore depends not only on how many elements are inserted, but also on where they are inserted.

remove()

The remove() method also has two versions: remove(int index) removes the element at the specified position, and remove(Object o) removes the first element that satisfies o.equals(elementData[index]). Deletion is the reverse of add(): the elements after the deletion point have to be shifted one position forward. Note that, to let the GC do its work, the last position must be explicitly set to null.

public E remove(int index) {
    rangeCheck(index);
    modCount++;
    E oldValue = elementData(index);
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index, numMoved);
    elementData[--size] = null; // Clear references to that location and let GC work
    return oldValue;
}

A special note about the Java GC: having a garbage collector does not mean there can be no memory leaks. Whether an object can be collected depends on whether references to it still exist. If the code above did not assign null manually, the removed object would not be collected until that array slot was overwritten by another element.
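For completeness, a sketch of the remove(Object o) overload described above, modeled on the JDK 7 source; fastRemove() is the private helper that performs the same shifting as remove(int index) without the bounds check:

public boolean remove(Object o) {
    if (o == null) {                              // null elements are allowed, so they need a separate scan
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {   // linear search for the first match
                fastRemove(index);
                return true;
            }
    }
    return false;
}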

LinkedList

General introduction

LinkedList implements both the List and the Deque interface, which means it can be viewed as a sequential container, as a queue, or as a stack. In that sense LinkedList is an all-rounder. When you need a stack or a queue, you can consider LinkedList, partly because the Stack class is officially discouraged and, unfortunately, there is no class in Java called Queue (Queue is an interface name). That said, the current preferred choice for a stack or queue is ArrayDeque, which performs better than LinkedList when used as a stack or queue.

LinkedList is implemented as a doubly linked list. This section focuses on how that list is maintained when elements are inserted and deleted, i.e. the operations directly related to the List interface; the Queue, Stack and Deque aspects are discussed in the next section. Each node of the doubly linked list is represented by the inner class Node. LinkedList refers to the first and last elements of the list through the first and last references respectively. Note that there is no dummy node: when the list is empty, first and last both point to null.

// The Node inner class
private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}

The way LinkedList is implemented means that all index-based operations take linear time, while deleting an element at the head or tail takes only constant time. In pursuit of efficiency LinkedList is not synchronized; if multiple threads need to access it concurrently, it can first be wrapped with the Collections.synchronizedList() method.

Method analysis

add()

The add() method has two versions. add(E e) inserts the element at the end of the LinkedList; because last points to the tail of the list, appending takes constant time and only a few references need to be changed. The other version, add(int index, E element), inserts the element at the specified position; it first finds that position by a linear search and then modifies the references to complete the insertion.

As the figure above suggests, the logic of add(E e) is very simple.

//add(E e)
public boolean add(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;// The list is empty. This is the first element to be inserted
    else
        l.next = newNode;
    size++;
    return true;
}

The logic of add(int index, E element) is slightly more complicated and can be divided into two steps: 1. find the insertion position by index; 2. modify the references to complete the insertion.

//add(int index, E element)
public void add(int index, E element) {
    checkPositionIndex(index);//index >= 0 && index <= size;
    if (index == size)// The insertion position is at the end, including when the list is empty
        add(element);
    else {
        Node<E> succ = node(index);//1. Find the insertion position by index
        //2. Modify the references to complete the insertion
        final Node<E> pred = succ.prev;
        final Node<E> newNode = new Node<>(pred, element, succ);
        succ.prev = newNode;
        if (pred == null)// Insert at position 0
            first = newNode;
        else
            pred.next = newNode;
        size++;
    }
}

The node(int index) function uses a small trick: because the list is doubly linked, it can be searched from the front or from the back, depending on the condition index < (size >> 1), i.e. whether the index is closer to the front or to the back.
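A sketch of node(int index), modeled on the JDK 7 source:

Node<E> node(int index) {
    if (index < (size >> 1)) {        // index is in the front half: walk forward from first
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {                          // index is in the back half: walk backward from last
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}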

remove()

The remove() method also has two versions: remove(Object o) removes the first element equal to the specified object, and remove(int index) removes the element at the specified index.

Both deletions proceed in two steps: 1. find the reference to the node to be deleted; 2. modify the references to complete the deletion. remove(Object o) finds the node by calling equals() on each element, while remove(int index) counts along the list by index; both searches take linear time. In step 2, both remove() methods delegate to the unlink(Node<E> x) method, which has to handle the boundary cases where the deleted node is the first or the last one.

//unlink(Node<E> x), delete the given node
E unlink(Node<E> x) {
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;
    if (prev == null) {// Delete the first element
        first = next;
    } else {
        prev.next = next;
        x.prev = null;
    }
    if (next == null) {// Delete the last element
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;
    }
    x.item = null;//let GC work
    size--;
    return element;
}

get()

get(int index) obtains the element at the specified index by calling the node(int index) method mentioned above.

public E get(int index) {
    checkElementIndex(index);//index >= 0 && index < size;
    return node(index).item;
}

set()

The set(int index, E element) method modifies the element at the specified subscript to the specified value.

public E set(int index, E element) {
    checkElementIndex(index);
    Node<E> x = node(index);
    E oldVal = x.item;
    x.item = element;// Replace the new value
    return oldVal;
}

Stack and Queue

Preface

Java has a class called Stack, but no class called Queue (Queue is an interface name). When a stack is needed, Java recommends the more efficient ArrayDeque instead of Stack. Since Queue is only an interface, ArrayDeque is likewise the preferred choice when a queue is needed (over LinkedList).

General introduction

To talk about stacks and queues, we start with the Deque interface. Deque stands for "double ended queue"; it can be used either as a stack or as a queue. The following table lists the Queue methods and the corresponding Deque methods:

Queue Method    Equivalent Deque Method    Description
add(e)          addLast(e)                 Inserts an element at the tail of the queue; throws an exception on failure
offer(e)        offerLast(e)               Inserts an element at the tail of the queue; returns false on failure
remove()        removeFirst()              Gets and removes the head element of the queue; throws an exception on failure
poll()          pollFirst()                Gets and removes the head element of the queue; returns null on failure
element()       getFirst()                 Gets, but does not remove, the head element of the queue; throws an exception on failure
peek()          peekFirst()                Gets, but does not remove, the head element of the queue; returns null on failure

The following table lists the Stack methods and the corresponding Deque methods:

Stack Method    Equivalent Deque Method    Description
push(e)         addFirst(e)                Pushes an element onto the top of the stack; throws an exception on failure
(none)          offerFirst(e)              Pushes an element onto the top of the stack; returns false on failure
pop()           removeFirst()              Gets and removes the top element of the stack; throws an exception on failure
(none)          pollFirst()                Gets and removes the top element of the stack; returns null on failure
peek()          peekFirst()                Gets, but does not remove, the top element of the stack; throws an exception on failure
(none)          peekFirst()                Gets, but does not remove, the top element of the stack; returns null on failure

Together, the two tables above define twelve Deque methods. The methods for adding, removing and examining elements come in two flavors with the same functionality; the difference is how failure is handled. One set throws an exception on failure, the other returns a special value (false or null). Unless an implementation has a capacity limit, the add operations rarely fail. Although Deque has as many as twelve methods, they do nothing more than add, remove or examine elements at the two ends of the container; once you see that, they are easy to remember.
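A minimal usage sketch showing one ArrayDeque used as a stack and another used as a queue:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

public class DequeDemo {
    public static void main(String[] args) {
        // Used as a stack: push/pop/peek operate on the head
        Deque<Integer> stack = new ArrayDeque<Integer>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop());   // 2 (last in, first out)

        // Used as a queue: offer adds at the tail, poll removes from the head
        Queue<Integer> queue = new ArrayDeque<Integer>();
        queue.offer(1);
        queue.offer(2);
        System.out.println(queue.poll());  // 1 (first in, first out)
    }
}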

ArrayDeque and LinkedList are the two general-purpose implementations of Deque. Since ArrayDeque is the officially recommended choice for stacks and queues, and LinkedList was explained in the previous article, this article focuses on the implementation of ArrayDeque.

As the name implies, ArrayDeque is backed by an array. To allow elements to be inserted and removed at both ends, the array is used as a circular array: any position can serve as the logical start or end. ArrayDeque is not thread-safe and requires manual synchronization when used by multiple threads. In addition, this container does not allow null elements.

As the figure above shows, head points to the first valid element at the front of the deque, and tail points to the first free slot at the back where an element can be inserted. head is not necessarily 0, and tail is not necessarily greater than head.

Method analysis

addFirst()

addFirst(E e) inserts the element at the front of the Deque, i.e. just before head. If there is enough space and the index stays in bounds, it is simply elements[--head] = e.

In practice two things need to be considered: 1. whether there is enough space; 2. whether the index wraps around. For example, if addFirst() is called when head is 0, there may be plenty of free slots, but head - 1 is -1 and the index is out of bounds. The code below solves both problems elegantly.

//addFirst(E e)
public void addFirst(E e) {
    if (e == null)// Null is not allowed
        throw new NullPointerException();
    elements[head = (head - 1) & (elements.length - 1)] = e;//2. Whether the subscript is out of bounds
    if (head == tail)//1. Is there enough space
        doubleCapacity();/ / capacity
}

As the code above shows, the space check happens after the insertion. This works because tail always points to the next free slot, which means the elements array always has at least one empty slot, so there is no need to worry about space before inserting an element.

head = (head - 1) & (elements.length - 1) handles the wrap-around nicely: if head - 1 is negative (it can only be -1), ANDing it with elements.length - 1 brings it back to elements.length - 1, because elements.length is always a power of two. The expression is equivalent to taking the result modulo elements.length.
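A tiny sketch demonstrating the wrap-around arithmetic (16 stands in for elements.length; the class and values are illustrative only):

public class WrapAroundDemo {
    public static void main(String[] args) {
        int length = 16;                     // ArrayDeque capacities are always powers of two
        int head = 0;
        head = (head - 1) & (length - 1);    // -1 & 15 == 15: wraps to the last slot
        System.out.println(head);            // 15

        int tail = 15;
        tail = (tail + 1) & (length - 1);    // 16 & 15 == 0: wraps to the first slot
        System.out.println(tail);            // 0
    }
}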

Let’s talk about the doubleCapacity() function. The logic is to apply for a larger array (twice as large as the original one) and then copy it over. The process is shown in the figure below:

As the figure shows, the copy happens in two steps: first the elements to the right of head are copied, then the elements to the left of head.

//doubleCapacity()
private void doubleCapacity() {
    assert head == tail;
    int p = head;
    int n = elements.length;
    int r = n - p; // The number of elements to the right of head
    int newCapacity = n << 1;// Double the original space
    if (newCapacity < 0)
        throw new IllegalStateException("Sorry, deque too big");
    Object[] a = new Object[newCapacity];
    System.arraycopy(elements, p, a, 0, r);// Copy the right half, corresponding to the green part in the figure above
    System.arraycopy(elements, 0, a, r, p);// Copy the left half, corresponding to the gray part in the figure above
    elements = (E[])a;
    head = 0;
    tail = n;
}

addLast()

addLast(E e) inserts the element at the end of the Deque, i.e. at the slot tail points to. Since tail always points to the next free slot, the insertion itself is simply elements[tail] = e. The space is checked after the insertion; if the array is now full, doubleCapacity() is called to expand it.

public void addLast(E e) {
    if (e == null)// Null is not allowed
        throw new NullPointerException();
    elements[tail] = e;/ / assignment
    if ( (tail = (tail + 1) & (elements.length - 1)) == head)// subscript out of bounds
        doubleCapacity();/ / capacity
}

The wrap-around handling of the index was described for addFirst() and is not repeated here.

pollFirst()

pollFirst() removes and returns the first element of the Deque, i.e. the element at position head. If the container is not empty, it simply returns elements[head] and then adjusts head, taking the wrap-around into account. Since null is not allowed in ArrayDeque, elements[head] == null means the container is empty.

public E pollFirst() {
    int h = head;
    E result = elements[h];
    if (result == null)// A null value means that the deque is empty
        return null;
    elements[h] = null;//let GC work
    head = (h + 1) & (elements.length - 1);// handle the wrap-around
    return result;
}

pollLast()

pollLast() removes and returns the element at the end of the Deque, i.e. the element just before position tail.

public E pollLast() {
    int t = (tail - 1) & (elements.length - 1);// The slot just before tail holds the last element
    E result = elements[t];
    if (result == null)// A null value means that the deque is empty
        return null;
    elements[t] = null;//let GC work
    tail = t;
    return result;
}

peekFirst()

peekFirst() returns, but does not remove, the element at the front of the Deque, i.e. elements[head].

public E peekFirst() {
    return elements[head]; // elements[head] is null if deque empty
}

peekLast()

peekLast() returns, but does not remove, the element at the end of the Deque, i.e. the element just before position tail.

public E peekLast() {
    return elements[(tail - 1) & (elements.length - 1)];
}

TreeSet and TreeMap

General introduction

TreeSet and TreeMap are presented together because they share the same implementation in Java: the former is just a wrapper around the latter, i.e. a TreeSet contains a TreeMap (adapter pattern). Therefore, this article focuses on TreeMap.

Java's TreeMap implements the SortedMap interface, which means the entries in the map are ordered by key. The key ordering can come from the keys' natural ordering or from a Comparator passed in at construction time.
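A minimal sketch showing both options with String keys (class and variable names are illustrative):

import java.util.Collections;
import java.util.TreeMap;

public class TreeMapOrderingDemo {
    public static void main(String[] args) {
        // Natural ordering of the keys (String implements Comparable)
        TreeMap<String, Integer> natural = new TreeMap<String, Integer>();
        natural.put("banana", 2);
        natural.put("apple", 1);
        System.out.println(natural.firstKey()); // apple

        // Ordering supplied by a Comparator passed to the constructor
        TreeMap<String, Integer> reversed =
                new TreeMap<String, Integer>(Collections.<String>reverseOrder());
        reversed.putAll(natural);
        System.out.println(reversed.firstKey()); // banana
    }
}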

The underlying TreeMap is implemented as a red-black tree, which means that containsKey(), get(), put() and remove() all run in log(n) time. The algorithm follows the presentation in Introduction to Algorithms.

For performance reasons, TreeMap is not synchronized and requires manual synchronization by the programmer if it needs to be used in a multithreaded environment. Alternatively, TreeMap can be wrapped synchronously as follows:

SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));

A red-black tree is an approximately balanced binary search tree: it ensures that, for any node, the taller of its two subtrees is at most twice the height of the shorter one. Concretely, a red-black tree is a binary search tree that satisfies the following conditions:

  1. Every node is either red or black.
  2. The root node is black.
  3. Red nodes cannot be adjacent (i.e. a red node can have neither a red parent nor a red child).
  4. Every path from a node down to a null leaf contains the same number of black nodes.

When the structure of the tree changes (an insertion or deletion), condition 3 or 4 above may be violated, and the tree has to be adjusted so that it satisfies the red-black constraints again.

Preliminary knowledge

As mentioned above, when the structure of the search tree changes, the red-black constraints may be violated and the tree must be adjusted to satisfy them again. Adjustments come in two kinds: recoloring, i.e. changing the color of some node, and structural adjustment, i.e. changing the parent/child relationships in the tree. Structural adjustment is built from two basic operations: left rotation and right rotation.

Left rotation

A left rotation on node x rotates x's right child counterclockwise around x, so that the right child becomes x's parent, while the relevant references are updated. After the rotation the binary-search-tree property still holds.

The left-handed code in TreeMap is as follows:

//Rotate Left
private void rotateLeft(Entry<K,V> p) {
    if (p != null) {
        Entry<K,V> r = p.right;
        p.right = r.left;
        if (r.left != null)
            r.left.parent = p;
        r.parent = p.parent;
        if (p.parent == null)
            root = r;
        else if (p.parent.left == p)
            p.parent.left = r;
        else
            p.parent.right = r;
        r.left = p;
        p.parent = r;
    }
}

Right rotation

A right rotation on node x rotates x's left child clockwise around x, so that the left child becomes x's parent, while the relevant references are updated. After the rotation the binary-search-tree property still holds.

The right-handed code in TreeMap is as follows:

//Rotate Right
private void rotateRight(Entry<K,V> p) {
    if (p != null) {
        Entry<K,V> l = p.left;
        p.left = l.right;
        if (l.right != null)
            l.right.parent = p;
        l.parent = p.parent;
        if (p.parent == null)
            root = l;
        else if (p.parent.right == p)
            p.parent.right = l;
        else
            p.parent.left = l;
        l.right = p;
        p.parent = l;
    }
}

Finding a node's successor

For a binary search tree, the successor of a given node t (the smallest element in the tree that is larger than t) can be found as follows:

  1. If t's right subtree is not empty, then t's successor is the smallest element in its right subtree.
  2. If t's right subtree is empty, then t's successor is the lowest ancestor of t whose left subtree contains t.

Subsequent nodes will be used in the deletion operation of the red-black tree.

The successor function in TreeMap is as follows:

// successor(), find the successor of node t
static <K,V> TreeMap.Entry<K,V> successor(Entry<K,V> t) {
    if (t == null)
        return null;
    else if (t.right != null) {// 1. If t's right subtree is not empty, the successor is the smallest element in that subtree
        Entry<K,V> p = t.right;
        while (p.left != null)
            p = p.left;
        return p;
    } else {// 2. If t's right subtree is empty, the successor is the lowest ancestor whose left subtree contains t
        Entry<K,V> p = t.parent;
        Entry<K,V> ch = t;
        while (p != null && ch == p.right) {
            ch = p;
            p = p.parent;
        }
        return p;
    }
}

Method analysis

get()

The get(Object key) method returns the value associated with the given key. It calls getEntry(Object key) to find the corresponding entry and then returns entry.value, so getEntry() is the heart of the algorithm. The idea is to search down the binary search tree according to the keys' natural order (or the comparator's order) until an entry satisfying k.compareTo(p.key) == 0 is found.

The specific code is as follows:

// getEntry() method
final Entry<K,V> getEntry(Object key) {
    ...
    if (key == null)// The key must not be null
        throw new NullPointerException();
    Comparable<? super K> k = (Comparable<? super K>) key;// Use the natural ordering of the keys
    Entry<K,V> p = root;
    while (p != null) {
        int cmp = k.compareTo(p.key);
        if (cmp < 0)// Go left
            p = p.left;
        else if (cmp > 0)// Go right
            p = p.right;
        else
            return p;
    }
    return null;
}

put()

The put(K key, V value) method adds the given key-value pair to the map. It first searches the map, similarly to getEntry(), to see whether the key is already present; if so, the existing entry's value is updated and the old value is returned. Otherwise a new entry is inserted into the red-black tree, and if the insertion violates the red-black constraints, adjustments (rotations and recoloring of some nodes) are required.

public V put(K key, V value) {
    ...
    int cmp;
    Entry<K,V> parent;
    if (key == null)
        throw new NullPointerException();
    Comparable<? super K> k = (Comparable<? super K>) key;// Use the natural ordering of the keys
    do {
        parent = t;
        cmp = k.compareTo(t.key);
        if (cmp < 0) t = t.left;// Go left
        else if (cmp > 0) t = t.right;// Go right
        else return t.setValue(value);
    } while (t != null);
    Entry<K,V> e = new Entry<>(key, value, parent);// Create and insert a new entry
    if (cmp < 0) parent.left = e;
    else parent.right = e;
    fixAfterInsertion(e);// Adjust
    size++;
    return null;
}

The insertion part of the code above is not hard to understand: first find the proper position in the red-black tree, then create a new entry and insert it (the newly inserted node is always a leaf). The difficult part is the adjustment function fixAfterInsertion(), which uses only two kinds of operations: 1. changing the color of some nodes; 2. rotating some nodes.

fixAfterInsertion() uses the rotateLeft() and rotateRight() functions described above. As the code shows, case 2 actually falls through into case 3. Cases 4 to 6 are symmetric to the first three, so they are not drawn in the figure; readers can follow them from the code.

// fixAfterInsertion()
private void fixAfterInsertion(Entry<K,V> x) {
    x.color = RED;
    while (x != null && x != root && x.parent.color == RED) {
        if (parentOf(x) == leftOf(parentOf(parentOf(x)))) {
            Entry<K,V> y = rightOf(parentOf(parentOf(x)));
            if (colorOf(y) == RED) {
                setColor(parentOf(x), BLACK);              // case 1
                setColor(y, BLACK);                        // case 1
                setColor(parentOf(parentOf(x)), RED);      // case 1
                x = parentOf(parentOf(x));                 // case 1
            } else {
                if (x == rightOf(parentOf(x))) {
                    x = parentOf(x);                       // case 2
                    rotateLeft(x);                         // case 2
                }
                setColor(parentOf(x), BLACK);              // case 3
                setColor(parentOf(parentOf(x)), RED);      // case 3
                rotateRight(parentOf(parentOf(x)));        // case 3
            }
        } else {
            Entry<K,V> y = leftOf(parentOf(parentOf(x)));
            if (colorOf(y) == RED) {
                setColor(parentOf(x), BLACK);              // case 4
                setColor(y, BLACK);                        // case 4
                setColor(parentOf(parentOf(x)), RED);      // case 4
                x = parentOf(parentOf(x));                 // case 4
            } else {
                if (x == leftOf(parentOf(x))) {
                    x = parentOf(x);                       // case 5
                    rotateRight(x);                        // case 5
                }
                setColor(parentOf(x), BLACK);              // case 6
                setColor(parentOf(parentOf(x)), RED);      // case 6
                rotateLeft(parentOf(parentOf(x)));         // case 6
            }
        }
    }
    root.color = BLACK;
}

remove()

remove(Object key) deletes the entry corresponding to the given key. The method first uses the getEntry(Object key) method described above to find that entry, and then calls deleteEntry(Entry<K,V> entry) to delete it. Because deletion changes the structure of the red-black tree and may break its constraints, adjustments may be required afterwards.

getEntry() was explained earlier, so here we focus on deleteEntry(). This function deletes the specified entry and calls fixAfterDeletion(Entry<K,V> x) when the red-black tree constraints are violated.

Since a red-black tree is an augmented binary search tree, its deletion operation is very similar to deletion in an ordinary binary search tree; the only difference is that the tree may need to be adjusted after the node is removed. Consider first the deletion of a node p from an ordinary binary search tree, which splits into two cases:

  1. Both subtrees of the deletion point p are empty, or only one of them is non-empty.
  2. Both subtrees of the deletion point p are non-empty.

Case 1 is simple: either delete p directly (both subtrees empty), or replace p with its non-empty subtree (only one subtree non-empty). For case 2, p can be replaced by its successor s (the smallest element greater than p), and then s is deleted using case 1 (s is guaranteed to fall under case 1, as a quick sketch will show).

Based on the above logic, the red-black tree node deletion function deleteEntry() looks like this:

// Red-black tree entry deletion function deleteEntry()
private void deleteEntry(Entry<K,V> p) {
    modCount++;
    size--;
    if (p.left != null && p.right != null) {// 2. Both subtrees of the deletion point p are non-empty
        Entry<K,V> s = successor(p);// The successor
        p.key = s.key;
        p.value = s.value;
        p = s;
    }
    Entry<K,V> replacement = (p.left != null ? p.left : p.right);
    if (replacement != null) {// 1. Only one subtree of the deletion point p is non-empty
        replacement.parent = p.parent;
        if (p.parent == null)
            root = replacement;
        else if (p == p.parent.left)
            p.parent.left  = replacement;
        else
            p.parent.right = replacement;
        p.left = p.right = p.parent = null;
        if (p.color == BLACK)
            fixAfterDeletion(replacement);// Adjust
    } else if (p.parent == null) {
        root = null;
    } else { // 1. Both subtrees of the deletion point p are empty
        if (p.color == BLACK)
            fixAfterDeletion(p);// Adjust
        if (p.parent != null) {
            if (p == p.parent.left)
                p.parent.left = null;
            else if (p == p.parent.right)
                p.parent.right = null;
            p.parent = null;
        }
    }
}

Much of the code above just updates the references between parent and child nodes, and the logic is not hard to follow. The focus here is the function fixAfterDeletion(). First, which deletions trigger an adjustment? The adjustment is needed only when the deleted node is BLACK, because removing a RED node cannot break any red-black constraint, while removing a BLACK node breaks rule 4.

As in fixAfterInsertion(), the adjustment is divided into cases. Remember: however many cases there are, only two concrete operations are used: 1. changing the color of some nodes; 2. rotating some nodes.

The general idea of the diagram above is to first convert case 1 into case 2, or into cases 3 and 4 (the adjustment does not necessarily start from case 1). a) If case 2 is entered directly from case 1, the loop must exit after case 2 (because x is red); b) once cases 3 and 4 are entered, the loop must exit (because x becomes the root).

The code of fixAfterDeletion() is shown below, again using the rotateLeft() and rotateRight() functions introduced above. As the code shows, case 3 actually falls through into case 4. Cases 5 to 8 are symmetric to the first four, so they are not drawn in the figure; readers can follow them from the code.

private void fixAfterDeletion(Entry<K,V> x) {
    while (x != root && colorOf(x) == BLACK) {
        if (x == leftOf(parentOf(x))) {
            Entry<K,V> sib = rightOf(parentOf(x));
            if (colorOf(sib) == RED) {
                setColor(sib, BLACK);                   // case 1
                setColor(parentOf(x), RED);             // case 1
                rotateLeft(parentOf(x));                // case 1
                sib = rightOf(parentOf(x));             // case 1
            }
            if (colorOf(leftOf(sib))  == BLACK &&
                colorOf(rightOf(sib)) == BLACK) {
                setColor(sib, RED);                     // case 2
                x = parentOf(x);                        // case 2
            } else {
                if (colorOf(rightOf(sib)) == BLACK) {
                    setColor(leftOf(sib), BLACK);       // case 3
                    setColor(sib, RED);                 // case 3
                    rotateRight(sib);                   // case 3
                    sib = rightOf(parentOf(x));         // case 3
                }
                setColor(sib, colorOf(parentOf(x)));    // case 4
                setColor(parentOf(x), BLACK);           // case 4
                setColor(rightOf(sib), BLACK);          // case 4
                rotateLeft(parentOf(x));                // case 4
                x = root;                               // case 4
            }
        } else { // The symmetric counterparts of the first four cases
            Entry<K,V> sib = leftOf(parentOf(x));
            if (colorOf(sib) == RED) {
                setColor(sib, BLACK);                   // case 5
                setColor(parentOf(x), RED);             // case 5
                rotateRight(parentOf(x));               // case 5
                sib = leftOf(parentOf(x));              // case 5
            }
            if (colorOf(rightOf(sib)) == BLACK &&
                colorOf(leftOf(sib)) == BLACK) {
                setColor(sib, RED);                     // case 6
                x = parentOf(x);                        // case 6
            } else {
                if (colorOf(leftOf(sib)) == BLACK) {
                    setColor(rightOf(sib), BLACK);      // case 7
                    setColor(sib, RED);                 // case 7
                    rotateLeft(sib);                    // case 7
                    sib = leftOf(parentOf(x));          // case 7
                }
                setColor(sib, colorOf(parentOf(x)));    // case 8
                setColor(parentOf(x), BLACK);           // case 8
                setColor(leftOf(sib), BLACK);           // case 8
                rotateRight(parentOf(x));               // case 8
                x = root;                               // case 8
            }
        }
    }
    setColor(x, BLACK);
}

TreeSet

As mentioned earlier, TreeSet is a simple wrapper around TreeMap: every call to TreeSet is converted into an appropriate TreeMap method, so the TreeSet implementation is trivial and will not be repeated here.

// TreeSet is a simple wrapper around TreeMap
public class TreeSet<E> extends AbstractSet<E>
    implements NavigableSet<E>, Cloneable, java.io.Serializable
{
    ...
    private transient NavigableMap<E,Object> m;
    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();
    public TreeSet() {
        this.m = new TreeMap<E,Object>();// There is a TreeMap inside TreeSet
    }
    ...
    public boolean add(E e) {
        return m.put(e, PRESENT)==null;
    }
    ...
}

HashSet and HashMap

General introduction

HashSet and HashMap are explained together because they have the same implementation in Java, and the former is just a wrapper around the latter, meaning that a HashSet has a HashMap (adapter pattern) inside it. Therefore, this article will focus on HashMap.

HashMap implements the Map interface, which allows elements with a null key and a null value to be inserted. This class is similar to a Hashtable except that it is not synchronized. Unlike TreeMap, this container does not guarantee the order of the elements, and the elements may be rehashed as needed, so the order of iterating over the same HashMap may be different at different times. Hash tables can be implemented in two ways, depending on how conflicts are handled. One is Open Addressing, and the other is Separate chaining with linked lists. The Java HashMap uses a collision linked list approach.

It is easy to see from the figure above that the put() and get() methods can be completed in constant time if the appropriate hash function is chosen. But when iterating over a HashMap, you need to traverse the entire table and any conflicting linked lists that follow. Therefore, it is not advisable to set the initial size of HashMap too large for scenarios with frequent iterations.

There are two parameters that affect the performance of a HashMap: initial capacity and load factor. The initial capacity specifies the initial table size, and the load factor specifies the threshold for automatic expansion. When the number of entries exceeds capacity*load_factor, the container automatically expands and rehashes. For scenarios with many insertions, setting the initial capacity large enough can reduce the number of rehashes.

When putting objects into a HashMap or HashSet, two methods deserve special attention: hashCode() and equals(). hashCode() determines which bucket an object is placed in, and when hash values collide, equals() determines whether two objects are "the same object". So, if you want to put a custom class into a HashMap or HashSet, you need to @Override its hashCode() and equals() methods.
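A minimal sketch of a custom key class that overrides both methods (the Point class and its fields are made up for illustration):

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;   // equality is based on the fields
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);             // equal objects must produce equal hash codes
    }

    public static void main(String[] args) {
        Set<Point> set = new HashSet<Point>();
        set.add(new Point(1, 2));
        System.out.println(set.contains(new Point(1, 2))); // true, because both methods are overridden
    }
}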

Method analysis

get()

The get(Object key) method returns the value associated with the given key. It calls getEntry(Object key) to find the corresponding entry and then returns entry.getValue(), so getEntry() is the heart of the algorithm. The idea is to first use hash(key) to locate the corresponding bucket, then walk the collision list and use key.equals(k) to decide whether each entry is the one we want.

Note that HashMap keeps table.length as a power of two, so the bucket index hash(k) & (table.length - 1) is equivalent to hash(k) % table.length but cheaper to compute.

// getEntry() method
final Entry<K,V> getEntry(Object key) {
    ...
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[hash & (table.length-1)];// Get the collision list
         e != null; e = e.next) {// Iterate over each entry in the collision list
        Object k;
        // Determine equality according to equals()
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

put()

The put(K key, V value) method adds the given key-value pair to the map. It first searches the map, similarly to getEntry(), to see whether the key is already present; if so, the old value is replaced and returned. If not, addEntry(int hash, K key, V value, int bucketIndex) is called to insert a new entry at the head of the collision list.

//addEntry()
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);// Automatically expand and rehash
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = hash & (table.length-1);//hash%table.length
    }
    // Insert the new entry at the head of the collision list
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

remove()

remove(Object key) deletes the entry associated with the given key. The logic is implemented in removeEntryForKey(Object key), which first finds the entry and then unlinks it from its collision list. The search is similar to getEntry().

//removeEntryForKey()
final Entry<K,V> removeEntryForKey(Object key) {
    ...
    int hash = (key == null) ? 0 : hash(key);
    int i = indexFor(hash, table.length);//hash&(table.length-1)
    Entry<K,V> prev = table[i];// Get the collision list
    Entry<K,V> e = prev;
    while (e != null) {// Iterate over the collision list
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {// Found the entry to delete
            modCount++;
            size--;
            if (prev == e) table[i] = next;// Delete the first entry of the collision list
            else prev.next = next;
            return e;
        }
        prev = e;
        e = next;
    }
    return e;
}

HashSet

As mentioned earlier, a HashSet is a simple wrapper around a HashMap: every call to HashSet is converted into an appropriate HashMap method, so the HashSet implementation is very simple (less than 300 lines of code) and will not be repeated here.

//HashSet is a simple wrapper around HashMap
public class HashSet<E>
{
    ...
    private transient HashMap<E,Object> map;// Inside the HashSet there is a HashMap
    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();
    public HashSet() {
        map = new HashMap<>();
    }
    ...
    public boolean add(E e) {// Simple method delegation
        return map.put(e, PRESENT)==null;
    }
    ...
}

LinkedHashSet and LinkedHashMap

General introduction

If you have looked at HashSet and HashMap, and at TreeSet and TreeMap, you can probably guess that LinkedHashSet and LinkedHashMap follow the same pattern. They share the same implementation in Java: the former is just a wrapper around the latter, i.e. a LinkedHashSet contains a LinkedHashMap (adapter pattern). Therefore, this article focuses on LinkedHashMap.

LinkedHashMap implements the Map interface, which allows elements with a null key and a null value to be inserted. As you can see from the name, this container is a hybrid of Linked List and HashMap, meaning that it meets some of the features of both HashMap and Linked List. Think of LinkedHashMap as a HashMap enhanced with Linked List.

In fact, LinkedHashMap is a direct subclass of HashMap. The only difference is that, on top of HashMap, LinkedHashMap links all entries into a doubly linked list, which guarantees that elements are iterated in the same order as they were inserted. The figure above shows the structure of LinkedHashMap: the main body is exactly the same as HashMap, plus a header reference pointing to the head of the doubly linked list (a dummy element). The iteration order of the doubly linked list is the insertion order of the entries.

In addition to preserving the order of the iteration, this structure has another advantage: Iteration of LinkedHashMap does not need to traverse the entire table as a HashMap does. Instead, it only needs to traverse the two-way linked list pointed to by the header. In other words, the iteration time of LinkedHashMap is only related to the number of entries, not the size of the table.

There are two parameters that affect LinkedHashMap performance: initial capacity and load factor. The initial capacity specifies the initial table size, and the load factor specifies the threshold for automatic expansion. When the number of entries exceeds capacity*load_factor, the container automatically expands and rehashes. For scenarios with many insertions, setting the initial capacity large enough can reduce the number of rehashes.

When putting objects into a LinkedHashMap or LinkedHashSet, two methods deserve special attention: hashCode() and equals(). hashCode() determines which bucket an object is placed in, and when hash values collide, equals() determines whether two objects are "the same object". So, if you want to put a custom class into a LinkedHashMap or LinkedHashSet, you need to @Override its hashCode() and equals() methods.

To get a LinkedHashMap with the same iteration order as a source Map:

void foo(Map m) {
    Map copy = new LinkedHashMap(m);
    ...
}

LinkedHashMap is not synchronized for performance reasons; if it is used in a multithreaded environment, the programmer must synchronize manually, or wrap the LinkedHashMap as follows:

Map m = Collections.synchronizedMap(new LinkedHashMap(...));

Method analysis

get()

The get(Object key) method returns the value associated with the given key. Its flow is almost identical to HashMap.get(); refer to that section.

put()

The put(K key, V value) method adds the given key-value pair to the map. It first searches the map, similarly to get(), to see whether the key is already present; if so, the old value is replaced and returned. If not, addEntry(int hash, K key, V value, int bucketIndex) is called to insert a new entry.

Note that the insertion here has two meanings:

  1. From the table's point of view, the new entry needs to be inserted into the corresponding bucket; when there is a hash collision, it is inserted at the head of the collision list.
  2. From the header's point of view, the new entry needs to be appended at the tail of the doubly linked list.

The addEntry() code looks like this:

// LinkedHashMap.addEntry()
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);// Automatically expand and rehash
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = hash & (table.length-1);// hash%table.length
    }
    // 1. Insert the new entry at the head of the collision list
    HashMap.Entry<K,V> old = table[bucketIndex];
    Entry<K,V> e = new Entry<>(hash, key, value, old);
    table[bucketIndex] = e;
    // 2. Insert the new entry at the tail of the doubly linked list
    e.addBefore(header);
    size++;
}

The code above uses the addBefore() method to insert the new entry e just before the header reference, which makes e the last element of the doubly linked list. The code of addBefore() is as follows:

// LinkedHashMap.Entry.addBefore(), insert this entry before existingEntry
private void addBefore(Entry<K,V> existingEntry) {
    after  = existingEntry;
    before = existingEntry.before;
    before.after = this;
    after.before = this;
}

The above code simply modifies the reference to the relevant entry.

remove()

remove(Object key) deletes the entry associated with the given key. The logic is implemented in removeEntryForKey(Object key), which first finds the entry and then removes it while adjusting the relevant references. The search is similar to the get() method.

Note that deletion here also has two meanings:

  1. From the table's point of view, the entry needs to be removed from its bucket; if the collision list is not empty, the references inside that list must be adjusted.
  2. From the header's point of view, the entry needs to be removed from the doubly linked list, adjusting the references of its predecessor and successor in the list.

The code for removeEntryForKey() is as follows:

// LinkedHashMap.removeEntryForKey(), remove the entry associated with the key
final Entry<K,V> removeEntryForKey(Object key) {
    ...
    int hash = (key == null) ? 0 : hash(key);
    int i = indexFor(hash, table.length);// hash&(table.length-1)
    Entry<K,V> prev = table[i];// Get the collision list
    Entry<K,V> e = prev;
    while (e != null) {// Iterate over the collision list
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {// Found the entry to delete
            modCount++;
            size--;
            // 1. Remove e from the collision list of the corresponding bucket
            if (prev == e) table[i] = next;
            else prev.next = next;
            // 2. Remove e from the doubly linked list
            e.before.after = e.after;
            e.after.before = e.before;
            return e;
        }
        prev = e;
        e = next;
    }
    return e;
}

LinkedHashSet

As mentioned above, LinkedHashSet is a simple wrapper around LinkedHashMap: calls to LinkedHashSet are converted into appropriate LinkedHashMap methods, so its implementation is very simple and is not described further here.

public class LinkedHashSet<E>
    extends HashSet<E>
    implements Set<E>, Cloneable, java.io.Serializable {
    ...
    // LinkedHashSet contains a LinkedHashMap
    public LinkedHashSet(int initialCapacity, float loadFactor) {
        map = new LinkedHashMap<>(initialCapacity, loadFactor);
    }
    ...
    public boolean add(E e) {// Simple method delegation
        return map.put(e, PRESENT)==null;
    }
    ...
}

LinkedHashMap classic usage

In addition to preserving iteration order, LinkedHashMap has another very useful feature: it makes it easy to implement a cache with a FIFO replacement policy. Specifically, LinkedHashMap has a protected method, removeEldestEntry(Map.Entry<K,V> eldest), which tells the map whether to remove the "eldest" entry, i.e. the entry inserted earliest; if it returns true, the eldest entry is deleted. LinkedHashMap automatically calls removeEldestEntry() after every insertion of a new entry. By overriding this method in a subclass so that it returns true once the number of entries exceeds a limit, you get a fixed-size cache with a FIFO replacement policy. Example code is as follows:

/** A fixed size FIFO replacement policy cache */
class FIFOCache<K,V> extends LinkedHashMap<K,V>{
    private final int cacheSize;
    public FIFOCache(int cacheSize){
        this.cacheSize = cacheSize;
    }

    // Delete the eldest entry when the number of entries exceeds cacheSize
    @Override
    protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
        return size() > cacheSize;
    }
}
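
A quick usage sketch of the FIFOCache above (the keys and values are just illustrations):

// A FIFO cache that keeps at most 3 entries
FIFOCache<String, Integer> cache = new FIFOCache<>(3);
cache.put("a", 1);
cache.put("b", 2);
cache.put("c", 3);
cache.put("d", 4);                  // size exceeds cacheSize, so the eldest entry "a" is evicted
System.out.println(cache.keySet()); // prints [b, c, d]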

PriorityQueue

General introduction

The earlier section on Stack and Queue used ArrayDeque as a typical Java Queue; there is also a special kind of Queue called PriorityQueue. A priority queue guarantees that each element taken out is the one with the smallest weight currently in the queue (Java's priority queue takes out the smallest element, whereas C++'s takes out the largest). This requires a way to compare elements: the ordering can come from the elements' natural ordering or from a Comparator passed in at construction time.
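
For instance, a minimal sketch of the two ways of ordering (natural ordering versus a supplied Comparator; the Comparator-only constructor assumes Java 8+):

PriorityQueue<Integer> natural = new PriorityQueue<>();   // natural ordering: smallest element first
natural.add(3); natural.add(1); natural.add(2);
System.out.println(natural.poll());                       // prints 1

// Comparator ordering: here a reversed comparator makes the largest element come out first
PriorityQueue<Integer> reversed = new PriorityQueue<>(Comparator.reverseOrder());
reversed.add(3); reversed.add(1); reversed.add(2);
System.out.println(reversed.poll());                      // prints 3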

Java's PriorityQueue implements the Queue interface and does not allow null elements. It is implemented with a heap, specifically a min-heap backed by a complete binary tree (the weight of any non-leaf node is no greater than the weights of its left and right children). Because the tree is complete, an array can be used as the underlying storage for PriorityQueue.

In the figure above, each element is numbered in level-order (breadth-first) traversal order. If you look carefully, you will find a relationship between the index of a parent node and the indexes of its children. Specifically:

leftNo = parentNo*2+1

rightNo = parentNo*2+2

parentNo = (nodeNo-1)/2

With the above three formulas, it is easy to compute the indexes of a node's parent and children, which is why the heap can be stored directly in an array.
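
For example, the node numbered 4 has a left child numbered 4*2+1 = 9 and a right child numbered 4*2+2 = 10, and its parent is numbered (4-1)/2 = 1 (integer division).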

The peek() and element() operations of PriorityQueue run in constant time, while add(), offer(), the no-argument remove(), and poll() all run in O(log N) time.

Method analysis

add() and offer()

add(E e) and offer(E e) have the same semantics: both insert an element into the priority queue. The Queue interface merely specifies that they behave differently when the insert fails: the former throws an exception, while the latter returns false. For PriorityQueue there is really no difference between the two methods.
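
In fact, in the JDK source PriorityQueue.add() is essentially a one-line delegation to offer(), roughly as sketched here:

//add(E e) simply forwards to offer(E e)
public boolean add(E e) {
    return offer(e);
}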

Adding a new element may break the min-heap property, so a necessary adjustment is performed.

//offer(E e)
public boolean offer(E e) {
    if (e == null)// Null elements are not allowed
        throw new NullPointerException();
    modCount++;
    int i = size;
    if (i >= queue.length)
        grow(i + 1);// Automatic capacity expansion
    size = i + 1;
    if (i == 0)// The queue was empty, which was the first element to be inserted
        queue[0] = e;
    else
        siftUp(i, e);/ / adjust
    return true;
}

In the code above, the grow() function is similar to grow() in ArrayList: it allocates a larger array and copies the elements of the original array over, so it is not described again here. Note the siftUp(int k, E x) method, which inserts element x while maintaining the min-heap property.

//siftUp()
private void siftUp(int k, E x) {
    while (k > 0) {
    int parent = (k - 1) >>> 1;//parentNo = (nodeNo-1)/2
        Object e = queue[parent];
        if (comparator.compare(x, (E) e) >= 0)// Call the comparison method of the comparator
            break;
        queue[k] = e;
        k = parent;
    }
    queue[k] = x;
}

Adding element x may break the min-heap property, so an adjustment is needed. The adjustment process is as follows: starting from position k, x is compared with the parent of the current position and moved up layer by layer until x >= queue[parent] holds. Note that the comparison can use either the elements' natural ordering or the comparator's ordering.

element() and peek()

element() and peek() are semantically identical: both get, but do not remove, the head of the queue, i.e. the element with the smallest weight. The only difference is that the former throws an exception when the operation fails, while the latter returns null. By the min-heap property, the element at the top of the heap is the global minimum; since the heap is stored in an array, the index relationship above means the element at index 0 is the top of the heap. So it is enough to return the element at index 0.

The code is very succinct:

//peek()
public E peek() {
    if (size == 0)
        return null;
    return (E) queue[0];// The element at the index 0 is the smallest element
}

remove() and poll()

The semantics of the remove() and poll() methods are identical, fetching and deleting the first element of the queue, except that the former throws an exception when the method fails, while the latter returns NULL. Because deletion changes the structure of the queue, necessary adjustments are required to maintain the nature of the small top heap.

public E poll() {
    if (size == 0)
        return null;
    int s = --size;
    modCount++;
    E result = (E) queue[0];// The element at the index 0 is the smallest element
    E x = (E) queue[s];
    queue[s] = null;
    if (s != 0)
        siftDown(0, x);/ / adjust
    return result;
}

This code first records the element at index 0, replaces the element at index 0 with the last element, calls siftDown() to adjust the heap, and finally returns the original element at index 0 (the smallest element). The key is the siftDown(int k, E x) method: starting from the position specified by k, x is compared with the smaller of the current node's left and right children and moved down layer by layer, until x is no greater than that smaller child.

//siftDown()
private void siftDown(int k, E x) {
    int half = size >>> 1;
    while (k < half) {
        // First find the smaller of the left and right children, record it in c, and record its index in child
        int child = (k << 1) + 1;//leftNo = parentNo*2+1
        Object c = queue[child];
        int right = child + 1;
        if (right < size &&
            comparator.compare((E) c, (E) queue[right]) > 0)
            c = queue[child = right];
        if (comparator.compare(x, (E) c) <= 0)
            break;
        queue[k] = c;// Then replace the original value with c
        k = child;
    }
    queue[k] = x;
}

remove(Object o)

The remove(Object o) method removes an element in the queue that is equal to o (if there are multiple equal elements, only the first one found is removed). This method is not part of the Queue interface but of the Collection interface. Because deletion changes the queue's structure, an adjustment is needed; and because the deleted element can be at an arbitrary position, the adjustment is a bit more involved than in the other methods. Specifically, remove(Object o) has two cases: 1. the element to delete is the last one, in which case it is deleted directly and no adjustment is needed; 2. the element to delete is not the last one, in which case siftDown() is called once from the deletion point with the last element as the argument. The details are similar to poll() and will not be repeated here.

The specific code is as follows:

//remove(Object o)
public boolean remove(Object o) {
    // Scan the array to find the first index i such that o.equals(queue[i])
    int i = indexOf(o);
    if (i == -1)
        return false;
    int s = --size;
    if (s == i) // Case 1: the element to delete is the last one
        queue[i] = null;
    else {
        E moved = (E) queue[s];
        queue[s] = null;
        siftDown(i, moved); // Case 2: sift down from the deletion point
        ...
    }
    return true;
}

WeakHashMap

General introduction

At the end of the Java Collections Framework series, I’m going to introduce a special member: WeakHashMap, which, as its name suggests, is some kind of Map. It is special because entries in a WeakHashMap may be automatically deleted by GC, even if the programmer does not call the remove() or clear() methods.

More intuitively, when using a WeakHashMap, even if you never explicitly add or delete any element, the following can happen:

  • Two calls to size() return different values;
  • Two calls to isEmpty() return false the first time and true the second time;
  • Two calls to containsKey() return true the first time and false the second time, even though the same key is used both times;
  • Two calls to get() return a value the first time and null the second time, even though the same object is used both times.

Would such bizarre behavior drive users crazy? In fact, this feature of WeakHashMap is particularly useful in scenarios that require caching. In a cache, memory is limited, so not every object can be cached; a cache hit improves efficiency, while a cache miss causes no error, because the value can simply be recomputed.

To understand how WeakHashMap works, we need to introduce another concept: WeakReference. We all know that memory in Java is managed automatically by the GC, which determines during program execution which objects can be reclaimed and frees their memory at an appropriate time. The GC decides whether an object is reclaimable based on whether any valid reference to it exists; if there is no valid reference to an object (roughly speaking, there is no longer any way to reach it), the object can be reclaimed. **"Valid references" here do not include weak references.** That is, although a weak reference can still be used to access an object, weak references are ignored by the GC: an object that is reachable only through weak references will still be collected.
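
A minimal sketch of this behavior using WeakReference directly (System.gc() is only a hint, so the exact timing of collection is not guaranteed):

// java.lang.ref.WeakReference
Object strong = new Object();
WeakReference<Object> weak = new WeakReference<>(strong);
System.out.println(weak.get() != null); // true: the object is still strongly reachable

strong = null;  // drop the only strong reference
System.gc();    // request garbage collection (a hint, not a guarantee)
System.out.println(weak.get());         // very likely null: only the weak reference remained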

WeakHashMap manages its entries internally through weak references. What does this property of weak references mean for a WeakHashMap? Putting a key-value pair into a WeakHashMap does not prevent the key from being collected by the GC, unless a strong reference to the key exists outside the WeakHashMap.
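
The following sketch illustrates this point (again, System.gc() is only a hint, so the outcome is likely but not strictly guaranteed):

Map<Object, String> map = new WeakHashMap<>();
Object key = new Object();          // a strong reference to the key, outside the map
map.put(key, "value");
System.out.println(map.size());     // 1

key = null;                         // no strong reference to the key remains
System.gc();                        // request garbage collection (a hint)
System.out.println(map.size());     // very likely 0: the entry has been silently removed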

The concepts of strong references and weak references will be discussed later, but it is enough to know that Java references are categorized and that different types of references have different effects on GC.

The specific implementation

The storage structure of WeakHashMap is similar to that of HashMap; readers can refer to the earlier sections, so it is not repeated here.

The blogger will explain the management of strong and weak references separately.

Weak HashSet?

If you’ve read the earlier articles on Maps and Sets, you might be wondering: since there is a WeakHashMap, is there also a WeakHashSet? The answer is no :(. However, the Java Collections utility class provides a workaround: the Collections.newSetFromMap(Map<E, Boolean> map) method can wrap any Map as a Set. You can obtain a weak HashSet as quickly as follows:

// Wrap WeakHashMap as a Set
Set<Object> weakHashSet = Collections.newSetFromMap(
        new WeakHashMap<Object, Boolean>());
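Because the backing map is a WeakHashMap, elements of this set are only weakly held through it and may silently disappear after GC, just like WeakHashMap keys (a brief sketch, with the same caveat about System.gc()):

Object elem = new Object();
weakHashSet.add(elem);
System.out.println(weakHashSet.contains(elem)); // true

elem = null;   // drop the only strong reference to the element
System.gc();   // hint
System.out.println(weakHashSet.size());         // very likely 0 after collection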

As you might expect, the newSetFromMap() method simply wraps the Map passed in:

// Collections.newSetFromMap() wraps any Map as a Set
public static <E> Set<E> newSetFromMap(Map<E, Boolean> map) {
    return new SetFromMap<>(map);
}

private static class SetFromMap<E> extends AbstractSet<E>
    implements Set<E>, Serializable
{
    private final Map<E, Boolean> m;  // The backing map
    private transient Set<E> s;       // Its keySet
    SetFromMap(Map<E, Boolean> map) {
        if (!map.isEmpty())
            throw new IllegalArgumentException("Map is non-empty");
        m = map;
        s = map.keySet();
    }
    public void clear()               {        m.clear(); }
    public int size()                 { return m.size(); }
    public boolean isEmpty()          { return m.isEmpty(); }
    public boolean contains(Object o) { return m.containsKey(o); }
    public boolean remove(Object o)   { return m.remove(o) != null; }
    public boolean add(E e)           { return m.put(e, Boolean.TRUE) == null; }
    public Iterator<E> iterator()     { return s.iterator(); }
    public Object[] toArray()         { return s.toArray(); }
    public <T> T[] toArray(T[] a)     { return s.toArray(a); }
    public String toString()          { return s.toString(); }
    public int hashCode()             { return s.hashCode(); }
    public boolean equals(Object o)   { return o == this || s.equals(o); }
    public boolean containsAll(Collection<?> c) { return s.containsAll(c); }
    public boolean removeAll(Collection<?> c)   { return s.removeAll(c); }
    public boolean retainAll(Collection<?> c)   { return s.retainAll(c); }
    // addAll is the only inherited implementation
    ...
}

Conclusion

Now that the Java Collections Framework Internals series is complete, I hope this blog post has helped you build a basic understanding of the Java Container Framework.

The blogger would be very happy if he could help you at least a little! If there are any flaws and fallacies in the blog, I welcome you to correct them.