This article briefly introduces the structure and characteristics of the most common Java collections, from the following aspects:

  1. List
  2. Set
  3. Map
  4. Queue

Note that in the diagrams in this article, solid arrows (<|--) represent inheritance relationships and dotted arrows (<|..) represent interface implementations.

1. Complexity of data structures

Data structures differ in how fast they store and retrieve data, which is described by time complexity O(N?). Each data structure also has its own space complexity O(N), which will not be covered here. The following two articles introduce time complexity in detail:

Time complexity of data structures

Data Structures and Algorithms (Java)

2. Composition of Collection

classDiagram
Iterable <|-- Collection

Collection <|-- List
Collection <|.. AbstractCollection
Collection <|-- Set
Collection <|.. ArraySet
 
AbstractCollection <|-- AbstractList
AbstractCollection <|-- AbstractQueue
AbstractCollection <|-- AbstractSet

List <|.. CopyOnWriteArrayList
List <|.. AbstractList
 
AbstractList <|-- ArrayList
AbstractList <|-- Vector
AbstractList <|-- AbstractSequentialList
 
AbstractSequentialList <|-- LinkedList
 
Vector <|-- Stack

Set <|.. AbstractSet
Set <|-- SortedSet

AbstractSet <|-- HashSet
AbstractSet <|-- TreeSet
AbstractSet <|-- EnumSet
HashSet <|-- LinkedHashSet
EnumSet <|-- JumboEnumSet

3. List

Collection structure: all elements belong to the same group and have no relationship to one another beyond membership in that group.

classDiagram

Collection <|-- List
Collection <|-- Queue

Queue <|-- Deque

RandomAccess <|.. ArrayList
RandomAccess <|.. CopyOnWriteArrayList
RandomAccess <|.. Vector

Cloneable <|.. ArrayList
Cloneable <|.. CopyOnWriteArrayList
Cloneable <|.. Vector
Cloneable <|.. LinkedList
 
Serializable <|.. ArrayList
Serializable <|.. CopyOnWriteArrayList
Serializable <|.. Vector
Serializable <|.. LinkedList

List <|.. ArrayList
List <|.. CopyOnWriteArrayList
List <|.. Vector
List <|.. LinkedList

Deque <|.. LinkedList

Vector / Stack

The internal structure of a Vector is an array, and it is thread-safe: addElement() and removeElement() are both synchronized methods.

public class Vector<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable {

    protected Object[] elementData;
    // capacityIncrement == 0 means the old capacity is doubled on growth.
    protected int capacityIncrement;

    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + ((capacityIncrement > 0) ?
                                         capacityIncrement : oldCapacity);
        ...
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

    public synchronized E firstElement() {
        if (elementCount == 0) {
            throw new NoSuchElementException();
        }
        return elementData(0);
    }

    public synchronized E lastElement() {
        if (elementCount == 0) {
            throw new NoSuchElementException();
        }
        return elementData(elementCount - 1);
    }

    public synchronized void addElement(E obj) {
        modCount++;
        ensureCapacityHelper(elementCount + 1);
        elementData[elementCount++] = obj;
    }
}

A Stack inherits from Vector, so its underlying data structure is also an array and it is likewise thread-safe. Its initial capacity and expansion behavior are the same as Vector's.

Stack's three important methods implement a FILO (first in, last out) structure:

public E push(E item) {
    addElement(item);
    return item;
}

public synchronized E pop() {
    E obj;
    int len = size();
    obj = peek();
    removeElementAt(len - 1);
    return obj;
}

public synchronized E peek() {
    int len = size();
    if (len == 0)
        throw new EmptyStackException();
    return elementAt(len - 1);
}
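A minimal usage sketch of the FILO behavior shown by push/pop/peek above (the class name StackDemo is just for illustration):

```java
import java.util.Stack;

public class StackDemo {
    public static void main(String[] args) {
        Stack<Integer> stack = new Stack<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.peek()); // 2 (looks at the top, does not remove)
        System.out.println(stack.pop());  // 2
        System.out.println(stack.pop());  // 1 (first in, last out)
    }
}
```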
ArrayList
  • The bottom layer is stored in an array
  • On creation, elementData is assigned an empty array by default
  • When the first value is added, the default initial capacity of 10 is allocated
  • On expansion, newCapacity = oldCapacity + (oldCapacity >> 1), i.e. 1.5 times the old capacity
  • Size refers to the number of elements in the list; capacity refers to the length of the contiguous backing array allocated for the elements.
  • In C, an array name is treated as a pointer: more specifically, a pointer to the address of the first element, with an array index acting as an offset from that address.
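The 1.5x growth rule above can be sketched as follows (this is a standalone illustration of the arithmetic, not the JDK source; the class and method names are made up):

```java
public class ArrayListGrowth {
    // Mirrors ArrayList's growth arithmetic: newCapacity = old + old/2.
    static int grow(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        System.out.println(grow(10)); // 15
        System.out.println(grow(15)); // 22
    }
}
```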
CopyOnWriteArrayList

Its internal data structure is also an array. The idea resembles a ReentrantReadWriteLock read-write lock (reads shared, writes mutually exclusive), but goes further: reads are never blocked at all, because writes operate on a copy of the array.

  • Unlike ArrayList, CopyOnWriteArrayList has no default capacity and no expansion mechanism
  • Each time data is added, a new array is created, the old array is copied into it, and the new value is appended
  • Each time data is deleted, the remaining elements of the old array are copied into a new array using Arrays.copyOf
  • In this way, reads and writes are completely separated: when reading and writing at the same time, reads see the old array while writes go to the new array

Therefore, there are two problems:

  • Memory footprint: if a CopyOnWriteArrayList frequently adds, deletes, and modifies data, often calling add(), set(), or remove(), it consumes a lot of memory, because each of these operations copies the whole array.

  • Data consistency: the CopyOnWrite container only guarantees eventual consistency, not real-time consistency. You can see this in the example above: while thread A iterates over a CopyOnWriteArrayList, thread B modifies part of its data (setArray() has already been called), but thread A still iterates over the old data.
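The snapshot behavior described above can be demonstrated in a few lines: an iterator captures the array that existed when it was created, so a later write is invisible to it (the class name CowSnapshot is just for illustration):

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowSnapshot {
    public static void main(String[] args) {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        Iterator<String> it = list.iterator(); // captures the current array
        list.add("b");                         // the write goes to a fresh copy
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        System.out.println(seen + " " + list.size()); // 1 2
    }
}
```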

public class CopyOnWriteArrayList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable {

    // Previously a ReentrantLock, now a plain object monitor.
    final transient Object lock = new Object();

    public E get(int index) {
        return get(getArray(), index);
    }

    public boolean add(E e) {
        synchronized (lock) {
            Object[] elements = getArray();
            int len = elements.length;
            // Copy the old array into a new, longer one, then append.
            Object[] newElements = Arrays.copyOf(elements, len + 1);
            newElements[len] = e;
            setArray(newElements);
            return true;
        }
    }
}
LinkedList
public class LinkedList<E> extends AbstractSequentialList<E>
        implements List<E>, Deque<E>, Cloneable, java.io.Serializable {

    // Head and tail of the doubly linked list.
    transient Node<E> first;
    transient Node<E> last;

    private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

    // Queue operations.
    public E peek() {
        final Node<E> f = first;
        return (f == null) ? null : f.item;
    }

    public E element() {
        return getFirst();
    }

    public E poll() {
        final Node<E> f = first;
        return (f == null) ? null : unlinkFirst(f);
    }

    public boolean offer(E e) {
        return add(e);
    }

    // Deque operations.
    public boolean offerFirst(E e) {
        addFirst(e);
        return true;
    }

    public boolean offerLast(E e) {
        addLast(e);
        return true;
    }

    public E peekFirst() {
        final Node<E> f = first;
        return (f == null) ? null : f.item;
    }

    public E peekLast() {
        final Node<E> l = last;
        return (l == null) ? null : l.item;
    }

    public E pollFirst() {
        final Node<E> f = first;
        return (f == null) ? null : unlinkFirst(f);
    }

    public E pollLast() {
        final Node<E> l = last;
        return (l == null) ? null : unlinkLast(l);
    }

    // Stack operations.
    public void push(E e) {
        addFirst(e);
    }

    public E pop() {
        return removeFirst();
    }

    // get(int index) walks from whichever end is closer.
    Node<E> node(int index) {
        if (index < (size >> 1)) {
            Node<E> x = first;
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else {
            Node<E> x = last;
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }

    public boolean add(E e) {
        linkLast(e);
        return true;
    }
}

As you can see from the code above, LinkedList can implement a variety of data structures:

  • A regular linked list
  • A double-ended queue (Deque): offerFirst, offerLast, pollFirst, pollLast, peekFirst, peekLast
  • A stack (push, pop): FILO (first in, last out)
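A short sketch of the same LinkedList serving as both a FIFO queue and a LIFO stack (the class name is just for illustration):

```java
import java.util.LinkedList;

public class LinkedListShapes {
    public static void main(String[] args) {
        // As a FIFO queue: offer at the tail, poll from the head.
        LinkedList<Integer> queue = new LinkedList<>();
        queue.offer(1);
        queue.offer(2);
        System.out.println(queue.poll()); // 1

        // As a LIFO stack: push and pop both operate on the head.
        LinkedList<Integer> stack = new LinkedList<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // 2
    }
}
```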

4. Set

classDiagram
SortedSet <|-- NavigableSet
NavigableSet <|.. TreeSet

Cloneable <|.. HashSet
Cloneable <|.. LinkedHashSet
Cloneable <|.. TreeSet

Serializable <|.. HashSet
Serializable <|.. LinkedHashSet
Serializable <|.. TreeSet

Set <|.. HashSet
Set <|.. LinkedHashSet
Set <|.. ArraySet

A brief introduction to the underlying data structures of the Set implementations.

ArraySet

When performing add and remove operations, ArraySet operates on mHashes of type int[] and mArray of type Object[], where mHashes stores the hash value of each element of mArray, with matching subscripts mapping mHashes entries to mArray elements one to one. That is, it has two arrays: one holds each object's hash code, and the other holds the object at the corresponding position.

Open addressing is used to resolve hash conflicts: search backward first, and if that fails, search forward. ArraySet also involves a lot of detail around expansion and caching (freeArrays, allocArrays), which you can read about in the following article: ArraySet adds and removes element analysis

HashSet
public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    private transient HashMap<E,Object> map;
    private static final Object PRESENT = new Object();

    public HashSet() {
        map = new HashMap<>();
    }

    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
        map = new LinkedHashMap<>(initialCapacity, loadFactor);
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

From the simplified code above, you can see that a HashSet stores its values using a HashMap (or LinkedHashMap). map.put(key, value) returns the old value, which is used to check for duplicate data.
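This delegation is visible from the outside: add() returns true only when map.put() returned null, i.e. the element was new (the class name HashSetDedup is just for illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetDedup {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        // add() delegates to map.put(e, PRESENT): a null return means the
        // key was new; a non-null return means it was a duplicate.
        System.out.println(set.add("a")); // true  (new element)
        System.out.println(set.add("a")); // false (duplicate)
        System.out.println(set.size());   // 1
    }
}
```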

5. Map

classDiagram
Map <|-- ConcurrentMap
Map <|.. AbstractMap
Map <|-- SortedMap
Map <|.. Dictionary
Map <|.. ArrayMap

ConcurrentMap <|-- ConcurrentHashMap
ConcurrentMap <|-- LocalCache

AbstractMap <|-- ConcurrentHashMap
AbstractMap <|-- LocalCache
AbstractMap <|-- HashMap
AbstractMap <|-- WeakHashMap
AbstractMap <|-- IdentityHashMap
AbstractMap <|-- EnumMap
AbstractMap <|-- TreeMap

Dictionary <|-- Hashtable

HashMap <|-- LinkedHashMap

SortedMap <|-- NavigableMap
NavigableMap <|.. TreeMap

classDiagram
Map <|.. HashMap
Map <|.. WeakHashMap
Map <|.. LinkedHashMap
Map <|.. IdentityHashMap
Map <|.. Hashtable

Cloneable <|.. HashMap
Cloneable <|.. Hashtable
Cloneable <|.. TreeMap
Cloneable <|.. IdentityHashMap
Cloneable <|.. EnumMap

Serializable <|.. HashMap
Serializable <|.. Hashtable
Serializable <|.. TreeMap
Serializable <|.. IdentityHashMap
Serializable <|.. EnumMap
HashMap
  • tableSizeFor(initialCapacity) rounds the initial capacity up to the nearest power of two >= initialCapacity
  • When no initialCapacity is passed, the default capacity is 16 and the load factor is 0.75
  • The array is resized when the number of elements exceeds capacity * loadFactor; each resize doubles the capacity
  • The hash(key) method is called to compute the key's hash value
  • A bucket's linked list is converted to a red-black tree when its length exceeds TREEIFY_THRESHOLD = 8, and converted back to a linked list when it shrinks to 6
  • A HashMap allows at most one record with a null key and any number of records with a null value.
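The power-of-two rounding in the first bullet can be sketched as follows, mirroring the bit-smearing approach of JDK 8's tableSizeFor (the class name TableSize is just for illustration):

```java
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smear the highest set bit of (cap - 1) into all lower bits,
    // then add 1 to get the next power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```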
WeakHashMap

WeakHashMap is used to store data that may be reclaimed. Its Entry node, which holds K and V, inherits from WeakReference (referencing the key), so entries can be reclaimed during GC.

An internal ReferenceQueue is used to detect entries whose keys have been reclaimed.

Default values and expansion are similar to HashMap, except that there is no red-black tree, only a linked list.

LinkedHashMap
  • Inherits from HashMap; data access works the same as HashMap.
  • accessOrder indicates whether iteration follows access order (true) or insertion order (false). The default is false.
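With accessOrder = true, a get() moves the entry to the tail of the internal list, so the head is always the least recently used entry — the basis of an LRU cache (the class name AccessOrderDemo is just for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
    public static void main(String[] args) {
        // Third constructor argument enables access ordering.
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // "a" moves to the tail
        System.out.println(map.keySet()); // [b, c, a]
    }
}
```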

HashMap

void afterNodeAccess(Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(Node<K,V> p) { }

LinkedHashMap

static class LinkedHashMapEntry<K,V> extends HashMap.Node<K,V> {
    LinkedHashMapEntry<K,V> before, after;
    ...
}

Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
    LinkedHashMapEntry<K,V> p =
        new LinkedHashMapEntry<K,V>(hash, key, value, e);
    linkNodeLast(p);
    return p;
}

void afterNodeAccess(Node<K,V> e) { // move node to last
    LinkedHashMapEntry<K,V> last;
    if (accessOrder && (last = tail) != e) {
        LinkedHashMapEntry<K,V> p =
            (LinkedHashMapEntry<K,V>)e, b = p.before, a = p.after;
        p.after = null;
        if (b == null)
            head = a;
        else
            b.after = a;
        if (a != null)
            a.before = b;
        else
            last = b;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
        tail = p;
        ++modCount;
    }
}

afterNodeAccess() is called after a node is accessed (and afterNodeInsertion() after an insert), linking the node into the before/after chain. In this way, LinkedHashMap stores data exactly as HashMap does while also threading all entries into a doubly linked list, combining fast lookup with linked-list ordering.

LruCache
public class LruCache<K, V> {
    @UnsupportedAppUsage
    private final LinkedHashMap<K, V> map;
    private int size;
    private int maxSize;

    public LruCache(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize <= 0");
        }
        this.maxSize = maxSize;
        // accessOrder = true: iteration order is least recently used first.
        this.map = new LinkedHashMap<K, V>(0, 0.75f, true);
    }

    // Adjust the capacity and evict entries that exceed it.
    public void resize(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize <= 0");
        }
        synchronized (this) {
            this.maxSize = maxSize;
        }
        trimToSize(maxSize);
    }

    public final V get(K key) {
        if (key == null) {
            throw new NullPointerException("key == null");
        }
        V mapValue;
        synchronized (this) {
            mapValue = map.get(key);
            if (mapValue != null) {
                hitCount++;
                return mapValue;
            }
            missCount++;
        }
        V createdValue = create(key);
        if (createdValue == null) {
            return null;
        }
        synchronized (this) {
            createCount++;
            mapValue = map.put(key, createdValue);
            if (mapValue != null) {
                // There was a conflict so undo that last put
                map.put(key, mapValue);
            } else {
                size += safeSizeOf(key, createdValue);
            }
        }
        if (mapValue != null) {
            entryRemoved(false, key, createdValue, mapValue);
            return mapValue;
        } else {
            trimToSize(maxSize);
            return createdValue;
        }
    }

    public void trimToSize(int maxSize) {
        while (true) {
            K key;
            V value;
            synchronized (this) {
                if (size < 0 || (map.isEmpty() && size != 0)) {
                    throw new IllegalStateException(getClass().getName()
                            + ".sizeOf() is reporting inconsistent results!");
                }
                if (size <= maxSize) {
                    break;
                }
                // The head entry is the least recently used; evict it.
                Map.Entry<K, V> toEvict = map.eldest();
                if (toEvict == null) {
                    break;
                }
                key = toEvict.getKey();
                value = toEvict.getValue();
                map.remove(key);
                size -= safeSizeOf(key, value);
                evictionCount++;
            }
            entryRemoved(true, key, value, null);
        }
    }
}
IdentityHashMap

HashMap

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

IdentityHashMap

private static int hash(Object x, int length) {
    int h = System.identityHashCode(x);
    // Multiply by -127, and left-shift to use least bit as part of hash
    return ((h << 1) - (h << 8)) & (length - 1);
}

The two classes compute hash values differently.
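The practical consequence of using System.identityHashCode and == is that two keys which are equals() but not the same object count as different keys in an IdentityHashMap (the class name IdentityDemo is just for illustration):

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityDemo {
    public static void main(String[] args) {
        String k1 = new String("key");
        String k2 = new String("key"); // equals(k1) but a different object

        Map<String, Integer> hash = new HashMap<>();
        hash.put(k1, 1);
        hash.put(k2, 2); // overwrites: keys are compared with equals()

        Map<String, Integer> identity = new IdentityHashMap<>();
        identity.put(k1, 1);
        identity.put(k2, 2); // kept: keys are compared with ==

        System.out.println(hash.size() + " " + identity.size()); // 1 2
    }
}
```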

Hashtable

index = (hash & 0x7FFFFFFF) % tab.length

Unlike a HashMap, a Hashtable does not allow a null key or value.

The hash of a HashMap is a bit more complicated than that of a Hashtable.

When inserting a key/value pair, HashMap always inserts first and then checks whether expansion is needed; Hashtable checks whether expansion is needed before inserting.

Hashtable differs from HashMap in that its put operation is thread-safe.

(1). HashMap and Hashtable have different inheritance: although both implement the Map interface, Hashtable inherits from the Dictionary class, while HashMap inherits from AbstractMap. Dictionary is the abstract parent of any class that maps keys to values; AbstractMap is a skeletal implementation of the Map interface designed to minimize the effort needed to implement it.

(2). HashMap and Hashtable place different restrictions on keys and values: a HashMap allows one null key and any number of null values, but neither key nor value in a Hashtable can be null.

(3). The thread-safety of HashMap and Hashtable differs: Hashtable's methods are synchronized, making it a thread-safe Map; HashMap's methods are not synchronized, so it is a non-thread-safe Map implementation.

(4). The standing of HashMap and Hashtable differs: in concurrent environments, Hashtable is thread-safe but generally not recommended, because ConcurrentHashMap is a more efficient alternative. In single-threaded code, HashMap is more efficient than Hashtable (whose operations are all synchronized, making it slow), so Hashtable is rarely necessary.
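Point (2) is easy to verify: HashMap stores the null key normally, while Hashtable throws immediately (the class name NullKeyDemo is just for illustration):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "ok"); // HashMap allows one null key
        System.out.println(hashMap.get(null)); // ok

        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "boom"); // Hashtable rejects null keys
        } catch (NullPointerException e) {
            System.out.println("NPE");
        }
    }
}
```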

ArrayMap

When ArrayMap performs add and remove operations, it operates on mHashes of type int[] and mArray of type Object[], where mHashes stores the hash value of each element of mArray, with matching subscripts mapping mHashes entries to mArray elements one to one. That is, there are two arrays: one holds each key's hash code, and the other holds the node (key, value, etc.) at the corresponding position.

Open addressing is used to resolve hash conflicts: search backward first, and if that fails, search forward. ArrayMap also involves a lot of detail around expansion and caching (freeArrays, allocArrays), which you can read about in the following article.

TreeMap

It’s a red-black tree:

  1. Each node can only be red or black
  2. The root node is black
  3. Each leaf node (NIL node, empty node) is black.
  4. If one node is red, both of its children are black. That is, you can’t have two red nodes next to each other on a path.
  5. All paths from any node to each of its leaves contain the same number of black nodes.

If the user does not supply a Comparator, the key must implement Comparable so that insertion comparisons can be performed.

public V put(K key, V value) {
    ...
    if (key == null)
        throw new NullPointerException();
    @SuppressWarnings("unchecked")
    Comparable<? super K> k = (Comparable<? super K>) key;
    do {
        parent = t;
        cmp = k.compareTo(t.key);
        if (cmp < 0)
            t = t.left;
        else if (cmp > 0)
            t = t.right;
        else
            return t.setValue(value);
    } while (t != null);
    TreeMapEntry<K,V> e = new TreeMapEntry<>(key, value, parent);
    if (cmp < 0)
        parent.left = e;
    else
        parent.right = e;
    fixAfterInsertion(e);
    ...
}

Insert first, then rotate and recolor. Newly inserted nodes are red by default.
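The visible effect of the red-black tree is that keys are always kept in sorted order, by the supplied Comparator or by the keys' natural Comparable ordering (the class name TreeMapOrder is just for illustration):

```java
import java.util.TreeMap;

public class TreeMapOrder {
    public static void main(String[] args) {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("b", 2);
        map.put("a", 1);
        map.put("c", 3);
        // Keys come back sorted regardless of insertion order.
        System.out.println(map.firstKey() + " " + map.lastKey()); // a c
        System.out.println(map.keySet());                         // [a, b, c]
    }
}
```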

Blog.csdn.net/chenssy/art…

6. Queue

classDiagram

Collection <|.. AbstractCollection
Collection <|-- Queue

AbstractCollection <|-- AbstractQueue
AbstractCollection <|-- ConcurrentLinkedDeque
AbstractCollection <|-- ArrayDeque

BlockingQueue <|-- SynchronousQueue
BlockingQueue <|-- ArrayBlockingQueue
BlockingQueue <|-- LinkedBlockingQueue
BlockingQueue <|-- PriorityBlockingQueue

AbstractQueue <|-- SynchronousQueue
AbstractQueue <|-- ArrayBlockingQueue
AbstractQueue <|-- LinkedBlockingQueue
AbstractQueue <|-- PriorityBlockingQueue
AbstractQueue <|-- PriorityQueue

Queue <|.. AbstractQueue
Queue <|-- Deque

Deque <|.. LinkedList
Deque <|.. ArrayDeque
Deque <|.. ConcurrentLinkedDeque

Image from: Talk about LinkedBlockingQueue

ArrayDeque

An array-based implementation of a double-ended queue, with the following characteristics:

  • The initial capacity is 16
  • doubleCapacity, as the name suggests, doubles the capacity on each expansion
  • allocateElements, which initializes the capacity, rounds initialCapacity up to a power of two >= numElements — similar to HashMap, but not identical
  • A double-ended queue can also be used as a stack, with methods like push and pop.
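A short sketch of the last bullet — the same ArrayDeque serving as a stack and as a queue (the class name DequeDemo is just for illustration):

```java
import java.util.ArrayDeque;

public class DequeDemo {
    public static void main(String[] args) {
        ArrayDeque<Integer> deque = new ArrayDeque<>(); // default capacity 16
        // Stack usage: push/pop operate on the head.
        deque.push(1);
        deque.push(2);
        System.out.println(deque.pop()); // 2
        // Queue usage: offer at the tail, poll from the head.
        deque.offer(3);
        System.out.println(deque.poll()); // 1
    }
}
```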
ConcurrentLinkedDeque

ConcurrentLinkedDeque uses a spin + CAS non-blocking algorithm to keep data consistent under concurrent access. Since the queue itself is a doubly linked list, the algorithm looks simple, but it must handle many concurrent cases, so the implementation is quite complex. Moreover, ConcurrentLinkedDeque does not provide real-time data consistency. In practice, if you need a thread-safe stack structure, you can use ConcurrentLinkedDeque.

www.lanxinbase.com/?p=2037

PriorityQueue
  • The underlying structure uses an array to hold the data
  • The initial default capacity is 11
  • On expansion, newCapacity = oldCapacity + ((oldCapacity < 64) ? (oldCapacity + 2) : (oldCapacity >> 1))
  • Ordering is determined by the supplied Comparator, or by the elements themselves implementing the Comparable interface
  • Priority is implemented with a min-heap (small-top heap)

The indices of parent and child nodes are related as follows:

leftNo = parentNo*2+1

rightNo = parentNo*2+2

parentNo = (nodeNo-1)/2
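The index formulas above, together with PriorityQueue's min-heap behavior, can be checked in a few lines (the class name MinHeapDemo is just for illustration):

```java
import java.util.PriorityQueue;

public class MinHeapDemo {
    public static void main(String[] args) {
        // For parentNo = 1: leftNo = 3, rightNo = 4, and both map back to 1.
        int parentNo = 1;
        System.out.println(parentNo * 2 + 1); // 3
        System.out.println(parentNo * 2 + 2); // 4
        System.out.println((3 - 1) / 2);      // 1
        System.out.println((4 - 1) / 2);      // 1

        // PriorityQueue always polls the smallest element first.
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        pq.offer(5);
        pq.offer(1);
        pq.offer(3);
        System.out.println(pq.poll()); // 1
        System.out.println(pq.poll()); // 3
        System.out.println(pq.poll()); // 5
    }
}
```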

When building the min-heap, inserted data floats up:

@SuppressWarnings("unchecked")
private void siftUpComparable(int k, E x) {
    Comparable<? super E> key = (Comparable<? super E>) x;
    while (k > 0) {
        int parent = (k - 1) >>> 1;
        Object e = queue[parent];
        if (key.compareTo((E) e) >= 0)
            break;
        queue[k] = e;
        k = parent;
    }
    queue[k] = key;
}

When data is deleted, the replacement element sinks down:

// k = the slot to fill (0 on removal of the root);
// x = the element moved from the end of the array.
@SuppressWarnings("unchecked")
private void siftDownComparable(int k, E x) {
    Comparable<? super E> key = (Comparable<? super E>) x;
    int half = size >>> 1;        // loop while k has at least one child
    while (k < half) {
        int child = (k << 1) + 1; // assume left child is least
        Object c = queue[child];
        int right = child + 1;
        if (right < size &&
            ((Comparable<? super E>) c).compareTo((E) queue[right]) > 0)
            c = queue[child = right];
        if (key.compareTo((E) c) <= 0)
            break;
        queue[k] = c; // move the smaller child up, descend to it
        k = child;
    }
    queue[k] = key;
}

A binary heap is a complete binary tree: every level is full except possibly the last, which is filled from left to right. The two heaps usually discussed are the max-heap (big-top heap), whose root holds the largest element with the property applying recursively to both subtrees, and the min-heap (small-top heap), which is the mirror case.

The binary heap is an important data structure: in practice it underlies heap sort, and it is also the basis of the priority queue.

The heap keeps its shape as a complete binary tree while adjusting element positions according to the ordering rule between parent and child.

Blog.csdn.net/u010623927/…

Blog.csdn.net/u011240877/…

BlockingQueue

Blocking queues are used in thread pools, for example.

www.cnblogs.com/tjudzj/p/44…

public E poll(long timeout, TimeUnit unit) throws InterruptedException {
    long nanos = unit.toNanos(timeout);
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        // If the queue is empty, wait up to nanos; otherwise dequeue at once.
        while (count == 0) {
            if (nanos <= 0L)
                return null;
            nanos = notEmpty.awaitNanos(nanos);
        }
        return dequeue();
    } finally {
        lock.unlock();
    }
}
public boolean offer(E e) {
    if (e == null) throw new NullPointerException();
    final AtomicInteger count = this.count;
    if (count.get() == capacity)
        return false;
    int c = -1;
    Node<E> node = new Node<E>(e);
    // Insert lock: writers only contend with other writers.
    final ReentrantLock putLock = this.putLock;
    putLock.lock();
    try {
        if (count.get() < capacity) {
            enqueue(node);
            c = count.getAndIncrement();
            if (c + 1 < capacity)
                notFull.signal();
        }
    } finally {
        putLock.unlock();
    }
    if (c == 0)
        signalNotEmpty();
    return c >= 0;
}
private void signalNotEmpty() {
    final ReentrantLock takeLock = this.takeLock;
    takeLock.lock();
    try {
        // Wake up a waiting reader.
        notEmpty.signal();
    } finally {
        takeLock.unlock();
    }
}
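The externally visible behavior of the code above: a capacity-bounded LinkedBlockingQueue fails fast on offer() when full, and a timed poll() waits up to the timeout when empty before returning null (the class name BlockingDemo is just for illustration):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BlockingDemo {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>(1);
        System.out.println(q.offer("a"));                      // true
        System.out.println(q.offer("b"));                      // false (full)
        System.out.println(q.poll(10, TimeUnit.MILLISECONDS)); // a
        System.out.println(q.poll(10, TimeUnit.MILLISECONDS)); // null (timed out)
    }
}
```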

Conclusion

This is the collection knowledge I summarized while preparing for interviews. It briefly introduces the basic structure of the various collection classes. At bottom, every data structure here is built on arrays and linked lists; the other structures merely extend these two.

One interviewer asked me which interfaces ArrayList implements — a question that probes how well you know ArrayList. ArrayList implements Cloneable, Serializable, RandomAccess, and List. From these you can tell that ArrayList can be serialized, can be cloned, and supports fast random access by index.

All of these collections implement Serializable, and all except the BlockingQueue implementations implement Cloneable. When studying a data structure, learn not only how it is composed internally, but also what it inherits and which interfaces it implements, to better understand its characteristics.

This summary will continue to be updated.