One, foreword

Java
Collection
Map

1.1 the Collection

A Collection is an abstract interface to a List and Set that contains the basic operations for those collections.

(1) List

The List interface typically represents a List (array, queue, LinkedList, stack, etc.) whose elements can be repeated. Common implementation classes are ArrayList, LinkedList, and Vector.

(2) Set

The Set interface usually represents a collection, whose elements are not allowed to be duplicated (guaranteed by hashCode and equals). Common implementation classes include HashSet and TreeSet. HashSet is implemented using a HashMap in a Map. TreeSet is implemented through TreeMap in Map. In addition, TreeSet also implements SortedSet interface, so it is an ordered set.

(3) The difference between List and Set

  • SetInterfaces store unordered, non-repetitive data
  • ListInterfaces store ordered, repeatable data
  • SetLow retrieval efficiency, high efficiency of delete and insert, insert and delete will not cause element position change.
  • ListHigh efficiency in finding elements, low efficiency in deleting and inserting elements,ListLike an array, it can grow dynamically, automatically depending on the size of the actual storageListThe length of the.

(4) Design patterns used

Abstract classes AbstractCollection, AbstractList, and AbstractSet implement the Collection, List, and Set interfaces, respectively. These are many of the adapter design patterns used in the Java Collections framework. Implement some or all of the methods in the interface in an abstract class, so that some of the following classes can simply inherit the abstract class directly and implement their own methods instead of implementing all of the abstract methods in the interface.

1.2 the Map

Map is a mapping interface in which each element is a key-value pair. AbstractMap also implements most functions of Map interface through adapter pattern. Implementation classes such as TreeMap, HashMap, and WeakHashMap are all implemented by inheriting AbstractMap.

1.3 the Iterator

Iterator is an Iterator that iterates over collections. It can be used to iterate over collections, but not over maps. The implementation classes of a Collection all implement the iterator() function, which returns an iterator object that iterates through the Collection. ListIterator is specifically used to iterate over lists. Enumeration, introduced in JDK 1.0, does the same but less well than Iterator. It can only be used in hashtables, vectors, and stacks.

1.4 the Arrays and Collections

Arrays and Collections are utility classes for manipulating Arrays and Collections. For example, ArrayList and Vector use the array.copyof () method extensively. In Collections, however, there are many static methods that can return synchronized, thread-safe, versions of collection classes. Of course, if you want thread-safe collection classes, you prefer concurrent and the corresponding collection classes under the package.

Second, the ArrayList

ArrayList is based on a dynamic array of growth, ArrayList is not thread-safe, in the case of multiple threads can consider to use Collections. SynchronizedList List (T) function returns a thread-safe ArrayList class, You can also use and distribute the CopyOnWriteArrayList class under the package.

The ArrayList

class inherits from AbstractList

and implements the following four interfaces:

  • List<T>
  • RandomAccess: Supports fast random access
  • Cloneable: can be cloned
  • Serializable: Supports serialization

The ArrayList expansion

Since ArrayList is implemented based on arrays, we need to make sure we have enough space to hold new elements before adding them to the array using the addXX method. This is done using ensureCapacityInternal, which takes the required array size as an argument:

  • If the current array is empty and the required capacity is less than10, then set the required capacity to10
  • Then try to expand the array size to the current size2.5times
  • If you still can’t meet the requirements, set the array size to the required capacity
  • If the required capacity is greater than the maximum value of the default integer minus8, then callhugeCapacityMethod to set the size of the array to the maximum value of an integer
  • Finally, callArrays.copyOfCopies elements from the original array into the new array.

Array.copyof will eventually call the system.arrayCopy () method. The Native function actually ended up calling C memmove () function, so it can ensure the correct copy and move within the same array element, higher than the average efficiency of the implementation of the replication methods many, very suitable for batch processing array, a Java when copying a large number of array elements are strongly recommended to use this method, in order to obtain higher efficiency.

ArrayList is converted to a static array

ArrayList provides two methods of converting to static arrays:

  • Object[] toArray()It is possible that the method will throwjava.lang.ClassCastExceptionThe exception, if you directly use the downward transformation method, will be the wholeArrayListCollection converted to the specified typeArrayArray, throws the exception, and if converted toArrayThis exception will not be thrown if each element in the array is cast downward instead of being cast downward. Obviously, it is inefficient and inconvenient to cast down each element in the array.
  • T[] toArray(T[] a)This method can be used directlyArrayListTransformedArrayOverall downward transformation, and from the method source can be seen, parametersaIs called internally when the size of theArrays.copyOfMethod that internally creates a new array return, so the usual form for this method is as follows:
public static Integer[] vectorToArray2(ArrayList<Integer> v) {    
    Integer[] newText = (Integer[])v.toArray(new Integer[0]);    
    return newText;    
}   
Copy the code

Element access mode

ArrayList is implemented based on arrays. It can directly find elements at specified locations through subscript indexes, so the search efficiency is high. However, each time an element is inserted or deleted, a large number of elements are moved, and the insertion and deletion efficiency is low.

In methods such as searching for a given element index value, the source code divides the value of the element into null and non-null. ArrayList allows elements to be null.

Third, LinkedList

LinkedList is implemented based on a bidirectional circular LinkedList and can be used as a stack, queue, and double-ended queue in addition to acting as a LinkedList.

LinkedList is thread-safe, also in the case of multiple threads can consider to use Collections. SynchronizedList List (T) function returns a thread-safe LinkedList class, LinkedList inherits from the AbstractSequentialList class and implements the following four interfaces:

  • List<T>
  • DequeandQueue: Dual-ended queue
  • Cloneable: Supports cloning operations
  • Serializable: Supports serialization

The linked list node

The realization of LinkedList is based on bidirectional circular LinkedList, and there is no data stored in the head node voidLink, so there is no way to expand it, only need to change the point of the node, each LinkedList node contains the data of the node, as well as the reference of the precursor and successor nodes, its definition is as follows:

    private static final class Link<ET> {
        // Data for this node.
        ET data;
        // Precursor node and successor node.Link<ET> previous, next; Link(ET o, Link<ET> p, Link<ET> n) { data = o; previous = p; next = n; }}Copy the code

Find and delete operations

When looking for corresponding node need according to the location of the data, the first is to find the position and the size of the list, if less than half, so nodes from the subsequent began to search backwards, precursors from start node node to start looking for forward conversely, so for a find operation, its efficiency is very low, but the efficiency of data insertion and deletion to the end node is higher.

Fourth, the Vector

Vector is also implemented based on arrays and can grow dynamically. Many of its implementations incorporate synchronous statements and are therefore thread-safe.

Vector inherits from AbstractList and implements the following four interfaces:

  • List<E>
  • RandomAccess: Supports random access
  • Cloneable, java.io.SerializableSupport:CloneAnd serialization.

The implementation of Vector is generally similar to ArrayList, with the following features:

  • VectorThere are four different constructors, and the capacity of the no-parameter constructor is the default10, the constructors containing only capacity set the capacity growth as0.
  • whenVectorIf the capacity is insufficient to accommodate new elements, the system will expand the capacity. First, determine whether the capacity growth value is0If for0, then set the new capacity to twice the old capacity, otherwise set the new capacity to the old capacity plus the capacity growth. If the new capacity is not enough, set the new capacity to the parameter passed in.
  • Elements are stored and read based on whether the element value isnullThat is to say,VectorThe allowed element isnull.

Five, the HashSet

HashSet has the following characteristics:

  • The order of elements cannot be guaranteed, and the order may change
  • Not synchronous
  • The set elements can benull, but only onenull

When an element is stored in a HashSet, the HashSet calls the object’s hashCode() method to get the object’s hashCode value, which is then used to determine where the object will be stored in the HashSet. Simply put, the criteria for determining the equality of two elements in a HashSet is that the two objects are equal through equals and that their hashCode() method returns the same value.

Note that if you want to put an object into a HashSet and override the equals method of the object’s corresponding class, you should also override its hashCode() method. The rule is that if two objects return true via equals, their hashCode should also be the same. In addition, any property in an object that is used as an Equals comparison standard should be used to calculate the value of hashCode.

Six, TreeSet

TreeSet is the only implementation class of the SortedSet interface, and TreeSet ensures that the collection elements are in sorted state. TreeSet supports two sorting modes, natural sorting and custom sorting. Natural sorting is the default sorting mode.

Objects of the same class should be added to TreeSet. TreeSet determines that two objects are not equal if they return false through equals or zero through CompareTo.

Natural ordering

Natural sort uses the CompareTo(Object obj) method of the elements to be sorted to compare the size relationships between the elements, and then ranks the elements in ascending order. Java provides a Comparable interface that defines a compareTo(Object obj) method that returns an integer value. Objects that implement the Comparable interface can be compared in size.

The obj1.compareTo(obj2) method, if it returns 0, indicates that the two objects being compared are equal. If it returns a positive number, it indicates that obj1 is greater than obj2. If it is negative, it indicates that obj1 is less than obj2. If we always return true to equals on both objects, then the compareTo on both objects should return 0.

Custom sorting

Int compare(T o1,T O2) by using the Comparator interface. Compare (T o1,T O2)

  • TreeSetIs implemented by binary trees,TreesetThe data in is automatically sorted and cannot be addednullValue.
  • HashSetIt’s implemented by hash tables,In the HashSetThe data is unordered and can be put innull, but only onenullValues in both cannot be repeated, just like a unique constraint in a database.
  • HashSetThe objects required to be put in must be implementedhashCode()Method, the object put in, ishashcode()Having the same content as the codeStringObject,hashcodeIt’s the same, so you can’t repeat what you put in. But objects of the same class can fit into different instances.

Seven, HashMap

A HashMap is implemented based on a hash table. Each element is a key-value pair, which internally resolves conflicts through a single linked list. When the capacity is insufficient, it will also grow automatically. HashMap is not thread-safe and is only used in a single-threaded environment, where ConcurrentHashMap can be used to send packets.

HashMap inherits from AbstractMap and implements both Cloneable and Serializable interfaces, so it supports cloning and serialization.

The overall structure of the HashMap

HashMap is implemented based on arrays and linked lists:

  • First of all, according to theKeythehashCodeMethod to calculate the coordinates in an array.
// Computes the hash value of the Key.
int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);
// Calculate the subscript based on the hash value of the Key and the length of the list.
int i = indexFor(hash, table.length);
Copy the code
  • Checks if there is already an element in the current position of the array, if not, thenKey/ValueEncapsulated intoHashMapEntryData structure and treat it as an element of the array at that location. Otherwise, the list is iterated from the beginning, ifThe following two conditions are satisfied, then replace the linked list node’sValue
//Value Specifies the substitution condition
// Condition 1: The hash values are identical
// Condition 2: The key points to the same memory address or the key's equals method returns true
(e.hash == hash && ((k = e.key) == key || key.equals(k)))
Copy the code
  • If no alternative node can be found after traversing the entire list, replace this node with a new oneHashMapEntryAs the head node of the list, and as the element of the array at that position, the original head node is its successor.

The data structure of HashMapEntry

HashMapEntry is defined as follows:

static class HashMapEntry<K.V> implements Map.Entry<K.V> {
        //Key
        final K key;
        //Value
        V value;
        // Subsequent nodes.
        HashMapEntry<K,V> next;
        // If Key is not null, then it is its hash value, otherwise 0.
        int hash;
        //....
}
Copy the code

Element in

In the first section, we briefly calculated the overall structure of the HashMap, from which we can infer that the design should be as uniform as possible, keeping the list as short as possible at each position of the array, and avoiding the process of traversing from the head of the list.

If the elements are evenly distributed, it depends on the process of calculating the index of the array based on the Hash value of the Key. Let’s look at the calculation of the Hash value.

    public static int secondaryHash(Object key) {
        return secondaryHash(key.hashCode());
    }

    private static int secondaryHash(int h) {
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }
Copy the code

Then, the calculated Hash value is mod by subtracting the length of the current array to obtain the corresponding array subscript:

hash & (tab.length - 1)
Copy the code

Because the HashMap in capacity, ensure that the length of the array is moderate for 2 n power, so the length 1 binary representation to all 1 all the time, so for & operation results not only ensure the scope of the final results will not exceed the length of the array, but also to ensure the two Hash value of the same element not mapped to an array of the same place, This, coupled with the above quadratic Hash process plus a high degree of computational optimization, results in as even a distribution of data as possible.

Read elements

In fact, the array index is calculated by Key, traversing the list of array elements in that position to find the process.

capacity

As the number of elements in a HashMap increases, the probability of hash collisions increases. Since the length of the array is fixed, it is necessary to expand the array of a HashMap to improve query efficiency.

When the number of elements in the HashMap exceeds the array size * loadFactor (the default value of loadFactor is 0.75), the array is expanded to double the original size, and the location of each element in the array is recalculated. The data in the original array must be recalculated into the new array. And put it in.

Expansion is a performance-intensive operation, so if we already know the number of elements in a HashMap, the preset number of elements can effectively improve the performance of the HashMap.

The mechanism of Fail – Fast

HashMap is not thread safe, so if you have any other in the process of using the iterator thread modifies the map, then throw ConcurrentModificationException, this is called fail – fast strategy.

This strategy is implemented in the source code via the modCount field, which as the name implies is the number of changes. Any changes to the HashMap content will increase this value, which is then assigned to the expectedModCount of the iterator during initialization.

During iteration, test whether modCount and expectedModCount are equal. If they are not equal, it indicates that other threads have modified the Map, and then throw an exception as follows:

    HashMapEntry<K, V> nextEntry(a) {
        if(modCount ! = expectedModCount)throw new ConcurrentModificationException();
           / / to omit...
    }
Copy the code

ModCount is declared volatile, which ensures memory visibility in multithreaded situations.

After the iterator to create, if the mapping from the structure modification, except through the iterator itself the remove method, other any time in any way modify, iterator will throw ConcurrentModificationException. Therefore, in the face of concurrent modifications, iterators will soon fail completely, without the risk of arbitrary uncertain behavior at some unspecified future time

Eight, HashTable

A HashTable is often compared to a HashMap, which is thread-safe and is not. HashTable is older than HashMap. The synchronized keyword is used for both write and read synchronization, and is not recommended because ConcurrentHashMap provides a better solution. Its internal implementation is more complex, which will be analyzed in the next article.

Here’s a quick summary of the differences between it and HashMap:

  • HashTableBased on theDictionaryClass, andHashMapIs based onAbstractMap.DictionaryIs the abstract parent of any class that maps keys to corresponding values, whileAbstractMapBased on theMapAn implementation of an interface to minimize the effort required to implement the interface.
  • HashMapthekeyandvalueAllow for thenullAnd theHashtablethekeyandvalueAre not allowed tonull.HashMapencounterkeyfornullIs calledputForNullKeyMethod to process, while thevalueNo processing,Hashtableencounternull, return directlyNullPointerException.
  • HashtableThe method is synchronization, andHashMapIt isn’t. We can look at the source code,HashtableAlmost all of thempublicThe methods are allsynchronizedAnd some methods are also passed internallysynchronizedCode block to implement. So it’s generally recommended if it involves multi-threaded synchronizationHashTable, adopt without referenceHashMap, but inCollectionsThere is a static method in the class:synchronizedMap()That method creates a thread-safeMapObject and return it as a wrapped object.

Nine, TreeMap

A TreeMap is an ordered set of key-values implemented through a red-black tree. TreeMap inherits from AbstractMap, so it is a Map, that is, a key-value collection. TreeMap implements the NavigableMap interface, which means it supports a number of navigation methods, such as returning an ordered collection of keys. TreeMap implements both Cloneable and Serializable interfaces, meaning that it can be Clone and serialized.

TreeMap is implemented based on a red-black tree, and the map sorts according to the natural order of its keys, or according to the Comparator provided when the map was created, depending on the constructor used. The basic TreeMap operations containsKey, GET, PUT, and remove have log(n) time complexity. In addition, TreeMap is asynchronous and its iterator methods return fail-fastl iterators.

Ten, LinkedHashMap

  • LinkedHashMapisHashMapSubclasses of, andHashMapIt has the same storage structure, but it joins the head of a bidirectional linked list, which will allputtoLinkedHashmapThe nodes are strung together in a bidirectional circular linked list, so it preserves the insertion order of nodes so that the output order of nodes is the same as the input order.
  • LinkedHashMapCan be used to implementLRUAlgorithm.
  • LinkedHashMapAlso non-thread-safe, used only in single-threaded environments.

Eleven, LinkedHashSet

LinkedHashSet is a hash table and linked-list implementation of the Set interface with a predictable iteration order. This implementation differs from HashSet in that it maintains a double-linked list running on all items. This list of links defines an iteration order, which can be an insert or access order.

Implementation of LinkedHashSet: For LinkedHashSet, it inherits from HashSet and is based on LinkedHashMap.