Collection of Java collections

This is the first day of my participation in the August Text Challenge.More challenges in August

A collection is a collection of numbers, like a container, but it should be clear that the collection holds references to objects, not actual entities. When we talk about objects in collections, we really mean references to objects.

We can think of a Collection as a small database for storing data. Our operations on the Collection are data addition, deletion, change and search. In Java, there are two top-level interfaces, Collection and Map, which are used to define and standardize the operations related to the Collection. This article focuses on the Collection section of the Collection framework.

A Collection represents a set of objects, which can be ordered or unordered, and provides different subinterfaces to meet our requirements. Let’s focus on lists and sets.

The whole characteristic of a List is that it is ordered and repeatable. What we need to explore is what the specific implementation classes in the figure above have in common. Vector is an ancient implementation of the List class. It was introduced in JDK 1.0. Go away and start reading the source code. The source code comes from JDK 1.7.

public class Vector<E> extends AbstractList<E> implements List<E>, RandomAccess, Cloneable, java.io.Serializable { /** * The array buffer into which the components of the vector are * stored. */ protected Object[] elementData; /** * The number of valid components in this {@code Vector} object. */ protected int elementCount; /** * The amount by which the capacity of the vector is automatically * incremented when its size becomes greater than its capacity. If * the capacity increment is less than or equal to zero, the capacity * of the vector is doubled each time it needs to grow. */ protected int capacityIncrement; public Vector() { this(10); } public synchronized Boolean add(E E) {modCount++; ensureCapacityHelper(elementCount + 1); elementData[elementCount++] = e; return true; } public synchronized E remove(int index) {modCount++; if (index >= elementCount) throw new ArrayIndexOutOfBoundsException(index); E oldValue = elementData(index); int numMoved = elementCount - index - 1; if (numMoved > 0) System.arraycopy(elementData, index+1, elementData, index, numMoved); elementData[--elementCount] = null; // Let gc do its work return oldValue; } public synchronized E set(int index, synchronized E set(int index, synchronized E set) E element) { if (index >= elementCount) throw new ArrayIndexOutOfBoundsException(index); E oldValue = elementData(index); elementData[index] = element; return oldValue; Public synchronized E get(int index) {if (index >= elementCount) throw new ArrayIndexOutOfBoundsException(index); return elementData(index); }...Copy the code

From the above source code analysis, we can know about the characteristics of Vector. 1 The underlying implementation uses arrays to store data, so it is relatively fast to find and add elements, and slow to delete and insert elements. The initial length of the 2 array is 10, and when the length is insufficient, the increment is also 10 represented by capacityIncrement. Synchronized keyword is added to the declaration of 3 methods, thread-safe, so the efficiency is reduced accordingly. 4. Look again if you haven’t analyzed it.

Let’s start with the source code for ArrayList.

public class ArrayList<E> extends AbstractList<E> implements List<E>, RandomAccess, Cloneable, java.io.Serializable { private static final long serialVersionUID = 8683452581122892189L; /** * Default initial capacity. */ private static final int DEFAULT_CAPACITY = 10; /** * Shared empty array instance used for empty instances. */ private static final Object[] EMPTY_ELEMENTDATA = {}; /** * The array buffer into which the elements of the ArrayList are stored. * DEFAULT_CAPACITY when the first element is  added. */ private transient Object[] elementData; /** * Constructs an empty list with an initial capacity of ten. */ public ArrayList() { super(); this.elementData = EMPTY_ELEMENTDATA; } public Boolean add(E E) {ensureCapacityInternal(size + 1); // Increments modCount!! elementData[size++] = e; return true; Private void grow(int minCapacity) {// overflow-conscious code int oldCapacity = elementdata.length; int newCapacity = oldCapacity + (oldCapacity >> 1); if (newCapacity - minCapacity < 0) newCapacity = minCapacity; if (newCapacity - MAX_ARRAY_SIZE > 0) newCapacity = hugeCapacity(minCapacity); // minCapacity is usually close to size, so this is a win: elementData = Arrays.copyOf(elementData, newCapacity); }...Copy the code

Since the source code is similar to Vector, I will not post some, but we will continue to analyze ArrayList. The underlying data store is an array of the same length of 10, but it’s improved. Instead of being initialized at first creation, it’s initialized when the first element is added. The declaration of method 2 is less synchronized keyword, which is not thread safe, but the performance is improved. 3 If the array length is insufficient, it is automatically increased to 1.5 times the original length.

The above analysis also illustrates the difference between Vector and ArrayList. The main point is that Vector is no longer used. Use an ArrayList. More on thread safety later.

Then look at the implementation of LinkedList, source ~

public class LinkedList<E> extends AbstractSequentialList<E> implements List<E>, Deque<E>, Cloneable, java.io.Serializable { transient int size = 0; /** * Pointer to first node. */ transient Node<E> first; /** * Pointer to last node. */ transient Node<E> last; ** * Constructs an empty list. */ Constructs an empty list. */ Constructs an empty list. Private static class Node<E> {E item; Node<E> next; Node<E> prev; Node(Node<E> prev, E element, Node<E> next) { this.item = element; this.next = next; this.prev = prev; Public Boolean add(E E) {linkLast(E); return true; } void linkLast(E e) { final Node<E> l = last; final Node<E> newNode = new Node<>(l, e, null); last = newNode; if (l == null) first = newNode; else l.next = newNode; size++; modCount++; } unlink(Node<E> x) {// Assert x! = null; final E element = x.item; final Node<E> next = x.next; final Node<E> prev = x.prev; if (prev == null) { first = next; } else { prev.next = next; x.prev = null; } if (next == null) { last = prev; } else { next.prev = prev; x.next = null; } x.item = null; size--; modCount++; return element; }...Copy the code

The point is that the underlying implementation of LinkedList is a double-linked list. This results in the following features: 1 Elements are slower to find but faster to add and remove. 2 takes up memory because each node maintains two indexes. 3 The thread is unsafe. 4 There is no limit on the set length.

Now that we’ve analyzed the List implementation, we’ll talk about the differences between Vector, ArrayList, and LinkedList. Then we move on to Set, another subinterface of Collection. First, the big premise is that the elements stored in a Set are unordered and non-repeatable. Then we’ll look at how the implementation class is implemented. Let’s start with the HashSet.

public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, java.io.Serializable { private transient HashMap<E,Object> map; // Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object(); /** * Constructs a new, empty set; The backing <tt>HashMap</tt> instance has * default initial capacity (16) and load factor (0.75). */ public HashSet() {backing <tt>HashMap</tt> instance has * default initial capacity (16) and load factor (0.75). map = new HashMap<>(); Public Boolean add(E E) {return Map. Put (E, PRESENT)==null;} public Boolean add(E E) {return Map. } // Implement the Iterator method in the Iterable interface. public Iterator<E> iterator() { return map.keySet().iterator(); }...Copy the code

HashSet is the encapsulation of HashMap. The underlying implementation is based on hash table. It is necessary to introduce hash related knowledge again. All values in a Set are stored in a Map as keys, so elements in a Set cannot be repeated. 3 The thread is unsafe. 4 allows null values because Map keys are allowed to be null.

The last implementation class for today is TreeSet, and the elements in this implementation class are ordered! Note that ordering means ordering according to certain rules, and we say that elements in a Set are unordered because the order in which they are added to the Set is not guaranteed to be the same as the order in which they are printed. How does TreeSet keep order we’ll talk about that in a second, but it’s the same thing, it’s a wrapper around TreeMap, threads are still not safe.

public class TreeSet<E> extends AbstractSet<E> implements NavigableSet<E>, Cloneable, java.io.Serializable { /** * The backing map. */ private transient NavigableMap<E,Object> m; // Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object(); /** * Constructs a set backed by the specified navigable map. */ TreeSet(NavigableMap<E,Object> m) { this.m = m; } /** * Constructs a new, empty tree set, sorted according to the * natural ordering of its elements. All elements inserted into * the set must implement the Comparable interface. */ public TreeSet() { this(new TreeMap<E,Object>()); } /** * Constructs a new, empty tree set, sorted according to the specified * comparator. All elements inserted into the set must be mutually * comparable by the specified comparator * If the Comparable natural ordering of the elements will be used. */ public TreeSet(Comparator<? super E> comparator) { this(new TreeMap<>(comparator)); } public Boolean add(E E) {return m.put(E, PRESENT)==null; }...Copy the code

You can see the comment above the constructor in TreeSet. TreeSet wants to keep elements in order, and the idea is to compare them as they are added.

. // This is an excerpt from TreeMap's PUT method, to see how it compares. public V put(K key, V value) { ... // split comparator and comparable paths Comparator<? super K> cpr = comparator; if (cpr ! = null) { do { parent = t; cmp = cpr.compare(key, t.key); if (cmp < 0) t = t.left; else if (cmp > 0) t = t.right; else return t.setValue(value); } while (t ! = null); } else { if (key == null) throw new NullPointerException(); Comparable<? super K> k = (Comparable<? super K>) key; do { parent = t; cmp = k.compareTo(t.key); if (cmp < 0) t = t.left; else if (cmp > 0) t = t.right; else return t.setValue(value); } while (t ! = null); }... }Copy the code

The first method requires the class of the object we are adding to implement the Comparable interface, which in turn implements the compareTo method. This method is also called natural sorting, and we pass no sorting rules. This approach corresponds to TreeSet’s empty parameter constructor. The other way is custom sorting, that is, we define the sorting rules of two elements by ourselves and pass in the corresponding sorting rules when instantiating TreeSet, corresponding to the constructor with Comparator interface in TreeSet, in which compare method needs to be implemented. It’s a little confusing, isn’t it? Here’s an example

public class Person implements Comparable<Person>{ public String name; public Integer age; public Person(String name,Integer age) { this.name = name; this.age = age; } /* Custom comparison logic: * The method returns 0, greater than 0, less than 0, corresponding to equal, greater than, And less than */ @override public int compareTo(Person o) {int I = this.name.compareTo(o.name); if(i == 0){ return this.age.compareTo(o.age); }else { return i; } } @Override public String toString() { return "[Person] name:"+this.name+" age:"+this.age; Public static void main(String[] args) {TreeSet<Person> set = new TreeSet<>(); set.add(new Person("AJK923",20)); set.add(new Person("BJK923",20)); set.add(new Person("AJK923",21)); set.add(new Person("BJK923",21)); for (Person person : set) { System.out.println(person.toString()); } /* [Person] name:AJK923 age:20 [Person] name:AJK923 age:21 [Person] name:BJK923 age:20 [Person] name:BJK923 age:21 */ ---- The following is part of the custom sorting ---- anonymous inner class that implements the Comparator interface ----Copy the code

TreeSet<Person> set2 = new TreeSet<>(new Comparator<Person>() { @Override public int compare(Person o1, Person o2) { int i = o1.age.compareTo(o2.age); if(i == 0){ return o1.name.compareTo(o2.name); }else { return i; }}}); set2.add(new Person("AJK923",20)); set2.add(new Person("BJK923",20)); set2.add(new Person("AJK923",21)); set2.add(new Person("BJK923",21)); for (Person person : set2) { System.out.println(person.toString()); } /* [Person] name:AJK923 age:20 [Person] name:BJK923 age:20 [Person] name:AJK923 age:21 [Person] name:BJK923 age:21 */ }Copy the code

Above is the application of natural sort and custom sort. A few points:

If both custom sort and natural sort exist, custom sort takes precedence. Take a look at the implementation of the PUT method above.

2 Natural sort corresponds to the Comparable interface, and custom sort corresponds to the Comparator interface.

The String, wrapper, and date classes already implement the Comparable interface’s comparaTo method, so I’ve been lazy in the previous example.

Here, I have a general understanding of Collection, but I did not mention the specific API, and I do not use a few methods in daily life now. Let’s put a structure diagram of Collection interface, and the method is relatively simple, so you can get a general idea of the meaning from the name.

Again, I want to focus on the Iterator method. This method is defined in the Iterable interface (the Collection inherits the Iterable interface) and returns an iterator object. The rules for iterators are defined in the iterator, and the concrete implementation is in the Collection implementation class. Next, hasNext, and remove methods are defined in the Iterator interface. Take a look at how this works in an ArrayList.

private class Itr implements Iterator<E> { int cursor; // index of next element to return int lastRet = -1; // index of last element returned; -1 if no such int expectedModCount = modCount; public boolean hasNext() { return cursor ! = size; } @SuppressWarnings("unchecked") public E next() { checkForComodification(); int i = cursor; if (i >= size) throw new NoSuchElementException(); Object[] elementData = ArrayList.this.elementData; if (i >= elementData.length) throw new ConcurrentModificationException(); cursor = i + 1; return (E) elementData[lastRet = i]; } public void remove() { if (lastRet < 0) throw new IllegalStateException(); checkForComodification(); try { ArrayList.this.remove(lastRet); cursor = lastRet; lastRet = -1; expectedModCount = modCount; } catch (IndexOutOfBoundsException ex) { throw new ConcurrentModificationException(); } } final void checkForComodification() { if (modCount ! = expectedModCount) throw new ConcurrentModificationException(); }}Copy the code

It’s relatively easy to use, just use a while loop. Look at the following example.

public static void main(String[] args) { List<Integer> list = new ArrayList<>(); list.add(1); list.add(2); list.add(3); Iterator<Integer> it = list.iterator(); while (it.hasNext()) { Integer i = it.next(); System.out.println(i); }}Copy the code

Have you noticed that, except for Vector, any variable that holds a collection element is defined as transient, whether you are an array, Node, or Map?

Take a look at transient function first! Java’s serialization provides a mechanism for persisting object instances. When an object is persisted, there may be a special object data member that we do not want to use serialization mechanisms to store. To turn off serialization on a field for a particular object, you can prefix the field with the keyword TRANSIENT. When an object is serialized, transient variables are not included in the serialization representation; non-transient variables are included.

So why do this, good standard serialization is not, the reasons are as follows:

For an ArrayList, the underlying implementation is an array, and the length of an array is not necessarily equal to its capacity, which may be 10, whereas our element is only 5. So ArrayList implements its own two methods, writeObject and readObject, for serialization and deserialization.

For both hashsets and Hashmaps, the underlying implementation relies on hash tables, and different JVMS may compute different hashCode values, which can lead to serialization dislocations across platforms. So I overwrote those two methods as well. To borrow a vague phrase:

There are N drawbacks to using the default serialization form when the physical representation of an object is materially different from its logical data content, so you should rewrite the serialization method as much as possible.

For Collection, there is a Collections utility class that provides methods such as sorting Collections, finding subcollections, maximizing, minimizing, swapping, populating, shuffling Collections, and so on. Remember that the implementation class was thread unsafe. This utility class provides a number of synchronized counterpart methods.

Postscript: imperceptibly expanded the so many knowledge points, to be honest, there must be missing, as far as I can think of now still have a lot of, get rid of the Hash table and the Hash algorithm to this part, I haven’t for further analysis of the underlying data structure, array, list, binary tree, and so on, now is to analyze, this article has very long.

Related Posts

Dynamic programming → 1143. Longest common subsequence

Publish open source projects to Sonatype

Swoole project: Implementing an online chat room