Collection of basic source code analysis | Java development combat

This is the fifth day of my participation in Gwen Challenge. This article is participating in “Java Theme Month – Java Development in Action”, see the activity link for details.

In Java, we often use some of the collection class processing cache data, these collection classes have their own characteristics, today mainly share the Java collection of several commonly used Map, List, Set.

Set 1: Map

background

If you need to search for a specific information in a large amount of data, it may be like looking for a needle in a haystack. In this case, you can use Map to obtain the information. Because a Map is a set of key-value pairs. Map is a data structure of a key-value mapping table. It helps users efficiently and quickly search for values based on keys.

Here’s an example:

import java.util.HashMap; import java.util.Map; import java.lang.Object; public class Test { public static void main(String[] args) { Object o = new Object(); Map<String, Object> map = new HashMap<>(); map.put("aaa", o); // Map "aaa" to Object instance and associate Student target = map.get("aaa"); System.out.println(target == o); Student another = map.get(" BBB "); // Find system.out.println (another) with another key; // Return null}}Copy the code

Map<K, V> is a key-value mapping table. When we call put(K key, V value), the key and value are mapped and put into the Map. When we call V get(K key), we can get the corresponding value by key. If the key does not exist, null is returned. Like List, Map is an interface, and the most commonly used implementation class is HashMap.

In Map<K, V>, the key is unordered when traversing.

import java.util.HashMap;
import java.util.Map;
public class Test {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("dog", "a");
        map.put("pig", "b");
        map.put("cat", "c");
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            String key = entry.getKey();
            Integer value = entry.getValue();
            System.out.println(key + " = " + value);
        }
    }
}


//print
cat = c
dog = a
pig = b

Copy the code

From the print above, it is unordered, and the ordered answers can be found below. Let’s take a look at the Map family:

Its descendants include HashMap, LinkedHashMap, TreeMap, and Hashtable, which inherits Dictionary and implements the Map interface.

The following describes the characteristics of each implementation class:

(1) HashMap: It stores data according to the hashCode value of the key. In most cases, its value can be directly located, so it has efficient access speed, but the traversal order is uncertain. A HashMap allows at most one record to have a null key and multiple records to have a NULL value. A HashMap is not thread-safe, meaning that more than one thread can write a HashMap at any one time, which can lead to data inconsistencies. If thread-safe is desired, HashMap can be made thread-safe using the synchronizedMap method, the Collections static method, or using ConcurrentHashMap(piecewise locked).

(2) LinkedHashMap: LinkedHashMap is a subclass of HashMap, which completes the recording function of the input order for HashMap. Therefore, in order to achieve the consistency of output and input order, LinkedHashMap should be used.

(3) TreeMap: TreeMap implements the SortedMap interface, which can sort the records it saves according to the key. The default is the ascending order of the key value. You can also specify the comparator for sorting. If sorted mappings are used, TreeMap is recommended. When using TreeMap, the key must implement the Comparable interface or pass in a custom Comparator when constructing TreeMap, otherwise an exception of type ClassCastException will be thrown at runtime.

(4) Hashtable: Hashtable uses the Dictionary class to implement the Map interface. Many common functions of HashMap are similar to those of HashMap. Hashtable uses the “zip method” to implement the Hashtable, but it comes from the Dictionary class and is thread safe. Only one thread can write a Hashtable at any time, but it is not as concurrent as ConcurrentHashMap, which introduces segmentalized locking. Hashtable uses synchronized to ensure thread safety. In the case of fierce thread competition, Hashtable is very inefficient. It is not recommended for use in new code, but can be replaced with HashMap for cases where thread safety is not required, and ConcurrentHashMap for cases where thread safety is required. Instead of locking every position in the array like ConcurrentHashMap, Hashtable locks operations, which is poor performance.

HashMap ConcurrentHashMap ConcurrentHashMap

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable { private static final long serialVersionUID = 362498820763181265L; /** * default size 16 */ static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; Static final int MAXIMUM_CAPACITY = 1 << 30; Static final float DEFAULT_LOAD_FACTOR = 0.75f; static final float DEFAULT_LOAD_FACTOR = 0.75f; Static final int TREEIFY_THRESHOLD = 8; static final int TREEIFY_THRESHOLD = 8; /** * The table, initialized on first use, and resized as * necessary. When allocated, length is always a power of two. * (We also tolerate length zero in some operations to allow * bootstrapping mechanics that are currently not needed.) */ transient Node<K,V>[] table; }Copy the code

As seen above, HashMap is mainly composed of array + linked list structure. HashMap expansion is a multiple expansion. Why multiple, and not 1.5 or something? If the size of a HashMap is not 2 to the NTH power, it will be useless.

After expansion, the hash of all the original data keys will be recalculated into the expanded array. Why do you do that? Because different array sizes hash out different indices for keys. Also, keeping the length of the array to the power of 2, with all the low values of Length-1 equal to 1, will make the array index index obtained more uniform.

Why is Hashmap not thread-safe? Cause: When the number of concurrent threads exceeds the threshold, the resize operation will be called at the same time, and new arrays will be generated and assigned to the underlying array after rehash. As a result, only the new array generated by the last thread will be assigned to the table variable, and all other threads will lose. There is also a problem when some threads have already completed the assignment and other threads start using the assigned table as the original array.

Question:

An entry chain in the HashMap is too long and the query time reaches the upper limit. How to handle this problem? This is done in Jdk1.8, when the list becomes too long, the list is converted into a red-black tree (TreeNode) to achieve a higher time complexity lookup.

HashMap hash algorithm implementation? When we use put(key,value) method to add elements to HashMap, we first calculate the Hash value of the key, and then calculate the position of the element in the array by dissimilarity between the high 16 bits of the key and the low 16 bits of the key (the high 16 bits remain unchanged), and then match with the array size -1. Procedure:

Extension: What happens if an object overrides equals() instead of hashCode ()? Although the keys we use for get and put are logically equivalent (equal by equals comparison), we did not override hashCode(), so when we put, Key (hashcode1)–>hash–>indexFor–>index; key(hashcode2)–>hash–>indexFor–>index; Because hashCode1 is not equal to Hashcode2, null is returned because hashCode1 is not located in an array location. So, when overriding equals(), care must be taken to override hashCode() while ensuring that two objects are equal through equals() and that calling hashCode returns the same integer value. If equals determines that two objects are not equal, the hashCode can be the same (but hash collisions can occur, which should be avoided). 1. Have the same hash but not necessarily the same key: If key1 and key2 had the same hash, there would be no hash list. If key1 and key2 had the same hash, there would be no hash list.

ConcurrentHashMap uses segmented locking technology to divide data into segments and assign a lock to each segment. When a thread accesses one segment, the data in the other segments can also be accessed by other threads.

Set 2: List

The Collection List is a subinterface of the Collection interface, which is also used as a data cache. The List sorts the elements and allows the same elements to be stored, i.e. ordered and repeatable. Let’s start by looking at the subclasses:

ArrayList, LinkedList, ArrayList, LinkedList, ArrayList, LinkedList

ArrayList:

Advantages: efficient operation read operation, array based implementation, can be null, can allow repeated elements, ordered, asynchronous.

Disadvantages: Since it is implemented by dynamic arrays, it is not suitable for frequent insert and delete operations on elements, as each insert and delete requires moving elements in the array.

LinkedList:

Advantages: LinkedList is implemented by double LinkedList, adding and deleting, because there is no need to move the underlying array data, which is implemented by LinkedList, only need to modify the LinkedList node pointer, and the element insertion and deletion efficiency is high.

Disadvantages: Low traversal efficiency. Hashmaps are also related to double-linked lists.

The underlying ArrayList is a varied-length array, basically the same as a Vector, but ArrayList synchronizes the writeObjec() and readObject() methods.

transient Object[] elementData;


/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
If multiple threads access an <tt>ArrayList</tt> instance 
concurrently, and at least one of the threads modifies 
the list structurally, it <i>must</i> be synchronized externally.
(A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.)
This is typically accomplished by synchronizing 
on some object that naturally encapsulates the list.
Copy the code

Threads from the comments, we know the ArrayList is unsafe, multithreaded environment through external synchronization strategies after use, such as the List List = Collections. SynchronizedList (new ArrayList (…). ).

Source code implementation:

private void writeObject(java.io.ObjectOutputStream s) throws java.io.IOException{ // Write out element count, and any hidden stuff int expectedModCount = modCount; s.defaultWriteObject(); // Write out size as capacity for behavioural compatibility with clone() s.writeInt(size); // Write out all elements in the proper order. for (int i=0; i<size; i++) { s.writeObject(elementData[i]); } if (modCount ! = expectedModCount) { throw new ConcurrentModificationException(); } } /** * Reconstitute the <tt>ArrayList</tt> instance from a stream (that is, * deserialize it). */ private void readObject(java.io.ObjectInputStream s) throws java.io.IOException, ClassNotFoundException { elementData = EMPTY_ELEMENTDATA; // Read in size, and any hidden stuff s.defaultReadObject(); // Read in capacity s.readInt(); // ignored if (size > 0) { // be like clone(), allocate array based upon size not capacity int capacity = calculateCapacity(elementData, size); SharedSecrets.getJavaOISAccess().checkArray(s, Object[].class, capacity); ensureCapacityInternal(size); Object[] a = elementData; // Read in all elements in the proper order. for (int i=0; i<size; i++) { a[i] = s.readObject(); }}}Copy the code

When the add function is called, ensureCapacityInternal is called to expand the list by 1.5 times its original size, but the list size is 10 by default when the first element is added or the number of elements in the list is less than 10.

/**
 * Default initial capacity.
 */
private static final int DEFAULT_CAPACITY = 10;

/**
 * Shared empty array instance used for empty instances.
 */
private static final Object[] EMPTY_ELEMENTDATA = {};

/**
 * Shared empty array instance used for default sized empty instances.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

/**
 * The array buffer into which the elements of the ArrayList are stored.
 */
transient Object[] elementData; // non-private to simplify nested class access

/**
 * The size of the ArrayList (the number of elements it contains).
 */
private int size;
Copy the code

Expansion principle: according to the size of the current array, check whether it is smaller than the default value 10. If it is larger, expand the array to 1.5 times the size of the current array, copy the data of the new expanded array to the current elementData, and assign the value of the element to size++.

private void ensureCapacityInternal(int minCapacity) {
  ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

private void ensureExplicitCapacity(int minCapacity) {
  modCount++;

  // overflow-conscious code
  if (minCapacity - elementData.length > 0)
    grow(minCapacity);
}
private void grow(int minCapacity) {
  // overflow-conscious code
  int oldCapacity = elementData.length;
  int newCapacity = oldCapacity + (oldCapacity >> 1);
  if (newCapacity - minCapacity < 0)
    newCapacity = minCapacity;
  if (newCapacity - MAX_ARRAY_SIZE > 0)
    newCapacity = hugeCapacity(minCapacity);
  // minCapacity is usually close to size, so this is a win:
  elementData = Arrays.copyOf(elementData, newCapacity);
}

private static int hugeCapacity(int minCapacity) {
  if (minCapacity < 0) // overflow
    throw new OutOfMemoryError();
  return (minCapacity > MAX_ARRAY_SIZE) ?
    Integer.MAX_VALUE :
    MAX_ARRAY_SIZE;
}
Copy the code

Why is ArrayList slow to add and delete, and fast to query?

public boolean add(E e) {
  ensureCapacityInternal(size + 1);  // Increments modCount!!
  elementData[size++] = e;
  return true;
}
Copy the code

EnsureCapacityInternal (size + 1) is a function that automatically increments the size of the existing array. Each time you add an array, you need to copy the values between the arrays. Finally, when the expansion is completed and space is available, add the position of array size+1 to element E to realize the addition.

Delete source code:

/** * Removes the element at the specified position in this list. * Shifts any subsequent elements to the left (subtracts one from their * indices). * * @param index the index of the element to be removed * @return the element that  was removed from the list * @throws IndexOutOfBoundsException {@inheritDoc} */ public E remove(int index) { rangeCheck(index); modCount++; E oldValue = elementData(index); int numMoved = size - index - 1; if (numMoved > 0) System.arraycopy(elementData, index+1, elementData, index, numMoved); elementData[--size] = null; return oldValue; }Copy the code

When deleting elements in index position, rangeCheck(index) should be called first to check the index. If the index exceeds the current number, it will be judged to be out of bounds and throw an exception. Throw new IndexOutOfBoundsException (outOfBoundsMsg (index), and other functions are also useful to such as: Get (int index), set(int index, E element), set(int index, E element); This one takes a lot of performance.

The query:

public E get(int index) {
  rangeCheck(index);

  return elementData(index);
}
Copy the code

To obtain the specified index element, first call rangeCheck(index) to check the index, and then directly obtain the index of the array to obtain the data, without any redundant operations, efficient. AbstractSequentialList class AbstractSequentialList class AbstractSequentialList Class AbstractSequentialList Class AbstractSequentialList Class AbstractSequentialList

Element (): retrieves, but does not remove, the first element of the list. Element (): Retrieves, but does not remove, the first element of the list OfferFirst (E E): Insert the specified element at the beginning of the list. OfferLast (E E): insert the specified element at the end of the list PollFirst (): Gets and removes the last element of the list. PollLast (): Gets and removes the last element of the list Pop (): pops an element push(E E) from the stack represented by the list; RemoveFirst (): Removes and returns the first element of the list removeLast(): removes and returns the last element of the list removeFirstOccurrence(E E): Removes the specified element from the list for the first time RemoveLastOccurrence (E E): Removes the last occurrence of the specified element from the listCopy the code

How to implement LinkedList: the implementation of LinkedList is a two-way LinkedList. In Jdk 1.6 it is a circular bidirectional list with shorts, and in Jdk1.7+ it is a bidirectional list without shorts, as shown in the source code:

// JDK 1.6 Private TRANSIENT Entry<E> header = new Entry<E>(null, null, null); private transient int size = 0; // JDK 1.7 TRANSIENT int size = 0; transient Node<E> first; transient Node<E> last;Copy the code

Look from the source code comments, LinkedList is not thread-safe, multithreaded environment through external synchronization strategies after use, such as the List List = Collections. SynchronizedList (new LinkedList (…). ) :

 If multiple threads access a linked list concurrently, 
 and at least one of the threads modifies the list structurally,
  it <i>must</i> be synchronized externally.  
  (A structural modification is any operation that adds or 
  deletes one or more elements; merely setting the value of 
  an element is not a structural modification.) 
  This is typically accomplished by synchronizing on some object
   that naturally encapsulates the list.
Copy the code

Why do linkedLists add and delete quickly?

/**
 * Appends the specified element to the end of this list.
 *
 * <p>This method is equivalent to {@link #addLast}.
 *
 * @param e element to be appended to this list
 * @return {@code true} (as specified by {@link Collection#add})
 */
public boolean add(E e) {
  linkLast(e);
  return true;
}
/**
 * Links e as last element.
 */
void linkLast(E e) {
  final Node<E> l = last;
  final Node<E> newNode = new Node<>(l, e, null);
  last = newNode;
  if (l == null)
    first = newNode;
  else
    l.next = newNode;
  size++;
  modCount++;
}
Copy the code

The add function appends the element to the end of the list by creating a Node with last as the first Node and null as the last Node. Make the new Node the last Node of the list. If the previous last Node L is null, make the new Node the first Node of the list. Otherwise, the last Node pointer points to the new Node, and the current lList operand modCount+1.

When adding a new element, the LinkedList is concerned with two important data variables, one first and one last, which greatly improves the efficiency of insertion and ensures the order by default by appending to the end.

Delete another element:

/** * Removes the element at the specified position in this list. Shifts any * subsequent elements to the left (subtracts one from their indices). * Returns the element that was removed from the list. * * @param index the index of the element to be removed * @return the element previously at the specified position * @throws IndexOutOfBoundsException  {@inheritDoc} */ public E remove(int index) { checkElementIndex(index); return unlink(node(index)); } /** * Unlinks non-null node x. */ E unlink(Node<E> x) { // assert x ! = null; final E element = x.item; final Node<E> next = x.next; final Node<E> prev = x.prev; if (prev == null) { first = next; } else { prev.next = next; x.prev = null; } if (next == null) { last = prev; } else { next.prev = prev; x.next = null; } x.item = null; size--; modCount++; return element; }Copy the code

Delete the element specified by index by calling checkElementIndex(index) to check if the element is present. If there is no throw throw new IndexOutOfBoundsException (outOfBoundsMsg (index)); Get (int index), set(int index, E element) etc.

Delete a non-empty node from the list. The node in the list is also obtained by node(index), and the joint elements in the list are obtained by the current node: Element, next, prev, prev, prev, frist, last, element null, number -1, operand +1. According to the above analysis, the key variables of remove Node are the local variables of Node instance itself next, prev and item, and the local variables of list first and last to ensure that nodes are connected. The manipulation of these variables makes the deletion action efficient.

For queries:

/** * Returns the element at the specified position in this list. * * @param index index of the element to return * @return the element at the specified position in this list * @throws IndexOutOfBoundsException {@inheritDoc} */ public E  get(int index) { checkElementIndex(index); return node(index).item; }Copy the code

Get the node at the specified index position. Check the element by calling checkElementIndex(index) before retrieving the element, and then fetch the element by node(index). As mentioned above, node obtains the element through traversal, so it is relatively low performance.

Set 3: Set

Set Set is also used more often in our daily life. Used to store a collection of non-repeating elements, it provides the following methods:

Add elements to Set: add(E E)

Remove element from Set: remove(Object e)

Contains (Object e)

Each of these methods returns a Boolean value, that is, whether it is correct or successful. A Set is a Map that stores only keys but not values. We often use Set to remove duplicate elements because adding the same key twice returns false.

public HashSet() {
    map = new HashMap<>();
}


public TreeSet() {
    this(new TreeMap<E,Object>());
}
Copy the code

The main descendants of Set are: HashSet, SortedSet. HashSet is unordered because it implements the Set interface, not the SortedSet interface, whereas TreeSet implements the SortedSet interface to ensure elements are ordered.

The output of the HashSet is also unordered:

public class Test {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("2");
        set.add("6");
        set.add("44");
        set.add("5");
        for (String s : set) {
            System.out.println(s);
        }
    }
}

//print
44
2
5
6
Copy the code

The order in which you see the output is neither the order in which you add it nor the order in which you sort the String, and it may be different in different JDK versions.

With TreeSet:

public static void main(String[] args) {
  Set<String> set = new TreeSet<>();
      set.add("2");
      set.add("6");
      set.add("44");
      set.add("5");
      for (String s : set) {
          System.out.println(s);
      }
 }
//print
2
44
5
6
Copy the code

When traversing a TreeSet, the output is ordered, not in the order in which it was added, but in the order in which the elements were sorted.

Note: The added element must implement the Comparable interface. If the Comparable interface is not implemented, a Comparator object must be passed in when creating a TreeSet.

Collection of basic source code analysis | Java development combat

Set 1: Map

background

Set 2: List

Set 3: Set

Related Posts

Python crawler: Crawls Instagram and decrypts JS encryption parameters

Build user security rating, UGC intelligent audit application practice

Go Language Introduction 10– If branch language and Switch statement | Go theme month