Disclaimer: This article is a personal summary of, and supplement to, the blogger "I am the flower of the motherland"'s post "Full Analysis of High-Frequency Interview Questions for Java Development Positions".

By storage structure, the Java collections can be divided into two categories: single-column collections rooted at java.util.Collection, and double-column (key-value) collections rooted at java.util.Map.

Difference between HashMap and Hashtable

  • HashMap does not take synchronization into account and is thread-unsafe; Hashtable uses the synchronized keyword on its methods and is thread-safe.
  • HashMap allows one null key and null values; Hashtable allows neither null keys nor null values.

HashMap is thread-unsafe, right? Let's take an example 🌰

A: (Note that what follows is a common misinterpretation by candidates!! It is the answer everyone has memorized.)

"There is a fail-fast mechanism: if remove is called on the HashMap while it is being traversed, the iterator throws a ConcurrentModificationException when it detects the change. Hashtable does not throw this exception because its methods are synchronized." (Delivered in a confident tone ^_^, privately feeling that the interviewer's level is low.)

Give the correct answer first:

  • HashMap's thread-unsafety mainly shows up as the possibility of an infinite loop during capacity expansion in a multi-threaded environment (a known problem in JDK 7 and earlier).
  • Hashtable is "thread-safe" in the sense that its implementation marks methods such as put and remove as synchronized, so any single method call is thread-safe. However, thread safety is not guaranteed when multiple methods are combined. For example, if a thread does a get and then a put based on the result, that is a compound of two operations; between the two, another thread may have changed the value for that key, so the subsequent put may not behave as expected.
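
Here is a minimal sketch of that race (the class and method names are mine, not from the original article): each individual Hashtable call is atomic, but the get-then-put combination is not.

import java.util.Hashtable;

public class CompoundOpDemo {
    private static final Hashtable<String, Integer> COUNTS = new Hashtable<>();

    // NOT atomic: another thread can update "hits" between the get and the
    // put, and that update will be silently overwritten.
    static void unsafeIncrement() {
        Integer old = COUNTS.getOrDefault("hits", 0);
        COUNTS.put("hits", old + 1);
    }

    // Atomic: merge performs the read-modify-write as one synchronized call.
    static void safeIncrement() {
        COUNTS.merge("hits", 1, Integer::sum);
    }
}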

Fail-fast in Java collections?

A: Fail-fast is an error-detection mechanism of the Java collections, typically triggered when multiple threads make structural changes to a collection.

For example: suppose there are two threads (thread 1 and thread 2). Thread 1 iterates over the elements of collection A with an Iterator, and at some point thread 2 changes the structure of collection A (a structural change, not merely a change to an element's contents). The program may then throw a ConcurrentModificationException, i.e., it fails fast.

So how does fail-fast work?

During traversal, the iterator accesses the collection's contents directly and maintains a modCount variable. If the collection's structure changes during traversal, the value of modCount changes as well. Whenever the iterator fetches the next element via hasNext()/next(), it checks whether modCount still equals the expected expectedModCount value; if so, the traversal continues, otherwise an exception is thrown and the traversal is terminated. The JDK source code looks roughly like this:
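
(A simplified excerpt, paraphrased from java.util.ArrayList's internal iterator in JDK 8, with details trimmed:)

// Inside java.util.ArrayList.Itr (simplified):
public E next() {
    checkForComodification(); // throws if the list was structurally modified
    // ... fetch the element at the cursor and advance ...
}

final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}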

The common wrong answer treats the fail-fast mechanism as nothing more than evidence that HashMap is thread-unsafe, and assumes that "thread-safe" collections such as Hashtable and Vector can never fail fast on concurrent modification. That is a big mistake: if you open the JDK source code, you will find that Hashtable's iterator can also throw this exception during iteration, i.e., it can also fail fast.

What is the underlying implementation of HashMap?

(Answer outline: array + linked list + red-black tree -> red-black tree: a binary search tree (BST) with coloring rules (properties 4 and 5) that keep it balanced; complexity is O(log n), and its statistical performance beats the balanced binary tree (AVL) -> but when hashCode values are well dispersed, the probability of a linked list turning into a red-black tree is tiny, about 0.00000006 (roughly 6 in 100 million) according to the JDK source comments -> storage and access principles -> expansion mechanism (concepts: initial capacity, load factor, and expansion increment; HashMap length is a power of 2; the rehashing step during expansion).)

A: The underlying data structure of HashMap is array + linked list. JDK 8 and later use array + linked list + red-black tree, where the red-black tree solves the slow-query problem caused by overly long linked lists.

Further reading:

  • Source code analysis: the linked-list/red-black-tree conversion thresholds in HashMap
  • The old interview rival: the red-black tree
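
For reference, these are the actual conversion-threshold constants in the JDK 8 java.util.HashMap source:

static final int TREEIFY_THRESHOLD = 8;     // a bucket's list becomes a tree at 8 nodes
static final int UNTREEIFY_THRESHOLD = 6;   // a tree shrinks back to a list at 6 nodes
static final int MIN_TREEIFY_CAPACITY = 64; // below this table size, resize instead of treeifying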

What are the initial capacity, load factor, and expansion increment of a HashMap?

A: The default initial capacity of a HashMap is 16, the load factor is 0.75, and the expansion increment is double the original capacity: if a HashMap's capacity is 16, it becomes 32 after one expansion. A HashMap expands when the number of elements (across the array, linked lists, and red-black trees) exceeds 16 * 0.75 = 12.

Why is the length of a HashMap a power of 2?

(Key positioning is done via hash & (length - 1) -> what is the benefit?)

  • When we insert a key-value pair into the HashMap, the current key's position is computed by ANDing the key's hash value with length - 1. A power-of-2 length reduces the number of conflicts (collisions) and improves the HashMap's query efficiency.
  • If length is a power of 2, then length - 1 in binary is all 1s (11111...), so the AND operation with the hash h is very fast and no slots are wasted.
  • If length is not a power of 2, say length is 15, then length - 1 is 14, which is 1110 in binary. In the AND operation the last bit is always 0, so the indexes 0001, 0011, 0101, 1001, 1011, 0111, and 1101 can never be used. The wasted space is considerable, and worse, the number of usable array slots is much smaller than the array length, which further increases the chance of collisions and slows down queries!

Conclusion: in other words, a length of 2^n helps reduce the probability of collisions and keeps space utilization high. Now you can see why the first expansion goes from 16 to 32, not to 32 + 1 = 33 or anything else, right? As for the load factor: if it is set too small, space utilization suffers; if it is set too large, collisions increase and query efficiency drops. Hence the default of 0.75.
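
A tiny sketch of the indexing trick (the key string is arbitrary): for a power-of-2 length, hash & (length - 1) is equivalent to a non-negative modulo but needs only a single AND instruction.

public class IndexDemo {
    public static void main(String[] args) {
        int length = 16;                 // a power of 2, so length - 1 = 0b1111
        int hash = "someKey".hashCode();
        int index = hash & (length - 1); // keeps only the low 4 bits
        System.out.println(index);                                // a slot in [0, 15]
        System.out.println(index == Math.floorMod(hash, length)); // true
    }
}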

How do HashMap's store (put) and fetch (get) operations work?

When put() is called with a key and value to store, hashCode() is first invoked on the key, and the returned hash is used to locate the bucket where the Entry object should be stored, i.e., the array slot for the element. A hash collision occurs when two keys map to the same bucket; each bucket therefore carries a linked list (plus a red-black tree in JDK 8 and later) that holds the colliding key-value pairs (JDK 7 inserts the new pair at the head of the list, JDK 8 at the tail).

When get() is called to retrieve a stored value, the corresponding bucket is first located from the key's hashCode, and the matching value is then found in the linked list or red-black tree using the equals method.
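
A minimal sketch of that lookup path (illustrative code of my own, not the real JDK source; Node mirrors the relevant fields of HashMap.Node, and treeified buckets are ignored):

static class Node<K, V> {
    final int hash;
    final K key;
    V value;
    Node<K, V> next;

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash; this.key = key; this.value = value; this.next = next;
    }
}

static <K, V> V sketchGet(Node<K, V>[] table, int hash, K key) {
    int index = (table.length - 1) & hash;   // locate the bucket
    for (Node<K, V> node = table[index]; node != null; node = node.next) {
        if (node.hash == hash && (key == node.key || key.equals(node.key))) {
            return node.value;               // equals() picks the right entry
        }
    }
    return null;                             // no mapping for this key
}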

How does HashMap expand?

The default load factor of a HashMap is 0.75, which means the map starts to expand once the number of elements (across the array, linked lists, and red-black trees) exceeds 16 * 0.75 = 12. A bucket array twice the size of the original is created, and the original entries are placed into the new bucket array. This process is called rehashing, because the hash is used again to find each entry's new bucket location.

Of course, the expansion mechanism described above is relatively inefficient, so the JDK developers added an expansion-efficiency optimization in version 1.8. Since the capacity is a power of 2, an element either stays at its original index or moves to the original index plus 2^n (that is, oldIndex + oldCap).

How is that decided?

  • In plain terms, it is determined by whether the newly significant hash bit is 0 or 1.
  • 0 means the element keeps its original index; 1 means it moves to the oldIndex + oldCap position.

This design is indeed very clever: it saves the time of recomputing hash values, and since the newly examined bit can be considered random (0 or 1), resizing evenly spreads the previously colliding nodes across the new buckets. This is an optimization added in JDK 1.8; interested readers can go look at the source code.
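
A sketch of that split rule (a standalone helper of my own for illustration; the real logic lives inside HashMap.resize()):

// During a resize from oldCap to 2 * oldCap, the single hash bit at
// position log2(oldCap) decides where an element goes.
static int newIndexAfterResize(int hash, int oldCap) {
    int oldIndex = hash & (oldCap - 1);
    return (hash & oldCap) == 0
            ? oldIndex            // new bit is 0: stays at the old index
            : oldIndex + oldCap;  // new bit is 1: moves to oldIndex + oldCap
}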

However, it is important to note that HashMap expansion can cause an infinite loop in a multi-threaded environment (the classic JDK 7 problem caused by head insertion during resize).

Finally, take a look at a few concepts inside HashMap:

  • transient int size: records the number of K-V pairs in the map.
  • float loadFactor: the load factor, which measures how full the HashMap is allowed to get. The default is 0.75f (static final float DEFAULT_LOAD_FACTOR = 0.75f).
  • int threshold: the critical value. When the actual number of K-V pairs exceeds threshold, the HashMap expands. threshold = capacity * loadFactor.
  • capacity: if not specified, the default capacity is 16 (static final int DEFAULT_INITIAL_CAPACITY = 1 << 4).

In the put path: if (++size > threshold) resize();

Conclusion: when the number of K-V pairs in the HashMap exceeds threshold, it expands.

What are the methods for resolving Hash conflicts?

  • Chaining, i.e., the "zipper" method (the method HashMap uses)
  • Linear probing with rehashing
  • Quadratic probing with rehashing
  • Pseudo-random probing with rehashing

Which classes are suitable as keys for a HashMap?

Wrapper classes such as String and Integer are well suited as HashMap keys because they are final and override the equals and hashCode methods, which prevents keys from being silently changed and improves HashMap performance. hashCode() calculation requires that the key not change: if a key returns different hashCodes when you put it in and when you look it up, you cannot find the object you want in the HashMap.
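
A small sketch of that failure mode (MutableKey is a hypothetical class invented for this demo):

import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    // A key whose hashCode depends on mutable state: a bad idea.
    static class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2;                       // the hashCode changes after insertion
        System.out.println(map.get(key)); // null: the lookup probes the wrong bucket
    }
}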

What is the difference between ConcurrentHashMap and Hashtable?

A: ConcurrentHashMap combines the strengths of both HashMap and Hashtable. HashMap does not consider synchronization; Hashtable does, but Hashtable locks the entire structure every time it performs a synchronized operation.

ConcurrentHashMap divides the hash table into 16 segments (the default). Common operations such as get, put, and remove lock only the segment currently in use.

Implementation of ConcurrentHashMap

  • The class contains two static inner classes, HashEntry and Segment: HashEntry encapsulates the key-value pairs of the mapping table, and Segment acts as the lock.

  • A Segment is a ReentrantLock (it extends ReentrantLock). Each Segment guards the elements of one HashEntry array; to modify the HashEntry array, the corresponding Segment lock must be acquired first.

In practical development, we use HashMap in single-threaded environments and ConcurrentHashMap in multi-threaded environments; Hashtable is no longer recommended (i.e., Hashtable now mostly lives on in interview questions).
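
One caveat: the Segment design described above is the JDK 7 implementation; since JDK 8, ConcurrentHashMap drops Segments in favor of CAS operations plus synchronized on each bucket's head node. Either way, individual calls are atomic but hand-rolled get-then-put sequences are not, so prefer the atomic compound methods. A short usage sketch:

import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.putIfAbsent("visits", 0);
        map.compute("visits", (k, v) -> v + 1); // atomic read-modify-write
        map.merge("visits", 1, Integer::sum);   // also atomic
        System.out.println(map.get("visits"));  // 2
    }
}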

What are the characteristics of TreeMap?

A: The underlying TreeMap implementation uses a red-black tree, and the key-value pairs stored in TreeMap are sorted by key.

  • If keys are stored as strings, they are sorted in the default dictionary (lexicographic) order
  • If a custom reference type such as User is passed in, the class must implement the Comparable interface and override its compareTo method, or a comparator must be specified when creating the TreeMap. As follows:
// Method 1: when defining the class, specify the comparison rule
class User implements Comparable<User> {
    @Override
    public int compareTo(User o) {
        // Define the comparison rule here
        return 0;
    }
}

// Method 2: when creating the TreeMap, specify the comparison rule
public static void main(String[] args) {
    new TreeMap<User, Integer>(new Comparator<User>() {
        @Override
        public int compare(User o1, User o2) {
            // Define the comparison rule here
            return 0;
        }
    });
}

What is the difference between the Comparable interface and the Comparator interface?

  • The suffix -able in Comparable means "comparable": the class itself carries the rule. When you need to redefine the comparison rule, you must change the source code, i.e., the compareTo method in the User class.
  • The Comparator interface does not require changing the source code; you simply pass a new Comparator with the desired rule when creating the TreeMap.

How to write ascending and descending order with Java's Comparator

@Override
public int compare(CommentVo o1, CommentVo o2) {
    return o1.getTime().compareTo(o2.getTime());
}
  • Returning -1 (or any negative number) means o1 and o2 need not be swapped, o1 comes before o2: ascending order (asc)
  • Returning 1 (or any positive number) means o1 and o2 must be swapped, o1 comes after o2: descending order (desc)
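
For descending order, simply compare in the opposite direction (reusing the CommentVo example above):

@Override
public int compare(CommentVo o1, CommentVo o2) {
    // o2 first, so later times sort before earlier ones: descending
    return o2.getTime().compareTo(o1.getTime());
}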

What are the differences between ArrayList and LinkedList?

  • ArrayList is backed by a dynamic array; it is essentially a resizable array
  • LinkedList is backed by a doubly linked list and can be used as a stack, a queue, or a double-ended queue (deque)
  • ArrayList is more efficient than LinkedList for random access
  • LinkedList is more efficient than ArrayList for inserting and deleting nodes
  • ArrayList must reserve a certain amount of space and expands when the space runs out
  • LinkedList's cost is that every node stores pointer information in addition to the element itself

What are the differences between HashSet and TreeSet?

  • The underlying implementation of a HashSet is a hash table (it is still a HashMap, just using only the keys).

Element-uniqueness principle: first determine whether the elements' hashCode values are the same; if they are, then check whether the elements' equals method returns true.

  • The underlying TreeSet is implemented using red-black trees

Ensure that elements are unique through the Comparable or Comparator interface.

How about LinkedHashMap and LinkedHashSet?

A: LinkedHashMap can record the insertion order and access order of elements as follows:

  • The Entry inside LinkedHashMap extends HashMap.Node; both classes implement Map.Entry<K,V>
  • LinkedHashMap's Entry has not only value and next, but also before and after attributes, so a doubly linked list preserves each element's insertion order
  • In the constructor public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder), passing true for accessOrder orders entries by access, which is what makes the LRU cache algorithm possible
  • LinkedHashSet is implemented using LinkedHashMap. The relationship between the two is similar to that between HashMap and HashSet.

Extension: What is the LRU algorithm? How does LinkedHashMap implement the LRU algorithm?

The LRU (Least Recently Used) algorithm evicts data based on historical access records. Its core idea: "if data has been accessed recently, the probability that it will be accessed again is higher!"

Since LinkedHashMap records the access order of the elements in the map, the LRU algorithm can be implemented easily: simply pass true for the constructor's accessOrder and override the removeEldestEntry method. The implementation is as follows:

private static final int MAX_ENTRIES = 5;

@Test
public void LRUTest() {
    Map<String, String> map = new LinkedHashMap<String, String>(MAX_ENTRIES, 0.75f, true) {
        /**
         * This override lets the map grow to MAX_ENTRIES entries and then
         * deletes the eldest entry each time a new one is added, keeping a
         * steady state of MAX_ENTRIES entries.
         */
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > MAX_ENTRIES;
        }
    };
    map.put("1", "1");
    map.put("2", "2");
    map.put("3", "3");
    map.put("4", "4");
    map.put("5", "5");
    System.out.println(map.toString());
    map.put("6", "6");
    System.out.println(map.toString());
    map.get("3");
    System.out.println(map.toString());
    map.put("7", "7");
    System.out.println(map.toString());
    map.get("3");
    System.out.println(map.toString());
    map.remove("3");
    System.out.println(map.toString());
}

What is Iterator?

Iterator is, simply put, an iterator used when traversing collections. A demo follows:

ArrayList<String> list = new ArrayList<>();
list.add("zhangsan");
list.add("lisi");
list.add("yangwenqiang");
Iterator<String> iterator = list.iterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next());
}

What is the relationship between Collection and Collections?

  • Collection is the top-level collection interface; its subinterfaces include List and Set.
  • Collections is a collection utility class that operates on collections: sorting, binary search, copying collections, finding max and min values, and so on.

All in all: the classes ending in "s" are mostly utility classes.

Conversion between array and List?

Converting between arrays and List collections is a very common operation in daily development, done mainly with the Arrays.asList and List.toArray methods. The demo is as follows:

public class ConverTest {
    public static void main(String[] args) {
        // List to array
        ArrayList<String> list = new ArrayList<>();
        list.add("zhangsan");
        list.add("lisi");
        list.add("yangwenqiang");
        Object[] arr = list.toArray();
        for (int i = 0; i < arr.length; i++) {
            System.out.println(arr[i]);
        }
        System.out.println("---------------");
        // Array to List
        String[] arr2 = {"niuke", "alibaba"};
        List<String> asList = Arrays.asList(arr2);
        for (int i = 0; i < asList.size(); i++) {
            System.out.println(asList.get(i));
        }
    }
}

Array to List: the list produced by Arrays.asList does not support modification methods such as add/remove, because asList actually returns Arrays' private inner class ArrayList, which extends AbstractList but does not implement those operations; calling them throws an UnsupportedOperationException directly. This conversion embodies the adapter pattern: it merely adapts the interface, and the result is still backed by the original array.
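
A quick sketch of that behavior (the values are arbitrary; assumes the usual java.util imports):

String[] arr = {"a", "b"};
List<String> view = Arrays.asList(arr);
view.set(0, "x");           // allowed: writes through to arr[0], since the
System.out.println(arr[0]); // list is just an adapter over the array -> "x"
// view.add("c");           // would throw UnsupportedOperationException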

The List.toArray method handles converting a collection to an array. It is best to pass in an array of the matching type whose size is list.size(). If the passed-in array is not big enough, toArray internally allocates new memory and returns the new array's address instead; if the array has more elements than required, the element at index list.size() is set to null while the remaining elements keep their original values. It is therefore recommended that the argument array's size equal the number of elements in the collection.

If the no-argument toArray() method is used, its return type can only be Object[]; force-casting the result to an array of another type throws a ClassCastException.
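
A quick sketch of both points (contents are arbitrary; assumes the usual java.util imports):

List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
Object[] objs = list.toArray();                        // fine: static type is Object[]
// String[] bad = (String[]) list.toArray();           // ClassCastException at runtime
String[] good = list.toArray(new String[list.size()]); // the preferred form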

In short: prefer toArray(T[] a) over the no-argument toArray().