Hello, today I’m going to share some Java collections interview questions with you. Please take out your notebook and write them down.

1. What are the common collections?

Java collection classes are derived primarily from two root interfaces: Collection and Map. Collection has three subinterfaces: List, Set, and Queue (Queue was added in Java 5). Therefore, Java collections can be roughly divided into the List, Set, Queue, and Map interface systems.

Note: Collection is an interface, Collections is a utility class, and Map is not a subinterface of Collection.

The Java collection framework diagram is as follows:

In the figure, List represents an ordered collection that allows duplicates and can be accessed directly by element index; Set represents an unordered collection that does not allow duplicates and can only be accessed through the elements themselves; Queue represents a queue collection.

A Map represents a collection of key-value pairs that can be accessed based on the key of an element.

In the figure above, the light green background covers the common implementation classes in the collection system, namely ArrayList, LinkedList, ArrayDeque, HashSet, TreeSet, HashMap, TreeMap, etc.

2. Which collections are thread-safe? Which are not?

Thread-safe:

  • Hashtable: thread-safe, unlike HashMap.
  • ConcurrentHashMap: an efficient, thread-safe map.
  • Vector: adds synchronization that ArrayList lacks.
  • Stack: also thread-safe; it extends Vector.

Not thread-safe (a sketch of thread-safe alternatives follows this list):

  • HashMap
  • ArrayList
  • LinkedList
  • HashSet
  • TreeSet
  • TreeMap
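When one of the collections in the second list must be used from multiple threads, it can be wrapped in a synchronized view or replaced with a concurrent implementation. A minimal sketch using only standard JDK classes (the class name is illustrative):

import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeAlternatives {
    public static void main(String[] args) {
        // Wrap thread-unsafe collections in synchronized views
        List<String> syncList = Collections.synchronizedList(new ArrayList<>());
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());

        // Or use a collection designed for concurrent access
        Map<String, Integer> concurrentMap = new ConcurrentHashMap<>();

        syncList.add("a");
        syncMap.put("a", 1);
        concurrentMap.put("b", 2);
        System.out.println(syncList + " " + syncMap + " " + concurrentMap);
    }
}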

3. What are the similarities and differences between ArrayList and LinkedList?

  • Thread safety: neither ArrayList nor LinkedList is synchronized, that is, neither is thread-safe;

  • Underlying data structures: ArrayList is backed by an Object array; LinkedList is backed by a doubly linked list (a circular doubly linked list before JDK 1.7, non-circular from JDK 1.7 on);

  • Whether insertion and deletion are affected by element position: ArrayList stores elements in an array, so the time complexity of inserting and deleting elements depends on their position. For example, when add(E e) is executed, ArrayList appends the element to the end of the list by default, which is O(1). But inserting or deleting an element at position i (add(int index, E element)) costs O(n-i), because the (n-i) elements after position i all have to be shifted backwards/forwards by one slot. LinkedList stores elements in a linked list, so inserting or deleting at a given node is not affected by element position and is approximately O(1), whereas the array-based ArrayList is approximately O(n) (see the sketch after this list).

  • Whether fast random access is supported: LinkedList does not support efficient random access to elements, whereas ArrayList implements the RandomAccess interface and therefore supports fast random access. Fast random access means quickly retrieving an element by its index (corresponding to the get(int index) method).

  • Memory footprint: ArrayList’s space waste mainly comes from the capacity reserved at the end of the list, whereas LinkedList’s space cost comes from each element taking more space than in an ArrayList (each node stores its predecessor and successor references in addition to the data).
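A minimal sketch illustrating the cost differences described above (the class name is illustrative; the comments state asymptotic costs, not measured timings):

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListInsertDemo {
    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();

        for (int i = 0; i < 5; i++) {
            arrayList.add(i);   // append to the end: amortized O(1) for ArrayList
            linkedList.add(i);  // append to the end: O(1) for LinkedList
        }

        // Insert at index 1: ArrayList must shift the following elements (O(n - i));
        // LinkedList must first walk to the position, then link the node in
        arrayList.add(1, 100);
        linkedList.add(1, 100);

        // Random access: O(1) for ArrayList (RandomAccess), O(n) for LinkedList
        int a = arrayList.get(3);
        int b = linkedList.get(3);
        System.out.println(a + " " + b);
    }
}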

4. What is the difference between ArrayList and Vector?

  • Vector is thread-safe, ArrayList is not. Vector adds the synchronized keyword to its key methods to ensure thread safety. If multiple threads access the collection, it is better to use Vector because we do not need to worry about writing thread-safe code ourselves.
  • When the underlying array runs out of space, ArrayList grows by 0.5 times its capacity (to 1.5x the old size), while Vector doubles its capacity, so ArrayList helps save memory.

5. How does ArrayList scale up?

The essence of ArrayList expansion is to calculate the size of the new array, instantiate it, and copy the contents of the original array into the new array. By default, the new capacity is 1.5 times the original capacity.

Take JDK1.8 as an example:

public boolean add(E e) {
    // Make sure the array can hold one more element; grow it if necessary
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    // Add e to the end of the array
    elementData[size++] = e;
    return true;
}

Each time an element is added via add(), ArrayList needs to check the list’s capacity. The ensureCapacityInternal() method ensures that the array maintained by the ArrayList can store the new element, which is then placed at the end of the elementData array.

private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

private static int calculateCapacity(Object[] elementData, int minCapacity) {
    // If elementData is still the default empty array, use at least DEFAULT_CAPACITY (10)
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // If the existing capacity of the ArrayList already satisfies the minimum required capacity,
    // add() stores the element directly. If the minimum required capacity is greater than the
    // existing capacity, the ArrayList is too small and grow() must be called.
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    // The new capacity is 1.5 times the old capacity
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    // If 1.5x the old capacity is still not enough, use the requested minimum capacity
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // Cap the capacity if it exceeds MAX_ARRAY_SIZE
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // Copy the old contents into the new, larger array
    elementData = Arrays.copyOf(elementData, newCapacity);
}

6. What’s the difference between Array and ArrayList? When should an Array be used instead of an ArrayList?

  • Array can contain both primitive and object types, and ArrayList can contain only object types.
  • The Array size is fixed, whereas the size of an ArrayList changes dynamically.
  • ArrayList provides more methods and features, such as addAll(), removeAll(), iterator(), and so on (see the sketch below).
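A minimal sketch of these differences (the class name is illustrative):

import java.util.ArrayList;
import java.util.List;

public class ArrayVsArrayList {
    public static void main(String[] args) {
        // Array: fixed size, can hold primitives directly
        int[] primitives = new int[3];
        primitives[0] = 1;
        // primitives[3] = 4;  // would throw ArrayIndexOutOfBoundsException

        // ArrayList: dynamic size, holds objects only (int is auto-boxed to Integer)
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        list.add(3);
        list.add(4);            // grows automatically
        System.out.println(list.size());
    }
}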

7. What is the underlying data structure of HashMap?

There are differences between JDK1.7 and JDK1.8:

In JDK1.7, it consists of “array + linked list”: the array is the body of the HashMap, and linked lists are mainly used to resolve hash conflicts.

In JDK1.8, it consists of “array + linked list + red-black tree”. When a linked list gets too long, HashMap performance suffers badly: the search time complexity of a red-black tree is O(log n), while that of a linked list is O(n). For this reason, JDK1.8 further optimizes the data structure by introducing red-black trees. A linked list is converted to a red-black tree (and back) when certain conditions are met:

  • When the length of a linked list exceeds 8 and the array length has reached 64, the list is converted to a red-black tree.
  • If the current array length is less than 64, the array is expanded instead of converting the list to a red-black tree, to reduce search time (the relevant JDK constants are shown after this list).
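For reference, the thresholds above correspond to these constants in the JDK 1.8 HashMap source (excerpt only):

static final int TREEIFY_THRESHOLD = 8;      // convert a bin's list to a tree once it holds this many nodes
static final int UNTREEIFY_THRESHOLD = 6;    // convert a tree back to a list when it shrinks to this size during resize
static final int MIN_TREEIFY_CAPACITY = 64;  // only treeify if the table has at least this many buckets; otherwise resize instead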

8. What are the methods for resolving hash conflicts? Which one does HashMap use?

Methods for resolving hash conflicts include open addressing, rehashing, chaining (the zipper method), and establishing a public overflow area. HashMap uses the chaining approach.

  • Open addressing (also called re-hashing within the table): the basic idea is that if p = H(key) conflicts, hash again based on p to get p1 = H(p); if p1 conflicts again, hash again based on p1, and so on, until a non-conflicting address pi is found. Open addressing therefore requires the hash table length to be greater than or equal to the number of elements to be stored, and because later probes depend on earlier slots, deleted nodes can only be marked as deleted rather than actually removed.

  • Rehashing (double hashing, multiple hashing): provide several different hash functions; when R1 = H1(key1) conflicts, compute R2 = H2(key1), and so on, until no conflict occurs. This does not produce clustering, but it increases computation time.

  • Chaining (the zipper method): elements with the same hash value form a singly linked list of synonyms, and the head pointer of that list is stored in cell i of the hash table; searches, insertions, and deletions are performed mainly within the synonym list. Linked lists are well suited to frequent insertions and deletions.

  • Establishing a public overflow area: the hash table is divided into a base table and an overflow table; when a conflict (overflow) occurs, the overflowing data is put into the overflow table.

9. When resolving hash conflicts, why not use a red-black tree directly instead of starting with a linked list and then converting it to a red-black tree?

Because a red-black tree needs left rotations, right rotations, and color changes to stay balanced, whereas a singly linked list does not. When the number of elements is less than 8, the linked list structure already guarantees good query performance. When the number of elements is greater than 8, the search time complexity of a red-black tree is O(log n) while that of a linked list is O(n); at that point the red-black tree is needed to speed up queries, even though inserting new nodes becomes slower.

Therefore, if a red-black tree structure were used from the start, performance would be wasted when there are only a few elements: insertions would be slower with no query benefit.

10. What is the default HashMap load factor? Why 0.75, not 0.6 or 0.8?

Before answering this question, let’s take a look at the relevant fields of HashMap:

int threshold;             // the maximum number of key-value pairs the map can hold before resizing

final float loadFactor;    // the load factor

int modCount;
int size;

The initial length of the Node[] table is length (16 by default), loadFactor is the load factor (0.75 by default), and threshold is the maximum number of key-value pairs the HashMap can hold: threshold = length * loadFactor. In other words, for a given array length, the larger the load factor, the more key-value pairs the map can hold before resizing.

The default loadFactor is 0.75. This is a balanced choice between space and time efficiency and generally should not be changed, except when time or space requirements are special:

  • If memory is plentiful and time efficiency matters more, you can lower the value of the load factor.
  • Conversely, if memory is tight and time efficiency matters less, you can increase the load factor; it can even be greater than 1.

Let’s go back to the author’s comment in the source code (JDK1.7):

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

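As a concrete illustration of the last sentence of that comment, if the expected number of entries is known, choosing an initial capacity greater than expectedEntries / loadFactor avoids rehashing entirely. A minimal sketch (the class and variable names are illustrative):

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;
        float loadFactor = 0.75f;
        // Choose an initial capacity larger than expectedEntries / loadFactor
        // so that threshold = capacity * loadFactor is never exceeded
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Map<String, Integer> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key" + i, i);  // no resize/rehash is triggered
        }
        System.out.println(map.size());
    }
}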

11. How is the storage index of a key calculated in HashMap?

First, compute the key’s hashCode value; then compute the hash value from that hashCode; finally, compute the storage index with hash & (length - 1). The source implementation works as follows:

The hash algorithm here essentially consists of three steps: take the key’s hashCode value, compute the hash value from the hashCode, and compute the index with a modulo-like operation. JDK1.7 and 1.8 differ in the second step. Take JDK1.8 as an example, where n is the table length.
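For reference, the JDK 1.8 hash computation and index calculation are:

static final int hash(Object key) {
    int h;
    // XOR the high 16 bits of hashCode into the low 16 bits so that the high bits
    // also take part in the index calculation
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// Inside putVal()/getNode(), with n being the table length (always a power of two):
// index = (n - 1) & hash   // equivalent to hash % n when n is a power of two, but faster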

12. What is the flow of HashMap’s put method?

Taking JDK1.8 as an example, the process is as follows:

  1. First, compute the hash value based on the key and find the index in the array where the element will be stored;
  2. If the array is empty, call resize() to initialize it;
  3. If there is no hash conflict, put the element directly at the corresponding array index;
  4. If a conflict occurs and the key already exists, overwrite the value;
  5. If a conflict occurs and the existing node is a red-black tree, attach the new node to the tree;
  6. If a conflict occurs and there is a linked list, check whether the list length is greater than 8: if it is and the array capacity is smaller than 64, expand the array; if the list length is greater than 8 and the array capacity is at least 64, convert the structure to a red-black tree; otherwise, insert the key-value pair into the list, overwriting the value if the key already exists (a simplified sketch follows this list).
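A runnable toy sketch of the array + linked list part of this flow (red-black tree conversion, resizing, and most edge cases are omitted; ToyHashMap is a made-up class for illustration, not the JDK implementation):

import java.util.Objects;

public class ToyHashMap<K, V> {
    private static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    private Node<K, V>[] table;
    private int size;
    private static final float LOAD_FACTOR = 0.75f;

    @SuppressWarnings("unchecked")
    public V put(K key, V value) {
        if (table == null) table = (Node<K, V>[]) new Node[16];   // step 2: lazy initialization
        int hash = Objects.hashCode(key);
        int i = (table.length - 1) & hash;                        // step 1: index from the hash
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(n.key, key)) {   // step 4: key exists, overwrite
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        Node<K, V> node = new Node<>(hash, key, value);           // steps 3/6: link into the bucket
        node.next = table[i];                                     // head insertion here; JDK 1.8 appends at the tail
        table[i] = node;
        if (++size > table.length * LOAD_FACTOR) {
            // a real HashMap would resize() here; omitted in this toy
        }
        return null;
    }

    public static void main(String[] args) {
        ToyHashMap<String, Integer> map = new ToyHashMap<>();
        map.put("a", 1);
        System.out.println(map.put("a", 2));   // prints 1: the old value was overwritten
    }
}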

13. How does HashMap expand?

HashMap expands when the number of entries exceeds the threshold defined by the load factor. Because a Java array cannot grow automatically, expansion replaces the existing array with a new array of twice the size and puts the original entries into the new array.

So what are the steps of expansion? Let’s look at the source code.

Take a look at the JDK1.7 code first

void resize(int newCapacity) {                    // newCapacity is the new capacity passed in
    Entry[] oldTable = table;                     // reference to the Entry array before expansion
    int oldCapacity = oldTable.length;            // capacity before expansion
    if (oldCapacity == MAXIMUM_CAPACITY) {        // the array has already reached the maximum size (2^30)
        threshold = Integer.MAX_VALUE;            // set the threshold to Integer.MAX_VALUE (2^31 - 1) so it never expands again
        return;
    }
    Entry[] newTable = new Entry[newCapacity];    // initialize a new Entry array
    transfer(newTable);                           // !! move the data into the new Entry array
    table = newTable;                             // point the HashMap's table field at the new Entry array
    threshold = (int) (newCapacity * loadFactor); // recompute the threshold
}

The transfer() method copies the elements of the original Entry array into the new, larger array.

void transfer(Entry[] newTable) {
    Entry[] src = table;                         // src references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {       // iterate over the old Entry array
        Entry<K,V> e = src[j];                   // take each bucket of the old Entry array
        if (e != null) {
            src[j] = null;                       // release the old array's reference (after the loop the old array references nothing)
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);   // !! recompute each element's position in the new array
                e.next = newTable[i];            // mark [1]: link the element in front of the current head
                newTable[i] = e;                 // place the element at index i (head insertion)
                e = next;                        // move to the next element in the old chain
            } while (e != null);
        }
    }
}

The assignment of newTable[i] to e.next uses singly linked list head insertion: new elements are always placed at the head of the list. As a result, elements that were placed at an index earlier end up at the tail of the Entry chain when hash conflicts occur.

JDK1.8 made two optimizations:

  1. After resize, an element is either at its original position or at its original position + oldCap (the length of the old hash table). There is no need to recompute the hash as in JDK1.7; it is enough to check whether the newly significant bit of the hash value is 1 or 0: 0 means the index stays the same, 1 means the new index is the old index + oldCap. This design is very clever and saves the time of recomputing hash values. In the figure, n is the table length: figure (a) shows how the index positions of keys key1 and key2 are determined before expansion, and figure (b) shows how they are determined after expansion, where hash1 is the result of the hash and high-bit operation for key1.

After the element’s hash is recomputed, because n has doubled, the mask of n - 1 gains one extra bit (shown in red in the figure), so the new index looks like this (see the sketch after this list):

  2. In JDK1.7, when elements were migrated from the old list to the new list during rehash, elements that landed at the same array index in the new table ended up in reverse order. JDK1.8 does not reverse them, because it uses tail insertion.
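A small runnable sketch of the bit test behind optimization 1 (the class name and values are illustrative):

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;                 // old table length (a power of two)
        int newCap = oldCap << 1;        // doubled to 32
        int hash = 0b10101;              // example hash value (21)

        int oldIndex = (oldCap - 1) & hash;   // index in the old table
        int newIndex = (newCap - 1) & hash;   // index in the new table

        // The only difference between the two masks is the newly exposed bit: hash & oldCap
        if ((hash & oldCap) == 0) {
            System.out.println("stays at " + oldIndex);              // newIndex == oldIndex
        } else {
            System.out.println("moves to " + (oldIndex + oldCap));   // newIndex == oldIndex + oldCap
        }
        System.out.println("newIndex = " + newIndex);
    }
}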

14. What is commonly used as a HashMap key?

Immutable classes such as Integer and String are commonly used as HashMap keys, and String is most commonly used.

  • Because String is immutable, its hashCode is computed once and cached, so it does not need to be recalculated. This is why keys in a HashMap tend to be Strings.
  • Because the equals() and hashCode() methods are used when retrieving entries, it is important that the key class overrides them properly. Classes such as Integer and String override hashCode() and equals() correctly (a sketch of a custom key class follows this list).
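A minimal sketch of a custom key class that follows these rules: immutable, with equals() and hashCode() overridden consistently (the UserId class is illustrative):

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// An immutable key: final class, final fields, no setters
public final class UserId {
    private final long id;
    private final String region;

    public UserId(long id, String region) {
        this.id = id;
        this.region = region;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserId)) return false;
        UserId other = (UserId) o;
        return id == other.id && Objects.equals(region, other.region);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, region);   // consistent with equals()
    }

    public static void main(String[] args) {
        Map<UserId, String> users = new HashMap<>();
        users.put(new UserId(1L, "cn"), "Alice");
        // An equal key (same field values) finds the same entry
        System.out.println(users.get(new UserId(1L, "cn")));   // Alice
    }
}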

15. Why is HashMap thread unsafe?

  • Infinite loop under multithreading. The HashMap in JDK1.7 uses head insertion when inserting elements; in a multi-threaded environment, expansion can create a circular linked list and therefore an infinite loop. For this reason, JDK1.8 switched to tail insertion, which keeps list elements in their original order during expansion and avoids the circular-list problem.
  • Multi-threaded put can lose elements. If multiple threads perform put at the same time and compute the same index position, one thread’s key-value pair can overwrite another’s, so an element is lost. This problem exists in both JDK 1.7 and JDK 1.8.
  • When put and get run concurrently, get may return null. If thread 1 performs a put that triggers a rehash because the element count exceeds the threshold, thread 2’s concurrent get may observe the table mid-resize and return null. This problem exists in both JDK 1.7 and JDK 1.8.

16. What is the implementation principle of ConcurrentHashMap?

ConcurrentHashMap is implemented differently in JDK1.7 and JDK1.8.

Take a look at JDK1.7

ConcurrentHashMap in JDK1.7 consists of a Segment array plus HashEntry arrays. ConcurrentHashMap splits the overall hash bucket array into small arrays (Segments), and each small array consists of n HashEntry elements.

Segment extends ReentrantLock, so it plays the role of a reentrant lock; HashEntry is used to store the key-value data.

First, the data is divided into segments, and each segment of data has its own lock. When a thread holds a lock and accesses one segment, the data in the other segments can still be accessed by other threads, achieving real concurrent access.

Take a look at JDK1.8

In terms of data structure, ConcurrentHashMap in JDK1.8 uses the same array + linked list + red-black tree structure as HashMap. In terms of locking, it abandons the original Segment lock and uses CAS + synchronized to achieve finer-grained locking.

The lock granularity is reduced to the level of individual hash bucket elements: only the head node of a linked list (or the root of a red-black tree) needs to be locked, without affecting reads and writes of other hash buckets, which greatly improves concurrency.

17. What is the execution logic of the put method of ConcurrentHashMap?

Take JDK1.7 first:

First, an attempt is made to acquire the lock. If that fails, the lock is acquired by spinning; if the number of spin retries exceeds 64, the thread blocks to acquire the lock instead.

After obtaining the lock:

  1. Use the key’s hashCode to locate a HashEntry in the current Segment’s table.
  2. Traverse that HashEntry chain; if it is not empty, check whether the passed-in key equals the current key, and if so, overwrite the old value.
  3. If it is empty (or the key is not found), create a new HashEntry and add it to the Segment, deciding at the same time whether the Segment needs to expand.
  4. Release the Segment’s lock.

Look at JDK1.8

It can be roughly divided into the following steps:

  1. Compute the hash value based on the key.
  2. Check whether the table needs to be initialized.
  3. Locate the bucket by the hash and fetch its first node f, then check f: if it is null, add the new node with CAS; if f.hash == MOVED (-1), other threads are expanding the table, so help with the data migration; otherwise, lock node f with synchronized, determine whether the bucket is a linked list or a red-black tree, and traverse and insert accordingly.
  4. When the list length reaches 8, expand the array or convert the list to a red-black tree.

18. Should the get method of ConcurrentHashMap be locked? Why?

The get method does not need to be locked. Because the element’s val field and the next pointer are volatile, in a multi-threaded environment a change made by thread A (modifying val or adding a node) is visible to thread B.

This is one of the reasons why it is more efficient than other "thread-safe" maps such as Hashtable or a HashMap wrapped with Collections.synchronizedMap().

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    volatile V val;            // the value is volatile
    volatile Node<K,V> next;   // the next pointer is volatile
    // ... remaining fields and methods omitted
}

19. Does the fact that the get method needs no lock have anything to do with the hash bucket array being volatile?

It does not. The hash bucket array table is declared volatile to guarantee the visibility of the table reference when the array is expanded.

static final class Segment<K,V> extends ReentrantLock implements Serializable {
    // The table array is declared volatile to guarantee visibility during expansion
    transient volatile HashEntry<K,V>[] table;
    // ... remaining fields and methods omitted
}

20. Why does ConcurrentHashMap not support key or value null?

ConcurrentHashMap is used by multiple threads. If map.get(key) returns null, we cannot tell whether the key maps to a null value or the key is simply not present. That is where the ambiguity comes from.

With a single-threaded HashMap, you can use containsKey to tell whether the map contains a null value for the key.

We can reason by contradiction:

Assume ConcurrentHashMap allowed null values. Thread A calls concurrentHashMap.get(key) and gets null; we do not know whether this null means the key is not mapped or the stored value is null.

Assume the true reason for the null is that the key was not found. Then we should be able to verify the assumption with concurrentHashMap.containsKey(key), expecting it to return false.

But if, after thread A calls concurrentHashMap.get(key) and before it calls containsKey, thread B executes concurrentHashMap.put(key, null), then containsKey returns true, which contradicts the assumption and leaves the ambiguity unresolved.
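A small runnable illustration of this ambiguity and why the containsKey workaround only helps in single-threaded code (the class name is illustrative; HashMap is used here because it, unlike ConcurrentHashMap, accepts null values):

import java.util.HashMap;
import java.util.Map;

public class NullAmbiguityDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();   // HashMap allows null values
        map.put("k", null);

        // get() alone cannot distinguish "mapped to null" from "not present"
        System.out.println(map.get("k"));            // null
        System.out.println(map.get("missing"));      // null

        // Single-threaded code can disambiguate with containsKey()
        System.out.println(map.containsKey("k"));        // true  -> mapped to null
        System.out.println(map.containsKey("missing"));  // false -> not present

        // Under concurrency this two-step check is not atomic: another thread could
        // put or remove the key between get() and containsKey(), so the result would
        // no longer describe the state that get() observed. That is the ambiguity
        // ConcurrentHashMap avoids by rejecting null keys and values.
    }
}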

The key in ConcurrentHashMap cannot be null either. If the interviewer is not satisfied, you can reply that the author, Doug Lea, does not like null, so he did not allow null keys in the design. I honestly don’t know what answer the interviewer is looking for with this question.

Well, this is the end of today’s article. I hope it helps those of you who were confused in front of the screen.