Preface

I'm sure you use containers in many different scenarios, but do you know how to choose the optimal data structure in terms of performance? Developers with years of experience are probably fine. But if you started only a year or two ago, or you know how to use these containers without knowing which one is best for a given situation, this article will help you out.

Interviewer: Tell me about some performance improvements you've made. The article answers in a question-and-answer style, which I believe you will find helpful.

If you’re looking for a job, you need an Android Advanced Development Interview guide

List

1. Have you ever used ArrayList? What is the underlying implementation?

Programmer:

Yes. Its underlying data structure is an array, with a default initial capacity of 10. Since add, set, remove and the other methods do nothing to handle thread safety, it is not thread-safe.

Let me just hand-draw the whole structure, as shown in the figure below.


Illustration:

  • index: the array subscript of an element in the ArrayList
  • elementData: the underlying array holding the data at each index
  • size: the number of elements in the ArrayList

Interviewer:

Well, tell me more about the add process and the automatic expansion mechanism.

Programmer:

OK, let me talk about the add process first, and then the automatic expansion mechanism.

  • The add process:


When we instantiate an ArrayList, we can specify its capacity, use the no-argument constructor (an empty array), or initialize it from an existing collection, as shown in the code below.

    //1. Specify the size of the underlying initial array
    public ArrayList(int initialCapacity) {
        this.elementData = new Object[initialCapacity];
        // ... omitted
    }

    //2. No-argument constructor; defaults to an empty array
    public ArrayList() {
        this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }

    //3. Initialize from an existing collection
    public ArrayList(Collection<? extends E> c) {
        // ... omitted
        elementData = Arrays.copyOf(elementData, size, Object[].class);
    }

    There are several ways to initialize; now for add:

    1. When add(E) is called for the first time on a default (empty) instance, the minimum required capacity is computed as minCapacity = 10 (DEFAULT_CAPACITY).

    2. If the required minimum capacity is greater than the current array length (0 for the default empty array), an expansion is triggered.

    3. For the first expansion, newCapacity is set to DEFAULT_CAPACITY = 10, and Arrays.copyOf allocates a new array of size 10.

    4. If we add an 11th element when size = 10, the check (size + 1) - elementData.length > 0 is evaluated; since it is true, a second expansion is carried out.

    5. The second expansion follows the formula newCapacity = oldCapacity + (oldCapacity >> 1); the first time this could be ignored because oldCapacity was 0. In plain English: current capacity + (current capacity / 2) = 10 + 5, i.e. the second expansion grows the array to 15.

    6. The data is then stored with elementData[size++] = e.

    This is the detailed add-and-expansion process. You can also illustrate it with a diagram (I won't post the full code; the point is to explain the idea to the interviewer), as follows:
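The expansion arithmetic above can be sketched with a small model (GrowDemo.newCapacity is a hypothetical helper that only mimics the growth formula, not the actual JDK source):

```java
// Simplified model of ArrayList's growth: new capacity = old + old/2,
// with a floor of DEFAULT_CAPACITY (10) for the first allocation.
public class GrowDemo {
    static final int DEFAULT_CAPACITY = 10;

    static int newCapacity(int oldCapacity, int minCapacity) {
        if (oldCapacity == 0) {                       // empty backing array
            return Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        int grown = oldCapacity + (oldCapacity >> 1); // roughly old * 1.5
        return Math.max(grown, minCapacity);
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(0, 1));   // first add: 10
        System.out.println(newCapacity(10, 11)); // second expansion: 15
        System.out.println(newCapacity(15, 16)); // third expansion: 22
    }
}
```

Each expansion is roughly 1.5x the old capacity, which is why 10 grows to 15.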


    Interviewer:

    Well, the mechanism of adding and expanding is quite detailed.

    Let’s talk about deleting an ArrayList.

  • Delete the mechanism

    ArrayList provides remove(index) and remove(obj) for removing elements; these are the two APIs I use most frequently in my projects.

    remove(index):

    // Delete by array subscript
    public E remove(int index) { // e.g. index = 5, size = 10
        rangeCheck(index);

        modCount++;
        E oldValue = elementData(index); // [1,2,3,4,5,6,7,8,9,10]

        int numMoved = size - index - 1; // 4
        if (numMoved > 0)
            System.arraycopy(elementData, index + 1, elementData, index,
                             numMoved); // move [7,8,9,10] forward one slot
        elementData[--size] = null; // clear to let GC do its work

        return oldValue;
    }

    In short, this API deletes by index:

    1. Check whether the index is out of bounds.

    2. Retrieve the element to be deleted by its index.

    3. Use index + 1 as the starting point of the move, and size - index - 1 as the number of elements to move.

    4. Use System.arraycopy to move the trailing elements forward one slot, leaving the last slot empty.

    5. Finally, return the deleted element.
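The shift in step 4 can be demonstrated in isolation (RemoveIndexDemo is a hypothetical sketch of the arraycopy step, not the JDK source):

```java
import java.util.Arrays;

// Sketch of the shift performed by ArrayList.remove(index):
// elements after the index move forward one slot, the last slot is nulled.
public class RemoveIndexDemo {
    static Object[] removeAt(Object[] elementData, int size, int index) {
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            System.arraycopy(elementData, index + 1, elementData, index, numMoved);
        }
        elementData[size - 1] = null; // clear the vacated slot for GC
        return elementData;
    }

    public static void main(String[] args) {
        Object[] data = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(removeAt(data, 5, 1)));
        // → [1, 3, 4, 5, null]
    }
}
```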

    remove(obj):

    // Delete by value
    public boolean remove(Object o) {
        // If the argument is null, remove the first null element
        if (o == null) {
            for (int index = 0; index < size; index++)
                if (elementData[index] == null) {
                    fastRemove(index);
                    return true;
                }
        } else {
            // Otherwise, remove the first element equal to the argument
            for (int index = 0; index < size; index++)
                // equality is checked with equals
                if (o.equals(elementData[index])) {
                    fastRemove(index);
                    return true;
                }
        }
        return false;
    }

    This API deletes the first matching object from the ArrayList:

    1. If the argument is null, traverse the array and remove the first element where elementData[index] == null.

    2. Otherwise, traverse and use equals to find the first matching element, then remove it. The removal itself (fastRemove) works the same way as remove(index).

    Here I ask the interviewer if my description is clear. If the interviewer does not understand, we can draw a picture on the spot, as shown below:


    Interviewer:

    Well, yes, here’s how deletion works!

    Programmer:

    However, there are a few caveats when using ArrayList in development, such as:

    1. You cannot remove elements inside a plain for loop, because every removal shifts the underlying indexes, which leads to skipped elements or exceptions. The solution is to use an Iterator.

    2. ArrayList also cannot be used under multi-threaded concurrency; use CopyOnWriteArrayList or Collections.synchronizedList(list) to solve the thread-safety problem.

    3. There are performance trade-offs: when additions and deletions dominate and lookups are rare, choose the LinkedList data structure instead, to avoid frequent array copying.
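Caveat 1 can be illustrated with the iterator-based removal it recommends (SafeRemoveDemo is a hypothetical example):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Removing while looping by index skips elements after each shift;
// an Iterator keeps its cursor consistent via its own remove().
public class SafeRemoveDemo {
    static List<Integer> removeEvens(List<Integer> list) {
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // safe: updates the iterator's bookkeeping
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(1, 2, 2, 3, 4));
        System.out.println(removeEvens(list)); // → [1, 3]
    }
}
```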

    Once you've covered the basics of ArrayList like this, the interviewer won't ask about every API, only the common ones.

2. Do you use LinkedList? What is the underlying implementation?

Programmer:

Yes, its underlying data structure is a doubly linked list. Let me draw its structure diagram, as follows:


Illustration:

  • Each element in the list is a Node; a Node has a prev reference to the previous node and a next reference to the next node;
  • first is the head node of the doubly linked list; its prev is null;
  • last is the tail node of the doubly linked list; its next is null;
  • when the list is empty, first and last are both null.

Interviewer:

Well, the basic structure is roughly like that. Can you talk about the basic add, get and remove principles?

Programmer:

Ok, so I’m going to go through each of them in turn.

add:

    // Add data

    public boolean add(E e) {

        linkLast(e);

        return true;

    }

    // Append nodes from the tail

    void linkLast(E e) {

        // store the last node data temporarily

        final Node<E> l = last;

        // Create a new node. L is the previous node, e is the value of the current node, and the latter node is null

        final Node<E> newNode = new Node<>(l, e, null);

        // The new node is placed at the end

        last = newNode;

        // If the list is empty, the head and tail are the same node, both newly created nodes

        if (l == null)

            first = newNode;

            // Otherwise, link the old tail's next to the new tail node.

        else

            l.next = newNode;

        // Size and version changes

        size++;

        modCount++;

    }


The add(E) API appends data after the tail node. In general, there are these steps:

1. Take the current tail node last and store it in a temporary variable l.

2. Create the new node: newNode = new Node(l, item, null).

3. Assign newNode to last; this step updates the tail reference.

4. If the temporary variable l is null, the list was empty, so newNode is also assigned to first, the head node. Otherwise, newNode is linked in as l.next.

5. Finally, update the list size and modCount.

The above steps can be illustrated by a GIF, as follows:


The add process is like this; it is relatively simple, just relinking node references.
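The linkLast steps above can be sketched with a stripped-down node class (LinkLastDemo is a hypothetical miniature, not the JDK source):

```java
// Minimal sketch of LinkedList's tail-append (linkLast), assuming a
// simplified Node with prev/item/next references.
public class LinkLastDemo {
    static class Node<E> {
        Node<E> prev; E item; Node<E> next;
        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev; this.item = item; this.next = next;
        }
    }

    Node<Integer> first, last;
    int size;

    void linkLast(Integer e) {
        final Node<Integer> l = last;
        final Node<Integer> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null)
            first = newNode;  // empty list: head and tail are the same node
        else
            l.next = newNode; // otherwise chain the old tail to the new one
        size++;
    }

    public static void main(String[] args) {
        LinkLastDemo list = new LinkLastDemo();
        list.linkLast(1);
        list.linkLast(2);
        System.out.println(list.first.item + " " + list.last.item + " " + list.size);
        // → 1 2 2
    }
}
```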

get:

    public E get(int index) {
        checkElementIndex(index);
        return node(index).item;
    }

    // Locate the node at the given index
    Node<E> node(int index) {
        // index falls in the first half of the list
        if (index < (size >> 1)) {
            Node<E> x = first;
            // walk forward from first to the index-th node
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else { // index falls in the back half of the list
            Node<E> x = last;
            // walk backward from last to the index-th node
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }

The get(index) API is the one I usually use to retrieve data. The lookup is split into two halves; the steps are:

1. Check whether index is out of bounds.

2. Decide which half of the list the index falls in by checking index < (size >> 1).

3. If it is in the first half, traverse forward from first and return the item; otherwise, traverse backward from last and return the item.

Generally speaking, retrieval takes only these three steps. Because a linked list has no index-to-element mapping, data can only be reached by traversing node by node. So if lookups are frequent, ArrayList can be used instead to compensate for this poor lookup performance.

remove:

To delete data, I usually use remove() or remove(index). The difference is that remove() unlinks the head node, while remove(index) unlinks the node at the given index, so I'll explain deletion by index:

1. Check whether the index is out of bounds.

2. Locate the node using the same two-half search as in step 2 of get.

3. Unlink the node: re-point its neighbors' prev/next references past it, and null out its own prev, item and next references to avoid memory leaks.

Deletion is also a three-step process, and LinkedList deletion performs better than ArrayList's delete API.

So, if your data is frequently added and deleted, LinkedList is the data structure to choose.

3. How do you choose ArrayList and LinkedList in your work?

Programmer:

If a project needs fast lookups but doesn't add or delete frequently, I use an ArrayList; if additions and deletions dominate and lookups are rare, I use a LinkedList. (PS: combine their underlying principles to explain why.)

4. What should ArrayList do when using multiple threads?

Programmer:

There are usually two ways to address thread safety when using lists across multiple threads. The simplest is Collections.synchronizedList(list), but its performance is poor: it is essentially the delegate pattern, wrapping every method of the underlying list in synchronized. The other is java.util.concurrent.CopyOnWriteArrayList.
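Both options can be shown in a few lines (SafeListDemo is a hypothetical example; both wrappers are standard JDK classes):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// The two thread-safe options mentioned above: a synchronized wrapper
// (every method locks the same mutex) and CopyOnWriteArrayList.
public class SafeListDemo {
    public static void main(String[] args) {
        List<String> syncList = Collections.synchronizedList(new ArrayList<>());
        List<String> cowList = new CopyOnWriteArrayList<>();
        syncList.add("a");
        cowList.add("a");
        System.out.println(syncList + " " + cowList); // → [a] [a]
    }
}
```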

Interviewer:

Have you ever used CopyOnWriteArrayList? How does it achieve thread-safety?

Programmer:

Yes. Its basic principle is the same as ArrayList: the underlying implementation is also array-based. Its key characteristics are:

1. It is thread-safe and can be used directly in a multi-threaded environment, with no external locking required.

2. Thread safety is guaranteed by a lock + array copying + volatile.

3. Every write operation copies the underlying array, performs the operation on the new copy, and assigns it back on success.

To show how thread safety is achieved, it is best to walk through the add and remove source. Keep the reasoning clear: take the initiative to pick the common APIs, analyze the source first, and only then give a summary, rather than jumping straight to the answer.

Interviewer:

Yes, please explain them one by one.

Programmer:

1. How does the underlying array ensure data security?

Before looking at how the underlying array is kept safe, let me say a few words about the Java memory model, because the array's thread safety involves the volatile keyword.

volatile guarantees visibility and is often used to modify a shared variable: a change to the variable's value is made visible to other threads, so they know the shared variable has been changed.

On multi-core CPUs, for efficiency, a thread reads values through the CPU cache rather than directly from main memory, because the cache is much faster. For example, if a thread wants to read C, it reads it from the CPU cache; if C is not there, it is loaded from memory into the cache, so the thread always ends up reading from the cache.

This creates a problem: the value in the CPU cache and the value in main memory are not always in sync. A thread may therefore compute with a value that is not the latest; the shared variable may already have been modified by another thread, while the cached copy is still old, leading to incorrect results.

There is a mechanism by which main memory actively notifies the CPU caches: the current value of the shared variable is invalid and a fresh copy must be pulled, whereupon the cache fetches the latest value from memory.

volatile triggers this mechanism. A volatile variable is treated as a shared variable: when it changes in memory, the CPU caches are notified to invalidate their copies, ensuring each thread reads the latest value.

Let me draw a picture to illustrate:


CPU 1 and CPU 2 both have C in their caches. Thread 1 then changes C in memory. Because C is volatile, CPU 2's cached copy of C is invalidated and CPU 2 pulls the latest value from memory, so thread 2 reads the up-to-date value.
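A minimal visibility sketch (VolatileDemo is a hypothetical example; the timing constants are arbitrary):

```java
// Minimal visibility sketch: the reader thread spins on a flag; the
// volatile write makes the update visible, so the reader exits promptly.
public class VolatileDemo {
    private static volatile boolean ready = false;

    static boolean runOnce() {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the write becomes visible */ }
        });
        reader.start();
        try {
            Thread.sleep(50); // let the reader enter its loop
            ready = true;     // volatile write: published to other threads
            reader.join(1000);
        } catch (InterruptedException e) {
            return false;
        }
        return !reader.isAlive(); // true: the reader saw the update and exited
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // → true
    }
}
```

Without volatile on the flag, the reader thread might never observe the update and spin forever.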

The Object[] array is declared volatile, which guarantees its visibility in memory and keeps it safe.

public class CopyOnWriteArrayList<E>
    implements List<E>, RandomAccess, Cloneable, java.io.Serializable {

    // volatile keyword modifier: visible across threads
    // the array is only exposed through getArray/setArray
    private transient volatile Object[] array;

    final Object[] getArray() {
        return array;
    }

    // Replace the underlying array (updates its memory address)
    final void setArray(Object[] a) {
        array = a;
    }

    public CopyOnWriteArrayList() {
        setArray(new Object[0]);
    }
}

2. How does the add(E) API ensure data security?

According to the source code, add(E) takes the following steps to operate safely.

    // Add elements to the end of the array

    public boolean add(E e) {

        final ReentrantLock lock = this.lock;

        // lock

        lock.lock();

        try {

            // Get all the original arrays

            Object[] elements = getArray();

            int len = elements.length;

            // Copy it to a new array with a length of + 1

            Object[] newElements = Arrays.copyOf(elements, len + 1);

            // Assign to a new array. Place the new element directly at the end of the array

            newElements[len] = e;

            // Replace the original array

            setArray(newElements);

            return true;

        } finally {

            lock.unlock();

        }

    }

  • 1. Acquire the reentrant mutex ReentrantLock to lock add(E).
  • 2. Get the existing array via getArray().
  • 3. Instantiate a new array of length size + 1 and copy the contents of getArray() into it.
  • 4. Store the added element in the last slot of the new array.
  • 5. Call setArray(newElements) to replace the array. Because the field is volatile, the change of memory address is immediately made visible to the other CPUs' caches.
  • 6. Release the reentrant mutex.

So add(E) ensures data safety through ReentrantLock + array copy + updating the Object[] memory address + volatile.

Interviewer:

You mentioned that add(E) achieves thread safety with ReentrantLock + array copy, etc. Since the mutex alone already guarantees thread safety, why copy the array?

Programmer:

It is true that locking add(E) means only one thread at a time can add to the array, which would be enough on a single-core CPU, but our machines all have multi-core CPUs. If we did not copy to a new array, but changed values on the original array in place, the array container's memory address would not change. Since volatile is on the array reference, only a change of that memory address is reliably published to other threads; an in-place change of an element's value cannot be made visible. Threads on other CPUs therefore could not perceive that the array had been modified, leading to thread-safety issues on multi-core CPUs.

Interviewer:

Well, you seem to have a pretty good understanding of these mechanisms. Can you talk about how remove stays thread-safe?

3. How does remove ensure thread safety?

In fact, remove's thread-safety mechanism follows the same idea as add: lock first, then apply a (different) array-copy strategy, and finally release the lock.

Interviewer:

The add and remove methods both copy the array internally; do you have any performance optimization suggestions?

Programmer:

Use addAll and removeAll instead of calling add and remove in a loop. Inside a for loop, every add or remove copies the array (possibly many times), while addAll and removeAll are optimized at the bottom to perform only a single array copy for the whole operation. So when batch-operating on larger data sets, the batch methods show a clear performance advantage.
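A hypothetical sketch (BatchDemo) of the batch advice:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Batch methods copy the backing array once instead of once per element.
public class BatchDemo {
    public static void main(String[] args) {
        List<Integer> list = new CopyOnWriteArrayList<>();
        List<Integer> batch = Arrays.asList(1, 2, 3, 4, 5);

        // One array copy for the whole batch:
        list.addAll(batch);

        // Versus five copies if added one by one:
        // for (Integer i : batch) list.add(i);

        System.out.println(list); // → [1, 2, 3, 4, 5]
    }
}
```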

Map

1. Tell us what you know about HashMap

Programmer:

The underlying structure of HashMap is an array + singly linked list + red-black tree. Simply put, when a bucket's linked list reaches length 8 and the array length is at least 64, the list is converted to a red-black tree; when a tree shrinks to 6 nodes or fewer, it is converted back to a linked list. HashMap is not thread-safe.

The underlying structure of the HashMap can be explained by a diagram like this:


Illustration:

1. The leftmost table is the HashMap's array structure; Node keys and values are allowed to be null.

2. The first expansion creates a table with the default capacity of 16 and threshold = capacity * loadFactor, i.e. 12 = 16 * 0.75. Whenever ++size > threshold, the table is doubled: newCap = oldCap << 1.

3. A given array slot may hold a single Node, a linked list, or a red-black tree. Only when the array length is at least 64 and a bucket's list length reaches 8 are the Node nodes converted to TreeNode nodes; when a red-black tree's size drops to 6 or below, it reverts to a singly linked list.

Programmer:

The underlying implementation of HashMap basically looks like this.

Interviewer:

Well, describe the storage process of the put(K key, V value) API.

Programmer:

Ok, let me describe the basic process first, and then draw a flow chart to summarize

1. Compute the hash from the key using the formula (h = key.hashCode()) ^ (h >>> 16).

2. Check whether the HashMap table array has been initialized; if not, initialize it with the default capacity 16 and an expansion threshold of capacity * 0.75.

3. Compute the bucket index from the hash and inspect tab[index]; if tab[index] already holds a node, that is a hash collision, and the JDK resolves it in two ways: linked lists and red-black trees.

4. On a collision, first check whether the stored key matches the key being inserted (same hash and equal by identity or equals); if so, the value is overwritten by default.

5. If tab[index] is a red-black tree node, insert into the tree; otherwise, append behind the next node, forming a singly linked list.

6. If the resulting list length reaches 8 or more, convert the structure to a red-black tree.

7. Finally, check the expansion mechanism.

This is the whole PUT process, which can be summarized with a flowchart, as follows:


Interviewer:

You said there are two ways to resolve hash conflicts. Would you describe how a node is added to the red-black tree?

Programmer:

Ok, the basic process is as follows:

1. First, determine whether the new node already exists in the red-black tree, in one of two ways:

1.1. If the key does not implement the Comparable interface, use equals to judge.

1.2. If the key implements the Comparable interface, use compareTo to judge.

2. If the new node already exists in the tree, return that node directly. If not, determine whether the new node belongs to the left or the right of the current node: smaller values go left, larger values go right.

3. Repeat steps 1 and 2 down the tree until the current node's left or right child is empty; stop there, and that node becomes the parent of our new node.

4. Place the new node as the left or right child of that node, establishing the parent-child links.

5. Recolor and rotate as needed; done.

Interviewer:

Do you know why the threshold for converting a linked list to a red-black tree is defined as 8?

Programmer:

I noticed the answer in a comment in the HashMap class. Roughly: a linked-list lookup is O(n) while a red-black tree lookup is O(log n). When a list is short, traversing it is fast enough; only when it grows long is it worth converting to a tree, and a tree node takes about twice the space of a list node. Weighing the conversion's time and space costs, a boundary value for the conversion had to be defined.

When choosing the value 8, the designers referred to the Poisson distribution; from its probability function one gets the probability of a bucket's list reaching each length:

     * 0:    0.60653066

     * 1:    0.30326533

     * 2:    0.07581633

     * 3:    0.01263606

     * 4:    0.00157952

     * 5:    0.00015795

     * 6:    0.00001316

     * 7:    0.00000094

     * 8:    0.00000006


When a list reaches length 8, the probability is 0.00000006, less than one in ten million. Under normal circumstances a list should never get that long; if it does, something is likely wrong with the hash algorithm, so in that case, to keep HashMap's query performance high, the linked list is converted to a red-black tree. In ordinary code using HashMap, you will almost never encounter a red-black tree, because the probability is about one in ten million.

Interviewer:

Well, that's very thorough; you even noticed the comments in the source code.

An answer like this shows the interviewer that you pay close attention to details in the source code.

You mentioned at the beginning that HashMap is not safe under multi-threaded operation. How do you solve this problem in your work?

Programmer:

Thread safety can be achieved by locking externally yourself, by wrapping every method with Collections#synchronizedMap, or by using the ConcurrentHashMap class from the java.util.concurrent package. (Then explain what ConcurrentHashMap does to ensure thread safety.)

Interviewer:

Yeah. So can you talk about why the array capacity is always expanded in powers of two?

Programmer:

OK. Here is my understanding.

For efficient access, a HashMap should minimize collisions by distributing data evenly.

Such as:

In binary, 2^n is a 1 followed by n zeros, and 2^n - 1 is n ones.

So when the length is 8: 3 & (8 - 1) = 3 and 2 & (8 - 1) = 2; different low bits land in different slots, with no collision at all.

But at length 5: 3 & (5 - 1) = 0 and 2 & (5 - 1) = 0; both land on slot 0, so they collide.

// 3 & 4
011
100
---
000

// 2 & 4
010
100
---
000


So, keeping the capacity at 2^n guarantees that (size - 1) has every bit set, i.e. the hash is ANDed with 1111…1111, so every bit of the hash takes part in choosing the slot.
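The masking described above can be verified directly (IndexMaskDemo is a hypothetical sketch):

```java
// With a power-of-two capacity n, (n - 1) & hash is equivalent to
// hash % n but uses every low bit, spreading keys across all slots.
public class IndexMaskDemo {
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    public static void main(String[] args) {
        // capacity 8: 7 = 0b111, so all three low bits matter
        System.out.println(indexFor(3, 8)); // → 3
        System.out.println(indexFor(2, 8)); // → 2

        // capacity 5: 4 = 0b100, only one bit survives → collisions
        System.out.println(indexFor(3, 5)); // → 0
        System.out.println(indexFor(2, 5)); // → 0
    }
}
```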

Interviewer:

If I call the new HashMap(1000) constructor, can I put 1000 entries into the Map without triggering an expansion?

Programmer:

In fact, when an initial capacity is given directly, we need to analyze what the source computes. There are several steps:

1. tableSizeFor(1000) = 1024; this API rounds the requested capacity up to the next power of two, and the result is stored in the threshold field for now.


Don't assume that this is the real expansion threshold; the real threshold is derived from a formula during the first resize.
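A sketch of the rounding-up step (this tableSizeFor is a re-implementation for illustration; it omits the MAXIMUM_CAPACITY clamp of the real source):

```java
// Sketch of HashMap.tableSizeFor: rounds the requested capacity up to
// the next power of two via JDK 8 style bit-smearing.
public class TableSizeDemo {
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1000)); // → 1024
        // real expansion threshold = capacity * loadFactor
        System.out.println((int) (tableSizeFor(1000) * 0.75f)); // → 768
    }
}
```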

2. Calculate the real expansion threshold.

On the first put, the table is checked; since it is empty, resize() is called to allocate it.

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table; // null on the first put, so the length is 0
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;     // here this is 1024
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // ... omitted
        } else if (oldThr > 0)
            newCap = oldThr;        // newCap = 1024
        if (newThr == 0) {
            float ft = (float) newCap * loadFactor; // 1024 * 0.75 = 768.0
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                      (int) ft : Integer.MAX_VALUE); // 768
        }
        // Update the expansion threshold
        threshold = newThr;
        // Instantiate the new array
        @SuppressWarnings({"rawtypes", "unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
        table = newTab;
        // ... omitted
        return newTab;
    }

You can see that the real expansion threshold is 768.

3. Determine the expansion mechanism

As soon as the size goes past 768, an expansion occurs, as shown in the following code:

    if (++size > threshold) // 769 > 768: expansion needed
        resize();


Therefore, when we pass 1000 as the initial capacity, the map still expands before reaching 1000 entries, because the real threshold is not 1024 but 1024 multiplied by the load factor. There is a way to avoid the expansion: call new HashMap(1000, 1f). With a load factor of 1, the threshold becomes 1024 and no resize happens.

Here you provide not only the actual answer, but also the solution.

The interviewer will be satisfied with your answer.

2. Say what you know about ArrayMap

Programmer:


The bottom layer of ArrayMap establishes the mapping through two arrays: int[] mHashes stores the keys' hashCode values in ascending order, and Object[] mArray stores each key and its value in adjacent slots, following the order of mHashes. mArray is therefore twice as long as mHashes.

When storing, the key's hashCode() produces the hash, a binary search over mHashes finds the insertion index, and the key/value pair is inserted at the corresponding position in mArray. When a hash conflict occurs, the entry is inserted adjacent to that index.

When reading, the key's hashCode() produces the hash, a binary search over mHashes yields the index, and the value is read from mArray at index * 2 + 1 (the key sits at index * 2).
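Since Android's ArrayMap itself is not available on a plain JVM, here is a heavily simplified two-array sketch of the same layout (TwoArrayMapDemo is hypothetical; handling of distinct keys with equal hashes is omitted):

```java
import java.util.Arrays;

// Simplified sketch of ArrayMap's storage: a sorted int[] of hashes and a
// parallel Object[] storing key at index*2 and value at index*2 + 1.
public class TwoArrayMapDemo {
    int[] hashes = new int[0];
    Object[] array = new Object[0];
    int size;

    void put(Object key, Object value) {
        int hash = key.hashCode();
        int index = Arrays.binarySearch(hashes, 0, size, hash);
        if (index >= 0 && key.equals(array[index << 1])) {
            array[(index << 1) + 1] = value; // overwrite existing key
            return;
        }
        int insert = (index >= 0) ? index : -(index + 1);
        // grow both arrays, shifting to keep hashes sorted
        int[] newHashes = new int[size + 1];
        Object[] newArray = new Object[(size + 1) << 1];
        System.arraycopy(hashes, 0, newHashes, 0, insert);
        System.arraycopy(hashes, insert, newHashes, insert + 1, size - insert);
        System.arraycopy(array, 0, newArray, 0, insert << 1);
        System.arraycopy(array, insert << 1, newArray, (insert + 1) << 1, (size - insert) << 1);
        newHashes[insert] = hash;
        newArray[insert << 1] = key;
        newArray[(insert << 1) + 1] = value;
        hashes = newHashes;
        array = newArray;
        size++;
    }

    Object get(Object key) {
        int index = Arrays.binarySearch(hashes, 0, size, key.hashCode());
        return (index >= 0 && key.equals(array[index << 1]))
                ? array[(index << 1) + 1] : null;
    }

    public static void main(String[] args) {
        TwoArrayMapDemo map = new TwoArrayMapDemo();
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(map.get("a") + " " + map.get("b")); // → 1 2
    }
}
```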

3. How do you select HashMap, ArrayMap and SparseArray in your work?

Programmer:

OK, I have summarized a performance comparison; for each requirement, refer to the summary below.

No. | Requirement                            | Choice
01  | ~1K entries to load, keys are int      | SparseArray (saves ~30% memory); ArrayMap (saves ~10%)
02  | ~10K (1W) entries to load              | HashMap

4. Have you used LinkedHashMap? How does the bottom layer maintain insert order and remove least accessed elements?


PS: the internal storage links fan out, and if I drew every connection the lines in the figure would be a mess. So, to keep the figure readable, I draw it my own way; the internal structure itself does not change.

Programmer:

Yes. The LruCache bottom layer is also implemented on top of LinkedHashMap. Let me explain it in my own way.

It has all the features of HashMap and, on top of that, provides two major additions: insertion-order iteration, and a deletion policy for the least recently accessed entries.

Let’s see how sequential insertion is implemented:

Externally, LinkedHashMap maintains a doubly linked list over its entries, while internally it is a HashMap; it is effectively HashMap + doubly linked list.

The source is simple: LinkedHashMap overrides the newNode/newTreeNode methods that HashMap#put calls, and inside those methods links the new node into the doubly linked list, as shown below:


In summary, LinkedHashMap chains newly added nodes together with a doubly linked list to achieve insertion order, while the core data structure is still maintained by HashMap.

Now for how the least-recently-accessed deletion is implemented:

The least-recently-accessed deletion strategy is also called the LRU algorithm. The basic principle: recently used entries are moved to the tail of the list, so rarely used entries naturally drift toward the head. We can then set a deletion policy, for example a maximum size for the Map; when the stored data exceeds that size, entries are removed from the head.

In the source, the operation that moves a frequently used node to the tail of the list lives in the get API, as shown below:

    public V get(Object key) {

        Node<K,V> e;

        // Call the HashMap getNode method

        if ((e = getNode(hash(key), key)) == null)

            return null;

        // If the LRU policy is set

        if (accessOrder)

        // This moves the current key to the last node

            afterNodeAccess(e);

        return e.value;

    }


When storing data, HashMap#putVal calls afterNodeInsertion, which LinkedHashMap overrides to perform the eviction.

//HashMap
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    // ... rest of the code omitted

    // Remove infrequently used elements
    afterNodeInsertion(evict);
}


//LinkedHashMap
// Remove infrequently used elements
void afterNodeInsertion(boolean evict) { // possibly remove eldest
    // Get the head node of the list
    LinkedHashMap.Entry<K,V> first;
    // removeEldestEntry controls whether eviction is actually performed
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        // Delete the head node: recently accessed entries were moved
        // to the tail by get, so the head is the least recently used
        K key = first.key;
        // removeNode deletes the node
        removeNode(hash(key), key, null, false, true);
    }
}


In summary, LinkedHashMap builds on the hooks HashMap exposes to implement both insertion-order storage and a least-recently-used eviction policy.
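Putting the two features together, here is a hedged sketch of a tiny LRU cache built on LinkedHashMap's accessOrder constructor flag and removeEldestEntry hook. The class name `LruCache` and its `maxSize` field are illustrative choices of mine, not JDK names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public LruCache(int maxSize) {
        // accessOrder = true: get() moves the accessed entry to the tail
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the head (least recently accessed) once we exceed maxSize
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a", so "b" becomes the eldest
        cache.put("c", 3); // size exceeds 2, so "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```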

After explaining these principles, interviewers generally don’t ask further questions, because the core features have been covered.

5. Do you know how TreeMap is sorted internally?

Programmer:

Yes, I know it, though I don’t use this API very often. I have read its source code: the underlying structure is still a red-black tree, the same one used by HashMap, and TreeMap exploits the red-black tree’s left-smaller/right-larger property to sort entries by key.

Interviewer:

Well, can you explain exactly how the bottom layer is sorted by key?

Programmer:

In ordinary code, if we want to sort a List, we either have its elements implement the Comparable#compareTo method or pass an external Comparator to do the sorting. TreeMap uses the same principle to sort its keys.

I’ll walk through how the put(K key, V value) API implements the sorting.

1. Check whether the root node of the red-black tree is empty. If it is, the new node directly becomes the root:

Entry<K,V> t = root;

// If the root node is empty, create a new one

if (t == null) {

    // the compare call also rejects a null key

    compare(key, key); // type (and possibly null) check

    // become the root node

    root = new Entry<>(key, value, null);

    size = 1;

    modCount++;

    return null;

}

Copy the code

2. Using the red-black tree’s left-smaller/right-larger property, find the parent node under which the new node should be attached. The code is as follows:

Comparator<? super K> cpr = comparator;

if (cpr != null) {
    // Spin down the tree to find where the key should be attached,
    // i.e. the parent the new node will hang under
    do {
        // When the loop ends, parent is the last node compared against
        parent = t;
        // Compare the keys via the Comparator
        cmp = cpr.compare(key, t.key);
        // key is less than t.key: descend left, since smaller keys live on the left
        if (cmp < 0)
            t = t.left;
        // key is greater than t.key: descend right, since larger keys live on the right
        else if (cmp > 0)
            t = t.right;
        // Equal keys: overwrite the existing value
        else
            return t.setValue(value);
        // When t becomes null we have reached a leaf position
    } while (t != null);
}


3. Insert the new node to the left or right of the parent node found above, as follows:

// cmp holds the result of the last comparison; less than 0 means e goes on the parent's left
if (cmp < 0)
    parent.left = e;
// greater than 0 means e goes on the parent's right
else
    parent.right = e;


4. Recolor and rotate to restore red-black balance; the insertion is complete.

To sum up: TreeMap sorts by comparing keys with the Comparator passed in at construction, or, if none was given, with the key’s own Comparable#compareTo method.
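A small standalone sketch (class and method names are my own) showing both sorting paths — natural ordering via the key's Comparable, and an explicit Comparator:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapSortDemo {
    static String sortedKeys() {
        // With no Comparator, TreeMap falls back to the key's compareTo (natural order)
        TreeMap<String, Integer> natural = new TreeMap<>();
        natural.put("banana", 2);
        natural.put("apple", 1);
        natural.put("cherry", 3);

        // With a Comparator, keys are ordered by it instead (reverse order here)
        TreeMap<String, Integer> reversed = new TreeMap<>(Comparator.reverseOrder());
        reversed.putAll(natural);

        return natural.firstKey() + "/" + reversed.firstKey();
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys()); // apple/cherry
    }
}
```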

6. What does ConcurrentHashMap do to ensure thread-safety?

Programmer:

Its overall structure is the same as HashMap’s: the underlying storage is an array + singly linked lists + red-black trees.

There are several main points to ensure thread-safety:

1. The array that stores the map’s data is modified with the volatile keyword, so once one thread writes it, other threads see the change immediately.

    // The array holding the map's data; lazily initialized
    // (in Java 7 it was initialized in the constructor).
    // Its capacity is always a power of two.
    transient volatile Node<K,V>[] table;

    // The next array, used while resizing
    private transient volatile Node<K,V>[] nextTable;


2. On put, if the array has not been initialized yet, Thread#yield and sun.misc.Unsafe#compareAndSwapInt are used together to ensure that only one thread initializes the array.


3. If the array slot computed for the put is empty, an infinite for loop plus the CAS algorithm guarantees the element is added successfully without overwriting a value put there by another thread:

// If the current slot has no value, create the node directly
if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
    // CAS a new node into slot i; if slot i is still empty the put is done
    // and we break out of the for loop, otherwise we spin and retry
    if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
        break;
}


/**
 * CAS
 * @param tab the array to modify
 * @param i   the index (offset) within the array
 * @param c   the expected value
 * @param v   the new value
 * @return    true | false
 */
static final <K,V> boolean casTabAt(Node<K,V>[] tab, int i,
                                    Node<K,V> c, Node<K,V> v) {
    return U.compareAndSwapObject(tab, ((long)i << ASHIFT) + ABASE, c, v);
}


4. If the slot being put into belongs to a table that is currently being resized, the putting thread helps finish the transfer and only inserts after the expansion completes, so values in the old array cannot change mid-expansion:

// If the slot's hash is MOVED it is a forwarding node: the table is being
// resized, so help with the transfer and retry once it completes
if ((fh = f.hash) == MOVED)
  tab = helpTransfer(tab, f);


5. When operating on an occupied slot, the slot’s head node is locked first, ensuring that only the current thread can modify the linked list or red-black tree hanging off that slot.

6. When the red-black tree rotates, the root node is locked to keep the rotation thread-safe.


Interviewer:

Can you describe how the CAS algorithm is applied in ConcurrentHashMap?

Programmer:

CAS is essentially an optimistic lock with three operands: the target object, the expected (original) value, and the new value. On execution, CAS checks whether the value in memory equals the expected value; if it does, the new value is written to the target, otherwise the assignment fails.

The ConcurrentHashMap put method combines CAS with an infinite for loop. The steps are:

  1. Compute the array index and read the current value at that index;
  2. CAS the new value into that index: if the value in memory still equals the value read in step 1, the write succeeds and the loop exits; otherwise nothing is written and we go to step 3;
  3. The next iteration of the for loop repeats steps 1 and 2 until the write succeeds.

The advantages are clear: first, another thread’s value is never blindly overwritten; second, retrying guarantees the write eventually succeeds.
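The same read-CAS-retry loop can be sketched outside the map using AtomicInteger. This is a simplified model of the pattern, not ConcurrentHashMap's actual code, and `CasLoopDemo` is an illustrative name:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLoopDemo {
    static int increment(AtomicInteger counter) {
        // Spin: read the current value, then try to CAS in the new value.
        // If another thread changed it in between, compareAndSet fails
        // and we simply retry with a freshly read value.
        for (;;) {
            int current = counter.get();                // step 1: read the original value
            int next = current + 1;
            if (counter.compareAndSet(current, next)) { // step 2: attempt the CAS
                return next;                            // success: exit the loop
            }
            // step 3: CAS failed, loop back and retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger c = new AtomicInteger(0);
        System.out.println(increment(c)); // 1
    }
}
```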

Interviewer:

Well, that’s not bad. But what are the similarities and differences with HashMap?

Programmer:

Similarities:

1. Both implement the Map interface and extend the AbstractMap abstract class, so most of their methods are the same and the two can be substituted for each other.

2. Both are implemented on an array + singly linked lists + red-black trees.

Difference:

1. ConcurrentHashMap is thread-safe and can be used directly in multi-threaded environments without external locking.

2. Structurally, ConcurrentHashMap adds forwarding (transfer) nodes, which are mainly used to keep capacity expansion thread-safe.
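A minimal demonstration of the "no external locking" point: four threads writing disjoint keys into one ConcurrentHashMap, with the final size exactly the sum of all writes. The class and method names here are my own:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    static int parallelFill() {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < 4; t++) {
            final int base = t * 1000;
            threads[t] = new Thread(() -> {
                // Each thread writes 1000 distinct keys with no external locking
                for (int i = 0; i < 1000; i++) {
                    map.put(base + i, i);
                }
            });
            threads[t].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // No writes were lost: all 4000 keys are present
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(parallelFill()); // 4000
    }
}
```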

Queue

1. What is the difference between a queue and a collection?

Programmer:

Okay, so let me talk a little bit about queues, and then I’ll talk about the differences;

Understanding of queues:

1. First of all, a queue is itself a container, and different queues have different underlying data structures. For example, LinkedBlockingQueue is backed by a linked list, so it maintains first-in-first-out order, while a Deque can act as either a queue or a stack, giving first-in-first-out or first-in-last-out order. Different underlying structures also lead to different operation implementations.

2. Some queues (such as LinkedBlockingQueue) provide temporary storage: we can put data into the queue and, at the same time, take data out of it.

3. Queues decouple the party that produces data from the party that consumes it: producers only produce and consumers only consume, with no necessary connection between them.

4. A queue can also manage its producers and consumers. If the queue is full and producers keep delivering data, the queue can block the producers so they cannot deliver any more; if the queue is empty and consumers try to take data, the queue can park the consumers and wake them up to fetch the data once some arrives. ArrayBlockingQueue works this way, for example.

5. Queues also provide blocking: for example, when fetching data from an empty queue, the thread blocks until the queue has data available.

The differences:

1. First a similarity: queues (with some exceptions) and collections both store data, and their underlying structures are often alike. LinkedBlockingQueue and LinkedHashMap both use linked lists, while ArrayBlockingQueue and ArrayList are both backed by arrays.

2. Where they differ:

  1. Although the underlying storage of some queues and some collections is very similar, their APIs and the implementations behind them differ because they serve different purposes.
  2. When a queue is empty, a consumer blocks; when another thread puts an element, the blocked consumer is woken to consume it. Likewise, when the queue is full, a blocked producer is woken once space frees up.
  3. A queue decouples producer from consumer: it acts like a pipe between them. The producer just throws data in, the consumer just keeps consuming, and neither cares about the other.

2. Have you used LinkedBlockingQueue? Talk about how LinkedBlockingQueue stores and fetches data underneath.

Programmer:

Yes, I have. LinkedBlockingQueue is, overall, a blocking linked-list structure that serves both producers and consumers, and the blocking itself is implemented on top of AQS.

Blocking on store:

There are several functions for storing data into a queue, such as add, put, and offer. They all store data, but their internal implementations differ. I’ll introduce put here, because it’s the one I use most.

When the stored data has reached the queue’s capacity, the putting thread blocks and waits for a take to wake it. If the queue is not full, the element is simply enqueued at the tail, and finally the consumer side is signalled so a blocked take can fetch the data.


Blocking on fetch:

Blocking on fetch mirrors blocking on store. Taking first checks whether the queue is empty: if so, the taking thread waits indefinitely and is signalled when data is stored; if the queue has data, it is fetched from the head (first in, first out).


The internal implementation is relatively simple; the real skill lies in using the access APIs well. There are also many application scenarios: thread pools, for example, are built on the blocking-queue principle.
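The put/take blocking described above can be sketched with a capacity-1 LinkedBlockingQueue, which forces the producer and consumer to alternate: each put blocks until the previous element has been taken. The class name is illustrative:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class PutTakeDemo {
    static int sumThree() {
        // Capacity 1 forces the producer to block in put()
        // until the consumer has taken the previous element
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) queue.put(i); // blocks when full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < 3; i++) sum += queue.take(); // blocks when empty
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum; // 1 + 2 + 3
    }

    public static void main(String[] args) {
        System.out.println(sumThree()); // 6
    }
}
```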

3. Have you ever used ArrayBlockingQueue? How does ArrayBlockingQueue store and fetch data underneath? Does it support automatic capacity expansion?

Programmer:


Yes. The underlying storage is an array. A fixed queue size must be given when instantiating an ArrayBlockingQueue, and internal expansion is not supported.

Interviewer:

Well, why don’t you talk about the underlying mechanism for adding data?

Programmer:


When storing data, ReentrantLock is acquired first, then the code checks whether the queue is full. If it is, the thread waits indefinitely until a take wakes it; if not, the element is enqueued. Enqueuing has a couple of steps, so I’ll draw a flow diagram to make it clear:


From the flow chart: the data is stored at putIndex first, then putIndex is advanced and compared with the array length; if they are equal, putIndex wraps back to 0, so the next element is stored at the head of the array.

Interviewer:

So can you talk about how data is taken from the queue?

Programmer:


Take and put actually mirror each other; their internal mechanisms are similar. Basically: lock first, then check whether the queue has data. If not, wait indefinitely; if it does, dequeue an element. Let me draw a flow diagram to make it clear.


From the flowchart: the data at takeIndex is fetched first, then takeIndex is advanced and compared with the array length; if they are equal, takeIndex wraps back to 0, so the next element is taken from the head of the array.
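The wrap-around of putIndex/takeIndex can be sketched as a simplified model. This is just the index arithmetic of a circular buffer, not ArrayBlockingQueue's actual source, and the names are my own:

```java
public class RingIndexDemo {
    // A minimal sketch of ArrayBlockingQueue's circular index logic:
    // after storing at putIndex, the index is advanced and wraps to 0
    // when it reaches the array length (same for takeIndex on removal).
    static int advance(int index, int capacity) {
        return (++index == capacity) ? 0 : index;
    }

    public static void main(String[] args) {
        int capacity = 3;
        int putIndex = 0;
        // Simulate three enqueues into a capacity-3 ring: index goes 1 -> 2 -> wraps to 0
        for (int i = 0; i < 3; i++) putIndex = advance(putIndex, capacity);
        System.out.println(putIndex); // 0
    }
}
```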

4. Which queues have blocking function? Roughly how is it blocked?

Programmer:

LinkedBlockingQueue and ArrayBlockingQueue are the same type of blocking queue. LinkedBlockingQueue’s default capacity is Integer.MAX_VALUE, while ArrayBlockingQueue’s is fixed at construction. When the queue is full, a thread that puts data blocks until another thread consumes some, then wakes up and continues putting. When the queue is empty, a thread that takes data blocks until data arrives, then wakes up and continues taking.

SynchronousQueue: when a thread puts an element, the call cannot return until another thread has consumed that element; otherwise the putter blocks. For example, thread A puts data A1 onto the queue; with no consumer present, thread A cannot return and blocks until another thread consumes A1.
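A minimal handoff sketch (class and method names are my own): the producer's put only returns once the consumer's take has accepted the element.

```java
import java.util.concurrent.SynchronousQueue;

public class HandoffDemo {
    static String handoff() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                // put() blocks here until some consumer calls take()
                queue.put("A1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            String value = queue.take(); // the rendezvous unblocks the producer
            producer.join();
            return value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(handoff()); // A1
    }
}
```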

Interviewer:

What is the underlying blocking principle?

Programmer:

At the Java level this ends up calling LockSupport#park/unpark. Let me use LinkedBlockingQueue to illustrate the blocking principle:


The final implementations are native functions, as shown below:

public native void unpark(Object var1);



public native void park(boolean var1, long var2);


Going one level deeper is the C++ implementation:

There is quite a bit of C++ source; if you’re interested, see the article “LockSupport principle analysis”.

5. Explain the difference between LinkedBlockingQueue and ArrayBlockingQueue.

Programmer:

Similarities:

1. The blocking mechanisms are roughly the same: for example, threads block when the queue is full or empty.

Difference:

1. LinkedBlockingQueue is a linked-list structure whose default capacity is Integer.MAX_VALUE; ArrayBlockingQueue is an array whose size must be specified at initialization.

2. Because the underlying structures differ, the underlying implementations of take, put, and remove differ as well.

6. What are the differences between the queues’ access APIs, such as put/take and offer/poll?

Programmer:

Here I use the following table to summarize:

| Operation | Blocks indefinitely | Throws exception | Blocks with a timeout | Returns a special value |
| --- | --- | --- | --- | --- |
| Insert (queue full) | put | add | offer(e, time, unit) returns false on timeout | offer returns false |
| Remove (queue empty) | take | remove | poll(time, unit) returns null on timeout | poll returns null |
| Examine (queue empty) | none | element | none | peek returns null |
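A small sketch (illustrative class name) exercising the special-value column on a full, then empty, ArrayBlockingQueue — offer and poll return immediately instead of blocking:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class SpecialValueDemo {
    static String probe() {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        boolean first = queue.offer("x");  // true: the queue had room
        boolean second = queue.offer("y"); // false instead of blocking: queue is full
        String head = queue.poll();        // "x": removes the head
        String empty = queue.poll();       // null instead of blocking: queue is empty
        String peeked = queue.peek();      // also null on an empty queue
        return first + "," + second + "," + head + "," + empty + "," + peeked;
    }

    public static void main(String[] args) {
        System.out.println(probe()); // true,false,x,null,null
    }
}
```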

conclusion

Here are some tips to help you with your interview.


About me

  • Email: [email protected]
  • Personal blog: www.devyk.top
  • GitHub: github.com/yangkun1992…
  • Nuggets blog: juejin.cn/user/336855…

Scan the QR code to follow my official account, and let’s keep improving together!