This is the first day of my participation in the More text Challenge. For details, see more text Challenge. This article is participating in the “Java Theme Month – Java Development in action”. See the link to the event for more details.

1, the preface




In interviews, I am often asked about the concept of collection, the low-level design of collection List, Map, Set, and their usage scenarios and attention to details. But most of the answers were the same, the same, and fatal, as the answers on the Internet. In fact, everyone is wrong, especially on the Internet, is misleading everyone, detailed reasons, and listen to me to analyze.






2, set List




2.1 List in your mind

List is a container for caching data. It is a collection type provided by JDK for developers. One of the most common things people ask about in an interview is the difference between an ArrayList and a LinkedList.


ArrayList is based on array implementation, LinkedList is based on LinkedList implementation. When it comes to use scenarios, I’ve found that most of the answers are: LinkedList is more efficient than ArrayList for adding and deleting operations, and ArrayList is more efficient than LinkedList for querying and traversing operations. Is the answer accurate? Today I will take you to verify a ha.


2.2 Performance Comparison

First, we all know that ArrayList and LinkedList inherit from AbstractList abstract class, and AbstractList implements the List interface. ArrayList is implemented using an array, while LinkedList is implemented using a bidirectional LinkedList. Next, let’s take a closer look at ArrayList and LinkedList performance.




public class ArrayList<Eextends AbstractList<E>

        implements List<E>, RandomAccess.Cloneable.java.io.Serializable



Copy the code

In the source code, we know that ArrayList implements the RandomAccess interface in addition to cloning and serialization. If you’re unfamiliar with this interface, you can see from the code that it’s an empty interface with no implementation logic, so why should ArrayList implement it? The original RandomAccess interface is a flag interface, it marks as long as the implementation of the interface, can achieve fast RandomAccess.


As for ArrayList, LinkedList of various operations here no longer said, you can see this article.


Next, let’s take a look at some test data, using 50000 tests as an example:


  1. Add tests to ArrayList and LinkedList
Header: arrayList. Time > linkedList. Time

Middle: arrayList. Time is less than linkedlist. Time

End: arrayList. Time is less than linkedlist. Time

Copy the code

From this test, we can see that LinkedList does not necessarily add elements faster than ArrayList.

Since ArrayList is an array implementation, and an array is a contiguous memory space, when adding elements to the array header, the data after the header needs to be rearranged, so the efficiency is very low. The LinkedList is based on the LinkedList implementation, when adding an element, first through the loop to find the position of the new element, if the position to be added in the first half, look from front to back; If it is in the second half, look from the back. So adding elements to the header of the LinkedList is very efficient.

When inserted in the middle, ArrayList also has some data to rearrange and is not very efficient, while LinkedList takes the most time to add elements to the middle, because near the middle, the loop lookup before the element is the most traversal of the element.

When it comes to tail operations, ArrayList is found to be more efficient than LinkedList without scaling up. This is because ArrayList is very efficient when adding elements to the end without copying or rearranging them. LinkedList doesn’t have to loop around to find elements, but it takes more time than ArrayList because of the new object and the logic of changing the pointer to the object.

 public boolean add(E e) {

    linkLast(e);

    return true;

}



void linkLast(E e) {

    final Node<E> l = last;

    final Node<E> newNode = new Node<>(l, e, null);

    last = newNode;

    if (l == null)

        first = newNode;

    else

        l.next = newNode;

    size++;

    modCount++;

}

Copy the code


  1. ArrayList, LinkedList delete test
Header: arrayList. Time > linkedList. Time

Middle: arrayList. Time is less than linkedlist. Time

End: arrayList. Time is less than linkedlist. Time

Copy the code


You’ll notice that the results of the ArrayList and LinkedList deleters are very similar to the results of the additions, which is the same thing, so I won’t go into that.


  1. ArrayList, LinkedList traversal test
forLoop: arrayList. Time is less than linkedList. Time

Iterator: ArrayList.Time is almost equal to LinkedList.Time

Copy the code


As you can see, the LinkedList for loop traversal is not as good as the ArrayList for loop. This is because LinkedList is based on a LinkedList, and when you use a for loop, each for loop will traverse more than half of the List, which seriously affects the efficiency of traversal. ArrayList is based on an array, and implements the RandomAccess interface flag, which means ArrayList can achieve fast RandomAccess, so the for loop is very fast. The iterating performance of a LinkedList is similar to that of an ArrayList, and it’s not too bad, so when we iterate through a LinkedList, we use an iterating loop.






3, often wrong point




thinking

During an ArrayList deletion operation, there are two ways to write:



public static void removeA(ArrayList<String> l) {

  for (String s : l){

      if (s.equals("aaa")) {

        l.remove(s);

      }

  }

}

Copy the code




public static void removeB(ArrayList<String> l) {

    Iterator<String> it = l.iterator();

    while (it.hasNext()) {

        String str = it.next();

        if (str.equals("aaa")) {

            it.remove();

        }

    }



}

Copy the code


The first method is incorrect, and the second method is correct. The first method is incorrect, and the second method is correct. The first method is incorrect, and the second method is correct, because both methods use the list internal Iterator. The ArrayList variable, modCount, is stored to record the number of changes to the list, and the iterator variable, called expectedModCount, is stored to compare it with the expected number of changes. If not equal will throw a ConcurrentModificationException exception. And if you call the remove method in the list in a for loop, you get to a fastRemove method, which is not an iterator method, but an ArrayList method, which just does modCount++, Is not synchronized to expectedModCount. So don't agree, throw ConcurrentModificationException anomalies.


Here is the ArrayList’s own remove method:

public boolean remove(Object o) {

    if (o == null) {

        for (int index = 0; index < size; index++)

            if (elementData[index] == null) {

                fastRemove(index);

                return true;

            }

    } else {

        for (int index = 0; index < size; index++)

            if (o.equals(elementData[index])) {

                fastRemove(index);

                return true;

            }

    }

    return false;

}

Copy the code
private void fastRemove(int index) {

    modCount++;

    int numMoved = size - index - 1;

    if (numMoved > 0)

        System.arraycopy(elementData, index+1, elementData, index,

                         numMoved);

    elementData[--size] = null// clear to let GC do its work

}

Copy the code


If you have read the Ali Java programming specification, do not remove/add elements in the foreach loop when removing a set. Remove elements use the Iterator mode. If concurrent operations are performed, locks must be placed on the Iterator object.