This time: arrays and linked lists. Before we talk about them, let's talk about linear and nonlinear lists.

Linear lists (LinearList)

As the name implies, the structure of a linear list is linear. Like books on a library shelf, each row of books is neatly arranged along a straight board. A tall building can also be said to be linear in the vertical direction: each floor can be viewed as a unit, ordered from bottom to top.

The essence of a linear list is that the only relationship between elements is precedence. Of two books next to each other on a shelf, one must be in front of or behind the other (ignoring the upper and lower shelves). Of two adjacent floors, one must be above or below the other.

Arrays and linked lists are linear lists, as are stacks and queues.

Nonlinear lists

The counterpart of a linear list is a nonlinear list. The relationships between the elements of a nonlinear list are richer; they are not just a matter of precedence.

Consider a tree: the trunk grows branches, branches fork into smaller branches, and finally leaves grow. Between branches there are parent-child and sibling relationships. Or consider a map: the relationships between the stations on it are even more complicated.

Trees and graphs in data structures are nonlinear structures.

Arrays and linked lists

Let’s start with a picture.

This is a chest of drawers, but it also reflects the structure of hard disk space. In the abstract, a hard disk is essentially a one-dimensional structure, with elements of the same size arranged one right after another. Although a hard disk looks like a 3D object, it can always be mapped onto a one-dimensional space.

Arrays map directly onto this structure, using contiguous memory space to store data. An array is a very straightforward way to use memory: you request a contiguous block and store your data in it. It is as if the houses of A, B, and C are adjacent on a street: once you reach A's house, you will find B's house two steps to the right and C's house two steps further.

But this raises a core problem: what if no contiguous memory is available? In memory we run the operating system and a lot of software; software is loaded into memory when it runs and released when it exits. After this process repeats many times, memory becomes fragmented. The free space may amount to 100 slots, but those 100 slots are discontinuous, split apart by memory still in use, so a request for a 100-element array will fail.

This is what led to linked lists.

Linked lists solve the problem of arrays being forced into contiguous allocation: by recording the address of the next element inside the current element, elements scattered across memory are chained together. It is as if the houses of A, B, and C are far apart rather than adjacent: A knows B's address, and B knows C's address, so as long as we know A's address we can always find B and C.
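A minimal sketch of such a node in Java (the class and field names here are illustrative, not from any particular library):

class Node {
    int value;  // the data held by this element
    Node next;  // reference ("address") of the next element; null at the end

    Node(int value) { this.value = value; }
}

class NodeDemo {
    public static void main(String[] args) {
        // Chain three nodes that may live anywhere in memory: a -> b -> c.
        Node c = new Node(3);
        Node b = new Node(2);
        Node a = new Node(1);
        b.next = c;
        a.next = b;
        // Starting from a, we can always reach b and c by following next.
        System.out.println(a.next.value);      // 2
        System.out.println(a.next.next.value); // 3
    }
}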

There is no such thing as a free lunch. Although linked lists do not require contiguous memory, each element must remember the location of the next one, which increases the space each element occupies. In effect, a linked list uses the extra footprint of each element to buy its way out of the whole-block contiguity requirement. Algorithms and data structures are full of such trade-offs.

To summarize: arrays require contiguous memory, but the k-th element can be accessed immediately by computing base_address + k * size, giving O(1) random access. Linked lists do not require contiguous memory, but to find the k-th element they must start from the head and visit the elements one by one.
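A quick illustration in Java; the concrete addresses in the comment are made up for the example, since the JVM performs this arithmetic for us behind array[k]:

class RandomAccessDemo {
    public static void main(String[] args) {
        // Say the array starts at base_address 1000 and each int takes 4 bytes.
        // Element k = 3 then lives at 1000 + 3 * 4 = 1012; the runtime computes
        // base_address + k * size directly, so the read below is O(1).
        int[] array = {10, 20, 30, 40, 50};
        System.out.println(array[3]); // 40
    }
}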

Comparison

Whether it is an array or a linked list, the typical operations are inserting data, deleting data, and finding data, so let's compare the two structures on these three operations. Time complexity will inevitably come up in the analysis, so first a brief word on how to analyze it.

Time complexity

We usually use big O notation as the tool, or representation, for time complexity analysis. When we analyze time complexity, we do not focus on the actual execution time of the algorithm, but on how the execution time trends as the data size grows. In simple terms, we count the number of times the code executes at data size n and then write that in big O notation. The method is introduced with the following rules and the code below:

  1. Focus only on the code that executes the most times, ignoring constants, lower-order terms, and coefficients;
  2. Addition rule: the total complexity of a piece of code equals the complexity of its largest part;
  3. Product rule: the complexity of nested code equals the product of the complexities of the outer and inner code.
void complexity(int n) {
    int[] array = new int[n];

    // 1: O(1)
    int a = array[0];
    int b = array[1];

    // 2: O(n)
    for (int i = 0; i < n; i++) {
        System.out.println(array[i]);
        System.out.println(array[i]);
    }

    // 3: O(n^2)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            System.out.println(array[i] * array[j]);
        }
    }
}

Here the data size is n.

First, let's look at comment 1. Although the array is accessed twice, we ignore constants, so this block is O(1).

Next, comment 2. The loop passes over the array once but accesses every element twice, so the code executes 2n times, giving O(2n); ignoring the coefficient, the time complexity is O(n).

Finally, comment 3 nests two loops. Each loop visits all n elements, and the body prints the product of two array elements; each loop on its own is O(n). By the product rule, the time complexity of this block is O(n) * O(n), that is, O(n^2).

So what is the time complexity of the whole function? By the addition rule, ignoring the lower-order terms, we end up with O(n^2).

Space complexity is analyzed in a similar way to time complexity, except that the number of code executions is replaced by the amount of space occupied.

Common time complexities are O(1), O(log n), O(n), O(n log n), O(n^2), and O(n!).

Next, let’s compare arrays and linked lists.

Inserting data

Inserting into an array comes in two flavors: inserting into a sorted array, where the elements must stay in order afterwards, and inserting into an unsorted array. Consider the first. To keep the array sorted, we must shift the data after the insertion position backwards. The best case is inserting at the end, which moves no data: O(1). The worst case is inserting at the head, which moves all the data: O(n). The average time complexity is O(n); it can be computed as a weighted average, which we won't detail here. The second case, an unsorted array, is easier: to insert at position k, we simply move the k-th element to the end and write the new data at position k, in O(1) time.
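A sketch of both strategies (illustrative Java; count is the number of elements currently in use, and the array is assumed to have spare capacity at the end):

class ArrayInsertDemo {
    // Sorted case: shift everything after position k one slot right. O(n).
    static void insertOrdered(int[] array, int count, int k, int value) {
        for (int i = count; i > k; i--) {
            array[i] = array[i - 1]; // shift one slot to the right
        }
        array[k] = value;
    }

    // Unsorted case: relocate the k-th element to the end, reuse slot k. O(1).
    static void insertUnordered(int[] array, int count, int k, int value) {
        array[count] = array[k]; // move the displaced element to the end
        array[k] = value;
    }
}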

Insertion into a linked list is relatively simple and can be done in O(1) time. Of course, this assumes the insertion position is already known; it does not include the cost of finding that position.
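Given a reference to the node to insert after, linking in a new node is just two pointer updates (reusing the Node class sketched earlier):

class ListInsertDemo {
    // Insert newNode right after prev: two pointer updates, O(1).
    static void insertAfter(Node prev, Node newNode) {
        newNode.next = prev.next;
        prev.next = newNode;
    }
}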

Deleting data

When data is deleted from an array, the remaining elements must be moved forward to keep the occupied space contiguous, so the time complexity is O(n). One optimization is possible: instead of moving data immediately after a deletion, just record that the slot is dead, and only move data when space runs short. This is also the core idea of the mark-sweep garbage collection algorithm in the Java virtual machine.
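A sketch of the plain shifting approach (illustrative Java, same count convention as above):

class ArrayDeleteDemo {
    // Remove the element at position k by shifting the tail forward. O(n).
    // Returns the new element count.
    static int delete(int[] array, int count, int k) {
        for (int i = k; i < count - 1; i++) {
            array[i] = array[i + 1]; // shift one slot to the left
        }
        return count - 1;
    }
}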

Deleting data from a linked list is also relatively simple: once we hold the node before the one to remove, unlinking it takes O(1) time.
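The corresponding sketch, again assuming we already hold the preceding node:

class ListDeleteDemo {
    // Unlink the node following prev: one pointer update, O(1).
    static void deleteAfter(Node prev) {
        if (prev.next != null) {
            prev.next = prev.next.next;
        }
    }
}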

Accessing data

Suppose we now want to access the k-th element. Because an array occupies contiguous space, we can compute the k-th element's address directly with the formula above, so the time complexity is O(1). A linked list is more troublesome: its space is not contiguous, so we must start from the head and follow each element's link to the next until we reach the k-th element. The time complexity is therefore O(n).
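A sketch of the linked-list walk (Node class from earlier); array access, by contrast, is a single array[k] read:

class ListGetDemo {
    // Linked list access: O(n), one hop per element passed over.
    static Node get(Node head, int k) {
        Node current = head;
        for (int i = 0; i < k && current != null; i++) {
            current = current.next;
        }
        return current; // null if the list has fewer than k + 1 elements
    }
}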

To sum up, we can get the following table.

Operation       Array   Linked list
Random access   O(1)    O(n)
Insert          O(n)    O(1)
Delete          O(n)    O(1)

Conclusion

Finally, we can conclude that the main difference between arrays and linked lists is whether the memory space is contiguous. Arrays require contiguous memory, so allocation is more demanding, but in exchange they get O(1) random access to elements. Linked lists do not require contiguous memory and their allocation conditions are looser, but the cost is a larger space footprint per element and a higher time complexity for accessing elements.

Finally, I recommend VirtualLearning, a WeChat mini program I am writing that visualizes algorithms and data structures so you can learn them more intuitively. It currently supports the main sorting algorithms, with more content on the way.

Search for VirtualLearning on WeChat to try it out.
