Linear tables

A linear table is a structure in which the data is arranged in a line. Each element in a linear table has at most two directions: forward and backward.

Nonlinear tables

In a nonlinear table there is no simple before-and-after relationship between data elements; trees and graphs are typical examples.

Arrays

An array uses a contiguous block of memory to store a set of data of the same type.

Why arrays support random access

  • An array is a linear table structure: the data is arranged in a line;
  • It occupies contiguous memory and stores data of a single type.

How addressing works

  • One-dimensional array addressing

Suppose the array starts at memory address base_address (for example, 1000). When the computer needs to randomly access an element of the array, it first calculates the memory address where that element is stored, using the following addressing formula:

a[i]_address = base_address + i * data_type_size
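
This formula is easy to verify with pointer arithmetic. The following C snippet (my illustration, not from the source) checks that base_address + i * data_type_size really lands on a[i]:

#include <stdio.h>

int main(void) {
    int a[5] = {10, 20, 30, 40, 50};
    char *base = (char *)a;                  /* base_address */
    for (int i = 0; i < 5; i++) {
        /* base_address + i * data_type_size */
        int *computed = (int *)(base + i * sizeof(int));
        printf("a[%d] = %d, formula gives %d\n", i, a[i], *computed);
    }
    return 0;
}
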
  • Two-dimensional array addressing

For an m * n array, the address of a[i][j] (i < m, j < n) is:

address = base_address + (i * n + j) * type_size
  • Three-dimensional array addressing

For an m * n * q array, the address of a[i][j][k] (i < m, j < n, k < q) is:

address = base_address + (i * n * q + j * q + k) * type_size
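
The higher-dimensional formulas can be checked the same way. The sketch below (again my own, assuming C's row-major layout) confirms that the 2D formula reproduces &a[i][j]:

#include <stdio.h>

#define M 3
#define N 4

int main(void) {
    int a[M][N];
    char *base = (char *)a;                  /* base_address */
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            /* base_address + (i * n + j) * type_size */
            int *computed = (int *)(base + (i * N + j) * sizeof(int));
            if (computed != &a[i][j])
                printf("mismatch at [%d][%d]\n", i, j);
        }
    printf("all 2D addresses match the formula\n");
    return 0;
}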

Common misconceptions

"Linked lists are suited to insertion and deletion, with O(1) time complexity; arrays are suited to lookups, with O(1) lookup time." This statement is not accurate. Arrays are indeed suited to lookup operations, but lookup time is not O(1): even on a sorted array, binary search takes O(log n), as the sketch below shows. The correct statement is that arrays support random access, and access by index takes O(1) time.
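
For reference, here is a standard iterative binary search in C (a sketch of my own, not from the source); each probe only halves the remaining range, which is why even sorted-array lookup is O(log n) rather than O(1):

#include <stdio.h>

/* Returns the index of target in sorted array a[0..n-1], or -1. O(log n). */
int binary_search(const int *a, int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;        /* avoids overflow of (lo + hi) */
        if (a[mid] == target) return mid;
        if (a[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}

int main(void) {
    int a[] = {1, 3, 5, 7, 9, 11};
    printf("%d\n", binary_search(a, 6, 7));  /* prints 3 */
    return 0;
}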

Inefficient inserts and deletes

Insert

Suppose we have an array of length n and need to insert a new element at the kth position.

  • If the data in the array is ordered: inserting at the end of the array requires no data movement, so the best-case time complexity is O(1). Inserting at the beginning forces every element to shift back one position, so the worst-case time complexity is O(n). Since inserting at each position is equally likely, the average-case time complexity is (1 + 2 + … + n) / n = O(n).
  • If the data stored in the array has no order, i.e. the array is treated as a mere collection of data: to insert at the kth position, we can simply move the element currently at position k to the end of the array and place the new element directly at position k, avoiding a mass data migration. With this trick (sketched below), the time complexity of inserting at the kth position drops to O(1).
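
A minimal C sketch of that trick (the function name insert_unordered is mine):

/* Insert value at position k in an unordered array of current length *n.
 * Assumes the underlying buffer has room for one more element.
 * O(1): only a[k] is relocated; no elements are shifted. */
void insert_unordered(int *a, int *n, int k, int value) {
    a[*n] = a[k];   /* move the displaced element to the end */
    a[k] = value;   /* drop the new element into position k  */
    (*n)++;
}

The trade-off is that the array's order is not preserved, which is exactly why this only works when the array is treated as an unordered collection.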

Delete

Suppose we delete the element at the kth position.

  • To keep memory contiguous, the remaining data must be moved; otherwise holes would appear and the memory would no longer be contiguous. As with insertion, deleting at the end of the array gives a best-case time complexity of O(1); deleting at the beginning gives a worst-case time complexity of O(n); the average case is also O(n).
  • In some special cases we do not need to keep the array's data contiguous. If we batch multiple delete operations together, can deletion become more efficient? For example, array a[10] stores 8 elements: a, b, c, d, e, f, g, h, and we want to delete elements a, b, and c.

To avoid moving d, e, f, g, and h three times, we can first just record which data has been deleted. Each delete operation does not actually move any data; it only marks the entry as deleted, leaving a hole instead of shifting the following elements forward every time. Only when the array runs out of space to store new data do we trigger one real delete that compacts the array, which greatly reduces the data movement caused by deletions. This is very similar to how the JVM's mark-sweep garbage collection algorithm works.
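
A minimal sketch of this mark-then-compact idea in C (all names are mine, assuming a fixed-capacity int array):

#include <stdbool.h>

#define CAP 10

int  data[CAP];
bool deleted[CAP];       /* lazy-delete marks, all false initially */
int  len = 0;

/* O(1): just record that position k is gone; nothing moves yet. */
void lazy_delete(int k) { deleted[k] = true; }

/* One real pass moves the survivors forward, closing all holes at once.
 * Deleting a, b, c this way moves d..h only once instead of three times. */
void compact(void) {
    int w = 0;
    for (int r = 0; r < len; r++) {
        if (!deleted[r])
            data[w++] = data[r];
        deleted[r] = false;          /* clear every mark for the next round */
    }
    len = w;
}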

  • The JVM's mark-sweep algorithm

Most mainstream virtual machines use reachability analysis to decide whether an object is alive. In the marking phase, the collector traverses from the GC roots and marks every object reachable from them as live; only when marking is complete does the sweep begin. Its disadvantages are as follows (a toy sketch follows the list):

  1. Both marking and sweeping are inefficient, so the approach only performs well when little garbage is produced.
  2. It leaves discontinuous memory, i.e. fragmentation. (The mark-compact variant addresses this by sliding live objects to one end and freeing the memory beyond the boundary.)
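
To make the analogy concrete, here is a toy mark-sweep pass in C (entirely illustrative; a real JVM collector is far more sophisticated): objects reachable from a root are marked, then one sweep frees everything unmarked.

#include <stdlib.h>

typedef struct Obj {
    struct Obj *next_alloc;   /* intrusive list of all allocations */
    struct Obj *ref;          /* at most one outgoing reference    */
    int marked;
} Obj;

static Obj *all = NULL;       /* head of the allocation list */

Obj *gc_new(void) {
    Obj *o = calloc(1, sizeof(Obj));
    o->next_alloc = all;
    all = o;
    return o;
}

/* Mark phase: everything reachable from a root stays alive. */
void mark(Obj *o) {
    while (o && !o->marked) { o->marked = 1; o = o->ref; }
}

/* Sweep phase: free every unmarked object in one pass over all allocations. */
void sweep(void) {
    Obj **p = &all;
    while (*p) {
        if (!(*p)->marked) {
            Obj *dead = *p;
            *p = dead->next_alloc;
            free(dead);
        } else {
            (*p)->marked = 0;        /* clear the mark for the next cycle */
            p = &(*p)->next_alloc;
        }
    }
}

int main(void) {
    Obj *root = gc_new();
    root->ref = gc_new();     /* reachable via root */
    gc_new();                 /* unreachable: garbage */
    mark(root);
    sweep();                  /* frees only the unreachable object */
    return 0;
}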

Beware of out-of-bounds array access

#include <stdio.h>

int main(int argc, char* argv[]) {
    int i = 0;
    int arr[3] = {0};
    for (; i <= 3; i++) {   /* bug: i <= 3 also writes arr[3], one past the end */
        arr[i] = 0;
        printf("hello world\n");
    }
    return 0;               /* on some compilers this prints "hello world" forever */
}

The stack grows downward, and the variables are pushed in the order i, arr[2], arr[1], arr[0] (the layout observed when stepping through the assembly in VC). Accessing arr[3] is therefore equivalent to accessing the variable i, and since that address still belongs to the current process, the operating system does not terminate the process when it is written. So arr[3] = 0 resets i to 0, and the loop starts over.

Whether an out-of-bounds access causes this infinite loop depends on the compiler, because stack variables may be laid out at increasing or decreasing addresses; the infinite loop only occurs when addresses decrease so that i sits just past the array.

At bottom, accessing an array means accessing a contiguous segment of memory; as long as the address computed from the array offset is accessible to the process, the program may not report any error at all.

Many computer viruses exploit exactly this kind of out-of-bounds access to illegal addresses in order to attack a system, so always watch for array bounds when writing code.

Drawbacks of arrays

  • You must specify the size up front, because a contiguous block of memory has to be allocated.
  • If a size-10 array is full and you need to store an 11th element, you must allocate a larger block, copy the original data over, and then insert the new data (see the sketch below).
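
A minimal grow-and-copy sketch in C (the Vec type and names are mine): when the buffer is full, allocate a larger one, copy the old data, and only then insert:

#include <stdlib.h>
#include <string.h>

typedef struct {
    int *data;
    int  len;
    int  cap;
} Vec;

/* Append one value, doubling the buffer when it is full. Returns -1 on OOM. */
int vec_push(Vec *v, int value) {
    if (v->len == v->cap) {                 /* e.g. 10 elements, 11th arrives */
        int new_cap = v->cap ? v->cap * 2 : 1;
        int *bigger = malloc(new_cap * sizeof(int));
        if (!bigger) return -1;
        memcpy(bigger, v->data, v->len * sizeof(int));  /* copy the old data */
        free(v->data);
        v->data = bigger;
        v->cap  = new_cap;
    }
    v->data[v->len++] = value;              /* then insert the new data */
    return 0;
}

Doubling the capacity keeps the amortized cost of each append at O(1), which is the usual design choice for dynamic arrays.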

Why do array indices start at 0 instead of 1?

From the memory model of array storage, the most precise definition of "index" is "offset". a[k] denotes the position offset by k units of type_size from the base, so the memory address of a[k] is computed with this formula:

a[k]_address = base_address + k * type_size

If indexing started at 1, the memory address of a[k] would instead become:

a[k]_address = base_address + (k - 1) * type_size

That is one extra subtraction for every random access, i.e. one extra instruction for the CPU. But this is probably not the main reason; there are others, such as historical convention.

Source

Time.geekbang.org/column/arti…