Definitions

In a broad sense

A data structure is a storage structure for a group of data.

An algorithm is a set of methods for manipulating data.

In a narrow sense

It refers to certain well-known data structures and algorithms, such as queues, stacks, heaps, binary search, dynamic programming, etc.

The relationship between data structures and algorithms

Data structures serve algorithms, and algorithms operate on specific data structures.

For example, the classic binary search algorithm requires an array to store the data because it depends on the array's random-access property. If we chose a data structure such as a linked list instead, binary search would not work, because linked lists do not support random access.
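As a minimal sketch of why random access matters (my own illustration, not code from the course; it assumes the array is sorted in ascending order), binary search jumps straight to the middle index at every step, which an array supports in O(1):

 static int binarySearch(int[] a, int x) {
   int lo = 0;
   int hi = a.length - 1;
   while (lo <= hi) {
     int mid = lo + (hi - lo) / 2; // random access to the middle element
     if (a[mid] == x) return mid;  // found x at index mid
     if (a[mid] < x) lo = mid + 1; // search the right half
     else hi = mid - 1;            // search the left half
   }
   return -1; // x is not in the array
 }

With a linked list there is no O(1) way to reach the middle element, so each halving step would itself cost O(n).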

Complexity analysis

Limitations of the after-the-fact measurement (benchmarking) approach

  1. Test results are highly dependent on the test environment

Hardware differences in the test environment can have a significant impact on test results

  2. Test results are greatly influenced by the size of the data

For sorting small data sets, insertion sort may actually be faster than quicksort.
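As a small illustration of the after-the-fact approach (my own sketch, not from the course), the time measured below depends on the machine, JIT warm-up, and the chosen n, which is exactly why such measurements cannot replace complexity analysis:

 public class TimingDemo {
   public static void main(String[] args) {
     int n = 100000; // change n and the measured time changes with it
     long start = System.nanoTime();
     long sum = 0;
     for (int i = 0; i < n; ++i) {
       sum += i;
     }
     long elapsed = System.nanoTime() - start;
     System.out.println("sum = " + sum + ", elapsed = " + elapsed + " ns");
   }
 }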

Big O complexity notation


 int cal(int n) {
   int sum = 0;
   int i = 1;
   int j = 1;
   for (; i <= n; ++i) {
     j = 1;
     for (; j <= n; ++j) {
       sum = sum + i * j;
     }
   }
   return sum;
 }

Lines 2, 3, and 4 each require one unit_time of execution time;

Lines 5 and 6 each loop n times, requiring 2n * unit_time;

Lines 7 and 8 each loop n^2 times, so 2n^2 * unit_time is required. So the total execution time of the whole code is T(n) = (2n^2 + 2n + 3) * unit_time.

Formula: T(n) = O(f(n)), where f(n) represents the total number of times all lines of code are executed.

The above example can therefore be expressed as T(n) = O(2n^2 + 2n + 3).

Big O time complexity does not represent the actual execution time of the code; rather, it represents the trend of how the execution time grows as the data size increases. Hence it is also called asymptotic time complexity, or time complexity for short.

Here n is large, say 10,000 or 100,000. The low-order, constant, and coefficient terms in the formula do not affect the growth trend, so they can be ignored; we only need to record the largest order of magnitude. Expressed in big O notation, the time complexity of the cal() code above can be written as T(n) = O(n^2); a piece of code with a single loop of n iterations would similarly be T(n) = O(n).

Constant time

Even if a piece of code loops 10,000 or 100,000 times, as long as the loop count is a fixed, known number that does not depend on n, it still runs in constant time.
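A small sketch of this point (my own illustration, not from the course): the loop bound below is a fixed constant, so the running time does not grow with any input n and the complexity is O(1).

 int constantWork() {
   int sum = 0;
   for (int k = 0; k < 10000; ++k) { // 10,000 is fixed, not a function of n
     sum = sum + k;
   }
   return sum;
 }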

Time complexity analysis

1. Focus only on the piece of code that loops the most

Ignore constants, low orders, and coefficients in the formula, and just record the magnitude of the largest order

When we analyze the time complexity of an algorithm or a piece of code, we only need to focus on the piece of code that executes the most times; the order of magnitude (in n) of that core code's execution count is the time complexity of the whole code.

 int cal(int n) {
   int sum = 0;
   int i = 1;
   for (; i <= n; ++i) {
     sum = sum + i;
   }
   return sum;
 }
  1. Lines 2 and 3 take constant-level execution time, independent of the size of n, so they have no effect on the complexity.
  2. Lines 4 and 5 loop the most times, so focus on them. These two lines are executed n times, so the total time complexity is O(n).

2. Addition rule: the total complexity equals the complexity of the code with the highest order of magnitude

int cal(int n) {
   int sum_1 = 0;
   int p = 1;
   for (; p < 100; ++p) {
     sum_1 = sum_1 + p;
   }

   int sum_2 = 0;
   int q = 1;
   for (; q < n; ++q) {
     sum_2 = sum_2 + q;
   }

   int sum_3 = 0;
   int i = 1;
   int j = 1;
   for (; i <= n; ++i) {
     j = 1;
     for (; j <= n; ++j) {
       sum_3 = sum_3 + i * j;
     }
   }

   return sum_1 + sum_2 + sum_3;
 }

The code is divided into three parts, computing sum_1, sum_2, and sum_3 respectively.

We can analyze the time complexity of each part separately, then take the highest order of magnitude among them as the complexity of the whole code.

What is the time complexity of the first part? It loops 100 times, a constant number independent of the size of n, so it is constant execution time.

The time complexities of the second and third parts are O(n) and O(n^2), respectively.

Taking the highest order of magnitude of the three parts, the time complexity of the whole code is O(n^2).

3. Multiplication rule: the complexity of nested code equals the product of the complexities of the outer and inner code

In code, we can think of the product rule as a nested loop

int cal(int n) {
   int ret = 0;
   int i = 1;
   for (; i < n; ++i) {
     ret = ret + f(i);
   }
   return ret;
 }
 
 int f(int n) {
  int sum = 0;
  int i = 1;
  for (; i < n; ++i) {
    sum = sum + i;
  } 
  return sum;
 }

Assuming f() were just an ordinary operation, the time complexity of lines 4 to 6 would be T1(n) = O(n).

But f() is not a simple operation: its own time complexity is T2(n) = O(n).

So the time complexity of the whole cal() function is T(n) = T1(n) * T2(n) = O(n*n) = O(n^2).

Analysis of several common time complexity examples

Time complexities can be roughly divided into two categories: polynomial and non-polynomial.

There are only two non-polynomial orders of magnitude: exponential order O(2^n) and factorial order O(n!).

We call algorithm problems whose time complexity is non-polynomial NP (Non-Deterministic Polynomial) problems.

When the data size n becomes larger and larger, the execution time of the non-polynomial algorithm increases sharply

Non-polynomial time complexity algorithms are actually very inefficient algorithms

Let's look at some common polynomial time complexities.

1. O(1) Constant time complexity

As long as the execution time of a piece of code does not increase as n grows, its time complexity is denoted O(1).

As a rule of thumb, as long as there are no loop or recursive statements in the algorithm, its time complexity is O(1), even if it has thousands of lines of code.
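For instance (a sketch of the idea), the following straight-line code contains no loops or recursion, so it runs in constant time regardless of any input size:

 int i = 8;
 int j = 6;
 int sum = i + j; // three operations, always: O(1)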

2. O(log n) and O(n log n): logarithmic time complexity

 int i = 1;
 while (i <= n) {
   i = i * 2;
 }

The variable i starts at 1 and is multiplied by 2 on each pass through the loop; the loop ends once i becomes greater than n.

The values taken by the variable i form a geometric sequence: 1, 2, 4, ..., 2^x.

Solving 2^x = n gives x = log₂n, so the time complexity of this code is O(log₂n).

Whether the base is 2, 3, or 10, we can write all logarithmic time complexities as O(log n), because logarithms with different bases differ only by a constant factor: log₃n = log₃2 * log₂n, and log₃2 is a constant that big O notation ignores.
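As a quick check (my own variant of the loop above, not from the course), multiplying by 3 instead of 2 makes the loop run log₃n times, which is still O(log n):

 int i = 1;
 while (i <= n) {
   i = i * 3; // runs log3(n) times, a constant factor away from log2(n)
 }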

3. O(m+n) and O(m*n): the complexity is determined by the sizes of two data sets

int cal(int m, int n) {
  int sum_1 = 0;
  int i = 1;
  for (; i < m; ++i) {
    sum_1 = sum_1 + i;
  }

  int sum_2 = 0;
  int j = 1;
  for (; j < n; ++j) {
    sum_2 = sum_2 + j;
  }

  return sum_1 + sum_2;
}


The time complexity of the code above is O(m+n). Because we cannot know in advance whether m or n is larger, neither term can be dropped.

Space complexity analysis (only memory related to n is counted)

The full name of space complexity is asymptotic space complexity, which describes the growth relationship between an algorithm's storage space and the data size.

Common space complexities are O(1), O(n), and O(n^2); logarithmic complexities such as O(log n) and O(n log n) are rarely encountered in practice.

void print(int n) {
  int i = 0;
  int[] a = new int[n];
  for (; i < n; ++i) {
    a[i] = i * i;
  }

  for (i = n - 1; i >= 0; --i) {
    System.out.println(a[i]);
  }
}

In line 2, we allocate space to store the variable i, but it is of constant size and unrelated to the data size n, so we can ignore it.

Line 3 allocates an int array of size n; apart from that, the rest of the code takes no additional space, so the space complexity of the whole code is O(n).

Conclusion

Complexity, also called asymptotic complexity, includes time complexity and space complexity and is used to analyze the growth relationship between an algorithm's execution efficiency and the data size. Roughly speaking: the higher the order of an algorithm's complexity, the lower its execution efficiency.

Analysis of best, worst, average and amortized time complexity

Best- and worst-case time complexity

// n indicates the length of the array
int find(int[] array, int n, int x) {
  int i = 0;
  int pos = -1;
  for (; i < n; ++i) {
    if (array[i] == x) pos = i;
  }
  return pos;
}

The complexity of the above code is O(n), where n represents the length of the array


// n indicates the length of the array
int find(int[] array, int n, int x) {
  int i = 0;
  int pos = -1;
  for (; i < n; ++i) {
    if (array[i] == x) {
       pos = i;
       break;  // add break
    }
  }
  return pos;
}

If the first element of the array happens to be the value x we are looking for, we do not need to traverse the remaining n-1 elements, so the time complexity is O(1).

But if the array does not contain x, we have to traverse the whole array, and the time complexity is O(n). So the time complexity of this code differs from case to case.

The best-case time complexity is, in the best case, the time complexity of executing this code

The worst-case time complexity is, in the worst case, the time complexity of executing this code

Average case time complexity


// n indicates the length of the array
int find(int[] array, int n, int x) {
  int i = 0;
  int pos = -1;
  for (; i < n; ++i) {
    if (array[i] == x) {
       pos = i;
       break;  // add break
    }
  }
  return pos;
}

Suppose the probability that x is in the array and the probability that it is not are each 1/2.

In addition, the probability that the data to be searched appears in the n positions from 0 to n-1 is the same, which is 1/n.

So, by the product rule of probability, the probability that the value we are looking for sits at any particular position from 0 to n-1 is 1/(2n).

Weighting the number of elements examined in each case by its probability, the expected cost is 1 * 1/(2n) + 2 * 1/(2n) + 3 * 1/(2n) + ... + n * 1/(2n) + n * 1/2 = (3n+1)/4. This kind of weighted average from probability theory is also known as the expected value.

So the full name for average time complexity should be weighted average time complexity or expected time complexity.

In big O notation, after dropping the coefficients and constants from (3n+1)/4, the weighted average time complexity of this code is still O(n).

For the most part, we don’t need to distinguish between best case, worst case, and average case time complexity

We use these three complexity representations only when the time complexity of the same piece of code varies by an order of magnitude in different cases

Amortized time complexity

This analysis method is called amortized analysis, and the time complexity obtained through amortized analysis is called amortized time complexity.

 // array represents an array of length n
 // array.length equals n
 int[] array = new int[n];
 int count = 0;
 
 void insert(int val) {
    if (count == array.length) {
       int sum = 0;
       for (int i = 0; i < array.length; ++i) {
          sum = sum + array[i];
       }
       array[0] = sum;
       count = 1;
    }

    array[count] = val;
    ++count;
 }

This code implements a function to insert data into an array.

When the array is full (count == array.length), we sum the elements with the for loop, empty the array, and place the sum at the first position of the array;

then the new data is inserted. If the array has free space to begin with, the data is inserted into the array directly.

In the best case, the array has free space and we simply insert the data at index count, so the best-case time complexity is O(1).

In the worst case, the array has no free space, so we must first sum and empty the array and then insert the data, so the worst-case time complexity is O(n).

The average time complexity is O(1). Assume the length of the array is n. Depending on where the data is inserted, there are n cases, each with time complexity O(1). On top of that there is the "extra" case of inserting when the array has no free space, which takes O(n). All n+1 cases occur with the same probability, 1/(n+1). So, by the weighted-average method, the average time complexity is: 1 * 1/(n+1) + 1 * 1/(n+1) + ... + 1 * 1/(n+1) + n * 1/(n+1) = O(1).

Here, the O(n) insertion is the worst case (sum the array, empty it, then insert), and the O(1) insertion is the best case (insert directly).

Each O(n) insertion is followed by n-1 O(1) insertions. If we amortize the cost of the time-consuming operation over the n-1 cheap operations that follow it, the amortized time complexity of this group of consecutive operations is O(1). That is the general idea of amortized analysis.

Where amortized time complexity analysis can be applied, the general amortized time complexity is equal to the best-case time complexity.

// Global variables: an array of size 10, its current length len, and the next free index i.
int array[] = new int[10];
int len = 10;
int i = 0;

// Add an element to the array
void add(int element) {
   if (i >= len) { // The array has run out of space
     // Request a new array twice the size
     int new_array[] = new int[len * 2];
     // Copy the data from the original array into new_array
     for (int j = 0; j < len; ++j) {
       new_array[j] = array[j];
     }
     // Point array at new_array; the array is now twice as large, so double len
     array = new_array;
     len = 2 * len;
   }
   // Place element at index i, then increment i
   array[i] = element;
   ++i;
}
By the same amortized reasoning, each O(n) expansion happens only after len O(1) insertions, so the amortized time complexity of add() is O(1).

PS: These notes are based on Teacher Wang Zheng's course, The Beauty of Data Structures and Algorithms.