Why is complexity analysis needed?

You might be a little confused: if I just run the code, I can measure its execution time and memory usage through statistics and monitoring. Why bother with time and space complexity analysis? Can such analysis really be more accurate than actually running the code?

First of all, I can say for sure that your method of evaluating the efficiency of an algorithm is correct. Many books on data structures and algorithms even have a name for this method: post-hoc statistics. However, this statistical method has significant limitations.

1. Test results are highly dependent on the test environment

Hardware differences in the test environment can have a significant impact on test results. For example, take the same piece of code and run it on an Intel Core i9 processor and an Intel Core i3 processor: needless to say, the i9 executes it much faster than the i3. Also, if code A runs faster than code B on one machine, we might see the opposite result on another machine.

2. Test results are greatly influenced by data size

We will talk about sorting algorithms later, but let's use one as an example here. For the same sorting algorithm, if the data to be sorted is in a different degree of order, the execution time will differ greatly. In the extreme case, if the data is already sorted, the algorithm barely needs to do anything and the execution time is very short. In addition, if the test data size is too small, the results may not truly reflect the performance of the algorithm. For example, for small inputs, insertion sort can actually be faster than quicksort!

Therefore, we need a method that can roughly estimate the efficiency of the algorithm without using specific test data to test it. This is the time and space complexity analysis method we are going to talk about today.

Big O complexity notation

The execution efficiency of an algorithm is, roughly speaking, the execution time of its code. But how do you estimate the execution time of a piece of code "by eye", without actually running it?

Here is a very simple piece of code that computes the sum 1 + 2 + 3 + … + n. Let's estimate its execution time.

 int cal(int n) {
   int sum = 0;
   int i = 1;
   for (; i <= n; ++i) {
     sum = sum + i;
   }
   return sum;
 }

From the CPU's point of view, each line of this code performs a similar operation: read data, operate on it, write data. Although each line actually corresponds to a different number of CPU instructions and a different execution time, we are only making a rough estimate here, so we can assume that every line of code takes the same unit_time to execute. Based on this assumption, what is the total execution time of this code?

Lines 2 and 3 take 1 unit_time each, and lines 4 and 5 each run n times, taking 2n * unit_time in total, so the total execution time of this code is (2n+2) * unit_time. As you can see, the execution time T(n) of a piece of code is proportional to the total number of times its lines are executed.
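
To make the counting concrete, here is a minimal sketch (not from the original article; the counter is purely illustrative) of the same function with an extra variable incremented each time one of the original statements runs. The final count comes out to roughly 2n + 2, matching the estimate above.

 int calCounted(int n) {
   int count = 0;            // illustrative counter, not part of the original code
   int sum = 0;
   count = count + 1;        // "int sum = 0" ran once
   int i = 1;
   count = count + 1;        // "int i = 1" ran once
   for (; i <= n; ++i) {
     count = count + 1;      // the loop line runs roughly n times
     sum = sum + i;
     count = count + 1;      // the loop body runs n times
   }
   // count is now roughly 2n + 2, matching the estimate above
   return sum;
 }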

With that analysis in mind, let’s look at this code.

 int cal(int n) {
   int sum = 0;
   int i = 1;
   int j = 1;
   for (; i <= n; ++i) {
     j = 1;
     for (; j <= n; ++j) {
       sum = sum + i * j;
     }
   }
 }

We still assume that the execution time of each statement is unit_time. So what is the total execution time T(n) of this piece of code?

Lines 2, 3, and 4 each take 1 unit_time. Lines 5 and 6 loop n times, taking 2n * unit_time. Lines 7 and 8 loop n² times, taking 2n² * unit_time. So the total execution time of the whole piece of code is T(n) = (2n² + 2n + 3) * unit_time.

Although we do not know the specific value of unit_time, from the derivation of the execution time of these two pieces of code we can get a very important rule: the execution time T(n) of a piece of code is proportional to the total number of times each of its lines is executed.

We can sum up this rule in a formula. Look out, the big O is coming!

T(n) = O(f(n))

Let me explain this formula. T(n), which we have already talked about, is how long the code takes to execute; n indicates the data size; f(n) represents the total number of times each line of code is executed. Because this is a formula, we write it abstractly as f(n). The O in the formula means that the execution time T(n) is proportional to the expression f(n).

So, T(n) = O(2n+2) in the first example, and T(n) = O(2n²+2n+3) in the second. This is big O time complexity notation. Big O time complexity does not represent the real execution time of the code, but the trend of the execution time as the data size grows. Therefore, it is also called asymptotic time complexity, or time complexity for short.

When n is large, say 10,000 or 100,000, the low-order terms, constants, and coefficients in the formula do not determine the growth trend, so they can be ignored. We only need to keep the term of the highest order of magnitude. Using big O notation, the time complexity of the two pieces of code above can be written as T(n) = O(n) and T(n) = O(n²).

Time complexity analysis

Now that we have covered the origin and notation of big O time complexity, how do you analyze the time complexity of a piece of code? I have three practical tips to share with you.

1. Focus only on the piece of code that loops the most

As I said, big O complexity notation only reflects a trend. We usually ignore the constants, low-order terms, and coefficients in the formula and only record the highest order of magnitude. Therefore, when analyzing the time complexity of an algorithm or a piece of code, we only need to focus on the piece of code that is executed the most times in a loop. The order of magnitude of the number of times that core code is executed is the time complexity of the whole piece of code. To help you understand, let me reuse the earlier example.

int cal(int n) {
   int sum = 0;
   int i = 1;
   for (; i <= n; ++i) {
     sum = sum + i;
   }
   return sum;
 }

Lines 2 and 3 execute a constant number of times, independent of the size of n, so they have no effect on the complexity. Lines 4 and 5 are the ones executed the most, so we focus on them. As we said earlier, these two lines are executed n times, so the total time complexity is O(n).

2. Rule of addition: The total complexity is equal to the complexity of the code with the highest order of magnitude

I have a little piece of code here. You can try to analyze it first, and then see whether your analysis matches mine.


int cal(int n) {
   int sum_1 = 0;
   int p = 1;
   for (; p < 100; ++p) {
     sum_1 = sum_1 + p;
   }

   int sum_2 = 0;
   int q = 1;
   for (; q < n; ++q) {
     sum_2 = sum_2 + q;
   }
 
   int sum_3 = 0;
   int i = 1;
   int j = 1;
   for (; i <= n; ++i) {
     j = 1; 
     for (; j <= n; ++j) {
       sum_3 = sum_3 + i * j;
     }
   }
   return sum_1 + sum_2 + sum_3;
 }

This code is divided into three parts, sum_1, sum_2, sum_3 respectively. We can analyze the time complexity of each part separately, put them together, and take the highest order of magnitude as the complexity of the whole code.

What is the time complexity of the first part? This code loops 100 times, so it takes a constant amount of time, independent of the size of n.

I want to emphasize again that even if this code looped 10,000 or 100,000 times, as long as it is a fixed, known number that does not depend on n, it is still constant execution time. Such a constant can certainly have a noticeable impact on the actual running time, but going back to the concept of time complexity, which describes how an algorithm's performance grows with the data size, we can ignore it, because it has no effect on the growth trend.

What is the time complexity of the second and third pieces of code? The answers are O(n) and O(n²), which you should be able to work out easily, so I won't spell it out.

Taking the time complexities of these three pieces of code together, we keep the highest order of magnitude. Therefore, the time complexity of the entire piece of code is O(n²). In other words, the total time complexity is equal to the time complexity of the code with the highest order of magnitude. We can abstract this rule into the following formula:

If T1(n) = O(f(n)) and T2(n) = O(g(n)), then T(n) = T1(n) + T2(n) = max(O(f(n)), O(g(n))) = O(max(f(n), g(n))).

3. Multiplication rule: The complexity of nested code equals the product of the complexities of the outer and inner code

I just showed you an addition rule for complexity analysis, and here’s a product rule. By analogy, you can “guess” what the formula looks like, right?

If T1(n) = O(f(n)) and T2(n) = O(g(n)), then T(n) = T1(n) * T2(n) = O(f(n)) * O(g(n)) = O(f(n) * g(n)).

That is to say, assuming T1(n) = O(n) and T2(n) = O(n²), then T1(n) * T2(n) = O(n³). In code, we can think of the multiplication rule as a nested loop. Let me give you an example.


int cal(int n) {
   int ret = 0; 
   int i = 1;
   for (; i < n; ++i) {
     ret = ret + f(i);
   }
 }

 int f(int n) {
  int sum = 0;
  int i = 1;
  for (; i < n; ++i) {
    sum = sum + i;
  } 
  return sum;
 }

Let's look at the cal() function on its own. If f() were just an ordinary O(1) operation, the time complexity of lines 4 to 6 would be T1(n) = O(n). But f() is not a simple operation; its own time complexity is T2(n) = O(n). So the time complexity of the whole cal() function is T(n) = T1(n) * T2(n) = O(n*n) = O(n²).

I have just covered three techniques for complexity analysis. But you don't have to memorize them. The key to complexity analysis is practice: the more cases you read and analyze, the more naturally it will come to you.

Analysis of several common time complexity examples

While code varies widely, the common levels of complexity are few. Roughly from low to high order, they are: O(1), O(logn), O(n), O(nlogn), O(n²), O(n³), O(2ⁿ), and O(n!). These levels cover almost any code you will ever have to analyze.

We can roughly divide these complexity levels into two categories, polynomial and non-polynomial. There are only two non-polynomial magnitudes: O(2ⁿ) and O(n!).

We call algorithm problems with non-polynomial time complexity NP (non-deterministic polynomial) problems.

When the data size n grows, the execution time of a non-polynomial algorithm increases dramatically, and the time needed to solve the problem quickly becomes impractical. Non-polynomial time complexity algorithms are therefore very inefficient. So I won't expand on NP time complexity here; let's focus on a few common polynomial time complexities.
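
As a hedged illustration (not from the original article), the naive recursive computation of Fibonacci numbers is a classic example of this kind of growth: each call spawns two more calls, so the running time grows exponentially with n.

 long fib(int n) {
   // naive recursion: the call tree roughly doubles at each level,
   // so the running time grows exponentially with n
   if (n <= 1) return n;
   return fib(n - 1) + fib(n - 2);
 }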

1. O(1)

The first thing you need to understand is that O(1) is just notation for constant time complexity; it does not mean the code has only one line. This code, for example, has a time complexity of O(1), not O(3), even though it has three lines.

 int i = 8;
 int j = 6;
 int sum = i + j;

So to summarize: as long as the execution time of a piece of code does not grow with n, we say its time complexity is O(1). In other words, even if an algorithm has thousands of lines of code, as long as it contains no loops or recursion, its time complexity is still O(1).

2. O(logn), O(nlogn)

Logarithmic time complexity is very common and the most difficult to analyze. Let me illustrate this with an example.

 i=1;
 while (i <= n)  {
   i = i * 2;
 }

According to the complexity analysis we discussed earlier, the third line of code is the one that loops the most. So, if we can calculate how many times this line of code is executed, we can know the time complexity of the entire code.

As you can see from the code, the value of the variable i starts at 1 and is multiplied by 2 on each pass through the loop. When it exceeds n, the loop ends. Remember the geometric sequences we learned in high school? The values taken by i form a geometric sequence. If we list them one by one, it looks like this:

2⁰, 2¹, 2², …, 2ᵏ, …, 2ˣ = n

So we just need to know the value of x to know how many times this line of code has been executed. Solving 2ˣ = n for x is something we learned in high school: x = log₂n, so the time complexity of this code is O(log₂n).

Now, let me change the code a little bit, and you see, what is the time complexity of this code?

 i=1;
 while (i <= n)  {
   i = i * 3;
 }

It's easy to see, following the approach above, that this code has time complexity O(log₃n).

In fact, whether it’s base 2, base 3, or base 10, we can write the time complexity of all logarithmic orders as O(logn). Why is that?

We know that logarithms of different bases can be converted into one another: log₃n = log₃2 * log₂n. So O(log₃n) = O(C * log₂n), where C = log₃2 is a constant. Based on our earlier rule that coefficients can be ignored in big O notation, that is, O(Cf(n)) = O(f(n)), we have O(log₂n) = O(log₃n). Therefore, in logarithmic time complexity we ignore the base of the logarithm and write it uniformly as O(logn).

If you understood what I said about O(logn), then O(nlogn) is easy to understand. Remember the multiplication rule we just talked about? If the time complexity of a piece of code is O(logn) and we execute it n times in a loop, the time complexity is O(nlogn). O(nlogn) is also a very common algorithmic time complexity; for example, merge sort and quicksort both have O(nlogn) time complexity.
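
As a minimal sketch (not from the original article; the function name is hypothetical), here is the doubling loop from above wrapped in an outer loop that runs n times. By the multiplication rule, the total work is O(nlogn).

 int nLogN(int n) {
   int count = 0;
   for (int k = 1; k <= n; ++k) {   // outer loop runs n times
     int i = 1;
     while (i <= n) {               // inner loop runs about log2(n) times
       i = i * 2;
       count = count + 1;
     }
   }
   return count;                    // total work is on the order of n * logn
 }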

3. O(m+n), O(m*n)

Now let's talk about a different kind of time complexity, where the complexity of the code is determined by the sizes of two pieces of data. As usual, look at the code first!

int cal(int m, int n) {
  int sum_1 = 0;
  int i = 1;
  for (; i < m; ++i) {
    sum_1 = sum_1 + i;
  }

  int sum_2 = 0;
  int j = 1;
  for (; j < n; ++j) {
    sum_2 = sum_2 + j;
  }

  return sum_1 + sum_2;
}

As you can see from the code, m and n represent two data sizes. We cannot know in advance which of m or n is larger, so when we write the complexity we cannot simply apply the addition rule and drop one of them. Therefore, the time complexity of the code above is O(m+n).

In this case, the original addition rule no longer applies, and we need a generalized addition rule: T1(m) + T2(n) = O(f(m) + g(n)). The multiplication rule, however, still holds: T1(m) * T2(n) = O(f(m) * g(n)).
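
For completeness, here is a minimal sketch (not from the original article; the function name is hypothetical) of the O(m*n) case: a loop that runs m times with a nested loop that runs n times, so by the multiplication rule the total is O(m*n).

 int calMulti(int m, int n) {
   int sum = 0;
   for (int i = 1; i <= m; ++i) {      // outer loop runs m times
     for (int j = 1; j <= n; ++j) {    // inner loop runs n times per outer pass
       sum = sum + i * j;
     }
   }
   return sum;                         // total work is on the order of m * n
 }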

Space complexity analysis

We have spent quite a long time on big O notation and time complexity analysis. Once you understand that content, the method for space complexity analysis is very simple.

As I mentioned earlier, the full name of time complexity is asymptotic time complexity, which represents the growth relationship between an algorithm's execution time and the data size. By analogy, the full name of space complexity is asymptotic space complexity, which represents the growth relationship between an algorithm's storage space and the data size.

Let me give you a concrete example.

void print(int n) {
  int i = 0;
  int[] a = new int[n];
  for (i = 0; i < n; ++i) {
    a[i] = i * i;
  }

  for (i = n-1; i >= 0; --i) {
    System.out.println(a[i]);
  }
}

As with time complexity analysis, we can see that in line 2 we allocate space for the variable i, but it is constant and independent of the data size n, so we can ignore it. In line 3 we allocate an int array of size n; apart from that, the rest of the code takes no extra space, so the space complexity of the whole piece of code is O(n).

Common space complexities are O(1), O(n), and O(n²); logarithmic complexities such as O(logn) and O(nlogn) rarely come up. Moreover, space complexity analysis is much simpler than time complexity analysis. So for space complexity, what I have just said is enough.
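
As a hedged example (not from the original article; the function name is hypothetical), allocating a two-dimensional n by n array is a typical case of O(n²) space.

 int[][] buildTable(int n) {
   int[][] table = new int[n][n];   // allocates n * n ints: O(n^2) space
   for (int i = 0; i < n; ++i) {
     for (int j = 0; j < n; ++j) {
       table[i][j] = i * j;
     }
   }
   return table;
 }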

Content summary

Complexity, also called asymptotic complexity, includes time complexity and space complexity. It is used to analyze how an algorithm's execution efficiency grows with the data size, and it can be roughly stated as: the higher the order of an algorithm's complexity, the lower its execution efficiency. The common complexities are not many; from low to high order they are O(1), O(logn), O(n), O(nlogn), and O(n²).

Disclaimer: This article is reproduced, not original