Best, Worst, Average, and Amortized Time Complexity

The time complexity of the same piece of code may vary from case to case

In the last article, we summarized big O notation, which expresses asymptotic time complexity, and the basic method of analyzing the time complexity of an algorithm.

One of the drawbacks of post-hoc statistics (measuring running time empirically) is that the results are heavily influenced by both the size of the data and the data itself. However, in all the examples in the previous article, the only thing that varied was the number of loop iterations, i.e., the data size. Such analysis can only reflect the influence of data size on algorithm efficiency; it cannot reflect how different actual data of the same size affect efficiency.

Take sorting algorithms as an example. For the same sorting algorithm, even if the input data is the same size, the execution time can vary greatly depending on the initial order of the data. In the extreme case, if the data is already sorted, the algorithm may not need to do anything at all, and the execution time will be very short.

Clearly, when analyzing the time complexity of an algorithm, we should consider not only the data size but also how different conditions of same-size data affect efficiency. That is, we need to classify the possible input data and discuss each case separately in order to assess an algorithm more accurately. Sometimes, for data of the same size, the actual characteristics of the data may even determine which algorithm we choose.

Therefore, in this article, we supplement the previous one by analyzing time complexity according to the different situations the input data may be in.

Best- and worst-case time complexity

Let’s look at an example of a search algorithm:

int find(int[] array, int value) {
    if(array == null || array.length == 0) {
        return -1;
    }
    for(int i = 0; i < array.length; i++) {
        if(array[i] == value) {
            return i;
        }
    }
    return -1;
}

After reading the above code, it is not hard to see that it differs from the previous examples: the loop that depends on the data size does not always run to completion. Can we simply say that the time complexity of this code is O(array.length)? Obviously not necessarily, and the analysis approach we introduced in the last article is not enough here.

Because value can appear anywhere in the array, if the first element happens to be the value we are looking for, the return inside the loop executes immediately and the function ends early with time complexity O(1). However, if we are unlucky and value is not in the array at all, the code has to traverse the whole array before the final return -1 is reached, and the time complexity becomes O(n). So, in different cases, the time complexity of this code is different.

To represent the time complexity of code in different cases, we need to introduce three concepts: best-case time complexity, worst-case time complexity, and average-case time complexity.

The first two are easy to understand, and in fact we have analyzed both the best case and worst case time complexity of the above code in the example above.

As the name implies, **best-case time complexity is the time complexity of executing the code with the most favorable input data.** As in our example above, in the best case the first element of the array is the value we are looking for, and the complexity in that case is the best-case time complexity.

Similarly, **worst-case time complexity is the time complexity of executing the code with the least favorable input data.** As in our example above, if value is not in the input array at all, we have to traverse the entire array, and the complexity in that case is the worst-case time complexity.
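
For instance, here is a minimal sketch (my own illustrative driver, not part of the original example) that exercises both extremes of the find function:

```java
public class FindDemo {
    // Same search as above: returns the index of value in array, or -1 if absent.
    static int find(int[] array, int value) {
        if (array == null || array.length == 0) {
            return -1;
        }
        for (int i = 0; i < array.length; i++) {
            if (array[i] == value) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {7, 3, 5, 9, 1};
        System.out.println(find(data, 7)); // best case: match at index 0, one comparison -> O(1)
        System.out.println(find(data, 4)); // worst case: not present, all n elements compared -> O(n)
    }
}
```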

Average case time complexity

The best-case and worst-case time complexity correspond to extreme cases, which do not occur very often. To better describe the complexity of code in the more general case, we need to introduce another concept: average-case time complexity.

How do we calculate average-case time complexity? **It is the weighted average of the time complexity of each possible case, weighted by the probability of that case occurring.** A little probability theory is needed when analyzing average-case time complexity.

Take the search algorithm above as an example. The value we are looking for can be in n+1 situations: at any of the n positions 0 to n-1, or not in the array at all. The cost in each situation is the number of elements that must be traversed, ranging from 1 (when value is at array[0]) to n (when value is at the last position or not in the array at all). Let us first assume that all of these situations are equally likely. We add up the number of elements traversed in each situation and divide by n+1 to obtain the average number of elements traversed:
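
$$\frac{1 + 2 + 3 + \cdots + n + n}{n + 1} = \frac{n(n+3)}{2(n+1)}$$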

In big O notation, coefficients, lower-order terms, and constants can be omitted, so after simplifying the result above, the average-case time complexity is O(n).

That conclusion is correct, but the derivation has a problem: the n+1 situations are not actually equally likely, and in particular the probability of value not being in the array at all is usually higher than the probability of it being at any single position.

We know that the value we’re looking for is either in the array or not in the array. It’s a little bit tricky to figure out what the probabilities are, so for the sake of understanding, we’re going to assume that the probability of value being in the array and the probability of value not being in the array are both 1/2. When value exists in the array, the probability of it appearing in any position in the array is 1/n. Therefore, according to the probability multiplication rule, the probability of the data to be searched appearing in any position between 0 and n-1 is 1/(2n).

So the biggest problem with our earlier derivation is that we did not take the actual probabilities into account. If we weight each situation by its probability, the average-case time complexity becomes the following:
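
$$1 \times \frac{1}{2n} + 2 \times \frac{1}{2n} + 3 \times \frac{1}{2n} + \cdots + n \times \frac{1}{2n} + n \times \frac{1}{2} = \frac{3n+1}{4}$$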

After introducing probability theory, the result is still O(n), but now the derivation is correct. This value is a weighted average in the sense of probability theory, also called the expected value, so average-case time complexity is also called weighted average time complexity or expected time complexity.

In fact, in most cases we do not need to distinguish between best-case, worst-case, and average-case time complexity; most of the time, as in the examples in the last article, a single complexity is enough. Only when the time complexity of the same block of code differs by orders of magnitude in different cases do we use these three complexities to distinguish them.

Amortized time complexity

So far, we have covered most of algorithmic complexity analysis. Next, we will discuss a more special concept, amortized time complexity, and its corresponding analysis method, amortized analysis.

Amortized time complexity sounds a bit like average case time complexity, and the two can be confusing. As mentioned above, in most cases we do not distinguish between best, worst, and average complexity. Average time complexity is only used in some special cases, while amortized time complexity is used in more special and limited scenarios.

Let’s take a look at the following example. The following code is written in an unconventional way, but this is just for convenience:

// array represents an array of length n
// array.length is equal to n
int[] array = new int[n];
int count = 0;

void insert(int val) {
    if (count == array.length) {
        int sum = 0;
        for (int i = 0; i < array.length; ++i) {
            sum = sum + array[i];
        }
        array[0] = sum;
        count = 1;
    }

    array[count] = val;
    ++count;
}

This code implements inserting data into an array. When the array is full (count == array.length), we sum all the elements with the for loop, clear the array by placing the sum at the first position, and then insert the new data. If the array has free space to begin with, the data is inserted directly.

So what is the time complexity of this code? Let’s start with the best case, worst case, and average time complexity described above.

In the best case, there is free space in the array, and all we need to do is insert the data at index count, so the best-case time complexity is O(1). In the worst case, there is no free space, so we must first sum the array and then insert, so the worst-case time complexity is O(n).

What is the average time complexity? Let us use the probability method introduced earlier. Assuming the length of the array is n, there are n cases depending on where the new data is inserted, each with time complexity O(1). In addition, there is the "extra" case of inserting when the array has no free position, which takes O(n). In other words, there are n+1 cases in total, and if each is equally likely with probability 1/(n+1), it is not hard to calculate that the average-case time complexity of the code above is O(1).
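
Written out as a weighted average (under the same equal-probability assumption), that is:

$$1 \times \frac{1}{n+1} + 1 \times \frac{1}{n+1} + \cdots + 1 \times \frac{1}{n+1} + n \times \frac{1}{n+1} = \frac{2n}{n+1}$$

which is O(1).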

But do we really need such an elaborate analysis for the code above? Let's examine its characteristics:

  • Except for the special case in which the array has no free position, the time complexity of the code is always O(1), with no difference of orders of magnitude between cases;
  • The special case of no free position occurs in a regular, periodic pattern: one O(n) insertion is followed by n-1 O(1) insertions, and so on, over and over.

Therefore, for the complexity analysis of such a special scenario, we do not need to enumerate all input cases and their probabilities and then compute a weighted average, as the average-case analysis method requires. Instead, we introduce a simpler method: amortized analysis. The time complexity obtained through amortized analysis is called amortized time complexity.

How does amortized analysis analyze the time complexity of a piece of code? Let’s continue with the data insertion example above.

In our data-insertion example, each O(n) insertion is followed by n-1 O(1) insertions, and this pattern repeats. So we can spread the cost of the O(n) operation evenly over the following n-1 O(1) operations: split the O(n) cost into n parts, give one part to each of the following n-1 O(1) operations, and keep one part for itself. After amortization, each of the n-1 O(1) operations costs O(2), which still simplifies to O(1), and the original O(n) operation also becomes O(1). By the addition rule for time complexity, the amortized time complexity of this group of consecutive operations is O(1). That is the general idea of amortized analysis.
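
To make this concrete, here is a small instrumented sketch (my own illustration, not from the original article) that counts the total number of unit operations over a long sequence of insert() calls; the average per insert stays a small constant, i.e. O(1) amortized:

```java
public class AmortizedInsertDemo {
    static int n = 8;
    static int[] array = new int[n];
    static int count = 0;
    static long operations = 0; // total "unit" operations performed

    static void insert(int val) {
        if (count == array.length) {
            int sum = 0;
            for (int i = 0; i < array.length; ++i) {
                sum = sum + array[i];
                operations++;      // each addition in the O(n) summing pass counts as one operation
            }
            array[0] = sum;
            count = 1;
        }
        array[count] = val;
        operations++;              // the O(1) insertion itself counts as one operation
        ++count;
    }

    public static void main(String[] args) {
        int inserts = 1_000_000;
        for (int i = 0; i < inserts; i++) {
            insert(i);
        }
        // Prints a small constant (about 2), i.e. O(1) amortized per insert.
        System.out.println("average operations per insert: " + (double) operations / inserts);
    }
}
```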

The application scenarios of amortized time complexity and amortized analysis are quite special, and we will not need them in most cases. Only in scenarios like the following do we consider using amortized analysis to obtain an algorithm's amortized time complexity:

When a sequence of consecutive operations on a data structure has very low time complexity in most cases, with only a few expensive operations, and the cheap and expensive operations follow a coherent, periodic pattern, we can analyze the whole sequence of operations together and see whether the cost of the expensive operations can be amortized over the cheaper ones. Moreover, in scenarios where amortized analysis applies, the amortized time complexity generally equals the best-case time complexity.

In fact, amortized time complexity is a special kind of average-case time complexity, and we do not need to spend much effort distinguishing the two. The main reason for introducing amortized time complexity here is its corresponding method, amortized analysis; when we encounter a scenario that fits, using amortized analysis can speed up our work.

Common time complexities

By now, our treatment of algorithmic time complexity analysis is fairly comprehensive. Here we introduce the forms of time complexity we commonly meet when coding, and how they compare with one another, so that for a given algorithm we not only have a basic intuition about its complexity but also know whether that complexity is good or bad.

Although code varies widely, there are not many common complexity levels, so let us briefly summarize them. The complexities listed below are ordered from smallest to largest order of magnitude:

  • Constant order O(1)

  • Logarithmic order O(logn)

  • Linear order O(n)

  • Linear-logarithmic order O(nlogn)

  • Square order O(n^2), cubic order O(n^3) … k-th power order O(n^k)

    Note that k here must be a constant, independent of the input data

  • Exponential order O(2^n)

  • Factorial order O(n!)

The complexity levels listed above can be roughly divided into two categories: polynomial and non-polynomial. There are only two non-polynomial orders: O(2^n) and O(n!). Algorithmic problems whose time complexity is non-polynomial are commonly referred to as NP (non-deterministic polynomial) problems.

As the data size n grows, the execution time of a non-polynomial algorithm increases dramatically, so such algorithms are very inefficient; when we run into code with this kind of complexity, we usually try to find a way to optimize it away. Therefore, we will not discuss non-polynomial time complexity further here, and will focus on several common polynomial time complexities.

O(1)

The first thing to understand is that O(1) is just a way of denoting constant time complexity; it does not mean the code has only one line. For example:

int i = 8;
int j = 6;
int sum = i + j;

Even though there are three lines, the time complexity is O(1), not O(3).

In fact, as long as the execution time of a piece of code does not grow with n, its time complexity is denoted O(1). Generally speaking, as long as an algorithm contains no loop or recursive statements, its time complexity is O(1) even if it has thousands of lines of code. Likewise, a loop that runs a fixed number of times, independent of n, is also O(1).
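
For example, a loop whose iteration count is a fixed constant (a minimal sketch of my own) is still O(1):

```java
// Always runs exactly 100 iterations, regardless of any input size n, so it is O(1).
int fixedLoop() {
    int sum = 0;
    for (int i = 0; i < 100; i++) {
        sum += i;
    }
    return sum;
}
```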

O(logn), O(nlogn)

Logarithmic time complexity is very common, and it is also among the most difficult to analyze. Let's look at the following code:

int i = 1;
while (i <= n)  {
    i = i * 2;
}

According to the complexity analysis method discussed earlier, the statement executed the most times is the one inside the loop, i = i * 2. If we can work out how many times it is executed, we know the time complexity of the whole piece of code. As you can see from the code, the value of variable i starts at 1 and is multiplied by 2 on each pass through the loop; when it becomes greater than n, the loop ends. Remember the geometric sequences we learned in high school? The successive values of i form a geometric sequence. If we list them one by one, they look like this:
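
2^0, 2^1, 2^2, ..., 2^k, ..., 2^x = n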

So we just need to know the value of x to know how many times this statement is executed. From 2^x = n we get x = log2n, so the time complexity of this code is O(log2n).

Now, let’s modify the above code slightly:

int i = 1;
while (i <= n)  {
    i = i * 3;
}

Based on the same reasoning, it is easy to see that the time complexity of this piece of code is O(log3n). We also know that logarithms with different bases can be converted into one another: log3n = log3(2) × log2n, where log3(2) is a constant. Based on our earlier rule that coefficients can be ignored in big O notation, O(log2n) is equal to O(log3n).

Therefore, when expressing logarithmic time complexity, we ignore the base of the logarithm and uniformly write it as O(logn).
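
As for the O(nlogn) mentioned in the heading: if a piece of O(logn) code is executed n times, the overall complexity is O(nlogn). Here is a minimal sketch (my own example, not from the original article):

```java
// The O(logn) doubling loop from above, executed n times, gives O(nlogn) overall.
int nLogN(int n) {
    int count = 0;
    for (int j = 0; j < n; j++) {   // outer loop: n iterations
        int i = 1;
        while (i <= n) {            // inner loop: about log2(n) iterations
            i = i * 2;
            count++;
        }
    }
    return count;                   // roughly n * log2(n) increments in total
}
```

Typical O(nlogn) algorithms include merge sort and quicksort.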

O(m+n), O(m*n)

Sometimes the complexity of a piece of code is determined by the sizes of two data sets. Look at the following code:

int cal(int m, int n) {
    int sum_1 = 0;
    int i = 1;
    for (; i < m; ++i) {
        sum_1 = sum_1 + i;
    }

    int sum_2 = 0;
    int j = 1;
    for (; j < n; ++j) {
        sum_2 = sum_2 + j;
    }

    return sum_1 + sum_2;
}

As you can see from the code, m and n represent two different data sizes. We cannot know in advance which of m and n is larger, so when writing down the complexity we cannot simply apply the addition rule and omit one of them. Therefore, the time complexity of the code above is O(m+n).

In this case, the original addition rule no longer applies directly, and we need a generalized addition rule: T1(m) + T2(n) = O(f(m) + g(n)). The multiplication rule, however, continues to hold: T1(m) × T2(n) = O(f(m) × g(n)).
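
For the O(m*n) case in the heading, a minimal sketch (my own example) would be a nested loop over the two independent sizes:

```java
// The inner statement executes m * n times, so the complexity is O(m*n).
int calProduct(int m, int n) {
    int sum = 0;
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            sum = sum + i * j;
        }
    }
    return sum;
}
```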

Conclusion

In the last article, we introduced the basic method of asymptotic time complexity analysis, including focusing on the dominant term, the addition rule, and the multiplication rule, and saw how data size affects an algorithm's execution efficiency. In this article, we have additionally introduced the time complexities that arise in different circumstances: best-case time complexity, worst-case time complexity, average-case time complexity, and amortized time complexity, along with the common orders of complexity we meet in practice. With these tools, we should be able to perform complexity analysis on most of the code we encounter.

I am well aware that my technical level and ability to express myself are limited, and there are bound to be deficiencies and mistakes in this article. You are welcome to contact me ([email protected]) to discuss them and help me correct them. Thank you for reading.