An Overview of Data Structures and Algorithms

What is a data structure? What are data structures for? What is an algorithm? What does an algorithm do?

This article is a brief introduction to some of the terminology and basic concepts, and serves as a guide to the later articles: building a basic understanding of data structures and algorithms paves the way for the later LeetCode practice and for a deeper study of data structures.

10 common data structures: array, linked list, stack, queue, hash table, binary tree, heap, skip list, graph, Trie tree. 10 common algorithms: recursion, sorting, binary search, search, hash algorithm, greedy algorithm, divide-and-conquer algorithm, backtracking algorithm, dynamic programming, string matching algorithm.

Basic concepts and terminology

Data (data)

  • Data is the result of facts or observations, the logical summary of objective things, and the raw material used to represent them.
  • In computer science, data is the general name for all symbols that can be input into a computer and processed by a program.

Data item

  • A data item is the smallest indivisible unit that makes up a data element; a data element can be composed of several data items.
  • Data items are the most basic, inseparable units of a data record, and the smallest identifiable units that carry independent meaning.

Data Elements

  • A data element is the basic unit of data; data elements are also called nodes or records.
  • A data element can be composed of several data items.

Data Objects

  • A data object is a collection of data elements of the same nature, and is a subset of data.
  • A data object is hereinafter referred to simply as data.

Data elements of the same nature have the same number and type of data items.

Data Structure

  • A data structure is a collection of data elements that have one or more specific relationships with each other; the relationships between these data elements are called the structure.

Data structures

In computer science, a data structure is a way of storing and organizing data in a computer.

Simply put, data structure is a combination of data elements that have one or more specific relationships with each other. Data structure = data elements + relationships between elements.

Data structure is generally divided into two dimensions: logical structure and storage structure, and logical structure can be roughly divided into linear structure and nonlinear structure. The logical structure and storage structure will be discussed below.

Common data structures:

  • Stack
  • Queue
  • Array
  • Linked List
  • Tree
  • Graph
  • Heap
  • Hash Table

Basic logical structure

Collection structure: a finite collection of data elements. There is no relationship between the data elements other than belonging to the same collection.

Linear structure: An ordered collection of data elements. A one-to-one relationship is formed between data elements.

Tree structure: A tree is a hierarchical data structure in which there is a one-to-many relationship between data elements.

Graph structure: The relationship between data elements in a graph is many-to-many.

Algorithms

In mathematics and computer science, an algorithm is any well-defined sequence of specific computational steps, commonly used in calculation, data processing, and automated reasoning. An algorithm is an effective method for computing a function: it consists of a set of clearly defined instructions that can be expressed within a finite amount of time and space.

The basic characteristics of the algorithm

The basic characteristics of an algorithm: it is a set of rules that strictly defines an order of operations; each rule is effective and unambiguous, and the sequence terminates after a finite number of steps. Its features include:

  1. Input: An algorithm must have zero or more inputs.
  2. Output: An algorithm should have one or more outputs, which are the results of the algorithm’s calculation.
  3. Clarity: The description of the algorithm must be unambiguous so that its actual execution result exactly matches the requirements or expectations; the actual execution result is usually required to be deterministic.
  4. Finiteness: According to Turing’s definition, an algorithm is a sequence of operations that can be simulated by any Turing complete system, and a Turing machine has only a finite number of states, a finite number of input symbols, and a finite number of transfer functions (instructions). Some definitions stipulate that the algorithm must complete the task in a finite number of steps.
  5. Effectiveness: Also known as feasibility. That is, every operation described in the algorithm can be implemented by performing a basic operation a finite number of times.

Common Design patterns

  • Complete and incomplete traversal: when the solution space of a problem is finite and discrete, and correctness and optimality can be verified, the simplest algorithm is to traverse the entire solution space and test the elements one by one. This is the most straightforward approach and tends to be the simplest to implement, but when the solution space is very large the computation can become unbearable. In that case incomplete traversal methods, such as various search and programming techniques, can be used to reduce the computation.
  • Divide and conquer: divide a problem into independent parts and solve them separately. One benefit of this approach is that it enables parallel computation.
  • Dynamic programming: a method often used when the global optimal solution of a problem is composed of local optimal solutions.
  • Greedy algorithm: a common approximation approach, usable when the global optimal solution is not (or cannot be proved to be) composed of local optimal solutions, and an approximately optimal answer is acceptable.
  • Linear programming (LP): optimization problems in which the objective function and the constraints are linear. Linear programming is an important field within optimization.
  • Reduction: solving a problem by transforming it, through logical or mathematical reasoning, into an equivalent or approximate, simpler model.

Common implementation methods

Recursion: in mathematics and computer science, recursion is the use of a function within its own definition. It is also commonly used to describe the process of repeating something in a self-similar way.

Iteration: iteration is the activity of repeating a feedback process, usually with the goal of approaching and reaching a desired goal or result. Each repetition of the process is called an "iteration", and the result of each iteration is used as the initial value of the next.

Sequential computation: the serialized, single-threaded execution of an algorithm formalized in a programming language. Parallel computing generally refers to a computing model in which many instructions can be executed simultaneously: the computation is broken into small parts that are then solved concurrently.

Commonly used algorithms: recursion, sorting, binary search, search, hash algorithm, greedy algorithm, divide-and-conquer algorithm, backtracking algorithm, dynamic programming, string matching algorithm and so on.

Data structures

Data structures are generally divided into two dimensions: logical structure and storage structure. Here are some common knowledge points based on these two dimensions.

Logical structure

Logical structure is the relationship between data. Logical structure can be divided into two kinds: linear structure and nonlinear structure.

Nonlinear structure can be roughly divided into graph structure, tree structure and set structure corresponding to the above classification.

Linear structure

Linear structure: a collection of ordered data elements in which the relationship between data elements is one-to-one; that is, every element except the first and the last has exactly one direct predecessor and one direct successor. Common linear structures are: arrays, stacks, queues, linked lists, linear lists, and so on.

Nonlinear structure

Non-linear structure: Individual data elements no longer remain in a linear sequence, and each data element may be related to zero or more other data elements. Common nonlinear structures: binary trees, graphs, trees, sets, etc.

Storage structure

Logical structure refers to the relationship between data elements, and storage structure is the implementation of logical structure in computer language. Common storage structures are: sequential storage, chain storage, index storage, hash storage.

Sequential storage and chained storage are the most common.

Sequential storage structure

Sequential storage structure: data elements are stored at contiguous addresses, so the logical and physical relationships between the data are consistent. Arrays are an example; the storage layout is roughly as shown in the figure below:

Chain storage structure

Chained storage: data elements are stored in an arbitrary set of storage units, contiguous or not. The physical storage of the elements does not reflect their logical relationships; instead, pointers store the addresses of related data elements, through which the associated elements can be found.

More commonly used data structures will be described in a little more detail below.

Algorithms

Before studying algorithms and putting them into practice, be very clear about the two metrics used to measure an algorithm: time complexity and space complexity.

Big O complexity notation

Take a code sample to see what big O is. The code is as follows:

  function sum (n) {
    let sum = 0;
    for (let i = 0; i < n; i++) {
      sum += i;
    }
    return sum
  }

Case 1

This is a simple piece of JS code; its function is not the point. Assume each line of code takes the same time, unit_time, and analyze the execution time from the code's perspective. Line 2 takes one unit_time; the for loop body (lines 3 and 4) executes 2n times unit_time; and line 6 takes one more unit_time. So the total execution time is (2n + 2) * unit_time. From this example we can see that the total execution time T(n) is proportional to the number of times each line of code is executed.

Let’s look at big O, the formula is T(n) = O(f(n)), and let’s explain some key things about big O.

  • T(n): the total time the code takes to execute.
  • n: the size of the data.
  • f(n): the total number of times each line of code is executed, expressed as a formula in n.
  • O: indicates that the execution time T(n) is proportional to f(n).

The above example in terms of big O is T(n) = O(2n + 2), which is the big O time complexity notation.

In fact, big O time complexity does not represent the real execution time of the code, but the trend of the execution time as the data size grows; it is therefore also called asymptotic time complexity.

Time complexity analysis

In the example above we obtained the formula T(n) = O(2n + 2). But when we look at the time complexity of real algorithms we never see the O(2n + 2) format, only formats such as O(n) and O(n²), because big O notation only expresses a trend. So how do we analyze the time complexity of a piece of code?

Focus only on the piece of code that loops the most times

In the example above we got the big O formula T(n) = O(2n + 2), but this is normally written O(n), because constants, low-order terms, and coefficients are ignored and only the highest order is kept. When analyzing the time complexity of an algorithm or a piece of code, we therefore focus only on the piece of code that loops the most times.

Rule of addition: The total complexity is equal to the complexity of the largest piece of code

To analyze the addition rule, use the following code:

  function sum (n) {
    let sum1 = 0;
    for (let i = 0; i < 100; i++) {
      sum1 += i;
    }

    let sum2 = 0;
    for (let i = 0; i < n; i++) {
      sum2 += i;
    }

    let sum3 = 0;
    for (let i = 0; i <= n; i++) {
      for (let j = 0; j <= n; j++) {
        sum3 = sum3 + i * j;
      }
    }
    return sum1 + sum2 + sum3;
  }

Case 2

The code above can be divided into three segments, computing sum1, sum2, and sum3. We can calculate the time complexity of each segment separately and then take the largest of the three as the complexity of the whole piece of code. Let's analyze the three segments directly:

What about the first segment? It executes 100 times, but from the rules above we know that constant execution time is independent of n. Even if the loop executed 1,000 or 10,000 times, as long as the count is a known constant it does not matter what n is; it is still constant execution time. So we can ignore this segment: its complexity is O(1).

The second segment is T(n) = O(n), and the third segment is T(n) = O(n²).

So how do we calculate the total time complexity of the whole piece of code? By the rule above we take the largest order, so the time complexity of the whole code is O(n²). In other words, the total time complexity equals the time complexity of the code with the highest magnitude. Abstracted into a formula:

If T1(n) = O(f(n)) and T2(n) = O(g(n)), then T(n) = T1(n) + T2(n) = max(O(f(n)), O(g(n))) = O(max(f(n), g(n))).

Product rule: The complexity of nested code is equal to the product of the complexity of both inside and outside the nested code

The third segment of code in Case 2 serves as an example for analyzing the product rule. Let's derive it with an abstract formula first:

If T1(n) = O(f(n)) and T2(n) = O(g(n)) represent the outer and inner layers respectively, the total time complexity is T(n) = T1(n) * T2(n) = O(f(n)) * O(g(n)) = O(f(n) * g(n)). If we assume f(n) and g(n) are both n, we get T(n) = O(n²). Let's look at another example:

  function sum1 (n) {
    let sum11 = 0;
    for (let i = 0; i < n; i++) {
      sum11 += sum2(i);
    }
    return sum11;
  }

  function sum2 (n) {
    let sum22 = 0;
    for (let i = 0; i < n; i++) {
      sum22 += i;
    }
    return sum22;
  }


Let's first look at the time complexity of sum1. If we assume sum2 is just an ordinary O(1) operation, then sum1 is T1(n) = O(n). But in our code sum2 is itself a loop with time complexity T2(n) = O(n), so the overall time complexity is T(n) = T1(n) * T2(n) = O(n * n) = O(n²).

Space complexity analysis

Space complexity analysis is simpler than time complexity analysis. The full name of time complexity is asymptotic time complexity, which represents the growth relationship between an algorithm's execution time and the data size. By analogy, the full name of space complexity is asymptotic space complexity, which represents the growth relationship between an algorithm's storage space and the data size.

Here’s an example:

  function sum1 (n) {
    let sum11 = 0;
    for (let i = 0; i < n; i++) {
      sum11 += sum2(i);
    }
    return sum11;
  }

Line 3 declares a variable i, and the function receives a parameter n; each occupies constant space, and the rest of the code takes no additional space, so the space complexity of this code as written is O(1). It would become O(n) if, for example, the intermediate results were stored in an array of length n, as sketched below.
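A minimal sketch of the O(n) case, assuming we keep every intermediate value (the function name copyList is my own):

  // Hypothetical example: collecting n values into a new array.
  // The array `result` grows with n, so the extra space is O(n);
  // the loop variable i alone would only need O(1) space.
  function copyList (n) {
    let result = [];
    for (let i = 0; i < n; i++) {
      result.push(i);
    }
    return result;
  }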

Common space complexities are O(1), O(n), and O(n²); logarithmic complexities such as O(logn) and O(nlogn) are rarely seen for space.

Common time complexity case analysis

There are not many common complexity magnitudes; they are roughly as follows:

Polynomial magnitude

  • Constant order: O(1)
  • Logarithmic order: O(logn)
  • Linear order: O(n)
  • Linear-logarithmic order: O(nlogn)
  • Square order O(n²), cubic order O(n³), ..., k-th power order O(n^k)

Nonpolynomial magnitude

  • Exponential order: O(2^n)
  • Factorial order: O(n!)

The complexity magnitudes above can be roughly divided into polynomial and non-polynomial magnitudes. There are only two non-polynomial magnitudes: O(2^n) and O(n!).

The complexity level increases as the magnitude increases

Algorithm problems whose time complexity is of non-polynomial order are called NP (non-deterministic polynomial) problems. As the data size n grows, the execution time of a non-polynomial-order algorithm increases sharply, and the time to solve the problem grows without bound. Non-polynomial time complexity algorithms are therefore very inefficient.

O(1)

O(1) denotes constant time complexity in general, not code with a single line. The code below, for example, also has time complexity O(1), not O(3).

  let i = 0;
  let j = 0;
  let sum = i + j;

Summary: as long as the execution time of the code does not grow with n, its time complexity is denoted O(1). In general, even if an algorithm is thousands of lines long, its time complexity is O(1) as long as it contains no loop or recursive statements.

O(logn), O(nlogn)

Logarithmic time complexity is very common, and also the most difficult to analyze. Look at the following example:

  let i = 1;
  while (i <= n) {
    i = i * 2;
  }

Following the order in which the code executes, the variable i starts at 1 and is multiplied by 2 on each pass through the loop; when it exceeds n, the loop ends. The values taken by i form a geometric sequence, roughly as follows:

  2^0, 2^1, 2^2, ..., 2^k, ..., 2^x = n

Solving 2^x = n gives x = log2n, so the time complexity of this code is O(log2n). Whether the base is 2, 3, or 10 makes no difference in big O notation, because logarithms of different bases differ only by a constant factor and coefficients are ignored: O(Cf(n)) = O(f(n)). So we simply write O(logn).

O(nlogn)

O(nlogn) is a product: if a piece of code has time complexity O(logn) and we execute it n times, the time complexity is O(nlogn). O(nlogn) is a very common algorithmic time complexity; for example, merge sort and quicksort have time complexity O(nlogn).

O(m + n), O(m * n)

The complexity of a piece of code is determined by the size of two pieces of data. The code is as follows:

  function sum (m, n) {
    let sum1 = 0;
    for(let i = 0; i < m; i++) {
      sum1 += i;
    }

    let sum2 = 0;
    for(let i = 0; i < n; i++) {
      sum2 += i;
    }
    return sum1 + sum2;
  }

In this code we need to account for two loops, with complexities O(m) and O(n) respectively. Can we apply the addition rule above? No, because there is no way to know in advance whether m or n is larger, so neither can be ignored. The time complexity of this code is O(m + n).

The product rule, however, still applies: O(m) * O(n) = O(m * n).

Sorting

There are many sorting algorithms; the most common are bubble sort, insertion sort, selection sort, quicksort, merge sort, counting sort, bucket sort, radix sort, and so on.

  • Bubble sort

Loop over the data, comparing each pair of adjacent elements and swapping them if the current value is greater than the next. Repeat the passes until the order is correct.

  • Selection sort

First, find the smallest (or largest) element in the unsorted sequence and store it at the beginning of the sorted sequence. Then keep finding the smallest (or largest) of the remaining unsorted elements and appending it to the end of the sorted sequence, and so on, until all elements are sorted.

  • Insertion sort

It works by building an ordered sequence: for each unsorted item, scan the sorted sequence from back to front, find the appropriate position, and insert it.

  • Merge sort

This algorithm is a classic application of divide and conquer: sort each subsequence first, then merge the ordered subsequences to obtain a fully ordered sequence. Joining two ordered lists into one ordered list is called a 2-way merge.

  • Quick sort

Partition the records to be sorted into two independent parts in one pass, such that the keys of one part are all smaller than those of the other; then sort the two parts separately until the whole sequence is ordered.

Each algorithm has its own characteristics, and we choose among them by comparing three aspects: time complexity, whether the sort is stable, and whether it sorts in place, as summarized in the comparison figure below:
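As a concrete reference, here is a minimal bubble sort sketch in JavaScript following the description above (names are my own):

  // Bubble sort: compare adjacent elements and swap when the left one is
  // larger; each pass "bubbles" the largest remaining element to the end.
  // Time complexity O(n²); it sorts in place and is stable.
  function bubbleSort (arr) {
    for (let i = 0; i < arr.length - 1; i++) {
      for (let j = 0; j < arr.length - 1 - i; j++) {
        if (arr[j] > arr[j + 1]) {
          [arr[j], arr[j + 1]] = [arr[j + 1], arr[j]];
        }
      }
    }
    return arr;
  }

  console.log(bubbleSort([5, 1, 4, 2, 3])); // [1, 2, 3, 4, 5]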

Binary search

Binary search, also called half-interval search or logarithmic search, is a search algorithm that looks for a particular element in an ordered array.

Binary search works on an ordered set of data, and its idea is somewhat similar to divide and conquer: each time, compare with the middle element of the interval to cut the search interval in half, until the target element is found or the interval shrinks to zero.

The time complexity of binary search is O(logn).

Say the data size is n; each lookup shrinks the search interval by half, i.e., divides it by 2. In the worst case, it does not stop until the search interval shrinks to empty.

The interval sizes form a geometric sequence. When n/(2^k) = 1, k is the total number of halvings. Each halving involves only one comparison with the middle element, so after k halving operations the time complexity is O(k). Solving n/(2^k) = 1 gives k = log2n, so the time complexity is O(logn).
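A minimal binary search sketch matching this analysis (assuming a sorted array; the function name is my own):

  // Binary search: compare with the middle element and halve the
  // search interval each time. Time complexity O(logn).
  function binarySearch (arr, target) {
    let low = 0;
    let high = arr.length - 1;
    while (low <= high) {
      const mid = low + Math.floor((high - low) / 2);
      if (arr[mid] === target) return mid;   // found
      if (arr[mid] < target) low = mid + 1;  // discard the left half
      else high = mid - 1;                   // discard the right half
    }
    return -1; // interval shrank to empty: not found
  }

  console.log(binarySearch([1, 3, 5, 7, 9], 7)); // 3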

The hash algorithm

A hash algorithm is what implements hashing, and its definition and principle are simple: a rule that maps a binary value string of arbitrary length to a binary value string of fixed length is a hash algorithm, and the binary string obtained by mapping the raw data is the hash value.

A good hash algorithm probably has the following characteristics:

  • The original data cannot be derived backwards from the hash value (hence hash algorithms are also called one-way hash algorithms);
  • It is very sensitive to the input data: even if the original data changes by only one bit, the resulting hash value is very different;
  • The probability of hash collisions is very small: for different original data, the probability of producing the same hash value is tiny;
  • The hash algorithm should be as efficient as possible: even for long text, the hash value can be computed quickly.

Hashing algorithms can be used in many ways, such as:

  • Secure encryption: hash algorithms such as MD5 and SHA (a small sketch follows this list);
  • Unique identification: e.g. the instant-upload ("second transfer") feature of Baidu Cloud;
  • Data verification: e.g. checking large file uploads.
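A small sketch, assuming a Node.js environment, whose built-in crypto module provides hash algorithms such as MD5 and SHA-256:

  // Map input of arbitrary length to a fixed-length hash value.
  const crypto = require('crypto');
  const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

  console.log(md5('hello world'));  // always 32 hex characters
  // Changing the input slightly produces a completely different hash:
  console.log(md5('hello world!'));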

Breadth-first search

Breadth First Search, usually abbreviated BFS, is a "carpet" strategy: it starts with the vertices closest to the starting vertex and works outward layer by layer. The general search process is shown in the figure below:

Depth-first search

Depth First Search, or DFS for short. Unlike in BFS, nodes visited earlier are not necessarily closer to the root node, so the first path found by DFS may not be the shortest path.

In DFS, nodes are processed in exactly the opposite order from how they were added: last in, first out. So depth-first search is generally implemented using a stack.
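A minimal sketch of both traversals on an adjacency list (the graph shape and names are my own): BFS uses a queue, DFS uses a stack.

  const graph = { A: ['B', 'C'], B: ['D'], C: ['D'], D: [] };

  // BFS: first in, first out -- closer nodes are processed first.
  function bfs (start) {
    const visited = new Set([start]);
    const queue = [start];
    while (queue.length > 0) {
      const node = queue.shift();   // dequeue from the head
      console.log(node);
      for (const next of graph[node]) {
        if (!visited.has(next)) {
          visited.add(next);
          queue.push(next);         // enqueue at the tail
        }
      }
    }
  }

  // DFS: last in, first out -- one path is explored deeply first.
  function dfs (start) {
    const visited = new Set();
    const stack = [start];
    while (stack.length > 0) {
      const node = stack.pop();     // pop from the top
      if (visited.has(node)) continue;
      visited.add(node);
      console.log(node);
      stack.push(...graph[node]);
    }
  }

  bfs('A'); // A B C D
  dfs('A'); // A C D B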

String matching

The formal definition of the string matching problem:

  • Text: an array T[1...n] of length n;
  • Pattern: an array P[1...m] of length m, with m <= n;
  • The elements of T and P are drawn from a finite alphabet Σ;
  • If 0 <= s <= n-m and T[s+1...s+m] = P[1...m] (that is, T[s+j] = P[j] for 1 <= j <= m), then pattern P occurs in text T with shift s, and s is called a valid shift.

For example, in the figure above the goal is to find all occurrences of the pattern P = abaa in the text T = abcabaabcabac. The pattern occurs exactly once in this text, at shift s = 3, so s = 3 is a valid shift.

The simple string matching problem is to search for all occurrences of a certain string P in a large string T.

There are many string matching algorithms: the Naive (Brute Force) algorithm, the Rabin-Karp algorithm, the Finite Automaton algorithm, the Knuth-Morris-Pratt (KMP) algorithm, the Boyer-Moore algorithm, the Simon algorithm, the Colussi algorithm, the Galil-Giancarlo algorithm, the Apostolico-Crochemore algorithm, the Horspool algorithm, the Sunday algorithm, and so on.

Two of them are relatively simple and easy to understand: the BF algorithm and the RK algorithm. Two are harder to understand but more efficient: the BM algorithm and the KMP algorithm.

BF is short for Brute Force, so it is also called the brute-force matching algorithm. Its execution process is roughly as follows:
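A minimal BF sketch (the function name is my own), which tries every shift s and compares character by character, O(n*m) in the worst case:

  function bfMatch (T, P) {
    const n = T.length;
    const m = P.length;
    for (let s = 0; s + m <= n; s++) {          // every candidate shift
      let j = 0;
      while (j < m && T[s + j] === P[j]) j++;   // compare under this shift
      if (j === m) return s;                    // s is a valid shift
    }
    return -1;                                  // no occurrence
  }

  console.log(bfMatch('abcabaabcabac', 'abaa')); // 3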

Greedy algorithm

The greedy algorithm, divide-and-conquer algorithm, backtracking algorithm, and dynamic programming are four basic "algorithms" that are more accurately called algorithmic ideas rather than concrete algorithms; they commonly guide how we design algorithms and write code.

A classic question about the greedy algorithm is how to use it to implement data compression and coding, effectively saving storage space.

The greedy algorithm has many classic applications, such as Huffman coding, Prim's and Kruskal's minimum spanning tree algorithms, and Dijkstra's single-source shortest path algorithm.

Divide and conquer algorithm

The core idea of divide and conquer is exactly that: divide and conquer. Split the original problem into n smaller subproblems similar in structure to the original, solve the subproblems recursively, and then merge the results to obtain the solution of the original problem.
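Merge sort is a compact illustration of this idea; a minimal sketch (names are my own):

  // Divide: split into halves. Conquer: sort each half recursively.
  // Merge: combine the two sorted halves. Time complexity O(nlogn).
  function mergeSort (arr) {
    if (arr.length <= 1) return arr;            // base case
    const mid = Math.floor(arr.length / 2);
    const left = mergeSort(arr.slice(0, mid));
    const right = mergeSort(arr.slice(mid));
    const merged = [];
    let i = 0, j = 0;
    while (i < left.length && j < right.length) {
      merged.push(left[i] <= right[j] ? left[i++] : right[j++]);
    }
    return merged.concat(left.slice(i), right.slice(j));
  }

  console.log(mergeSort([5, 2, 4, 6, 1, 3])); // [1, 2, 3, 4, 5, 6]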

Backtracking algorithm

In fact, the backtracking algorithm is a search process similar to enumeration: it looks for solutions to the problem during the search, and when it finds that the current path cannot satisfy the solution conditions, it backtracks and tries another path.

Backtracking is a kind of optimum-seeking search method: it searches forward according to optimality conditions in order to reach the target.

But when exploration reaches a certain step and finds that the earlier choice was not good or cannot reach the target, it steps back and makes a new choice. This technique of going back and trying again when a path fails is called backtracking, and a state point that satisfies the backtracking condition is called a "backtracking point".

Backtracking algorithms are suitable for problems that consist of multiple steps, and each step has multiple options.

  • Paths in a binary tree that sum to a given value
  • Permutations of a string
  • Combinations of numbers whose sum is n
  • Paths in a matrix
  • The robot's range of motion
  • The N-Queens problem

Dynamic programming

Dynamic programming is practical for optimization problems, such as finding a maximum or minimum. It can significantly reduce time complexity and improve execution efficiency.

Problems suited to dynamic programming must satisfy optimal substructure and have no after-effects. The dynamic programming solution process is to find the state transition equation and solve bottom-up. Solving bottom-up avoids a lot of repeated computation: for the Fibonacci sequence, for example, naive recursion grows exponentially, whereas dynamic programming keeps the algorithm at O(n), as sketched below.
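A minimal bottom-up Fibonacci sketch (the state transition is f(i) = f(i-1) + f(i-2)):

  function fib (n) {
    if (n < 2) return n;
    let prev = 0, curr = 1;               // f(0), f(1)
    for (let i = 2; i <= n; i++) {
      [prev, curr] = [curr, prev + curr]; // each subproblem solved once
    }
    return curr;                          // O(n) time instead of exponential
  }

  console.log(fib(10)); // 55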

Path problems

  • Minimum Path Sum
  • Unique Paths
  • Unique Paths II
  • Shortest Way to Form a String

The linear list

Data with one-to-one relationships stored "linearly" in physical space is called a linear storage structure (linear list for short).

Linear lists include arrays, stacks, queues, linked lists, and so on, all of which are linear structures. Let's take a quick look at four of the more common structures.

Data storage in a linear list can be subdivided into two types:

  • Storing data sequentially in contiguous physical space is called a sequential storage structure (sequential list).

  • Data scattered across physical space, with the logical relationships between items maintained by links, is called a linked storage structure (linked list).

Terminology

In a data structure, each individual in a set of data is called a data element (element for short). The elements in a linear list have one-to-one logical relationships. What do we call the relationships between data elements?

  • The adjacent element to the left of an element is called its direct predecessor; all elements to its left are collectively called its predecessors.
  • The adjacent element to the right of an element is called its direct successor; all elements to its right are collectively called its successors.

The sequential list

A sequential list places requirements on the physical storage of the data: to store data in a sequential list, a physical space of sufficient size is requested in advance and the data is stored sequentially, with no gaps between data elements.

For example, if a sequential list is used to store the collection {1,2,3,4,5}, the final storage state of the data looks like this:

Therefore we can conclude: the storage structure in which data with "one-to-one" logical relationships is stored sequentially in one contiguous physical space is the sequential storage structure.

Arrays

Most programming languages have a basic Array data type. An array is a linear-list data structure that uses a contiguous block of memory to store a set of data of the same type. To maintain the continuity of the data in memory, insert and delete operations are inefficient: both must move the subsequent array elements, so the average time complexity of these two operations is O(n).
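A small illustration of that shifting cost, using Array.prototype.splice:

  // Inserting into the middle shifts every later element right;
  // deleting shifts them back left. Both are O(n) on average.
  const arr = [1, 2, 4, 5];
  arr.splice(2, 0, 3);   // insert 3 at index 2: 4 and 5 move right
  console.log(arr);      // [1, 2, 3, 4, 5]
  arr.splice(2, 1);      // delete index 2: 4 and 5 move left again
  console.log(arr);      // [1, 2, 4, 5]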

Stacks and queues

Stacks and queues are common data structures that we often see in code. Both restrict the order of access: the stack is first in, last out, and the queue is first in, first out. As shown below:

The queue

A queue is a restricted sequence: you can operate only on the head and tail of the queue, adding elements only at the tail and removing them only at the head.

As one of the most common data structures, the queue has a very wide range of applications, such as message queues. We can implement simple queue operations using an array in JavaScript:

  // Declare an array to emulate the queue
  var queue = [];
  // Enqueue: add two elements at the tail
  queue.push(1);
  queue.push(2);
  // Dequeue: remove an element from the head
  let dequeue = queue.shift();
  console.log(dequeue, queue);

In computer science, a queue is a special abstract data type or collection. The entities in the collection are kept sequentially.

There are two basic queue operations:

  • Enqueue: Adds entities to the back-end position of the queue.
  • Dequeue: Removes entities from the front of the queue.

The FIFO (first in, first out) behavior of the elements in the queue is shown below:


Take HTTP 1.0, HTTP 1.1, and HTTP 2.x as examples.

One thing that comes up a lot in front-end performance tuning is the "HTTP 1.1 head-of-line blocking problem".

HTTP 1.0 has a head-of-line blocking problem on the client side; HTTP 1.1 has one on the server side; HTTP 2.x resolves it, as explained below.

In fact, "head-of-line blocking" is a general term, not specific to HTTP: switches and other systems have the same problem, and its root cause is the use of the queue data structure.

In HTTP 1.0, each request required a TCP connection to be established and then immediately closed. On the same TCP connection, all HTTP 1.0 requests are queued: the next request cannot be sent until the response to the previous one is received. This blocking occurs mainly on the client side.

In HTTP 1.1, multiple concurrent requests require multiple TCP connections, and browsers limit a single domain to 6-8 concurrent TCP connections.

In HTTP 1.1, each connection is a persistent connection by default.

Multiple HTTP 1.1 requests can be sent at once over the same TCP connection, without waiting for the previous response before sending the next request. This eliminates HTTP 1.0's client-side head-of-line blocking and is the idea behind pipelining in HTTP 1.1.

However, HTTP 1.1 specifies that the server must send responses in the order the requests were received: the response to the first-received request is sent first. The problem is that if the first-received request takes a long time to process and its response is generated slowly, it blocks the delivery of responses that have already been generated, again causing blocking at the head of the queue. So HTTP 1.1's head-of-line blocking occurs on the server side.

To solve HTTP 1.1's server-side head-of-line blocking, HTTP 2.x uses binary framing and multiplexing.

Binary frames are the smallest unit of data communication in HTTP 2.x; in HTTP 1.1, packets are in text format. Frames split the request and response data into smaller pieces, and a binary protocol can be parsed more efficiently.

Frames in HTTP 2.x include header frames and data frames.

In HTTP 2.x, all communication with a domain name is done on a single connection that can host any number of two-way data streams.

Each data stream is sent in the form of a message, which in turn consists of one or more frames.

Multiple frames can be sent out of order and reassembled according to the stream identifier at the beginning of the frame.

Multiplexing, used to replace the original sequence and blocking mechanism.

In HTTP/2, all communication under the same domain is carried out over a single TCP connection, on which requests and responses can proceed in parallel without interfering with each other.

For more information on multiplexing, see my other article HTTP 2.x

The stack

A stack is also a restricted sequence: it can operate only on the top of the stack, for both push and pop.

In computer science, a stack is an abstract data type used to represent a collection of elements with two main operations:

  • Push: adds an element to the top of the stack.
  • Pop: removes the element at the top of the stack.

The above two operations can be summarized as LIFO(last in, first out).

In addition, there is usually a peek operation for accessing the current top element of the stack (it returns the element without popping it).

The name “stack” is analogous to a stack of objects (a stack of books, a stack of dishes, etc.).

Stack operations can be mimicked by array types in JS as follows:

  // Declare an array to emulate the stack
  var stack = [];
  // Push two elements onto the stack
  stack.push(1);
  stack.push(2);
  // Pop an element off the stack
  let stackPop = stack.pop();
  console.log(stackPop, stack);

Stack push and pop operations:


Stacks have applications in many places. For example, the function call stack we know from browsers and JavaScript engines is, from a data structure point of view, a basic stack. That also explains why a recursive solution is essentially equivalent to a loop + stack solution.

Common applications of the stack include base conversion, bracket matching, stack permutations, infix expressions (rarely used), postfix expressions (reverse Polish notation), and so on.

In fact, there is a one-to-one correspondence between valid stack permutations and valid bracket-matching expressions; that is, there are exactly as many valid expressions of n pairs of brackets as there are stack permutations of n elements. Those who are interested can look up the relevant material.
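As one concrete application, here is a minimal bracket-matching sketch using a stack (names are my own):

  // Push opening brackets; on a closing bracket, pop and check that it
  // matches the most recent opening bracket (last in, first out).
  function isValid (s) {
    const pairs = { ')': '(', ']': '[', '}': '{' };
    const stack = [];
    for (const ch of s) {
      if (ch === '(' || ch === '[' || ch === '{') {
        stack.push(ch);
      } else if (pairs[ch]) {
        if (stack.pop() !== pairs[ch]) return false;
      }
    }
    return stack.length === 0; // every bracket must have been matched
  }

  console.log(isValid('{[()]}')); // true
  console.log(isValid('([)]'));   // false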

Linked lists

Each node in a data structure corresponds to a storage unit, called a storage node, or node for short.

Nodes in a linked list

The storage of each data item in a linked list consists of the following two parts:

  • The region holding the data element itself is called the data field (denoted data).
  • The region holding the pointer to the direct successor element is called the pointer field (denoted next).

A linked list (also called a chained storage structure) is used to store data with one-to-one logical relationships. Unlike a sequential list, a linked list does not constrain the physical storage locations of the data: the elements stored in a linked list can sit at arbitrary physical locations.

For example, using a linked list to store the three elements {1,2,3} looks like this:

There are many kinds of linked list. The common types are the singly linked list, the doubly linked list, and the circular linked list; rough sketches of their physical storage follow.

Singly linked lists

Singly linked list: n nodes are linked together to form a linked list. Because each node contains only one pointer field, it is also called a linear linked list or singly linked list.
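A minimal singly linked list sketch (names are my own), showing why inserting after a known node is O(1) while the operation tables below list insertion at an arbitrary position as O(n):

  // Each node has a data field and a next pointer field.
  class ListNode {
    constructor (data) {
      this.data = data;
      this.next = null;
    }
  }

  // Insert after a known node: only pointers change, no elements move.
  function insertNext (node, data) {
    const newNode = new ListNode(data);
    newNode.next = node.next;
    node.next = newNode;
    return newNode;
  }

  const head = new ListNode(1);
  const second = insertNext(head, 2);
  insertNext(second, 3);
  // head -> 1 -> 2 -> 3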

Doubly linked list

A doubly linked list is a linear list that can be traversed in both the forward and backward directions. In a doubly linked list, each node contains two pointer fields besides the data field: one pointer points to its successor node and the other to its predecessor node. It usually takes the form of a circular linked list with a head node.

Circular linked list

The next pointer of the last node in a circular linked list is not null; instead it points back to the head of the list.

The characteristic of a circular linked list is that knowing the address of any one node is enough to find the addresses of all the other nodes.

Linked list operation time complexity

Singly linked lists

Operation | Time complexity | Description
append | O(n) | Append a node to the end of the list
search | O(n) | Find an element anywhere in the list
insert | O(n) | Insert a node at an arbitrary position
remove | O(n) | Delete a node at an arbitrary position
searchNext | O(1) | Find the successor of a node
insertNext | O(1) | Insert a node after a given node (as its successor)
removeNext | O(1) | Delete the node after a given node (its successor)

Doubly linked list

Operation | Time complexity | Description
search | O(n) | Find an element anywhere in the list
insert | O(n) | Insert a node at an arbitrary position
remove | O(n) | Delete a node at an arbitrary position
searchNext / searchPre | O(1) | Find the successor or predecessor of a node
insertNext / insertPre | O(1) | Insert a node as the successor or predecessor of a node
removeNext / removePre | O(1) | Delete the successor or predecessor of a node

Circular linked list

Operation | Time complexity | Description
search | O(n) | Find an element anywhere in the list
insert | O(n) | Insert a node at an arbitrary position
remove | O(n) | Delete a node at an arbitrary position
searchNext | O(1) | Find the successor of a node
insertNext | O(1) | Insert a node after a given node (as its successor)
removeNext | O(1) | Delete the node after a given node (its successor)

Comparison of sequential and linked lists

  • A sequential list's storage space can be allocated statically or dynamically; a linked list's storage space is allocated dynamically.
  • A sequential list supports random as well as sequential access; a linked list supports only sequential access.
  • Insert/delete on a sequential list moves nearly half of the elements on average; a linked list only modifies pointers and need not move elements. If inserts/deletes occur only at the two ends of the list, a circular linked list with a tail pointer is appropriate.
  • Storage density = storage occupied by node data / total storage occupied by the node structure. The storage density of a sequential list is 1; that of a linked list is < 1.

Summary: sequential lists are implemented with arrays; linked lists are implemented with pointers. In a linked list, node space is dynamically allocated, and according to the form of linking, linked lists are divided into singly linked lists, doubly linked lists, and circular linked lists.

Tree and binary tree

The tree is a very important nonlinear data structure in which the elements have an obvious hierarchical relationship. The most common tree is the binary tree.

A tree is a set of hierarchical relationships composed of n (n >= 1) finite nodes. It has the following characteristics: each node has zero or more child nodes; the node without a parent is called the root node; and every non-root node has exactly one parent node.

Will have"One to many"The data elements in the collection of relationships are as followsFigure a.In the form of storage, the whole storage shape in the logical structure, similar to the real life upside downThe tree figure BSo the storage structure is calledTree storage structure.

The nodes of the tree

Node: each data element stored in a tree structure is called a node. Root node ("root" for short): every non-empty tree has exactly one node called the root. Leaf node: a node with no children is called a leaf node (or leaf).

If a node has no parent, it is the root of the whole tree.

Subtrees and empty trees

Subtree: the root node of the whole tree in the figure above is node A. If you look only at the part consisting of nodes B, E, F, etc., it is also a tree, with node B as its root. Therefore, the tree formed by nodes B, E, and F is a subtree of the whole tree.

Note that a single node is also a tree, with itself as the root. In figure A, nodes K, L, F, and so on are each trees, and all of them are subtrees of the whole tree.

Once you know the concept of a subtree, you can also define a tree as consisting of a root node and several subtrees.

Empty tree: If the set itself is empty, the resultant tree is said to be empty. There are no nodes in an empty tree.

Addendum: in a tree structure, the subtrees rooted at the same node must not intersect one another. For example, in figure A, apart from the root node A, the remaining elements form three subtrees, rooted at B, C, and D respectively; these three subtrees share no nodes. If they did, the structure would be broken and it would not be a tree.

Node levels and tree depth

  • Level of a node: the root is on level 1, and every other node's level equals its parent's level plus 1.
  • Cousins: nodes whose parents are on the same level are cousins.
  • Depth of a tree: the maximum level of any node in the tree is called the tree's depth.

Tree traversal

  • Preorder traversal (DLR): if the binary tree is empty, do nothing; otherwise visit the root, then preorder-traverse the root's left subtree, then preorder-traverse the root's right subtree.
  • Inorder traversal (LDR): if the binary tree is empty, do nothing; otherwise inorder-traverse the left subtree, visit the root, then inorder-traverse the right subtree.
  • Postorder traversal (LRD): if the binary tree is empty, do nothing; otherwise postorder-traverse the left subtree, postorder-traverse the right subtree, then visit the root.

  • Preorder traversal: A -> B -> D -> H -> I -> E -> C -> F -> G -> J -> K
  • Inorder traversal: H -> D -> I -> B -> E -> A -> F -> C -> J -> G -> K
  • Postorder traversal: H -> I -> D -> E -> B -> F -> J -> K -> G -> C -> A
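A minimal recursive sketch of the three traversals, assuming nodes shaped { val, left, right } (names are my own):

  function preorder (node, out = []) {
    if (!node) return out;
    out.push(node.val);        // visit root first (D)
    preorder(node.left, out);  // then left subtree (L)
    preorder(node.right, out); // then right subtree (R)
    return out;
  }

  function inorder (node, out = []) {
    if (!node) return out;
    inorder(node.left, out);   // left (L)
    out.push(node.val);        // root (D)
    inorder(node.right, out);  // right (R)
    return out;
  }

  function postorder (node, out = []) {
    if (!node) return out;
    postorder(node.left, out);  // left (L)
    postorder(node.right, out); // right (R)
    out.push(node.val);         // root (D)
    return out;
  }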

Binary tree

A binary tree is a kind of tree, and the most common kind. What makes a tree a binary tree?

  • A non-empty binary tree has exactly one root node;
  • Each node has at most two subtrees, called the node's left subtree and right subtree.

Two special binary trees are common: the full binary tree and the complete binary tree.

Full binary tree

Full binary tree: if every node other than the leaf nodes has degree 2, the tree is called a full binary tree. Besides the properties of ordinary binary trees, a full binary tree has the following properties:

  1. The number of nodes on level i of a full binary tree is 2^(i-1).
  2. A full binary tree of depth k has exactly (2^k) - 1 nodes, of which 2^(k-1) are leaves.
  3. A full binary tree has no node of degree 1; every branch node has two subtrees of the same depth, and all leaf nodes are on the bottom level.
  4. The depth of a full binary tree with n nodes is log2(n+1).

Complete binary tree

Complete binary tree: if every level except the last is full, and the nodes on the last level are packed from left to right, the binary tree is called a complete binary tree.

For any complete binary tree, if the nodes are numbered level by level from left to right, then for any node i the following conclusions hold:

  1. When i > 1, the parent of node i is node ⌊i/2⌋ (when i = 1 it is the root and has no parent).
  2. If 2i > n (the total number of nodes), node i has no left child (it is a leaf); otherwise its left child is node 2i.
  3. If 2i + 1 > n, node i has no right child; otherwise its right child is node 2i + 1.
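These conclusions turn into simple index arithmetic when a complete binary tree is stored in an array with 1-based numbering; a small sketch:

  const parent = (i) => Math.floor(i / 2);                          // valid for i > 1
  const leftChild = (i, n) => (2 * i <= n ? 2 * i : null);          // null: no left child
  const rightChild = (i, n) => (2 * i + 1 <= n ? 2 * i + 1 : null); // null: no right child

  // With n = 6 nodes, node 3 has a left child 6 and no right child:
  console.log(parent(3), leftChild(3, 6), rightChild(3, 6)); // 1 6 null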

Binary tree traversal

Binary trees also have preorder, inorder, and postorder traversal, but interviews often ask about breadth-first and depth-first traversal.

Breadth-first search (BFS) is an algorithm that traverses or searches data structures, such as trees or graphs, and can also be used in more abstract scenarios.

Its characteristic is that nodes closer to the root are traversed earlier.

For example, we can use BFS to find the path from the start node to the target node, especially the shortest path.

In BFS, nodes are processed in exactly the order they were added to the queue, first in, first out, so breadth-first search is generally implemented using a queue.

  • Print a binary tree from top to bottom
  • randomly
  • Employee Importance
  • Number of Islands

Binary search tree

A binary search tree, also called a binary sort tree, is either an empty tree or a binary tree with the following properties:

  1. If the left subtree is not empty, the value of all nodes in the left subtree is less than the value of its root node.
  2. If the right subtree is not empty, the value of all nodes in the right subtree is greater than the value of its root node.
  3. The left and right subtrees are also binary sorting trees.
  4. There are no nodes with equal keys.
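A minimal sketch of search and insert that relies on these properties (names are my own; nodes are { val, left, right }):

  // Each comparison discards one subtree, so on a balanced tree
  // search takes O(logn).
  function bstSearch (node, target) {
    while (node) {
      if (target === node.val) return node;
      node = target < node.val ? node.left : node.right;
    }
    return null;
  }

  function bstInsert (node, val) {
    if (!node) return { val, left: null, right: null };
    if (val < node.val) node.left = bstInsert(node.left, val);
    else if (val > node.val) node.right = bstInsert(node.right, val);
    return node; // equal keys are not inserted (property 4)
  }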

Balanced binary trees

A balanced binary tree, also known as an AVL tree, is either an empty tree or a binary tree with the following properties:

  • Its left subtree and right subtree are both balanced binary trees.
  • The absolute value of the difference in depth between the left subtree and the right subtree is no more than 1.

The AVL tree was the first self-balancing binary search tree to be invented. In an AVL tree, the heights of any node's two child subtrees differ by at most 1, so it is also called a height-balanced tree. An AVL tree with n nodes has a maximum depth of about 1.44·log2n. Search, insertion, and deletion are O(logn) in both the average and worst cases. Insertions and deletions may require one or more tree rotations to rebalance the tree.

Red-black tree

A red-black tree is a balanced binary tree that guarantees O(logn) time complexity for basic dynamic-set operations even in the worst case. Red-black trees differ from (strictly) balanced binary trees as follows:

  1. A red-black tree gives up the pursuit of perfect balance for approximate balance. While its time complexity is not much worse than a balanced binary tree's, each insertion is guaranteed to need at most three rotations to restore balance, which makes it easier to implement.
  2. A balanced binary tree pursues absolute balance, which is hard to achieve because the number of rotations needed after inserting a new node is unpredictable.

Graphs

Compared with linear lists and trees, the graph is a more complex data structure. In a linear list there are only linear relationships between data elements; in a tree structure there are clear hierarchical relationships between data elements; but in a graph structure the relationships between nodes can be arbitrary, and any two data elements in a graph may be related.