Algorithm clearance manual: 01 Algorithms and data structures

This post was originally posted on my personal blog:”Programmer charging Station”

Article links:”Portal”

The data structure is the skeleton of the program, and the algorithm is the soul of the program.

Algorithm + Data Structure = Program is a very famous book written by Niklaus Emil Wirth, the father of Pascal. The title of the book became one of the most famous lines in computer science. It can be seen that for program design, the relationship between algorithm and data structure is inseparable.

So before we do that, we need to figure out what is an algorithm? What is a data structure? Why learn algorithms and data structures?

Simply put, an Algorithm is a method or process to solve a problem. If we think of a problem as a function, then an algorithm is the process of converting input into output. A Data Structure is a computer representation of Data and the corresponding set of operations. Programs are concrete implementations of algorithms and data structures.

If we think of “programming” as cooking, then “data structures” are ingredients and spices, and “algorithms” are different cooking methods, or recipes. Different ingredients and spices, different cooking styles, different permutations. The same thing, made by different people, the taste of nature is also very different.

Why learn algorithms and data structures?

Take cooking, for example. When we cook dishes, we emphasize the perfect combination of color, aroma and taste. The same is true of programming, where we seek to solve problems by choosing more appropriate “data structures” and using “algorithms” that take less time and space.

We learn algorithms and data structures in order to learn how to consider solutions in terms of time complexity and space complexity in programming, train our logical thinking, and write high-quality codes, so as to improve our programming skills and obtain higher work returns.

Of course, just like cooking, mastering the ingredients and spices, learning how to cook, does not mean that you will make a delicious stir-fry dish. And just because you know algorithms and data structures doesn’t mean you can write programs. It takes a lot of thinking and thinking, and a lot of learning, to be a good cook (programmer).

1. Data structure

Data Structure: A collection of Data elements with structural properties.

Simply put: “data structure” refers to the organizational structure of data, used to organize and store data.

In terms of expansion, data structure studies the logical structure and physical structure of data as well as the mutual relationship between them, and defines the corresponding operation of this structure, designs the corresponding algorithm, and ensures that the new structure obtained after these operations still maintains the original structure type.

The function of data structure is to improve the utilization of computer hardware. For example, the operating system wants to find out where the application “Microsoft Word” is stored on the hard disk. It would be inefficient to scan the entire hard disk, but if you use the “B+ tree” as an index, you can easily search for the Word Microsoft Word, and then quickly locate the file information of the application “Microsoft Word”. Find the disk location from the file information.

The purpose of learning data structure is to help us understand and master how the data in the computer is organized and stored.

For data structures, we can classify data according to its “logical structure” and “physical structure”.

1.1 Logical structure of data

Logical Structure: The relationships between data elements.

According to the different relationships between elements, the logical structure of data can be divided into the following four types:

1. Set structure

Collection structure: Data elements belong to the same collection and have no other relationship.

The data elements in the collection structure are unordered and each data element is unique; there are no identical data elements in the collection. A set structure is much like a “set” in the mathematical sense.

2. Linear structure

Linear structure: A one-to-one relationship between data elements.

A data element in a linear structure (except for the first and last elements) has only one neighbor to the left and one neighbor to the right. Linear structure types include arrays, linked lists, and stacks, queues, and hashes derived from them.

3. Tree structure

Tree structure: There is a one-to-many hierarchical relationship between data elements.

The simplest tree structure is a binary tree. This structure can be simply expressed as: root, left subtree, right subtree. The left and right subtrees have their own subtrees. Of course, in addition to binary tree, tree structure types also include: multi-fork tree, dictionary tree and so on.

4. Graphic structure

Graphical structure: Data elements are many-to-many relationships.

Graphic structure is a more complex nonlinear structure than tree structure, used to represent the relationship between objects. A graph consists of small dots (called vertices or nodes) and the lines or curves (called edges) that connect them.

In graph structure, any two nodes can be correlated, that is, the adjacency relation between nodes can be arbitrary. Graph structure types include: undirected graph, directed graph, connected graph and so on.

1.2 Physical Structure of data

Physical Structure: How the logical Structure of data is stored in a computer.

There are many kinds of storage structures in the computer, the most used are these two structures: sequential storage structure, chain storage structure.

1. Sequential storage structure

Sequential Storage Structure: Data elements are stored in a piece of Storage with consecutive addresses. The logical relationship between data elements is directly reflected by the Storage addresses of the data elements.

In a sequential storage structure, logically adjacent data elements must also be physically adjacent.

The advantage of this structure is that it is simple, easy to understand, and actually takes up the least storage space. Disadvantages: Need to occupy a piece of continuous address storage unit; And storage allocation should be carried out in advance; In addition, some operations are less time efficient (move, delete elements, etc.).

2. Chain storage structure

Linked Storage Structure: Storing data elements in arbitrary Storage units, which may be contiguous or discontiguous.

In chained storage, logically contiguous data elements may or may not be contiguous in physical addresses. Its representation at the physical address is random. In the chain storage structure, the combination of several elements occupied by each data element is generally called a chain node. Each link contains not only the data information of a data element, but also an address, called a pointer, that indicates the link at which the data element’s immediate successor in the logical relationship resides. In other words, the logical relationships between data elements are indirectly reflected through Pointers.

The advantages of this structure are: storage space does not need to be allocated in advance, when the need for storage space can be applied for temporarily, will not cause space waste; Some operations are far more time efficient than sequential storage structures (insert, move, delete elements). The disadvantage is that not only the data information of the data element itself takes up storage space, but also Pointers. The space overhead of chained storage structure is larger than that of sequential storage structure.

2. The algorithm

Algorithm: An accurate and complete description of the steps used to solve a particular problem, represented in a computer as a set of instructions. An Algorithm represents a strategic mechanism for describing a problem in a systematic way.

Simply put: “Algorithm” refers to a way to solve a problem.

To expand: an algorithm is a series of operational steps that express a general method for solving a certain kind of computing problem. For any input of this kind of method, it can be calculated step by step and finally produce an output. It is independent of any language and can be described in natural language, programming language (Python, C, C++, Java, etc.), as well as pseudo-code, flow chart representation.

Let me give you some examples of what an algorithm is.

Example 1:

Question: How do I get from Shanghai to Beijing?

Solution: We can fly, we can take the bullet train, we can take the coach. The time cost and money cost of different solutions are different. For example, taking a plane takes the least time but costs the most, while taking a long-distance bus costs less but takes longer. If we choose a compromise that doesn’t take too long or cost too much, we can take the bullet train or train.

Example 2:

Question: How to calculate 1 + 2 + 3 +… Plus 100?

Solution: We can choose to use a calculator and start with 1, keep going right and add 2, then 3, then… Until you get to 100, you get 5,050. (1+100)∗1002=5050 \frac{(1+100) * 100}{2} = 50502(1+100)∗100=5050 (1+100)∗100=5050).

Example 3:

Question: How do I sort an array of n integers in ascending order?

Solution: We can choose bubble sort to sort the N integers, also can choose insert sort, merge sort, quick sort, etc.

All three of these examples can be considered algorithms. The way to get from Shanghai to Beijing can be considered an algorithm, and the way to sum numbers from 1 to 100 can also be considered an algorithm. The way you sort an array can also be viewed as an algorithm. And we can see from these three examples that there are often different algorithms for a particular problem.

2.1 Basic characteristics of the algorithm

An algorithm is simply a series of steps that solve a particular problem. In addition, the algorithm must have the following characteristics.

1. The input

For any problem to be solved, it has to be handed to the corresponding algorithm in some way. The parameters initially assigned to the algorithm before it begins are called inputs. For example, the input in example 1 is the parameters of the origin and destination (Beijing, Shanghai), and the input in example 3 is an array of N integers.

An algorithm can have multiple inputs or no inputs. For example, if you’re solving a fixed problem, you have no input.

2. The output

Algorithms exist to solve problems, and they always need to return a result. So you need at least one or more parameters as the output of the algorithm. For example, the output in Example 1 is the final chosen mode of transportation, and the output in Example 2 is the result of the sum. The output in example 3 is a sorted array.

3. There are poor

The algorithm must end in a finite number of steps and should be completed in an acceptable amount of time. For example, if we choose to travel from Shanghai to Beijing on May 1st, and the result is that we spend three days agonizing about how to go to Beijing during May 1st, then the travel plan will be ruined, and the algorithm is also unreasonable.

4. A deterministic

Each instruction that constitutes the algorithm must have a clear and definite meaning, so as not to cause ambiguity or ambiguity in readers’ understanding. That is, each step of the algorithm must be precisely defined without ambiguity.

Feasibility of 5.

Each step of the algorithm must be executable and can be realized by a finite number of operations in the current environment. That is, each step can be performed a finite number of times and can be converted into a program that runs on a computer and gets the correct result.

2.2 Objective pursued by the algorithm

To study the function of algorithm is to make the method of solving problems become more efficient. We tend to have multiple algorithms for a given problem. And different algorithms have different costs. In general, a good algorithm should pursue at least two goals:

Less running time required (less time complexity);
Less memory footprint (less space complexity).

Given that the computer takes 1 nanosecond to execute a command (which is not scientific), the first algorithm takes 100 nanoseconds to execute and the second takes 3 nanoseconds. If the memory footprint is not considered, it is clear that the second algorithm is much better than the first algorithm.

Given the size of a computer memory cell is one byte, the first algorithm takes up three bytes of memory space, and the second algorithm takes up 100 bytes of memory space. If the running time is not taken into account, the first algorithm is obviously much better than the second algorithm.

In reality, algorithms often need to consider both running time and space. Of course, an algorithm with less running time and less space footprint is always better, but there are all sorts of factors that make running time and space footprint incompatible. For example, if the running time of a program is too high, we can consider playing with space, sacrificing a certain amount of space in exchange for a shorter running time. Or, if the program is not very demanding on running time and the device memory is limited, choose an algorithm that takes up less space but requires a certain amount of time sacrifice.

Of course, in addition to running time and memory footprint, a good algorithm should also pursue the following goals:

Correctness: Correctness means that the algorithm can meet the requirements of specific problems, the program runs normally without syntax errors, and can pass typical software tests to meet the expected requirements.
Readability: Readability refers to that the algorithm follows the naming rules of identifiers, is concise and easy to understand, and the annotation statements are appropriate, so that it is easy to read by oneself and others, and easy to modify and debug later.
Robustness: Robustness refers to how well the algorithm responds to and handles illegal data and operations.

These three goals are the basic criteria that all algorithms must meet. The criteria for a good algorithm are the aforementioned less running time (less time complexity) and less memory (less space complexity).

3. Summary

3.1 Summary of data structure

Data structures can be divided into “logical structures” and “physical structures”.

Logical structure can be divided into: set structure, linear structure, tree structure, graph structure.
Physical structure can be divided into: sequential storage structure, chain storage structure.

“Logical structure” refers to the relationship between data, and “physical structure” refers to the representation of this relationship in a computer.

For example, the “stack” in a linear table has a one-to-one relationship between its data elements. Every node except the head and tail node has a unique precursor and a unique successor, which reflects the logical structure. For the nodes in the stack, they can be stored in the computer in the way of sequential storage (that is, sequential stack), and their structure in the form of a continuous storage space in the computer. Each node in the stack and its precursor node and successor node are physically adjacent. Of course, the nodes in the stack can also use chain storage (that is, chain stack), each node and its precursor node, the successor node is not necessarily physically adjacent, each node is accessed by the pointer field of the precursor node.

3.2 Algorithm Summary

By “algorithm” I mean a way to solve a problem. An algorithm is a series of operational steps that solve a particular problem.

The algorithm has five basic characteristics: input, output, fineness, determinacy and feasibility.

The algorithm seeks five goals: correctness, readability, robustness, less running time (less time complexity), and less memory footprint (less space complexity).

That’s all for this article, and we’ll look at the “time complexity” and “space complexity” of algorithms in the next article.

The resources

【 article 】 Data Structures and Algorithms
Big Talk data Structure, by Cheng Jie
Fun Algorithms by Chen Xiaoyu
The Art of Computer Programming (Volume 1) basic Algorithms (3rd Ed.) – Translated by Su Yunlin
Algorithmic Art and Informatics Competition — By Liu Rujia and Huang Liang