The data structure

What is a data structure

Is a collection of data elements that have one or more specific relationships with each other.

In plain English, it is a collection of data, but there is a special relationship between the data in the collection.

The logical structure of a data structure

Is the relationship between the data elements

  • Collection structure: Data elements in a collection have no relationship other than that they belong to the same collection.

    image-20200923103251902
  • Linear structure: There is a one-to-one relationship between data elements in a linear structure

    image-20200923103304092
  • Tree structure: There is a one-to-many hierarchical relationship between data elements in a tree structure

    image-20200923103708563
  • Graph structure: The data elements of a graph structure are many-to-many relationships

    image-20200923104315239

The physical structure of a data structure

The logical structure of data stored in a computer

  • Sequential storage structure: data elements are stored in contiguous storage units, and the logical and physical relationships between the data are consistent
  • Chain storage: The storage of data elements in any set of storage units, which can be contiguous or discontiguous

The data type

A data type is a collection of values of the same nature and the collective name for the operations defined on that collection

Data types fall into two categories

  • Atomic types: basic types that cannot be decomposed, including integers, floating-point types, and characters
  • Structure type: a combination of several types that can be refactored. for example, an integer array is composed of several integer data types

algorithm

What is an algorithm

An algorithm is a description of the steps used to solve a particular problem. In a computer, represented as a finite sequence of instructions, each of which represents one or more operations.

Characteristics of the algorithm

Five characteristics of the algorithm: input, output, finite, deterministic, feasibility

  • The algorithm has zero or more inputs, the algorithm has at least one or more outputs, the algorithm must be output, do not need output, why do you want this algorithm
  • Fineness: When an algorithm executes a finite number of steps, terminates automatically without an infinite loop, and each step completes in an acceptable amount of time.
  • Deterministic: Each step of the algorithm has a clear meaning and no ambiguity. The same input can only have a unique output
  • Feasibility: Each step of the algorithm must be feasible, that is, each step can be performed a finite number of times

Requirements for algorithm design

  • correctness
  • readability
  • Robustness,
  • High time efficiency, low storage

Time complexity of the algorithm

In algorithm analysis, the total execution times T(n) is a function of problem size n, and then the change of T(n) with n is analyzed and the order of magnitude of T(n) is determined.

The time complexity of the algorithm, that is, the time measure of the algorithm, is denoted as T(n) = O(f(n)), which means that with the increase of the problem size n, the growth rate of the algorithm execution time is the same as that of f(n). Where f(n) is some function of the size n of the problem.

Use capital O() to represent the time complexity notation of the algorithm.

Generally, the algorithm with the slowest growth of T(n) as n increases is called the optimal algorithm

O(1) constant order, O(n) linear order, O(n^2) square order, O(logn) logarithmic order

Common time complexity

O(1)<O(logn)<O(n)<O(nlogn)<O(n^2)<O(n^3)<O(2^n)<O(n!) <O(n^n)

  • Constant of the order
  • The logarithmic order
  • Linear order
  • Square order
  • The index order
  • Factorial order

Worst case versus average case

Worst-case runtime is a guarantee that the runtime will never be broken again. This is one of the most important requirements, and usually, unless specifically specified, the runtimes we mention are worst-case runtimes

The average run time is the most meaningful of all, because it is the expected run time.

Algorithm space complexity

The space complexity of the algorithm is realized by calculating the storage space required by the algorithm. The calculation formula of the space complexity of the algorithm is remembered :S(n)=O(f(n)). Where n is the size of the problem and f(n) is the function of the statement with respect to the space stored by n.

The linear table

A finite sequence of zero or more data elements

First of all, it is a sequence, that is, there is order between elements. If there are multiple elements, the first element has no precursor, the last element has no successor, and every other element has only one precursor and one successor. Another point that linear tables emphasize is finite.

The sequential storage structure of linear tables

The sequential storage structure of a linear table refers to the sequential storage of the data elements of a linear table with contiguous addresses

The advantages and disadvantages

  • Random access
  • No additional storage is required to represent logical relationships between tables
  • Inserting and deleting elements requires moving a large number of objects
  • When the length of the linear table changes greatly, it is difficult to determine the size of the storage space
  • Causing debris in space

Chain storage structure for linear lists

Compare the single-linked list structure with the sequential storage structure

  • Storage allocation mode
    • The sequential storage structure stores the data elements of a linear table in a sequence of storage units at a time
    • Single linked list uses chain storage structure, with a set of arbitrary storage cells to store the elements of a linear list
  • To find the
    • Sequential storage structure O(1)
    • Singly linked lists O (n)
  • Insert and delete
    • Sequential storage structures require an average of half the length of the table to move elements in O(n) time
    • The insertion and deletion time of a single linked list is only O(1) after a location is found.
  • Space performance
    • Sequential storage structure, pre-allocation, large waste, small overflow
    • Single-linked lists do not require pre-allocation and the number of elements is unlimited

The stack and queue

A stack is a linear table that limits insert and delete operations to the end of the table

A queue is a linear table that allows only an insert at one end and a delete at the other

The stack

First, it is a linear table, that is, the stack elements have a linear relationship, that is, a precursor and a successor relationship, but it is a special kind of linear table. Insert and delete operations are performed at the end of a linear table, which is at the top of the stack, not at the bottom.

Stack application – recursion

Call the function itself, directly or indirectly.

Recursive exit: no recursion definition must have at least one condition, when the recursion is not carried out, that is, no longer reference itself but return value deduce.

Stack application – four operational expressions to evaluate

A suffix expression that does not require parentheses can also be called inverse Poland. Postfix expressions are those in which all symbols appear after the number to be computed.

The queue

Linear tables have sequential storage, stack is linear tables, so there are two storage methods. Similarly, queue, as a special linear table, also has these two storage modes.

Circular queue

The solution to false overflow is to start again when the back is full, that is, the end to end loop, we call this sequential storage structure of the end to end loop queue.

string

A string is a finite sequence of zero or more characters, also called a string.

KMP

The KMP pattern matching algorithm is designed so that unnecessary backtracking does not occur

The tree

A tree is a finite set of n(n>=0) nodes and is called an empty tree when n=0. In any non-empty tree,

  • There is and only one specific node called the root
  • When n>1, the other nodes can be divided into M (m>0) finite sets T1, T2 and T3 that do not intersect each other. Each of these sets is itself a tree called a root subtree
image-20201009111011138
image-20201009111026048

Node classification

The number of subtrees a node has is called the degree of the node. Nodes with degree 0 are called page nodes or terminal nodes. Nodes whose degree is not zero are called non-terminal nodes.

In addition to root nodes, branch nodes are also called internal nodes

The degree of the tree is the maximum degree of each node inside the tree

Internode relation

The subtree of a node is called the child of the node, and correspondingly, the node is called the child’s parent.

Other concepts of trees

The hierarchy of nodes is defined from the root, which is the first layer and the child of the root is fed to the second layer.

The maximum level of nodes in a tree is called the tree depth

image-20201009132552841

If the sub-trees of nodes in the tree are regarded as orderly and not interchangeable from left to right, the tree is called ordered tree, otherwise, it is called disordered tree.

A forest is a collection of M trees that do not intersect each other.

Binary tree

Is a finite set of N nodes, which is either empty, or consists of a root node and two disjoint binary trees called left and right subtrees of the root node.

Characteristics of binary trees

  • Each node has at most two subtrees, so there is no node with a degree greater than 2 in the binary tree
  • The left and right subtrees are ordered and cannot be reversed
  • Even if there is only one subtree in the tree, distinguish whether it is a left or right subtree.

Special binary tree

  • Inclined tree

    A binary tree where all nodes have only left subtrees is called a left inclined tree. A binary tree whose nodes have only right subtrees is called a right-inclined tree. These two are collectively known as inclined trees

  • Full binary tree

    In a binary tree, a binary tree is called full if all branches have left and right subtrees and all leaves are on the same level

  • Complete binary tree

    A binary tree with N nodes is numbered in order. If the node numbered I and the node numbered I in the full binary tree with the same depth are in exactly the same position in the binary tree, the binary tree is called a complete binary tree.

    • Leaf nodes can only appear in the bottom two layers
    • The lowest level of leaves must be concentrated in the left continuous position
    • The second-to-last layer, if there are leaf nodes, must be in the right continuous position
    • If the node degree is 1, the node has only left subtree and no right subtree
    • The binomial and complete binomial trees with the same number of nodes have the least depth

Binary tree properties

  • There are at most 2^(I -1) nodes at the I th level of the binary tree
  • A binary tree of depth K has at most 2^ K – 1 nodes

Walk through the binary tree

Binary tree traversal means that all nodes in the binary tree are visited in some order starting from the root node, so that each node is accessed once and only once.

  • The former sequence traversal

    If the binary tree is empty, the null operation returns. Otherwise, the root node is visited first, the left subtree is traversed first, and the right subtree is traversed first.

  • In the sequence traversal

    If the binary tree is empty, the null operation returns, otherwise starting at the root (note that the root is not visited first), traversing the left subtree of the root, then visiting the root, and preferably traversing the right subtree

  • After the sequence traversal

    If the binary tree is empty, the null operation returns, otherwise, from left to right, first leaves and then nodes traversed through the left and right subtrees, and finally access the root node

  • Sequence traversal

    If the binary tree is empty, the null operation returns, otherwise, the nodes are accessed from the first layer of the tree, that is, the root node, traversed layer by layer from top to bottom, and accessed from left to right in the same layer

figure

A graph is composed of a finite non-empty set of vertices and a set of edges between vertices, usually expressed as G(V,E). Where G represents a graph, V is the set of vertices in graph G, and E is the set of edges in graph G.

Depth-first traversal

Also called depth-first search, or DFS for short

It starts from a vertex V in the graph, accesses that vertex, and then starts from v’s unvisited neighbors in a depth-first traverse of the graph until all vertices in the graph have the same path as V have been accessed.

Breadth-first traversal

Also known as breadth-first search or BFS for short

Similar to tree sequence traversal.

Minimum spanning tree

We call the minimum cost spanning tree for constructing a connected network the minimum spanning tree.

  • Prim Prim algorithm
  • Kruskal Kruskal algorithm

The shortest path

For a net graph, the shortest path refers to the path that has the least sum of edge weights between two vertices, and we call the first vertex on the path the source and the last vertex the destination.

  • Dijkstra algorithm
  • Flody algorithm

To find the

A lookup is an element in a lookup table whose key is equal to the given value, based on a given value.

In order to find

Also called linear search, is the most basic search technology, its search process is: Start from the first record in the table and compare the keyword with the given value one by one. If the keyword of a record is equal to the given value, the search succeeds and the searched record is found. If the comparison between the keyword and the given value is not equal until the last record, no record is searched in the table.

Ordered table lookup

Binary search

Split search is also called binary search, its premise is that the records in the linear table must be ordered key codes, linear table must use sequential storage. Binary search of the basic idea is: in the middle of the table in an orderly way, and take records as a comparative object, if the given value equals the key intermediate records, is to find success, if a given value is less than record keyword, is in the middle of the middle left area continue to find, if the given value is greater than the record keyword, is in the middle of the middle of the continue to find the right area. The appeal process is repeated until the search is successful or the search area is not recorded.

The interpolation to find

Interpolation search is a search method based on the comparison between the key to be searched and the key of the maximum and minimum records in the search table. Its core lies in the calculation formula of interpolation key -A [low] / a[high] -A [low]

Fibonacci search

Linear index search

The ultimate purpose of data structure is to improve the speed of data processing, index is designed to speed up the search of a data structure. Indexing is the process of associating a keyword with its corresponding record.

Index structure can be divided into linear index, tree index and multi – level index.

The so-called linear index is to organize the index item set into linear structure, also known as index table.

Linear index can be divided into dense index, block index and inverted index.

Dense index

Dense index refers to a linear index in which each record in a data set corresponds to an index item.

image-20201011111031512

For a dense index, the index entries must be ordered by key code

The number of dense indexes is equal to the number of records in the dataset

Partitioned indexes

Records in a data set are divided into blocks that satisfy two conditions

  • Piece of internal disorder
  • Block and orderly
image-20201011114157421

Inverted index

Indexing a word or record and using the document ID as a record make it easy to find the document by word or record.

Binary search tree

  • If its left subtree is not empty, then the value of all nodes in the left subtree is less than the value of its root node
  • If its right subtree is not empty, then the value of all nodes in the right subtree is greater than the value of its root node
  • Its left and right subtrees are also binary sort trees

The purpose of constructing a binary sort tree is not to sort, but to improve the speed of finding and inserting and deleting keywords

Balanced binary trees

A balanced binary tree is a binary sort tree in which the height difference between the left and right subtrees of each node is at most equal to 1.

Subtracting the depth of the left subtree from the right subtree is called the balance factor. The values are -1, 0, and 1

Multipath lookup tree

The number of children per node in a multipath lookup tree can be two, and multiple elements can be stored at each node.

B tree

B tree is a balanced multi – path search tree

image-20201011153601460

The gray square on the left represents the number of elements in the current node.

B + tree

image-20201011153834092

Hash table

The hash technique is to establish a definite correspondence f between the storage location of the record and its key, so that each key corresponds to a storage location F

Resolve hash conflicts

  • Open addressing
  • Rehash function
  • Chain address method
  • Public overflow zone method

The sorting

  • Exchange sort

    • Bubble sort
    • Quick sort
  • Insertion sort

    • Direct insertion sort
    • Hill sorting
  • Selection sort

    • Simple selection sort
    • Heap sort
  • Merge sort

  • Radix sort

Big talk about data structures