What are data structures and algorithms? How do they apply to business?

An overview of data structures and algorithms

1.1 Overview of data structures

1.1.1 Overview

Data structures are the way computers store and organize data. A data structure is a collection of data elements that have one or more specific relationships with each other. More often than not, a carefully chosen data structure can lead to higher running or storage efficiency.

1.1.2 Classification

Data structures can be divided along different dimensions. Most commonly, they are divided into logical structures and physical structures.

  1. Logical structure

The logical structure describes the logical relationships between elements — the formal relationships, regardless of where the elements are stored in the computer. The types are as follows:

- Linear structure: one-to-one associations, forming a sequence
- Tree structure: one-to-many associations, forming a tree
- Graph structure: many-to-many associations, forming a mesh

The physical structure (also called the storage structure) refers to how a logical structure is laid out in computer storage. A given logical structure can be realized by a variety of storage structures as needed; the commonly used ones are sequential, linked, indexed, and hash storage.
  2. Physical structure

The physical structure describes where and how data is actually stored in the computer:

- Sequential storage: elements are stored one after another in a block of contiguous addresses. Supports random access, but insertions and deletions require moving many elements.
- Linked storage: contiguity is not required; each node consists of a data field and a pointer field. Costs extra space for pointers; insertion and deletion are fast, but lookup is slow because it requires traversal.
- Index storage: in addition to the node data, an index table is maintained that maps keys to node addresses. Retrieval is fast, at the cost of extra space.
- Hash storage: a mapping is established between an element's key and its storage location. Retrieval is fast, but the mapping function can produce collisions.

1.1.3 Common data structures in programs

Each data structure finds its counterpart in the logical and physical structures described above.

- Array: sequential storage, linear structure; elements can be read randomly by offset, but the structure is hard to grow.
- Stack: linear structure operated at one end only; last in, first out (LIFO), like a bucket.
- Queue: similar to a stack but operated at both ends; first in, first out (FIFO), like a water pipe.
- Linked list (LinkedList): linked storage, with pointers to the previous and next nodes; can be doubly linked.
- Tree: a typical nonlinear structure; it grows from a single root node, and each child node has a single predecessor (its parent).
- Graph: another nonlinear structure, composed of vertices and edges; there is no root, and nodes may link to each other freely.
- Heap: a special tree whose root holds the smallest (or largest) value of all nodes, and whose every subtree is itself a heap.
- Hash: derived from a hash function; a function maps a key to an output that serves as the storage address.
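As a quick sketch of the stack/queue distinction above (my own example, using Java's standard ArrayDeque, which can act as either):

```java
import java.util.ArrayDeque;

public class StackVsQueue {
    public static void main(String[] args) {
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int i = 1; i <= 3; i++) {
            stack.push(i);   // stack: insert and remove at the same end
            queue.offer(i);  // queue: insert at the tail, remove at the head
        }
        System.out.println(stack.pop());  // 3 (last in, first out)
        System.out.println(queue.poll()); // 1 (first in, first out)
    }
}
```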

1.2 Overview of algorithms

Algorithms describe how to process data efficiently on top of a given storage structure. Operations on data are defined by the logical structure, but their concrete realization happens in the storage structure. The following operations are generally involved:

- Search: find the nodes in a data structure that meet certain criteria.
- Insert: add a new node, usually at a specified position.
- Delete: remove a specified node; this may imply a search first.
- Update: change the value of one or more fields of a specified node; again, a search is implied.
- Sort: rearrange the nodes in a specified order, such as ascending or descending.
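The five operations can be sketched on a concrete container (my own illustration using Java's ArrayList; the values are arbitrary):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BasicOps {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(List.of(3, 1, 4, 1, 5));
        System.out.println(list.indexOf(4));   // search: prints 2
        list.add(2, 9);                        // insert at position 2 -> [3, 1, 9, 4, 1, 5]
        list.remove(Integer.valueOf(1));       // delete the first matching 1 (implies a search)
        list.set(0, 7);                        // update the value at position 0
        Collections.sort(list);                // sort ascending
        System.out.println(list);              // [1, 4, 5, 7, 9]
    }
}
```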

1.3 Complexity

1.3.1 Time Complexity

Time complexity describes the time an operation takes, written with a capital O (big-O notation). Because wall-clock time is hard to measure consistently, time complexity is usually estimated by counting an algorithm's unit operations, assuming each unit takes the same time. The common classes are:

  1. Constant order O(1): time is independent of data size, such as swapping the values of two variables

```java
int i = 1, j = 2, k;  // k is a temporary variable
k = i; i = j; j = k;  // swap i and j: the same work regardless of input size
```
  1. Linear order O(n): time is linear in the data size (n^1), such as the work inside a single loop

```java
for (int i = 1; i <= n; i++) { doWork(); }  // doWork() stands for one unit operation; it runs n times
```
  1. Order k O(n^k): the number of executions is the k-th power of n, as with nested loops; below is an order-2 (n^2) instance

```java
for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= n; j++) { doWork(); }  // n * n unit operations
}
```
  1. Exponential order O(2^n): the number of operations doubles each time n grows by one

```java
for (long i = 1; i <= Math.pow(2, n); i++) { doWork(); }  // 2^n unit operations
```
  1. Logarithmic order O(log2 n): the number of executions grows only logarithmically with n, as follows

```java
for (int i = 1; i <= n; i = i * 2) { doWork(); }  // i doubles each pass, so about log2(n) executions
```
  1. Linearithmic order O(n log2 n): the logarithmic order multiplied by a linear factor of n

```java
for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= n; j = j * 2) { doWork(); }  // n * log2(n) unit operations
}
```
  1. Summary
Ordered from small to large, the common time complexities rank as:
O(1) < O(log2 n) < O(n) < O(n log2 n) < O(n^2) < ... < O(n^k) < O(2^n) < O(n!)

1.3.2 Space Complexity

Space complexity measures how much memory an algorithm occupies while running. Besides the space for its own instructions, constants, variables, and input data, a program needs some auxiliary space to operate on the data. Space complexity mainly refers to the order of magnitude of this space.

  1. Fixed space
This includes the space occupied by instructions, constants, simple variables, and so on. Its size is unrelated to the amount of data being processed; it is static.
  2. Variable space
This includes temporary space allocated dynamically at run time, as well as the space required by the recursion stack. Its size depends heavily on the algorithm.
  3. Common orders of space complexity

Space complexity is also written with a capital O. Compared with time complexity, fewer levels occur in practice; the common ones are O(1) and O(n).

1) O(1): constant order; the space occupied is independent of the amount of data. For example, reversing an array in place:

```java
int[] a = {1, 2, 3, 4, 5};
int i = 0, j = a.length - 1;
while (i <= j) {          // swap elements from the two ends toward the middle
    int temp = a[i];      // a single temporary variable, whatever the array size
    a[i] = a[j];
    a[j] = temp;
    i++;
    j--;
}
```

2) O(n) : linear order, linearly related to the size of data volume

```java
int[] a = {1, 2, 3, 4, 5};
int[] b = new int[a.length];        // the auxiliary array grows with the input
for (int i = 0; i < a.length; i++) {
    b[i] = a[a.length - i - 1];     // copy a into b in reverse order
}
```

1.3.3 Trading time for space

For an algorithm, time complexity and space complexity often influence each other. Low time complexity may be bought with more storage, while an algorithm that uses little space may need more computing time, so a trade-off must be struck. In a concrete business environment, the algorithm's context also matters: how often it runs, the size of the data, the development language, and the machine and operating system it runs on. To design the best algorithm for the current scenario, weigh both the advantages and the disadvantages.

1.4 Algorithmic ideas

1.4.1 Divide and conquer

Divide a complex problem into two or more identical or similar subproblems, then divide those into still smaller subproblems, until they are small enough to be solved simply and directly. The solution of the original problem is the combination of the solutions of the subproblems.

  1. Divide and conquer places certain requirements on the problem:
- Once reduced to a small enough size, the problem becomes easy to solve.
- The problem is decomposable: it is not an inseparable tangle, and the partial answers can be combined.
- The subproblems that assemble into the final result are independent of each other, with little or no overlap.

1.4.2 Dynamic programming

The problem is decomposed into several subproblems (stages), which are solved in sequence; the solution of each earlier subproblem provides useful information for the later ones. When solving any subproblem, all possible local solutions are listed, and those that may lead to the optimum are kept by a decision step while the rest are discarded. The subproblems are solved in turn, and the last one yields the solution of the original problem.

  1. Dynamic programming also has certain applicability requirements:
- Optimal substructure: the optimal solutions of the sub-stages advance, in the same direction, toward the optimal solution of the whole.
- No aftereffect: once a stage's solution is determined, it is fixed; it influences only the next step and never flows backward to earlier stages.
- Stage correlation: the stages are not independent; the decision at one stage guides the action at the next. This is not strictly required, but when it holds, dynamic programming shows its value most clearly.
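A small illustration (my own example, not from the text): counting the ways to climb n steps taking 1 or 2 steps at a time. Each stage depends only on the two stages before it (no aftereffect), and earlier answers feed directly into later ones:

```java
public class ClimbStairs {
    static long ways(int n) {
        if (n <= 2) return n;            // stage 1 has 1 way, stage 2 has 2 ways
        long prev = 1, cur = 2;
        for (int i = 3; i <= n; i++) {
            long next = prev + cur;      // stage i combines stages i-1 and i-2
            prev = cur;
            cur = next;
        }
        return cur;
    }

    public static void main(String[] args) {
        System.out.println(ways(10)); // 89
    }
}
```

Only the two most recent stage results are kept, so the space cost is O(1) while the time cost is O(n).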

1.4.3 Greedy algorithms

Greedy algorithms likewise decompose the problem, but at each step they take the current part as the target and find the optimal solution for that part, hoping that when the final step is done the whole is optimal too. In other words, a greedy algorithm always makes the choice that looks best at the moment, without considering global optimality. From this point of view, the approach has clear scenario limitations.

  1. Usage scenarios
- The problem can be decomposed, and each step's state after decomposition has no aftereffect (as in dynamic programming).
- The local optimum of each step must point in the same direction as the overall optimum, or at least lead along the right main direction.
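A minimal sketch (my own example): making change with the fewest coins by always taking the largest denomination that still fits. With canonical denominations like the ones below, the local choice happens to align with the global optimum; with arbitrary denominations greedy can fail, which is exactly the scenario limitation noted above.

```java
public class GreedyChange {
    // coins must be sorted in descending order
    static int minCoins(int amount, int[] coins) {
        int count = 0;
        for (int c : coins) {
            count += amount / c;   // take as many of the largest coin as fit
            amount %= c;
        }
        return amount == 0 ? count : -1; // -1: greedy failed to make exact change
    }

    public static void main(String[] args) {
        System.out.println(minCoins(63, new int[]{25, 10, 5, 1})); // 6 (2x25 + 1x10 + 3x1)
    }
}
```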

1.4.4 Backtracking algorithm

The backtracking algorithm is essentially an enumeration-style search: at each step it lists the possible moves, selects one, and explores in depth, looking for a solution. When it finds that the current path cannot satisfy the solution conditions, or the depth reaches a set limit, it returns and tries another path. Backtracking generally suits relatively complex, large-scale problems, and has been called a "general method of problem solving".

- Candidate solutions to the problem exist and are finite in number.
- A depth can be defined for the backtracking points; past a certain point, the search turns back.
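A small sketch (my own example): enumerating permutations by trying each unused element at the current depth and turning back once a branch is exhausted:

```java
import java.util.ArrayList;
import java.util.List;

public class Permutations {
    static void backtrack(int[] nums, boolean[] used, List<Integer> path, List<List<Integer>> out) {
        if (path.size() == nums.length) {       // depth reached: record one solution
            out.add(new ArrayList<>(path));
            return;
        }
        for (int i = 0; i < nums.length; i++) {
            if (used[i]) continue;              // prune: element already on the path
            used[i] = true;
            path.add(nums[i]);
            backtrack(nums, used, path, out);   // explore deeper
            path.remove(path.size() - 1);       // turn back: undo the choice
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> out = new ArrayList<>();
        backtrack(new int[]{1, 2, 3}, new boolean[3], new ArrayList<>(), out);
        System.out.println(out.size()); // 6 permutations
    }
}
```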

1.4.5 Branch and bound

Similar to backtracking, branch and bound also searches the solution space by enumeration, but where backtracking is depth-first, branch and bound is breadth-first. It typically expands all neighbors of a node, applies an elimination strategy first, discards the nodes that violate the constraints, and adds the rest to a live-node list. It then selects a node from that list as the next object to operate on.
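A simplified sketch (my own example): exploring a 0/1 knapsack breadth-first with a queue as the live-node list, discarding branches that violate the capacity constraint. A full branch and bound would also prune nodes whose optimistic bound cannot beat the best value found so far; that bounding function is omitted here for brevity.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BranchAndBound {
    // 0/1 knapsack explored breadth-first; over-capacity branches are discarded
    static int best(int[] w, int[] v, int cap) {
        int best = 0;
        // each state is {next item index, weight so far, value so far}
        Queue<int[]> live = new ArrayDeque<>();
        live.add(new int[]{0, 0, 0});
        while (!live.isEmpty()) {
            int[] s = live.poll();
            if (s[0] == w.length) {              // leaf: all items decided
                best = Math.max(best, s[2]);
                continue;
            }
            // branch 1: skip item s[0]
            live.add(new int[]{s[0] + 1, s[1], s[2]});
            // branch 2: take item s[0], kept only if it satisfies the constraint
            if (s[1] + w[s[0]] <= cap) {
                live.add(new int[]{s[0] + 1, s[1] + w[s[0]], s[2] + v[s[0]]});
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // weights {2,3,4}, values {3,4,5}, capacity 5 -> best is items 1+2, value 7
        System.out.println(best(new int[]{2, 3, 4}, new int[]{3, 4, 5}, 5)); // 7
    }
}
```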

Eviction algorithms and their applications

Eviction (invalidation) algorithms are common in cache systems. Because a cache usually lives in memory, and memory is relatively expensive and limited, entries must be invalidated or evicted according to an appropriate algorithm when capacity runs out.

2.1 First In, First Out (FIFO)

  1. Overview

First In, First Out: whenever new data is inserted and the queue is full, this algorithm removes the earliest-inserted data.

  2. Implementation

This is easy to build on a LinkedList:

```java
import java.util.Iterator;
import java.util.LinkedList;

public class FIFO {
    LinkedList<Integer> fifo = new LinkedList<Integer>();
    int size = 3;

    public void add(int i) {
        fifo.addFirst(i);
        if (fifo.size() > size) {
            fifo.removeLast(); // full: evict the earliest-inserted element
        }
        print();
    }

    public void read(int i) {
        Iterator<Integer> iterator = fifo.iterator();
        while (iterator.hasNext()) {
            int j = iterator.next();
            if (i == j) {
                System.out.println("find it!");
                print();
                return;
            }
        }
        System.out.println("not found!");
        print();
    }

    public void print() {
        System.out.println(this.fifo);
    }

    public static void main(String[] args) {
        FIFO fifo = new FIFO();
        System.out.println("add 1-3:");
        fifo.add(1); fifo.add(2); fifo.add(3);
        System.out.println("add 4:");
        fifo.add(4);
        System.out.println("read 2:");
        fifo.read(2);
        System.out.println("read 100:");
        fifo.read(100);
        System.out.println("add 5:");
        fifo.add(5);
    }
}
```
  3. Results and analysis

Summary: easy to implement, but it ignores how elements are used — even heavily used data gets kicked out.

2.2 Least Recently Used (LRU)

LRU stands for Least Recently Used: evict the value that has gone unused for the longest time. The example again uses a linked list: newly added data goes to the head, recently queried data is also moved to the head, and when space runs out the tail element is deleted.

```java
import java.util.Iterator;
import java.util.LinkedList;

public class LRU {
    LinkedList<Integer> lru = new LinkedList<Integer>();
    int size = 3;

    public void add(int i) {
        lru.addFirst(i);
        if (lru.size() > size) {
            lru.removeLast(); // full: evict the least recently used (tail) element
        }
        print();
    }

    public void read(int i) {
        Iterator<Integer> iterator = lru.iterator();
        int index = 0;
        while (iterator.hasNext()) {
            int j = iterator.next();
            if (i == j) {
                System.out.println("find it!");
                lru.remove(index);
                lru.addFirst(j); // move the hit to the head: most recently used
                print();
                return;
            }
            index++;
        }
        System.out.println("not found!");
        print();
    }

    public void print() {
        System.out.println(this.lru);
    }

    public static void main(String[] args) {
        LRU lru = new LRU();
        System.out.println("add 1-3:");
        lru.add(1); lru.add(2); lru.add(3);
        System.out.println("add 4:");
        lru.add(4);
        System.out.println("read 2:");
        lru.read(2);
        System.out.println("read 100:");
        lru.read(100);
        System.out.println("add 5:");
        lru.add(5);
    }
}
```
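As an aside, Java's standard LinkedHashMap can provide LRU behavior directly: constructed with accessOrder=true it moves an entry to the tail on each access, and overriding removeEldestEntry evicts the stalest entry automatically. A minimal sketch (the class name is my own):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruMap<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruMap(int capacity) {
        super(16, 0.75f, true);   // accessOrder=true: get() moves the entry to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // over capacity: evict the least recently used entry
    }

    public static void main(String[] args) {
        LruMap<Integer, Integer> lru = new LruMap<>(3);
        lru.put(1, 1); lru.put(2, 2); lru.put(3, 3);
        lru.get(1);               // touch 1 so it becomes most recently used
        lru.put(4, 4);            // evicts 2, the least recently used
        System.out.println(lru.keySet()); // [3, 1, 4]
    }
}
```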
  3. Results and analysis

2.3 Least Frequently Used (LFU)

  1. Overview

LFU stands for Least Frequently Used: it evicts the values that have been used least often.

  2. Implementation

It can be seen as LRU with one extra criterion: LFU needs both an access count and a time reference, because within the same period several entries may share the same access count. So each entry carries a counter, and ties in count are broken by the last access time.

```java
public class Dto implements Comparable<Dto> {
    private Integer key;
    private int count;      // access count
    private long lastTime;  // last access time, used to break count ties

    public Dto(Integer key, int count, long lastTime) {
        this.key = key;
        this.count = count;
        this.lastTime = lastTime;
    }

    @Override
    public int compareTo(Dto o) {
        int compare = Integer.compare(this.count, o.count);
        return compare == 0 ? Long.compare(this.lastTime, o.lastTime) : compare;
    }

    @Override
    public String toString() {
        return String.format("[key=%s,count=%s,lastTime=%s]", key, count, lastTime);
    }

    public Integer getKey() { return key; }
    public void setKey(Integer key) { this.key = key; }
    public int getCount() { return count; }
    public void setCount(int count) { this.count = count; }
    public long getLastTime() { return lastTime; }
    public void setLastTime(long lastTime) { this.lastTime = lastTime; }
}
```
```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class LFU {
    private final int size = 3;
    private Map<Integer, Integer> cache = new HashMap<>();
    private Map<Integer, Dto> count = new HashMap<>();

    public void put(Integer key, Integer value) {
        Integer v = cache.get(key);
        if (v == null) {
            if (cache.size() == size) {
                removeElement();
            }
            count.put(key, new Dto(key, 1, System.currentTimeMillis()));
        } else {
            addCount(key);
        }
        cache.put(key, value);
    }

    public Integer get(Integer key) {
        Integer value = cache.get(key);
        if (value != null) {
            addCount(key);
            return value;
        }
        return null;
    }

    // Evict the entry with the lowest count (ties broken by the older access time)
    private void removeElement() {
        Dto dto = Collections.min(count.values());
        cache.remove(dto.getKey());
        count.remove(dto.getKey());
    }

    private void addCount(Integer key) {
        Dto dto = count.get(key);
        dto.setCount(dto.getCount() + 1);
        dto.setLastTime(System.currentTimeMillis());
    }

    private void print() {
        System.out.println("cache=" + cache);
        System.out.println("count=" + count);
    }

    public static void main(String[] args) {
        LFU lfu = new LFU();
        System.out.println("add 1-3:");
        lfu.put(1, 1); lfu.put(2, 2); lfu.put(3, 3);
        lfu.print();
        System.out.println("read 1,2");
        lfu.get(1); lfu.get(2);
        lfu.print();
        System.out.println("add 4:");
        lfu.put(4, 4);
        lfu.print();
        System.out.println("read 2,4");
        lfu.get(2); lfu.get(4);
        lfu.print();
        System.out.println("add 5:");
        lfu.put(5, 5);
        lfu.print();
    }
}
```
  3. Results and analysis

2.4 Application Cases

  1. Redis is a typical application scenario for cache eviction. Its common policies are:
- noeviction: delete nothing; when the maxmemory limit is reached and more memory is needed, return an error.
- allkeys-lru: over all keys, evict the least recently used (LRU) first.
- allkeys-random: over all keys, evict some at random.
- volatile-lru: only among keys with an expire set, evict the least recently used (LRU) first.
- volatile-random: only among keys with an expire set, evict some at random.
- volatile-ttl: only among keys with an expire set, evict keys with a shorter TTL first.

Scheduling algorithms and applications

Scheduling algorithms are common in operating systems: system resources are limited, so when multiple processes (or requests from multiple processes) want to use them, requests must be selected according to certain principles to occupy the resources. This is called scheduling.

3.1 First come, first served (FCFS)

  1. Concept

Follow the order in which service requests are submitted.

  2. Implementation

Define a Task class as the task instance and a BlockingQueue as the service queue:

```java
public class Task {
    // Task name
    private String name;
    // Time the task was added
    private long addTime;
    // Declared service (execution) time
    private int servTime;

    public Task(String name, int servTime) {
        this.name = name;
        this.servTime = servTime;
        this.addTime = System.currentTimeMillis();
    }

    public void execute() {
        try {
            // Simulate the service time
            Thread.sleep(servTime);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println(String.format("execute:name=%s,addTime=%s,servTime=%s", name, addTime, servTime));
    }
}
```
```java
import java.util.Random;
import java.util.concurrent.LinkedBlockingQueue;

public class FCFS {
    public static void main(String[] args) throws InterruptedException {
        // Bounded service queue: put() blocks when it is full
        final LinkedBlockingQueue<Task> queue = new LinkedBlockingQueue(5);
        // Service thread: takes tasks strictly in arrival order
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    try {
                        queue.take().execute();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }).start();
        // Add tasks to the queue
        for (int i = 0; i < 5; i++) {
            System.out.println("add task:" + i);
            queue.put(new Task("task" + i, new Random().nextInt(1000)));
        }
    }
}
```

  3. Advantages and disadvantages

FCFS suits CPU-intensive task scenarios and is less suitable for I/O-intensive ones. Tasks with fairly uniform durations can simply queue up, like commuters queueing to swipe through a station gate. But if tasks depend heavily on external factors and their time slices vary widely, FCFS hurts overall progress: one long blocking service can keep everything behind it waiting.

3.2 Shortest Job First (SJF)

  1. concept

Resources go first to the tasks with the shortest execution time. Before execution, each task declares how long it needs the CPU; tasks are then scheduled in order of that declared length, shortest first — "I won't take long, so let me go first."

  1. implementation

A TreeMap keyed by declared service time gives the prioritization:

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class SJF {
    public static void main(String[] args) throws InterruptedException {
        // TreeMap keyed by declared service time: shortest job first
        final TreeMap<Integer, Task> treemap = new TreeMap();
        for (int i = 0; i < 5; i++) {
            System.out.println("add task:" + i);
            int servTime = new Random().nextInt(1000);
            treemap.put(servTime, new Task("task" + i, servTime));
        }
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    try {
                        // Take the entry with the smallest key (shortest declared time)
                        Map.Entry<Integer, Task> entry = treemap.pollFirstEntry();
                        if (entry == null) {
                            Thread.sleep(100);
                        } else {
                            entry.getValue().execute();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }).start();
    }
}
```
  3. Results and analysis

  4. Advantages and disadvantages

- Suits scenarios where task durations differ widely. At the station gate again: riders with their transit card in hand swipe first, while those still searching for a card step aside.
- Fixes the long-wait problem of FCFS, reduces average waiting time, and improves system throughput.
- Ignores urgency, so urgent jobs (processes) are not guaranteed timely handling, and long jobs may wait a long time without running.
- Execution times are estimated and self-declared, hence subjective and never 100% accurate.

3.3 Time Slice Round Robin (RR)

  1. concept

Time slices are scanned one by one in a polling cycle: whoever's turn it is, executes. Think of a rotating turntable with several element slots around its rim: added elements are placed into whatever slots are free, and as the turntable turns, every element on it gets its turn.

  1. implementation

Implemented with an array serving as the data slots:

```java
import java.util.Random;

public class RR {
    // Array slots, each of which can hold one task
    Integer[] integers;

    public RR(int length) {
        integers = new Integer[length];
    }

    // Add a task into the first free slot
    public void addTask(int value) {
        int slot = 0;
        while (true) {
            if (integers[slot] == null) {
                integers[slot] = value;
                System.out.println(String.format("-----> add task index=%s,value=%s", slot, value));
                break;
            }
            // Current slot occupied: look at the next one
            slot++;
            // Wrap around when the end is reached
            if (slot == integers.length) {
                slot = 0;
            }
        }
    }

    // Start a thread that polls the slots round-robin and executes tasks
    public void execute() {
        new Thread(new Runnable() {
            @Override
            public void run() {
                int index = 0;
                while (true) {
                    // Wrap around when the end is reached
                    if (index == integers.length) {
                        index = 0;
                    }
                    // Empty slot: move on
                    if (integers[index] == null) {
                        index++;
                        continue;
                    }
                    try {
                        // Simulate the task's execution time
                        Thread.sleep(new Random().nextInt(1000));
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    System.out.println(String.format("execute index=%s,value=%s", index, integers[index]));
                    // Free the slot and move to the next
                    integers[index] = null;
                    index++;
                }
            }
        }).start();
    }

    public static void main(String[] args) {
        // Test start: define 3 slots
        RR rr = new RR(3);
        // Start the executor thread and begin polling
        rr.execute();
        for (int i = 0; i < 10; i++) {
            rr.addTask(i);
        }
    }
}
```
  3. Results and analysis

  4. Advantages and disadvantages

- Gives tasks a relatively even chance: no task waits forever because another one runs too long.
- No task priorities: important tasks cannot be processed first and must wait their turn.

3.4 Priority Scheduling (HPF)

  1. Overview

Process scheduling assigns the processor, each time, to the ready process with the highest priority. The highest-priority-first algorithm can be combined with different preemption policies to form preemptive and non-preemptive variants.

  2. Implementation

Add a level attribute to Task to carry the priority:

```java
public class Task {
    // Task name
    private String name;
    // Time the task was added
    private long addTime;
    // Declared service (execution) time
    private int servTime;
    // Priority level (the attribute added for HPF)
    private int level;

    public Task(String name, int servTime, int level) {
        this.name = name;
        this.servTime = servTime;
        this.level = level;
        this.addTime = System.currentTimeMillis();
    }

    public int getLevel() { return level; }

    public void execute() {
        try {
            // Simulate the service time
            Thread.sleep(servTime);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println(String.format("execute:name=%s,addTime=%s,servTime=%s,level=%s", name, addTime, servTime, level));
    }
}
```

TreeMap is again used for ordering; this time the key is the priority, and the highest-priority entry is taken first:

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class HPF {
    public static void main(String[] args) throws InterruptedException {
        // TreeMap keyed by priority level
        final TreeMap<Integer, Task> treemap = new TreeMap();
        for (int i = 0; i < 5; i++) {
            System.out.println("add task:" + i);
            int servTime = new Random().nextInt(1000);
            treemap.put(i, new Task("task" + i, servTime, i));
        }
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    try {
                        // Take the entry with the largest key (highest priority)
                        Map.Entry<Integer, Task> entry = treemap.pollLastEntry();
                        if (entry == null) {
                            Thread.sleep(100);
                        } else {
                            entry.getValue().execute();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }).start();
    }
}
```
  3. Results and analysis

3.5 Application cases

- CPU resource scheduling in operating systems
- Resource scheduling in cloud computing
- Container scheduling, such as Docker