Chapter 1 algorithm introduction

  • 1. Binary search is much faster than simple search.
  • O(log n) is faster than O(n). The more elements that need to be searched, the faster the former will be than the latter.
  • 3. The running time of the algorithm is not in seconds. The big O notation allows you to compare operands, and it tells you how fast the algorithm is running.
  • 4. The running time of an algorithm is measured in terms of its speed. When we talk about the speed of an algorithm, we are talking about how fast the running time will increase as the input increases.
  • 5. The running time of the algorithm is expressed in big O notation.
  • 6. Logarithms are the reverse of powers.
  • 7. The big O notation indicates the worst-case running time
  • 8.5 kinds of big O running time. 1 minus O log n, also called logarithmic time, binary search. 2-o (n), also called linear time, simple search. 3 minus O times n log n, quicksort. 4-o (n squared), choose sort 5-O (n!). Travel agent problem.

Chapter 2 Selection sorting

  • 1. Computer memory is like a bunch of drawers.
  • 2. When you need to store multiple elements, you can use arrays or linked lists.
  • 3. The elements of the array are all together.
  • 4. Linked list elements are separate, with each element storing the address of the next element.
  • 5. Arrays are fast to read.
  • 6. Linked lists can be inserted and deleted quickly.
  • 7. All elements in an array must be of the same type.
  • 8. Select sorting time complexity O(n squared)

Chapter 3 recursion

  • 1. Recursion means calling your own function.
  • 2. Every recursive function has two conditions: a baseline condition and a recursive condition.
  • 3. The stack has two operations: push and pop.
  • 4. All function calls go to the call stack.
  • 5. Call stacks can be very long, which can take up a lot of memory.
  • 4. Recursion just makes the solution clearer, there is no performance advantage. Loops actually perform better. If a loop is used, the performance of the program may be higher; The program may be easier to understand if you use recursion.
  • 5. If the stack is high, it means that the computer stores a lot of information about function calls. In this case, you have two options. 1- Rewrite the code to use loops instead. 2- Use tail recursion, not all languages support tail recursion.
  • 6. Tail-recursive 1- A recursive function is tail-recursive if all calls to the recursive form of the function appear at the end of the function. A recursive call is tail recursion when it is the last statement to be executed in the entire function body and its return value is not part of the expression. Tail-recursive functions have the property of doing nothing during regression, which is important because most modern compilers take advantage of this feature to automatically generate optimized code. 2- Tail recursion is calculated from the end, each recursion calculated the corresponding result, that is, the function call appears at the end of the caller’s function, because it is the tail, so there is no need to save any local variables. The called function returns directly past the caller and back to the caller’s caller. Tail recursion is passing the result of the current operation (or path) to the lower function in the argument, the deeper function is not facing easier and easier problems, but more and more complex problems, because the argument with the previous several steps of the operation path. 3- Tail recursion is extremely important. Without tail recursion, the stack drain of functions is incalculable and many intermediate functions need to be saved. For example, f(n, sum) = f(n-1) + value(n) + sum; F (n, sum) = f(n-1, sum+value(n)); In this case, only the latter function stack is retained, and the previous one can be optimally deleted.

Chapter 4 Quicksort

  • 1.D&C will decompose the problem step by step. When dealing with lists using D&C, the baseline condition is likely to be an empty array or an array containing only one element.
  • 2. When implementing quicksort, randomly select the elements used as reference values. Quicksort takes O(n log n) on average, O(n squared) in the worst case.
  • 3. Constants in big O notation can sometimes be important, which is why quicksort is faster than merge sort, because quick lookups have smaller constants than merge lookups.
  • 4. For simple and binary lookups, constants are almost irrelevant because O(log n) is much faster than O(n) for long lists.
  • 5. The D&C algorithm is recursive. The problem solving process with D&C consists of two steps. (1) Identify baseline conditions, which must be as simple as possible. (2) Break the problem down (or scale it down) until it fits the baseline.

Chapter 5 hash table

  • 1. Hash tables are powerful data structures that are fast to manipulate and allow you to model data in different ways.
  • 2. You can combine hash functions and arrays to create hash tables.
  • 3. Collisions are bad, and you should use hash functions that minimize collisions.
  • 4. Hash table lookup, insert, and delete are very fast.
  • 5. Hash tables are suitable for simulating mappings.
  • 6. Once the fill factor exceeds 0.7, adjust the length of the hash table.
  • 7. Hash tables can be used to cache data (for example, on a Web server).
  • 8. Hash tables are great for preventing duplication.

Chapter 6 breadth-first Search

  • 1. Breadth-first search indicates whether there is A path from A to B.
  • 2. If so, breadth-first search will find the shortest path.
  • 3. When faced with a problem like finding the shortest path, try using graphs to model and then using breadth-first search to solve the problem.
  • 4. The edges in a directed graph are arrows whose directions specify the direction of the relationship
  • 5. Edges in an undirected graph have no arrows and the relationship is bidirectional
  • 6. Queues are first in first out (FIFO).
  • 7. The stack is last in first out (LIFO).
  • 8. You need to check people in the search list in the order they joined, otherwise you won’t find the shortest path, so the search list must be a queue.
  • 9. For those who have been checked, be sure not to check again, or it may lead to an infinite cycle.
  • 10. The running time of breadth-first search is written as O(V + E), where V is the vertice number and E is the edge number.

Chapter 7 Dikstra algorithm

  • 1. Breadth-first search is used to find shortest paths in unweighted graphs.
  • 2. Dikstra algorithm is used to find the shortest path in the weighted graph.
  • 3. Dikstra works only if the weights are positive.
  • 4. If the graph contains negative edge weights, use the Behrman-Forde algorithm.
  • 5. Dikstra algorithm consists of four steps. (1) Find out the “cheapest” node, which can be reached in the shortest time. (2) The cost of updating the neighbor of this node, the meaning of which will be explained later. (3) Repeat this process until it has been done for each node in the graph. (4) Calculate the final path.
  • 6. Dikstra algorithm is applied to graphs where each edge has associated numbers, called weights.
  • 7. Weighted graphs are called weighted graphs, and unweighted graphs are called unweighted graphs.
  • 8. In an undirected graph, each edge is a ring. The Dikstra algorithm is only applicable to directed Acyclic Graph (DAG).
  • 9. An undirected graph means two nodes pointing at each other, which are rings!

Chapter 8 Greedy algorithm

  • 1. Greedy algorithms seek local optimal solutions and attempt to obtain global optimal solutions in this way.
  • 2. No quick solution has been found for NP-complete problems.
  • 3. When faced with NP-complete problems, the best practice is to use approximate algorithms.
  • 4. Greedy algorithm is easy to implement, fast running speed, is a good approximation algorithm.
  • 5. Determine whether the problem is NP-complete problem 1- The algorithm runs very fast when the number of elements is small, but the speed will become very slow as the number of elements increases. 2- Problems involving “all combinations” are usually NP-complete problems. 3- Don’t break the problem down into small problems. Consider all possible scenarios. This could be an NP-complete problem. 4- If the problem involves sequences (such as city sequences in the travel salesman problem) and is difficult to solve, it may be NP-complete. 5- If the problem involves sets (such as a radio station set) and is difficult to solve, it may be NP-complete. 6- If the problem can be converted to a set covering problem or a traveling salesman problem, it must be NP-complete.
  • 6. Classic NP complete problems: set covering problem, traveling salesman problem, graph coloring problem, backpack problem, classroom scheduling problem
  • Collections are like lists, except that they cannot contain duplicate elements; You can perform some interesting set operations, such as union, intersection, and difference. Union means to merge sets. Intersection means finding elements that are in both sets. The difference set means that elements that appear in one set are eliminated from the other set.
  • 8. Approximation algorithms can be used when it takes too long to obtain an exact solution. The criteria for judging the merits of an approximate algorithm are as follows: 1- how fast it is; 2- The degree to which the approximate solution obtained is close to the optimal solution. Greedy algorithms are good choices. They are not only simple, but they are often fast.

Chapter 9 dynamic programming

  • 1. Dynamic programming is useful when you need to optimize an index under given constraints.
  • 2. When the problem can be decomposed into discrete sub-problems, dynamic programming can be used to solve the problem.
  • 3. Every dynamic planning solution involves grids.
  • 4. The values in the cells are usually the ones you want to optimize.
  • 5. Each cell is a subproblem, so you need to think about how to break the problem down into subproblems.
  • There is no one-size-fits-all formula for calculating dynamic programming solutions.
  • 7. What are the practical applications of dynamic programming? 1- Biologists determine similarities in DNA strands based on the longest common sequence to determine how similar two animals or diseases are. The longest common sequence has also been used to find treatments for multiple sclerosis. 2- Have you ever used a command like git diff? They point out the differences between the two files and are also implemented using dynamic programming. 3- Earlier we discussed how similar strings are. Levenshtein distance indicates how similar two strings are, which is also calculated using dynamic programming. Edit-distance algorithms are used for everything from spell-checking to determining whether a user’s uploaded profile is pirated. 4- Have you ever used an application such as Microsoft Word with hyphenation? How do they determine where to hyphenate to ensure consistent lines? Use dynamic programming!
  • 8. Dynamic programming helps you find the optimal solution under given constraints. In the backpack problem, you have to steal the highest value item for a given backpack capacity.
  • 9. Dynamic programming can be used to solve problems when they can be decomposed into independent and discrete sub-problems.
  • 10. Every dynamic planning solution involves grids.
  • 11. The values in a cell are usually the ones you want to optimize. In the previous knapsack problem, the value of the cell is the value of the item.
  • 12. Each cell is a subproblem, so you should consider how to divide the problem into subproblems, which will help you find the axes of the grid
  • 13.FQA 1- The order of the rows in the table does not matter. 2- Can you fill the grid column by column instead of row by row? In this case, it doesn’t make any difference, but for others, it might. 3- How about adding a smaller item, you need to consider finer granularity, so you have to adjust the mesh. 4- Can you steal part of the merchandise? Can’t handle it. With dynamic programming, you can either take the whole thing or you can’t decide if you should take part of it. But this situation can be handled easily with greedy algorithms! First, take as many of the highest value items as you can. If you run out, take as many of the next highest value items as you can, and so on. 5- Dealing with interdependent situations: Dynamic programming is powerful in solving sub-problems and using those answers to solve large problems. But dynamic programming works only if each subproblem is discrete, that is, independent of other subproblems. That means a trip to Paris can’t be solved using dynamic programming algorithms. 6- Will more than two subpacks be involved in calculating the final solution? It may be necessary to steal more than two items of goods to obtain the optimal solution to the knapsack problem. However, according to the design of dynamic programming algorithm, only two subbackpacks are merged at most, that is, no more than two subbackpacks are involved at all. But these subpacks may contain subpacks. 7- Can the optimal solution cause the backpack to be empty? It’s entirely possible.

Chapter 10 K-Nearest Neighbours Algorithm (K-Nearest Neighbours, KNN)

  • 1. Calculate the distance between two points. Using the Pythagorean formula, distance indicates how similar two sets of numbers are.
  • 2. KNN is used to do two basic tasks — regression and regression:
    • Classification is grouping;
    • Regression is a prediction of a result (such as a number).
  • Instead of calculating the distance between two vectors, comparing their angles is more suitable for complex cases.
  • 4. When using KNN, it is very important to select suitable features for comparison. The appropriate characteristics are:
    • Characteristics that are closely related to the result to be output
    • The characteristic of impartiality
  • OCR stands for Optical Character Recognition. When you take a picture of a printed page, a computer will automatically recognize the text in it. Interest recommendation system 3. Spam filter 4. Stock market prediction
  • 6. How does OCR automatically recognize numbers? KNN can be used. 1. Browse through a large number of digital images and extract features from these numbers. In general, OCR algorithm extracts features such as line segments, points and curves. 2. When you encounter a new image, you extract its features and find out who its closest neighbors are
  • 7. The first step of OCR is to look at a large number of digital images and extract features, which is called training. Most machine learning algorithms involve a training step: in order for a computer to perform a task, it must first be trained.
  • 8. Spam filters use a simple algorithm called Naive Bayes Classifier
  • 9. Feature extraction means converting items into a series of comparable numbers.
  • 10. The selection of suitable features is critical to the success or failure of KNN algorithm.
  • 11. Use normalization. Calculate the average rating of each user and adjust users’ ratings accordingly so that they can be compared based on the same criteria.
  • 12. If too few neighbors are considered, the results are likely to be biased. A good rule of thumb is to consider square root SQRT (N) neighbors if you have N users.

Chapter 11 What to Do Next

  • Fourier transform: MP3 format how it works, JPG compression format, earthquake prediction, DNA analysis, Shazam music recognition software
  • The speed of parallel algorithms is not linear, for reasons such as parallelism management overhead and load balancing.
  • MapReduce is a popular distributed algorithm that you can use through the popular open source tool Apache Hadoop.
    • Distributed algorithms are great for doing a lot of work in a short amount of time, and MapReduce is based on two simple ideas: map functions and reduce functions.
  • Consider using probabilistic algorithms when you’re faced with a huge amount of data and only want answers that are close to the point
  • Currently the most secure cryptographic hash function is bcrypt
  • Linear programming uses Simplex algorithm
  • Simhash can be used for locally sensitive hash algorithms

Chapter 11 additional extensions

  • Application of BloomFilter 1. Blacklist For example, check whether an email address is in the blacklist. 2. If you think about it, the BitSet “incidentally” sorts the value of a set(int value). A typical example is Hbase. Each Region of Hbase contains a BloomFilter, which is used to quickly determine whether a key exists in the Region. If not, return directly to save subsequent queries.

  • 2.HyperLogLog mainly solves inexact counting (which may be more or less, but within a reasonable range) in big data applications. It can accept multiple elements as inputs and give estimates of the cardinality of the input elements, which refers to the number of different elements in the set.

    • The advantage of HyperLogLog is that even if the number or volume of input elements is very, very large, the space required to calculate the cardinality is always fixed and small.

In Redis, each HyperLogLog key costs only 12 KB of memory to calculate the cardinality of close to 2^64 different elements. This is in stark contrast to collections where the more elements you have, the more memory you consume when calculating cardinality.

  • 3.MapReduce is a programming model for parallel computation of large-scale data sets

    • MapReduce computing tasks are divided into two phases: Map phase and Reduce phase. Each stage uses key/value pairs as inputs and outputs.
    • MapReduce uses a “divide and conquer” strategy, in which a large data set is divided into many independent shards that can be processed in parallel by multiple Map tasks
    • The MapReduce framework uses the Master/Slave architecture, consisting of one Master and multiple slaves. JobTracker runs on the Master and TaskTracker runs on the Slave.
  • 4.MapReduce consists of four parts:

    • The MapReduce program written by a Client user is submitted to JobTracker by the Client. Users can view the job running status through interfaces provided by the Client
    • JobTracker JobTracker monitors resources and schedules jobs. JobTracker monitors the health status of all TaskTracker and jobs. If a task fails, JobTracker will track the execution progress of tasks, resource usage and other information, and tell the information to the task scheduler, and the scheduler will select appropriate tasks to use these resources when resources are idle
    • TaskTracker TaskTracker periodically reports the resource usage and task running progress of the node to JobTracker through heartbeat. Receives commands sent by JobTracker and performs operations. TaskTracker allocates the amount of resources (such as CPU and memory) on the node with equal slots. The Hadoop scheduler allocates free slots on each TaskTracker to tasks before a Task can run. Slot types include Map slot and Reduce slot, which are used by MapTask and Reduce Task respectively
    • HDFS stores job data, configuration information, and the final results
  • 5. Red-black Tree is also called R-B Tree. It is a special binary search Tree. Red-black trees have the following five properties: 1) every node is either black or red, 2) the root node is black, and 3) every leaf node (NIL) is black. 4) If a node is red, its children must be black 5) All paths from a node to the leaf of that node contain the same number of black nodes.

    • Property 5 ensures that no path is twice as long as the others, so a red-black tree is a relatively nearly balanced binary tree.

    • Because worst-case times for operations such as inserting, deleting, and finding a value are required to be proportional to the height of the tree, this theoretical upper limit on height allows red-black trees to be efficient in worst-case scenarios, unlike ordinary binary search trees.

    • Why is it that one path in a red-black tree is twice as long as the other? Note that property 4 means that the path cannot have two adjacent red nodes. The shortest possible paths are all black nodes, and the longest possible paths have alternating red and black nodes.

    • Since by property 5 all the longest paths have the same number of black nodes, this means that no path can be more than twice as long as any other path.

Excerpt from [Algorithm Diagram], by Aditya Bhargava, translated by Yuan Guozhong (Posts and Telecommunications Press).

Purchase Address:

  • Jingdong – item.jd.com/12148832.ht…
  • Dangdang – product.dangdang.com/24214704.ht…