0 Introduction

This article is my reading note on The Basics of Artificial Intelligence (High School Edition). The book's illustrations are excellent and make hard concepts easy to grasp, so this note reuses pictures from the book (the copyright of the pictures belongs to SenseTime).

Some concepts in the book are still obscure, and a complete beginner will find it hard to see the subtle differences between certain concepts and methods. So I present them in my own words, leaving out some of the difficult details, in a form that is easier to understand.

1 Overview of artificial intelligence

1.1 A brief history

1.2 Application Fields

Security

  1. Detecting pedestrians and vehicles in video in real time.

  2. Detecting unusual behavior in video (for example, a drunk pedestrian or a vehicle driving the wrong way) and raising an alert with location information.

  3. Automatically estimating crowd density and the direction in which people are moving, spotting the dangers of overcrowding in advance, and helping staff guide and manage the flow of people.

Medical

  1. Automated analysis of medical images: these techniques can automatically identify key points in medical images and perform comparative analysis.

  2. Reconstructing three-dimensional models of human organs from multiple medical images, helping doctors plan surgery and carry it out safely.

  3. Providing everyone with health advice and disease risk warnings, so that we can all live healthier lives.

Intelligent customer service

Smart customer service can communicate with customers like a human. It understands the customer’s question, analyzes what it means (for example, whether the customer is asking about the price or the functions of a product), and gives an accurate, appropriate and personalized response.

Autonomous driving

Today’s autonomous vehicles use a variety of sensors, including video cameras, lidar and satellite positioning systems (such as BeiDou (BDS) and the Global Positioning System (GPS)), to sense the driving environment in real time. The intelligent driving system fuses these sensor signals, combines them with maps and with signals such as traffic lights and road signs, plans the driving route in real time, and issues commands that control the car’s movement.

Industrial manufacturing

Helping factories automatically detect defects of all kinds.

1.3 Concepts

What is artificial intelligence? Artificial intelligence is the use of machines to simulate human cognition.

The three main ways of training artificial intelligence are supervised learning, unsupervised learning and reinforcement learning. They are described below.

2 Is this an iris?

2.1 Feature Extraction

Human sensory features: number of petals, color

Manually designed features: decide which features to use, then measure each one to turn it into a specific value

Deep learning features: not covered here; they come up later in the article

2.2 Perceptron

To tell two kinds of iris apart, we have to draw a straight line between them. You can draw infinitely many straight lines, but which one is the best?

What to do? I’m a weak student, so I’ll just guess!

  1. Pick three random numbers, a = 0.5, b = 1.0, c = -2, and plug them into y = a·x1 + b·x2 + c.
  2. Plug each flower’s two features in as x1 and x2. For example, substituting (4, 1) gives y(predicted) = 0.5 × 4 + 1.0 × 1 - 2 = 1. The actual label is y(actual) = 1 (samples are labeled 1 for Iris versicolor and -1 for Iris setosa), so y(actual) - y(predicted) = 0.
  3. Repeat step 2 for every flower and combine all the differences between actual and predicted values into one total; call it Loss1.
  4. But how do you know whether this is the best line? Keep guessing! Guess the way you would guess World Cup results.
  5. Compute the gradient of the loss with respect to a, b and c, and keep adjusting them in the direction in which the loss goes down (gradient descent). The process looks roughly like this:
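The guess-and-adjust loop above can be sketched in Python. This is a minimal sketch, not the book's code: it swaps in the classic perceptron loss (the sum of -label × score over misclassified flowers) instead of a raw "actual minus predicted" total, and the flower measurements in the hypothetical `SAMPLES` list are made up. It starts from the guessed a = 0.5, b = 1.0, c = -2 and applies stochastic gradient descent one flower at a time.

```python
# A minimal perceptron sketch: stochastic gradient descent on the perceptron loss.
# SAMPLES are HYPOTHETICAL (x1, x2) feature pairs; labels follow the text:
# +1 = Iris versicolor, -1 = Iris setosa.
SAMPLES = [
    ((4, 1), 1), ((5, 2), 1), ((4, 2), 1),
    ((1, 1), -1), ((2, 0), -1), ((1, 0), -1), ((2, 2), -1),
]


def predict(a, b, c, x1, x2):
    """Raw score y = a*x1 + b*x2 + c; its sign is the predicted class."""
    return a * x1 + b * x2 + c


def perceptron_loss(a, b, c, samples):
    """Sum of -label * score over misclassified samples (0 when every point is correct)."""
    total = 0.0
    for (x1, x2), label in samples:
        score = predict(a, b, c, x1, x2)
        if label * score <= 0:  # wrong side of the line, or exactly on it
            total += -label * score
    return total


def train(samples, a=0.5, b=1.0, c=-2.0, lr=0.1, max_epochs=100):
    """Start from the guessed a, b, c and nudge them downhill, one flower at a time."""
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), label in samples:
            if label * predict(a, b, c, x1, x2) <= 0:
                # The gradient of -label*score w.r.t. (a, b, c) is -label*(x1, x2, 1),
                # so stepping against it adds lr*label*(x1, x2, 1).
                a += lr * label * x1
                b += lr * label * x2
                c += lr * label
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side: stop guessing
            break
    return a, b, c


print("y for (4, 1):", predict(0.5, 1.0, -2.0, 4, 1))  # 1.0, as in the text
print("Loss1:", perceptron_loss(0.5, 1.0, -2.0, SAMPLES))
a, b, c = train(SAMPLES)
print("learned:", a, b, c, "loss:", perceptron_loss(a, b, c, SAMPLES))
```

With these made-up points, the initial line misclassifies only (2, 2), so Loss1 is 1.0; a few passes of updates move the line until every point is on its correct side and the loss drops to 0.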

The difference between actual and predicted values described above is in fact a loss function. There are other loss functions too, such as the straight-line (Euclidean) distance between two points and cosine similarity, which can also measure how far a prediction is from the actual result.

Key point: the loss function is the gap between reality and the ideal (brutal, isn’t it?)

2.3 Support vector machines

| Method | How the line is chosen |
| --- | --- |
| Perceptron | Guesses the line that minimizes the total difference between predicted and actual values over all points |
| Support vector machine (SVM) | Guesses the line whose distance to the closest points (the margin) is as large as possible |

* Because the criteria differ, the loss functions differ too (but it is still guessing)

Intuitively, the bigger the gap between the line and the closest points, the better (no dirty jokes, please!)
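To make the "bigger gap" idea concrete, the sketch below measures the margin of a candidate line: the distance from the line to the closest point, signed so that a misclassified point makes it negative. It does not train an SVM; it only compares two hand-picked lines on hypothetical points. An SVM solver searches for the line that maximizes this quantity.

```python
import math

# HYPOTHETICAL points: ((x1, x2), label) with label +1 or -1.
POINTS = [
    ((4, 1), 1), ((5, 2), 1), ((4, 2), 1),
    ((1, 1), -1), ((2, 0), -1), ((2, 2), -1),
]


def margin(a, b, c, points):
    """Smallest signed distance from the line a*x1 + b*x2 + c = 0 to any point.

    Positive means every point is on its correct side; larger means a wider gap.
    """
    norm = math.hypot(a, b)
    return min(label * (a * x1 + b * x2 + c) / norm for (x1, x2), label in points)


# Two candidate lines that both separate the classes:
line_vertical = (1.0, 0.0, -3.0)  # x1 = 3, halfway between the two groups
line_tilted = (0.5, 0.4, -2.2)    # a line that passes close to the point (4, 1)

for name, line in [("vertical", line_vertical), ("tilted", line_tilted)]:
    print(name, round(margin(*line, POINTS), 3))
```

Both lines classify every point correctly, but the vertical line keeps a margin of 1.0 while the tilted one comes within about 0.31 of the point (4, 1), so an SVM would prefer the vertical line.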

2.4 Multi-class classification

What if there are more kinds of flowers? In a botany class, the teacher invited a peony expert, a lotus expert and a plum-blossom expert, and showed each of them the same flower. The peony expert put the probability that it was a peony at 0.013, the lotus expert gave 0.265 for lotus, and the plum expert gave 0.722 for plum. After combining the experts’ opinions, the teacher told the students it was a plum blossom.

Xiaoming: Isn’t this teacher a bit silly? He can’t even recognize a flower himself and has to invite three experts. Teacher: Get out of my classroom!

In practice, binary classifiers trained with methods such as those in 2.2 and 2.3 each output a classification score (for example, the three flower classifiers output -1, 2 and 3). How do we turn these scores into probabilities? We use the normalized exponential function, Softmax (for two classes, the Sigmoid function). I won’t write out the formula here; the table in the book gives an intuitive picture:
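The expert probabilities in the story can be reproduced with softmax. A minimal sketch, assuming the three classifier scores are the -1, 2 and 3 from the text; the `experts` list is just a label for printing.

```python
import math


def softmax(scores):
    """Normalized exponential: exp(s_i) / sum_j exp(s_j)."""
    m = max(scores)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Two-class special case: equals softmax([z, 0])[0]."""
    return 1.0 / (1.0 + math.exp(-z))


experts = ["peony", "lotus", "plum"]
probs = softmax([-1, 2, 3])
for name, p in zip(experts, probs):
    print(f"{name}: {p:.3f}")
# peony: 0.013
# lotus: 0.265
# plum: 0.721
```

The plum value comes out as about 0.7214; the text's 0.722 is presumably rounded so that the three probabilities add up to exactly 1.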

2.5 Unsupervised learning

In Section 2.2, the difference between predicted and actual values tells the weak student whether a guess was right, because the biology teacher told him which samples were Iris setosa and which were Iris versicolor. But if the teacher doesn’t even tell him the actual category of each sample (unsupervised learning), he has no idea what the samples are.

So what to do?

Introductory machine learning courses always use the iris example, which gets tiresome. Here’s a different scenario:

Say you run a livestreaming business and want to recruit young hosts. You have a pile of candidates, and all you know about them is their bust and hip measurements. With eight résumés in front of you, you can’t tell who is more capable or more attractive to fans, and you don’t have time to interview them all. How do you choose?

  1. Plot their bust and hip measurements on a two-dimensional chart:

  2. Split them into two groups with one arbitrary swipe, just to get started.

  3. Find the center of each cluster with some calculation, such as the average. The closer a point is to a cluster center, the more similar it is to that cluster.

  4. Compute the distance from each point to the blue cluster center and to the yellow cluster center.

  5. If a point is closer to the yellow center but you casually put it in the blue group (the small square with a red border in the image above), move it to the yellow group.

  6. Now the extent of each group and the candidates it contains have changed, so recompute the cluster centers with the method in step 3.

  7. Repeat step 4 (compute the distances from points to centers) -> step 5 (move candidates between yellow and blue) -> step 3 (recompute the centers) until the members of the blue and yellow clusters no longer change. Then stop.

  8. By now the candidates have been divided into two groups, which you can plot as two types:

Without any supervision, the computer has divided the candidates into two groups. Now put two from each group on air and see which group performs better. If one group does better, recruit more hosts with that cluster’s characteristics.

Xiaoming: What’s so great about that? I can see for myself that the yellow group is more capable. Teacher: Get out!

The clustering algorithm above is called K-means, where K is the number of clusters (which must be specified manually); in the example above K = 2. If we split into three groups, K = 3; the following figure gives an intuitive picture of the training process:
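The steps above (arbitrary split, compute centers, reassign to the nearest center, repeat until nothing changes) are K-means clustering, and they fit in a few lines of Python. This is a minimal sketch: the eight 2-D points are hypothetical measurements, `kmeans` is an illustrative helper name, and it assumes every group in the initial split is non-empty.

```python
import math


def kmeans(points, labels, k, max_iter=100):
    """K-means (Lloyd's algorithm) starting from an arbitrary initial split.

    points: list of (x, y) tuples; labels: initial group index (0..k-1) per point.
    Assumes every group in the initial split is non-empty.
    """
    labels = list(labels)
    centers = [None] * k
    for _ in range(max_iter):
        # Step 3: each center is the average of the points currently in its group.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # simplification: a group that empties keeps its old center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
        # Steps 4-5: move every point to the group whose center is nearest.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Step 7: stop once no point changes group.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# HYPOTHETICAL 2-D measurements for eight candidates.
POINTS = [(80, 85), (82, 88), (79, 86), (81, 87), (92, 98), (94, 96), (93, 99), (91, 97)]
initial_split = [0, 1, 0, 1, 0, 1, 0, 1]  # the arbitrary first "swipe"
labels, centers = kmeans(POINTS, initial_split, k=2)
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
print(centers)  # [(80.5, 86.5), (92.5, 97.5)]
```

Changing `k` to 3 (with a three-way initial split) gives the K = 3 case; real libraries add smarter initialization and handle empty clusters, which this sketch skips.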

3 What is this object? (Image recognition)

3.1 Feature Extraction

Human sensory features: petal color, petal length, whether it has wings (to tell cats from birds), whether it has a mouth and eyes (to tell planes from birds)

| | Kitten | Bird | Plane | Car |
| --- | --- | --- | --- | --- |
| Feature 1: has wings | No | Yes | Yes | No |
| Feature 2: has eyes | Yes | Yes | No | No |

Manually designed features: sensory features are quantified into numerical features for color (RGB values), edges (rounded corners, right angles, triangles) and textures (waves, lines, meshes)

Deep learning features: image features are extracted automatically through convolution

Key point: convolution extracts the useful information from an image. It is a bit like WeChat compressing a picture you send: the file gets smaller, but you can still make out the main content.

1-D convolution: $1\times5+2\times4+3\times3=22$, $1\times4+2\times3+3\times2=16$, $1\times3+2\times2+3\times1=10$

2-D convolution: $1\times2+3\times0+2\times4+4\times2=18$ …

Feature information of the image, such as edges, can be obtained by convolution
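Both kinds of convolution can be written as "slide a small set of weights over the input and take a dot product at each position". A minimal sketch: strictly speaking this is cross-correlation (the kernel is not flipped), which is what deep-learning frameworks compute and call convolution. The 1-D call assumes the signal is 5 4 3 2 1 and the kernel is 1 2 3, one layout consistent with the three sums (1×5 + 2×4 + 3×3 = 22, and so on); the 3×4 image and the edge kernel are hypothetical.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution as deep-learning libraries compute it:
    slide the kernel along the signal (no flipping) and take a dot product at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def conv2d(image, kernel):
    """'Valid' 2-D version: slide a small grid of weights over the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


print(conv1d([5, 4, 3, 2, 1], [1, 2, 3]))  # [22, 16, 10]

# HYPOTHETICAL 3x4 image: dark on the left (0), bright on the right (9).
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
edge_kernel = [[-1, 1]]  # responds where brightness changes from left to right
print(conv2d(image, edge_kernel))  # [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
```

The output is non-zero only in the column where dark meets bright, which is how a convolution picks out edges; in a CNN the kernel weights are learned instead of hand-written.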

3.2 Differences between deep learning and traditional pattern classification

Why do we need neural networks when we already have traditional pattern classification?

The difference is that traditional pattern classification requires manually designed features, such as petal length and color. Deep learning skips manual feature design and uses convolution to extract features automatically. Training the classifier is also folded into the neural network, which makes end-to-end learning possible.

End-to-end learning goes straight from input to output, with no middleman taking a cut.

3.3 Problems existing in deep (multi-layer) neural networks

Generally speaking, adding layers to a neural network improves accuracy. However, deeper networks also bring problems:

Overfitting: a weak student memorizes the answers to the predicted college entrance exam questions without understanding them. On the exam, he answers the questions he memorized correctly and can’t answer the ones he didn’t. We can say this student “overfits” the predicted questions.

Underfitting: at the other extreme, a student so weak that he can’t even memorize the predictions gets only 30% of the answers right, even when the exam questions are exactly the predicted ones. It is fair to say this student underfits.

Vanishing and exploding gradients: if you are interested, there is a popular and very inspiring formula on the Internet that captures the idea. In a multi-layer network the weights get multiplied together. For example, if the weight at each layer is 0.01, the product shrinks rapidly as layers are added, and learning by gradient descent becomes very slow (like a small ball that slows down as it rolls from the top of a bowl toward the bottom).
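The multiplication effect is easy to see numerically. This is a toy sketch, assuming every layer contributes the same local derivative; `backprop_factor` is an illustrative name, and real networks multiply Jacobian matrices rather than single numbers.

```python
def backprop_factor(per_layer_derivative, num_layers):
    """Toy chain rule: the gradient reaching the first layer is the product of
    every layer's local derivative (all equal here, for simplicity)."""
    grad = 1.0
    for _ in range(num_layers):
        grad *= per_layer_derivative
    return grad


for d in (0.01, 0.9, 1.0, 1.1):
    print(d, [backprop_factor(d, n) for n in (1, 10, 50)])
```

With the text's 0.01 per layer, ten layers already shrink the gradient to about 1e-20 (vanishing). Even 0.9 over 50 layers drops below 0.01, while 1.1 over 50 layers grows past 100 (exploding).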

Local minima: in non-convex optimization, learning may stop at a local minimum because the gradient there is zero. When training stops at a local minimum instead of the global minimum, the learned model is not accurate enough.
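A one-dimensional example shows how the starting point decides which valley gradient descent ends up in. This is a minimal sketch using a hypothetical non-convex function, f(x) = x⁴ - 3x² + x, which has a shallow valley on the right and a deeper one on the left.

```python
def f(x):
    """HYPOTHETICAL non-convex loss with two valleys."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, lr=0.01, steps=2000):
    """Repeatedly step downhill; stops moving wherever the gradient is ~0."""
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


right = gradient_descent(2.0)   # rolls into the shallower valley
left = gradient_descent(-2.0)   # rolls into the deeper valley
print(f"start  2.0 -> x = {right:.3f}, f = {f(right):.3f}")
print(f"start -2.0 -> x = {left:.3f}, f = {f(left):.3f}")
```

Starting at x = 2, descent settles near x ≈ 1.13 with f ≈ -1.07, a local minimum where the gradient is zero. Starting at x = -2, it reaches x ≈ -1.30 with f ≈ -3.51, the global minimum.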

Look at the picture to get a feel for it:

You don’t mean the bottom? Then what top are you talking about?

Solutions

Uniform Initialization, Batch Normalization and Shortcut connections all involve a lot of mathematics, so they are not explained here.

3.4 Applications

Face recognition

Self-driving cars cut the images taken by the roof-mounted camera into small squares, and each square is checked for whether it contains a car, a pedestrian or a dog, a red or green light, various traffic signs and so on. The distance to each object is then judged with radar.

4 What song is this? (Voice recognition)

4.1 Feature Extraction

Human sensory features: volume, pitch, timbre

Digitization: the sound wave is converted into an electrical signal and then digitized through sampling, quantization and encoding
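Sampling, quantization and encoding can be sketched on a synthetic tone. This is a minimal sketch with hypothetical parameters (a 440 Hz sine wave, 8000 samples per second, 8-bit codes); `digitize` is an illustrative name, and real audio comes from a microphone rather than `math.sin`.

```python
import math


def digitize(freq_hz, sample_rate_hz, num_samples, bits):
    """Turn a pure tone into integer codes: sample, quantize, encode."""
    levels = 2 ** bits
    codes = []
    for i in range(num_samples):
        t = i / sample_rate_hz                             # sampling: measure at discrete times
        amplitude = math.sin(2 * math.pi * freq_hz * t)    # the "analog" value, in [-1, 1]
        level = round((amplitude + 1) / 2 * (levels - 1))  # quantization: nearest of `levels` steps
        codes.append(level)                                # encoding: store the step index as an integer
    return codes


samples = digitize(freq_hz=440, sample_rate_hz=8000, num_samples=16, bits=8)
print(samples)
```

A higher sample rate captures faster changes in the wave, and more bits give finer amplitude steps; with `bits=1` every sample collapses to 0 or 1.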

Manually designed features: the Mel frequency scale has high resolution at low frequencies and low resolution at high frequencies. This mirrors human hearing: within a certain frequency range, people are more sensitive to differences between low-frequency sounds than between high-frequency sounds. The relationship is:

Mel-frequency cepstral coefficients (MFCCs)
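The Hz-to-mel relationship can be checked numerically. This sketch assumes the widely used form m = 2595 · log10(1 + f / 700); the book may use an equivalent variant. It covers only the frequency scale, not the full MFCC pipeline.

```python
import math


def hz_to_mel(f_hz):
    """Commonly used Hz-to-mel mapping (assumed; the book may use an equivalent variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 100 Hz step covers many more mels at low frequency than at high frequency.
low_step = hz_to_mel(200) - hz_to_mel(100)
high_step = hz_to_mel(5100) - hz_to_mel(5000)
print(round(hz_to_mel(1000)), round(low_step, 1), round(high_step, 1))
```

A 100 Hz step near 100 Hz spans about 133 mels, while the same step near 5000 Hz spans only about 20, which is the "high resolution at low frequencies" described above.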

Deep learning features: extracted with the 1-D convolution introduced in 3.1

4.2 Applications

Music style classification

Input: audio file; features: sound features; output: music genre

Speech to text

Input: audio file; features: sound features; output: acoustic model output (e.g., the 26 English letters)

The acoustic model’s output is then fed into another learner:

Input: acoustic model output; features: semantics and vocabulary; output: fluent sentences (see #6, How to Get Your Computer to Output Fluent Sentences)

Song recognition: while music is playing, slide a window over it (splitting the music into short segments) and extract each segment’s features with the methods in 4.1 to get a feature vector. Do the same for the songs in the database and for the song recorded by the user, then compute the similarity between the two vectors (for example, the cosine of the angle between them, or the straight-line distance between them).
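Once every song is a feature vector, matching is a nearest-neighbor search. A minimal sketch: the database vectors and the query are hypothetical four-number features, and `best_match` is an illustrative helper; real systems use far longer vectors and an index instead of a linear scan.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def euclidean_distance(u, v):
    """Straight-line distance between two points."""
    return math.dist(u, v)


DATABASE = {  # HYPOTHETICAL song -> feature vector
    "song_a": [0.9, 0.1, 0.3, 0.0],
    "song_b": [0.1, 0.8, 0.2, 0.5],
    "song_c": [0.4, 0.4, 0.9, 0.1],
}


def best_match(query, database):
    """Name of the database song whose vector is most similar to the query."""
    return max(database, key=lambda name: cosine_similarity(query, database[name]))


query = [0.85, 0.15, 0.25, 0.05]  # hypothetical noisy recording of song_a
print(best_match(query, DATABASE))
print({name: round(euclidean_distance(query, vec), 2) for name, vec in DATABASE.items()})
```

Both measures pick `song_a` here. Cosine similarity ignores overall loudness (vector length), while Euclidean distance does not, which is one reason to prefer one over the other.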

5 What is the person in the video doing? (Video understanding, action recognition)

5.1 Introduction

A video is essentially a sequence of consecutive frames. Because of persistence of vision (when the eye observes a scene, the light signal passed to the brain does not vanish immediately, so we perceive a continuous picture), the frames appear continuous, and that is video. To identify the objects in a video, we can apply the image recognition and classification methods above to individual frames in real time, for example:

But video has a property that matters more than anything in a single image: action (behavior).

How do you analyze action in a continuous video?

For example, in the picture above, relative to the yellow box (the box and the dog are relatively static), only the pixels of the two legs “move” left and right. To describe this “movement” we introduce the concept of optical flow (a pixel moving from one location to another). The pixel motion captured by optical flow serves as the training features (X) of the neural network, and “running” serves as the training target (Y). After repeated iterative training, the machine can fit a function Y = f(X) that judges whether the object in the video is running.

5.2 Optical flow

Assume that 1) the object moves very little between two adjacent frames and 2) the object’s color stays basically unchanged between two adjacent frames.

How a particular pixel is tracked from one frame to the next is not explained here.

The vector from a point’s position at time t to its position at time t+1 is the optical flow at that point; it is a two-dimensional vector.

The optical flow of the whole picture looks like this:

The optical flow of the entire video looks like this:

Assuming the video has width w, height h and m frames in total, its optical flow can be represented as a tensor of size w × h × m × 2 (a four-dimensional array holding one 2-D flow vector per pixel per frame), and this tensor can be fed to a neural network for classification training.

As a further optimization, the optical flow can be reduced to 8 directions: adding up all the optical flow vectors of a frame into these eight directions gives the frame’s optical flow histogram, which is an 8-dimensional feature vector.
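The 8-direction histogram can be sketched directly from a frame's flow vectors. This is a minimal sketch under stated assumptions: each vector is assigned to the nearest of eight directions centered on 0°, 45°, …, 315° (measured with `atan2(dy, dx)`) and contributes its length; `flow_histogram` is an illustrative name, and published variants differ in binning and normalization.

```python
import math


def flow_histogram(flow_vectors, num_bins=8):
    """Sum the lengths of (dx, dy) flow vectors into `num_bins` direction bins."""
    hist = [0.0] * num_bins
    bin_width = 2 * math.pi / num_bins
    for dx, dy in flow_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0:
            continue  # a pixel that did not move has no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)       # direction in [0, 2*pi)
        k = int(round(angle / bin_width)) % num_bins     # nearest of the 8 directions
        hist[k] += magnitude
    return hist


# HYPOTHETICAL flow vectors of one frame (flattened from the per-pixel grid).
frame_flow = [(1, 0), (2, 0), (0, 1), (-1, 0), (0, -3), (0, 0)]
print(flow_histogram(frame_flow))  # [3.0, 0.0, 1.0, 0.0, 1.0, 0.0, 3.0, 0.0]
```

Whatever the frame size, each frame becomes just 8 numbers, which is much cheaper to feed to a classifier than the full w × h × 2 flow field.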

6 What is this passage saying? (Natural language processing)

6.1 Feature Extraction

| No. | Sentence | Category |
| --- | --- | --- |
| 1 | Science has proven that swimming is good for your body. | Sports |
| 2 | Fu Yuanhui won a gold medal in Olympic swimming. | Sports |
| 3 | Excellent reading is a very useful knowledge management application. | Tool |
| 4 | This article describes the application of Evernote in knowledge management. | Tool |

Here are four sentences. First, segment them into words:

| No. | Sentence |
| --- | --- |
| 1 | Science has shown that swimming is good for the body. |
| 2 | Fu Yuanhui won a gold medal in Olympic swimming. |
| 3 | Excellent reading is a very useful knowledge management application. |
| 4 | This article describes the application of Evernote in knowledge management. |

Remove stop words (adverbs, prepositions, punctuation and so on; text processing usually relies on a stop-word list):

| No. | Sentence |
| --- | --- |
| 1 | Science has shown that swimming is good for your body |
| 2 | Fu Yuanhui won the gold medal in Olympic swimming |
| 3 | Excellent for use in knowledge management applications |
| 4 | This paper describes the application of Evernote knowledge management |

Word encoding

Sentence vectorization
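The two steps named above, giving each word an integer ID and turning each sentence into a vector, can be sketched as a bag-of-words count. The tokens in `DOCS` are a hypothetical English segmentation of the four stop-word-filtered sentences (the original worked on Chinese text), and `build_vocabulary` / `vectorize` are illustrative names.

```python
# HYPOTHETICAL tokens for the four stop-word-filtered sentences.
DOCS = [
    ["science", "shown", "swimming", "good", "body"],
    ["Fu Yuanhui", "won", "gold medal", "Olympic", "swimming"],
    ["Excellent reading", "useful", "knowledge management", "application"],
    ["article", "describes", "application", "Evernote", "knowledge management"],
]


def build_vocabulary(docs):
    """Word encoding: give each distinct word the next integer ID, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for word in doc:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def vectorize(doc, vocab):
    """Sentence vectorization: count how often each vocabulary word occurs in the sentence."""
    vec = [0] * len(vocab)
    for word in doc:
        vec[vocab[word]] += 1
    return vec


vocab = build_vocabulary(DOCS)
vectors = [vectorize(doc, vocab) for doc in DOCS]
print(len(vocab))
for vec in vectors:
    print(vec)
```

Each sentence becomes a 16-number vector, one slot per distinct word. The two sports sentences share a 1 in the "swimming" slot, which is exactly the kind of overlap a classifier can pick up on.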

6.2 advanced

Word vectors: two different Chinese words can both mean “awesome”, and two others can both mean “computer”. From the steps above, we would treat each pair as completely different words, yet they have similar meanings. How can AI learn this? We need to enrich the representation of each word along multiple dimensions, for example:

TF-IDF (term frequency-inverse document frequency): the more often a word appears in one category of articles and the less often it appears in another, the better it represents that category. For example, “swimming” appears often in the sports sentences (2 times) but not at all in the tool sentences (0 times), so it represents sports better than other words that appear only once.

Suppose a sentence has N words, a given word occurs T times in it, there are X sentences in total, and the word appears in W of them. Then the word’s TF-IDF is $\frac{T}{N}\log\frac{X}{W}$.
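The formula T/N × log(X/W) translates directly into code. A minimal sketch: it assumes the natural logarithm (the base only rescales every score), and the tokens in `DOCS` are a hypothetical segmentation of the four example sentences after stop-word removal.

```python
import math

# HYPOTHETICAL tokens for the four example sentences (one sentence = one document).
DOCS = [
    ["science", "shown", "swimming", "good", "body"],
    ["Fu Yuanhui", "won", "gold medal", "Olympic", "swimming"],
    ["Excellent reading", "useful", "knowledge management", "application"],
    ["article", "describes", "application", "Evernote", "knowledge management"],
]


def tf_idf(word, doc, docs):
    """TF-IDF = T/N * log(X/W), using the variable names from the text."""
    t = doc.count(word)                      # T: occurrences of the word in this sentence
    n = len(doc)                             # N: number of words in this sentence
    x = len(docs)                            # X: total number of sentences
    w = sum(1 for d in docs if word in d)    # W: sentences containing the word
    return (t / n) * math.log(x / w)


for word in ["science", "swimming", "body"]:
    print(word, round(tf_idf(word, DOCS[0], DOCS), 4))
```

In sentence 1, "swimming" scores lower than "science" because it also appears in sentence 2; a word that appeared in every sentence would get log(1) = 0, since it tells the sentences apart no better than a stop word.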

6.3 Applications

7 Let the computer draw (Generative adversarial networks)

Once upon a time, there was a man who made money by selling copies of famous paintings. He set out to copy a famous painting:

The first time, he drew it like this:

This generator-discriminator setup is at the heart of generative adversarial networks (GANs).

The generator arranges random pixels into a meaningful picture. The discriminator then classifies the generated picture, measuring how far it is from real pictures, and tells the generator which direction to improve in. After several rounds of training, the generator learns to draw “real-looking” pictures.

How does a computer turn random pixels into a meaningful picture? Let’s look at this with a simplified example.

(Figure: inputs x transformed by y = 2x + 1 and by other functions f(x).)

Key point: functions can change the distribution of data (turning “straight” into “curved”, as Cook says)
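To see how a function reshapes a distribution, the sketch below feeds evenly spaced inputs in [0, 1) (a deterministic stand-in for random pixel values) through y = 2x + 1 and through a curved function, y = x², chosen here as a hypothetical example. `transform` and `fraction_below` are illustrative helper names. A GAN generator learns such a mapping with a neural network instead of using a fixed formula.

```python
def transform(xs, f):
    """Apply f to every input value."""
    return [f(x) for x in xs]


def fraction_below(values, threshold):
    """Share of values smaller than threshold."""
    return sum(v < threshold for v in values) / len(values)


xs = [i / 1000 for i in range(1000)]           # evenly spread inputs in [0, 1)
linear = transform(xs, lambda x: 2 * x + 1)    # stretched and shifted to [1, 3)
curved = transform(xs, lambda x: x ** 2)       # squeezed toward 0

print(min(linear), max(linear))
print(fraction_below(xs, 0.25), fraction_below(curved, 0.25))
```

The linear map keeps the values evenly spread (just over a different range), while the curve piles them up near 0: half of the squared values fall below 0.25, versus a quarter of the inputs. Changing the function changes the shape of the data, which is what a generator exploits.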

8 How does AlphaGo play chess? (Reinforcement Learning)

8.1 A rough picture

Supervised/unsupervised learning: make every individual task as correct as possible. Reinforcement learning: whether a series of actions achieves the ultimate goal.

If every task is done accurately, isn’t the ultimate goal achieved? Let’s look at an example:

Alice, the owner of a wholesale store, asks her manager, Bill, to increase sales. Bill instructs his salespeople to sell more radios. One of the salespeople, Charles, sells a large number of radios at a big profit, but the company then cannot deliver them because of a supply shortage. Who is to blame? From Alice’s point of view, Charles’s actions have brought shame on the company. But from Bill’s point of view, Charles successfully completed his sales task, and Bill increased sales (the subtask was achieved). (The Society of Mind, 7.7)

8.2 AlphaGo

The most primitive way to play Go is to build a decision tree, going from the upper-left corner to the lower-right corner, where every empty point is a branch. Then estimate the probability of winning for each move and play the move with the highest probability. This is the move predictor.

To reduce complexity, the key is to reduce the breadth and depth of the search.

When we grow a potted plant and don’t prune it, nutrients are wasted on branches that grow poorly. Withered or abnormal branches must be pruned in time so that nutrients go to the normal (or desired) branches.

In the same way, wasting limited computing power on exhausting every possible Go move would make reading out the game very slow and time-consuming.

Can we speed up the choice of good moves by “pruning” the huge decision tree of the move selector? How do we tell good branches from bad ones? This requires a game value estimator (which positions are more likely to win), which discards worthless positions first and stops exploring them, reducing both the breadth and depth of the search.

The policy network works with Monte Carlo tree search: from the current position, the game is played out (with random moves) to the end. The payoff is positive for a win and negative for a loss. The algorithm then backtracks step by step along the moves of that game, raising the scores of the moves on a winning path and lowering the scores of the moves on a losing path, so the winning moves are more likely to be chosen when the same position comes up again. This speeds up move selection; the network that does it is called the fast rollout network.
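The play-out-and-score idea can be sketched on a much smaller game than Go. This is a simplified flat Monte Carlo evaluation, not AlphaGo's Monte Carlo tree search: there is no tree, no backtracking beyond the first move, and no networks. The game is Nim, chosen here as a hypothetical stand-in (players alternately take 1 to 3 stones; whoever takes the last stone wins), and `choose_move` scores each candidate move by the fraction of random playouts it wins.

```python
import random


def random_playout(stones, rng):
    """Both sides take 1-3 stones at random until none are left.
    Returns True if the player to move at the start takes the last stone."""
    starter_to_move = True
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return starter_to_move
        starter_to_move = not starter_to_move


def choose_move(stones, playouts=2000, seed=0):
    """Score each legal move by the fraction of random playouts it wins."""
    rng = random.Random(seed)
    scores = {}
    for take in range(1, min(3, stones) + 1):
        remaining = stones - take
        if remaining == 0:
            scores[take] = 1.0  # taking the last stone wins outright
            continue
        # After our move the opponent is to move; we win when they do NOT take the last stone.
        wins = sum(not random_playout(remaining, rng) for _ in range(playouts))
        scores[take] = wins / playouts
    best = max(scores, key=scores.get)
    return best, scores


best, scores = choose_move(5)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

From 5 stones, taking 1 leaves the opponent 4 and wins about two thirds of random playouts, while taking 2 or 3 wins about half, so the sketch picks 1. A full MCTS would keep these statistics in a tree and reuse them deeper in the game.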

By combining the policy network, the value network and Monte Carlo tree search, the program chooses the best move, while two copies of the program play against each other so that the networks keep training and learning better moves.

8.3 Definition

Here is the dry, formal definition.

What is reinforcement learning?

Reinforcement learning is used when the focus is not on whether a single judgment is accurate, but on whether a course of action brings the maximum benefit, for example playing chess, trading stocks or making business decisions.

The goal of reinforcement learning is to obtain a policy that guides actions. In Go, for example, the policy guides each move according to the board position. In stock trading, the policy tells us when to buy and when to sell.

A reinforcement learning model generally consists of the following parts:

A set of states that can change dynamically (state)

In Go, the positions of the black and white stones on the board; in a stock exchange, the stock prices

A set of available actions (action)

In Go, the points where a stone can be placed. In a stock exchange, how many shares to buy or sell at each point in time.

An environment that interacts with the decision-making agent. The environment determines how the state changes after each action.

A player’s (agent’s) move affects the game (environment), and the environment rewards the agent (a win) or punishes it (a loss). A trader’s (agent’s) buying and selling affects the stock price (environment; supply and demand determine price), and the environment rewards the agent (making money) or punishes it (losing money).

When the decision-making agent changes the state through an action, it receives a reward or a punishment (a punishment is a negative reward).
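To tie the parts together (states, actions, environment, reward), here is a tiny agent-environment loop. It uses tabular Q-learning, a standard reinforcement learning algorithm swapped in for illustration (it is not what AlphaGo uses), on a hypothetical five-cell corridor: reaching the right end pays +1, and falling off the left end costs -1 (a punishment, i.e. a negative reward). The learned policy maps each state to its best action.

```python
import random

N_STATES = 5        # states 0..4; 0 and 4 end the episode
START = 2
ACTIONS = (-1, 1)   # move left, move right


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    nxt = state + action
    if nxt == N_STATES - 1:
        return nxt, 1.0, True    # reached the goal: reward
    if nxt == 0:
        return nxt, -1.0, True   # fell off the left end: punishment (negative reward)
    return nxt, 0.0, False


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q[(state, action)] = expected discounted reward of taking action in state."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:  # explore: try a random action
                action = rng.choice(ACTIONS)
            else:                       # exploit: best known action, ties broken randomly
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """The policy: for each non-terminal state, the action with the highest Q value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_STATES - 1)}


print(greedy_policy(q_learning()))  # {1: 1, 2: 1, 3: 1} means "always move right"
```

The agent is never told the rule "go right"; it discovers it from rewards and punishments alone, which is the essence of reinforcement learning.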

If you have time, I recommend reading “The Basics of Artificial Intelligence (High School Edition)” yourself.

Jinkey. ai/ Post /tech/5… Author: Jinkey (WeChat official account Jinkey-love, website jinkey.ai). Reprinting is permitted provided the author’s name is not altered; reprinting with this copyright notice deleted or modified is an infringement of intellectual property rights, and we reserve the right to pursue legal liability. Hereby declared!