Why machine learning?

Some tasks are too complex to program directly: we cannot capture all of their nuances in hand-written rules, so machine learning becomes necessary. Instead of coding the solution, we feed a machine learning algorithm large amounts of data and let it explore the data and build a model that solves the problem. Examples include recognizing 3D objects from new viewpoints under new lighting conditions in a cluttered scene, or writing a program that estimates the probability that a credit card transaction is fraudulent.

Here’s how machine learning works: instead of writing a separate program for each specific task, we collect many examples that specify the correct output for a given input. A learning algorithm uses these examples to produce a program. Unlike a hand-written program, the learned program may contain millions of numbers, and it works on new cases as well as on the ones it was trained on. If the data change, the program can be retrained on the new data and updated. Massive amounts of computation are now cheaper than paying someone to write a task-specific program.
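As an illustration of this workflow, here is a minimal, hypothetical sketch using scikit-learn. The features, the synthetic data, and the fraud rule are all invented; the point is only that the "program" (the fitted model) is produced from labeled examples rather than from hand-written rules.

```python
# A minimal sketch (not from the original article): instead of hand-coding
# rules for fraud detection, we fit a model to labeled example transactions.
# The features and data here are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Each row: [amount, hour_of_day, distance_from_home]; label 1 = fraud.
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 1.5).astype(int)

model = LogisticRegression().fit(X, y)      # the "program" produced from data
print(model.predict_proba(X[:5])[:, 1])     # estimated fraud probabilities
```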

Applications of machine learning are as follows:

1. Pattern recognition: recognizing faces or facial expressions, spoken language, and objects in real scenes.

2. Anomaly detection: unusual sequences of credit card transactions, unusual patterns of sensor readings in a nuclear power plant.

3. Prediction: future stock prices or currency exchange rates, which movies a person will like.

What is a neural network?

As a general class of machine learning models, neural networks are a family of specific algorithms that have revolutionized the field. Neural networks are approximators of ordinary functions and can be applied to almost any machine learning problem that involves learning a complex mapping from inputs to outputs. Broadly speaking, neural network architectures fall into three categories:

1. Feedforward neural networks: the most common type. The first layer is the input and the last layer is the output; if there is more than one hidden layer, the network is called a “deep” neural network. These networks compute a series of transformations that change the similarities between cases, and the activity of the neurons in each layer is a nonlinear function of the activities in the layer below. (A minimal sketch of such a network appears after this list.)

2. Recurrent neural networks: the connection graph contains directed cycles, so by following the arrows you can get back to where you started. Recurrent networks have complex dynamics and are difficult to train. They model sequential data and are equivalent to very deep networks with one hidden layer per time step, except that the same weights are reused at every time step and input arrives at every time step. Such networks can remember information in their hidden state for a long time, but it is hard to train them to do so.

3. Symmetrically connected networks: like recurrent networks, but the connections between units are symmetric (the weight is the same in both directions). They are easier to analyze than recurrent networks but more limited in what they can do. Symmetrically connected networks without hidden units are called Hopfield networks, and those with hidden units are called Boltzmann machines.
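As a concrete illustration of the first category, here is a minimal sketch (not from the original article) of a feedforward network with one hidden layer; all sizes and weights are invented for illustration.

```python
# A minimal feedforward network sketch: each layer's activity is a
# nonlinear function of the activities in the layer below.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(4, 3)), np.zeros(4)   # input(3) -> hidden(4)
W2, b2 = rng.normal(scale=0.1, size=(2, 4)), np.zeros(2)   # hidden(4) -> output(2)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden activity: nonlinear function of the input
    return W2 @ h + b2         # output layer

print(forward(np.array([0.5, -1.0, 2.0])))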

1. Perceptron

Networks without hidden units are very limited in the input/output mappings they can model. Adding a layer of linear units does not help, because a composition of linear transformations is still linear, and a fixed nonlinearity at the output is not enough to create such mappings. What is needed are multiple layers of adaptive, nonlinear hidden units.
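To make the contrast concrete, here is a minimal sketch (illustrative only, not from the original text) of the classic perceptron learning rule on a toy, linearly separable problem. A single-layer network like this works here, but no setting of its weights could solve a problem such as XOR, which is why adaptive nonlinear hidden units are needed.

```python
# A toy perceptron: a single binary threshold unit trained with the
# perceptron learning rule on the linearly separable AND problem.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])            # logical AND: linearly separable

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                   # repeatedly sweep the training set
    for xi, yi in zip(X, y):
        pred = int(w @ xi + b > 0)    # binary threshold unit (no hidden layer)
        w += lr * (yi - pred) * xi    # perceptron update
        b += lr * (yi - pred)

print([int(w @ xi + b > 0) for xi in X])   # [0, 0, 0, 1]
```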

2. Convolutional Neural Network

Machine learning has long focused on object detection, but many factors still make object recognition difficult: 1. segmentation and occlusion of objects; 2. lighting affects pixel intensities; 3. objects appear in a wide variety of forms; 4. objects with the same function can have very different physical shapes; 5. variation caused by changes in viewpoint; 6. the dimension-hopping problem, where a change of viewpoint makes information jump between input dimensions (pixels).

CNNs have been used for everything from handwritten digit recognition to 3D object recognition, but recognizing objects in color photographs is far more complicated than recognizing handwritten digits: there are roughly 100 times as many classes (1,000 vs. 10) and roughly 100 times as many pixels (256×256 color vs. 28×28 grayscale).

The ILSVRC-2012 competition on ImageNet provided a dataset of 1.2 million high-resolution training images. The test images were unlabeled, and entrants had to identify the types of objects in them. The winner, Alex Krizhevsky, developed a deep convolutional neural network. Besides several max-pooling layers, the architecture had seven hidden layers; the early ones were convolutional and the last two were globally connected. The activation functions in every hidden layer were rectified linear units, which are faster than logistic units, and competitive normalization was used to suppress hidden activity, which helps with variations in intensity. On the hardware side, the network was implemented very efficiently on two Nvidia GTX 580 GPUs (over 1,000 fast cores), which are ideal for matrix multiplication and have high memory bandwidth.
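For readers who want to see the ingredients in code, here is a minimal sketch of a small convolutional network in PyTorch. It is emphatically not Krizhevsky’s architecture; the layer sizes are invented, and it only illustrates convolution layers, max pooling, rectified linear units, and fully connected (“globally connected”) layers at the top.

```python
# A tiny convolutional network sketch for 256x256 color images.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 256 -> 128
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 128 -> 64
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 64 * 64, 256), nn.ReLU(),      # fully connected layers
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = TinyConvNet()(torch.randn(1, 3, 256, 256))       # one fake color image
print(logits.shape)                                        # torch.Size([1, 1000])
```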



3. Recurrent Neural Network

Recurrent neural networks (RNNs) have two properties that make them powerful enough, in principle, to compute anything a computer can compute: (1) distributed hidden states that can store a large amount of information about the past, and (2) nonlinear dynamics that can update the hidden state in complicated ways. This computational power, together with vanishing (or exploding) gradients, makes RNNs hard to train. If the weights are small, the gradients shrink exponentially; if the weights are large, the gradients grow exponentially. A typical feedforward network with only a few hidden layers can cope with these exponential effects, but in an RNN trained on long sequences the gradients easily vanish (or explode). Even with good initial weights, it is hard to detect that the current target output depends on an input from many time steps earlier, so RNNs struggle with long-range dependencies.
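The exponential behaviour is easy to see numerically. The toy loop below (not from the article) simply multiplies a gradient by the same weight fifty times, mimicking backpropagation through fifty time steps with shared weights.

```python
# Repeated multiplication by the same factor: the gradient vanishes for
# small weights and explodes for large ones.
for w in (0.5, 1.5):
    grad = 1.0
    for _ in range(50):        # 50 time steps
        grad *= w
    print(f"weight={w}: gradient after 50 steps = {grad:.3e}")
# weight=0.5 -> ~8.9e-16 (vanishes); weight=1.5 -> ~6.4e+08 (explodes)
```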

Effective ways to train RNNs include the following:

1. Long short-term memory: build the RNN out of small modules that are designed to hold on to values for a long time.

2. Hessian-free optimization: use a powerful optimizer that can cope with the vanishing gradient problem.

3. Echo state networks: initialize the input→hidden, hidden→hidden, and output→hidden connections very carefully, so that the hidden state contains a huge reservoir of weakly coupled oscillators that can be selectively driven by the input. (A sketch appears after this list.)

4. Good initialization with momentum: initialize as in echo state networks, but then learn all the connections using momentum.
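As a concrete illustration of the third idea above, here is a minimal echo state network sketch in NumPy. The reservoir size, scaling, and task (one-step-ahead prediction of a sine wave) are all invented for illustration; only the hidden→output readout is learned.

```python
# Echo state network sketch: fixed random input and recurrent weights
# ("reservoir"), trained linear readout.
import numpy as np

rng = np.random.default_rng(0)
T, n_res = 500, 100
u = np.sin(np.linspace(0, 20 * np.pi, T))                 # input signal
target = np.roll(u, -1)                                   # predict the next value

W_in = rng.uniform(-0.5, 0.5, size=(n_res, 1))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))                 # keep spectral radius < 1

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in[:, 0] * u[t] + W @ x)                # reservoir update
    states[t] = x

# Ridge-regression readout (the only trained weights).
W_out = np.linalg.solve(states.T @ states + 1e-6 * np.eye(n_res), states.T @ target)
print(np.mean((states @ W_out - target) ** 2))            # small training error
```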

4. Long/Short Term Memory Network



Hochreiter & Schmidhuber (1997) built long short-term memory (LSTM) networks to solve the problem of getting an RNN to remember things for a long time. They designed a memory cell using multiplicative logistic gates: as long as the “write” gate is kept open, information is written into and kept in the cell, and opening the “read” gate retrieves it.

RNNs can read cursive handwriting. The input is a sequence of pen-tip coordinates (x, y, p), where p indicates whether the pen is up or down, and the output is a sequence of characters; a sequence of small images can also be used as input instead of pen coordinates. Graves & Schmidhuber (2009) showed that RNNs with LSTM are currently the best systems for reading cursive writing.
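As an illustrative sketch only (the sizes, character set, and model are invented, and this is not Graves & Schmidhuber’s system), here is a small PyTorch LSTM that maps a sequence of pen coordinates (x, y, p) to per-step character scores.

```python
# A toy LSTM for sequences of pen coordinates.
import torch
import torch.nn as nn

class HandwritingReader(nn.Module):
    def __init__(self, n_chars=30, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.readout = nn.Linear(hidden, n_chars)

    def forward(self, pen):                  # pen: (batch, time, 3)
        h, _ = self.lstm(pen)                # gated memory cells carry information
        return self.readout(h)               # (batch, time, n_chars)

logits = HandwritingReader()(torch.randn(2, 100, 3))   # two fake pen trajectories
print(logits.shape)                                     # torch.Size([2, 100, 30])
```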



 

5. Hopfield Networks

Nonlinear recurrent networks can behave in many ways that are hard to analyze: they can settle to a stable state, oscillate, or follow chaotic trajectories. A Hopfield network is composed of binary threshold units with recurrent connections. In 1982, John Hopfield realized that if the connections are symmetric, there is a global energy function: each binary “configuration” of the whole network has an energy, and the binary threshold decision rule causes the network to settle into a minimum of this energy function. A neat way to use this kind of computation is to treat memories as energy minima of the network. Using energy minima to represent memories gives a content-addressable memory: an item can be accessed by knowing only part of its content.



Each time a configuration is memorized, the hope is to create a new energy minimum; but nearby minima can interfere with each other, which limits the capacity of a Hopfield network. Elizabeth Gardner showed that there is a better storage rule that uses the full capacity of the weights: instead of trying to store vectors in one shot, cycle through the training set many times and use the perceptron convergence procedure to train each unit to have the correct state given the states of all the other units in that vector.
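The storage and recall mechanics are short enough to sketch directly. The following toy example (not from the article) stores a few random binary patterns with the simple Hebbian rule, then recovers a corrupted pattern by repeatedly applying the binary threshold decision rule, which never increases the energy.

```python
# A minimal Hopfield network: Hebbian storage of +1/-1 patterns,
# asynchronous threshold updates, and the energy E = -1/2 * s^T W s.
import numpy as np

rng = np.random.default_rng(0)
patterns = np.sign(rng.normal(size=(3, 50)))           # three random memories

W = sum(np.outer(p, p) for p in patterns) / len(patterns)
np.fill_diagonal(W, 0)                                 # no self-connections

def energy(s):
    return -0.5 * s @ W @ s

s = patterns[0].copy()
s[:10] *= -1                                           # corrupt the first memory
for _ in range(5):                                     # asynchronous updates
    for i in rng.permutation(len(s)):
        s[i] = 1 if W[i] @ s >= 0 else -1              # binary threshold rule
print(np.array_equal(s, patterns[0]), energy(s))       # usually recovers the memory
```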

6. Boltzmann Machine Network

A Boltzmann machine is a type of stochastic recurrent neural network and can be seen as the stochastic, generative counterpart of the Hopfield network. It was one of the first neural networks capable of learning internal representations. The learning algorithm aims to maximize the product of the probabilities that the machine assigns to the binary vectors in the training set, which is equivalent to maximizing the sum of their log probabilities. The procedure is as follows: (1) with no external input, let the network settle to its stationary distribution; (2) then sample the visible vector each time.

In 2012, Salakhutdinov and Hinton described an efficient mini-batch learning procedure for Boltzmann machines. In 2014, the model was updated as the restricted Boltzmann machine; see the original article for details.
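For readers who want something concrete, here is a minimal restricted Boltzmann machine sketch trained with one step of contrastive divergence (CD-1). The data are fake, biases are omitted for brevity, and this is an illustrative approximation of sampling-based learning rather than Salakhutdinov and Hinton’s implementation.

```python
# A toy RBM trained with CD-1 (biases omitted for brevity).
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.normal(size=(n_visible, n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

data = rng.integers(0, 2, size=(100, n_visible)).astype(float)   # fake binary data

for epoch in range(50):
    for v0 in data:
        ph0 = sigmoid(v0 @ W)                               # P(h=1 | v0)
        h0 = (rng.random(n_hidden) < ph0).astype(float)     # sample hidden units
        v1 = sigmoid(W @ h0)                                # reconstruction
        ph1 = sigmoid(v1 @ W)
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))   # CD-1 update
print(W.round(2))
```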

7. Deep Belief Network



Backpropagation is the standard method for computing, after a batch of data has been processed, how much each neuron contributed to the error of an artificial neural network. But there are several problems with it. First, it requires labeled training data, while almost all data is unlabeled. Second, the learning time scales poorly, which means that networks with many hidden layers are very slow to train. Third, it can get stuck in poor local optima. For deep networks, plain backpropagation is therefore not good enough.
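To make the procedure concrete, here is a minimal sketch (invented sizes, a single training case) of backpropagation in a tiny two-layer network: the forward pass computes activities, and the backward pass sends the output error back through the layers to obtain each weight’s gradient.

```python
# Manual backpropagation in a two-layer network on one training case.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), np.array([1.0])              # one training case
W1, W2 = rng.normal(scale=0.1, size=(4, 3)), rng.normal(scale=0.1, size=(1, 4))

for step in range(100):
    h = np.tanh(W1 @ x)                                 # forward pass
    out = W2 @ h
    err = out - y                                       # squared-error derivative
    grad_W2 = np.outer(err, h)                          # backward pass
    grad_h = W2.T @ err
    grad_W1 = np.outer(grad_h * (1 - h ** 2), x)        # tanh' = 1 - tanh^2
    W1 -= 0.1 * grad_W1                                 # gradient descent update
    W2 -= 0.1 * grad_W2
print(out[0])                                           # approaches the target 1.0
```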



Unsupervised learning overcomes these limitations of backpropagation. Gradient methods are still used to adjust the weights, which keeps the procedure efficient and simple, but the goal becomes modeling the structure of the sensory input: the weights are adjusted to maximize the probability that a generative model would have produced the sensory input. A belief network is a directed acyclic graph of stochastic variables; we can infer the states of the unobserved variables and adjust the interactions between variables so that the network is more likely to generate the training data.

Early graphical models were built by experts who defined the graph structure and the conditional probabilities. These graphs were sparsely connected, and the focus was on performing correct inference rather than on learning. For neural networks, learning is the focus, and interpretability or sparse connectivity to make inference easy is not the goal.

8. Deep Auto-encoders

Deep auto-encoders have always looked like a very good way to do nonlinear dimensionality reduction: they provide mappings in both directions (encoding and decoding), the learning time scales linearly (or better) with the number of training cases, and the resulting encoding model is compact and fast. However, deep auto-encoders are hard to optimize with backpropagation: with small initial weights, the backpropagated gradients vanish. The remedy is unsupervised layer-by-layer pre-training, or careful initialization of the weights as in echo state networks.

Three different types of shallow auto-encoders have been used for the pre-training task: (1) RBMs used as auto-encoders; (2) denoising auto-encoders; (3) contractive auto-encoders. For datasets without many labels, pre-training helps the subsequent discriminative learning. For very large labeled datasets, unsupervised pre-training is not necessary to initialize the weights, even in deep networks. Pre-training was the first good way to initialize the weights of deep networks; there are now other ways, but if networks are made much larger, pre-training will be needed again.
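Below is a minimal deep auto-encoder sketch in PyTorch, with invented layer sizes. It shows the two mappings (encoder and decoder); the layer-by-layer pre-training discussed above is omitted, so the weights are simply left at the framework’s default initialization.

```python
# A small deep auto-encoder: encoder -> compact code -> decoder.
import torch
import torch.nn as nn

class DeepAutoencoder(nn.Module):
    def __init__(self, n_in=784, code=30):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 256), nn.ReLU(),
            nn.Linear(256, code),                      # compact, fast code
        )
        self.decoder = nn.Sequential(
            nn.Linear(code, 256), nn.ReLU(),
            nn.Linear(256, n_in),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.rand(8, 784)                                 # eight fake 28x28 images
loss = nn.functional.mse_loss(DeepAutoencoder()(x), x) # reconstruction error
print(loss.item())
```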

Conclusion

The traditional approach to programming is to tell the computer exactly what to do, breaking big problems down into many small, precisely defined tasks that the computer can easily perform. A neural network, by contrast, is not told how to solve a problem; it learns from the data it observes and finds its own solution.



This article was recommended by Beijing Post’s @Love coco – Love life and translated by the Alibaba Cloud Yunqi Community.
