Neural network development recipes

New Year’s Day: The official account of dove has finally reopened after a long time. This time, the theme of the official account will be about deep learning, referring to the development experience of Professor Andrej and some big names. Firstly, the overall architecture of neural network will be discussed theoretically and technically. This section will describe the development system of neural network model construction for learning.

1. Read the guide

There are a lot of loopholes when training neural network, it is not a simple switch in our cognition, can be used. In many cases, although the wrong network model is constructed (the training image forgets to detect the inverted image, the autoregressive model takes the data it predicts as input, or the configuration of weights, regularization, etc.); Most of the time it’s still training and we can’t detect what’s wrong with it. So the most important thing to successfully develop a neural network is a complete system, patience, and attention to detail.

your_data = # Import your dataset
model = SuperCrossValidator(SuoerDuper.fit, your_data, ResNet50, SGDOptimizer) # Set up your network
Copy the code

We think it’s easy to start training neural networks. Because many libraries and frameworks allow us to solve our data problems in as little as 20 or 30 lines of code. This creates a false impression that a lot of things are plug and play. In fact, neural networks are not, and when we deviate from training ImageNet classifier, it is not an off-the-shelf technology. If you don’t understand how the technology works, there will be a lot of unexpected failures…

2. Silent failure of neural network training

When we misconfigure code, we usually get some exceptions. **You plugged in an integer where something expected a string. The function only expected 3 arguments. This import Failed. That key does not exist. The number of elements in The two lists isn’t equal. This is just the beginning of training the neural network. Some code may be syntactically correct, but it may not be correct in the whole network, and these problems are hard to find. For example, back propagation is a leaky abstraction, and trying to ignore how it works will fail to deal with its problems, and the neural network model built and debugged will be much less effective.

Such as:

When the gradient on the Sigmoid disappears, the nonlinearity may saturate completely and stop learning resulting in training losses that are flat and reject downward. Perhaps because your weight initialization is too large, the output of the matrix multiplication has a wide range, where z*(1-z) is the local gradient of the Sigmoid nonlinear, thus making the gradients of x and W both 0.

ReLU: Non-linear ReLU sets the neuron threshold to 0. The forward and backward delivery cores of the full connection layer using ReLU include:
```
Maximum (0, np.dot(W, x)) # dW = Np.outer (z > 0, x) # dW = np.dot(W, x) #Copy the code
```
If one of its neurons is observed to be set to 0 in the forward pass (i.e. Z = 0, it will not fire), then its weight will be zero gradient. This is known as the Dying ReLUs problem, where if a ReLU neuron is unfortunately initialized it will never set off, or the weight of a neuron will be eliminated by a large update during training to the mechanism, and the neuron will “die permanently”. It’s like a permanent, irreversible brain injury. These neurons never open for any instance in our entire training set and will remain dead forever.
Gradient explosion in RNN: Refer to CS231n for an example shown below:

This RNN expands T time cloth. When we look at the effect of back propagation, we see that the total time of the gradient signal through all the hidden states and back propagation is multiplied by the same matrix (recursive Whh) and interspersed with nonlinear back propagation. When we take a genus A and start multiplying it by another number B (i.e. a * B * B * B * B…) . If | | b * * * * < 1, so the sequence into either 0, or | | b * * * * > 1 blast to infinity. The same thing happens with the back-propagation of RNN (except that B is a matrix rather than a number).

Everything that might be syntactically correct, the constructed neural network is very bad. This is a very annoying problem, perhaps because you forgot to flip the label while flipping the image left and right in the data enhancement section. Our network still works fine, because our network learns internally to detect flipped images, and then it flips the predicted values left and right. Or in an autoregressive model, the predicted thing is the input. Or, if we try to crop gradients, we crop losses and ignore outliers during training and so on, we’re lucky if the model we’re building is wrong, because it trains most of the time, just badly…

3. Develop recipes

In view of the above problems, if we want to apply neural networks to a new problem, we should build a process system. Pay attention to its rules, build from simple to complex, make specific assumptions about what’s going to happen out there, test them experimentally or visualize them until we find some problems. If we were to test unproven models, it would take a long time to find the problem. Start describing the entire development process.

3.1 Data desensitization

The first step in training a neural network does not involve touching any neural network code, but rather starts with a thorough examination of the data. This step is very critical. Often, a certain step in data processing will affect the experimental results to a certain extent. Examine data repeatability issues, corrupted image labels, data imbalances, and consider how to define the classification process. We need to know whether the local or global characteristics of the sample can be preprocessed and averaged. Image noise problem. Once we have a good grasp of the data, we can search/filter/sort the data we need (label types) and visualize their distribution by observing the outliers on each axis, which may affect the quality of the data or some errors in the preprocessing.

3.2 Build a complete training-evaluation framework

Once the data is processed, the next stage is to build the complete train-evaluation framework and verify its reliability through a series of experiments. We can start with some simple models, or very small networks (models that are not prone to error) for training, visualizing loss, accuracy, model prediction, and conducting ablation experiments using explicit assumptions in the process.