Author | Luke Newman  Compiled by | VK  Source | Towards Data Science

Multilayer perceptron

A multilayer perceptron (MLP) is an artificial neural network (ANN) consisting of an input layer, one or more hidden layers, and a final layer called the output layer. In general, the layers close to the input layer are called the lower layers, and the layers close to the output layer are called the upper layers. Every layer except the output layer includes a bias neuron and is fully connected to the next layer.

When an ANN contains a deep stack of hidden layers, it is called a deep neural network (DNN).

In this article, we will train a deep MLP on the Fashion MNIST dataset and reach over 85% accuracy by using an exponentially increasing learning rate, plotting the loss, and finding the point where the loss starts to climb. As best practices, we will implement early stopping, save checkpoints, and plot learning curves with TensorBoard.

You can find the Jupyter notebook here: github.com/lukenew2/le…

Exponential learning rate

The learning rate is arguably the most important hyperparameter. In general, the optimal learning rate is about half of the maximum learning rate (that is, the learning rate above which the training algorithm diverges). One way to find a good learning rate is to train the model for a few hundred iterations, starting with a very low learning rate (e.g., 1e-5) and gradually increasing it to a very large value (e.g., 10).

This is achieved by multiplying the learning rate by a constant factor at each iteration. If you plot the loss as a function of the learning rate, you should first see it declining. But after a while, the learning rate becomes too large, so the loss shoots back up: the optimal learning rate will be a bit below that turning point. You can then reinitialize your model and train it normally with that good learning rate.
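
To give a feel for the arithmetic (the bounds are the ones mentioned above, but the iteration count of 500 is just an illustrative assumption), the constant factor that takes the learning rate from 1e-5 to 10 over 500 iterations can be computed like this:

# Illustrative sketch: grow the learning rate from 1e-5 to 10 over ~500 iterations.
min_rate, max_rate, n_iterations = 1e-5, 10.0, 500
factor = (max_rate / min_rate) ** (1 / n_iterations)  # constant per-iteration multiplier
print(factor)  # roughly 1.028, i.e. about a 2.8% increase per iteration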

Keras model

Let's import the relevant libraries first:

import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

PROJECT_ROOT_DIR = "."
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images")
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
import tensorflow as tf
from tensorflow import keras

Next, load the data set

(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

X_train.shape

X_train.dtype

Normalize the pixel values:

X_valid, X_train = X_train[:5000] / 255.0, X_train[5000:] / 255.0
y_valid, y_train = y_train[:5000], y_train[5000:] 
X_test = X_test / 255.0

Let's take a quick look at some sample images from the dataset to get a sense of the complexity of the classification task:

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

n_rows = 4
n_cols = 10
plt.figure(figsize=(n_cols * 1.2, n_rows * 1.2))
for row in range(n_rows):
    for col in range(n_cols):
        index = n_cols * row + col
        plt.subplot(n_rows, n_cols, index + 1)
        plt.imshow(X_train[index], cmap="binary", interpolation="nearest")
        plt.axis('off')
        plt.title(class_names[y_train[index]], fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
save_fig('fashion_mnist_plot', tight_layout=False)
plt.show()

We are ready to build our MLP with Keras. Here is a classification MLP with two hidden layers:

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")])

Let’s look at this code line by line:

  • First, we create a Sequential model, the simplest Keras model for neural networks, which consists of just a bunch of sequentially connected layers.

  • Next, we build the first layer and add it to the model. It is a Flatten layer whose purpose is to convert each input image into a 1D array of 784 values (conceptually, reshaping each 28 × 28 image into a flat vector). Because it is the first layer of the model, you should specify the input shape. Alternatively, you can add keras.layers.InputLayer as the first layer and set its input_shape=[28, 28].

  • Next, we add a hidden layer of 300 neurons and specify that it uses the ReLU activation function. Each Dense (fully connected) layer manages its own weight matrix, containing all connection weights between its neurons and their inputs. It also manages a vector of bias terms, one per neuron (you can check the resulting parameter counts with the model summary sketched just after this list).

  • We then add a second hidden layer of 100 neurons, again using the ReLU activation function.

  • Finally, we add an output layer with 10 neurons, using the softmax activation function (because we are performing a classification task with mutually exclusive classes).
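
A quick sanity check of the architecture described above is to print the model's summary, which lists each layer with its output shape and parameter count (a minimal sketch using the model we just built):

model.summary()  # prints each layer's output shape and number of parameters

For instance, the first Dense layer should report 784 × 300 + 300 = 235,500 parameters: one weight per input for each of its 300 neurons, plus one bias term per neuron.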

Use a callback

In Keras, the fit() method takes a callbacks argument that lets you specify a list of objects that Keras will call at the start and end of training, at the start and end of each epoch, and even before and after processing each batch.

To get an exponentially increasing learning rate, we need to create our own custom callback. Our callback takes one parameter: the factor by which the learning rate is multiplied. To plot the loss as a function of the learning rate, we record the rate and the loss at each batch.

Note that we define the on_batch_end() method because that is what suits our goal; a callback can also implement on_train_begin(), on_train_end(), on_batch_begin(), and so on. For our use case, we want to increase the learning rate and record the loss after each batch:

K = keras.backend

class ExponentialLearningRate(keras.callbacks.Callback):
    def __init__(self, factor):
        self.factor = factor
        self.rates = []   # learning rate recorded at the end of each batch
        self.losses = []  # loss recorded at the end of each batch
    def on_batch_end(self, batch, logs):
        # Record the current rate and loss, then scale the rate up for the next batch.
        self.rates.append(K.get_value(self.model.optimizer.lr))
        self.losses.append(logs["loss"])
        K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)

Now that our model is created, we simply call its compile() method to specify the loss function and the optimizer to use. Optionally, we can also specify a list of extra metrics to compute during training and evaluation.

First, we use the "sparse categorical cross-entropy" loss because we have sparse labels (that is, for each instance there is just a target class index, in our case from 0 to 9) and the classes are mutually exclusive. Next, we specify stochastic gradient descent as the optimizer, initialize the learning rate to 1e-3, and increase it by 0.5% at each iteration:

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(lr=1e-3),
              metrics=["accuracy"])
expon_lr = ExponentialLearningRate(factor=1.005)

Now let's train the model for just one epoch:

history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid),
                    callbacks=[expon_lr])

We can now plot the loss as a function of the learning rate:

plt.plot(expon_lr.rates, expon_lr.losses)
plt.gca().set_xscale('log')
plt.hlines(min(expon_lr.losses), min(expon_lr.rates), max(expon_lr.rates))
plt.axis([min(expon_lr.rates), max(expon_lr.rates), 0, expon_lr.losses[0]])
plt.xlabel("Learning rate")
plt.ylabel("Loss")
save_fig("learning_rate_vs_loss")

As we would expect, the loss initially decreases gradually as the learning rate increases. But after a while, the learning rate becomes so large that the loss shoots back up: the optimal learning rate will be a bit below the point at which the loss starts to climb (typically about 10 times lower than the turning point). We can now reinitialize our model and train it normally with a good learning rate.
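
One rough way to turn this rule of thumb into code (just a sketch using the rates and losses recorded by our callback above; dividing by 10 is the heuristic from the text, not an exact rule):

# Take the learning rate at which the recorded loss was lowest, then back off by a factor of 10.
best_index = int(np.argmin(expon_lr.losses))
suggested_rate = expon_lr.rates[best_index] / 10
print(suggested_rate)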

There are more learning rate tips, including creating a learning rate schedule, that I hope to cover in a future article, but it is just as important to have an intuitive understanding of how to pick a good learning rate by hand.

Our loss starts to climb back up around 3e-1, so let's try 2e-1 as our learning rate:

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(lr=2e-1),
              metrics=["accuracy"])

Use TensorBoard for visualization

TensorBoard is a great interactive visualization tool that you can use to view learning curves during training, compare learning curves across runs, visualize the computational graph, analyze training statistics, view images generated by the model, visualize complex multidimensional data projected down to 3D and automatically clustered, and much more! The tool is installed automatically when you install TensorFlow, so you should already have it.

Let's start by defining the root log directory that we will use for our TensorBoard logs, plus a small function that generates a subdirectory path based on the current time so that it is different on every run. You may want to include extra information in the log directory name, such as the values of the hyperparameters you are testing, to make it easier to know what you are looking at in TensorBoard:

root_logdir = os.path.join(os.curdir, "my_logs")

def get_run_logdir():
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()  # e.g., './my_logs/run_2020_07_31-15_15_22'

The Keras API provides a TensorBoard() callback. The TensorBoard() callback takes care of creating the log directory and, during training, creating event files and writing summaries to them (summaries are binary data records that TensorBoard reads to build its visualizations).

There is one directory per run, each containing one subdirectory for training logs and one for validation logs. Both contain event files, but the training logs also include profiling traces: these let TensorBoard show exactly how much time the model spent on each part of the graph, across all your devices, which is great for locating performance bottlenecks (a quick way to inspect what gets written is sketched after the training code below).

early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)
checkpoint_cb = keras.callbacks.ModelCheckpoint("my_fashion_mnist_model.h5", save_best_only=True)
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[early_stopping_cb, checkpoint_cb, tensorboard_cb])
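
If you are curious what the callback actually wrote to disk (a minimal sketch; the run and file names will differ on your machine), you can walk the log directory after training:

# List every file TensorBoard will read under the root log directory.
for dirpath, dirnames, filenames in os.walk(root_logdir):
    for filename in sorted(filenames):
        print(os.path.join(dirpath, filename))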

Next, we need to start the TensorBoard server. We can do this directly in Jupyter by running the following commands. The first line loads the TensorBoard extension, and the second starts a TensorBoard server on port 6004 and connects to it:

%load_ext tensorboard
%tensorboard --logdir=./my_logs --port=6004

You should now be able to see TensorBoard's web interface. Click the SCALARS tab to see the learning curves. In the lower left, select the logs you want to visualize (for example, the training logs from the first run), then click the epoch_loss scalar. Notice that the training loss went down smoothly during training.

You can also visualize the whole graph, the learned weights (projected to 3D), or the profiling traces. The TensorBoard() callback also has options for logging extra data, such as embeddings for NLP datasets.
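
As an aside, TensorFlow also exposes a lower-level summary API that you can use to log your own values for TensorBoard outside of the Keras callback (a sketch; the scalar name and values here are purely illustrative):

# Write a custom scalar that TensorBoard can display, under its own run directory.
test_logdir = get_run_logdir()
writer = tf.summary.create_file_writer(test_logdir)
with writer.as_default():
    for step in range(1, 101):
        tf.summary.scalar("my_scalar", np.sin(step / 10), step=step)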

This is actually a very useful visualization tool.

Conclusion

Here we reach 88% accuracy, which is about the best we can achieve with a plain deep MLP. If we want to improve performance further, we can try convolutional neural networks (CNNs), which are very effective on image data.
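
The accuracy above is measured on the held-out test set. A minimal way to reproduce that kind of check (assuming the checkpoint file saved by ModelCheckpoint earlier; the exact numbers will vary from run to run):

# Roll back to the best model saved during training, then evaluate it on the test set.
model = keras.models.load_model("my_fashion_mnist_model.h5")
model.evaluate(X_test, y_test)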

That’s enough for our purposes. We learned how to:

  • Build a deep MLP using Keras’s Sequential API.

  • Find a good learning rate by increasing the learning rate exponentially, plotting the loss, and finding the point where the loss starts to climb.

  • Apply best practices when building deep learning models, such as using callbacks and visualizing learning curves with TensorBoard.

If you'd like to see the full code and instructions in a Jupyter notebook, feel free to check out the GitHub repository: github.com/lukenew2/le…

Additional resources

www.tensorflow.org/tensorboard…

Towardsdatascience.com/learning-ra…

The original link: towardsdatascience.com/learning-ra…
