5 Step Life-cycle for Neural Network Models in Keras

By Jason Brownlee

Translation from: The Gold Project

Permalink to this article: github.com/xitu/gold-m…

Translator: lsvih

Proofreader: CACppuccino

Deep neural networks are easy to create and evaluate in Keras, but you must follow a strict sequence of steps to build the model.

In this article, we will explore the creation, training, and evaluation of deep neural networks in Keras step by step, and learn how to use the trained model for prediction.

By the end of this article you will know:

How to define, compile, train, and evaluate a deep neural network in Keras.
How to choose standard defaults for regression and classification predictive modeling problems.
How to use Keras to develop and run your first multi-layer perceptron network.

March 2017 update: The example was updated to Keras 2.0.2 / TensorFlow 1.0.1 / Theano 0.9.0.

[Image: Five steps to build a neural network in Keras. Photo copyright Martin Stitchener.]

Overview

Here is a summary of the 5 steps we will introduce to build a neural network model in Keras.

Define the network.
Compile the network.
Train the network.
Evaluate the network.
Make predictions.

Step 1: Define the network

The first thing to do is to define your neural network.

In Keras, a neural network is defined as a sequence of layers. The container for these layers is the Sequential class.

The first step is to create an instance of the Sequential class. You can then create the network layers you need in the order in which the layers are connected.

For example, we can do this in two steps:

from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(2))

Alternatively, we can define the model by creating a list of layers and passing it to the Sequential constructor.

layers = [Dense(2)]
model = Sequential(layers)

The first layer in the network must define the number of inputs to expect. How this is specified depends on the type of model you are building, but for the multi-layer perceptron model in this article we will specify it with the input_dim argument.

For example, we want to define a small multilayer perceptron model that has two inputs in the visible layer, five neurons in the hidden layer, and one neuron in the output layer. This model can be defined as follows:

model = Sequential()
model.add(Dense(5, input_dim=2))
model.add(Dense(1))

You can think of the sequential model as a pipeline, feeding data from one end and getting predictions from the other.

Keras lets you split concerns that are normally bundled into a single layer and add each of them to the model as its own layer, which makes clear the role each one plays in transforming data from input to prediction. For example, the activation function that transforms the summed signal from each neuron can be extracted and added to the Sequential model as a layer-like object called Activation.

model = Sequential()
model.add(Dense(5, input_dim=2))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
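To make the pipeline idea concrete, here is a minimal pure-Python sketch (not Keras code) in which weighted sums and activations are separate steps applied in order. The helper names `dense`, `activation`, and `run_pipeline`, and the hand-picked weights, are illustrative assumptions; a real Dense layer learns its weights during training.

```python
import math

def dense(weights, biases):
    """Return a step that computes one weighted sum plus bias per neuron."""
    def step(inputs):
        return [sum(w * x for w, x in zip(row, inputs)) + b
                for row, b in zip(weights, biases)]
    return step

def activation(fn):
    """Return a step that applies fn to every value it receives."""
    def step(inputs):
        return [fn(v) for v in inputs]
    return step

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def run_pipeline(steps, inputs):
    """Feed inputs through each step in order, like layers in a Sequential model."""
    for step in steps:
        inputs = step(inputs)
    return inputs

# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
steps = [
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    activation(relu),
    dense([[1.0, 1.0]], [0.0]),
    activation(sigmoid),
]
output = run_pipeline(steps, [2.0, 1.0])
print(output)
```

Each entry in `steps` has one job, which mirrors how Dense and Activation are added to the model as separate layers.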

The choice of activation function for the output layer is particularly important because it determines the format of the predictions.

For example, here are some common types of predictive modeling problems, with the structure and standard activation function to use in the output layer:

Regression: the linear activation function “linear”, with the number of neurons matching the number of outputs.
Binary classification (2 classes): the logistic activation function “sigmoid”, with one neuron in the output layer.
Multiclass classification (more than 2 classes): the softmax activation function “softmax”, with one output neuron per class value, assuming a one-hot encoded output pattern.
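The difference between these output formats can be seen in a small pure-Python sketch. It uses the textbook definitions of the sigmoid and softmax functions, not Keras's own implementation, and the raw scores are made-up values standing in for the output of the last Dense layer.

```python
import math

def sigmoid(score):
    """Squash one raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the final layer, before activation.
regression_prediction = 3.7                      # 'linear': the value is used as-is
binary_probability = sigmoid(0.0)                # 'sigmoid': a single probability
class_probabilities = softmax([2.0, 1.0, 0.1])   # 'softmax': one probability per class

print(regression_prediction, binary_probability, class_probabilities)
```

A linear output is unbounded, a sigmoid output is a single value between 0 and 1, and a softmax output is a list of per-class values that sum to 1.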

Step 2: Compile the network

Once the network is defined, it must be compiled.

Compilation is an efficiency step. It transforms the simple sequence of layers we defined into a highly efficient series of matrix transforms, in a format that can be executed on your GPU or CPU depending on how Keras is configured.

You can think of compilation as a precompute step for your network.

Compilation is always required after defining a model, whether you then train it with an optimization scheme or load a set of pre-trained weights from a saved file, because the compilation step prepares an efficient representation of the network for your hardware. The same representation is used when making predictions.

The compilation step requires setting parameters specific to training your network. The two most important are the optimization algorithm used to train the network and the loss function, which evaluates the network and is the quantity the optimization algorithm minimizes.

The following example compiles a model defined for a regression problem, specifying the stochastic gradient descent (SGD) optimization algorithm and the mean squared error (MSE) loss function.

model.compile(optimizer='sgd', loss='mse')

The type of predictive modeling problem constrains the type of loss function that can be used.

For example, here are the standard loss functions for different types of predictive modeling problems:

Regression: mean squared error, “mse”.
Binary classification (2 classes): logarithmic loss, also called cross entropy, “binary_crossentropy”.
Multiclass classification (more than 2 classes): multiclass logarithmic loss, “categorical_crossentropy”.
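To show what these three losses measure, here is a pure-Python sketch using their textbook definitions. It is an illustration only: Keras computes the same quantities on tensors, and the clipping constant `eps` and the function names are assumptions chosen for this example.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average log loss for 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Average log loss for one-hot target rows and probability rows."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)

mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])
bce = binary_crossentropy([1, 0], [0.5, 0.5])
cce = categorical_crossentropy([[0, 1, 0]], [[0.2, 0.5, 0.3]])
print(mse, bce, cce)
```

In all three cases a lower value means the predictions are closer to the targets, which is why the optimizer tries to minimize the loss.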

You can review the full list of loss functions supported by Keras.

The most commonly used optimization algorithm is stochastic gradient descent, but Keras supports several other optimization algorithms.

The following optimization algorithms are probably the most commonly used because their performance is generally good:

Stochastic gradient descent, “sgd”, which requires tuning of the learning rate and momentum.
ADAM, “adam”, which requires tuning of the learning rate.
RMSprop, “rmsprop”, which requires tuning of the learning rate.
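To see why the learning rate and momentum need tuning, here is a pure-Python sketch of the classical momentum update applied to a single weight. It minimizes the toy function f(w) = w**2 and is an illustration of the update rule, not the Keras optimizer itself; the function names and the chosen values are assumptions.

```python
def sgd_momentum_step(weight, velocity, gradient, learning_rate, momentum):
    """Apply one classical-momentum SGD update to a single weight."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

def minimize_square(learning_rate, momentum, steps=300, start=5.0):
    """Minimize f(w) = w**2, whose gradient is 2*w, starting from `start`."""
    weight, velocity = start, 0.0
    for _ in range(steps):
        weight, velocity = sgd_momentum_step(
            weight, velocity, 2.0 * weight, learning_rate, momentum)
    return weight

plain_sgd = minimize_square(learning_rate=0.1, momentum=0.0)
with_momentum = minimize_square(learning_rate=0.1, momentum=0.9)
# A learning rate that is too large makes every step overshoot the minimum.
too_large = minimize_square(learning_rate=1.5, momentum=0.0, steps=20)

print(plain_sgd, with_momentum, too_large)
```

The first two runs end close to the minimum at 0, while the third moves further away with every step.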

Finally, in addition to the loss function, you can specify metrics to collect while fitting the model. For classification problems, the most commonly collected metric is accuracy. The metrics to collect are specified by name in a list.

For example:

model.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])

Step 3: Train your network

After the network is compiled, you can train it. This process can also be viewed as adjusting weights to fit the training data set.

Training the network requires training data: a matrix of input patterns X and an array of matching output patterns y.

The network is trained with the backpropagation algorithm and optimized according to the optimization algorithm and loss function specified when compiling the model.

The backpropagation algorithm requires that the network be trained for a specified number of epochs, that is, complete passes over the training dataset.

Each epoch can be partitioned into groups of input-output pairs called batches. The batch size defines the number of input-output pairs the network sees before its weights are updated within an epoch. It is also an efficiency optimization, ensuring that not too many input-output pairs are loaded into memory (or GPU memory) at the same time.
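The relationship between epochs, batches, and weight updates is simple arithmetic, sketched below in pure Python. The figures (768 samples, batch size 10, 100 epochs) match the worked example later in this article; the helper names are illustrative assumptions.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Number of weight updates in one pass over the data (last batch may be short)."""
    return math.ceil(num_samples / batch_size)

def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

num_samples, batch_size, epochs = 768, 10, 100
per_epoch = updates_per_epoch(num_samples, batch_size)
total_updates = per_epoch * epochs
last_batch_size = num_samples - (per_epoch - 1) * batch_size

print(per_epoch, total_updates, last_batch_size)
```

A smaller batch size means more weight updates per epoch and less data held in memory at once, at the cost of more update steps.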

Here is a minimal example of fitting a network:

history = model.fit(X, y, batch_size=10, epochs=100)

Once the network has been fit, a History object is returned that summarizes the model’s performance during training, including the loss for each epoch and any metrics specified at compile time.
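The History object keeps these records in its `history` attribute, a dict that maps each metric name to a list with one value per epoch. The sketch below works on a plain dict standing in for `history.history`, so it runs without Keras; the values are copied from the last three epochs of the sample run shown later in this article, and the helper `best_epoch` is a hypothetical name. The accuracy key is written "acc" as in the Keras 2.0.x output; other versions may name it "accuracy".

```python
# Stand-in for history.history (assumption: same shape as the real attribute).
history_dict = {
    "loss": [0.5219, 0.5250, 0.5416],
    "acc": [0.7591, 0.7474, 0.7331],
}

def best_epoch(values, higher_is_better=False):
    """Return the zero-based index of the best value in a per-epoch list."""
    pick = max if higher_is_better else min
    return pick(range(len(values)), key=lambda i: values[i])

lowest_loss_epoch = best_epoch(history_dict["loss"])
highest_acc_epoch = best_epoch(history_dict["acc"], higher_is_better=True)

print(lowest_loss_epoch, highest_acc_epoch)
```

Scanning these lists is a quick way to check whether the loss was still improving when training stopped.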

Step 4: Evaluate the network

After the network training is completed, it can be evaluated.

The network can be evaluated on the training data, but this does not give a useful indication of its performance as a predictive model, because the network has already seen that data during training.

Instead, we can evaluate the network on a separate dataset that it did not see during training. This provides an estimate of how well the network will make predictions on unseen data in the future.
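One simple way to obtain such unseen data is to hold back part of the dataset before training. The following is a minimal pure-Python sketch; the function name `split_rows`, the 33% test fraction, and the fixed seed are assumptions made for illustration, not settings from this article.

```python
import random

def split_rows(rows, test_fraction=0.33, seed=7):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 (input, output) pairs
train_rows, test_rows = split_rows(rows)

print(len(train_rows), len(test_rows))
```

The network would then be fit on the training rows only, and the held-back rows would be used for evaluation.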

The model evaluates the loss across all input-output pairs in the test set, along with any other metrics specified when the model was compiled, such as classification accuracy. A list of evaluation metrics is returned.

For example, a model compiled with the accuracy metric can be evaluated on a new dataset as follows:

loss, accuracy = model.evaluate(X, y)

Step 5: Make predictions

Finally, if we are satisfied with the performance of the trained model, we can use it to make predictions about the new data.

This step is as simple as calling the predict() function directly on the model with a new set of inputs.

For example:

predictions = model.predict(x)

Predicted values are returned in a format defined by the network output layer.

For a regression problem, the predictions produced by a linear activation function may already be in the format the problem requires.

For a binary classification problem, the predictions are an array of probabilities, each indicating how likely the input is to belong to the positive class. These probabilities can be converted to 0s and 1s by rounding (for example, with K.round).

For a multiclass classification problem, the result is an array of probabilities for each input (assuming a one-hot encoded output variable), which may need to be converted to a single class prediction using the argmax function.
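Both conversions can be sketched in a few lines of pure Python. The 0.5 threshold, the helper names, and the sample probabilities are assumptions for illustration; with NumPy arrays the same result is usually obtained with its rounding and argmax functions.

```python
def to_binary_labels(probabilities, threshold=0.5):
    """Map each probability to class 1 if it reaches the threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def to_class_indices(probability_rows):
    """Pick the most probable class index for each row of class probabilities."""
    return [argmax(row) for row in probability_rows]

# Hypothetical sigmoid outputs for three inputs.
binary_labels = to_binary_labels([0.2, 0.5, 0.91])
# Hypothetical softmax outputs for two inputs and three classes.
class_indices = to_class_indices([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])

print(binary_labels, class_indices)
```

Note that an explicit threshold treats a probability of exactly 0.5 as class 1, whereas rounding functions may round 0.5 down to 0.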

End-to-End Worked Example

Let’s put it all together with a small example.

We will use the Pima Indians onset of diabetes binary classification problem as an example. You can download this dataset from the UCI Machine Learning repository.

The problem has eight input variables and an output class variable with a value of 0 or 1.

We will build a multi-layer perceptron neural network with 8 inputs in the visible layer, 12 neurons in the hidden layer with a rectifier activation function, and 1 neuron in the output layer with a sigmoid activation function.

We will train the network for 100 epochs with a batch size of 10, using the ADAM optimization algorithm and the logarithmic loss function.

Once fit, we will evaluate the model on the training data and then make standalone predictions for the training data. This is done for brevity; normally we would evaluate the model on a separate test dataset and make predictions for new data.

The complete code is as follows:

# Keras multilayer perceptron neural network example
from keras.models import Sequential
from keras.layers import Dense
import numpy
# Load data
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
X = dataset[:,0:8]
Y = dataset[:,8]
# 1. Define the network
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# 2. Compile the network
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# 3. Train your network
history = model.fit(X, Y, epochs=100, batch_size=10)
# 4. Evaluate the network
loss, accuracy = model.evaluate(X, Y)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))
# 5. Make predictions
probabilities = model.predict(X)
# Round each sigmoid probability to a 0/1 class label
predictions = numpy.round(probabilities).flatten()
accuracy = numpy.mean(predictions == Y)
print("Prediction Accuracy: %.2f%%" % (accuracy*100))

Running the sample gives the following output:

...
768/768 [==============================] - 0s - loss: 0.5219 - acc: 0.7591
Epoch 99/100
768/768 [==============================] - 0s - loss: 0.5250 - acc: 0.7474
Epoch 100/100
768/768 [==============================] - 0s - loss: 0.5416 - acc: 0.7331
 32/768 [>.............................] - ETA: 0s
Loss: 0.51, Accuracy: 74.87%
Prediction Accuracy: 74.87%

Conclusion

In this article, you explored the five-step life-cycle of a neural network built with the Keras library for deep learning.

Specifically, you learned:

How to define, compile, train, and evaluate a deep neural network in Keras.
How to choose standard defaults for regression and classification predictive modeling problems.
How to use Keras to develop and run your first multi-layer perceptron network.

Do you have any questions about neural network models in Keras, or suggestions for this article? Leave a comment and I’ll try to answer.
