Original link: tecdat.cn/?p=15850

Original source: Tuoduan Data Tribe (拓端数据部落) WeChat public account

 

In this article, you will discover how to develop, evaluate, and make predictions with standard deep learning models, including the multilayer perceptron (MLP), convolutional neural network (CNN), and recurrent neural network (RNN).

Develop a multi-layer perceptron model

The multi-layer perceptron model (MLP) is a standard fully connected neural network model.

It consists of layers of nodes, where each node is connected to all outputs of the previous layer and the output of each node is connected to all inputs of the nodes in the next layer.
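The idea of "every node connected to every output of the previous layer" can be sketched in a few lines of plain Python. This is only an illustration, not how Keras implements a Dense layer; the helper name dense_forward and all weights, biases, and inputs are made-up values.

```python
# Minimal sketch of one fully connected (dense) layer in plain Python.
# dense_forward and all numbers below are illustrative assumptions.

def relu(value):
    return max(0.0, value)

def dense_forward(inputs, weights, biases):
    """Each node sums every input times its own weight, adds a bias, applies ReLU."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(relu(total))
    return outputs

inputs = [1.0, 2.0, 3.0]          # outputs of the previous layer
weights = [[0.5, -1.0, 0.25],     # one row of weights per node in this layer
           [-0.5, 0.5, 0.5]]
biases = [0.0, 0.5]

hidden = dense_forward(inputs, weights, biases)
print(hidden)
```

The first node's weighted sum is negative, so ReLU clips it to zero, while the second node passes its positive sum through unchanged.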

An MLP is created from one or more Dense layers. This model is appropriate for tabular data, that is, data as it looks in a table or spreadsheet, with one column per variable and one row per observation. There are three predictive modeling problems you may want to explore with an MLP: binary classification, multiclass classification, and regression.

Let’s fit the model on a real data set for each case.

 

MLP for binary classification

We will use the Ionosphere binary (two-class) classification dataset to demonstrate an MLP for binary classification.

The dataset involves predicting whether a structure is present in the atmosphere given radar returns.

The dataset will be downloaded automatically using Pandas.

  • Ionospheric Data Set (CSV) 
  • Ionospheric Data Set Description (CSV) 

We will use a LabelEncoder to encode the string labels as the integer values 0 and 1. The model will be fit on 67% of the data, and the remaining 33% will be used for evaluation, split using the train_test_split() function.

It is good practice to use the 'relu' activation together with the 'he_normal' weight initialization. This combination goes a long way toward overcoming the problem of vanishing gradients when training deep neural network models.

The model predicts the probability of class 1 and uses the sigmoid activation function in the output layer.
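To make the role of the sigmoid output concrete, here is a small sketch in plain Python. The input scores and the 0.5 decision threshold are illustrative assumptions; the tutorial itself only reports the probability.

```python
from math import exp

def sigmoid(z):
    """Squash any real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

# Made-up raw scores from a hypothetical output node.
for z in (-4.0, 0.0, 4.0):
    p = sigmoid(z)
    label = 1 if p >= 0.5 else 0   # assumed threshold for turning a probability into a class
    print('score=%+.1f  P(class 1)=%.3f  label=%d' % (z, p, label))
```

Large positive scores map to probabilities near 1, large negative scores to probabilities near 0, and a score of zero sits exactly at 0.5.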

The code snippet is listed below.

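# mlp for binary classification
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# load the dataset (path is the location of the ionosphere CSV, which is not given in this excerpt)
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode string labels as the integers 0 and 1
y = LabelEncoder().fit_transform(y)
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# determine the number of input features
n_features = X_train.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)
# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test Accuracy: %.3f' % acc)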

 

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

Given the randomness of the learning algorithm, your specific results will vary. Try running the example a few times.

In this case, we can see that the model achieves about 94% classification accuracy and then predicts a probability of about 0.99 that the single row of data belongs to class 1.

(235, 34) (116, 34) (235,) (116,)
Test Accuracy: 0.940
Predicted: 0.991

 

MLP for multi-class classification

We will use the iris multiclass classification dataset to demonstrate the MLP for multiclass classification.

The problem involves predicting the species of an iris flower given measurements of the flower.

The dataset will be downloaded automatically using Pandas, but you can learn more from the links below.

  • Iris data Set (CSV)
  • Iris data Set Description (CSV)

Given that it is a multiclass classification problem, the model must have one node for each class in the output layer and use the softmax activation function. The loss function is 'sparse_categorical_crossentropy', which is appropriate for integer-encoded class labels (for example, 0 for one class, 1 for the next class, and so on).
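The following plain-Python sketch shows what the softmax activation and the sparse categorical cross-entropy loss compute for a single example. It is a simplified illustration with made-up scores, not the Keras implementation; the function names mirror the Keras names only for readability.

```python
from math import exp, log

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [exp(s - largest) for s in scores]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(true_class, probs):
    """Loss for one example whose label is an integer class index."""
    return -log(probs[true_class])

scores = [2.0, 0.5, -1.0]                  # made-up raw outputs, one per class
probs = softmax(scores)
predicted = probs.index(max(probs))        # same role as argmax in the tutorial
loss_if_class_0 = sparse_categorical_crossentropy(0, probs)
loss_if_class_2 = sparse_categorical_crossentropy(2, probs)
print(probs, predicted, loss_if_class_0, loss_if_class_2)
```

The loss is small when the true class receives a high probability and grows as the probability assigned to the true class shrinks, which is why integer labels are enough and no one-hot encoding is needed.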

Code snippets for fitting and evaluating MLP on iris data sets are listed below.

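# make a prediction (model and row come from the full listing, which is not reproduced here;
# asarray and argmax are imported from numpy)
yhat = model.predict(asarray([row]))
print('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))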

 

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

Given the randomness of the learning algorithm, your specific results will vary. Try running the example a few times.

In this case, we can see that the model achieves about 98% classification accuracy and then predicts the probability of the row of data belonging to each class, with class 0 having the highest probability.

(100, 4) (50, 4) (100,) (50,)
Test Accuracy: 0.980
Predicted: [[0.8680804 0.12356871 0.00835086]] (class=0)

MLP for regression

We will use the Boston housing regression dataset to demonstrate an MLP for regression predictive modeling.

The problem involves predicting house values based on properties of the house and the neighborhood.

The dataset will be downloaded automatically using Pandas, but you can learn more from the links below.

  • Boston Housing Data Set (CSV).
  • Description of Boston Housing Data set (CSV).

This is a regression problem that involves predicting a single numeric value. Therefore, the output layer has a single node and uses the default, or linear, activation function (no activation function). The mean squared error (MSE) loss is minimized when fitting the model.

 

# make a prediction (model comes from the full listing; asarray is imported from numpy)
row = [0.00632, 18.00, 2.310, 0, 0.5380, 6.5750, 65.20, 4.0900, 1, 296.0, 15.30, 396.90, 4.98]
yhat = model.predict(asarray([row]))
print('Predicted: %.3f' % yhat[0][0])

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single row of data.

Given the randomness of the learning algorithm, your specific results will vary. Try running the example a few times.

In this case, we can see that the model achieves an MSE of about 60, which is an RMSE of about 7.8. A value of about 27 is then predicted for the single example.

(339, 13) (167, 13) (339,) (167,)
MSE: 60.751, RMSE: 7.794
Predicted: 26.983
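The relationship between the MSE and RMSE figures reported above can be checked by hand: the RMSE is simply the square root of the MSE, which puts the error back in the units of the target. The sketch below uses a few hypothetical actual and predicted values (not taken from the dataset) plus the MSE value from the reported output.

```python
from math import sqrt

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values, chosen only to illustrate the arithmetic.
actual = [24.0, 21.6, 34.7]
predicted = [26.0, 20.6, 30.7]
mse = mean_squared_error(actual, predicted)
print('MSE: %.3f, RMSE: %.3f' % (mse, sqrt(mse)))

# The RMSE in the reported output follows directly from the reported MSE.
reported_rmse = sqrt(60.751)
print('RMSE from reported MSE: %.3f' % reported_rmse)
```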

 

Develop convolutional neural network model

Convolutional neural networks (CNNs) are a type of network designed for image input.

They are composed of models with convolutional layers that extract features (called feature maps) and pooling layers that distill features down to their most salient elements.

Although CNNs can be used for a variety of tasks that take images as input, they are best suited to image classification tasks.

A popular image classification task is MNIST handwritten digit classification. It involves tens of thousands of handwritten digits that must be classified as a number between 0 and 9.

The tf.keras API provides a convenience function to download and load this dataset directly.

The following example loads the dataset and draws the first few images.

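# plot the first 25 images in a 5x5 grid (trainX comes from the dataset loading step, not shown here)
for i in range(25):
    # define subplot
    pyplot.subplot(5, 5, i+1)
    # plot raw pixel data
    pyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray'))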

 

Running the example loads the MNIST dataset and then summarizes the default train and test datasets.

Train: X=(60000, 28, 28), y=(60000,)
Test: X=(10000, 28, 28), y=(10000,)

Then create a graph that shows a sample grid of handwritten images from the training dataset.

Plot of handwritten digits from the MNIST dataset

We can train a CNN model to classify the images in the MNIST dataset.

Note that the images are arrays of grayscale pixel data; therefore, a channel dimension must be added to the data before the images can be used as input to the model. The reason is that CNN models expect images in a channels-last format, that is, each example passed to the network has the dimensions [rows, columns, channels], where channels represents the color channels of the image data.

It is also a good idea to scale pixel values from the default range 0-255 to 0-1 when training CNN.
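The two preparation steps described above, adding a channel dimension and scaling pixels to the range 0-1, can be sketched with NumPy. Randomly generated arrays stand in for the MNIST images here, so the variable names and data are assumptions rather than the tutorial's own listing.

```python
import numpy as np

# Stand-in for MNIST-style data: 3 grayscale images of 28x28 pixels with values 0..255.
rng = np.random.default_rng(1)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Add a trailing channel dimension: (samples, rows, columns) -> (samples, rows, columns, 1).
x = images.reshape((images.shape[0], images.shape[1], images.shape[2], 1))

# Convert to floating point and scale pixel values from 0-255 to 0-1.
x = x.astype('float32') / 255.0

print(x.shape, x.dtype, x.min(), x.max())
```

The reshape does not change any pixel values; it only makes the single grayscale channel explicit so the data matches the channels-last layout.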

Code snippets for fitting and evaluating CNN models on MNIST datasets are listed below.

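# make a prediction (image is a single prepared example; asarray and argmax are imported from numpy)
yhat = model.predict(asarray([image]))
print('Predicted: class=%d' % argmax(yhat))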

Running the example first reports the shape of the dataset, then fits the model and evaluates it on the test dataset. Finally, a prediction is made for a single image.

First, the shape of each image is reported along with the number of classes; we can see that each image is 28 by 28 pixels and that there are 10 classes.

In this case, we can see that the model achieves about 98.7% classification accuracy on the test dataset. We can then see that the model predicts class 5 for the first image in the training set.

(28, 28, 1) 10
Accuracy: 0.987
Predicted: class=5

 

Develop a recurrent neural network model

Recurrent neural networks (RNNs) are designed to operate on sequences of data.

They have proven very effective for natural language processing problems, where sequences of text are provided as input to the model. RNNs have also had some success in time series forecasting and speech recognition.

The most popular type of RNN is the Long Short-Term Memory network, or LSTM. LSTMs can be used in a model to accept a sequence of input data and make a prediction, such as assigning a class label or predicting a numeric value, for example the next value or multiple values in the sequence.

We will use the car sales dataset to demonstrate an LSTM RNN for univariate time series forecasting.

The problem involves predicting the number of cars sold each month.

The dataset will be downloaded automatically using Pandas, but you can learn more from the links below.

  • Automobile Sales Data set (CSV).
  • Description of automobile sales data set (CSV).

We will frame the problem to use a window of the last five months of data to predict the current month's value.

To achieve this, we will define a new function named split_sequence() that splits the input sequence into windows of data appropriate for fitting a supervised learning model, such as an LSTM.

For example, if the sequence was:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

 

The samples for training the model would then look as follows:

Input            Output
1, 2, 3, 4, 5    6
2, 3, 4, 5, 6    7
3, 4, 5, 6, 7    8
...
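The windowing shown in the table above can be sketched in plain Python. This is a simplified variant that works on lists, written only to reproduce the table for the sequence 1 to 10; it is not the NumPy-based listing used later in the article.

```python
def split_sequence(sequence, n_steps):
    """Split a sequence into (window of n_steps values, next value) pairs."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])   # the input window
        y.append(sequence[i + n_steps])     # the value that follows the window
    return X, y

sequence = list(range(1, 11))               # 1, 2, ..., 10
X, y = split_sequence(sequence, 5)
for window, target in zip(X, y):
    print(window, target)
```

A sequence of 10 values with a window of 5 yields 5 samples, because the last 5 values are the only ones that have a full window of history before them.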

We will use the last 12 months as our test data set.

LSTM expects each sample in the dataset to have two dimensions. The first is the number of time steps (in this case, 5) and the second is the number of observations per time step (in this case, 1).
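The reshaping and the 12-month hold-out can be illustrated with NumPy. The sketch below uses a synthetic series of 108 values as a stand-in for the real data; that length is an assumption, chosen because it yields the same array shapes as the output reported later in this section.

```python
import numpy as np

n_steps, n_test = 5, 12

# Synthetic stand-in for the monthly series (assumed length: 108 observations).
series = np.arange(108, dtype='float32')

# Build the sliding windows and their targets.
X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
y = series[n_steps:]

# Reshape to [samples, time steps, features]; one observation per time step.
X = X.reshape((X.shape[0], X.shape[1], 1))

# Hold back the last 12 windows as the test set.
X_train, X_test = X[:-n_test], X[-n_test:]
y_train, y_test = y[:-n_test], y[-n_test:]
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

The split is done by position rather than at random, so the test set always contains the most recent observations.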

Given that this is a regression problem, we will use a linear activation function (no activation function) in the output layer and optimize the mean squared error loss function. We will also evaluate the model using the mean absolute error (MAE) metric.

Examples of fitting and evaluating LSTM for univariate time series prediction problems are listed below.

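# lstm for time series forecasting
from numpy import sqrt
from numpy import asarray
from pandas import read_csv
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return asarray(X), asarray(y)

# load the dataset (path is the location of the car sales CSV, which is not given in this excerpt);
# .squeeze('columns') replaces the squeeze=True argument removed from read_csv in pandas 2.0
df = read_csv(path, header=0, index_col=0).squeeze('columns')
# retrieve the values
values = df.values.astype('float32')
# specify the window size
n_steps = 5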

Running the sample will first report the shape of the dataset, then fit the model and evaluate it on the test dataset. Finally, a single example is predicted.

Given the randomness of the learning algorithm, your specific results will vary. Try running the example a few times.

 

In this case, the model achieved an MAE of about 2,856 and predicted the next value in the sequence from the test set as 13,199, where the expected value is 14,577 (reasonably close).

(91, 5, 1) (12, 5, 1) (91,) (12,)
MSE: 12755421.000, RMSE: 3571.473, MAE: 2856.084
Predicted: 13199.325

Note: It is good practice to difference the data and make it stationary before fitting the model.

How to use advanced model features

In this section, you’ll discover how to use some of the slightly more advanced model features, such as viewing the learning curve and saving the model for later use.

How to visualize deep learning models

The architecture of deep learning models can quickly become large and complex.

Therefore, it is important to have a clear idea of the connections and data flow in your model. This is especially important if you are using the functional API, to ensure that the layers of the model are connected the way you intended.

You can use two tools to visualize your model: a text description and a plot.

Model text description

A text description of your model can be displayed by calling the summary() function on the model.

The following example defines a small model with three layers and then summarizes its structure.

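# example of summarizing a model
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(8,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1, activation='sigmoid'))
# summarize the model
model.summary()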

Running the example prints the summary for each layer as well as the total summary.

This is a helpful diagnostic for checking the output shapes and the number of parameters (weights) in your model.

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 10)                90
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 88
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 9
=================================================================
Total params: 187
Trainable params: 187
Non-trainable params: 0
_________________________________________________________________
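The parameter counts in the summary above can be reproduced by hand: a fully connected layer has one weight per input for every node, plus one bias per node. The helper name dense_params below is made up for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return (n_inputs + 1) * n_units

# Layer widths from the example: 8 inputs -> 10 nodes -> 8 nodes -> 1 node.
layer_sizes = [8, 10, 8, 1]
counts = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)
```

Checking these numbers against the summary is a quick way to confirm that each layer received the input size you expected.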

 

Model architecture diagram

You can create a plot of your model by calling the plot_model() function.

This creates an image file containing a box-and-line diagram of the layers in your model.

The following example creates a small three-layer model and saves a plot of the model architecture, including input and output shapes, to 'model.png'.

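# summarize the model as a plot (plot_model is imported from tensorflow.keras.utils;
# model is the three-layer model defined as in the previous example)
plot_model(model, 'model.png', show_shapes=True)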

Running the example creates a model diagram that shows boxes for each layer with shape information, along with arrows connecting layers to show the flow of data across the network.

Neural network architecture diagram

How to plot model learning curves

Learning curves are plots of neural network model performance over time, such as calculated at the end of each training epoch.

Learning curves provide insight into the learning dynamics of the model, such as whether the model is learning well, whether it is underfitting the training dataset, or whether it is overfitting the training dataset.

You can easily create learning curves for your deep learning models.

First, you must update your call to the fit() function to include a reference to a validation dataset. This is a portion of the training set that is not used to fit the model, but is instead used to evaluate the performance of the model during training.

You can split the data manually and specify the validation_data argument, or you can use the validation_split argument and specify a percentage split of the training dataset and let the API perform the split for you. The latter is simpler for now.
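According to the Keras documentation, validation_split holds back the last fraction of the samples, taken before any shuffling. The plain-Python sketch below illustrates that idea; the helper name validation_split_index and the exact rounding are assumptions, not the library's code.

```python
from math import floor

def validation_split_index(n_samples, validation_split):
    """Index at which the held-out tail of the data begins (assumed rounding)."""
    return int(floor(n_samples * (1.0 - validation_split)))

samples = list(range(1000))                 # stand-in for 1,000 training rows
split_at = validation_split_index(len(samples), 0.3)
train_part, val_part = samples[:split_at], samples[split_at:]
print(len(train_part), len(val_part))
print(val_part[0], val_part[-1])
```

Because the held-out rows come from the end of the arrays, data that is sorted (for example by class or by time) may need to be shuffled beforehand, or split manually and passed via validation_data.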

The fit() function returns a history object that contains a trace of the performance metrics recorded at the end of each training epoch. This includes the chosen loss function and each configured metric, such as accuracy, and each loss and metric is calculated for both the training and validation datasets.

A learning curve is a plot of the loss on the training dataset and the validation dataset. We can create this plot from the history object using the Matplotlib library.

The following example fits a small neural network on a synthetic binary classification problem. A validation split of 30% is used to evaluate the model during training, and the cross-entropy loss on the training and validation datasets is then plotted as a line plot.

# plot learning curves (history is the object returned by model.fit)
pyplot.title('Learning Curves')
pyplot.xlabel('Epoch')
pyplot.ylabel('Cross Entropy')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='val')
pyplot.legend()
pyplot.show()

Running the example fits the model on the dataset. At the end of the run, the history object is returned and used as the basis for creating the line plot.

The cross-entropy loss for the training dataset is accessed via the 'loss' key, and the loss on the validation dataset is accessed via the 'val_loss' key, on the history attribute of the history object.

Cross entropy loss learning curve of deep learning model

How to save and load the model

Training and evaluating models is great, but we may want to use a model later without retraining it each time.

This can be achieved by saving the model to a file, then later loading it and using it to make predictions.

You can save the model by calling the save() function on the model, and load it later with the load_model() function.

The model is saved in H5 format, an efficient array storage format. As such, you must ensure that the h5py library is installed on your workstation. This can be done using pip; for example:

pip install h5py

The following example fits a simple model on a synthetic binary classification problem and then saves the model to a file.

# example of saving a fit model
from sklearn.datasets import make_classification
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# create the dataset
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=1)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(1, activation='sigmoid'))
# compile the model
sgd = SGD(learning_rate=0.001, momentum=0.8)
model.compile(optimizer=sgd, loss='binary_crossentropy')
# fit the model
model.fit(X, y, epochs=100, batch_size=32, verbose=0, validation_split=0.3)
# save the model to file
model.save('model.h5')

Running the example fits the model and saves it to a file named 'model.h5'.

We can then load the model and use it for predictions, or continue to train it, or do whatever we want with it.

The following example loads the model and uses it to make predictions.

# example of loading a saved model
from numpy import asarray
from sklearn.datasets import make_classification
from tensorflow.keras.models import load_model
# create the dataset
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=1)
# load the model from file
model = load_model('model.h5')
# make a prediction
row = [1.91518414, 1.14995454, -1.52847073, 0.79430654]
yhat = model.predict(asarray([row]))
print('Predicted: %.3f' % yhat[0][0])

Running the example loads the model from file, then uses it to make a prediction on a new row of data and prints the result.

Predicted: 0.831

 

How to get better model performance

In this section, you’ll find some techniques you can use to improve the performance of deep learning models.

A big part of improving deep learning performance involves avoiding overfitting by slowing down the learning process or stopping the learning process at the appropriate time.

How to reduce overfitting with dropout

Dropout is a regularization technique that reduces overfitting. It works during training, where some layer outputs are randomly ignored or "dropped out".

You can add dropout to your model as a new layer placed before the layer whose input connections you want to drop.

This involves adding a layer called Dropout() that takes an argument specifying the probability that each output from the previous layer is dropped. For example, 0.4 means that 40% of the inputs are dropped on each update to the model.
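What a dropout layer does to its inputs during training can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: each value is zeroed with the given probability and the surviving values are scaled up by 1 / (1 - rate) so the expected total stays the same. The function name and the input values are made up.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(1)                      # fixed seed so the sketch is repeatable
activations = [1.0] * 10000                 # made-up layer outputs
dropped = dropout(activations, 0.4, rng)

fraction_zero = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print('dropped: %.3f, mean after dropout: %.3f' % (fraction_zero, mean_after))
```

Roughly 40% of the values are zeroed while the mean stays close to its original value of 1.0. At prediction time the layer passes its inputs through unchanged.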

You can also add Dropout layers to MLP, CNN, and RNN models, although you may also want to explore special versions of Dropout for use with CNN and RNN models.

The following example fits a small neural network model on a synthetic binary classification problem.

A dropout layer with a 50% dropout rate is inserted between the first hidden layer and the output layer.

# example of using dropout
from sklearn.datasets import make_classification
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from matplotlib import pyplot
# create the dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')
# fit the model
model.fit(X, y, epochs=100, batch_size=32, verbose=0)

How to accelerate training with batch normalization

The scale and distribution of the inputs to a layer greatly affect how easily and quickly that layer can be trained.

This is generally why it is a good idea to scale input data before modeling it with a neural network model.

Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and significantly reducing the number of training epochs required to train deep networks.
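The standardization step at the heart of batch normalization can be sketched with NumPy. This is a simplified, training-time view under stated assumptions: the scale (gamma) and shift (beta) are fixed at 1 and 0 rather than learned, and the moving averages a real layer keeps for use at prediction time are left out. The data is randomly generated.

```python
import numpy as np

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Standardize each feature (column) using the statistics of the current batch."""
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return gamma * (batch - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(1)
batch = rng.normal(loc=50.0, scale=10.0, size=(32, 4))   # 32 samples, 4 features, made-up scale
normed = batch_norm(batch)

print(batch.mean(axis=0).round(2), batch.std(axis=0).round(2))
print(normed.mean(axis=0).round(2), normed.std(axis=0).round(2))
```

After the transform each feature has a mean close to 0 and a standard deviation close to 1, regardless of the scale of the values coming in.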

 

You can use batch normalization in your network by adding a batch normalization layer before the layer that you want to have standardized inputs. You can use batch normalization with MLP, CNN, and RNN models.

The following example defines a small MLP network for a binary classification prediction problem, with a batch normalization layer between the first hidden layer and the output layer.

# example of using batch normalization
from sklearn.datasets import make_classification
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import BatchNormalization
from matplotlib import pyplot
# create the dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# determine the number of input features
n_features = X.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')
# fit the model
model.fit(X, y, epochs=100, batch_size=32, verbose=0)

 

How to halt training at the right time with early stopping

Neural networks are challenging to train.

With too little training, the model is underfit; with too much training, the model overfits the training dataset. In both cases, the model is less effective than it could be.

One approach to solving this problem is early stopping. This involves monitoring the loss on the training dataset and on a validation dataset (a subset of the training set that is not used to fit the model). As soon as the loss on the validation set starts to show signs of overfitting, the training process can be stopped.

Early stopping can be used with your model by first ensuring that you have a validation dataset. You can define the validation dataset manually via the validation_data argument to the fit() function, or you can use validation_split and specify the proportion of the training dataset to hold back for validation.

You can then define an EarlyStopping callback and tell it which performance measure to monitor, such as 'val_loss' for the loss on the validation dataset, and the number of epochs without improvement to tolerate before stopping (the patience), such as 5.

You can then provide the configured EarlyStopping callback to the fit() function via the 'callbacks' argument, which takes a list of callbacks.

This allows you to set the number of epochs to a large value and be confident that training will end as soon as the model starts overfitting. You may also want to create learning curves to gain more insight into the learning dynamics of the run and when training was halted.
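The patience rule can be sketched in plain Python to show when training would stop. This is a simplified model of the behavior, assuming that any decrease in validation loss counts as an improvement; the function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the (zero-based) epoch at which training stops under a patience rule."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses) - 1   # never triggered: training ran to the end

# Hypothetical validation losses: improving until epoch 3, then rising.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.60, 0.65, 0.70]
stopped_at = early_stopping_epoch(val_losses, patience=5)
print('stopped at epoch', stopped_at)
```

With a patience of 5, training continues for five epochs after the best validation loss before stopping, so the final weights are not necessarily the best ones seen during the run.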

The following example demonstrates a small neural network on a synthetic binary classification problem that uses early stopping to halt training as soon as the model starts overfitting (after about 50 epochs).

# fit the model (es is the configured EarlyStopping callback)
model.fit(X, y, epochs=200, batch_size=32, verbose=0, validation_split=0.3, callbacks=[es])

 
