
Weight constraints provide an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set. There are multiple types of weight constraints, such as maximum and unit vector norms, and some require a hyperparameter that must be configured.

In this tutorial, you’ll discover the Keras API for adding weight constraints to deep learning neural network models to reduce overfitting.

After completing this tutorial, you will know:

  • How to create vector norm constraints using the Keras API.
  • How to add weight constraints to MLP, CNN, and RNN layers using the Keras API.
  • How to reduce overfitting by adding weight constraints to existing models.

Tutorial overview

This tutorial is divided into three parts; they are:

  • Keras weight constraints
  • Weight constraints on layers
  • Weight constraint case study

Keras weight constraints

The Keras API supports weight constraints. Constraints are specified per layer, but applied and enforced per node within the layer.

Using a constraint generally involves setting the kernel_constraint argument on the layer for the input weights and the bias_constraint argument for the bias weights.

In general, weight constraints are not used on the bias weights. A suite of different vector norms can be used as constraints, provided as classes in the keras.constraints module. They are:

  • Maximum norm (max_norm), to force weights to have a magnitude at or below a given limit.
  • Non-negative norm (non_neg), to force weights to have a positive magnitude.
  • Unit norm (unit_norm), to force weights to have a magnitude of 1.0.
  • Min-max norm (min_max_norm), to force weights to have a magnitude within a range.

For example, a simple constraint can be imported and instantiated as follows:

# import norm
from keras.constraints import max_norm
# instantiate norm
norm = max_norm(3.0)
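Each of the four constraint types listed above can be instantiated in the same way. As a quick reference, the snippet below creates one instance of each; the specific hyperparameter values (for example max_value and min_value) are illustrative choices, not recommendations:

# instantiate each constraint type (hyperparameter values are illustrative)
from keras.constraints import max_norm, non_neg, unit_norm, min_max_norm
max_constraint = max_norm(max_value=3.0)                         # norm at or below 3.0
non_negative_constraint = non_neg()                              # weights forced to be non-negative
unit_constraint = unit_norm()                                    # norm forced to equal 1.0
ranged_constraint = min_max_norm(min_value=0.5, max_value=2.0)   # norm forced within [0.5, 2.0]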

Weight constraints on layers

Weight constraints are available for most layers in Keras. In this section, we'll look at some common examples.

MLP weight constraint

The following example sets a maximum norm weight constraint on a Dense fully connected layer.

# example of max norm on a dense layer
from keras.layers import Dense
from keras.constraints import max_norm
...
model.add(Dense(32, kernel_constraint=max_norm(3), bias_constraint=max_norm(3)))
...

CNN weight constraint

The following example sets a maximum norm weight constraint on a convolutional layer.

# example of max norm on a cnn layer
from keras.layers import Conv2D
from keras.constraints import max_norm
...
model.add(Conv2D(32, (3,3), kernel_constraint=max_norm(3), bias_constraint=max_norm(3)))
...

RNN weight constraint

Unlike other layer types, recurrent neural networks allow you to set a weight constraint on the input weights and biases, as well as on the recurrent input weights. The constraint for the recurrent weights is set via the layer's recurrent_constraint argument. The following example sets a maximum norm weight constraint on an LSTM layer.

# example of max norm on an lstm layer
from keras.layers import LSTM
from keras.constraints import max_norm
...
model.add(LSTM(32, kernel_constraint=max_norm(3), recurrent_constraint=max_norm(3), bias_constraint=max_norm(3)))
...

Now that we know how to use the weight constraint API, let's look at a worked example.

Weight constraint case study

In this section, we will demonstrate how to use weight constraints to reduce overfitting of an MLP on a simple binary classification problem. This example provides a template for applying weight constraints to your own neural network for classification and regression problems.

Binary classification problem

We will use a standard binary classification problem that defines two semicircular clusters of observations, one for each class. Each observation has two input variables on the same scale and a class output value of 0 or 1. The dataset is called the “moons” dataset because of the shape the observations in each class make when plotted. We can use the make_moons() function to generate observations for this problem. We will add noise to the data and seed the random number generator so that the same samples are generated every time the code is run.

# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)

We can plot the two variables as x and y coordinates on a graph, using the class value as the color of each observation. The complete example of generating the dataset and plotting it is listed below.

# generate two moons dataset
from sklearn.datasets import make_moons
from matplotlib import pyplot
from pandas import DataFrame
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# scatter plot, dots colored by class value
df = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue'}
fig, ax = pyplot.subplots()
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()

Running the example creates a scatter plot showing the semicircle or moon shape of the observations in each class. We can see that the noise in the points makes the moons less obvious.

This is a good test problem because the classes cannot be separated by a straight line; that is, they are not linearly separable, requiring a nonlinear method such as a neural network to address. We have only generated 100 samples, which is small for a neural network, providing the opportunity to overfit the training dataset and show higher error on the test dataset: a good case for using regularization. Further, the samples have noise, giving the model the opportunity to learn aspects of the samples that do not generalize.

Overfit multilayer perceptron

We can develop an MLP model to address this binary classification problem. The model will have one hidden layer with more nodes than are needed to solve the problem, providing the opportunity to overfit. We will also train the model for longer than is required to ensure that it overfits. Before we define the model, we split the dataset into train and test sets, using 30 examples to train the model and 70 to evaluate the performance of the fit model.

X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]

Next, we can define the model. The hidden layer uses 500 nodes and the rectified linear activation function. A sigmoid activation function is used in the output layer to predict class values of 0 or 1. The model is optimized with the binary cross-entropy loss function, suitable for binary classification problems, and the efficient Adam version of gradient descent.

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

The defined model is then fit on the training data for 4,000 epochs with the default batch size of 32. We will also use the test dataset as a validation dataset.

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

We can evaluate the performance of the model on the test dataset and report the result.

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

Finally, we will plot the performance of the model on both the train and test sets for each epoch. If the model does indeed overfit the training dataset, we would expect the accuracy curve on the train set to continue to increase, and the test curve to rise and then fall again as the model learns statistical noise in the training dataset.

# plot history
pyplot.plot(history.history['acc'], label='train')
pyplot.plot(history.history['val_acc'], label='test')
pyplot.legend()
pyplot.show()

We can tie all of these pieces together; the complete example is listed below.

# mlp overfit on the moons dataset
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['acc'], label='train')
pyplot.plot(history.history['val_acc'], label='test')
pyplot.legend()
pyplot.show()

Running the example reports the model performance on the train and test datasets. We can see that the model performs better on the training dataset than on the test dataset, one possible sign of overfitting. Given the stochastic nature of neural networks and the training algorithm, your specific results may vary. Because the model is overfit, we generally would not expect much, if any, variance in accuracy across repeated runs of the model on the same dataset.

Train: 1.000, Test: 0.914

A figure is created showing line plots of model accuracy on the train and test sets. We can see the expected shape of an overfit model, where test accuracy increases to a point and then begins to decrease again.

Overfit MLP with weight constraint

We can update the example to use a weight constraint. There are a few different weight constraints to choose from. A good simple constraint for this model is to normalize the weights so that the norm is equal to 1.0. This constraint has the effect of forcing all incoming weights to be small. We can do this by using the unit_norm constraint in Keras. This constraint can be added to the first hidden layer as follows:

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))

We can also achieve the same result by using min_max_norm and setting both the min and max values to 1.0, for example:

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=min_max_norm(min_value=1.0, max_value=1.0)))

We cannot achieve the same result with the maximum norm constraint, because it allows the norm to be at or below the specified limit; for example:

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=max_norm(1.0)))
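To make the difference concrete, here is a minimal sketch (illustrative only, not the exact Keras implementation) showing that unit_norm rescales every incoming weight vector to a norm of exactly 1.0, while max_norm(1.0) leaves any vector whose norm is already at or below the limit untouched:

# sketch: why max_norm(1.0) is not equivalent to unit_norm
import numpy as np
w = np.array([0.3, 0.4])               # incoming weight vector with norm 0.5
n = np.linalg.norm(w)
w_unit = w / n                         # unit_norm: rescaled, norm becomes 1.0
w_max = w * (min(n, 1.0) / n)          # max_norm(1.0): norm already <= 1.0, unchanged
print(np.linalg.norm(w_unit))          # 1.0
print(np.linalg.norm(w_max))           # 0.5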

The complete updated example with the unit norm constraint is listed below:

# mlp overfit on the moons dataset with a unit norm constraint
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
from keras.constraints import unit_norm
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['acc'], label='train')
pyplot.plot(history.history['val_acc'], label='test')
pyplot.legend()
pyplot.show()

Running the example reports the model performance on the train and test datasets. We can see that the strict constraint on the size of the weights does improve the performance of the model on the test set without impacting performance on the training set.

Train: 1.000, Test: 0.943

Reviewing the line plot of train and test accuracy, we can see that the model no longer appears to be overfitting the training dataset; model accuracy on both the train and test sets continues to improve to a plateau.

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Report weight norm. Update the example to calculate the magnitude of the network weights and demonstrate that the constraint did indeed make the magnitudes smaller (a starting-point sketch follows this list).
  • Constrain the output layer. Update the example to add a constraint to the output layer of the model and compare the results.
  • Constrain the bias. Update the example to add a constraint to the bias weights and compare the results.
  • Repeated evaluation. Update the example to fit and evaluate the model multiple times and report the mean and standard deviation of model performance.
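
As a starting point for the first extension, a minimal sketch of reporting the weight norms might look like the following. It assumes the fit model from the examples above and uses the layer's get_weights() method; the exact reporting format is an illustrative choice:

# sketch: report per-unit norms of the first hidden layer's input weights
from numpy.linalg import norm
# get_weights() returns [kernel, bias] for a Dense layer
kernel = model.layers[0].get_weights()[0]
# one norm per hidden unit (the incoming weight vector for each node)
unit_norms = norm(kernel, axis=0)
print('max norm: %.3f, mean norm: %.3f' % (unit_norms.max(), unit_norms.mean()))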

Author: Yishui Hancheng, CSDN blog expert. Research interests: machine learning, deep learning, NLP, CV.

Blog: yishuihancheng.blog.csdn.net


