Neural networks, first proposed in the 1950s, have developed rapidly over the last decade and are changing every aspect of our world. From image classification to natural language processing, researchers have built deep neural network models and achieved breakthroughs across many fields. As deep learning has matured, however, it has hit a bottleneck: progress has come mostly from making established network architectures deeper and wider. Recently, Hinton proposed a new concept, the capsule network, which improves on both the effectiveness and the interpretability of traditional methods.

This article explains why capsule networks attract so much attention, and uses practical code to consolidate understanding of the concept.

Why is the capsule network getting so much attention?

The MNIST dataset is a standard benchmark for verifying the performance of a network architecture. In this handwritten digit recognition problem, given a simple grayscale image, the model must predict the digit it shows. This is an unstructured image recognition problem on which deep learning algorithms achieve the best performance. This article uses the dataset to test three deep learning models: the multilayer perceptron (MLP), the convolutional neural network (CNN), and the capsule network.

Multilayer perceptron (MLP)

Use Keras to build a multilayer perceptron model as follows:

# import keras modules
from keras.models import Sequential
from keras.layers import Dense

# define variables
input_num_units = 784
hidden_num_units = 50
output_num_units = 10

epochs = 15
batch_size = 128

# create model
model = Sequential([
 Dense(units=hidden_num_units, input_dim=input_num_units, activation='relu'),
 Dense(units=output_num_units, input_dim=hidden_num_units, activation='softmax'),
])

# compile the model with necessary attributes
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


Print a summary of the model's parameters:
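In Keras this outline is produced by model.summary(), which lists each layer's output shape and parameter count; for this MLP that amounts to (784 × 50 + 50) + (50 × 10 + 10) = 39,760 trainable parameters:

model.summary()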



After training for 15 epochs, the results are as follows:

Epoch 14/15
34300/34300 [==============================] - 1s 41us/step - loss: 0.0597 - acc: 0.9834 - val_loss: 0.1227 - val_acc: 0.9635
Epoch 15/15
34300/34300 [==============================] - 1s 41us/step - loss: 0.0553 - acc: 0.9842 - val_loss: 0.1245 - val_acc: 0.9637

As you can see, this model is really simple!

Convolutional Neural Network (CNN)

Convolutional neural networks are widely used in deep learning and perform well. The convolutional neural network model is constructed as follows:

# import keras modules
from keras.models import Sequential
from keras.layers import InputLayer, Convolution2D, MaxPooling2D, Flatten, Dense

# define variables
input_shape = (28, 28, 1)
pool_size = (2, 2)

hidden_num_units = 50
output_num_units = 10

batch_size = 128

model = Sequential([
 InputLayer(input_shape=input_shape),

 Convolution2D(25, (5, 5), activation='relu'),
 MaxPooling2D(pool_size=pool_size),

 Convolution2D(25, (5, 5), activation='relu'),
 MaxPooling2D(pool_size=pool_size),

 Convolution2D(25, (4, 4), activation='relu'),

 Flatten(),

 Dense(units=hidden_num_units, activation='relu'),

 Dense(units=output_num_units, activation='softmax'),
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Print a summary of the model's parameters:

As the summary above shows, the CNN model is more complex than the MLP, and its performance is as follows:

Epoch 14/15
34/34 [==============================] - 4s 108ms/step - loss: 0.1278 - acc: 0.9604 - val_loss: 0.0820 - val_acc: 0.9757
Epoch 15/15
34/34 [==============================] - 4s 110ms/step - loss: 0.1256 - acc: 0.9626 - val_loss: 0.0827 - val_acc: 0.9746

The CNN takes longer to train, but its performance is excellent.

Capsule Network

The capsule network's structure is more complex than the CNN's. The capsule network model is constructed as follows:

# Helper layers (PrimaryCap, CapsuleLayer, Length, Mask) come from the
# capsulelayers module of the CapsNet-Keras reference implementation.
import numpy as np
from keras import layers, models
from capsulelayers import CapsuleLayer, PrimaryCap, Length, Mask

def CapsNet(input_shape, n_class, routings):
   x = layers.Input(shape=input_shape)

   # Layer 1: Just a conventional Conv2D layer
   conv1 = layers.Conv2D(filters=256, kernel_size=9, strides=1, padding='valid', activation='relu', name='conv1')(x)

   # Layer 2: Conv2D layer with `squash` activation, then reshape to [None, num_capsule, dim_capsule]
   primarycaps = PrimaryCap(conv1, dim_capsule=8, n_channels=32, kernel_size=9, strides=2, padding='valid')

   # Layer 3: Capsule layer. Routing algorithm works here.
   digitcaps = CapsuleLayer(num_capsule=n_class, dim_capsule=16, routings=routings,
   name='digitcaps')(primarycaps)

   # Layer 4: This is an auxiliary layer to replace each capsule with its length. Just to match the true label's shape.
   # If using tensorflow, this will not be necessary. :)
   out_caps = Length(name='capsnet')(digitcaps)

   # Decoder network.
   y = layers.Input(shape=(n_class,))
   masked_by_y = Mask()([digitcaps, y]) # The true label is used to mask the output of capsule layer. For training
   masked = Mask()(digitcaps) # Mask using the capsule with maximal length. For prediction

   # Shared Decoder model in training and prediction
   decoder = models.Sequential(name='decoder')
   decoder.add(layers.Dense(512, activation='relu', input_dim=16*n_class))
   decoder.add(layers.Dense(1024, activation='relu'))
   decoder.add(layers.Dense(np.prod(input_shape), activation='sigmoid'))
   decoder.add(layers.Reshape(target_shape=input_shape, name='out_recon'))

   # Models for training and evaluation (prediction)
   train_model = models.Model([x, y], [out_caps, decoder(masked_by_y)])
   eval_model = models.Model(x, [out_caps, decoder(masked)])

   # manipulate model
   noise = layers.Input(shape=(n_class, 16))
   noised_digitcaps = layers.Add()([digitcaps, noise])
   masked_noised_y = Mask()([noised_digitcaps, y])
   manipulate_model = models.Model([x, y, noise], decoder(masked_noised_y))

   return train_model, eval_model, manipulate_model
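A minimal usage sketch of this function, using the same arguments as the walkthrough later in this article (28×28×1 MNIST inputs, 10 classes, 3 routing iterations):

train_model, eval_model, manipulate_model = CapsNet(input_shape=(28, 28, 1), n_class=10, routings=3)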

Print a summary of the model's parameters:

This model takes much longer to train. After training, the results are as follows:

Epoch 14/15
34/34 [==============================] - 108s 3s/step - loss: 0.0445 - capsnet_loss: 0.0218 - decoder_loss: 0.0579 - capsnet_acc: 0.9846 - val_loss: 0.0364 - val_capsnet_loss: 0.0159 - val_decoder_loss: 0.0522 - val_capsnet_acc: 0.9887
Epoch 15/15
34/34 [==============================] - 107s 3s/step - loss: 0.0423 - capsnet_loss: 0.0201 - decoder_loss: 0.0567 - capsnet_acc: 0.9859 - val_loss: 0.0362 - val_capsnet_loss: 0.0162 - val_decoder_loss: 0.0510 - val_capsnet_acc: 0.9880


The capsule network clearly performs better than the two traditional models. The following figure summarizes the three experimental results:

This experiment also proves that capsule network is worthy of further study and discussion.

The concept behind the capsule network

To understand the concept of a capsule network, this article will use a picture of a cat as an example to illustrate its potential, starting with a question: what is the animal in the image below?

It's a cat, as you surely guessed! But how did you know it's a cat? Let's break this down:

Case 1 — Simple image

How did you know it was a cat? One way is to break the image down into individual features such as eyes, nose, and ears, as shown below:

Therefore, it is essentially a decomposition of higher-level features into lower-level features. For example:

P(face) = P(nose) & (2 × P(whiskers)) & P(mouth) & (2 × P(eyes)) & (2 × P(ears))

Here, P(face) denotes the presence of a cat's face in the image. Iterating, you can define ever lower-level features, such as shapes and edges, to simplify the process.
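As a toy illustration only (not how a network actually computes it), the decomposition can be read as combining part-detection probabilities; the numbers below are made up, and multiplying them assumes the parts are detected independently:

# Toy sketch of the decomposition above. The probabilities are invented,
# and independence of the parts is an (unrealistic) simplifying assumption.
def p_face(p_nose, p_whiskers, p_mouth, p_eyes, p_ears):
    return (p_nose
            * p_whiskers ** 2  # "2 x P(whiskers)": both sides must be present
            * p_mouth
            * p_eyes ** 2
            * p_ears ** 2)

print(p_face(0.9, 0.8, 0.9, 0.95, 0.9))  # high part scores give a high face score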

Case 2 — Rotate the image

Rotate the image 30 degrees, as shown below:

If you use the same features as before, you won't be able to identify it as a cat. This is because the orientation of the underlying features has changed, so the previously defined features no longer match.

To sum up, a cat recognizer might look something like this:

To be more specific, it is expressed as:

P(face) = ( P(nose) & (2 × P(whiskers)) & P(mouth) & (2 × P(eyes)) & (2 × P(ears)) ) OR

( P(rotated_nose) & (2 × P(rotated_whiskers)) & P(rotated_mouth) & (2 × P(rotated_eyes)) & (2 × P(rotated_ears)) )

Case 3 — Flip the image

To add complexity, here’s a fully flipped image:

One approach might be a brute-force search over all possible rotations of the low-level features, but this is too time and labor intensive. The researchers therefore propose that the low-level features themselves carry additional attributes, such as the rotation angle. This way the network can detect not only whether a feature is present but also its rotation, as shown in the figure below:

To be more specific, it is expressed as:

P(face) = [P(nose), R(nose)] & [P(whiskers_1), R(whiskers_1)] & [P(whiskers_2), R(whiskers_2)] & [P(mouth), R(mouth)] & …

Here the rotation state of a feature is represented by R(); this property is known as rotational equivariance.

Extending this idea to capture more low-level properties, such as scale and stroke thickness, gives a much clearer understanding of the object in an image. This is exactly how the capsule network was designed to work.
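This is precisely what a capsule encodes: instead of a scalar P(), each feature becomes a small vector whose length represents the probability that the feature exists and whose direction represents its pose (rotation, scale, and so on). The squash function from the CapsNet paper (Sabour et al., Eq. (1)) maps any vector to a length below 1 while preserving its direction; a minimal NumPy sketch:

import numpy as np

def squash(s, eps=1e-8):
    # squash(s) = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

v = squash(np.array([2.0, 3.0]))
print(np.linalg.norm(v))       # ~0.93: probability that the feature is present
print(v / np.linalg.norm(v))   # unit direction: the feature's pose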

Another feature of the capsule network is dynamic routing, which is explained in the following section with the cat/dog classification problem.

The two animals above look very similar, but there are some differences. Can you spot the dog?

As before, features in the image are defined in order to find the differences.

As shown, very low-level facial features such as eyes and ears are defined and combined to find a face. The facial and body features are then combined to decide whether the animal is a cat or a dog.

Now suppose there is a new image along with its extracted low-level features, and its category must be determined from this information alone. Can we pick a random feature, say an eye, and classify on that basis alone?

The answer is no, because eyes alone are not a distinguishing factor. The next step is to analyze more features, for example picking the nose next.

Eyes and noses together still cannot complete the classification. The next step is to gather all the features and combine them to determine the category. As shown in the figure below, the category can be determined by combining four features: eyes, nose, ears, and whiskers. This procedure is performed iteratively at each feature level, routing information to the feature detectors above that need it for classification.

In capsule terms, a lower-level capsule sends its output to the higher-level capsule that "agrees" with that input. This is the essence of the dynamic routing algorithm.
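A minimal NumPy sketch of this routing-by-agreement procedure, following the algorithm in the CapsNet paper; the prediction vectors u_hat are assumed given (in the real network they are learned linear transforms of the lower-level capsule outputs):

import numpy as np

def dynamic_routing(u_hat, iterations=3):
    # u_hat: [num_lower, num_upper, dim] prediction vectors
    num_lower, num_upper, dim = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits, start neutral
    for _ in range(iterations):
        # coupling coefficients: softmax over the upper-level capsules
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # each upper capsule's input: weighted sum of predictions
        s = (c[..., None] * u_hat).sum(axis=0)
        # squash: keep direction, bound length to (0, 1)
        norm_sq = (s ** 2).sum(axis=1, keepdims=True)
        v = (norm_sq / (1 + norm_sq)) * s / np.sqrt(norm_sq + 1e-8)
        # agreement u_hat . v strengthens the routes that already agree
        b += np.einsum('ijk,jk->ij', u_hat, v)
    return v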

Compared with traditional deep learning architectures, capsule networks are more robust to changes in the orientation and angle of the data, and can be trained on relatively few data points. Their disadvantage is that they require more training time and resources.

Detailed explanation of capsule network code on the MNIST dataset

First, download the dataset from the handwritten digit recognition practice problem: the task is to identify the digit shown in a given 28×28 image. Make sure Keras is installed before running the code.

Now open a Jupyter Notebook and enter the following code. First import the required modules (the import cell below is reconstructed from the code used later in this article):

import os
import numpy as np
import pandas as pd
import pylab
import matplotlib.pyplot as plt
import keras

from scipy.misc import imread
from keras import layers, models, optimizers
from keras import backend as K
from keras.preprocessing.image import ImageDataGenerator

Then fix the random seed to suppress potential randomness:

# To stop potential randomness
seed = 128
rng = np.random.RandomState(seed)

Next set the directory path:

root_dir = os.path.abspath('.')
data_dir = os.path.join(root_dir, 'data')

Now load the dataset, which is in .csv format:

train = pd.read_csv(os.path.join(data_dir, 'train.csv'))
test = pd.read_csv(os.path.join(data_dir, 'test.csv'))

train.head()

Display the digit represented by a random sample from the data:

img_name = rng.choice(train.filename)
filepath = os.path.join(data_dir, 'train', img_name)

img = imread(filepath, flatten=True)

pylab.imshow(img, cmap='gray')
pylab.axis('off')
pylab.show()



Now store all images as NumPy arrays:

temp = []
for img_name in train.filename:
    image_path = os.path.join(data_dir, 'train', img_name)
    img = imread(image_path, flatten=True)
    img = img.astype('float32')
    temp.append(img)

train_x = np.stack(temp)
train_x /= 255.0
train_x = train_x.reshape(-1, 784).astype('float32')

temp = []
for img_name in test.filename:
    image_path = os.path.join(data_dir, 'test', img_name)
    img = imread(image_path, flatten=True)
    img = img.astype('float32')
    temp.append(img)

test_x = np.stack(temp)
test_x /= 255.0
test_x = test_x.reshape(-1, 784).astype('float32')

train_y = keras.utils.np_utils.to_categorical(train.label.values)

As in any classic machine learning problem, split the dataset 70:30, with 70% of the images forming the training set and 30% the validation set.

split_size = int(train_x.shape[0] * 0.7)

train_x, val_x = train_x[:split_size], train_x[split_size:]
train_y, val_y = train_y[:split_size], train_y[split_size:]

The performance of three different deep learning models for this data will be analyzed below, namely, multi-layer perceptron, convolutional neural network and capsule network.

1. Multilayer perceptron

Define a three-layer neural network: one input layer, one hidden layer, and one output layer. The numbers of input and output neurons are fixed: the input is a 28×28 image (784 values) and the output is a 10×1 vector representing the class. The hidden layer has 50 neurons, and the model is trained with the Adam optimizer, a variant of gradient descent.

# define vars
input_num_units = 784
hidden_num_units = 50
output_num_units = 10

epochs = 15
batch_size = 128

# import keras modules

from keras.models import Sequential
from keras.layers import InputLayer, Convolution2D, MaxPooling2D, Flatten, Dense

# create model
model = Sequential([
 Dense(units=hidden_num_units, input_dim=input_num_units, activation='relu'),
 Dense(units=output_num_units, input_dim=hidden_num_units, activation='softmax'),
])

# compile the model with necessary attributes
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Now train the model:

trained_model = model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, validation_data=(val_x, val_y))

After 15 epochs, the results are as follows:

Epoch 14/15
34300/34300 [==============================] - 1s 41us/step - loss: 0.0597 - acc: 0.9834 - val_loss: 0.1227 - val_acc: 0.9635
Epoch 15/15
34300/34300 [==============================] - 1s 41us/step - loss: 0.0553 - acc: 0.9842 - val_loss: 0.1245 - val_acc: 0.9637

The result is good, but it can be improved.

2. Convolutional neural network

Reshape the flat image vectors back into 28×28 single-channel grayscale images before feeding them into the CNN model:

# reshape data
train_x_temp = train_x.reshape(-1, 28, 28, 1)
val_x_temp = val_x.reshape(-1, 28, 28, 1)

# define vars
input_shape = (784,)
input_reshape = (28, 28, 1)


pool_size = (2, 2)

hidden_num_units = 50
output_num_units = 10

batch_size = 128

The CNN model is defined below:

model = Sequential([
 InputLayer(input_shape=input_reshape),

 Convolution2D(25, (5, 5), activation='relu'),
 MaxPooling2D(pool_size=pool_size),

 Convolution2D(25, (5, 5), activation='relu'),
 MaxPooling2D(pool_size=pool_size),

 Convolution2D(25, (4, 4), activation='relu'),

 Flatten(),

 Dense(units=hidden_num_units, activation='relu'),

 Dense(units=output_num_units, activation='softmax'),
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#trained_model_conv = model.fit(train_x_temp, train_y, epochs=epochs, batch_size=batch_size, validation_data=(val_x_temp, val_y))
model.summary()

Print a summary of the model's parameters:

Then train the model with data augmentation:

# Begin: Training with data augmentation
def train_generator(x, y, batch_size, shift_fraction=0.1):
    # shift up to 2 pixels for MNIST
    train_datagen = ImageDataGenerator(width_shift_range=shift_fraction,
                                       height_shift_range=shift_fraction)
    generator = train_datagen.flow(x, y, batch_size=batch_size)
    while 1:
        x_batch, y_batch = generator.next()
        yield (x_batch, y_batch)

# Training with data augmentation. If shift_fraction=0., no augmentation is used.
trained_model2 = model.fit_generator(generator=train_generator(train_x_temp, train_y, 1000, 0.1),
                                     steps_per_epoch=int(train_y.shape[0] / 1000),
                                     epochs=epochs,
                                     validation_data=[val_x_temp, val_y])
# End: Training with data augmentation

Results of the CNN model:

Epoch 14/15
34/34 [==============================] - 4s 108ms/step - loss: 0.1278 - acc: 0.9604 - val_loss: 0.0820 - val_acc: 0.9757
Epoch 15/15
34/34 [==============================] - 4s 110ms/step - loss: 0.1256 - acc: 0.9626 - val_loss: 0.0827 - val_acc: 0.9746


3. Capsule network

Build the capsule network model with the structure shown in the figure:

The following code is used to build the model:

def CapsNet(input_shape, n_class, routings):
    """
    A Capsule Network on MNIST.
    :param input_shape: data shape, 3d, [width, height, channels]
    :param n_class: number of classes
    :param routings: number of routing iterations
    :return: Two Keras Models, the first one used for training,
             and the second one for evaluation.
             `eval_model` can also be used for training.
    """
    x = layers.Input(shape=input_shape)

    # Layer 1: Just a conventional Conv2D layer
    conv1 = layers.Conv2D(filters=256, kernel_size=9, strides=1, padding='valid', activation='relu', name='conv1')(x)

    # Layer 2: Conv2D layer with `squash` activation, then reshape to [None, num_capsule, dim_capsule]
    primarycaps = PrimaryCap(conv1, dim_capsule=8, n_channels=32, kernel_size=9, strides=2, padding='valid')

    # Layer 3: Capsule layer. Routing algorithm works here.
    digitcaps = CapsuleLayer(num_capsule=n_class, dim_capsule=16, routings=routings,
                             name='digitcaps')(primarycaps)

    # Layer 4: An auxiliary layer that replaces each capsule with its length, to match the true label's shape.
    out_caps = Length(name='capsnet')(digitcaps)

    # Decoder network.
    y = layers.Input(shape=(n_class,))
    masked_by_y = Mask()([digitcaps, y])  # The true label is used to mask the output of the capsule layer. For training
    masked = Mask()(digitcaps)  # Mask using the capsule with maximal length. For prediction

    # Shared decoder model in training and prediction
    decoder = models.Sequential(name='decoder')
    decoder.add(layers.Dense(512, activation='relu', input_dim=16*n_class))
    decoder.add(layers.Dense(1024, activation='relu'))
    decoder.add(layers.Dense(np.prod(input_shape), activation='sigmoid'))
    decoder.add(layers.Reshape(target_shape=input_shape, name='out_recon'))

    # Models for training and evaluation (prediction)
    train_model = models.Model([x, y], [out_caps, decoder(masked_by_y)])
    eval_model = models.Model(x, [out_caps, decoder(masked)])

    # manipulate model
    noise = layers.Input(shape=(n_class, 16))
    noised_digitcaps = layers.Add()([digitcaps, noise])
    masked_noised_y = Mask()([noised_digitcaps, y])
    manipulate_model = models.Model([x, y, noise], decoder(masked_noised_y))

    return train_model, eval_model, manipulate_model


def margin_loss(y_true, y_pred):
    """
    Margin loss for Eq.(4). Should also work when y_true[i, :] contains more than one `1` (not tested).
    :param y_true: [None, n_classes]
    :param y_pred: [None, num_capsule]
    :return: a scalar loss value.
    """
    L = y_true * K.square(K.maximum(0., 0.9 - y_pred)) + \
        0.5 * (1 - y_true) * K.square(K.maximum(0., y_pred - 0.1))
    return K.mean(K.sum(L, 1))


model, eval_model, manipulate_model = CapsNet(input_shape=train_x_temp.shape[1:],
                                              n_class=len(np.unique(np.argmax(train_y, 1))),
                                              routings=3)

# compile the model
model.compile(optimizer=optimizers.Adam(lr=0.001),
              loss=[margin_loss, 'mse'],
              metrics={'capsnet': 'accuracy'})
model.summary()
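The margin_loss function above implements Eq.(4) of the CapsNet paper, with m+ = 0.9, m− = 0.1 and λ = 0.5:

L_k = T_k × max(0, m+ − ||v_k||)² + λ × (1 − T_k) × max(0, ||v_k|| − m−)²

where T_k = 1 exactly when class k is present and ||v_k|| is the length of the k-th digit capsule; the total loss sums L_k over the classes and averages over the batch, exactly as K.mean(K.sum(L, 1)) does.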

Print a summary of the model's parameters:


Results of the capsule network model:

Epoch 14/15
34/34 [==============================] - 108s 3s/step - loss: 0.0445 - capsnet_loss: 0.0218 - decoder_loss: 0.0579 - capsnet_acc: 0.9846 - val_loss: 0.0364 - val_capsnet_loss: 0.0159 - val_decoder_loss: 0.0522 - val_capsnet_acc: 0.9887
Epoch 15/15
34/34 [==============================] - 107s 3s/step - loss: 0.0423 - capsnet_loss: 0.0201 - decoder_loss: 0.0567 - capsnet_acc: 0.9859 - val_loss: 0.0362 - val_capsnet_loss: 0.0162 - val_decoder_loss: 0.0510 - val_capsnet_acc: 0.9880

To summarize and compare the three experiments, plot their validation accuracies:

# trained_model3 is assumed to hold the training history returned when
# fitting the capsule network (the corresponding fit call is not shown above).
plt.figure(figsize=(10, 8))
plt.plot(trained_model.history['val_acc'], 'r', trained_model2.history['val_acc'], 'b', trained_model3.history['val_capsnet_acc'], 'g')
plt.legend(('MLP', 'CNN', 'CapsNet'),
           loc='lower right', fontsize='large')
plt.title('Validation Accuracies')
plt.show()


The results show that the capsule network's accuracy is superior to that of both the CNN and the MLP.

Conclusion

This article gave a brief, non-technical overview of the capsule network, analyzed two of its important properties, and then verified the performance of the multilayer perceptron, convolutional neural network, and capsule network on the MNIST handwritten digit dataset.


Author information

Faizan Shaikh: data science and deep learning beginner.

Personal homepage: www.linkedin.com/in/faizanks…

This article was translated by the Alibaba Cloud Yunqi Community from "Essentials of Deep Learning: Getting to Know CapsuleNets (with Python Codes)" by Faizan Shaikh.

This is an abridged translation. For more details, please refer to the original text.