Xin Zhiyuan compilation

Source: GitHub; jhui.github.io

Translation: Wen Qiang, Ma Wen

The code for Hinton's highly anticipated capsule network paper, “Dynamic Routing Between Capsules,” has finally been released, and the repository was forked more than 14,000 times on GitHub in just five days. Can capsules really replace CNNs? Now you can try it for yourself.

Sara Sabour, first author of Hinton's capsule network paper “Dynamic Routing Between Capsules,” has published the code on GitHub. The implementation uses TensorFlow and NumPy and trains on a single GPU. Within just five days, the repository had been forked more than 14,000 times.

In fact, many other versions and implementations appeared before the official code was published. Xin Zhiyuan has also introduced the concept of capsule networks in detail:

[1] Hinton: Deep learning should start over and completely abandon backpropagation

[2] CNN, the cornerstone of deep learning, may be replaced by Hinton's Capsule

[3] Does Hinton’s Capsule really work better than CNN?

[4] CNN has two major flaws. Capsule is the next-generation CNN

[5] Understanding Hinton's latest Capsules: where CNN goes from here

[6] Nine advantages and four disadvantages of capsule networks (video + PPT)

Before looking at the code, though, it is worth revisiting Hinton's innovative capsule paper, which Jonathan Hui has dissected on his blog, starting from the basics, in a very accessible way.

In deep learning, the level of neuron activation is often interpreted as the likelihood of detecting specific features.

However, while CNNs are good at detecting features, they are poor at exploring the spatial relationships between features (viewpoint, size, and orientation). For example, the image below might fool a simple CNN model into believing it is a real human face.

A simple CNN model correctly extracts the features of the nose, eyes, and mouth, but incorrectly activates the neuron for face detection. Because it does not understand spatial orientation and relative size, the face-detection activation comes out far too high: 95% in the example below.

Now, suppose that each neuron contains both the likelihood and the properties of a feature. For example, a neuron could output a vector containing [likelihood, orientation, size]. Using this spatial information, we can check the consistency of orientation and size among the nose, eye, and ear features; since they do not agree, the activation output for face detection becomes much lower.

In Hinton’s capsule network paper, the word “capsule” is used to refer to such neurons.

Conceptually, we can think of a CNN as training neurons to handle different viewpoints, with a layer of face-detection neurons at the top.

As mentioned above, more convolutional layers and feature maps can be added to let a CNN handle different viewpoints or variants. However, this approach tends to memorize the dataset rather than arrive at a more general solution, and it requires large amounts of training data to cover the different variants and avoid overfitting. The MNIST dataset contains 55,000 training samples, i.e. 5,500 samples per digit. Children, by contrast, can remember a digit after seeing it only a few times. Existing deep learning models, CNNs included, use data very inefficiently. As Geoffrey Hinton put it:

It (convolutional network) works depressingly well.

A capsule network is not trained to capture a feature of one specific variant; it is trained to capture the likelihood of a feature together with its variant. So the purpose of a capsule is not only to detect a feature, but also to train the model to learn the feature's variants.

In this way, the same capsule can detect the same class of object in different orientations (for example, rotated clockwise):

Invariance corresponds to detecting a feature regardless of its variants. For example, a nose-detecting neuron detects a nose regardless of its orientation. However, the loss of spatial orientation information is precisely what ultimately undermines such an invariance model.

Equivariance corresponds to detecting features that can transform into one another (such as faces in different orientations). Intuitively, a capsule network detects that the face is rotated by 20°, rather than merely recognizing a face that happens to match a 20°-rotated variant. By forcing the model to learn the feature variants inside a capsule, we can extrapolate to possible variants more effectively with less training data. It also makes the model more resistant to adversarial attacks.

A capsule is a set of neurons that captures not only the likelihood of a feature but also the parameters of that specific feature.

For example, the first row below represents the probability that a neuron detects the digit “7.” A 2-D capsule is a network of two neurons, and it outputs a 2-D vector when it detects a “7.” For the first image in the second row, it outputs the vector $v = (0, 0.9)$. The vector's magnitude, 0.9, corresponds to the probability of detecting a “7.” The second image in each row looks more like a “1” than a “7,” so its probability of being a “7” is correspondingly lower.

In the third row, the image is rotated by 20°. The capsule then produces a vector of the same magnitude but pointing in a different direction: here, the angle of the vector represents the rotation of the digit “7.” Finally, two more neurons can be added to capture size and stroke width (see the figure below).

We call the output vector of a capsule its activity vector: its magnitude represents the probability of detecting the feature, and its direction represents the feature's parameters (properties).
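
To make this concrete, here is a minimal NumPy sketch (our own illustration, not taken from the official code) that reads a detection probability and a pose angle off an activity vector:

    import numpy as np

    # Activity vector of a hypothetical 2-D capsule that has detected a "7"
    v = np.array([0.0, 0.9])

    probability = np.linalg.norm(v)              # vector magnitude ~ likelihood: 0.9
    angle = np.degrees(np.arctan2(v[1], v[0]))   # vector direction ~ rotation: 90.0

    print(probability, angle)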

Before computing the output of a capsule, first look at how an ordinary fully connected neural network computes its outputs:

Here, the output of each neuron is computed from the outputs of the neurons in the previous layer:

$$ y_j = \mathrm{ReLU}\Big(\sum_i w_{ij} x_i + b_j\Big) $$

where $w_{ij}$ and $b_j$ are scalars.

For a capsule network, both the input $u_i$ of a capsule and its output $v_j$ are vectors.

We apply a transformation matrix $W_{ij}$ to the capsule output $u_i$ of the previous layer: with an $m \times k$ matrix $W_{ij}$, the k-D vector $u_i$ is transformed into an m-D prediction vector $\hat{u}_{j|i} = W_{ij} u_i$. We then compute a weighted sum $s_j$ of these prediction vectors:

$$ s_j = \sum_i c_{ij} \hat{u}_{j|i} $$

where the coupling coefficients $c_{ij}$ are computed by the iterative dynamic routing process and are designed to sum to one: $\sum_j c_{ij} = 1$.
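
To make the shapes concrete, the following NumPy sketch (the variable names and sizes are our own choices, not the official code) computes the prediction vectors and their weighted sum:

    import numpy as np

    k, m = 8, 16             # lower capsules are 8-D, upper capsules are 16-D
    num_in, num_out = 6, 10  # number of capsules in the two layers

    u = np.random.randn(num_in, k)              # outputs u_i of the lower capsules
    W = np.random.randn(num_in, num_out, m, k)  # one m-by-k matrix W_ij per pair (i, j)

    # Prediction vectors u_hat[i, j] = W[i, j] @ u[i], each an m-D vector
    u_hat = np.einsum('ijmk,ik->ijm', W, u)

    # Coupling coefficients start out uniform; dynamic routing refines them
    c = np.full((num_in, num_out), 1.0 / num_out)

    # Weighted sum over the lower capsules: s[j] = sum_i c[i, j] * u_hat[i, j]
    s = np.einsum('ij,ijm->jm', c, u_hat)
    print(s.shape)  # (10, 16)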

Instead of ReLU, we apply a squashing function that compresses the vector to a length between 0 and 1 while preserving its direction:

$$ v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|} $$

It shrinks short vectors to nearly zero and long vectors to just under unit length, so the likelihood output by each capsule lies between 0 and 1.
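
A direct NumPy implementation of the squashing function could look like this (a sketch; the small epsilon for numerical stability is our addition, not part of the formula):

    import numpy as np

    def squash(s, eps=1e-9):
        """Keep the direction of s, but map its length into [0, 1)."""
        norm_sq = np.sum(s * s, axis=-1, keepdims=True)
        norm = np.sqrt(norm_sq + eps)
        return (norm_sq / (1.0 + norm_sq)) * (s / norm)

    print(np.linalg.norm(squash(np.array([0.1, 0.0]))))   # short vector -> ~0.0099
    print(np.linalg.norm(squash(np.array([10.0, 0.0]))))  # long vector  -> ~0.9901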

In deep learning, we use backpropagation to train model parameters. In a capsule network, the transformation matrices $W_{ij}$ are still trained by backpropagation, but the coupling coefficients $c_{ij}$ are computed with a new iterative dynamic routing method.

Here is the final pseudocode for dynamic routing:
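
The original figure is not reproduced here, but the routing-by-agreement loop it describes can be sketched in NumPy as follows (our reconstruction from the paper, reusing the squash() function defined above):

    import numpy as np

    def routing(u_hat, num_iters=3):
        """Iterative dynamic routing between two capsule layers.

        u_hat: prediction vectors of shape (num_in, num_out, m).
        Returns the output capsule vectors v of shape (num_out, m).
        """
        num_in, num_out, _ = u_hat.shape
        b = np.zeros((num_in, num_out))  # routing logits, initialized to zero

        for _ in range(num_iters):
            # Coupling coefficients: softmax of b over the upper capsules j
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
            # Weighted sum of prediction vectors, then squash
            s = np.einsum('ij,ijm->jm', c, u_hat)
            v = squash(s)
            # Agreement: raise b_ij when u_hat[i, j] points in the same direction as v[j]
            b += np.einsum('ijm,jm->ij', u_hat, v)

        return v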

In deep learning, we use backpropagation to train model parameters based on a cost function. These parameters (weights) control how signals are routed from one layer to the next: if the weight between two neurons is zero, then one neuron's activation does not propagate to the other.

Iterative dynamic routing provides an alternative: routing signals based on feature parameters. By using feature parameters, capsules can in theory be grouped better, forming higher-level structure. For example, the capsule layers may end up behaving like a parse tree that explores the “part-whole” relationship, in which a face is composed of the eyes, nose, and mouth. Iterative dynamic routing uses the transformation matrices, the likelihoods, and the feature properties to control how much of a signal propagates up to the capsules above.

Finally, it is time to use capsules to construct CapsNet and use it to classify and reconstruct MNIST digits. Below is the architecture of CapsNet: it consists of three layers, two convolutional layers and one fully connected layer.
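
For reference, the layer shapes for a 28x28 MNIST input work out as follows (the figures are those reported in the paper):

    # Conv1:           9x9 conv, 256 channels, stride 1, ReLU    -> 20x20x256
    # PrimaryCapsules: 9x9 conv, stride 2, 32 channels of 8-D
    #                  capsules                                  -> 6x6x32 = 1152 capsules
    # DigitCaps:       dynamic routing from the 1152 primary
    #                  capsules to 10 digit capsules, each 16-D  -> 10x16
    # The length of each 16-D DigitCaps vector gives the probability of that digit.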

The MNIST digit reconstruction task mentioned in the paper: the 16-D activity vector of the correct digit capsule is fed to a decoder that reconstructs the input image.

The capsule model code accompanies the following paper:

  • “Dynamic Routing Between Capsules” by Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton.

Requirements:

  • TensorFlow (see http://www.tensorflow.org to learn how to install/upgrade)

  • NumPy (see http://www.numpy.org/)

  • GPU

Run the tests to verify that your setup is correct, for example:
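
For example (the file name below follows the official repository; run whichever *_test.py files your checkout contains):

    python layers_test.py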

Quick MNIST test results (the test command follows the download steps below):

  • Download and extract the MNIST records to $DATA_DIR/ from: https://storage.googleapis.com/capsule_toronto/mnist_data.tar.gz

  • Download and extract the MNIST model checkpoint to $CKPT_DIR from: https://storage.googleapis.com/capsule_toronto/mnist_checkpoints.tar.gz
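
Then run, for example (a command sketch following the official README; the checkpoint path inside the archive may differ):

    python experiment.py --data_dir=$DATA_DIR/mnist_data/ --train=false \
      --summary_dir=/tmp/ --checkpoint=$CKPT_DIR/mnist_checkpoint/model.ckpt-1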

Quick CIFAR10 ensemble test results (the test command follows the steps below):

  • Download and extract the CIFAR10 binary version to $DATA_DIR/ from: https://www.cs.toronto.edu/~kriz/cifar.html

  • Download and extract the CIFAR10 model checkpoints to $CKPT_DIR from: https://storage.googleapis.com/capsule_toronto/cifar_checkpoints.tar.gz

  • Pass the directory the binaries were extracted to as data_dir ($data_dir).
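
Then run, for example (a sketch following the official README; the ensemble averages several checkpoints, hence the templated checkpoint path and --num_trials):

    python experiment.py --data_dir=$DATA_DIR --train=false --dataset=cifar10 \
      --hparams_override=num_prime_capsules=64,padding=SAME,leaky=true,remake=false \
      --summary_dir=/tmp/ --checkpoint=$CKPT_DIR/cifar/cifar{}/model.ckpt-600000 \
      --num_trials=7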

Sample CIFAR10 training command:
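
For example (a sketch following the official README):

    python experiment.py --data_dir=$DATA_DIR --dataset=cifar10 --max_steps=600000 \
      --hparams_override=num_prime_capsules=64,padding=SAME,leaky=true,remake=false \
      --summary_dir=/tmp/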

Sample MNIST full training command (shown after the notes below):

  • Pass --validate=true as well to train with validation.

  • To train on more than one GPU, pass --num_gpus=NUM_GPUS.
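
For example (a sketch following the official README):

    python experiment.py --data_dir=$DATA_DIR/mnist_data/ --max_steps=300000 \
      --summary_dir=/tmp/attempt0/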

Sample MNIST baseline training command:
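
For example (a sketch following the official README; --model=baseline selects the CNN baseline instead of CapsNet):

    python experiment.py --data_dir=$DATA_DIR/mnist_data/ --max_steps=300000 \
      --summary_dir=/tmp/attempt1/ --model=baseline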

To test on the validation set during training of the above model:
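
For example (a sketch following the official README, reusing the summary directory of the training run above):

    python experiment.py --data_dir=$DATA_DIR/mnist_data/ --max_steps=300000 \
      --summary_dir=/tmp/attempt0/ --train=false --validate=true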

Notes on running validation continuously during training:

  • Pass --validate=true during training too.

  • You will need two GPUs in total: one for training and one for validation.

  • If training and validation run on the same machine, limit the RAM each task can consume; otherwise TensorFlow allocates all of the RAM for the first task and the second task will fail.

  • To test/train on MultiMNIST, pass --num_targets=2 and --data_dir=$DATA_DIR/multitest_6shifted_mnist.tfrecords@10.

  • The code for generating MultiMNIST/MNIST records is located at input_data/mnist/mnist_shift.py.

Sample command to generate the MultiMNIST test split:
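
For example (a sketch following the official README; the flag values shown generate the shifted-pair test set):

    python mnist_shift.py --data_dir=$DATA_DIR/mnist_data/ --split=test --shift=6 \
      --pad=4 --num_pairs=1000 --max_shard=100000 --multi_targets=true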

To build expanded_mnist for testing generalization to affNIST, pass --shift=6 --pad=6.

The code for reading affNIST will be published later.

The code is maintained by Sara Sabour (Sarasra, [email protected]).



Join the community

The Xin Zhiyuan AI technology + industry community is recruiting. Students interested in applying AI technology in industry are welcome to add the assistant's WeChat ID AIERA2015_1 to join the group; after passing review, we will invite you into the group. Please be sure to update your group alias after joining (name - company - position; review for the professional group is strict, please understand).

In addition, the Xin Zhiyuan AI + industry communities (intelligent vehicles, machine learning, deep learning, neural networks, etc.) are recruiting engineers and researchers working in related fields.

Join the Xin Zhiyuan technology community and share the AI+ open platform.