
Preface

This article presents two experiments that demonstrate the influence of padding on deep learning models.

Experiment 1

Convolution is translation-equivariant: shift the input image by 1 pixel, and the output image shifts by 1 pixel (see Figure 1). If we apply global average pooling to the output (here implemented as a sum over all pixel values, which differs from the average only by a constant factor), we get a translation-invariant model: no matter how we translate the input image, the output remains the same.

In PyTorch, the model looks like this: y = torch.sum(conv(x), dim=(2, 3)), with input x and output y.
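
A minimal, runnable sketch of this model (assuming one input channel, a single 3×3 convolution with “same” padding, and no bias; these details are assumptions for illustration):

```python
import torch
import torch.nn as nn

# Single 3x3 convolution with "same" padding (padding=1), no bias.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1, bias=False)

def model(x):
    # x: batch of grayscale images, shape (N, 1, H, W); returns shape (N, 1)
    return torch.sum(conv(x), dim=(2, 3))
```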

Figure 1: Top: Input image with one white pixel (original and a version shifted by 1 pixel). Middle: Convolution kernel. Bottom: Output image and its pixel sum.

Can you use this model to detect the absolute position of pixels in an image?

For a translation-invariant model like the one described, this should be impossible.

Let’s train this model to classify images containing a single white pixel: if the pixel is in the upper-left corner, the model should output 1, otherwise 0. Training converges quickly, and testing the binary classifier on some images shows that it detects the pixel position perfectly (see Figure 2).

Figure 2: Top: Input image and classification results. Bottom: Output image and pixel sum.
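
Below is a hedged sketch of this training setup; the 10×10 image size, balanced batches, Adam optimizer, and BCE loss are assumptions for illustration, not details taken from the original article:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
opt = torch.optim.Adam(conv.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

def make_batch(n=64, size=10):
    # Images with a single white pixel; label is 1 iff the pixel
    # is in the upper-left corner.
    x = torch.zeros(n, 1, size, size)
    y = torch.zeros(n, 1)
    for i in range(n):
        if i % 2 == 0:   # positive example: pixel in the upper-left corner
            r = c = 0
            y[i] = 1.0
        else:            # negative example: pixel anywhere else
            r, c = 0, 0
            while r == 0 and c == 0:
                r, c = torch.randint(0, size, (2,)).tolist()
        x[i, 0, r, c] = 1.0
    return x, y

for step in range(300):
    x, y = make_batch()
    logits = torch.sum(conv(x), dim=(2, 3))  # the model from above
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```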

How does the model learn to classify absolute pixel positions? This is only possible because of the type of padding we used:

  1. Figure 3 shows the convolution kernel after a few epochs of training
  2. With “same” padding (used in many models), the kernel center is moved across all image pixels (implicitly assuming that pixel values outside the image are 0)
  3. This means that the right column and bottom row of the kernel never “touch” the upper-left pixel of the image (otherwise the kernel center would have to move outside the image)
  4. However, when moving across the image, the right column and/or bottom row of the kernel touch every other pixel
  5. Our model exploits this difference in how pixels are processed
  6. Only positive (yellow) kernel values are applied to the upper-left white pixel, which yields a positive sum
  7. For all other pixel positions, the strongly negative kernel values (blue, green) are also applied, which yields a negative sum

Figure 3: The 3×3 convolution kernel.

Although the model should be translation-invariant, it is not. The problem arises near the image boundary and is caused by the type of padding used.
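
To make the mechanism concrete, here is a hand-built kernel in the spirit of Figure 3 (the values are assumptions, not the actual trained weights): positive values in the upper-left region and strongly negative values in the right column and bottom row, which under “same” padding can never touch the corner pixel:

```python
import torch
import torch.nn.functional as F

# Positive (1) weights in the upper-left 2x2 block, strongly negative (-5)
# weights in the right column and bottom row.
kernel = torch.tensor([[[[ 1.,  1., -5.],
                         [ 1.,  1., -5.],
                         [-5., -5., -5.]]]])

def score(r, c, size=10):
    # Summed conv output for an image whose only white pixel is at (r, c).
    x = torch.zeros(1, 1, size, size)
    x[0, 0, r, c] = 1.0
    return F.conv2d(x, kernel, padding=1).sum().item()

print(score(0, 0))  # corner pixel: only the positive weights reach it -> 4.0
print(score(5, 5))  # interior pixel: all weights reach it -> -21.0
```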

Experiment 2

Does an input pixel’s influence on the output depend on its absolute position?

Let’s try again with a black image containing a single white pixel. The image is fed into a single convolutional layer whose kernel weights are all set to 1 and whose bias is set to 0. The influence of an input pixel is measured by summing the pixel values of the output image. With “valid” padding, the full kernel always stays within the bounds of the input image; “same” padding is as defined above.

Figure 4 shows the influence of each input pixel. For “valid” padding, the result is as follows:

  1. There is only one kernel position at which the kernel touches the corner pixel of the image, and the corner’s influence value of 1 reflects this
  2. For each edge pixel, the 3×3 kernel touches it from three positions
  3. For an interior pixel, there are nine kernel positions at which the kernel touches the pixel

Figure 4: A single convolution layer applied to a 10×10 image. Left: “same” padding. Right: “valid” padding.

Pixels near the boundary have much less influence on the output than central pixels, which may cause the model to fail when relevant image details lie near the boundary. With “same” padding the effect is less severe, but there are still fewer “paths” from boundary input pixels to the output.
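
A minimal sketch that reproduces these influence counts, using a single all-ones 3×3 kernel and zero bias (padding=0 corresponds to “valid”, padding=1 to “same”):

```python
import torch
import torch.nn.functional as F

kernel = torch.ones(1, 1, 3, 3)  # all weights 1, no bias

def influence(size=10, padding=0):
    # influence[r, c] = summed output response to a single white pixel
    # at (r, c), i.e. the number of kernel positions that touch it.
    out = torch.zeros(size, size)
    for r in range(size):
        for c in range(size):
            x = torch.zeros(1, 1, size, size)
            x[0, 0, r, c] = 1.0
            out[r, c] = F.conv2d(x, kernel, padding=padding).sum()
    return out

print(influence(padding=0))  # "valid": corners 1, edges 3, interior 9
print(influence(padding=1))  # "same": corners 4, edges 6, interior 9
```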

The final experiment (see Figure 5) starts with a 28×28 input image (for example, from the MNIST dataset) and feeds it into a neural network with five convolutional layers (a simple MNIST classifier might look like this). In particular, with “valid” padding there are now large image regions that the model almost completely ignores.

Figure 5: Five convolution layers applied to a 28×28 image. Left: “same” padding. Right: “valid” padding.
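
A sketch of this multi-layer version, stacking five of the same all-ones convolutions as an assumed stand-in for the five-layer classifier:

```python
import torch
import torch.nn.functional as F

kernel = torch.ones(1, 1, 3, 3)

def influence(size=28, padding=0, layers=5):
    # Influence of each input pixel on the summed output after
    # several stacked convolutions.
    out = torch.zeros(size, size)
    for r in range(size):
        for c in range(size):
            x = torch.zeros(1, 1, size, size)
            x[0, 0, r, c] = 1.0
            for _ in range(layers):
                x = F.conv2d(x, kernel, padding=padding)
            out[r, c] = x.sum()
    return out

# With "valid" padding, a corner pixel has a single path to the output,
# while a fully interior pixel has 9**5 = 59049 paths.
print(influence(padding=0)[:4, :4])
print(influence(padding=1)[:4, :4])
```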

Conclusion

These two experiments show that the choice of padding matters: a poor choice can lead to poor model performance. For more details, see the following papers, which also propose solutions to the problem:

1. Mind the Pad — CNNs Can Develop Blind Spots

2. On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location

By Harald Scheidl

Compiled by: CV Technical Guide

Original link: harald-scheidl.medium.com/does-paddin…
