Affine transformation and bilinear interpolation in Spatial Transformer Networks

When an affine transformation is applied to an image, a problem arises: when the coordinates of a point in the original image are mapped into the transformed image, the resulting coordinates may be fractional (as shown in the figure), but we know that the position of a pixel in an image can only be an integer. What should we do then? This is where bilinear interpolation comes in.
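For a concrete instance: under a 1.5× scaling, the pixel at (1, 1) maps to (1.5, 1.5), which falls between pixel positions, so its value has to be estimated from the surrounding integer-coordinate pixels.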



The basic idea of bilinear interpolation is to estimate the gray value of a point from the gray values of the four points around it, as shown in the figure.

 

Schematic diagram of bilinear interpolation principle

In implementations, we usually map every position of the transformed image back to the original image (which is much more convenient than forward mapping): traverse all the pixels of the transformed image in turn, compute the coordinates they map to in the original image according to the affine transformation matrix (these may be fractional), and then apply bilinear interpolation, obtaining each output value as a weighted average of the values at the four source positions surrounding the mapped point.
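As a minimal sketch of that backward-mapping loop, assuming a single-channel 8-bit image and a 2×3 affine matrix m that maps output coordinates back into the source (the function and variable names here are illustrative, not from the original post):

#include <math.h>

/* Backward warp: for every output pixel, map its coordinates into the
   source image with the affine matrix, then sample bilinearly. Pixels
   that map outside the source are left black. */
void affine_warp(const unsigned char *src, int sw, int sh,
                 unsigned char *dst, int dw, int dh,
                 const double m[6])      /* maps (x,y)_dst -> (x,y)_src */
{
    for (int y = 0; y < dh; y++) {
        for (int x = 0; x < dw; x++) {
            double sx = m[0] * x + m[1] * y + m[2];  /* usually fractional */
            double sy = m[3] * x + m[4] * y + m[5];
            int x0 = (int)floor(sx), y0 = (int)floor(sy);
            double fx = sx - x0, fy = sy - y0;       /* fractional offsets */
            unsigned char v = 0;
            if (x0 >= 0 && y0 >= 0 && x0 + 1 < sw && y0 + 1 < sh) {
                const unsigned char *p = src + y0 * sw + x0;
                double top = p[0]  * (1 - fx) + p[1]      * fx;  /* row y0   */
                double bot = p[sw] * (1 - fx) + p[sw + 1] * fx;  /* row y0+1 */
                v = (unsigned char)(top * (1 - fy) + bot * fy + 0.5);
            }
            dst[y * dw + x] = v;
        }
    }
}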

 

Part 2 (reposted): the bilinear interpolation algorithm and points to note

Recently I used the bilinear interpolation algorithm in a program to scale images. There is a lot of material about it on the net, and the introductions are clear enough. However, those articles only introduce the algorithm; they do not say how to implement it well. For example, if you write a bilinear interpolation program following the online articles, process a picture with it, and then process the same picture with MATLAB's or OpenCV's resize function, the results are different, and the smaller the source image, the greater the difference. The following explains bilinear interpolation and the phenomenon just described:

1. Bilinear interpolation

Assume that the source image is m×n and the target image is a×b. The side-length ratios of the two images are then m/a and n/b respectively. Note that these ratios are usually not integers, so store them as floating point when programming. Pixel (i, j) of the target image (row i, column j) can be mapped back to the source image by the side-length ratios; its corresponding source coordinates are (i*m/a, j*n/b).

Obviously, these coordinates are generally not integers, and non-integer coordinates cannot index discrete data such as an image. Bilinear interpolation computes the value of the point (a gray value or RGB value) from the four pixels closest to the corresponding coordinates. If the corresponding coordinates are (2.5, 4.5), the nearest four pixels are (2, 4), (2, 5), (3, 4), and (3, 5).

If the image is a grayscale image, the gray value of point (i, j) can be calculated by the following formula:

f(i, j) = w1*p1 + w2*p2 + w3*p3 + w4*p4

where pi (i = 1, 2, 3, 4) are the four nearest pixels and wi (i = 1, 2, 3, 4) are the corresponding weights. The calculation of the weights is written out clearly on Wikipedia and Baidu Encyclopedia.
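As a rough sketch of that weight calculation (the function and variable names are mine, not from the article), with (sx, sy) the fractional mapped coordinates:

#include <math.h>

/* Given the mapped (fractional) coordinates, compute the four bilinear
   weights, ordered to match the pixel list above:
   w1 -> (2, 4), w2 -> (2, 5), w3 -> (3, 4), w4 -> (3, 5) for (2.5, 4.5). */
void bilinear_weights(double sx, double sy, double w[4])
{
    double fx = sx - floor(sx);   /* fractional offset, 0.5 for 2.5 */
    double fy = sy - floor(sy);   /* fractional offset, 0.5 for 4.5 */
    w[0] = (1 - fx) * (1 - fy);   /* w1 */
    w[1] = (1 - fx) * fy;         /* w2 */
    w[2] = fx * (1 - fy);         /* w3 */
    w[3] = fx * fy;               /* w4 */
    /* The four weights always sum to 1; for (2.5, 4.5) each is 0.25,
       so the interpolated value is a plain average of the neighbors. */
}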

2. Existing problems

The premise of this section is that you already know what bilinear interpolation is and that, given the sizes of the source and target images, you could work out the value of any pixel of the target image with pen and paper. Better yet, you may have implemented, in some language, the bilinear interpolation algorithm that countless blogs on the web have published or reposted, and then discovered that the result is completely different from what the corresponding resize() function of MATLAB or OpenCV produces.

So what’s going on here?

The answer is actually very simple: it is the choice of coordinate system, or in other words, how the source image and the target image are aligned with each other.

According to some online blogs, the origins (0, 0) of the source image and the target image are both placed at the upper-left corner, and each pixel of the target image is then computed from the interpolation formula. Suppose you need to shrink a 5×5 image to 3×3; then the correspondence between the pixels of the source image and those of the target image looks like this:

It can be clearly seen from the figure that if the upper-left corner is chosen as the origin (0, 0), the right-most and bottom-most pixels do not actually participate in the calculation at all, and the gray values computed for the pixels of the target image are biased toward the upper-left of the source image.

So what if we add 1 to the coordinates, or choose the lower-right corner as the origin instead? Unfortunately, the same thing happens, only this time the result is biased toward the lower-right.

The best method is to make the geometric centers of the two images coincide, so that the sample points of the target image are evenly spaced in the source image with equal margins on both sides. This is also what MATLAB and OpenCV do. See the diagram below:

If the explanation above is unclear, it does not matter: just compute the corresponding coordinates with the following formulas,

int x = (i + 0.5) * m / a - 0.5

int y = (j + 0.5) * n / b - 0.5

instead of 

int x = i * m / a

int y = j * n / b

Using these formulas yields the correct bilinear interpolation result.
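In C-style code, the corrected mapping might look like this (a sketch only; m, n and a, b are the source and target sizes as above, and MATLAB's imresize and OpenCV's resize use this same half-pixel alignment):

double src_x = (i + 0.5) * m / (double)a - 0.5;  /* align grid centers */
double src_y = (j + 0.5) * n / (double)b - 0.5;
/* The result can be slightly negative near the left/top border when
   enlarging (e.g. -0.2 for i = 0 when enlarging 3x3 to 5x5), so clamp
   it before picking the four neighboring pixels: */
if (src_x < 0.0) src_x = 0.0;
if (src_y < 0.0) src_y = 0.0;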

Conclusion:

To sum up, here are some of the lessons I’ve learned.

1. Information found on the Internet is not always reliable; you still need to run your own experiments.

2. Do not underestimate simple, basic algorithms: when asked to write one yourself you may find you cannot, and there may be subtleties hidden inside it.

3. Program more by hand, build more hands-on experience with algorithms, and read source code written by experts (it can be hard going at times, but keep at it).

 

Part 3 (reposted): optimizing the bilinear interpolation algorithm and its boundary handling in image processing

The bilinear interpolation algorithm is used frequently in image processing, for example in image scaling, and in all kinds of distortion (warping) algorithms it can be used to improve the visual quality of the result. First, let's look at an introduction to the algorithm.

In mathematics, bilinear interpolation can be regarded as the extension of linear interpolation to two variables. The key idea is to perform linear interpolation first in one direction and then in the other. The following diagram gives a general idea of the process.

A simple mathematical expression can be expressed as follows:

f(x, y) = f(0,0)(1 - x)(1 - y) + f(1,0)x(1 - y) + f(0,1)(1 - x)y + f(1,1)xy

Merging terms, this can be written as:

f(x, y) = (f(0,0)(1 - x) + f(1,0)x)(1 - y) + (f(0,1)(1 - x) + f(1,1)x)y
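Written as code, the merged form is just two nested linear interpolations. A minimal C sketch (function and argument names are mine; x and y are the fractional offsets in [0, 1]):

/* Linear interpolation between two samples. */
static double lerp(double v0, double v1, double t) { return v0 + (v1 - v0) * t; }

/* Bilinear interpolation as two passes of linear interpolation:
   first along x on both rows, then along y between the two results. */
double bilinear(double f00, double f10, double f01, double f11,
                double x, double y)
{
    double top = lerp(f00, f10, x);  /* between f(0,0) and f(1,0) */
    double bot = lerp(f01, f11, x);  /* between f(0,1) and f(1,1) */
    return lerp(top, bot, y);
}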

As can be seen from the above equation, this process involves a large number of floating-point operations, which is time-consuming for a computation-heavy object like an image.

Considering the special nature of images, the computed pixel value must fall between 0 and 255, so there are at most 256 possible results. As the formula above shows, f(x, y) is in general a floating-point number that needs to be rounded. We can therefore scale all the factors of the form x, y, 1 - x and 1 - y by a suitable multiple to turn them into integers, do the arithmetic in integers, and finally divide by an appropriate integer to obtain the interpolation result.

How to choose a suitable magnification factor should be considered from three aspects. First, precision: if the number is too small, the computed result may have a large error. Second, the number must not be so large that the intermediate computation overflows what a long integer can represent. Third, speed: if the magnification is 12, the final result has to be divided by 12 * 12 = 144, but if it is 16, the final divisor is 16 * 16 = 256. That is a convenient number, because the division can be done with a right shift, which is much faster than ordinary division.

Taking all three into consideration, 2048 is a good number to choose.
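To spell out why 2048 works well against the three criteria above: 2048 = 2^11, so the product of two scaled weights is at most 2048 * 2048 = 4194304 = 2^22, and the final division becomes a 22-bit right shift. The largest intermediate value is about 255 * 2^22 ≈ 1.07 * 10^9, which still fits in a signed 32-bit integer (maximum about 2.15 * 10^9); the next power of two would not, since 255 * 4096 * 4096 ≈ 4.28 * 10^9 overflows. Meanwhile a weight resolution of 1/2048 keeps the quantization error well below one gray level.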

Let's assume that some earlier step of the algorithm has produced the coordinates we want to sample, PosX and PosY, where PosX = 25.489 and PosY = 58.698. A code snippet for this process (in Basic) looks like this:

NewX = Int(PosX)                ' round down, NewX = 25
NewY = Int(PosY)                ' round down, NewY = 58
PartX = (PosX - NewX) * 2048    ' scaled fractional part of x
PartY = (PosY - NewY) * 2048    ' scaled fractional part of y
InvX = 2048 - PartX             ' corresponds to 1 - x
InvY = 2048 - PartY             ' corresponds to 1 - y

Index1 = NewY * SamStride + NewX * 3    ' top-left sample (3 bytes per pixel)
Index2 = Index1 + SamStride             ' the sample one scan line below

ImageData(Speed + 2) = ((Sample(Index1 + 2) * InvX + Sample(Index1 + 5) * PartX) * InvY + (Sample(Index2 + 2) * InvX + Sample(Index2 + 5) * PartX) * PartY) \ 4194304
ImageData(Speed + 1) = ((Sample(Index1 + 1) * InvX + Sample(Index1 + 4) * PartX) * InvY + (Sample(Index2 + 1) * InvX + Sample(Index2 + 4) * PartX) * PartY) \ 4194304
ImageData(Speed) = ((Sample(Index1) * InvX + Sample(Index1 + 3) * PartX) * InvY + (Sample(Index2) * InvX + Sample(Index2 + 3) * PartX) * PartY) \ 4194304

All the variables involved in the above code are integers, except of course PosX and PosY, which are floating point.

The Sample array in the code holds the image data being sampled, and SamStride is the scan-line size (stride) of that image.

Looking at the code above, every statement except the two that compute PartX and PartY involves only integer arithmetic.

In Basic, if all the advanced optimizations are checked at compile time, the \ 4194304 compiles to a right shift by 22 bits; in C you would write >> 22 directly.
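For reference, a C rendering of the same snippet might look as follows (a sketch reusing the Basic names; it assumes 3 bytes per pixel and non-negative in-range PosX/PosY, in which case the (int) cast behaves like Basic's Int()):

int NewX = (int)PosX;                     /* 25 */
int NewY = (int)PosY;                     /* 58 */
int PartX = (int)((PosX - NewX) * 2048);  /* scaled fractional part of x */
int PartY = (int)((PosY - NewY) * 2048);  /* scaled fractional part of y */
int InvX = 2048 - PartX;                  /* scaled (1 - x) */
int InvY = 2048 - PartY;                  /* scaled (1 - y) */
int Index1 = NewY * SamStride + NewX * 3; /* top-left sample, 3 bytes/pixel */
int Index2 = Index1 + SamStride;          /* sample one scan line below */
for (int c = 0; c < 3; c++)               /* the three color channels */
    ImageData[Speed + c] = (unsigned char)(
        ((Sample[Index1 + c] * InvX + Sample[Index1 + 3 + c] * PartX) * InvY
       + (Sample[Index2 + c] * InvX + Sample[Index2 + 3 + c] * PartX) * PartY) >> 22);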

Note that before this code runs, you must make sure that PosX and PosY are within a reasonable range, i.e. they must not exceed the width and height of the sampled image.

This is at least 100% faster than computing directly with floating-point types, and the result is essentially the same.

A program that uses this approach for image zooming is shared at: files.cnblogs.com/Imageshop/V…