Master the principles of SIFT image feature detection

Representation of key points

Several key points have been found and refined in the DoG space, so how are they represented?

A key point can be represented by a triple whose three values give its location, scale, and direction. Why compute a direction for each feature point? To make the feature descriptor rotationally invariant.

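Direction of a sample point (in practice, the gradient at the corresponding position in the image at the corresponding scale):

$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$$

$$\theta(x, y) = \tan^{-1}\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$$

Here $m(x, y)$ is the gradient magnitude, $\theta(x, y)$ is the gradient direction, and $L$ is the scale-space image at the scale of the key point.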
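The gradient at a sample point can be sketched with central differences on the scale image. This is a minimal illustration, not a reference implementation: the function name `gradient_at` and the row-major layout `L[y][x]` are assumptions made for the example.

```python
import math

def gradient_at(L, x, y):
    """Return (magnitude, direction in radians) at pixel (x, y).

    L is a list of rows, so L[y][x] is the pixel in column x, row y
    (an assumed layout). Central differences are used, so (x, y)
    must not lie on the image border.
    """
    dx = L[y][x + 1] - L[y][x - 1]   # horizontal difference
    dy = L[y + 1][x] - L[y - 1][x]   # vertical difference
    magnitude = math.hypot(dx, dy)
    direction = math.atan2(dy, dx)   # atan2 keeps the correct quadrant
    return magnitude, direction
```

Using `atan2` instead of a plain arctangent of the ratio avoids a division by zero and returns a direction over the full 360 degree range.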

The direction of a key point is determined by the sample points in its surrounding area. The gradient magnitude and direction are computed for every sample point in the area, and the range of directions is divided into 36 bins. Each sample point is assigned to the bin its direction falls in, and its gradient magnitude, multiplied by a Gaussian weight, is added to that bin. The result is a direction histogram of length 36. The bin with the highest peak gives the direction of the key point. If other bins exceed 0.8 times the highest peak, additional key points are created at that point, with the same location and scale but different directions.
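The histogram procedure above can be sketched as follows. The function names, the `(offset_x, offset_y, magnitude, direction)` sample format, and the `sigma` parameter are assumptions for illustration. The sketch selects every bin at or above 0.8 times the maximum and does not check that a bin is a local peak or interpolate the peak position.

```python
import math

def orientation_histogram(samples, sigma, num_bins=36):
    """Accumulate Gaussian-weighted gradient magnitudes into direction bins.

    samples: iterable of (offset_x, offset_y, magnitude, direction_radians),
    where the offsets are measured from the key point (assumed format).
    """
    hist = [0.0] * num_bins
    bin_width = 2 * math.pi / num_bins
    for ox, oy, mag, ang in samples:
        # Gaussian weight falls off with distance from the key point.
        weight = math.exp(-(ox * ox + oy * oy) / (2 * sigma * sigma))
        # Wrap the angle into [0, 2*pi) and find its bin.
        b = int((ang % (2 * math.pi)) / bin_width) % num_bins
        hist[b] += weight * mag
    return hist

def dominant_bins(hist, ratio=0.8):
    """Return indices of all bins at or above ratio * the maximum bin."""
    peak = max(hist)
    if peak <= 0:
        return []
    return [i for i, v in enumerate(hist) if v >= ratio * peak]
```

Each returned bin index would correspond to one key point direction, so a location with two strong directions yields two key points.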

Local image descriptor

We have found the key points of the image at different scales. We also want to describe the features around each key point, to support later classification or matching.

A local feature cannot be separated from the area around its key point. The neighborhood around the key point (the paper uses a 16 × 16 neighborhood) is divided into subregions (the paper recommends 4 × 4), and a direction histogram is computed in each subregion (the paper uses a histogram of length 8). Each histogram is called a seed point. The feature descriptor of such a key point is therefore a vector of length 4 × 4 × 8 = 128.

In the figure, the point in the middle is the detected key point, the blue points are pixels of the image at this scale, and the red squares are the subregions. A seed point is obtained from the direction histogram of each subregion. This is only a sketch; the exact size of the region is discussed in more detail later. Note also the orange arrow leaving the key point, which indicates the direction of the key point.
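To make the layout of the descriptor concrete, here is a simplified sketch that turns a 16 × 16 neighborhood into 4 × 4 subregions with 8 direction bins each. The function name `build_descriptor` and its inputs (two square grids of gradient magnitudes and directions) are assumptions for the example. Gaussian weighting and interpolation between bins are deliberately left out.

```python
import math

def build_descriptor(mags, angs, cells=4, bins=8):
    """Concatenate per-subregion direction histograms into one vector.

    mags, angs: square grids (lists of rows) of gradient magnitudes and
    directions in radians for the neighborhood, e.g. 16 x 16.
    """
    size = len(mags)
    cell = size // cells                      # pixels per subregion side
    bin_width = 2 * math.pi / bins
    desc = [0.0] * (cells * cells * bins)
    for r in range(size):
        for c in range(size):
            cell_index = (r // cell) * cells + (c // cell)
            b = int((angs[r][c] % (2 * math.pi)) / bin_width) % bins
            # Each subregion owns a contiguous block of `bins` entries.
            desc[cell_index * bins + b] += mags[r][c]
    return desc
```

With the default settings the result has 4 × 4 × 8 = 128 entries, one block of 8 per seed point.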

Descriptor rotation invariance

An image is a regular rectangular grid, and the direction histograms are computed by a fixed rule over rectangular regions that are normally parallel to the image edges. To remove the influence of rotation, every key point can be fixed to the same direction. Regions aligned in this way give very similar results under the fixed rule, which avoids the effect of the rotation angle. In practice, the image is rotated so that the direction of the key point is aligned with the x axis of the image, and the rotated image is then divided into subregions for the direction histograms. The rotated coordinates are

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

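Here $\theta$ is the angle between the key point direction and the x axis, negative for a clockwise rotation and positive for a counterclockwise rotation. When the direction histogram accumulates a sample at a grid point, the increment in direction bin o is the weighted gradient magnitude of the sample. In the paper, this value is shared between neighboring bins by multiplying it by 1 − d in each dimension, where d is the distance of the sample from the bin center, measured in units of the bin spacing.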
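The coordinate rotation used to align the key point direction with the x axis can be sketched as below. This assumes the standard rotation matrix with counterclockwise angles counted as positive, matching the sign convention described above; the function name `rotate_point` is hypothetical.

```python
import math

def rotate_point(x, y, theta):
    """Rotate (x, y) about the origin by theta radians.

    Positive theta rotates counterclockwise, negative theta clockwise
    (the assumed sign convention).
    """
    c, s = math.cos(theta), math.sin(theta)
    return x * c - y * s, x * s + y * c
```

For a key point whose direction lies at an angle phi counterclockwise from the x axis, calling `rotate_point(x, y, -phi)` on each sample offset brings that direction onto the x axis.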

Removing the influence of illumination

In order to remove the influence of illumination, the feature vectors generated by key points are generally normalized.

Descriptor vector threshold. Under nonlinear illumination, changes in camera saturation can make the gradient magnitude too large in some directions, while the effect on the directions themselves is weak. Therefore, after normalization, values larger than 0.2 are truncated to 0.2, and the vector is then normalized again to improve the distinctiveness of the features.
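The normalize, truncate, and renormalize steps can be sketched as follows. The sketch assumes that normalization means scaling the vector to unit Euclidean length; the function names are hypothetical.

```python
import math

def normalize(vec):
    """Scale vec to unit Euclidean length (returned unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def illumination_invariant(desc, clip=0.2):
    """Normalize, truncate large entries at `clip`, then normalize again."""
    unit = normalize(desc)
    clipped = [min(v, clip) for v in unit]   # limit overly large gradients
    return normalize(clipped)
```

Because the first step divides by the vector length, multiplying every gradient by a constant (a uniform change in contrast) leaves the result unchanged, and the truncation reduces the weight of a few very large entries.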