A list,

SIFT (scale invariant feature Transform) is a description used in image processing field. This description is a local feature descriptor with scale invariance and key points can be detected in the image. 1 SIFT algorithm features: (1) has good stability and invariance, can adapt to rotation, scale scaling, brightness changes, to a certain extent, not by the change of perspective, affine transformation, noise interference. (2) the distinction between sex good, can fast and exact distinction in mass characteristic database information matching (3) the quantity more, even if only a single object, can also produce a large number of feature vector (4) recommend suite, quickly eigenvector matching (5) the scalability, can be combined with other forms of feature vector

2 SIFT algorithm essence

Key points are searched in different scale Spaces and the directions of key points are calculated.



3 SIFT algorithm to achieve feature matching mainly has the following three processes:

(1) Extraction of key points: Key points are some very prominent points that will not disappear due to lighting, scale, rotation and other factors, such as corner points, edge points, bright spots in dark areas and dark spots in bright areas. This step is to search for image positions on all scale Spaces. Potential points of interest with invariant scale and rotation are identified by gaussian differential functions.

(2) Location of key points and determination of feature direction: In each candidate location, a fine fitting model is used to determine the location and scale. Key points are chosen based on their stability. Then, one or more directions are assigned to each key point based on the local gradient direction of the image. All subsequent operations on the image data transform relative to the direction, scale, and position of the key points, thus providing invariance to these transformations.

(3) Through the feature vector of each key point, pairwise comparison is made to find several pairs of feature points matching each other and establish the corresponding relationship between scenes.

4-scale space

(1) Concept

Scale space is a concept and method that attempts to simulate human eyes observing objects in the image field. For example, when looking at a tree, the key is whether we want to look at the leaves or the whole tree: if we look at the whole tree (equivalent to looking at a large scale), we should strip out the details of the image. If it’s a leaf (at a small scale), look for local details.

SIFT algorithm adopts Gaussian kernel function to filter in the construction of scale space, so that the original image saves the most detail features. After Gaussian filtering, the detail features are gradually reduced to simulate the feature representation at large scale.

There are two main reasons for using gaussian kernel function for filtering:

A Gaussian kernel is the only scale-invariant kernel.

B DoG kernel function can be approximated as LoG function, which makes feature extraction easier. Meanwhile, David Lowe proposed in his paper that filtering the original image after two-fold up-sampling can retain more information for subsequent feature extraction and matching. In fact, the image generation of scale space is the image generated after the convolution operation of the current image with different scale kernel parameter σ.

(2) represents

L(x, y, σ) is defined as the convolution operation of the original image I(x, y) with a variable scale 2-dimensional Gaussian function G(x, y, σ).



The construction of the Gauss Pyramid

(1) Concept

The scale space is represented by gauss pyramid in implementation, and the construction of Gauss pyramid is divided into two steps:

A Perform Gaussian smoothing on the image;

B Downsampling the image.



The pyramid model of image is a tower model composed of a series of images of different sizes, from large to small, from bottom to top, by continuously reducing the sampling of the original image. The original image is the first layer of the golden tower, and the new image obtained by each descending sampling is one layer of the pyramid (one image for each layer), and each pyramid has N layers. In order to show the continuity of scale, gaussian filtering is applied to gaussian pyramid based on simple downsampling. As shown in the figure above, an image of each layer of the image pyramid is gaussian blurred with different parameters, Octave represents the number of image groups that can be produced by an image, and Interval represents the number of image layers included in a group of images. In addition, when downsampling, the initial image (bottom image) of a set of images on the Gaussian pyramid is obtained by intersampling the third to last image of the previous set of images.

(2) represents

Gaussian image pyramid with O group, S layer, then



6 DOG space extremum detection

(1) DOG function



(2) DoG Gaussian difference pyramid

A corresponds to the DOG operator, and the DOG pyramid needs to be constructed.

The change of pixel value on the image can be seen through gaussian difference image. (If there is no change, there is no character. Features must vary as many points as possible. The DOG image depicts the outline of the target.



B DOG local extremum detection

The characteristic points are made up of local extreme points in the DOG space. To find the extreme point of the DoG function, each pixel is compared with all its neighbors to see if it is larger or smaller than its neighbors in the image and scale domains. The characteristic points are made up of local extreme points in the DOG space. To find the extreme point of the DoG function, each pixel is compared with all its neighbors to see if it is larger or smaller than its neighbors in the image and scale domains. As shown in the figure below, the middle detection point is compared with its 8 adjacent points at the same scale and 9×2 corresponding points at the upper and lower adjacent scales, a total of 26 points, to ensure that extreme points are detected in both scale space and two-dimensional image space.



B removing edge effect

In the direction of edge gradient, the main curvature value is larger, while in the direction of edge, the main curvature value is smaller. The principal curvature of the DoG function D(x) of the candidate feature points is proportional to the eigenvalues of the 2×2Hessian matrix H.





7 Key points are assigned

(1) To find the extreme point through scale invariance, it is necessary to use the local features of the image to assign a reference direction to each key point, so that the descriptor has invariance to the image rotation. For the key points detected in the DOG pyramid, the gradient and direction distribution characteristics of the pixels in the 3σ neighborhood window of the Gaussian pyramid image are collected. The modulus and direction of the gradient are as follows:



(2) This algorithm adopts the gradient histogram statistics method, which takes the key points as the origin and determines the direction of the key points by the image pixels in a certain area. After the gradient calculation of key points is completed, the histogram is used to calculate the gradient and direction of pixels in the neighborhood. The gradient histogram divides the orientation range from 0 to 360 degrees into 36 columns of 10 degrees each. As shown in the figure below, the peak direction of the histogram represents the main direction of the key point, and the peak value of the direction histogram represents the direction of the neighborhood gradient at the feature point. The maximum value in the histogram is taken as the main direction of the key point. In order to enhance the robustness of matching, only the direction whose peak value is more than 80% of the main direction is reserved as the secondary direction of this key point.



8 Key Points

For each key point, there is location, scale, and direction. Create a descriptor for each key point, and use a set of vectors to describe the key point, so that it does not change with various changes, such as lighting changes, Angle of view changes, etc. This descriptor not only includes the key points, but also the pixels around the key points that contribute to it, and the descriptor should have a high uniqueness in order to improve the probability of correct matching of feature points.



Lowe’s experiment results showed that the descriptor was characterized by 4×4×8 = 128 dimension vector, and the comprehensive effect was the best (invariance and uniqueness).

9 Key Point Matching

(1) Template image (reference image, reference image) and real-time image (observation image,

Observation image) to establish a key point description sub-set. The target recognition is accomplished by comparing the key point descriptors in the two-point set. Euclidean distance is used to measure the similarity of key point descriptors with 128 dimensions.

(3) The matching can be completed by exhaustive method, but it takes too much time. Therefore, the data structure of KD tree is generally used to complete the search. The content of the search is based on the key points of the target image to search the nearest feature point of the original image and the next adjacent feature point of the original image.

The Kd tree, as shown below, is a balanced binary tree



10 summary

SIFT features with stability and invariability, in the field of image processing and computer vision has a very important role, its itself is very complex, because the contact SIFT is not for a long time, the relevant knowledge of which is also very inadequate, after multiple reference, write this article, the content is not detailed, hope more forgive me. The following is a rough summary of the SIFT algorithm.

(1) Extreme value detection of DoG scale space.

(2) Delete unstable extremum points.

(3) Determine the main direction of feature points

(4) Generate descriptors of feature points for key point matching.

Ii. Source code

close all;
clear;
clc;
[filename pathname]=uigetfile('*.jpg'.'Open File');
Image=imread([pathname filename]);
image(Image);
[imshowage,flag,Im1,Im2,Im3,Im1_1,Im2_1,Im3_1]=extraction(Image);

if imshowage==1
    if(flag==1)     
    figure,imshow(Im1_1);
    end
    if(flag==2)
    figure, imshow(Im1_1);
    figure, imshow(Im2_1);
    end 
    if(flag==3)
    figure,imshow(Im1_1); 
    figure,imshow(Im2_1);
    figure,imshow(Im3_1);  
   end
   % [image, descriptors, locs] = sift(imageFile)
%
% This function reads an image and returns its SIFT keypoints.
%   Input parameters:
%     imageFile: the file name for the image.
%
%   Returned:
%     image: the image array in double format
%     descriptors: a K-by- 128. matrix, where each row gives an invariant
%         descriptor for one of the K keypoints.  The descriptor is a vector
%         of 128 values normalized to unit length.
%     locs: K-by4 - matrix, in which each row has the 4 values for a
%         keypoint location (row, column, scale, orientation).  The 
%         orientation is in the range [-PI, PI] radians.
%
% Credits: Thanks for initial version of this program to D. Alvaro and 
%          J.J. Guerrero, Universidad de Zaragoza (modified by D. Lowe)

function [image, descriptors, locs] = sift(image)

% Load image
% image = imread(imageFile);

% If you have the Image Processing Toolbox, you can uncomment the following
%   lines to allow input of color images, which will be converted to grayscale.
if size(image,3)>1
   image = rgb2gray(image);
end

[rows, cols] = size(image); 

% Convert into PGM imagefile, readable by "keypoints" executable
f = fopen('tmp.pgm'.'w');
if f == - 1
    error('Could not create file tmp.pgm.');
end
fprintf(f, 'P5\n%d\n%d\n255\n', cols, rows);
fwrite(f, image'.'uint8');
fclose(f);

% Call keypoints executable
if isunix
    command = '! ./sift ';
else
    command = '! siftWin32 ';
end
command = [command ' <tmp.pgm >tmp.key'];
eval(command);

% Open tmp.key and check its header
g = fopen('tmp.key'.'r');
if g == - 1
    error('Could not open file tmp.key.');
end
[header, count] = fscanf(g, '%d %d'[1 2]);
if count ~= 2
    error('Invalid keypoint file beginning.');
end
num = header(1);
len = header(2);
if len ~= 128
    error('Keypoint descriptor length invalid (should be 128).');
end

% Creates the two output matrices (use known size for efficiency)
locs = double(zeros(num, 4));
descriptors = double(zeros(num, 128));

% Parse tmp.key
for i = 1:num
    [vector, count] = fscanf(g, '%f %f %f %f'[1 4]); %row col scale ori
    if count ~= 4
        error('Invalid keypoint file format');
    end
    locs(i, :) = vector(1, :);
    
    [descrip, count] = fscanf(g, '%d'[1 len]);
    if (count ~= 128)
        error('Invalid keypoint file value.');
    end
Copy the code

3. Operation results

Fourth, note

Version: 2014 a

Complete code or ghostwrite plus 1564658423