I. HOG Feature Description

Histogram of Oriented Gradients (HOG) is a feature descriptor used for object detection in computer vision and image processing. The technique counts occurrences of gradient orientations in localized portions of an image. It is similar in spirit to edge orientation histograms, the scale-invariant feature transform (SIFT), and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced, uniformly sized cells and uses overlapping local contrast normalization to improve accuracy. HOG features combined with an SVM classifier have been widely used in image recognition, especially in pedestrian detection. This section introduces HOG.

1 Basic Introduction

HOG was proposed by Navneet Dalal and Bill Triggs in a paper published at CVPR in 2005 [1]. The authors give the flow chart of feature extraction shown in Figure 1 below:



The core idea of HOG is that the appearance and shape of a local object can be described by the distribution of intensity gradients or edge directions. The image is divided into small connected regions called cells, and for each cell a histogram of the gradient orientations (or edge directions) of its pixels is accumulated. The combination of these histograms then forms the descriptor of the detection target. To improve accuracy, the local histograms can be contrast-normalized by computing a measure of the intensity over a larger region of the image, called a block, and then using this measure to normalize all cells within the block. This normalization gives better invariance to changes in illumination and shadowing. Compared with other descriptors, the HOG descriptor is invariant to geometric and photometric transformations, except for changes in object orientation. The relationship between blocks and cells is shown in the figure below:



Now consider the pedestrian detection test conducted by the authors, shown in Figure 6 below:



Here, (a) shows the average gradient over all training images; (b) and (c) respectively show the maximal positive and negative SVM weights in each block of the image; (d) is a test image; (e) is the test image after R-HOG computation; (f) and (g) respectively show the R-HOG image weighted by the positive and the negative SVM weights.
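As an aside, for readers who simply want to compute such a descriptor, MATLAB's Computer Vision Toolbox provides extractHOGFeatures, whose CellSize, BlockSize, and NumBins parameters correspond directly to the cells, blocks, and histogram channels described above. A minimal sketch, with 'person.png' as a placeholder file name:

I = imread('person.png');                        % placeholder image
[hogFeature, visualization] = extractHOGFeatures(I, ...
    'CellSize',[8 8], 'BlockSize',[2 2], 'NumBins',9);
plot(visualization);                             % draw the per-cell orientation histograms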

2 Algorithm Description

HOG feature extraction proceeds as follows. First, the image is divided into small connected regions, which we call cell units. Then the histogram of gradient (or edge) orientations of the pixels in each cell unit is collected. Finally, these histograms are combined to form the feature descriptor, as shown in the figure below.

In other words, for a given image:

1. Grayscale: treat the image as a three-dimensional (x, y, intensity) image.
2. Divide the image into small cells (2*2 in this toy example).
3. Compute the gradient (i.e., the orientation) of each pixel in each cell.
4. Count the gradient histogram (the number of pixels at each gradient orientation) of each cell to form the cell's descriptor.

A short code sketch of the first two steps is given below.
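This sketch covers only grayscale conversion and cell partitioning; 'person.png' is a placeholder file name, and an 8-pixel cell size is assumed (the toy example above uses 2*2):

I = imread('person.png');                        % placeholder image
if size(I,3) == 3, I = rgb2gray(I); end          % step 1: convert to grayscale
I = double(I);
cellSz = 8;                                      % step 2: cell size in pixels (assumed)
I = I(1:floor(end/cellSz)*cellSz, 1:floor(end/cellSz)*cellSz);  % crop to a multiple of the cell size
cells = mat2cell(I, repmat(cellSz,1,size(I,1)/cellSz), ...
                    repmat(cellSz,1,size(I,2)/cellSz));         % one matrix per cell unit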

The specific flow of the whole algorithm consists of the following parts.

3 Gamma/Color Normalization

In practice this step can be omitted, because the authors performed color and gamma normalization of the images in grayscale space, RGB color space, and LAB color space, and the experimental results show that this preprocessing has no measurable effect on the final results. Moreover, normalization occurs in the subsequent steps anyway, which makes a separate preprocessing normalization redundant. See reference [1] for details.
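The source code in part II nonetheless applies a simple square-root gamma compression; a one-line sketch, assuming a grayscale image in the placeholder file 'person.png':

img = double(imread('person.png'));   % placeholder image, assumed grayscale
img = sqrt(img);                      % gamma correction with gamma = 0.5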

4 Gradient Computation

The image is simply filtered with a one-dimensional discrete derivative template, in one direction or in the horizontal and vertical directions at the same time; specifically, the method filters the image with the kernel [-1, 0, 1] and its transpose. The authors also tried other, more complex templates, such as the 3x3 Sobel operator and diagonal masks, but in the pedestrian detection experiments these more complex masks all performed worse, so the authors conclude that the simpler the template, the better. They also tried applying a Gaussian smoothing filter before the derivative template, but this made detection worse: much of the useful image information lies in the sharply changing edges, and Gaussian filtering smooths these edges away before the gradient is computed.
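A minimal sketch of this gradient computation with the centered [-1, 0, 1] template, assuming img is a grayscale image already stored as a double matrix:

fx = [-1 0 1];                          % horizontal derivative template
fy = fx';                               % vertical derivative template
Ix = imfilter(img, fx, 'replicate');    % horizontal gradient
Iy = imfilter(img, fy, 'replicate');    % vertical gradient
Ied = sqrt(Ix.^2 + Iy.^2);              % gradient magnitude
the = atan2d(Iy, Ix);                   % gradient orientation in degrees, (-180, 180]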

5 Creating the Orientation Histograms

This step constructs a gradient-orientation histogram for each cell of the image. Each pixel in the cell casts a weighted vote for an orientation-based histogram channel, with the weight based on the gradient magnitude at that pixel. The weight can be the magnitude itself or a function of it; testing shows that using the magnitude itself gives the best results, though functions of the magnitude, such as its square root, its square, or a clipped version, can also be used. Cell units can be rectangular or radial. The histogram channels are spread evenly over the unsigned (0-180 degrees) or signed (0-360 degrees) orientation range. A sketch of the voting step follows.
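This sketch shows magnitude-weighted voting for a single 8x8-pixel cell, assuming mag and ang hold the gradient magnitudes and unsigned orientations (in [0, 180)) of that cell:

nbins = 9;                       % the paper's best setting for pedestrians
binWidth = 180/nbins;            % unsigned orientation range
h = zeros(1, nbins);
for p = 1:8
    for q = 1:8
        b = min(floor(ang(p,q)/binWidth) + 1, nbins);  % orientation bin of this pixel
        h(b) = h(b) + mag(p,q);                        % vote weighted by gradient magnitude
    end
end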

The authors found that unsigned gradients together with nine histogram channels give the best results in pedestrian detection tests. Gradient strengths vary widely owing to local variations in illumination and foreground/background contrast, so the gradient strengths must be locally normalized, which the authors do by grouping the cells into larger, spatially connected blocks. The HOG descriptor then becomes the vector of the histogram components of all cell units in each block. The blocks overlap, which means that each cell unit contributes several times to the final descriptor. Blocks come in two main geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG).





R-HOG: R-HOG blocks are roughly square grids, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell. Experiments show that the optimal parameters for pedestrian detection are 3x3 cells per block, 6x6 pixels per cell, and 9 histogram channels. The authors also found it necessary to apply a Gaussian spatial window to each block before accumulating the histograms, because this reduces the weight of the pixels around the block's edges (see the sketch after this paragraph). R-HOG looks very similar to the SIFT descriptor, but they differ: R-HOG is computed at a single scale, over a dense grid, and without orientation alignment, whereas SIFT descriptors are computed at multiple scales, at sparse image keypoints, and aligned to the dominant orientation. In addition, R-HOG blocks are used jointly to encode spatial form information, while SIFT descriptors are used individually.
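A sketch of that Gaussian spatial window, applied to the gradient magnitudes of one block before the votes are accumulated. The paper uses a sigma of half the block width; magBlock is an assumed 18x18 matrix of magnitudes for one 3x3-cell block of 6x6-pixel cells, and the rescaling of the window is an assumption:

blockSz = 18;                                    % 3 cells * 6 pixels per cell
w = fspecial('gaussian', blockSz, 0.5*blockSz);  % Gaussian window, sigma = half the block width
w = w / max(w(:));                               % rescale so the centre weight is 1 (assumption)
magBlock = magBlock .* w;                        % down-weight pixels near the block border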

C-HOG: C-HOG blocks come in two variants, which differ in whether the central cell is whole or subdivided; as shown in the figure above, the authors find that both variants perform equally well. A C-HOG block is characterized by four parameters: the number of angular bins, the number of radial bins, the radius of the central bin, and the expansion factor for the radii. Experiments show that, for pedestrian detection, the best settings are 4 angular bins, 2 radial bins, a central-bin radius of 4 pixels, and an expansion factor of 2. As mentioned above, a Gaussian spatial window is necessary for R-HOG, but not for C-HOG. C-HOG looks a lot like the shape context method, but differs from it: the cells in a C-HOG block contain multiple orientation channels, whereas the shape-context approach uses only a single edge-presence count.



6 Block Normalization Schemes

The authors use four different methods to normalize the blocks and compare the results. Let v be the unnormalized vector containing all the histogram entries of a given block, let ||v||_k be its k-norm for k = 1, 2, and let ε be a small constant. The normalized descriptor f can then be expressed as follows:



L2-norm: f = v / sqrt(||v||_2^2 + ε^2)

L1-norm: f = v / (||v||_1 + ε)

L1-sqrt: f = sqrt(v / (||v||_1 + ε))

There is also a fourth normalization method, L2-Hys, obtained by first applying the L2-norm, clipping the result, and then renormalizing. The authors found that L2-Hys, L2-norm, and L1-sqrt perform equally well, while L1-norm is slightly less reliable. However, compared with leaving the data unnormalized, all four methods bring a significant improvement. A sketch of the four schemes follows.
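A minimal MATLAB sketch of the four schemes, assuming v is the concatenated histogram vector of one block and eps0 a small constant whose value is an assumption; the 0.2 clipping threshold is the one quoted in the paper:

eps0 = 1e-3;                                   % small constant (assumed value)
fL2   = v ./ sqrt(sum(v.^2) + eps0^2);         % L2-norm
fL1   = v ./ (sum(abs(v)) + eps0);             % L1-norm
fL1sq = sqrt(v ./ (sum(abs(v)) + eps0));       % L1-sqrt
fHys  = min(fL2, 0.2);                         % L2-Hys: clip the L2 result at 0.2 ...
fHys  = fHys ./ sqrt(sum(fHys.^2) + eps0^2);   % ... and renormalize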

7 SVM Classifier

The final step is to feed the extracted HOG features into an SVM classifier in order to find an optimal hyperplane as the decision function. The approach adopted by the authors is to use the free SVMLight software package, together with the HOG features, to find pedestrians in test images.

In summary, HOG has no rotation or scale invariance of its own, so its computation is cheap; each SIFT feature, by contrast, is described by a 128-dimensional vector, so its computation is comparatively expensive. In pedestrian detection, scale invariance is handled by rescaling the image to different sizes, which is equivalent to scaling the template to different sizes; rotation invariance is handled by building templates at different orientations (typically about 7 templates in 15-degree steps) to match against. In general, the image at each scale is matched against the templates at each orientation, and each point yields an 8-direction gradient description. SIFT is unsuitable for pedestrian detection because of its huge computational cost, and the PCA-SIFT method filters out many dimensions of information, retaining only 20 principal components, so it is only suitable for detecting objects whose appearance changes little. A sketch of the multi-scale search follows.
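This sketch of the multi-scale search assumes a hypothetical fixed-size detector detectWindow that slides the HOG+SVM window over one image and returns boxes as [x y w h] rows; the scale ladder is also an assumption:

scales = 1.0:-0.1:0.4;               % assumed scale ladder
detections = [];
for s = scales
    Is = imresize(I, s);             % shrinking the image = enlarging the template
    boxes = detectWindow(Is);        % hypothetical fixed-size HOG+SVM detector
    detections = [detections; boxes/s];   % map boxes back to the original scale
end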

Compared with other feature description methods (such as SIFT, PCA-SIFT, and SURF), HOG has the following advantages. First, because HOG operates on local cells of the image, it remains largely invariant to geometric and photometric deformations of the image, since such deformations only show up over larger spatial regions. Second, under coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, pedestrians may make some subtle body movements, and as long as they remain roughly upright these movements can be ignored without affecting detection. HOG features are therefore particularly well suited to human detection in images.

II. Source code
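The MATLAB code consists of two parts. The script at the top evaluates precomputed HOG features loaded from the file AR_HOG (assumed to provide the matrix pic_all, with 26 images per class for 120 classes) using a nearest-neighbour classifier, and reports the recognition rate acc_rate. The function hog at the bottom implements the descriptor described in part I: 8x8-pixel cells, 9 orientation bins over 0-360 degrees, and 2x2-cell blocks with L1 normalization.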

clc; clear all; close all;
load('AR_HOG');                 % load precomputed HOG features (provides pic_all)

%% divide into training set and testing set
% (earlier per-class version, kept commented out in the original:)
% for i=1:1:120
%     eval(['train_',num2str(i),'=','zeros(20,182)',';']);
%     eval(['test_',num2str(i),'=','zeros(6,182)',';']);
%     for j=1:1:20
%         eval(['train_',num2str(i),'(:,j)','=','my_lbp(:,26*(i-1)+j)'';']);
%     end
%     for k=1:1:6
%         eval(['test_',num2str(i),'(:,k)','=','my_lbp(:,26*(i-1)+k+20)'';']);
%     end
% end
% train_set = zeros(20*120,720);
% test_set  = zeros(6*120,720);
train_set = [];
test_set  = [];
for i = 1:120   % 120 classes, 26 samples each: first 20 for training, last 6 for testing
    train_set = [train_set pic_all(:,(i-1)*26+1:(i-1)*26+20)];
    test_set  = [test_set  pic_all(:,(i-1)*26+21:(i-1)*26+26)];
end

%% nearest-neighbour classification
label = zeros(6*120,1);
for i = 1:6*120
    y = test_set(:,i);
    dis = sqrt(sum((train_set - repmat(y,1,20*120)).^2));   % Euclidean distance to every training sample
    local = find(dis == min(dis), 1);                        % index of the nearest neighbour
    if mod(local,20) ~= 0
        label(i) = fix(local/20) + 1;
    else
        label(i) = local/20;
    end
end

%% recognition accuracy
a = 1:120;
a = repmat(a,6,1);
real_label = a(:);
dis_label  = real_label - label;
acc_rate   = length(find(dis_label == 0))/(120*6);

%% comparison of absolute (Manhattan) and Euclidean distance (commented out in the original)
% D=0; D1=0;
% for i=1:1:120
%     for j=1:1:19
%         for k=j+1:1:20
%             a=eval(['train_',num2str(i),'(:,j)',';']);
%             b=eval(['train_',num2str(i),'(:,k)',';'])';
%             d=mandist(b,a);   D=D+d;
%             d1=dist(b,a);     D1=D1+d1;
%         end
%     end
%     eval(['ave_d_',num2str(i),'=','D/190',';']);
%     eval(['ave_d1_',num2str(i),'=','D1/190',';']);
%     D=0; D1=0;
% end
% o=0; f=0; q=0; w=0;
% for i=1:1:120
%     for j=1:1:6
%         for k=1:1:20
%             a=eval(['train_',num2str(i),'(:,k)',';']);
%             b=eval(['test_',num2str(i),'(:,j)',';'])';
%             d=mandist(b,a);
%             d1=dist(b,a);
%             if d <= 1.1*eval(['ave_d_',num2str(i)])
%                 f=f+1;
%             else
%                 q=q+1;
%             end
%             if d1 <= 1.1*eval(['ave_d1_',num2str(i)])
%                 o=o+1;
%             else
%                 w=w+1;
%             end
%         end
%     end
% end

function featureVec = hog(img)
% Compute an R-HOG descriptor: 8*8-pixel cells, 9 orientation bins over
% 0-360 degrees, 2*2-cell blocks, L1 block normalization.
img = double(img);
step = 8;                 % 8*8 pixels as a cell
[m1,n1] = size(img);
% resize so that the image size is a multiple of the cell size
img = imresize(img,[floor(m1/step)*step, floor(n1/step)*step],'nearest');
[m,n] = size(img);
img = sqrt(img);          % gamma correction

% gradients
fy = [-1 0 1];            % define vertical template
fx = fy';                 % define horizontal template
Iy = imfilter(img,fy,'replicate');   % vertical gradient
Ix = imfilter(img,fx,'replicate');   % horizontal gradient
Ied = sqrt(Ix.^2+Iy.^2);  % gradient magnitude
Iphase = Iy./Ix;          % gradient slope (may contain inf or nan)
the = atan(Iphase)*180/3.14159;      % gradient angle in degrees

% map angles into [0,360) according to the quadrant of (Ix,Iy)
for i = 1:m
    for j = 1:n
        if (Ix(i,j)>=0 && Iy(i,j)>=0)        % first quadrant
            the(i,j) = the(i,j);
        elseif (Ix(i,j)<=0 && Iy(i,j)>=0)    % second quadrant
            the(i,j) = the(i,j)+180;
        elseif (Ix(i,j)<=0 && Iy(i,j)<=0)    % third quadrant
            the(i,j) = the(i,j)+180;
        elseif (Ix(i,j)>=0 && Iy(i,j)<=0)    % fourth quadrant
            the(i,j) = the(i,j)+360;
        end
        if isnan(the(i,j))==1                % 0/0 gives nan; reset such pixels to 0
            the(i,j) = 0;
        end
    end
end
the = the + 0.000001;     % avoid an angle of exactly 0

% divide into cells and build per-cell histograms (1 cell = 8*8 pixels)
orient = 9;               % number of orientation bins
jiao = 360/orient;        % degrees per bin
Cell = cell(1,1);         % all cell histograms; the cell array grows dynamically
ii = 1;
jj = 1;
for i = 1:step:m
    ii = 1;
    for j = 1:step:n
        Hist1(1:orient) = 0;
        for p = 1:step
            for q = 1:step
                % magnitude-weighted gradient-orientation histogram
                Hist1(ceil(the(i+p-1,j+q-1)/jiao)) = ...
                    Hist1(ceil(the(i+p-1,j+q-1)/jiao)) + Ied(i+p-1,j+q-1);
            end
        end
        Cell{ii,jj} = Hist1;   % store into the cell array
        ii = ii+1;
    end
    jj = jj+1;
end

% group 2*2 cells into overlapping blocks and normalize each block
[m,n] = size(Cell);
feature = cell(1,(m-1)*(n-1));
for i = 1:m-1
    for j = 1:n-1
        block = [Cell{i,j}(:)' Cell{i,j+1}(:)' Cell{i+1,j}(:)' Cell{i+1,j+1}(:)'];
        block = block./sum(block);           % L1 normalization
        feature{(i-1)*(n-1)+j} = block;
    end
end

% concatenate all block features into the image's HOG feature vector
[m,n] = size(feature);
l = 2*2*orient;
featureVec = zeros(1,n*l);
for i = 1:n
    featureVec((i-1)*l+1:i*l) = feature{i}(:);
end

% alternative cell-histogram implementation (commented out in the original):
% [m,n]=size(img);
% img=sqrt(img);                        % gamma correction
% % edge detection
% fy=[-1 0 1];                          % define vertical template
% fx=fy';                               % define horizontal template
% Iy=imfilter(img,fy,'replicate');      % vertical edge
% Ix=imfilter(img,fx,'replicate');      % horizontal edge
% Ied=sqrt(Ix.^2+Iy.^2);                % edge strength
% Iphase=Iy./Ix;                        % edge slope: may be inf, -inf or nan
% % step*step pixels as a cell
% orient=9;                             % number of bins in the orientation histogram
% jiao=360/orient;                      % degrees per bin
% Cell=cell(1,1);                       % cell array of histograms, grown dynamically
% ii=1;
% jj=1;
% for i=1:step:m                        % if m/step is not an integer, better to use m-step
%     ii=1;
%     for j=1:step:n
%         tmpx=Ix(i:i+step-1,j:j+step-1);
%         tmped=Ied(i:i+step-1,j:j+step-1);
%         tmped=tmped/sum(sum(tmped));  % local edge-strength normalization
%         tmpphase=Iphase(i:i+step-1,j:j+step-1);
%         Hist=zeros(1,orient);         % histogram of the current cell
%         for p=1:step
%             for q=1:step
%                 if isnan(tmpphase(p,q))==1   % 0/0 gives nan; reset to 0
%                     tmpphase(p,q)=0;
%                 end
%                 ang=atan(tmpphase(p,q));
%                 ang=mod(ang*180/pi,360);
%                 if tmpx(p,q)<0               % fix the quadrant using the sign of the x-gradient
%                     if ang<90                % first quadrant
%                         ang=ang+180;
%                     end
%                     if ang>270               % fourth quadrant
%                         ang=ang-180;
%                     end
%                 end
%                 ang=ang+0.0000001;           % prevent ang from being exactly 0
%                 % ceil rounds up; votes weighted by edge strength
%                 Hist(ceil(ang/jiao))=Hist(ceil(ang/jiao))+tmped(p,q);
%             end
%         end
%     end
% end

III. Operation results