The HPatches paper proposes three evaluation tasks for patch-based descriptors and discusses the corresponding metrics and how they are computed. The three evaluation tasks are as follows:

  1. Patch verification
  2. Image matching
  3. Patch retrieval

These tasks imitate typical application scenarios of patches, as described below.

0. Precision and recall

First, the way precision and recall are computed in HPatches:

For a patch query, an ordered list of retrieved patches is returned, with labels $\mathbf{y} = (y_1, \ldots, y_n) \in \{-1, 0, +1\}^n$: $-1$ denotes a negative, $+1$ a positive, and $0$ an entry to be ignored.

The precision and recall at rank $i$ are then computed by the following formulas.

Precision: the proportion of positive samples among the first $i$ elements,


$$P_i(\mathbf{y}) = \frac{\sum_{k=1}^{i}[y_k]_+}{\sum_{k=1}^{i}|y_k|}$$

where $[z]_+ = \max\{0, z\}$.

Recall: the proportion of all positive samples in the sequence that appear among the first $i$ elements,


$$R_i(\mathbf{y}) = \frac{\sum_{k=1}^{i}[y_k]_+}{\sum_{k=1}^{n}[y_k]_+}$$

Average precision (AP): the mean of the precision values taken at the ranks of the positive samples,

$$\mathrm{AP}(\mathbf{y}) = \frac{\sum_{k:\, y_k = +1} P_k(\mathbf{y})}{\sum_{k=1}^{n}[y_k]_+}$$
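As a concrete illustration, here is a minimal numpy sketch of these definitions (the function names `precision_at`, `recall_at`, and `average_precision` are my own, not from the benchmark code):

```python
import numpy as np

def precision_at(y, i):
    """Precision at rank i: fraction of positives among the first i
    entries, with 0-labeled (ignored) entries excluded from the count.
    y is a sequence over {-1, 0, +1}."""
    yi = np.asarray(y[:i])
    pos = np.maximum(yi, 0).sum()        # [y_k]_+ counts the +1 labels
    denom = np.abs(yi).sum()             # |y_k| skips the 0 labels
    return pos / denom if denom > 0 else 0.0

def recall_at(y, i):
    """Recall at rank i: fraction of all positives found in the first i."""
    y = np.asarray(y)
    total_pos = np.maximum(y, 0).sum()
    found = np.maximum(y[:i], 0).sum()
    return found / total_pos if total_pos > 0 else 0.0

def average_precision(y):
    """AP: mean of the precision values at the ranks of the positives."""
    y = np.asarray(y)
    ranks = np.where(y == 1)[0] + 1      # 1-based ranks of positives
    if len(ranks) == 0:
        return 0.0
    return float(np.mean([precision_at(y, i) for i in ranks]))
```

For example, with `y = [1, -1, 1, 0, -1]`, the positives sit at ranks 1 and 3, so the AP is the mean of the precisions at those two ranks.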

1. Patch verification

This task evaluates whether two patches were extracted from the same measurement (i.e. the same physical point), in other words whether the two patches match.

Specifically, a list of patch pairs $\mathcal{P} = ((\mathbf{x}_i, \mathbf{x}'_i, y_i),\ i = 1, \ldots, N)$ is given, where $\mathbf{x}_i, \mathbf{x}'_i \in \mathbb{R}^{t \times t \times c}$ are patches and $y_i = \pm 1$ indicates whether the pair matches.

A confidence score $s_i$ is computed for each pair, the list is sorted in descending order of confidence, and the labels are read off in that order. The metric is then $\mathrm{AP}(y_{\pi_1}, \ldots, y_{\pi_N})$, where $\pi$ is a permutation such that $s_{\pi_1} \geq \ldots \geq s_{\pi_N}$.

This is very similar to the patch pairs used directly in training. The HPatches dataset provides $2 \times 10^5$ positive pairs and $1 \times 10^6$ negative pairs.

Note that the confidence score is only used for ranking, and no constraint is placed on how it is computed, so the scoring function can be customized; metric-learning methods can even be evaluated directly.

The paper also notes that, because of the unbalanced sample distribution, ROC curves are not used. Moreover, this task is not very close to a real image-matching scenario, which motivates the next evaluation task, designed to better match that setting.
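A minimal sketch of the verification protocol, assuming descriptors are compared by negative L2 distance (the helper name `verification_ap` and the scoring choice are my own; as noted above, the benchmark places no constraint on the scoring function):

```python
import numpy as np

def verification_ap(desc_a, desc_b, labels):
    """Patch-verification AP sketch: score each pair by negative L2
    distance between its two descriptors, sort pairs by confidence,
    then compute AP over the sorted labels.
    desc_a, desc_b: (N, d) descriptor arrays; labels: (N,) in {-1, +1}."""
    scores = -np.linalg.norm(desc_a - desc_b, axis=1)  # higher = more confident
    order = np.argsort(-scores)                        # permutation pi
    y = np.asarray(labels)[order]
    pos = (y == 1)
    if pos.sum() == 0:
        return 0.0
    cum_pos = np.cumsum(pos)
    precision = cum_pos / np.arange(1, len(y) + 1)
    return float(precision[pos].mean())  # mean precision at positive ranks
```

If the positive pair is also the closest pair, it is ranked first and the AP is 1; ranking a negative pair above it lowers the AP accordingly.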

2. Image matching

This task evaluates how well patch descriptors can identify the correspondences between two images.

Specifically, in this task, the patch descriptors of one image are matched against the patch descriptors of another image.

Suppose an image $L_k$ contains $N$ patches, $L_k = (\mathbf{x}_{ik},\ i = 1, \ldots, N)$. Given an image pair $\mathcal{D} = (L_0, L_1)$, the patches $\mathbf{x}_{i0}$ and $\mathbf{x}_{i1}$ are corresponding patches, so this image pair can be used to evaluate the algorithm.

Specifically, given a patch $\mathbf{x}_{i0}$ of the reference image $L_0$, let its matched patch in the target image $L_1$ be $\mathbf{x}_{\sigma_i 1}$, with matching confidence $s_i \in \mathbb{R}$. Its label is then given as follows:


$$y_i = 2\,[\sigma_i = i] - 1$$

So a correct match gets $+1$ and a wrong match gets $-1$. The AP of this label sequence, sorted by confidence, can then be computed with the method above; the overall metric is the mean of the average precision over all image pairs $\mathcal{D}$ in the dataset.

This evaluation criterion closely mirrors the actual image-matching process: for each feature point in the reference image, find the corresponding point in the target image and compute the matching confidence.
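The matching step above can be sketched with the common choice of nearest-neighbour matching under L2 distance (the helper `matching_ap` is hypothetical, not the HPatches implementation):

```python
import numpy as np

def matching_ap(desc_ref, desc_tgt):
    """Image-matching AP sketch. Each reference descriptor i is matched
    to its nearest target descriptor sigma_i; the label is
    y_i = 2*[sigma_i == i] - 1, and the confidence s_i is the negative
    distance to the chosen match."""
    # pairwise L2 distances between reference and target descriptors
    d = np.linalg.norm(desc_ref[:, None, :] - desc_tgt[None, :, :], axis=2)
    sigma = d.argmin(axis=1)                 # nearest-neighbour index
    scores = -d[np.arange(len(d)), sigma]    # matching confidence s_i
    y = np.where(sigma == np.arange(len(d)), 1, -1)
    y = y[np.argsort(-scores)]               # sort labels by confidence
    pos = (y == 1)
    if pos.sum() == 0:
        return 0.0
    precision = np.cumsum(pos) / np.arange(1, len(y) + 1)
    return float(precision[pos].mean())
```

When every reference patch is matched to the correct target patch, all labels are $+1$ and the AP is 1.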

3. Patch Retrieval

In this task, a query patch and a pool of patches are given, and the test measures how well the patch descriptor can retrieve the matching patches from the pool. Note that the pool may be generated from multiple images and contains many distractors.

A set $\mathcal{P} = (\mathbf{x}_0, (\mathbf{x}_i, y_i),\ i = 1, \ldots, N)$ is given, where $\mathbf{x}_0$ is the query patch obtained from the reference image $L_0$, and the remaining patches are obtained from other images of the same scene $L_k,\ k = 1, \ldots, K$. A patch $\mathbf{x}_i$ that correctly matches the query has $y_i = +1$; otherwise $y_i = -1$. Since there are $K$ matching images, there are at most $K$ positive samples, and the rest are negatives. In addition, if a retrieved patch comes from a matching image but is not the matched patch, it is ignored, i.e. $y_i = 0$.

Finally, a confidence score is assigned to each $\mathbf{x}_i$, and the AP is computed with the method above.

HPatches provides a total of $1 \times 10^4$ query patches; each query patch has 5 positive samples, plus $2 \times 10^4$ distractor samples.
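A sketch of the retrieval protocol, again assuming negative L2 distance as the confidence and handling the $\{-1, 0, +1\}$ labels, including the ignored entries (the helper `retrieval_ap` is my own):

```python
import numpy as np

def retrieval_ap(query, pool, labels):
    """Patch-retrieval AP sketch: rank the pool by descriptor distance
    to the query; labels are in {-1, 0, +1}, where 0 entries (same image
    as a match, but the wrong patch) are excluded from the precision."""
    scores = -np.linalg.norm(pool - query, axis=1)
    y = np.asarray(labels)[np.argsort(-scores)]  # labels sorted by confidence
    pos_mask = (y == 1)
    if pos_mask.sum() == 0:
        return 0.0
    cum_pos = np.cumsum(np.maximum(y, 0))        # positives seen so far
    cum_valid = np.cumsum(np.abs(y))             # non-ignored seen so far
    precision = cum_pos / np.maximum(cum_valid, 1)
    return float(precision[pos_mask].mean())
```

Because ignored entries contribute to neither the numerator nor the denominator, an irrelevant patch from a matching image ranked above the true match does not hurt the AP, while a distractor ($y_i = -1$) does.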

4. Final results