The code for this article can be obtained by replying “Detect image similarity” to the WeChat public account “01 binary”.

Preface

Recently I have been working on a large-scale image retrieval project, which can be roughly described as “searching for images with an image”. This feature first became popular with search engines and later proved very practical in e-commerce. Before starting the project I gathered some material, and now that it is nearing completion I would like to share some of it here. This article covers the basics of image retrieval: using Python to detect image similarity.

When it comes to detecting the similarity of two things, many people’s first idea is to build a vector for each item and then compare the two vectors with cosine similarity. This approach is widely used, for example to compare two users’ interests or the similarity of two texts. It is used much less often for comparing images, for reasons I will discuss later. Instead, this article introduces two other concepts: the image fingerprint and the Hamming distance.

Image fingerprint

Like a human fingerprint, an image fingerprint is a mark of identity. Simply put, **an image fingerprint is a set of binary digits produced by running the image through a certain hash algorithm.**

Given an input image, we can use a hash function to calculate its “image hash” value based on the visual appearance of the image. Similar images should have similar hash values. An algorithm for building image fingerprints in this way is known as a perceptual hash algorithm.

Hamming distance

From the description of the image fingerprint above, we know that a perceptual hash algorithm can transform an image into a string, and the Hamming distance is a standard way to compare strings. The following definition is taken from Wikipedia:

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ. In other words, it is the number of substitutions required to transform one string into the other.

The Hamming distance is commonly used to measure the difference between two images: the smaller the Hamming distance, the higher the similarity. A Hamming distance of 0 means the two images are, by this measure, identical.
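The definition above translates directly into a few lines of Python. This is a minimal sketch (the function name is my own, not from the original article):

```python
def hamming_distance(s1, s2):
    """Hamming distance between two equal-length strings:
    the number of positions where the characters differ."""
    if len(s1) != len(s2):
        raise ValueError("inputs must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))
```

For example, `hamming_distance("karolin", "kathrin")` is 3, because the two strings differ in exactly three positions.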

Perceptual hashing algorithm

There are three commonly used perceptual hashing algorithms: the average hash (aHash), the perceptual hash (pHash), and the difference hash (dHash). Many blog posts introduce and compare the three, and many libraries can compute the hash values directly with a single function call. Without further ado, I recommend the article 👉 “Hash Algorithms in Image Similarity”.


The code of the three hash algorithms is as follows:

ahash

dhash

phash
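The original code blocks for the three algorithms above are distributed via the public account; what follows is my own minimal sketch, assuming Pillow and NumPy are installed. Each function returns a 64-element boolean array, so the Hamming distance is just the count of differing positions.

```python
from PIL import Image
import numpy as np

def ahash(path, size=8):
    """Average hash: shrink to 8x8 grayscale, threshold at the mean."""
    img = Image.open(path).convert("L").resize((size, size), Image.LANCZOS)
    pixels = np.asarray(img, dtype=np.float64)
    return (pixels >= pixels.mean()).flatten()

def dhash(path, size=8):
    """Difference hash: compare each pixel with its right-hand neighbor."""
    img = Image.open(path).convert("L").resize((size + 1, size), Image.LANCZOS)
    pixels = np.asarray(img, dtype=np.float64)
    return (pixels[:, 1:] > pixels[:, :-1]).flatten()

def phash(path, size=32, hash_size=8):
    """Perceptual hash: 32x32 grayscale, 2-D DCT, keep the low-frequency
    top-left 8x8 block and threshold it at its median."""
    img = Image.open(path).convert("L").resize((size, size), Image.LANCZOS)
    pixels = np.asarray(img, dtype=np.float64)
    # Build a DCT-II basis matrix by hand to avoid a SciPy dependency;
    # normalization is irrelevant because we only threshold at the median.
    k, i = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    dct_mat = np.cos(np.pi * k * (2 * i + 1) / (2 * size))
    dct = dct_mat @ pixels @ dct_mat.T
    low = dct[:hash_size, :hash_size]
    return (low > np.median(low)).flatten()

def hamming(h1, h2):
    """Hamming distance between two boolean hash arrays."""
    return int(np.count_nonzero(h1 != h2))
```

A typical call would be `hamming(ahash("a.jpg"), ahash("b.jpg"))`; a distance of a few bits out of 64 usually indicates very similar images.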

Alternatively, you can install the ImageHash library and call its hash functions directly.

The idea of comparing the similarity of two pictures

This gives us a simple recipe for comparing two images: compute each image’s fingerprint with a perceptual hash algorithm, then measure the Hamming distance between the two hashes.

As for the detailed steps, Ruan Yifeng has described a simple image-search principle, which can be broken down as follows:

  1. Shrink the image. Reduce the image to 8×8, 64 pixels in total. This step discards the image’s details and keeps only basic information such as structure and light and shade, eliminating differences caused by size and aspect ratio.
  2. Simplify the colors. Convert the shrunken image to 64-level grayscale, so every pixel takes one of only 64 values.
  3. Calculate the average. Compute the mean gray value of all 64 pixels.
  4. Compare each pixel with the average. Record a 1 if the pixel’s gray value is greater than or equal to the average, and a 0 otherwise.
  5. Compute the hash value. Combine the results of the previous step into a 64-bit integer; this is the image’s fingerprint. The order of combination does not matter, as long as it is the same for every image.
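The five steps above can be sketched in Python using Pillow and NumPy (an illustrative implementation; the function name is my own). Note that quantizing to 64 gray levels in step 2 barely affects the result, so this sketch uses Pillow’s standard 256-level grayscale:

```python
from PIL import Image
import numpy as np

def average_hash_64(path):
    # Step 1: shrink to 8x8 to discard detail and size differences
    img = Image.open(path).resize((8, 8), Image.LANCZOS)
    # Step 2: convert to grayscale
    gray = np.asarray(img.convert("L"), dtype=np.float64)
    # Step 3: mean gray value over all 64 pixels
    mean = gray.mean()
    # Step 4: 1 where a pixel is >= the mean, else 0
    bits = (gray >= mean).astype(np.uint8).flatten()
    # Step 5: combine the 64 bits into one integer fingerprint
    return int("".join(map(str, bits)), 2)
```

Two images are then compared by counting the differing bits of their two fingerprints, e.g. with `bin(fp1 ^ fp2).count("1")`.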

This method works well for finding identical images, but it performs poorly at finding merely “similar” images and cannot match partial regions of an image, so it is usually applied to detecting image infringement. Today, Google’s and Baidu’s reverse image search almost entirely use deep learning for similarity retrieval, which will be introduced in the next article.

Why is cosine similarity unsuitable for image similarity detection

Finally, let’s talk about why cosine similarity is not used to detect image similarity. As noted at the beginning, measuring similarity with cosine similarity requires constructing two vectors. Usually we would turn each image into a vector of pixel gray-value frequencies; computing the similarity of two images then amounts to computing the similarity of their histograms. But a histogram captures only the frequencies of pixel values and discards all positional information. The information loss is too great, so the method applies only in a few scenarios. The code that uses cosine similarity to compare images can also be obtained by replying “Detect image similarity” to the WeChat public account “01 binary”.
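To make the limitation concrete, here is a minimal sketch of the histogram-plus-cosine approach described above (my own illustration, assuming Pillow and NumPy). Two images with the same color distribution but completely different layouts would score near 1.0, which is exactly the lost-position problem:

```python
from PIL import Image
import numpy as np

def gray_histogram(path, bins=64):
    """64-bin grayscale histogram, normalized to sum to 1.
    Note: this keeps only pixel-value frequencies, not positions."""
    gray = np.asarray(Image.open(path).convert("L"))
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [0, 1] here
    because histogram entries are non-negative."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0
```

Usage: `cosine_similarity(gray_histogram("a.jpg"), gray_histogram("b.jpg"))`.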

Conclusion

The methods introduced in this article are all non-deep-learning ways to detect image similarity. They are easy to understand, but each has its limitations. If you want to build an image retrieval system, the first step is comparing image similarity; nowadays most systems extract image features with deep learning and compare those, which greatly improves accuracy. In a later article I will discuss how to compare image similarity using features extracted by deep learning.

My ability is limited, so there may be mistakes in this write-up. If you have suggestions or corrections, please leave them in the comments 🙏