Introduction

Face recognition is a biometric technology used to confirm a user's identity. Its main advantage over traditional identity recognition is convenience. Traditional authentication methods, such as passwords, PIN codes, radio-frequency cards, and fingerprints, require users to remember complex secrets or carry authentication tokens, and passwords and cards carry the risk of loss or disclosure, so their usability and security fall short of face recognition. Face recognition works remotely and contactlessly through a camera: unlike fingerprint recognition, there is no need to press a finger onto a sensing area, and the face can be recognized automatically.

At present, face recognition is widely used in security, surveillance, general identification, attendance, the search for missing children, and other fields, and has played an important role in improving the efficiency of identity authentication. More in-depth research is also underway, including gender recognition, age estimation, and mood estimation. Higher-level, higher-accuracy face recognition has great value for urban security and contactless identity authentication. Face recognition problems can be divided into two categories: face verification and face recognition. Face verification is usually a one-to-one comparison that determines whether two images show the same person. Face recognition is usually a one-to-many comparison that determines whether the person in a photo matches someone in a database.

Face recognition is affected by a variety of factors, which can be divided into basic, intrinsic, and external factors. The basic factor is that faces themselves are similar: people's facial features and outlines are roughly the same. Intrinsic factors are a person's internal attributes, such as age, mental state, and makeup. External factors concern image quality, such as the sharpness of the photo and the presence of glasses or masks. For humans, recognizing a person is easy; for a computer, an image is a multi-dimensional matrix of numbers, which makes the task much harder.

The earliest face recognition systems were semi-automatic: feature points were annotated by hand, and the computer matched faces according to the relative positions of those points.

Face recognition research from 1965 to 1990 was mainly based on geometric structural features and template matching. These methods extract the positions of important feature points such as the eyes, mouth, and nose, together with the geometric shapes of salient organs such as the eyes, as classification features, and then compute the relative positions and distances between feature points to measure how similar two face images are.

From 1991 to 1997, many holistic methods appeared, including principal component analysis (PCA) and linear discriminant analysis (LDA). These methods reduce the dimensionality of face images by finding a set of projection vectors, and then feed the low-dimensional features into a machine learning classifier such as an SVM for face classification.
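
For illustration, here is a minimal sketch of the subspace idea using OpenCV's cv::PCA and cv::ml::SVM (the data layout and the choice of 128 components are assumptions, not a fixed part of these methods):

    #include <opencv2/core.hpp>
    #include <opencv2/ml.hpp>

    // Minimal sketch: project flattened face images into a low-dimensional
    // subspace with PCA, then train an SVM on the projected features.
    // Assumes `data` holds one flattened face image per row (CV_32F, N x D)
    // and `labels` holds one integer identity per row (CV_32S, N x 1).
    cv::Ptr<cv::ml::SVM> trainSubspaceClassifier(const cv::Mat &data, const cv::Mat &labels)
    {
        // Find the projection vectors (eigenfaces) and keep 128 components.
        cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_ROW, 128);

        // Project every face into the low-dimensional subspace.
        cv::Mat lowDim = pca.project(data);

        // Feed the low-dimensional features into an SVM classifier.
        cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
        svm->train(lowDim, cv::ml::ROW_SAMPLE, labels);
        return svm;
    }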

From 1998 to 2013, many methods appeared that used depth cameras, structured light, infrared cameras, and other hardware, greatly improving the accuracy of face recognition. Early local-feature-based methods also emerged in this period: they extract local features at different positions of the face, and the results are often more robust than holistic methods. Typically, HOG, LBP, SIFT, and SURF features are extracted from image blocks, and the concatenated vector of each block's local features serves as the face representation. There are also hybrid methods: local features are extracted first, and subspace methods (such as PCA and LDA) are then applied to obtain low-dimensional features, combining the global and local approaches. GaussianFace achieved an accuracy of 98.52% on LFW, nearly matching many of the later deep learning methods.

After 2006, deep learning began to attract researchers' attention and appeared in an increasing number of international journals, and it has since been widely applied across object detection fields. In 2015, Google's FaceNet reached an average accuracy of 99.63% on the LFW dataset. Deep-learning-based face recognition now exceeds human accuracy, and deep learning has come to dominate the field.

The common face recognition process

Most face recognition pipelines include four stages: Face Detection, Face Alignment, Face Representation, and Face Matching:

  1. Face Detection

The face region is detected in the input image, and the coordinates of the face bounding box are returned.

  2. Face Alignment

Facial feature points (landmarks) are detected within the face region, and the face is normalized based on these points so that the scale and angle of all face regions are consistent, which simplifies feature extraction and face matching. The goal of face alignment is to locate the exact shape of the face within a known face bounding box; approaches are mainly divided into optimization-based and regression-based methods. The regression-based alignment used here is the facial landmark method published by Vahid Kazemi and Josephine Sullivan at CVPR 2014: an ensemble of regression trees, trained with gradient boosting (GBDT), regresses the face step by step from an initial shape to its true shape.

  3. Face Representation

Features are extracted from the normalized face region to obtain a feature vector. For example, some deep neural network methods represent a face with 128 features. Ideally, feature vectors extracted from photos of different people are clearly different, while different photos of the same person yield similar feature vectors.

  4. Face Matching

The feature vectors computed from the two images are compared to obtain a similarity score. Pairs with high scores are judged to be the same person, and pairs with low scores different people.
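
In practice the comparison is often a distance, where smaller means more similar. A minimal sketch (the 0.6 threshold is the value dlib suggests for its ResNet model; treat it as a tunable assumption):

    #include <array>
    #include <cmath>

    // Minimal sketch: Euclidean distance between two 128-D face embeddings.
    float faceDistance(const std::array<float, 128> &a, const std::array<float, 128> &b)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const float d = a[i] - b[i];
            sum += d * d;
        }
        return std::sqrt(sum);
    }

    // Smaller distance means more similar faces. The 0.6 threshold is the
    // value dlib suggests for its ResNet model; tune it for other models.
    bool isSamePerson(const std::array<float, 128> &a, const std::array<float, 128> &b)
    {
        return faceDistance(a, b) < 0.6f;
    }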

The basic idea of face representation

The main idea of deep learning face recognition is that different faces are composed of different features. In simple terms, the features might include the eyelids, nose, eyes, skin color, and hair color, as shown in the table below. Five binary features can then describe 2^5 = 32 kinds of face; that is, (feature 1, feature 2, feature 3, feature 4, feature 5) represents one kind of face. For example, (1, 0, 0, 1, 0) represents a double eyelid, a low nose bridge, black eyes, white skin, and black hair.

No.   Feature      0                  1
1     Eyelid       Single eyelid      Double eyelid
2     Nose         Low nose bridge    High nose bridge
3     Eye color    Black              Brown
4     Skin color   Yellow             White
5     Hair color   Black

Each of the five features in the table has only two values, so just 2^5 = 32 appearances can be expressed, which is not enough for face recognition. We can therefore increase the number of features, adding, say, feature 6 for face shape, feature 7 for lips, and so on; we can also increase the number of values a feature can take, for example letting feature 3 use 0 for black, 0.1 for black with a hint of blue, 0.2 for yellow, 0.25 for brown, and so on. In practical applications, the number of features reaches 1024 or more, and the feature values become continuous decimals. After this expansion, a face might be represented as (0.3, 2, 1.5, 1.75…), which can describe an essentially unlimited number of faces.

In practice, these features are not set manually but are learned by a deep neural network during training and stored in the parameters of the network's nodes. A deep neural network model consists of the network structure plus the parameters of each node.

As an example, consider a 128-dimensional feature extraction network. The features extracted from three photos of Tomohisa Yamashita lie very close together in the 128-dimensional space, while the network's output for Rimi Ishihara's photos lies far from them. That is, features extracted from different photos of the same person are close in feature space, while photos of different people are far apart.

Example of project implementation

Following the ideas above, I implemented a simple face recognition program, available at face_identification, as shown in the picture below. The project largely follows dlib.net/dnn_face_re… , with only small changes. The dlib approach uses ResNet-34 as the face recognition network; for details of residual networks, see the 2015 deep residual learning work by Kaiming He et al.

[screenshot of the running face recognition program]

Design ideas

  1. Interface: the software uses Qt for the interface. To speed up coding, it is based on the Qt Camera example project, with face recognition embedded into it; the canvas was rewritten to call the face recognition code at the desired interval.
  2. Multithreading: depending on the camera resolution, one face recognition pass takes 0.2-0.4 s, so a single-threaded design would freeze the camera preview during recognition. The program therefore uses Qt's multithreading support to run face recognition in a separate thread, with the UI thread and the recognition thread communicating through Qt's signal and slot mechanism (a minimal sketch of this pattern follows the list).
  3. Face detection: dlib's frontal_face_detector detects the face region in each frame.
  4. Face landmarks: dlib's shape_predictor_5_face_landmarks.dat model detects five feature points around the eyes, nose, and mouth, which are used to adjust the image size and face angle, normalizing each face to a 150×150 crop for the feature extraction network.
  5. Feature extraction: a slightly adjusted ResNet-34 network takes the 150×150 image as input and outputs 128 feature values (a condensed dlib sketch of items 3-5 follows the list).
  6. Identification, database construction: a CSV file stores the identity list of known people (including one picture per identity). Feature extraction is run over these images to produce a matrix of size [number of images, 128], and FLANN builds an index over this data.
  7. Identification, database search: the 128-dimensional feature vector extracted from the face seen by the camera is used to search the FLANN index for the nearest point, and the distance between them is computed; if the distance is within the threshold, they are judged to be the same user (see the FLANN sketch after the list).
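
Item 2 uses Qt's standard worker-object pattern. A minimal sketch (the class, signal, and slot names here are hypothetical, not the project's actual identifiers):

    #include <QObject>
    #include <QThread>
    #include <QImage>
    #include <QString>

    // Hypothetical worker: runs the 0.2-0.4 s recognition pass off the UI thread.
    class FaceWorker : public QObject
    {
        Q_OBJECT
    public slots:
        void process(const QImage &frame)
        {
            QString name = recognize(frame); // detection + alignment + matching
            emit resultReady(name);
        }
    signals:
        void resultReady(const QString &name);
    private:
        QString recognize(const QImage &frame); // implemented elsewhere
    };

    // In the UI class (hypothetical names): move the worker to its own thread
    // and communicate only through queued signal/slot connections.
    //
    //   QThread *thread = new QThread(this);
    //   FaceWorker *worker = new FaceWorker;
    //   worker->moveToThread(thread);
    //   connect(this,   &CameraWidget::frameReady, worker, &FaceWorker::process);
    //   connect(worker, &FaceWorker::resultReady,  this,   &CameraWidget::showResult);
    //   thread->start();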
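
Items 3-5 map almost directly onto dlib calls. A condensed sketch, assuming anet_type is the ResNet-34 embedding network defined in dlib's dnn_face_recognition_ex.cpp example (its long template definition is omitted here):

    #include <dlib/image_processing/frontal_face_detector.h>
    #include <dlib/image_processing.h>
    #include <dlib/dnn.h>
    #include <dlib/image_io.h>
    #include <string>
    #include <vector>

    // `anet_type` is the ResNet-34 metric network from dlib's
    // dnn_face_recognition_ex.cpp example; definition omitted for brevity.

    std::vector<dlib::matrix<float, 0, 1>> extractFaceFeatures(const std::string &path)
    {
        // In a real program, load the detector and models once and reuse them.
        dlib::frontal_face_detector detector = dlib::get_frontal_face_detector();
        dlib::shape_predictor sp;
        dlib::deserialize("shape_predictor_5_face_landmarks.dat") >> sp;
        anet_type net;
        dlib::deserialize("dlib_face_recognition_resnet_model_v1.dat") >> net;

        dlib::matrix<dlib::rgb_pixel> img;
        dlib::load_image(img, path);

        // One aligned 150x150 chip per detected face.
        std::vector<dlib::matrix<dlib::rgb_pixel>> faces;
        for (const auto &rect : detector(img)) {
            auto shape = sp(img, rect); // the 5 landmarks
            dlib::matrix<dlib::rgb_pixel> chip;
            dlib::extract_image_chip(img, dlib::get_face_chip_details(shape, 150, 0.25), chip);
            faces.push_back(std::move(chip));
        }
        if (faces.empty())
            return {};

        // The network turns each chip into a 128-D feature vector.
        return net(faces);
    }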
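
Items 6 and 7 can be sketched with FLANN's C++ API (the variable names and the squared-distance threshold are assumptions):

    #include <flann/flann.hpp>
    #include <cstddef>
    #include <vector>

    // Minimal sketch: build an index over n known faces, each a 128-D feature
    // vector laid out row-major in `features` (n * 128 floats), then look up
    // the nearest neighbour of one query descriptor.
    // Note: flann::Matrix does not own its memory; keep `features` alive.
    bool matchFace(std::vector<float> &features, std::size_t n, float *query128)
    {
        flann::Matrix<float> dataset(features.data(), n, 128);
        flann::Index<flann::L2<float>> index(dataset, flann::KDTreeIndexParams(4));
        index.buildIndex();

        flann::Matrix<float> query(query128, 1, 128);
        std::vector<int> idx(1);
        std::vector<float> dist(1);
        flann::Matrix<int> indices(idx.data(), 1, 1);
        flann::Matrix<float> dists(dist.data(), 1, 1);
        index.knnSearch(query, indices, dists, 1, flann::SearchParams(128));

        // flann::L2 reports squared Euclidean distance, so compare against the
        // squared threshold (0.6^2 = 0.36 if you adopt dlib's 0.6 suggestion).
        return dist[0] < 0.36f;
    }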

Environment dependencies

The actual project uses Qt and several major external libraries. For ease of project management, references to the external libraries are set directly in the project's libs.pri. The main external libraries are the following.

dlib 19.17
OpenCV 3.4
FLANN 1.9.1

In other words, if you want to build on my code base, you need to install these libraries first and then update the library link paths in libs.pri; after that the project will compile.
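
As a hypothetical example, libs.pri might look like the following (all paths are placeholders for your local installs, not the project's actual values):

    # libs.pri - point qmake at the local install of each dependency.
    # Every path below is a placeholder; replace it with your own.

    INCLUDEPATH += /path/to/dlib-19.17
    LIBS        += -L/path/to/dlib-19.17/build -ldlib

    INCLUDEPATH += /path/to/opencv-3.4/include
    LIBS        += -L/path/to/opencv-3.4/lib -lopencv_core -lopencv_imgproc

    INCLUDEPATH += /path/to/flann-1.9.1/include
    LIBS        += -L/path/to/flann-1.9.1/lib -lflann_cpp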

More

If you have more questions about the program design, please go to the Issues section of the project and I will answer them as soon as possible.

References

  1. face_identification
  2. dlib 19.17
  3. OpenCV 3.4.5
  4. Qt 5.12 MinGW 7.3.0 x64
  5. qtcsv 1.5.0
  6. FLANN 1.9.1
  7. Kazemi V, Sullivan J. One millisecond face alignment with an ensemble of regression trees[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 1867-1874.
  8. Schroff F, Kalenichenko D, Philbin J. FaceNet: A unified embedding for face recognition and clustering[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 815-823.
  9. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.