This article walks through a small facial expression recognition project end to end: project research, data collection, data preprocessing, deep convolutional neural network training, and server deployment. It is well suited to students who have been studying machine learning but cannot find a suitable practice project.

0 Project Achievements

Let me show you the results first. The test images are, of course, from the currently hottest Japanese TV series, "It's Your Turn" — the fan-favorite pairing (CP) of Nikaido and Kuroshima.

Interested readers can scan the QR code to try it out. It jumps to my website, where you select an image file and click Upload; the prediction result is then returned, where smile means smiling, pout means pouting, and no-smile means a neutral expression. Now let's walk through the whole project.

Once the project is finished, all the code will be uploaded to GitHub with detailed comments — feel free to check it out! Github.com/FLyingLSJ/C…

1 Preface

As a machine learning enthusiast, the best way to improve is to work on a project — but what counts as a project? Training on the MNIST dataset? Dogs vs. Cats? I find it hard to call those projects, and they won't make a resume shine for HR. I was lucky to read teacher Chen's article "I Have No Project Experience, What Should I Do?", which explains in detail what a project is, the individual's role in a project, how to judge a project's success, and so on — very enlightening. The summary is as follows:

  • Project definition: work done to achieve a specific goal within a given time and cost limit
  • Personal roles in a project: participant (one link), leader (person in charge), or solely responsible
  • Judging a project's success or failure: compare against the initial goal — success if achieved, failure if not. Even if you do not participate in the whole project, you need to pay attention to the success or failure of your sub-project (the part you do)
  • Whether a project is "advanced": projects with more output for less input within the given time and cost are "advanced"; "low-level" projects are those with low productivity and high time consumption

I turned this into a mind map, which you can get by replying "mind map" in the background of the public account.

As a machine learning enthusiast who wants to work in this field, I have been thinking about how to build a project that simulates the enterprise product development process as closely as possible, so that I can develop a product or service with the mindset of a real developer. So I chose facial expression recognition as my hands-on project.

2. Why facial expression classification?

When starting out in machine learning, we usually begin with ready-made datasets and official demos, tuning parameters step by step until we reach the accuracy we need. In real enterprise development, however, there are many more stages and many engineering details (although I am still a student, as far as I know the general process is roughly as follows):

  • The boss or business side raises the requirements
  • Determine the business composition
  • Project research: market research, algorithm research
  • Determine the algorithm and collect data
  • Determine the framework and baseline model
  • Server deployment

The whole facial expression classification project touches data collection, data preprocessing, face detection, facial keypoint detection, deep learning model training, online model deployment, and more. It involves not only traditional machine learning but also closely combined deep learning knowledge, which makes facial expression recognition a fitting hands-on project.

3. Project research

Before starting a project, you must do research, including market research and algorithm research

  • Market research

Market research means finding out whether similar services or products already exist on the market — apps, mini programs, web pages, and so on; whether the chosen direction has market demand; and whether there are mature competitors. Look at the target audience (age, geographic distribution), market share, and potential competitors to judge whether the idea can actually land. Without sufficient research, you may finish your product only to discover mature products already on the market, wasting all the resources invested up front. As the saying goes: offer what others lack; when others have it, be better; when others are better, be cheaper; when others are cheaper, move on. Know the field thoroughly, and don't let ambition outrun preparation.

Of course, for our small project we may not need the entire process above, but the idea is similar: we need to know the similar products on the market — in our case, whether facial expression recognition software, mini programs, APIs, and so on already exist.

Before that, let’s briefly introduce a few application scenarios of facial expression recognition

  1. Micro-expressions are highly reliable in emotion recognition tasks and have potential value in applications such as marital relationship prediction, communication and negotiation, and teaching evaluation. Beyond emotion analysis, researchers have observed the micro-expressions produced during intentional lying: after training in micro-expression recognition, the average person's ability to detect lies improved
  2. In the financial sector, there have been reports of a guided Q&A engine combining micro-expressions with knowledge graphs that can identify whether a bank customer poses a fraud risk
  3. Smart home: recognize the user's state and adjust appliances intelligently
  4. Autonomous driving: monitor and analyze distraction, fatigue, and related negative emotional fluctuations during driving, improving safety together with driver-assistance systems
  5. Education: measure learners' emotional responses to learning content and the learning process in real time (such as concentration, confusion, and aversion)

Now it’s time to test the existing product!

  • Baidu AI Experience Center

    Search for "Baidu AI Experience Center" in WeChat mini programs to experience face and body recognition, speech technology, and other functions

  • Megvii AI platform

    On Megvii's AI platform (www.faceplusplus.com.cn/emotion-rec…), the returned result includes the degree of smiling as a floating-point value in [0, 100] with three decimal places; a higher value indicates a stronger smile. In my program I set a threshold of 60, i.e. a value above 60 counts as a smile.

  • Others: Enterprise-oriented products

Shia Thinking (Shanghai) Information Technology Co., Ltd.: www.cacshanghai.com/www/index.p…

Ping An Cloud: yun.pingan.com/ssr/smart/W…

Among them, Ping An Bank won first prize in the OMG micro-expression competition, an authoritative international evaluation of micro-expression recognition, which shows its technological strength in facial expression recognition.

And ZAO, which has been very popular recently, is also based on face recognition technology.
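The smile-degree thresholding described for the Megvii platform above can be sketched as a tiny helper. This is a minimal illustration of my own: the function name is mine, while the [0, 100] smile value and the threshold of 60 come from the description above.

```python
def classify_smile(smile_value, threshold=60.0):
    """Map a smile degree in [0, 100] to a label: above the threshold counts as a smile."""
    return "smile" if smile_value > threshold else "no-smile"

print(classify_smile(87.512))   # smile
```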

  • Algorithm research

After the market research, the next step is algorithm research: finding out which algorithms are used for the task at hand and what accuracy they can reach.

We can look for relevant papers on CNKI or Google Scholar, and check whether there are related competitions or datasets. Industry media coverage is another source; for example, Huxiu and 36Kr publish in-depth industry articles.

  • A survey of micro-expression recognition: html.rhhz.net/ZDHXBZWB/ht…
  • Facial expression recognition based on the SIFT algorithm: html.rhhz.net/YJYXS/html/…

Also check whether there are existing open-source projects on GitHub.

4. Data collection

No matter how powerful a deep learning model is, it is nothing without a dataset, so before training we need to collect data. The basic idea: look for an open-source dataset first; if none exists, treat the Internet as a treasure trove and collect data with a crawler.

This project recognizes three facial expressions — smile, pout, and neutral — so the data we need covers these three classes.

First we looked for open-source datasets, starting from the various competition platforms. Kaggle, for example, is a data repository, and on it we found the CelebFaces Attributes (CelebA) dataset, which contains 202,599 images, each annotated with 40 attributes. We used its Smiling attribute, selected 5-6k images per class, and sorted the images into two folders: smiling and neutral.
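The attribute-based selection can be sketched as follows. This assumes the `list_attr_celeba.txt` layout CelebA ships with (a count line, a header line of attribute names, then one row of ±1 values per image); the function name is my own.

```python
def split_by_smiling(attr_text):
    """Partition CelebA image names by the Smiling attribute (+1 smiling, -1 not)."""
    lines = attr_text.strip().splitlines()
    attr_names = lines[1].split()            # header: the 40 attribute names
    smile_idx = attr_names.index("Smiling")
    smiling, neutral = [], []
    for row in lines[2:]:                    # one "<image> <±1> ..." row per image
        parts = row.split()
        (smiling if parts[1 + smile_idx] == "1" else neutral).append(parts[0])
    return smiling, neutral
```

From the two returned lists one would then copy 5-6k files per class into the smiling / neutral folders.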

With smiles and neutral expressions collected, the next step was finding pout images. I searched around but could not find any open data for pouting, so the next option was crawling. No crawler experience — are you stuck at this step? No! GitHub always has the resources you need.

This project uses the following open-source image crawler projects, with the keyword "pout":

  • Github.com/sczhengyabi…

    Download the EXE file, run it and set the key parameters, and you can crawl relevant images from the three major search engines

  • Github.com/kong36088/B…

Using the above method, I crawled 1,200+ images from the Internet as the training set for the "pout" expression.

5. Data preprocessing

With the training set collected from different sources in the previous step, we need to clean and organize the data before training. This generally covers the following aspects (for details see the mind map below; its contents are organized from the book Deep Learning Image Recognition: Core Technology and Case Studies, by Yan Yousan):

  • Data normalization
  • Data organization and classification
  • Data denoising
  • Data deduplication
  • Data storage and backup

Data processing takes up a large share of the whole project; it is tedious but crucial, and I spent a great deal of time on this step.
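Of these steps, deduplication is the easiest to sketch: crawled search-engine results often contain exact copies of the same file, which content hashing catches. This is my own minimal illustration; near-duplicates would need a perceptual hash instead.

```python
import hashlib
import os

def remove_duplicates(folder):
    """Delete byte-identical files in `folder`, keeping the first copy of each."""
    seen = set()
    removed = 0
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        if digest in seen:
            os.remove(path)     # exact duplicate of an earlier file
            removed += 1
        else:
            seen.add(digest)
    return removed
```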

The final images are shown below. Does that mean we can start building and training the model immediately? Not quite. Anyone who has studied micro-expressions knows that a facial expression is determined by facial structures — the mouth, eyebrows, eyes, and nose all shape the expression. In this project, the expressions we want to recognize are determined mainly by the mouth region, so we can simplify the problem by focusing on that region.

We therefore further process the collected images by cropping out the mouth region as the model's input. The cropped images are generally around 60-70 pixels in resolution, which greatly reduces training time compared with feeding the original images into the model. Many face detection algorithms are quite mature by now; we use the OpenCV and Dlib image processing libraries. Before starting, we need to install them. OpenCV is easy to install; for Dlib, the steps for Windows and Linux are shown below. On Windows, first go to https://pypi.org/simple/dlib/ and download the WHL file matching your environment, then install from the WHL file.

Install OpenCV under Windows and Linux

pip install opencv-python
# If Linux then reports: libXrender.so.1: cannot open shared object file: No such file or directory
# consider installing the following packages:
apt-get install libsm6
apt-get install libxrender1
apt-get install libxext-dev

Install dlib on Windows

pip install *.whl   # * is the name of the downloaded WHL file; installation is slow, please be patient

Install dlib on Linux

sudo apt-get install build-essential cmake
sudo apt-get install libgtk-3-dev
sudo apt-get install libboost-all-dev
pip install dlib

After keypoint detection, the results are as shown below. We use 68-point detection, so we only need to extract the region around points 48-67 (the mouth area).
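Extracting that region reduces to a padded bounding box over landmarks 48-67. Here is a minimal sketch: the landmarks would come from `dlib.shape_predictor` with the 68-landmark model, while the helper name and the margin value are my own illustrative choices.

```python
def mouth_bbox(landmarks, margin=10):
    """Padded bounding box around the mouth.

    `landmarks` is a list of 68 (x, y) points in dlib's 68-point ordering,
    where indices 48-67 cover the mouth; `margin` (pixels) pads the crop.
    Returns (x_min, y_min, x_max, y_max), usable with OpenCV as
    crop = image[y_min:y_max, x_min:x_max].
    """
    xs = [p[0] for p in landmarks[48:68]]
    ys = [p[1] for p in landmarks[48:68]]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)
```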

The final dataset sizes are: 1,000 smiles, 1,000 neutral faces, and 761 pout faces left after processing.

We split the dataset into a training set and a test set at a ratio of 9:1.
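A minimal sketch of a reproducible 9:1 split (the helper name and the fixed seed are my own additions):

```python
import random

def split_dataset(files, train_ratio=0.9, seed=42):
    """Shuffle file names and split them into train / test lists at the given ratio."""
    files = list(files)
    random.Random(seed).shuffle(files)       # fixed seed for a reproducible split
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]
```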

6. Choosing the framework and baseline model

With the target dataset in hand, the next step is training. We chose PyTorch as the deep learning framework; next we needed a baseline model. Since we did not have much data, a common and highly effective approach for applying deep learning to a small image dataset is to use a pretrained network: a saved network previously trained on a large dataset, typically a large-scale image classification task. If the original dataset is large and general enough, the spatial hierarchy of features learned by the pretrained network can effectively serve as a general model of the visual world, so those features remain useful for many different computer vision problems, even when the new problems involve completely different classes from the original task. For example, you can train a network on ImageNet (mostly animals and everyday objects) and then apply it to an unrelated task, such as identifying furniture in images. This portability of learned features across problems is an important advantage of deep learning over many earlier shallow learning methods, and it makes deep learning very effective on small-data problems.

Using the parameters of a model trained by others on large-scale data, we only modify the final classification layer and apply it to our dataset; the results are generally decent. However, our input is small (only the mouth region, at a pixel size of around 50×50, while classical networks expect 224×224 inputs), which does not meet the input requirements of most baseline models. I therefore designed a simple convolutional neural network myself. The PyTorch code is as follows:

import torch.nn as nn
import torch.nn.functional as F

class simpleconv3(nn.Module):
    def __init__(self):
        super(simpleconv3, self).__init__()
        # nn.Conv2d(in_channels, out_channels, kernel_size, stride)
        self.conv1 = nn.Conv2d(3, 12, 3, 2)
        self.bn1 = nn.BatchNorm2d(12)
        self.conv2 = nn.Conv2d(12, 24, 3, 2)
        self.bn2 = nn.BatchNorm2d(24)
        self.conv3 = nn.Conv2d(24, 48, 3, 2)
        self.bn3 = nn.BatchNorm2d(48)
        self.fc1 = nn.Linear(48 * 5 * 5, 1200)
        self.fc2 = nn.Linear(1200, 128)
        self.fc3 = nn.Linear(128, 3)  # three classes: smile, pout, no-smile

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = x.view(-1, 48 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
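As a sanity check on the `48 * 5 * 5` input size of `fc1`, we can trace PyTorch's Conv2d output-size formula through the three stride-2, kernel-3 layers. This assumes 48×48 input crops, which is consistent with the "around 50×50" figure mentioned above.

```python
def conv_out(size, kernel=3, stride=2, padding=0):
    """Output spatial size of one Conv2d layer (PyTorch's floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 48
for _ in range(3):          # the three stride-2, kernel-3 convolutions above
    size = conv_out(size)

print(size)                 # 5 -> flattened features: 48 * 5 * 5 = 1200
```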

7. Model training

ResNet18 was also used as a baseline model: freeze all convolutional layers and change only the final classifier. Training for 500 rounds with the Adam optimizer gave an accuracy of about 90%. Because of compute limitations on the deep learning platform used, the training result images could not be downloaded at the time, so the model's training and validation accuracies are shown below over the full 500 rounds; the network can be seen to converge around round 260.

Model training accuracy curve

Model validation accuracy curve

8. Deploy servers

  • The front end

    After training, we get a .ckpt model file, which we put on the server. For friendly interaction we need to write front-end and back-end handling code. We use the Flask framework: following the official tutorial at https://dormousehole.readthedocs.io/en/latest/ combined with HTML templates, we can quickly build a simple web interface with an upload button, image display, and text. The overall effect is as follows:

The code is as follows:
from flask import Flask, request
from flask import render_template
import time
from expression_demo import expression_predict  # facial expression prediction module

system_path = "./"
app = Flask(__name__)  # create a Flask instance; in a single-module app, use __name__

@app.route('/')
def hello(imgPath=None):
    return render_template('index.html', imgPath=system_path+"static/image/logo.jpg")

@app.route('/upload', methods=['POST'])
def upload(imgPath=None, result="None"):
    file = request.files['file']
    fileName = file.filename
    filePath = system_path+"static/image/"+fileName    # Image path
    if file:
        file.save(filePath)
        result = expression_predict(filePath)
        if result is None:
            result = "could not find your beautiful face"
        return render_template('index.html', imgPath=system_path+"static/image/"+fileName, result=result)
    else:
        return render_template('index.html', imgPath=system_path+"static/image/logo.jpg")

if __name__ == '__main__':
    app.run(host="0.0.0.0")  # listen on all network interfaces
  • The back-end processing

    Back-end processing handles the images uploaded by users, covering the work we did earlier: reading the uploaded image, face detection, facial keypoint detection, image cropping, and returning the prediction. We wrap the prediction functionality into a function that the main program can call directly; once each part is wrapped as a function, the related functionality is just one call away.

from expression_demo import expression_predict  # expression prediction module: all processing functions live in one .py file and are imported by the main program
  • Code upload server

    Upload the deployed code to the server and run the entry script (main.py) in the background, so that the service keeps working even after you log out.
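One common way to do this on Linux, assuming `main.py` is the entry script and the log file name is illustrative:

```shell
# Run the entry script in the background, detached from the terminal;
# stdout and stderr go to app.log
nohup python main.py > app.log 2>&1 &

# Check the log to confirm the server started
tail -n 20 app.log
```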

At this point, all steps are complete!

9. Summarize and think

In this project we built a facial expression recognition project from scratch, going through essentially the full process of developing a small product: project research, data collection, data preprocessing, face detection, deep learning model training, front-end coding, and server deployment.

Inside a company, employees from different specialties may participate in a project — front-end engineers, back-end engineers, algorithm engineers, and so on — but in your own project you have to do the whole process yourself, more like a full-stack engineer's job. That is excellent exercise for personal ability: going through the entire process, you will inevitably hit many pitfalls, but patiently solving them is a great opportunity to grow. I, for one, was unfamiliar with front-end and server-side knowledge and hit plenty of pits. Looking ahead, this project still leaves a lot to think about.

Here are the real pain points:

  • Lighting, angle, and side-on faces affect face detection and hence the subsequent predictions
  • How to keep improving the accuracy of the model
  • How the server should handle heavy request loads
  • Logical handling of some requests on the web side, and so on

My personal knowledge is limited, so suggestions are very welcome ~

Reference:

  • Deep Learning Image Recognition: Core Technology and Case Studies
  • Github.com/tinypumpkin…
  • Github.com/foamliu/Fac…
  • www.cvmart.net/community/a…
  • www.thoughtworks.com/insights/bl…
  • www.digitalocean.com/community/t…
  • www.digitalocean.com/community/t…

Welcome to: Machine vision CV public account