What is InsightFace

InsightFace is a toolbox for face detection and face recognition. Its Python package has been wrapped to work out of the box: the high-level interface can directly perform face detection, gender prediction, and so on, while the underlying implementations are trained models.

The model provided by OneFlow is one of the backends of the InsightFace project.

For a demonstration of InsightFace inference, please refer to the face recognition permission system demo project on the OneFlow cloud platform.

This project mainly introduces the principles and engineering practice of large-scale face recognition training.

How to use this project

First, register an account on the OneFlow cloud platform, open the project, and click “Fork”. Then click “Run” to connect to the container and run the following command:

cd /workspace && bash ./train_graph_distributed.sh

The output will look similar to the following, including:

  • Basic configuration for training
Training: 2021-12-14 14:49:44,690-rank_id: 0
Training: 2021-12-14 14:49:44,720-: loss cosface
Training: 2021-12-14 14:49:44,720-: network r50
Training: 2021-12-14 14:49:44,720-: resume False
Training: 2021-12-14 14:49:44,720-: output model
Training: 2021-12-14 14:49:44,720-: dataset ms1m-retinaface-t1
Training: 2021-12-14 14:49:44,720-: embedding_size 512
Training: 2021-12-14 14:49:44,721-: fp16 True
Training: 2021-12-14 14:49:44,721-: model_parallel True
Training: 2021-12-14 14:49:44,721-: sample_rate 0.1
Training: 2021-12-14 14:49:44,721-: graph True
Training: 2021-12-14 14:49:44,721-: synthetic False
Training: 2021-12-14 14:49:44,721-: decay 0.0005
Training: 2021-12-14 14:49:44,722-: batch_size 128
Training: 2021-12-14 14:49:44,722-: lr 0.1
Training: 2021-12-14 14:49:44,722-: val_image_num {'lfw': 12000, 'cfp_fp': 14000, 'agedb_30': 12000}
Training: 2021-12-14 14:49:44,722-: ofrecord_path /dataset/18fad635/v1/ofrecord
Training: 2021-12-14 14:49:44,722-: num_classes 93432
Training: 2021-12-14 14:49:44,722-: num_image 5179510
Training: 2021-12-14 14:49:44,722-: warmup_epoch -1
Training: 2021-12-14 14:49:44,722-: decay_epoch [10, 16, 22]
Training: 2021-12-14 14:49:44,723-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2021-12-14 14:49:44,723-: ofrecord_part_num 32
  • Log to load validation set data
Training: 2021-12-14 14:49:50,124-loading bin:0
Training: 2021-12-14 14:49:51,372-loading bin:1000
Training: 2021-12-14 14:49:52,649-loading bin:2000
Training: 2021-12-14 14:50:17,039-loading bin:9000
Training: 2021-12-14 14:50:18,300-loading bin:10000
Training: 2021-12-14 14:50:19,576-loading bin:11000
Training: 2021-12-14 14:50:20,839-loading bin:12000
Training: 2021-12-14 14:50:22,099-loading bin:13000
Training: 2021-12-14 14:50:23,353-oneflow.Size([14000, 3, 112, 112])
Training: 2021-12-14 14:50:23,709-loading bin:0
Training: 2021-12-14 14:50:24,991-loading bin:1000
Training: 2021-12-14 14:50:26,292-loading bin:2000
Training: 2021-12-14 14:50:27,590-loading bin:3000
Training: 2021-12-14 14:50:28,886-loading bin:4000
Training: 2021-12-14 14:50:30,174-loading bin:5000
Training: 2021-12-14 14:50:31,463-loading bin:6000
Training: 2021-12-14 14:50:32,744-loading bin:7000
Training: 2021-12-14 14:50:34,029-loading bin:8000
Training: 2021-12-14 14:50:35,315-loading bin:9000
Training: 2021-12-14 14:50:36,593-loading bin:10000
Training: 2021-12-14 14:50:37,867-loading bin:11000
Training: 2021-12-14 14:50:39,144-oneflow.Size([12000, 3, 112, 112])
  • Basic information during training (speed, loss change, estimated remaining time, etc.)
Training: 2021-12-14 14:51:02,452-Speed 883.82 samples/sec Loss 52.6974 LearningRate 0.1000 Epoch: 0 Global Step: 100 Required: 202 hours
Training: 2021-12-14 14:51:09,722-Speed 880.33 samples/sec Loss 53.4146 LearningRate 0.1000 Epoch: 0 Global Step: 150 Required: 149 hours
Training: 2021-12-14 14:51:16,968-Speed 883.24 samples/sec Loss 51.8446 LearningRate 0.1000 Epoch: 0 Global Step: 200 Required: 122 hours
Training: 2021-12-14 14:51:24,237-Speed 880.57 samples/sec Loss 50.9537 LearningRate 0.1000 Epoch: 0 Global Step: 250 Required: 106 hours
Training: 2021-12-14 14:51:31,526-Speed 877.99 samples/sec Loss 50.5335 LearningRate 0.1000 Epoch: 0 Global Step: 300 Required: 95 hours
Training: 2021-12-14 14:51:38,831-Speed 876.17 samples/sec Loss 49.6624 LearningRate 0.1000 Epoch: 0 Global Step: 350 Required: 87 hours
Training: 2021-12-14 14:51:46,151-Speed 874.42 samples/sec Loss 48.9462 LearningRate 0.1000 Epoch: 0 Global Step: 400 Required: 82 hours
Training: 2021-12-14 14:51:53,476-Speed 873.76 samples/sec Loss 48.3082 LearningRate 0.1000 Epoch: 0 Global Step: 450 Required: 77 hours
Training: 2021-12-14 14:52:00,810-Speed 872.72 samples/sec Loss 48.0000 LearningRate 0.1000 Epoch: 0 Global Step: 500 Required: 73 hours

The estimated remaining time fluctuates briefly during the initial stage, which is normal; once the static graph has finished compiling, the estimate gradually stabilizes and becomes accurate. After training completes, the training log and model are saved under /workspace/model/.

Evolution of face recognition training techniques

In the field of deep learning, one of the most frequently told stories is how Geoffrey Hinton persisted through the neural network winter for decades. Finally, in 2012, Hinton led his students to enter the ImageNet image recognition competition with AlexNet and won the title with an accuracy several percentage points higher than their peers', ushering in the era of deep learning.

The main feature of deep learning is the use of deep neural networks as feature extractors for end-to-end feature extraction, replacing traditional machine learning methods that relied heavily on hand-designed feature extraction rules.

Deep-learning-based face recognition began in 2014, when DeepID (CVPR 2014) introduced convolutional neural networks into face recognition algorithms, and had basically approached maturity by InsightFace in 2019.

The process of deep learning can be intuitively understood as follows: a loss function is taken as the optimization target, and an iterative optimization method such as gradient descent guides the model to discover patterns in large amounts of data; the learned experience settles into the parameters of the neural network.

Face recognition models mostly use convolutional neural networks, especially ResNet, which appeared in 2015 and performs excellently on image tasks. As a result, there has been little architectural innovation in face recognition models in recent years; they remain basically convolutional neural networks. The innovation in face recognition technology is, to a large extent, innovation in the loss function.

From the common Softmax Loss to the ArcFace Loss used by InsightFace, the evolution process is as follows:
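In standard notation (a sketch: $x_i$ is the feature of sample $i$, $W_j$ the weight of class $j$, $\theta_j$ the angle between them, $s$ a scale factor, and $m$ the margin), the chain of improvements looks roughly like this. The starting point is the plain softmax loss:

$$L_{\mathrm{softmax}} = -\log\frac{e^{W_{y_i}^{\top}x_i + b_{y_i}}}{\sum_j e^{W_j^{\top}x_i + b_j}}$$

After normalizing both weights and features, the class-$j$ logit reduces to $s\cos\theta_j$; CosFace then adds a margin in cosine space, replacing the target logit with $s(\cos\theta_{y_i} - m)$, while ArcFace adds the margin to the angle itself, $s\cos(\theta_{y_i} + m)$.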

This project mainly implements CosFace and ArcFace, the two most commonly used losses. The code is shown below; OneFlow's native interface supports conveniently setting m1, m2, and m3:

# loss
if cfg.loss == "cosface":
    self.margin_softmax = flow.nn.CombinedMarginLoss(1, 0., 0.4).to("cuda")
else:
    self.margin_softmax = flow.nn.CombinedMarginLoss(1, 0.5, 0.).to("cuda")
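As a usage sketch only (the tensor names and the scale s = 64 are illustrative assumptions; OneFlow's CombinedMarginLoss transforms the target-class logits rather than returning a scalar loss), the margin-adjusted logits are then fed into an ordinary softmax cross entropy:

import oneflow as flow

margin_softmax = flow.nn.CombinedMarginLoss(1, 0.5, 0.)  # ArcFace-style margins
logits = flow.randn(4, 10)          # hypothetical [batch, num_classes] cosine similarities
labels = flow.tensor([1, 0, 3, 9])
adjusted = margin_softmax(logits, labels)   # margin applied only at each sample's label column
loss = flow.nn.CrossEntropyLoss()(adjusted * 64, labels)  # assumed feature scale s = 64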

The corresponding mathematical formula is:
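With $m_1$, $m_2$, $m_3$ as in CombinedMarginLoss above, the target-class logit $\cos\theta_{y_i}$ is replaced by

$$\cos(m_1\theta_{y_i} + m_2) - m_3,$$

so (1, 0, 0.4) recovers CosFace's $\cos\theta_{y_i} - 0.4$ and (1, 0.5, 0) recovers ArcFace's $\cos(\theta_{y_i} + 0.5)$.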

How OneFlow simply and elegantly solves very large scale face recognition training

In real industrial applications, the number of face IDs encountered may reach tens of millions or even hundreds of millions. At that scale, a single card cannot complete the training. Generally, the only mature parallelism scheme supported by frameworks is data parallelism, and simple data parallelism cannot efficiently support large-scale face recognition training.

For common distributed training strategies and the related background knowledge, see Common Distributed Parallel Strategies.

Training a super-large-scale face recognition model has the following characteristics:

  1. The final fully connected layer is extremely large in both computation and GPU memory usage (see the estimate after this list)
  2. The convolutional neural network that serves as the feature extractor in the first half is not very large
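A rough back-of-the-envelope estimate (assuming 10 million identities, embedding size 512, fp32) shows why the first point rules out a single card: the weight of the final fully connected layer alone is

$$10^7 \times 512 \times 4\,\mathrm{B} \approx 20.5\,\mathrm{GB},$$

and with gradient and momentum buffers of the same size the total roughly triples, far beyond the memory of any single GPU.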

These characteristics make it unsuitable for pure data parallelism or pure model parallelism; hybrid parallelism works best: the convolutional network, as the feature extractor, uses data parallelism, while the fully connected layer uses model parallelism. OneFlow also natively supports Partial FC, which further reduces the computation of the fully connected layer and greatly reduces GPU memory usage, and OneFlow's optimization of parallel Softmax further speeds up training; for the principle and implementation, see How to Implement an Efficient Softmax CUDA Kernel.
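To illustrate the idea behind Partial FC only (a minimal standalone sketch, not this project's implementation; all names are hypothetical): each step keeps the class centers that appear in the current batch and randomly samples negatives to fill a budget of sample_rate × num_classes, so only that fraction of the fully connected layer participates in the softmax.

import numpy as np

def sample_partial_fc(labels, num_classes, sample_rate=0.1):
    # keep every "positive" class present in the batch,
    # then fill the remaining budget with randomly sampled "negative" centers
    budget = int(num_classes * sample_rate)
    positives = np.unique(labels)
    negatives = np.setdiff1d(np.arange(num_classes), positives)
    num_negatives = max(budget - positives.size, 0)
    sampled = np.random.choice(negatives, size=num_negatives, replace=False)
    return np.concatenate([positives, sampled])

# with num_classes 93432 and sample_rate 0.1 (the values in the training log above),
# each step touches about 9343 class centers instead of all 93432
columns = sample_partial_fc(np.array([3, 42, 3, 77]), num_classes=93432)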

Relevant implementation in this project:

#function.py line :118
self.backbone = backbone.to_consistent(placement=placement, sbp=flow.sbp.broadcast)

The above code makes the convolutional network, as the feature extractor, train with data parallelism.
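For intuition (a sketch using the consistent-view API of this OneFlow version; the placement construction is an assumption), sbp = broadcast means every device holds an identical replica of the tensor, which is exactly what data parallelism requires of model parameters:

import oneflow as flow

# assumption: every device in the cluster participates
placement = flow.env.all_device_placement("cuda")
# each device gets a full copy of w and computes on its own slice of the global batch
w = flow.ones(512, 512).to_consistent(placement=placement, sbp=flow.sbp.broadcast)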

#function.py line :110~117
if cfg.model_parallel:
    # model parallel: each rank builds an FC shard with num_classes/world_size outputs,
    # and the consistent tensor is split along dimension 0 across ranks
    input_size = cfg.embedding_size
    output_size = int(cfg.num_classes / world_size)
    self.fc = FC7(input_size, output_size, cfg, partial_fc=cfg.partial_fc).to_consistent(
        placement=placement, sbp=flow.sbp.split(0))
else:
    # otherwise fall back to data parallelism: the full FC is replicated on every rank
    self.fc = FC7(cfg.embedding_size, cfg.num_classes, cfg).to_consistent(
        placement=placement, sbp=flow.sbp.broadcast)

The code above lets the fully connected layer train with model parallelism.
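By contrast (again a sketch with assumed sizes), flow.sbp.split(0) shards a tensor along dimension 0, so each device physically holds only num_classes/world_size rows of the logical classification weight:

import oneflow as flow

placement = flow.env.all_device_placement("cuda")
# logical shape [93432, 512]; on 4 devices each local shard is [23358, 512]
fc_weight = flow.ones(93432, 512).to_consistent(placement=placement, sbp=flow.sbp.split(0))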

Everything else (communication, scheduling, synchronization, and so on) is left to OneFlow to handle efficiently.

For more details, check out our official tutorial: InsightFace from Bronze to King.