“Wine” has long played an indispensable role in traditional Chinese culture. Today’s global wine world, however, offers a dazzling variety, with thousands of wines on display. How can consumers learn the characteristics of these wines and the stories behind them? The product R&D team of “Baipin App” spotted this need and set out to give wine enthusiasts a more considerate, personalized experience. Using Baidu’s open source deep learning platform PaddlePaddle, the team built the App’s “scan the label to identify the wine” feature, which puts knowledge about all kinds of wine at the user’s fingertips.

“Scan the label to identify the wine”: a technical solution based on Baidu PaddlePaddle AI



The “scan the label to identify the wine” feature relies mainly on image classification. As an open source deep learning platform that grew out of industrial practice, Baidu PaddlePaddle offers a rich set of AI solutions for image classification.

PaddleClas, the PaddlePaddle image classification suite, provides a rich collection of image classification models, covering 23 series of classification network architectures such as ResNet_vd and MobileNetV3, along with 117 corresponding pre-trained classification models. Users can select a model to train and use according to the needs of their scenario. In this project, because the wine recognition feature has to be deployed in a mobile App, we adopted MobileNetV2, a model with few parameters and a low computation cost. We also used Paddle Lite, PaddlePaddle’s lightweight inference engine, for lightweight deployment and efficient prediction on mobile devices. Recognition accuracy on the test data exceeds 97%.



Detailed walkthrough of the implementation



1. Data set collection

The dataset contains 114 brands in total, each forming its own category. The training set has about 250 images per category and the validation set about 50 images per category. To reflect the diversity of real scenes, the images within a category differ slightly in placement direction, lighting, and the condition of the wine label.

2. Training set clustering

After analyzing the training set, we found that, because the data had been collected in a single way, images within the same category were highly repetitive. Training on them directly would take a long time and easily lead to overfitting. We therefore clustered the training set with the K-means method and randomly selected one image from each cluster as its representative. The process is shown in the figure below. In the final training set, the number of samples in each category is reduced to 1/5 of the original, and only samples that differ noticeably from one another remain after clustering.

FIG. 1 Training set clustering process
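The clustering step can be sketched in plain Python as follows. This is a minimal illustration, not the team's code: it assumes each image has already been turned into a numeric feature vector (the article does not say which features were used), and the function names and the choice of k as one fifth of the sample count are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct samples
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def pick_representatives(points, k, seed=0):
    """Keep one randomly chosen sample index from every non-empty cluster."""
    rng = random.Random(seed)
    clusters = {}
    for index, cluster_id in enumerate(kmeans(points, k, seed=seed)):
        clusters.setdefault(cluster_id, []).append(index)
    return sorted(rng.choice(members) for members in clusters.values())


# Hypothetical features for 10 images of one category; k = 10 // 5 keeps about 1/5.
features = [[0.0, 0.1 * i] for i in range(5)] + [[10.0, 0.1 * i] for i in range(5)]
kept = pick_representatives(features, k=len(features) // 5)
```

Run once per category, this keeps at most k images, so setting k to one fifth of the category size reproduces the 1/5 reduction described above.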

3. Data preprocessing

In real application scenarios, photos of wine bottles are affected by lighting, placement angle, lens size, and other factors, so they differ from the images in the training set. Data augmentation is therefore needed to improve the generalization ability of the model. It consists of five steps:

  1. Random rotation with 50% probability (rotation range 0-30 degrees);

  2. Randomly crop the image from the previous step;

  3. Resize the cropped image to a resolution of 224*224 (the resolution can be adjusted to suit the actual dataset);

  4. Apply random color perturbation to the result of the previous step. It consists of the four perturbations shown in Figure 2: with 50% probability they are applied in the order “brightness -> contrast -> saturation -> hue”, and with the remaining 50% probability in the order “brightness -> saturation -> hue -> contrast”;

  5. Flip the result of the previous step horizontally with 50% probability.

FIG. 2 Random color perturbation
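The random branching in the five steps can be sketched as a small plan sampler. It only decides which operations run and in what order; the image operations themselves are left out. The function name is hypothetical, and the two color orderings are an assumption about the intended sequences, since the original wording of step 4 is ambiguous.

```python
import random

# Assumed orderings of the four color perturbations (see step 4).
COLOR_ORDER_A = ["brightness", "contrast", "saturation", "hue"]
COLOR_ORDER_B = ["brightness", "saturation", "hue", "contrast"]


def sample_augmentation_plan(rng, size=224):
    """Return the ordered list of (operation, parameter) pairs for one image."""
    plan = []
    if rng.random() < 0.5:  # step 1: rotate with 50% probability
        plan.append(("rotate", rng.uniform(0.0, 30.0)))
    plan.append(("random_crop", None))  # step 2
    plan.append(("resize", (size, size)))  # step 3
    # Step 4: choose one of the two color orderings with equal probability.
    color_order = COLOR_ORDER_A if rng.random() < 0.5 else COLOR_ORDER_B
    plan.extend(("color_" + name, None) for name in color_order)
    if rng.random() < 0.5:  # step 5: horizontal flip with 50% probability
        plan.append(("horizontal_flip", None))
    return plan


plan = sample_augmentation_plan(random.Random(0))
```

Sampling the plan separately from applying it makes the probabilities easy to check and keeps the sketch independent of any particular image library.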

After augmentation, to better match the pre-trained model provided by PaddlePaddle, each image is normalized by subtracting the mean and dividing by the standard deviation.
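A minimal sketch of this normalization for a single RGB pixel is shown below. The mean and standard deviation values are the widely used ImageNet statistics and are an assumption here; the article does not state which values the team used.

```python
# Assumed per-channel statistics (common ImageNet values), for inputs scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then subtract the mean and divide by the std."""
    return tuple(
        (channel / 255.0 - m) / s for channel, m, s in zip(rgb, mean, std)
    )


black = normalize_pixel((0, 0, 0))
```

Using the same statistics as the pre-trained weights matters because the first layers of the network were trained on inputs in that range.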

4. Model training

Given the model’s actual usage scenario, we adopted MobileNetV2 as the classification model. It can be regarded as a combination of MobileNetV1 and ResNet: it keeps the depthwise separable convolution that MobileNetV1 introduced to reduce computation and parameter count, and adds a ResNet-style residual structure (bottleneck), which improves accuracy considerably over MobileNetV1. Figure 3 compares the substructures of the three models.

FIG. 3 Comparison of MobileNetV1, MobileNetV2 and ResNet substructures
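To see why depthwise separable convolution reduces the parameter count, the weights of a single layer can be counted directly. The sketch below ignores bias terms, and the layer sizes in the example are illustrative rather than taken from the project.

```python
def standard_conv_params(in_channels, out_channels, kernel_size):
    """Weights in a standard convolution: one k*k filter per input/output channel pair."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(in_channels, out_channels, kernel_size):
    """Weights in a depthwise k*k convolution followed by a 1*1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Illustrative layer: 3*3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(32, 64, 3)          # 18432
separable = depthwise_separable_params(32, 64, 3)   # 2336
```

For this example layer the separable form needs roughly one eighth of the weights, which is the kind of saving that makes the MobileNet family suitable for mobile deployment.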

Finally, with a piecewise decay learning rate strategy, a learning rate of 0.0025, a batch size of 64, and 20 epochs, the classification accuracy exceeds 97%.
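A piecewise decay schedule holds the learning rate constant between fixed boundaries and drops it at each boundary. The sketch below uses the stated starting rate of 0.0025 and 20 epochs; the boundary epochs and the lower rates are hypothetical, since the article does not list them.

```python
import bisect


def piecewise_decay(epoch, boundaries, values):
    """Return the learning rate for an epoch; values has one more entry than boundaries."""
    return values[bisect.bisect_right(boundaries, epoch)]


# Hypothetical schedule: drop the rate tenfold at epochs 10 and 15.
BOUNDARIES = [10, 15]
VALUES = [0.0025, 0.00025, 0.000025]

schedule = [piecewise_decay(epoch, BOUNDARIES, VALUES) for epoch in range(20)]
```

With these assumed boundaries, epochs 0 to 9 train at 0.0025, epochs 10 to 14 at 0.00025, and epochs 15 to 19 at 0.000025.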

5. Model deployment

Package size and response speed are two important indicators of a mobile App’s performance. Paddle Lite is a high-performance, lightweight deep learning inference engine designed for efficient on-device prediction, including on mobile phones.

In this project, Paddle Lite first applies a series of computational graph optimizations, such as operator fusion and memory optimization, to the trained MobileNetV2 model. After optimization, the model is much smaller and prediction is much faster. After operator fusion and trimming, the Paddle Lite inference library contains only the feed, fetch, conv2d, depthwise_conv2d, elementwise_add, fc, pool2d, relu6, and softmax operators. Paddle Lite also cleanly decouples the execution stage from the computational graph optimization stage, so the mobile side deploys only the execution stage, with no third-party dependencies. Paddle Lite provides sample code for deploying models on various platforms:

https://github.com/PaddlePaddle/Paddle-Lite-Demo
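As an illustration of what operator fusion means, the sketch below folds a batch normalization step into the convolution that precedes it, shown for a single channel with scalar weights. This is one common fusion and is offered only as an example: the article does not say which fusions Paddle Lite applied to this model, and the function name is hypothetical.

```python
import math


def fold_batch_norm(weight, bias, gamma, beta, mean, variance, eps=1e-5):
    """Fold y = gamma * ((w * x + b) - mean) / sqrt(var + eps) + beta into one w', b'."""
    scale = gamma / math.sqrt(variance + eps)
    return weight * scale, (bias - mean) * scale + beta


# Example values for one channel (illustrative only).
w, b = 0.5, 0.1
gamma, beta, mean, variance = 1.5, -0.2, 0.3, 0.8
fused_w, fused_b = fold_batch_norm(w, b, gamma, beta, mean, variance)

x = 2.0
unfused = gamma * ((w * x + b) - mean) / math.sqrt(variance + 1e-5) + beta
fused = fused_w * x + fused_b  # same result with a single multiply-add
```

After folding, the batch normalization operator is no longer needed at inference time, which reduces both the number of operators in the library and the work per prediction.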

The final inference time of a single image in different hardware environments is shown in the following table:

Case results



The trained model was successfully deployed in “Baipin App”, where it powers the “scan the label to identify the wine” feature. As shown in Figure 4, the user opens the camera function in the App and photographs the wine label to be identified. The App then quickly identifies the wine and displays related information and reviews.

FIG. 4 Recognition results of “Baipin App”

Conclusion



PaddleClas is an image classification suite with rich model resources. It offers models suited to different tasks, so users can choose flexibly according to their scenario and needs. With the lightweight inference engine Paddle Lite, lightweight prediction on mobile or embedded devices is straightforward to implement. The “scan the label to identify the wine” feature of “Baipin App” is another example of an enterprise AI project built on PaddlePaddle. It fills a gap in intelligent wine recognition on mobile devices and lets anyone become a “wine expert” in a second. If you are interested, give it a try.



Related resources