AI Sports: Best Practices for On-Device Intelligence at Ali Sports

Brief introduction: Over the past year, the Ali Sports technology team has continuously explored on-device intelligence. In particular, it has put that work into practice and into business use in sports and health scenarios, through the AI sports project. The project applies the concept of sports digitalization, provides important support for growing the sporting population, marks Ali Sports' first step into intelligent sports, and gives users more fun and novel ways to play. Since its launch, the project has attracted wide attention.

Author | the acoustic    Source | Ali Technology official account

I Background


Because of the COVID-19 epidemic in 2020, traditional offline sports were restricted, and home workouts gradually became a new trend. Building on Alibaba's strong technology foundation, the Ali Sports team developed intelligent sports based on AI recognition to meet the urgent demand for online exercise. The result gives users a simple and fun new way to work out at home. All it takes is a mobile phone and 3-4 square meters of floor space to do an AI workout. To exercise, the user opens the LeLi APP, fixes the phone at one side of the exercise area, and sets the phone at a suitable angle. The user then adjusts their distance from the phone, following the APP's automatic voice prompts, until the whole body is inside the recognition box. At that point the workout can begin.
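The framing step, in which voice prompts continue until the whole body sits inside the recognition box, can be illustrated with a small check. This is a minimal sketch, not the LeLi APP's actual logic. It assumes keypoints arrive as normalized (x, y) image coordinates and that the recognition box is a fixed rectangle. The function name `framing_prompt`, the `min_fill` threshold, and the prompt strings are all hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]              # normalized (x, y), origin at top-left
Box = Tuple[float, float, float, float]  # (left, top, right, bottom), normalized

def framing_prompt(points: Iterable[Point], box: Box, min_fill: float = 0.6) -> str:
    """Return a hypothetical voice prompt until the whole body sits inside the box."""
    pts = list(points)
    if not pts:
        return "Please step in front of the camera"
    left, top, right, bottom = box
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    body_h = max(ys) - min(ys)
    box_h = bottom - top
    if body_h > box_h:
        return "Please move back"               # body taller than the box: too close
    if min(xs) < left or max(xs) > right or min(ys) < top or max(ys) > bottom:
        return "Please move toward the center"  # body fits but is off to one side
    if body_h < min_fill * box_h:
        return "Please move closer"             # body too small to recognize reliably
    return "Ready, you can start"
```

Checking the body's height before checking its position keeps the prompts simple: a user who is too close is told to step back, rather than being sent sideways.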

II On-Device Intelligence in Practice

After a year of exploration and refinement, Ali Sports has built a systematic on-device sports intelligence system. It has grown from a proof-of-concept demo into an AI sports intelligence platform that covers a variety of movements and supports capability transfer. The on-device system runs inference on the phone with Alibaba's deep inference engine to recognize human posture and movements. By analyzing the body's posture, movement, and joint angles, it gives real-time feedback and corrects the user's form. Through modular combination of capabilities, it now supports more than a dozen movements and dozens of gameplay modes. This integrates sports with AI so that online exercise is easy to start and full of fun.

III Technical Support

The basic technical approach of on-device intelligent sports is to use the MNN inference engine for inference and pose recognition, namely:

Detect the human body contour in images and videos in real time and locate 14 key skeletal points (keypoints), covering the head, shoulders, feet, and other key joints.
Connect the keypoints into line segments that form a skeleton, and use it to analyze the body's posture, joint angles, and movement trajectory.
Use pose matching to detect the user's movements and to time and count them. At the same time, analyze in real time how well each movement conforms to the standard, give status feedback, and correct the user's form, making the experience interactive.
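The angle analysis in the second step can be made concrete. A joint angle is the angle at a middle keypoint between the two bones that meet there, for example shoulder-elbow-wrist for the elbow. The sketch below assumes 2D image coordinates. The 14-name index layout is hypothetical, because the article does not list the actual keypoint order.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical 14-keypoint layout; the real model's index order is not given in the article.
KEYPOINTS = ["head", "neck",
             "r_shoulder", "r_elbow", "r_wrist",
             "l_shoulder", "l_elbow", "l_wrist",
             "r_hip", "r_knee", "r_ankle",
             "l_hip", "l_knee", "l_ankle"]
INDEX = {name: i for i, name in enumerate(KEYPOINTS)}

def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle ABC in degrees at vertex b, between segments b->a and b->c (0 to 180)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # atan2(|cross|, dot) avoids the precision loss of acos near 0 and 180 degrees
    return math.degrees(math.atan2(abs(cross), dot))

def named_angle(points: Sequence[Point], a: str, b: str, c: str) -> float:
    """Joint angle looked up by keypoint names, e.g. ("r_hip", "r_knee", "r_ankle")."""
    return joint_angle(points[INDEX[a]], points[INDEX[b]], points[INDEX[c]])
```

A straight limb gives about 180 degrees and a right-angle bend gives 90. Thresholds on such angles are a natural way to express the "trigger conditions" used for counting.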

In traditional exercise, users get real-time reminders and help from people on site (coaches, examiners, or friends and family). With on-device intelligent sports, users can interact only with the phone app while they exercise. The quality of interaction and recognition depends on a series of factors, such as the capability of the inference model, the complexity of the exercise scene, and the motion-matching algorithm. While exploring and shipping on-device intelligent sports, we ran into new problems and difficulties. These include matching the orientation between the user and the phone, missing keypoints, misidentified keypoints, two-dimensional distortion, the user moving out of position, phone shake, and scene noise. Rather than cover all of them, we share a few representative ones:

Judging whether a movement is valid, and designing the key algorithms that improve motion-matching precision, are the foundation of intelligent sports capability.
While preserving recognition quality, take effective measures to reduce resource consumption on the phone, mainly battery drain and heat, to improve the user experience.
Adopt a more flexible approach to reduce the manpower and time spent on mobile testing. This improves development and testing efficiency and gives the team strong support for reliable delivery.

Improving recognition accuracy

The most intuitive and basic thing intelligent sports offers users is accurate rep counting. If counting is inaccurate, users lose enthusiasm for the APP and engagement drops. So the first problem we must solve is accurate counting.

The basic principle of intelligent rep counting is to decompose a complete movement into several small steps and to trigger recognition and judgment for each step. Once all steps have been passed, the whole movement is checked for validity. If it is valid, the count increases by 1; if not, the process repeats. In short, intelligent motion recognition and counting is a state machine. A movement is discretized and abstracted into N states, {S(0), S(1), S(2), …, S(N-1)}, which are detected in a fixed order. Detecting all of them means the user has completed the movement, and the count increases by 1. If a state is not detected, the corresponding feedback is triggered and the state machine is reset to start a new cycle. Each state has its own trigger condition. The matching result comes from repeatedly checking the real-time keypoint coordinates against each state's trigger condition.
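A minimal Python sketch of such a counting state machine follows. It assumes each frame has already been reduced to features such as joint angles, and it models each state S(i) as a predicate that must become true in order. The article does not say how a "missed" state is detected. Here a wait limit, `max_wait_frames`, is an assumption: it triggers feedback and a reset once a rep is under way. `RepCounter` and the squat thresholds in the usage block are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Frame = Dict[str, float]         # per-frame features, e.g. {"knee_angle": 172.0}
State = Callable[[Frame], bool]  # trigger condition for one state S(i)

@dataclass
class RepCounter:
    """Counts reps by detecting states S(0)..S(N-1) in order."""
    states: List[State]
    max_wait_frames: int = 60    # assumed limit before a missing state resets the cycle
    count: int = 0
    index: int = 0               # next state S(index) to detect
    waited: int = 0              # frames spent waiting for S(index)
    feedback: List[str] = field(default_factory=list)

    def update(self, frame: Frame) -> None:
        if self.states[self.index](frame):
            self.index += 1
            self.waited = 0
            if self.index == len(self.states):  # all N states seen: one valid rep
                self.count += 1
                self.index = 0
        elif self.index > 0:                    # mid-rep and S(index) not seen yet
            self.waited += 1
            if self.waited > self.max_wait_frames:
                self.feedback.append(f"S({self.index}) not detected")
                self.index = 0                  # reset and start a new cycle
                self.waited = 0

if __name__ == "__main__":
    squat = RepCounter(states=[
        lambda f: f["knee_angle"] < 100,  # S(0): squatted down (illustrative threshold)
        lambda f: f["knee_angle"] > 160,  # S(1): standing back up
    ])
    for angle in [170, 170, 90, 85, 170, 170, 90, 170]:
        squat.update({"knee_angle": angle})
    print(squat.count)  # 2
```

The wait counter only runs once S(0) has been seen. As a result, a user standing still between reps does not generate a stream of "not detected" feedback.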

Clearly, recognition accuracy is closely tied to the action-matching algorithm: the better the matching, the higher the accuracy. To improve accuracy, we can start from the factors that affect matching, such as keypoints, state design, and the matching itself. The corresponding measures are:

Improve keypoint stability to ensure accurate state-matching results.
Choose poses whose keypoints are stable, easy to recognize, and representative as the states.
Make sure the frame rate is high enough to capture every state of a movement.

Here are some examples to illustrate.

The accuracy of keypoint recognition strongly affects motion matching. As shown in the figure below, the keypoints on the test subject's left arm are misidentified. Matching directly against them would obviously give a wrong result. In such cases, we should make good use of the user's historical motion information to correct the matching.
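One common way to use a user's history is to combine three techniques: confidence gating, rejection of implausible jumps, and exponential smoothing. The sketch below illustrates those techniques; it is not necessarily the scheme Ali Sports uses. It assumes the model reports a confidence for each keypoint and that coordinates are normalized. `KeypointStabilizer` and all of its thresholds are hypothetical. After `max_misses` consecutive rejections, the stabilizer accepts the new position, so that a genuinely fast movement is not frozen forever.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # normalized image coordinates

class KeypointStabilizer:
    """Corrects per-frame keypoints using each point's recent history (illustrative scheme)."""

    def __init__(self, alpha: float = 0.5, max_jump: float = 0.2,
                 min_conf: float = 0.3, max_misses: int = 3) -> None:
        self.alpha = alpha            # weight of the newest observation in smoothing
        self.max_jump = max_jump      # largest plausible per-frame displacement
        self.min_conf = min_conf      # detections below this confidence are ignored
        self.max_misses = max_misses  # consecutive rejections before re-syncing
        self.history: Dict[int, Point] = {}
        self.misses: Dict[int, int] = {}

    def update(self, idx: int, point: Point, conf: float) -> Optional[Point]:
        prev = self.history.get(idx)
        if conf < self.min_conf:
            return prev  # unreliable detection: reuse history (None if there is none)
        if prev is not None and math.dist(prev, point) > self.max_jump:
            self.misses[idx] = self.misses.get(idx, 0) + 1
            if self.misses[idx] <= self.max_misses:
                return prev  # probably a misidentified point: keep the last good one
            prev = None      # the jump persists, so treat it as real motion and re-sync
        self.misses[idx] = 0
        if prev is None:
            smoothed = point
        else:
            smoothed = (prev[0] + self.alpha * (point[0] - prev[0]),
                        prev[1] + self.alpha * (point[1] - prev[1]))
        self.history[idx] = smoothed
        return smoothed
```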

In another case, the user actually completes every phase of a movement. As shown in the figure below, the sampling frame rate is too low to capture and recognize every posture of a jumping jack. As a result, one state fails to match and the jumping jack is miscounted. The low-frame-rate problem can be tackled from two sides: the model and the input source. On the model side, use a simplified model that shortens inference time without hurting recognition accuracy. On the input side, use an input resolution suited to each device to shorten raw-data processing time.
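The frame-rate requirement can be turned into simple arithmetic. At frame period T, any time window of length d contains at least floor(d / T) samples. So guaranteeing k samples during the shortest state of duration d requires fps ≥ k / d. The sketch below applies this and then picks, per device, the largest input resolution whose measured per-frame cost fits the resulting budget. The state duration and the per-resolution costs are hypothetical numbers, not measurements from the article.

```python
import math
from typing import Dict, Optional, Tuple

Resolution = Tuple[int, int]  # (width, height)

def min_fps(shortest_state_ms: float, samples_per_state: int = 2) -> int:
    """Lowest frame rate that still puts `samples_per_state` frames inside the shortest state."""
    return math.ceil(samples_per_state * 1000.0 / shortest_state_ms)

def pick_resolution(frame_cost_ms: Dict[Resolution, float], target_fps: int) -> Optional[Resolution]:
    """Largest input resolution whose measured per-frame cost fits the frame budget."""
    budget_ms = 1000.0 / target_fps
    fits = [res for res, cost in frame_cost_ms.items() if cost <= budget_ms]
    return max(fits, key=lambda r: r[0] * r[1]) if fits else None

if __name__ == "__main__":
    fps = min_fps(150)  # hypothetical: the quickest jumping-jack phase lasts about 150 ms
    costs = {(640, 480): 95.0, (480, 360): 60.0, (320, 240): 30.0}  # hypothetical device profile
    print(fps, pick_resolution(costs, fps))
```

With these numbers the target is 14 fps (a 71.4 ms budget), and the 480x360 input is the largest that fits. A slower device profile would fall back to a smaller input.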

Reducing performance cost

Because of hardware constraints, phones have limited computing power and storage. Deep learning inference is itself computationally heavy and resource-hungry. Suppose inference runs directly on the phone alongside the app's own workloads, such as camera capture, video recording, and animation effects. CPU and memory overhead then rise significantly, and users notice it as the phone heating up and the battery draining quickly. When deploying intelligent sports on device, reducing performance cost deserves special attention, because it matters greatly to the user experience.

To reduce overall performance cost, go back to the source and start by reducing the cost of a single frame. Processing a single frame can be divided into three stages: pre-inference, inference, and post-inference.

The three stages play different roles. The pre-inference stage mainly performs format conversion, turning the camera's stream data (for example, YUV) into the format that inference requires (for example, RGBA). The inference stage computes and outputs keypoint coordinates: the inference engine runs a series of algorithms over the input frame and outputs the result. In pose recognition, for instance, this means converting the input image's RGBA data into keypoint coordinates. The post-inference stage mainly handles display, performing rendering and business-related operations such as UI display and animation effects.
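Before optimizing, it helps to measure which of the three stages dominates. The sketch below wraps one frame's pre-inference, inference, and post-inference steps with timers. The stage functions are placeholders passed in by the caller: this is a generic profiling pattern and does not call any real MNN API. `run_frame` and `average_timings` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Any], Any]

def run_frame(frame: Any, pre: Stage, infer: Stage, post: Stage) -> Tuple[Any, Dict[str, float]]:
    """Run one frame through the three stages and record each stage's wall time in ms."""
    t0 = time.perf_counter()
    model_input = pre(frame)        # pre-inference, e.g. camera YUV -> RGBA
    t1 = time.perf_counter()
    keypoints = infer(model_input)  # inference, e.g. the engine producing keypoints
    t2 = time.perf_counter()
    result = post(keypoints)        # post-inference, e.g. rendering and business logic
    t3 = time.perf_counter()
    return result, {"pre_ms": (t1 - t0) * 1e3,
                    "infer_ms": (t2 - t1) * 1e3,
                    "post_ms": (t3 - t2) * 1e3}

def average_timings(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-stage timings over many frames to see which stage dominates."""
    return {key: sum(s[key] for s in samples) / len(samples) for key in samples[0]}
```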

Each stage can be optimized accordingly. Optimization of the inference stage is handled by Alibaba's deep inference engine, MNN, and is not discussed here. For data conversion in the pre-inference stage, eliminate unnecessary intermediate conversions and convert the camera stream directly into the required format. For example, if inference takes raw RGBA data, convert the camera stream straight to RGBA. For the post-inference stage, choose a rendering approach suited to the host platform to reduce rendering cost. On iOS, for example, rendering can be done directly with Metal.
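The following sketch illustrates converting the camera stream "straight to RGBA" in a single pass, with no intermediate RGB or BGR buffer. It rests on several assumptions. The camera delivers NV12 frames: a full Y plane followed by one interleaved U/V plane at half resolution. The color range is full-range BT.601. Width and height are even, and there is no row padding. A production version would run on the GPU or with SIMD; pure Python is used here only to show the indexing and the conversion formulas.

```python
def _clamp(x: float) -> int:
    """Round and clamp a channel value to the 0..255 byte range."""
    return max(0, min(255, round(x)))

def nv12_to_rgba(y_plane: bytes, uv_plane: bytes, width: int, height: int) -> bytearray:
    """Convert one NV12 frame directly to RGBA.

    Assumes full-range BT.601, even width/height, and stride == width.
    """
    out = bytearray(width * height * 4)
    for row in range(height):
        uv_row = (row // 2) * width       # each UV row serves two Y rows
        for col in range(width):
            y = y_plane[row * width + col]
            uv = uv_row + (col // 2) * 2  # U and V are interleaved: U, V, U, V, ...
            u = uv_plane[uv] - 128
            v = uv_plane[uv + 1] - 128
            i = (row * width + col) * 4
            out[i] = _clamp(y + 1.402 * v)
            out[i + 1] = _clamp(y - 0.344136 * u - 0.714136 * v)
            out[i + 2] = _clamp(y + 1.772 * u)
            out[i + 3] = 255              # opaque alpha
    return out
```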

Improving test efficiency

AI intelligent sports is a bold attempt by the Ali Sports team at sports digitalization. During development, and especially during testing, the team invested considerable manpower, devices, and time to keep improving features, optimizing performance, and enhancing the user experience. Moreover, how well AI motion recognition performs depends heavily on environmental factors such as lighting, background, distance, and the size of the person in the camera image. This makes testing itself a challenge.

Consider the traditional testing approach. Typically a real person performs the movements live, and a tester records the results manually and analyzes them afterward, as shown in the figure below.

It is easy to see the problem. The phones that run AI intelligent sports vary in brand, model, OS version, and performance, and users may exercise in very different environments. Covering all of these factors with traditional testing strains both staffing and test time, and consistent test accuracy cannot be guaranteed. Specifically:

High labor cost: each test round requires several colleagues working together, which is time-consuming and labor-intensive.
Limited test environments: lab setups cannot reflect the complex and diverse environments users encounter online.
Results are hard to quantify: model accuracy, algorithm efficiency, dynamic matching accuracy, the size of accuracy improvements, performance cost, and so on cannot be evaluated quantitatively.
Problems are hard to locate: even after analysis and investigation, issues from online user complaints could not be reproduced.

Traditional testing methods are hard to sustain. To overcome these difficulties, the Ali Sports technical team developed an automated testing tool dedicated to testing AI sports features. The tool enables quick localization and regression testing of online problems, as well as quantitative evaluation of model and algorithm accuracy.

The automated test tool works as follows: it parses video sets in batches to simulate real scenarios, extracts keypoint data, checks business results, and automatically generates test reports. The specific technical scheme is shown in the figure below:

The new testing tool has significantly reduced labor cost and improved testing efficiency. The specific test results are as follows:

Note that the tool's effectiveness depends on the number of test samples: the richer the sample set, the more accurate the testing.
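The replay-and-score step of such a tool could look like the following sketch. It assumes that keypoint frames have already been extracted from each recorded video, so video decoding and inference are out of scope. It also assumes that each video carries a rep count labeled by a person. `Sample`, `evaluate`, and the report fields are hypothetical names, not the team's actual tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class Sample:
    name: str          # one recorded test video
    frames: Sequence   # keypoints/features already extracted from that video
    expected: int      # rep count labeled by a person

def evaluate(samples: List[Sample], count_fn: Callable[[Sequence], int]) -> Dict:
    """Replay every sample through the counting logic and summarize the results."""
    rows = []
    for s in samples:
        got = count_fn(s.frames)
        rows.append({"name": s.name, "expected": s.expected, "counted": got,
                     "abs_error": abs(got - s.expected)})
    n = len(rows)
    return {
        "total": n,
        "exact_match_rate": sum(r["abs_error"] == 0 for r in rows) / n if n else 0.0,
        "mean_abs_error": sum(r["abs_error"] for r in rows) / n if n else 0.0,
        "failures": [r["name"] for r in rows if r["abs_error"]],
    }
```

Because the same recorded videos can be replayed after every model or algorithm change, metrics such as the exact-match rate become comparable across versions. A failing video can also be replayed directly to reproduce a complaint.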

IV Business Results

Ali Sports' intelligent sports now supports dozens of movements and offers a rich set of AI training courses. Through modular combination of sports capabilities, it can also keep adding new movements in the future.

Since AI intelligent sports launched, the LeLi APP has successively released many kinds of movements. These include upper-body movements such as straight-arm jumping jacks and push-ups, lower-body movements such as hip bridges and squats, and full-body movements such as rope skipping and jumping jacks. Users can join AI workouts with friends anytime and anywhere, without being limited by time or place, which makes the APP more attractive and fun. In addition, the AI training courses innovatively bring in celebrity resources through a "train with a star" class that runs 7 days a week for all 52 weeks of the year. The class encourages users to build exercise habits, enjoy sports, and fall in love with exercise alongside the stars. The Ali Sports team will continue to create more ways to play based on user needs and to enrich product features. In doing so, it aims to build a distinctive business brand and innovative product identity for Ali Sports intelligence.

This article is original content from Aliyun and may not be reproduced without permission.
