In recent years, more and more companies have invested in on-device intelligence, that is, running AI models directly on the client device. Several leading companies have explored new approaches and achieved good results, and on-device intelligence has gradually become one of the core driving forces of business innovation in mobile apps. What are the challenges in advancing it, and what are the core solutions?

InfoQ invited Chengfei Lu (alias: Lu Xing), a senior technical expert in Alibaba's Taobao technology department, to talk about how on-device intelligence is applied in Taobao and the technical challenges behind Singles' Day (Double 11). He will also give a talk titled “Taobao On-Device Intelligence: Technology Construction and Business Innovation” in the “Cloud Integration of Mobile Development” track at the QCon Global Software Development Conference (Shenzhen) on December 6-7. The on-site content will later be compiled and published through the Taobao Technology WeChat official account.

Guest

Chengfei Lu (alias: Lu Xing) is a senior wireless development expert at Alibaba with deep thinking and practical experience in mobile development, super-app architecture, and on-device AI. After graduating in 2011, he joined Baidu and took part in building Baidu Input Method from 0 to 1. He joined Taobao in 2013, experienced the complete technical evolution of the Mobile Taobao super app, and led work on the Taobao iOS architecture upgrade, architecture governance, stability, and performance. In 2017 he began exploring on-device intelligence and led the construction of MNN, Walle, an AR technology framework, and innovative applications such as beauty and cosmetics AR.

The following is a transcript of the interview:

InfoQ: Mr. Lu Xing, it is a great honor to interview you. You have been exploring on-device intelligence since 2017. How do you think the field has changed over the past three years?

Lu Xing: At the macro level, on-device intelligence is gradually moving from exploration to broad adoption. In the future, it will definitely become one of the core technical driving forces of business applications and business innovation. More specifically, its development in the industry can be viewed from the following three perspectives:

  • From a technical point of view, the problems being solved have progressed step by step: from the initial, basic problem of running a model at all, to problems of efficiency and large-scale application. These include: How does an algorithm model run on the device? How can algorithm models be deployed and iterated quickly? How can the barrier to on-device AI be lowered so that it can be applied universally?

  • From the algorithm point of view, on-device algorithms continue to mature and improve. Starting from face detection, models for human pose, gestures, OCR and so on have gradually matured. Beyond vision models, it has also become feasible to run deep models for search and recommendation, speech recognition (ASR) models, and NLP models on the device. For example, this year we implemented a real-time speech recognition solution on mobile devices based on MNN, and it achieved good business results in the “Guess to the end” activity of Taobao Live on Double 11.

  • From the application point of view, the overall scope of application continues to widen and deepen: from an initial single-point scenario, such as one scene in the Taobao app, to broad coverage across multiple apps and multiple scenarios. According to incomplete statistics, Alibaba has more than 30 MNN-based on-device intelligence applications.

InfoQ: What are the main stages in the development of on-device intelligence at Taobao?

Lu Xing: As mentioned above, each step brings many problems. Over the past three years we have been working on solving them, and we have gone through roughly the following three stages:

1. On-device inference engine stage: on-device intelligence must first solve the problem of running algorithm models on the device; without that, nothing else is possible. The inference engine is the jewel in the crown of on-device intelligence.

2. Algorithm model service stage: beyond running a model, on-device intelligence also involves model conversion, update and release, version management, and operations monitoring. In this stage we built an on-device AI server to handle the release and update of algorithm models. In addition to the model itself, an algorithm task also includes pre-processing and post-processing code, so we built a PythonVM-based runtime container for algorithm tasks, which lets algorithm engineers write tasks in Python and iterate quickly.

3. On-device AI R&D paradigm stage: when on-device intelligence is applied at scale, the full R&D iteration pipeline needs to be addressed systematically. On the one hand, shipping an on-device intelligence application requires algorithm developers and mobile developers to work together, but there is a natural gap between the two; when collaboration depends entirely on verbal communication, efficiency suffers badly. On the other hand, AI application scenarios are long-tailed and fragmented. Many scenarios never ship for lack of professional algorithm support, and without unified technical infrastructure it is hard to consolidate and reuse solutions that have already been deployed. We therefore built the “on-device AI R&D paradigm”, which consists of the MNN workbench, the MNN runtime, and the on-device AI server. It has two core ideas: first, decouple algorithm development from mobile development so that algorithms can iterate independently; second, lower the barrier to AI so that it becomes a powerful tool that ordinary developers can use to solve business problems. I will share the details at QCon.
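The second stage above describes algorithm tasks that bundle a model with Python pre-processing and post-processing code. A minimal sketch of what such a task might look like is shown below. The class name AlgorithmTask, its fields, and the stand-in model function are hypothetical illustrations; they are not the actual API of MNN, Walle, or the PythonVM container.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AlgorithmTask:
    """Hypothetical task bundle: a versioned model plus Python pre/post-processing."""
    name: str
    model_version: str
    preprocess: Callable[[Sequence[float]], List[float]]
    infer: Callable[[List[float]], List[float]]  # stand-in for an inference engine call
    postprocess: Callable[[List[float]], str]

    def run(self, raw_input: Sequence[float]) -> str:
        features = self.preprocess(raw_input)
        scores = self.infer(features)
        return self.postprocess(scores)


def normalize(pixels: Sequence[float]) -> List[float]:
    # Scale 0..255 pixel values to the 0..1 range.
    return [p / 255.0 for p in pixels]


def fake_model(features: List[float]) -> List[float]:
    # Placeholder for a real model: returns [score_dark, score_bright].
    mean = sum(features) / len(features)
    return [1.0 - mean, mean]


def pick_label(scores: List[float]) -> str:
    # Map the highest score to a label.
    labels = ["dark", "bright"]
    return labels[scores.index(max(scores))]


task = AlgorithmTask("brightness_demo", "1.0.0", normalize, fake_model, pick_label)
print(task.run([250, 240, 255]))
```

Because the pre-processing, model version, and post-processing travel together as one unit, a task like this could in principle be updated and released without changing the host app's native code, which is the kind of fast iteration the stage describes.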

InfoQ: What difficulties has Taobao faced in putting on-device intelligence into production, and what do you see as the biggest challenges? How were they finally resolved?

Lu Xing: Taobao's rich business scenarios have always been fertile soil for cultivating innovative technologies, and our on-device intelligence technology and application practice have stayed at the forefront of the industry. We have open-sourced the inference engine MNN and opened the MNN workbench to the public. At present, Taobao has 25+ application scenarios and 65+ algorithm models running daily, with more than 10 billion inference runs per day, covering core scenarios such as product search and recommendation, user reach, Pailitao, and live streaming. The system has been tested by three Double 11 events and has delivered huge business value. The applications can be roughly divided into the following categories:

  • Vision applications, mainly in scenarios such as Pailitao, Taobao Live, shooting tools, and reviews.

  • Recommendation applications, mainly in recommendation scenarios such as the home page feed, post-purchase pages, and product detail pages.

  • User reach applications, mainly in push notification, message, and service pop-up scenarios.

  • Voice applications, mainly in scenarios such as Taobao Live and intelligent noise reduction.

By far the biggest challenges come from the inference engine, MNN, for example:

  • Fragmentation of mobile devices and operating systems;

  • Limited computing power and resources on mobile devices;

  • The diversity of algorithm models, such as vision and speech models.

I won't go into detail here on how we solved these challenges; I will focus on sharing our core solutions at QCon Shenzhen 2020.

InfoQ: How did on-device intelligence stand out in practical applications during the recent Double 11? Can you talk about it using actual cases?

Lu Xing: On-device intelligence has gradually moved from trial applications to being one of the core driving forces of business innovation, and related applications can be seen throughout the popular business scenarios of Double 11. That includes this year's popular live streaming scenarios. Relying on MNN, which was developed by Taobao, Taobao Live launched a “guess the price by voice” challenge. Viewers can interact by voice in the live room: simply by speaking, they can respond to the price-guessing task issued by the host. On-device intelligence has greatly improved both the interactivity of live streaming and the accuracy of content understanding.

On-device AI also enables accurate perception of user state. During the Double 11 traffic peak, we made full use of the computing power and data available on the device, which greatly improved the experience and effectiveness of proactive user reach. On November 1 alone, on-device AI decisions were executed 27.7 billion times.

Through real-time perception of user behavior and intent recognition, on-device product list re-ranking and intelligent refresh have been applied at scale in scenarios such as the Taobao feed, and have significantly improved DPV and GMV.
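As a rough illustration of what on-device re-ranking can mean, the sketch below re-orders a candidate list using the categories the user has just clicked. The scoring rule, the `boost` parameter, and the data layout are assumptions made for this example; the interview does not describe Taobao's actual models or features.

```python
from collections import Counter
from typing import List, Tuple

# (item_id, category, base_score), where base_score is assumed to come from a cloud ranker.
Item = Tuple[str, str, float]


def rerank(candidates: List[Item], recent_clicks: List[str], boost: float = 0.1) -> List[str]:
    """Re-order candidates, boosting categories the user clicked most recently."""
    clicks = Counter(recent_clicks)

    def score(item: Item) -> float:
        _, category, base = item
        # Counter returns 0 for categories that were never clicked.
        return base + boost * clicks[category]

    ranked = sorted(candidates, key=score, reverse=True)
    return [item_id for item_id, _, _ in ranked]


candidates = [("a", "shoes", 0.9), ("b", "phones", 0.8), ("c", "phones", 0.7)]
print(rerank(candidates, []))                              # no behavior yet: cloud order is kept
print(rerank(candidates, ["phones", "phones", "phones"]))  # recent interest in phones moves b and c up
```

Since the click signal is consumed on the device, the list can react to behavior immediately rather than waiting for the next request to the server.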

InfoQ: Can you briefly talk about MNN’s next steps?

Lu Xing: The essence of MNN is to run different types of models on different heterogeneous devices as efficiently as possible. There are three key areas in which we continue to evolve and explore.

1. Support for different types of models: from CV and data algorithm models to ASR and NLP algorithm models. MNN has recently been improved and upgraded in areas such as control flow and dynamic graphs, and has added support for network models such as Transformer.

2. Support for different heterogeneous devices: on the client, support ranges from CPUs (ARMv7, ARM64, ARMv8.2) to GPUs (OpenCL, Vulkan, Metal), and it keeps evolving and improving. MNN has also begun to support server-side inference on Intel x86 and NVIDIA GPUs, providing a unified, cloud-device integrated inference service. Implementing and optimizing every operator for every heterogeneous device is very costly, so we proposed an innovative geometric computing architecture that reduces the set of operators to about 20 core operators, which makes it possible to cover the various heterogeneous backends at low cost. We believe MNN is probably the inference engine in the industry with the most complete operator coverage across heterogeneous backends.

3. The most efficient execution: high performance has always been one of MNN's core advantages and is widely recognized in the industry. The optimizations include offline methods such as model compression and graph fusion, and online methods such as hand-written assembly, SIMD and parallelization, matrix algorithms, and scheduling. In addition, MNN has worked with PAI to build a cloud-integrated pipeline from training and quantization through to MNN deployment, and has added compression schemes such as sparse pruning and overflow-aware quantization.
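The second point above mentions reducing many operators to a small core set through geometric computing. One way to picture the general idea is that operators which only move data, such as transpose or concat, can all be expressed through a single strided region-copy primitive, so that only the primitive needs a tuned implementation on each backend. The sketch below is an illustration under that assumption; the function names and the region layout are hypothetical and this is not MNN's actual implementation.

```python
from typing import List, Tuple

# A region describes a 2-D block copy, with all values counted in elements:
# (src_offset, src_strides, dst_offset, dst_strides, size)
Region = Tuple[int, Tuple[int, int], int, Tuple[int, int], Tuple[int, int]]


def raster(src: List[float], dst: List[float], region: Region) -> None:
    """Copy one strided 2-D region from src into dst (the single core primitive)."""
    src_off, (ss0, ss1), dst_off, (ds0, ds1), (n0, n1) = region
    for i in range(n0):
        for j in range(n1):
            dst[dst_off + i * ds0 + j * ds1] = src[src_off + i * ss0 + j * ss1]


def transpose(src: List[float], rows: int, cols: int) -> List[float]:
    """Transpose a row-major rows x cols matrix with one raster region."""
    dst = [0.0] * (rows * cols)
    # Element (i, j) of src lands at (j, i) of dst, whose row length is `rows`.
    raster(src, dst, (0, (cols, 1), 0, (1, rows), (rows, cols)))
    return dst


def concat_cols(a: List[float], b: List[float], rows: int, cols_a: int, cols_b: int) -> List[float]:
    """Concatenate two row-major matrices along the column axis with two regions."""
    width = cols_a + cols_b
    dst = [0.0] * (rows * width)
    raster(a, dst, (0, (cols_a, 1), 0, (width, 1), (rows, cols_a)))
    raster(b, dst, (0, (cols_b, 1), cols_a, (width, 1), (rows, cols_b)))
    return dst


print(transpose([1, 2, 3, 4, 5, 6], 2, 3))         # [1, 4, 2, 5, 3, 6]
print(concat_cols([1, 2, 3, 4], [5, 6], 2, 2, 1))  # [1, 2, 5, 3, 4, 6]
```

In this sketch, adding a new backend would mean optimizing only `raster`; `transpose` and `concat_cols` are pure descriptions of regions and carry over unchanged.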

MNN will continue to evolve in these three directions. However, from the perspective of the full pipeline of an on-device intelligence application, MNN solves only a single-point problem: running algorithm models efficiently on the device. We are now moving from MNN as a single-point technology toward a systematic, productized on-device intelligence stack. As mentioned above, we are building the on-device AI R&D paradigm, using the MNN workbench to solve problems such as conversion, optimization, debugging, and release in the model deployment process, and to let algorithm development iterate independently. The MNN workbench is currently in free public beta. If you are interested, please visit our official website, www.mnn.zone, to download and try it.

InfoQ: What other technology directions do you see for the future of mobile?

Lu Xing: Technological progress is closely tied to business development. With the rapid growth of the live streaming business, multimedia technology should see further development. Personally, I pay more attention to directions related to on-device intelligence.

  • AR + 3D + on-device AI

In my opinion, combining these technologies can produce many interesting applications: AR provides scenes that blend the virtual and the real, on-device AI provides the interaction capability within AR, 3D models and AR assets supply the content, and 5G networks provide the bandwidth to deliver large resource packages. At present these technologies are not yet mature; for example, low-cost, high-quality 3D modeling is still hard to achieve. In addition, mobile phones are not the most suitable carrier for AR applications, so we can look forward to future consumer AR glasses.

  • Device-cloud collaborative intelligence

At present, the cloud does the training and the client does the inference, so the combination of device and cloud is still relatively shallow. We are also exploring on-device training and building a distributed device-cloud collaborative intelligence system to achieve personalized understanding of users, protect data privacy, and reduce cloud costs.
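The interview does not say how device-cloud collaborative training would be implemented. One well-known pattern with the same goals is federated averaging: each device trains on its own data and uploads only model weights, which the cloud averages. The sketch below uses that technique as an illustration on a toy one-parameter model; the function names, learning rate, and data are all assumptions made for this example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pairs that stay private to one device


def local_update(w: float, data: List[Sample], lr: float = 0.1, epochs: int = 20) -> float:
    """Fit y = w * x on one device's data with plain gradient descent."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w: float, devices: List[List[Sample]]) -> float:
    """Each device trains locally; only weights, not raw data, reach the cloud."""
    updates = [(local_update(global_w, data), len(data)) for data in devices]
    total = sum(n for _, n in updates)
    # The cloud averages the weights, weighted by each device's sample count.
    return sum(w * n for w, n in updates) / total


# Two simulated devices whose private data both follow y = 2x.
devices = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(5):
    w = federated_round(w, devices)
print(round(w, 3))
```

The training work happens on the devices and the raw samples never leave them, which matches the privacy and cloud-cost goals mentioned above.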