With the development of Internet of Things (IoT) technology, real-time video analysis has been applied across many fields of the intelligent IoT. Intel has built a complete real-time video analysis solution based on GStreamer and OpenVINO to provide users with flexible and convenient real-time video analysis services. This article is compiled from a talk given on LiveVideoStack by Wu Qiujiao, a senior software engineer at Intel.




1. Background




With the development of IoT technology, real-time video analysis has been applied to many fields of the intelligent IoT, such as smart retail, smart factories, and smart surveillance. If video is the eyes of the IoT, then real-time video analysis is its brain.


At present, video analysis based on deep learning and computer vision is the most common approach.




Take object recognition, a very common IoT scenario, as an example. The figure shows a typical object-recognition flow: a video stream is captured at the front end, objects in it are detected and recognized, and the results are sent back. The flow involves many complex steps and many interface calls, including color-space conversion, scaling, inference, and decoding, all of which consume significant computing resources.


At the same time, if the environment offers multiple computing resources such as CPUs, GPUs, and VPUs, how can the codec and inference stages make full use of them to improve system performance? And how can the system scale easily and quickly when it needs to handle more stream-analysis tasks? These are the kinds of problems that real-time analysis systems run into. Next, we will introduce how OWT (Open WebRTC Toolkit) addresses the complexity, performance, and scalability problems of real-time analysis systems. The OWT real-time analysis system is built on Intel OpenVINO and the open-source GStreamer framework, so we first give a brief introduction to OpenVINO, GStreamer, and OWT.


2. Intel Vision Computing Platform




OpenVINO is a toolkit from Intel that provides all the capabilities needed to deploy trained algorithms and models.


As the figure shows, it consists mainly of two parts. The Model Optimizer converts models trained in other frameworks into a form optimized for OpenVINO, with fast conversion. The Inference Engine runs the AI workload on the device during inference; through the OpenVINO interfaces it is easy to implement the processing we need, such as pre-processing, post-processing, and feature stacking.


OpenVINO also supports many models, with more than 150 pre-trained models available for direct use. The supported models can be browsed in the Open Model Zoo (https://docs.openvinotoolkit.org/2019_R1/_docs_Pre_Trained_Models.html). In addition, OpenVINO is optimized for Intel platforms, improving the performance of deep learning workloads related to computer vision by more than 19 times and making full use of the corresponding computing resources.




In addition, OWT uses the GStreamer framework. GStreamer is a highly modular, pipeline-driven media framework that has been continuously updated since its first release in 2002. In the era of AI in particular, GStreamer is widely used thanks to its flexibility and extensibility, and a rich set of AI plug-ins provides various functions.


The GStreamer plugin gst-video-analytics has been launched to support OpenVINO; it provides plug-ins for inference, detection, classification, and other functions. See https://github.com/opencv/gst-video-analytics/wiki/Elements.

The OWT video analysis system provides a convenient interface for users to combine different GStreamer plugins to perform different analysis tasks.
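As a concrete illustration, such a combination of plugins amounts to composing a gst-launch-style pipeline description from gst-video-analytics elements. The sketch below builds one in Python; the camera URI and model file names are placeholders, and it assumes the GVA elements (gvadetect, gvaclassify, gvawatermark) are installed.

```python
# Minimal sketch: compose a gst-launch-1.0 description string from GVA elements.
# The URI and model paths are placeholders, not real deployment values.
def build_analysis_pipeline(source_uri, detect_model, classify_model):
    """Chain source, decode, inference, and display stages with ' ! '."""
    stages = [
        f"urisourcebin uri={source_uri}",
        "decodebin",
        "videoconvert",
        f"gvadetect model={detect_model}",      # object detection inference
        f"gvaclassify model={classify_model}",  # classify detected regions
        "gvawatermark",                         # draw results onto frames
        "videoconvert",
        "autovideosink",
    ]
    return " ! ".join(stages)

desc = build_analysis_pipeline(
    "rtsp://camera.example/stream",  # placeholder camera URI
    "face-detection.xml",            # placeholder OpenVINO IR model
    "age-gender.xml",
)
print(desc)
```

The resulting string is what you would pass to gst-launch-1.0 (or Gst.parse_launch) to run the analysis.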


Intel Collaboration Suite for WebRTC has evolved continuously since its first release in 2014, reaching version 4.3.1. In 2019 it was open-sourced as OWT (Open WebRTC Toolkit) and is available on GitHub at https://github.com/open-webrtc-toolkit/owt-server.




The figure shows the overall framework of OWT. OWT not only provides a very rich set of server-side functions, but also offers wide client support to ensure that all kinds of streams can be connected. On the server side, the commonly used video functions Streaming, Conferencing, Transcoding, and Analytics are available.


On the client side, JavaScript, Android, iOS, Windows, and Linux are supported, so user streams can be connected to the OWT system over different transport protocols, such as WebRTC, RTSP, RTMP, HLS, and SIP, then analyzed and transmitted.


Meanwhile, the OWT system has been evolving for a long time, and a great deal of work has been done on scalability, distributed deployment, and high availability, making the product ever more complete. Compatible with multiple Intel platforms, OWT enables real-time analysis tasks to fully utilize hardware resources, resulting in significant system performance improvements.




As for the video analysis architecture, OWT is divided into four major modules, as shown in the figure. The client's stream is connected to the system through an access node and transmitted to the analysis module, where the video is decoded, pre-processed, inferred on, and post-processed through the GStreamer pipeline. The stream is then encoded, pushed back to the distribution node, and returned through it to the client for real-time display.


The OWT system supports Intel's VCAC-A card, Movidius VPUs, and various Intel product families.




For the user, analyzing a stream through the interface is relatively simple. For example, an IP camera stream is connected to the system over the RTSP protocol. If the user wants to analyze this stream, say for face detection or some other algorithm, they only need to send a simple RESTful request from the client specifying which stream to analyze.
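To make this concrete, the sketch below shows the kind of request a client might send. The endpoint path and field names here are illustrative assumptions for this article, not the exact OWT REST schema.

```python
import json

# Hypothetical sketch of a request to start an analysis task on a stream.
# The path and body fields are assumptions, not OWT's documented schema.
def start_analysis_request(room_id, stream_id, algorithm_id):
    path = f"/v1/rooms/{room_id}/analytics"  # assumed endpoint
    body = {
        "algorithm": algorithm_id,                # which pipeline to run
        "media": {"video": {"from": stream_id}},  # which stream to analyze
    }
    return path, json.dumps(body)

path, body = start_analysis_request("room1", "rtsp-cam-42", "face_detect_01")
print(path)
print(body)
```

A client would POST this body to the Management API, which then drives the session flow described below.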


After receiving the message, the Management API sends it to the Conference Agent for session management, which notifies and initializes the analysis module. After initialization, session control returns and tells the IP camera's streaming access node to establish a connection with the Analytics node. The Streaming Agent then sends the stream to the Analytics Agent, where the various algorithms run. If the user wants to see the analysis result in a browser, the analyzed stream can be propagated to the WebRTC node and transmitted to the user waiting in the browser.


If you want to record a stream, you can forward the analyzed stream to the Recording Agent and record it. In addition, if users want to upload the analysis results to the cloud, they can import a plug-in for the corresponding operations.


Meanwhile, as you can see, the analysis process works in conjunction with the entire OWT solution. OWT provides many control APIs, covering analysis, video recording, SIP calls, and access from different protocols. If you want to push the analyzed stream to an RTMP server, the solution also has a streaming-out interface. There are also operations for mixing streams, pausing/resuming streams, and so on.




After a stream is sent from another node to the Analytics node, it is fed into the GStreamer pipeline, where a series of operations is performed. For example, the stream is parsed as H.264 and decoded; a videorate element drops frames as required by the scenario before handing frames to the inference module for detection, classification, and other processing; the result is then encoded and delivered to the output node through an appsink.


The figure deliberately does not fix which API is used for decoding; users can choose for themselves. The GStreamer pipeline offers many decoding interfaces, such as CPU or GPU decoding, which greatly improves decoding efficiency. For inference, there are plug-ins such as detection and classification provided by gst-video-analytics on GitHub; inference can run on a CPU or VPU, making better use of system resources.


Users building the pipeline can customize it, choosing to remove some steps or recombine the stages.
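The stages described above (parse, decode, videorate, inference, encode, appsink) and the ability to drop or swap steps can be sketched as follows. Element and model names are typical GStreamer/GVA placeholders, not necessarily the exact elements OWT wires up.

```python
# Sketch of the internal analysis pipeline with customizable stages.
# Element names are common GStreamer/GVA choices used as placeholders.
def build_internal_pipeline(decoder="avdec_h264", classify=True):
    stages = [
        "appsrc name=src",             # stream handed over by the access node
        "h264parse",
        decoder,                       # e.g. swap in vaapih264dec for GPU decode
        "videorate",                   # drops frames to match the target rate
        "gvadetect model=detect.xml",  # placeholder detection model
    ]
    if classify:                       # optional stage: drop it to recombine
        stages.append("gvaclassify model=classify.xml")
    stages += ["x264enc", "appsink name=sink"]
    return " ! ".join(stages)

print(build_internal_pipeline())
print(build_internal_pipeline(decoder="vaapih264dec", classify=False))
```

Swapping the decoder argument or toggling the classify flag illustrates the "remove some steps and recombine" customization the pipeline allows.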




If there is an analysis task, how is it implemented in OWT?


Developers can use the interface provided by OWT to implement a pipeline for a specific analysis task by combining GStreamer plugins with different functions. The analysis task is compiled into a dynamic library, and the algorithm number and the library name are paired in the configuration file so that they correspond to the task. Once the configuration is complete, users specify the algorithm number and the stream to be analyzed when starting an analysis task through a RESTful request, and OWT analyzes the specified stream.


If the user wants analysis with multiple algorithms, they only need to build multiple different pipelines, compile them into different dynamic library files, and assign them different algorithm numbers in the configuration file to achieve multi-algorithm analysis in OWT.
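The algorithm-number-to-library mapping described above can be sketched as a simple lookup. The IDs and file names below are placeholders, and the real OWT configuration format may differ.

```python
# Hypothetical sketch of the configuration mapping: each algorithm number
# points at the dynamic library implementing its analysis pipeline.
ALGORITHMS = {
    "dc51138a": "lib/libface_detection_pipeline.so",        # placeholder ID
    "b849f44b": "lib/libvehicle_classification_pipeline.so",
}

def resolve_pipeline(algorithm_id):
    """Return the dynamic library for the requested analysis task."""
    try:
        return ALGORITHMS[algorithm_id]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm_id}")

print(resolve_pipeline("dc51138a"))
```

When a RESTful request names an algorithm, a lookup of this kind selects which compiled pipeline library the Analytics Agent loads.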


For the material in this section, the webrtcHacks site has a more detailed article (https://webrtchacks.com/accelerated-computer-vision-inside-a-webrtc-media-server-with-intel-owt/) that you can consult.




The OWT system has evolved for many years and does a great deal of work in resource scheduling. OWT supports task scheduling based on CPU, GPU, and VPU resource usage. If OWT is deployed on multiple machines with multiple computing resources, it collects CPU, GPU, and VPU usage on each node and allocates new analysis tasks to the appropriate nodes according to the scheduling policy. Several common scheduling policies are provided, and you can configure different policies for each module based on the actual deployment.
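One common policy of the kind described above is least-loaded scheduling. The sketch below illustrates the idea; the node layout and the equal weighting of CPU/GPU/VPU usage are assumptions, not OWT's actual scheduler.

```python
# Minimal sketch of least-loaded scheduling across heterogeneous nodes.
# Equal weighting of CPU/GPU/VPU usage is an assumption for illustration.
def pick_node(nodes):
    """Pick the node with the lowest combined CPU/GPU/VPU usage."""
    def load(node):
        usage = node["usage"]
        return usage.get("cpu", 0.0) + usage.get("gpu", 0.0) + usage.get("vpu", 0.0)
    return min(nodes, key=load)["name"]

nodes = [
    {"name": "node-a", "usage": {"cpu": 0.7, "gpu": 0.2, "vpu": 0.1}},  # load 1.0
    {"name": "node-b", "usage": {"cpu": 0.3, "gpu": 0.4, "vpu": 0.1}},  # load 0.8
]
print(pick_node(nodes))  # node-b carries the lighter load
```

A real scheduler would also weight resources by task type, for example preferring VPU-idle nodes for inference-heavy tasks.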




As for high availability, when Analytics is in the normal state, the IP camera stream is imported and recorded through the flow at the top of the figure. If a sudden failure occurs while an analysis task is in progress, Session Control detects it, finds an available node, activates it, imports the stream being analyzed to that analysis node, continues the interrupted analysis task, and then transmits the data to the Recording node for recording. This is how high availability is achieved in a real-world scenario.


3. Summary




The analysis part of the overall solution is based on deep learning and integrated with OWT. OWT uses the GStreamer pipeline to simplify the real-time video analysis process, and developers can easily combine GStreamer plugins to implement different real-time analysis tasks. Meanwhile, OWT leverages OpenVINO and its own scheduling mechanism to make full use of computing resources and improve overall performance. OWT supports distributed deployment, and each functional module can be expanded quickly and easily. In addition, the HA mechanism provided by OWT ensures the steady operation of the real-time video analysis system.


Intel has long been committed to developing Visual Cloud, computing, and other IA-based platforms. It also provides many open-source resources, such as OpenVINO and OWT, which have corresponding open-source projects on GitHub. Intel also has AI solutions on the client side, such as WebNN, which will run in the browser in the future. Stay tuned.