Text / Cai Yuan

Organizer / LiveVideoStack

Live replay:

mudu.tv/live/watch/…

Hello, my name is Cai Yuan, from the Kingsoft Cloud CDN and Video Cloud Product Center. Today I will share with you the topic of immersive video transmission.

1. Global Video Cloud Traffic Market Forecast

The chart above shows a 2019 estimate of the global video cloud traffic market. From 2017 to 2022, the overall growth of video traffic and the CDN market is very optimistic: total video CDN traffic could reach on the order of 252 exabytes by 2022. During this period, the proportion of video traffic increases year by year and is expected to exceed 80% of total traffic by 2021. With the arrival of 5G and the development of ultra-high-definition and low-latency video, immersive video services will usher in a period of explosive growth and become the main force of video development.

2. Audio and video call traffic during the epidemic

The epidemic showed that, within overall video traffic, audio and video call traffic increased by more than 200%, while mobile video traffic grew by more than 60%. The chart above shows downloads of audio and video conferencing apps during this year's epidemic, which increased more than 10-fold overall. The yellow portion is the growth of ZOOM downloads, which grew by a factor of about 29. The light blue portion is Google Hangouts Meet, which grew more than 20-fold. The growth of audio and video traffic during the epidemic was thus far higher than expected.

3. Video trends in mobile communications

In the electronics era, network bandwidth was only about 100 Kbps, and we mostly watched video on TV. In the Internet 1.0 era, we began to watch video on the PC, including on-demand video; the popular applications at the time were Youku, Tudou, and Ku6, the iconic video applications and websites of the 3G era. In the mobile Internet era, catalyzed by 4G networks, short-video applications such as Douyin (TikTok) exploded, and live streaming quickly became very popular, including entertainment and show scenes, with representative apps such as Inke, Huajiao, Momo, and Huoshan; video communication apps such as WeChat and ZOOM also took off. The boom in online education during the epidemic is likewise a product of the Internet 2.0 era, whose bandwidth is around 100 Mbps. In the next stage, the industrial Internet, bandwidth will increase greatly with the emergence of 5G, and networks will be upgraded from 100 Mbps to 1 Gbps. In this context, we can predict explosive growth in medical scenarios, surveillance, distance learning, online classrooms, and VR/AR cloud gaming.

4. Scene form

4.1 Real-time online education

The picture above shows online education in a real-time scenario and how augmented reality can make learning more realistic and immersive. The dinosaur scene in the picture is very immersive for students, and the experience feels very real. Mixed-reality operations can likewise make skilled workers' training more realistic.

4.2 Highly interactive online entertainment

VR games have a strong sense of interactivity and immersion, and their body and action recognition can greatly increase the fun of the whole game. We can watch an NBA game or a World Cup match through VR live broadcast with VR glasses; this virtual reality makes people feel as if they are there.

4.3 Immersive online office

For virtual immersive online offices, ZOOM, Tencent Meeting, and DingTalk are popular applications that continue to optimize the immersive office experience, improving collaboration in telecommuting through ultra-HD, immersion, and low latency.

According to IDC's market assessment, the immersive video cloud market is growing rapidly, with a compound annual growth rate of more than 66% over the five-year period from 2019 to 2023. By 2023, the installed base of VR and AR devices will exceed 60 million, reflecting the explosive growth of these devices. With this hardware foundation, developers will have much more room for imagination, and future VR and AR applications and their creative uses will see great growth.

5. The challenges of immersive video

5.1 What are the challenges?

The human eye places demanding requirements on immersive video. For truly immersive viewing, video needs to reach roughly 50K resolution, 120 fps, and 20-bit color depth, in terms of resolution, frame rate, and color gamut. The online video we see today falls far short of these requirements: it is mostly 720p, 30 fps, and 8-bit. Meeting the human eye's requirements for immersive video therefore poses great challenges for video codecs and transmission.
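To get a feel for the gap, here is some illustrative arithmetic (not from the talk) comparing raw, uncompressed bit rates. It assumes 4:2:0 chroma subsampling, and uses an 8K/120fps/10-bit frame as a concrete stand-in for the immersive target, since 50K/20-bit displays do not yet exist in practice:

```python
def raw_bitrate_bps(width, height, fps, bit_depth):
    """Uncompressed bit rate for 4:2:0 video: 1.5 samples per pixel."""
    bits_per_pixel = 1.5 * bit_depth
    return width * height * fps * bits_per_pixel

# Typical online video today: 720p, 30 fps, 8 bit.
today = raw_bitrate_bps(1280, 720, 30, 8)

# A stand-in immersive target (assumed 8K, 120 fps, 10 bit).
target = raw_bitrate_bps(7680, 4320, 120, 10)

print(f"720p30 8bit raw: {today / 1e6:.0f} Mbps")   # ~332 Mbps
print(f"8K120 10bit raw: {target / 1e9:.1f} Gbps")  # ~59.7 Gbps
print(f"ratio: ~{target / today:.0f}x")
```

Even before reaching the talk's 50K/120fps/20-bit figures, the raw data rate grows by two orders of magnitude, which is why codec efficiency and transmission are the central challenges.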

5.2 Ways to cope with challenges

Kingsoft Cloud's product layout focuses on three aspects: interactivity, to improve the interactive capability of video; ultra-high resolution; and VR technology. These three technologies are built on Kingsoft Cloud's IaaS platform, which includes a large-scale cloud computing platform, CDN traffic distribution, and underlying technologies such as RTC plus edge computing. On top of the IaaS technology, we build a complete PaaS platform for developers, including an edge computing platform and edge computing plus RTC audio and video transmission capability, which is our output for low-latency interaction. Beyond open codec technology and 8K, 10-bit coding capabilities, we also provide a platform including picture quality evaluation and Magic Mirror, designed specifically for ultra-high-definition products for developers and customers. In VR, we focus on low-delay VR codecs, VR transmission, and gesture and body recognition with the aid of AI, to provide a complete VR solution. That is Kingsoft Cloud's product layout.

6. Key technical points of Kingsoft Cloud Video Cloud

Kingsoft Cloud Video Cloud focuses on the three technical directions above. The first is low delay: with the support of RTC and edge computing software and platforms, the delay of encoding, decoding, and transmission can be kept within the millisecond level. The second is codec technology, our core: with codec support, we provide ultra-HD capability focused on 8K and 10 bit, bringing customers as much bit-rate saving as possible. The third is immersive technology: with the support of AR, VR, and AI, it integrates video capability, ultra-low-delay coding and transmission schemes, and the corresponding AI capabilities. All of Kingsoft Cloud's core technical capabilities are provided externally through the PaaS platform; these three major technical points are delivered to customers through Kingsoft Cloud's immersive platform and PaaS to support innovative development and applications on top of the underlying capabilities.

6.1 RTC+ Edge computing technology brings low latency

How do we bring low-latency audio and video transmission through RTC plus edge computing? With more than 1,000 data center points worldwide, Kingsoft Cloud edge computing can provide low-latency access of less than 15 ms, good cross-network routing and scheduling, and efficient data flow for audio and video transmission. Kingsoft Cloud's edge computing has very strong computing power and widely distributed points of presence. Through its RTC software, Kingsoft Cloud also provides audio and video calls with up to 100 concurrent participants and rich SDK support across terminals. With strong audio and video processing and FEC weak-network technology, it can support a rich set of audio and video scenarios. At present, Kingsoft Cloud focuses on pan-entertainment scenes such as co-streaming (mic-linking) and online audio and video education. In the future, the combination of RTC and edge computing power can be well applied to our immersive low-latency scenarios.
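The scheduling idea above can be sketched minimally: among candidate edge nodes, pick the one with the lowest measured RTT, subject to the sub-15 ms access target the talk mentions. Node names and RTT values here are made up for illustration; this is not Kingsoft Cloud's actual scheduler.

```python
def pick_edge(nodes: dict, access_target_ms: float = 15.0):
    """nodes: {name: rtt_ms}. Return (name, rtt) of the lowest-RTT node
    meeting the access-latency target, or None if no node qualifies."""
    ok = {name: rtt for name, rtt in nodes.items() if rtt <= access_target_ms}
    if not ok:
        return None
    best = min(ok, key=ok.get)
    return best, ok[best]

# Hypothetical probe results for three edge nodes.
print(pick_edge({"edge-a": 9.2, "edge-b": 13.5, "edge-c": 31.0}))
# -> ('edge-a', 9.2)
```

A real scheduler would also weigh load, cross-network routes, and link stability, but the RTT gate captures the access-latency guarantee described above.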

6.2 Intelligent UHD coding scheme

First, Kingsoft Cloud's codec technology has more than five years of accumulation; its compression can save customers more than 60% or even 80% of bandwidth, a level of capability that leads the industry. Second, through our algorithms, including image and coding algorithms, the picture is processed hierarchically and quality is enhanced where attention is focused. Third is AI technology: through scene prediction analysis and picture quality analysis, we can provide the optimal coding solution for videos of different scenes, achieving the optimal coding scheme for users.

6.3 Kingsoft Cloud deep coding technology

In terms of coding standards, Kingsoft Cloud supports H.264, H.265, the domestic AVS2 standard, and the latest fourth-generation standard AV1, all of which have been commercialized on the platform. We are also a core member of the Alliance for Open Media (AOM).

Each codec involves its own intellectual property. In 2019, Kingsoft Cloud filed more than 50 patents.

In terms of performance, our compression rate is much higher than that of open-source encoders, putting us in a leading position in the industry. In terms of cost performance, our coding optimization efficiency is at the top among cloud vendors.

We also support the full link. Full link means supporting transcoding in the cloud, codecs on mobile, and codecs on the web. Kingsoft Cloud therefore supports playback across the entire link, in the cloud, on mobile, on PC, and on the web, together with cloud coding capability.

6.4 Progress of AV1 codec

Currently, AV1 encoding supports 4K and 8K ultra-HD coding as well as 100 fps, 10-bit coding. 10-bit video shooting and applications are already supported on the iPhone 12 and Xiaomi Mi 10, and more applications will follow. Kingsoft Cloud laid out AV1 in advance and already supports 10-bit codecs in the cloud.

Above is a video demonstrating the AV1 codec. The original clip is 6.37 MB; under H.265 encoding it can be compressed to 1.59 MB, saving about 75% in bit rate. AV1 can compress it further to about 800 KB, a bit-rate saving of more than 85%. Reducing the bit rate by more than 80% is a great support for video transmission; both transmission quality and transmission delay improve noticeably.
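The savings figures quoted above follow from simple arithmetic on the clip sizes given in the talk (6.37 MB original, 1.59 MB under H.265, roughly 0.8 MB under AV1):

```python
def saving_pct(original_mb: float, compressed_mb: float) -> int:
    """Percentage of bit rate saved relative to the original size."""
    return round((1 - compressed_mb / original_mb) * 100)

print(saving_pct(6.37, 1.59))  # H.265: 75
print(saving_pct(6.37, 0.8))   # AV1:   87
```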

6.5 Ultra clear picture quality solution

Above is a solution combining AI with ultra-HD codecs. There are four major improvements AI can bring. The first is scene recognition: AI can identify the video scene, match different scenes such as sports, shows, and games, and use a different coding template for each to achieve optimal coding. The second is content segmentation: through ROI segmentation, AI can identify the areas the human eye focuses on in the video, such as lips; these key areas are enhanced so that the subjective effect is more excellent, the subject more prominent, and the background purer. The third is quality analysis: using neural networks, Kingsoft Cloud supports multiple types of quality evaluation, including the KPA video perceptual quality evaluation system and VMAF analysis. These analyses judge the quality of different videos; for high-definition video, more aggressive coding parameters can be applied, while low-definition video may first be repaired to make the overall effect and quality better. The fourth is perceptual coding: we detect the areas the human eye pays the most attention to, for example edges, and allocate more bit rate to those areas for coding. Combining these four blocks through AI codec capability makes coding more efficient and bit-rate allocation more reasonable, improving overall picture quality while reducing the transmission bit rate.
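The quality-analysis step can be sketched as a simple decision rule: map a no-reference perceptual quality score (a KPA/VMAF-style number, here just 0-100) to an encoding choice. The thresholds and CRF values below are illustrative assumptions, not the talk's actual parameters.

```python
def pick_encoding(quality_score: float) -> dict:
    """Map a perceptual quality score to (assumed) encoding settings."""
    if quality_score >= 80:
        # Source is already high definition: encode more aggressively.
        return {"crf": 30, "enhance": False}
    elif quality_score >= 50:
        return {"crf": 26, "enhance": False}
    else:
        # Low-definition source: repair/enhance first, encode conservatively.
        return {"crf": 23, "enhance": True}

print(pick_encoding(90))  # {'crf': 30, 'enhance': False}
print(pick_encoding(40))  # {'crf': 23, 'enhance': True}
```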

6.6 VR Block Coding

This part shares some of our core technology investment in VR. The first point is the difference between H.264 on one hand and H.265 and AV1 on the other. H.264 only supports Slice partitioning, which is horizontal. H.265 and AV1 support Tile partitioning, meaning both horizontal and vertical partitioning, which naturally supports block coding, and that is very important in VR. Through blocking, VR video can be segmented into tiles. Without block rendering, decoding and rendering the whole video places very high demands on the headset's computing power; device decoding temperatures may reach 60 degrees Celsius, unacceptable for something worn on the head, leaving the content watchable only on TV. With Tile coding, however, only the video blocks within the field of view need to be decoded, which greatly reduces the decoding compute required of the headset and makes HD possible. Another advantage of Tile is that the transmitted data is much smaller, saving more than 75% of the bit rate and greatly reducing the overall cost of transmission. At the same time, the Tile solution poses challenges. The first is head-turn delay: because Tile encoding transfers only part of the video, the turning delay must be controlled within the range the human eye can accept, which imposes stricter requirements on edge computing, network transmission, and processing. Second, it also brings greater challenges for AI and image processing: traditional image processing operates on the whole picture, whereas after Tile transmission and partitioning it must handle cut and segmented content. Through block coding, post-block video processing capability, and edge computing, Kingsoft Cloud handles the challenges of Tile coding well. Through the "cloud, edge, and end" chain, head-turn delay can be reduced to a range acceptable to the human eye, while image processing is well enhanced on top of Tile coding.
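The decode-only-what-is-visible idea can be sketched as follows: split the 360-degree equirectangular panorama into vertical tile columns and select only those overlapping the current viewport. The tile count and FOV width are illustrative assumptions.

```python
def visible_tiles(yaw_deg: float, fov_deg: float = 90.0, n_tiles: int = 12):
    """Return indices of tile columns (each 360/n_tiles degrees wide)
    that intersect a viewport centered at yaw_deg."""
    tile_w = 360.0 / n_tiles
    half = fov_deg / 2.0
    lo, hi = yaw_deg - half, yaw_deg + half
    tiles = set()
    for i in range(n_tiles):
        t_lo, t_hi = i * tile_w, (i + 1) * tile_w
        # Check overlap, accounting for wrap-around at 360 degrees.
        for shift in (-360.0, 0.0, 360.0):
            if t_lo + shift < hi and t_hi + shift > lo:
                tiles.add(i)
    return sorted(tiles)

# A 90-degree viewport at yaw 0 needs only 4 of 12 tile columns,
# roughly the "decode only the blocks in view" saving described above.
print(visible_tiles(0.0))  # [0, 1, 10, 11]
```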

6.7 Immersive FOV (field of view)

Tile coding exists to better meet the requirements of the FOV. The FOV is the range seen by the human eye, about 90 degrees. Immersive video is a 360-degree view, but the area the human eye pays most attention to is between 90 and 120 degrees. Within the FOV, we can transmit and display HD video; when the head turns, the newly visible area must switch from low resolution to high resolution while the old area switches from high to low, and this video switching needs to be kept within 30 to 60 milliseconds. To achieve this, we need Tile coding on one hand and edge computing, network transmission, and coding support on the other, so that the video transmission bit rate is much smaller and the transmission delay is greatly reduced, controllable within 60 milliseconds. In addition to transmission, headset integration and adaptation are required, so the whole link needs the three ends of "cloud, edge, and end" to work together to achieve low head-turn delay. Kingsoft Cloud has invested in FOV for some time and will provide the platform and solution to customers and open it to developers for further use. That concludes the technology sharing.
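The 30-60 ms head-turn constraint can be viewed as a budget across the stages involved. The stage values below are assumptions for illustration; only the overall 60 ms bound comes from the talk.

```python
# Hypothetical motion-to-high-quality budget for a head-turn switch.
TURN_BUDGET_MS = {
    "head_pose_detect":  5,   # sensor sampling + pose update (assumed)
    "tile_request":     10,   # request newly visible FOV tiles (assumed)
    "edge_response":    20,   # edge node returns HD tiles (assumed)
    "decode_new_tiles": 15,   # decode only the FOV tiles (assumed)
    "render_switch":     5,   # swap low-res for high-res (assumed)
}

total = sum(TURN_BUDGET_MS.values())
assert total <= 60, "budget must stay within the talk's 60 ms bound"
print(f"head-turn switch: {total} ms")  # 55 ms
```

Every stage must be tight: if the edge round-trip alone drifts past a few tens of milliseconds, the bound is broken, which is why the talk insists on "cloud, edge, and end" working together.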

7. The vision

Kingsoft Cloud is investing in the three major directions shown above. The first is interactive interconnection: through interactive video, Kingsoft Cloud can become more intelligent and provide lower-delay, higher-definition voice and video. The second is ultra-HD capability, which provides a higher-quality picture experience, greater compression, and better coding efficiency, along with more cost-effective coding and video media processing services. The third is immersive technology, which provides the whole immersive stack to the open platform through block coding, low-latency FOV viewing, and AI-based interactive recognition. Our vision is to boost the development of high-definition applications in the 5G era through a complete immersive video platform.