This session focuses on the current state of video conferencing and explores new technologies on the horizon. Video conferencing is not simply a codec plus network transmission application; the data support behind it and the introduction of new capabilities bring new possibilities and opportunities. For this presentation, we invited Xu Jingxi of the Network Technology Group, Tencent Meeting Product Department, to share Tencent Meeting's recent work on network quality analysis (omitted in this article), discuss the recently popular topic of light field conferencing, and explore with you possible evolution directions for video conferencing.

The following content was shared by Xu Jingxi, senior researcher in the Tencent Meeting Product Department, during the second half of the video conferencing roundtable.

I am Xu Jingxi from the Network Technology Group of the Tencent Meeting Product Department. This time I would like to discuss with you which new technologies may land in video conferencing in the near future.

Today we are going to talk about a new form of video conferencing that has attracted a lot of interest recently: light field conferencing. I hope to introduce some ideas for discussion, focusing on the core modules of a light field conference and the technologies currently used for each of them.

1 Tencent Meeting focuses on the forefront of the industry

1.1 Advanced video technology

In fact, Tencent Meeting has long followed new developments in industry and academia and has introduced a large number of cutting-edge technologies, such as dedicated YUV 4:4:4 coding for high-fidelity screen sharing, the Tencent Screen Encoder (TSE), and the popular virtual background and beautification features.

1.2 Excellent audio experience

It is worth mentioning that Tencent Meeting has invested heavily in audio. We set up a dedicated TeanSound laboratory, put substantial effort into research and development, and launched an ultra-wideband (UWB) speech codec, intelligent noise suppression, and echo detection and cancellation. We also recently opened the Tencent Meeting TeanSound module to partners, allowing them to combine it with their own hardware and provide the same high-quality sound pickup as Tencent Meeting itself.

2 Relying on Tencent Cloud, Tencent Meeting helps enterprises collaborate and transform

At present, Tencent Meeting is available in more than 100 regions around the world, enabling enterprises to hold real-time audio and video conferences with people anywhere.

3 More choices for customers

A large number of government, enterprise, and educational customers use our products, and we have supported many important meetings. We are committed to continuously improving the stability of Tencent Meeting and its resilience under extreme network conditions.

4 Enterprise operation and management platform

Tencent Meeting has a large number of enterprise users, who have higher requirements for internal meetings. They need complete management tools and a problem-locating system for IT, so that enterprise IT can easily understand overall resource consumption, view the current status of a meeting, or let enterprise support staff configure meeting details without having to join the meeting.

5 A new generation of video conferencing: light field conferencing

Tencent Meeting keeps an eye on the latest technological developments, because we know that every new technology has the potential to provide a better meeting experience for our customers. Light field conferencing is a recent hot direction, and here we share some observations and views.

When it comes to light field conferencing, the first thing that comes to mind is Google's Project Starline. The big selling point of the system is that participants feel as if they are in the same room. Most importantly, multiple local participants can each see a different side of the remote participant, depending on their viewing angle.

5.1 Display

The most interesting aspect of this technology is how remote attendees are presented locally in 3D, so we discuss this first. A number of technologies can do this. For example, attendees' avatars can be viewed through VR/AR devices such as the Valve Index or Microsoft's HoloLens. Alternatively, if you do not want to wear such a device, you can see the three-dimensional effect directly: Sony has hardware that tracks the viewer's eyes and shows the picture from the matching angle, so looking from one side shows one view and looking from the other side shows another, producing a sense of depth. If you want multiple people to view 3D at the same time, as Google Starline does, you need a light field display; a typical device is the Looking Glass 8K.
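As a rough illustration (not from the talk), the snippet below shows the selection such displays effectively perform: given the viewer's horizontal angle, pick the nearest of a fixed set of pre-rendered views. The function name and the 50-degree / 45-view numbers are assumptions, carried over from the Looking Glass example discussed next.

```python
import numpy as np

def view_index(viewer_azimuth_deg, span_deg=50.0, n_views=45):
    """Map the viewer's horizontal angle to the nearest pre-rendered view.

    An eye-tracking display (or the per-angle optics of a light field panel)
    effectively performs this selection: within the +/- span/2 viewing cone,
    each azimuth is served by one of n_views images. Angles outside the cone
    are clamped. The numbers here are illustrative assumptions.
    """
    half = span_deg / 2.0
    t = (np.clip(viewer_azimuth_deg, -half, half) + half) / span_deg  # 0..1
    return int(round(t * (n_views - 1)))

print(view_index(-25.0), view_index(0.0), view_index(25.0))  # 0 22 44
```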

5.2 Multi-angle display for multiple simultaneous viewers

What exactly does Looking Glass do? The technology looks sophisticated, but the essence is simple: each pixel provides 45 viewing angles, and each user sees the picture corresponding to their own angle. The 45 angles are actually 45 different images, fed into the device in a certain order; swinging the display, as in the lower-right corner, lets you see different sides. In effect, 45 videos from different angles are shown at the same time. Different manufacturers use different approaches: Looking Glass probably uses prisms, while others use nano-films, and the viewing angle provided also varies by manufacturer. The Looking Glass 8K can only be viewed within a horizontal angle of about 50 degrees, divided into 45 views, and moving up and down does not reveal the top of the other person's head, which is similar to Google's demonstration. We think Google is likely using hardware with similar technology, but note that Looking Glass does not yet make a display that large, so presumably a custom device is used.
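To make the "feed 45 images in a certain order" idea concrete, here is a minimal sketch in Python/NumPy of tiling 45 per-angle views into one frame before handing it to the display. The `pack_quilt` helper and the 5x9 layout are illustrative assumptions; real devices define their own tile order, count, and resolution.

```python
import numpy as np

def pack_quilt(views, cols=5, rows=9):
    """Tile per-angle views into a single frame for a light field display.

    `views` is a list of cols * rows images (H x W x 3), ordered from the
    leftmost to the rightmost viewing angle. The display (or its SDK) then
    maps each tile to the matching output angle; the exact layout expected
    is device-specific, so this grid is only an illustration.
    """
    assert len(views) == cols * rows
    h, w, c = views[0].shape
    quilt = np.zeros((rows * h, cols * w, c), dtype=views[0].dtype)
    for i, view in enumerate(views):
        r, col = divmod(i, cols)
        quilt[r * h:(r + 1) * h, col * w:(col + 1) * w] = view
    return quilt

# Example: 45 dummy 480x270 views packed into one 2400x2430 frame.
views = [np.full((270, 480, 3), i * 5, dtype=np.uint8) for i in range(45)]
frame = pack_quilt(views)  # shape (2430, 2400, 3)
```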

5.3 Acquisition

So how do we produce these 45 views of video? The simplest way is to arrange 45 cameras in a fan to shoot the person and feed the resulting 45 video streams to the hardware in real time; this already gives the crudest form of light field conferencing. There are some problems, such as how to calibrate and synchronize the cameras, and while transmitting 45 video streams is not impossible, it wastes bandwidth. Google has a paper (bottom left above) discussing how a camera array can achieve a similar result; if you are interested, see how the number of cameras can be reduced while keeping the same effect.
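As a back-of-the-envelope sketch of the "fan of 45 cameras" setup (an assumption for illustration, not Starline's actual geometry), the snippet below places the cameras on a horizontal arc and aims each one at the participant.

```python
import numpy as np

def fan_camera_rig(n_views=45, span_deg=50.0, radius_m=1.5):
    """Place n_views cameras on a horizontal arc, all aimed at the subject.

    Returns, for each camera, its centre (metres, subject at the origin) and
    the unit vector of its optical axis pointing back at the subject. The
    50-degree span mirrors the horizontal viewing cone mentioned above; the
    1.5 m radius is an assumed, illustrative value.
    """
    half = np.radians(span_deg) / 2.0
    rig = []
    for a in np.linspace(-half, half, n_views):
        centre = np.array([radius_m * np.sin(a), 0.0, radius_m * np.cos(a)])
        axis = -centre / np.linalg.norm(centre)  # look straight at the origin
        rig.append((centre, axis))
    return rig

rig = fan_camera_rig()
print(len(rig))  # 45 camera poses spanning the 50-degree arc
```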

Looking back at the promotional images for Google's Project Starline, it appears to have camera arrays at the top and bottom. Our guess is that the remote participant is essentially reconstructed from this array of multiple cameras.

5.4 Reconstruction

In fact, there are three different ways to do reconstruction.

One is the virtual human or avatar approach (such as Tencent Virtual Human, upper left). The virtual human requires collecting the overall characteristics of a person in advance and building a model; during the real-time meeting, the person's motions and expressions are then mapped onto the virtual human. The disadvantage is that a lot of pre-processing work is required, which makes it inconvenient to use.

The second is the point cloud and surface reconstruction approach (top right) that was popular around 2020 and 2021. Microsoft did a lot of research on this technology, and it is reported that members of Microsoft's Holoportation team left to start their own company, which was later acquired by Google, so Google may also use similar technology (a small geometric sketch of this step follows below).
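The geometric core of that approach can be sketched in a few lines: back-project each calibrated depth map into 3D, then fuse the clouds from several cameras into one surface. The helper below is a hypothetical illustration under standard pinhole-camera assumptions, not code from Holoportation or Starline.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (metres, H x W) into an N x 3 point cloud.

    fx, fy, cx, cy are the pinhole intrinsics of the depth camera. With
    several calibrated RGB-D cameras, clouds like this can be transformed
    into a common frame and fused into one surface, which is the essence of
    the point cloud / surface reconstruction approach described above.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth

# Example with a dummy 480x640 depth map and made-up intrinsics.
depth = np.random.uniform(0.5, 2.0, (480, 640))
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```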

The third is the relatively new deep-learning-based multi-view synthesis approach (bottom left). The idea is that the 45 views we would otherwise capture and transmit are too many: instead, a small number of views, say 12, are captured, and the remaining views are generated in some way to reconstruct all 45. The nice thing about this technique is that it is very generic. A typical work is NeRF, whose paper describes how to do this with neural radiance fields. However, using the scheme in that paper, in our test environment it takes 7 to 8 hours of retraining whenever the scene or the participants change, and more than a minute to render a single frame, which makes it hard to apply in a real-time meeting (the rendering step is sketched below). Newer papers propose schemes that improve generality, handle more general scenes, and render faster, and the related progress is worth following.
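For reference, the step that makes the original NeRF so slow to render is per-ray volume rendering over hundreds of network queries. The sketch below shows only that standard compositing step, with random inputs standing in for the network's outputs; it is an illustration of the technique, not code from any of the papers mentioned.

```python
import numpy as np

def volume_render(densities, colors, deltas):
    """Composite per-sample densities and colours along one camera ray.

    densities : (N,) non-negative sigma values at N samples along the ray
    colors    : (N, 3) RGB predicted at those samples
    deltas    : (N,) distances between consecutive samples
    Returns the rendered RGB for the ray. This is only the discrete volume
    rendering step used by NeRF-style methods; a full NeRF also needs the
    MLP that predicts sigma/colour from position and view direction, plus
    hours of per-scene training, which is what limits real-time use.
    """
    alphas = 1.0 - np.exp(-densities * deltas)                       # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance to each sample
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# Example: 64 samples along one ray, random stand-ins for network outputs.
rgb = volume_render(np.random.rand(64), np.random.rand(64, 3), np.full(64, 0.05))
```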

Those are three different ways of doing reconstruction. At present we are not sure which scheme Starline uses; it may be a combination of the three, and we welcome discussion. What is clear is that only the light field display is indispensable.

That’s all for me. Thank you.
