
Founded in 2019, Pano Cloud is the first real-time interactive communication cloud service provider in China with a video conferencing background. Pano Cloud has gathered a large number of senior technical experts focused on audio, video, whiteboard, networking, AI and other fields. Over the past two years, Pano Cloud has been committed to helping users achieve high-definition, stable, easy-to-use, low-latency real-time interaction. With the development of 5G and AI technology, as well as the impact of the global pandemic, the application scenarios for audio and video are becoming increasingly diverse. How can Pano Cloud provide users with complete solutions and a better product experience? LiveVideoStack recently interviewed Zhang Qi, Chief Scientist and Partner of Pano Cloud, who shares his understanding and outlook on audio and video technology from the perspectives of products, technical challenges, strategy and AI enablement. Mr. Zhang is also a speaker at LiveVideoStackCon Beijing, where he will give a talk at the conference.

Speaker: Zhang Qi, Chief Scientist & Partner of Pano Cloud. He holds a master's degree in mathematics from Zhejiang University, has 20 years of experience in video development, including 8 years as an audio and video engine architect for WebEx, and is an author of OpenH264. He has worked at Hongsoft, WebEx, NetEase and other companies, is proficient in video algorithms as well as audio and video engineering, and has led the audio and video engine architecture design at several companies. With a deep understanding of artificial intelligence and its applications in real-time communications, he has experience running services that carry hundreds of billions of minutes of audio and video calls per year.

LiveVideoStack: Hello, Mr. Zhang, I’m very glad to have you for an interview. Could you tell us about your work at Pano Cloud?

Zhang Qi: At Pano Cloud, I am mainly responsible for audio and video development, including research on codecs, the audio and video engine, and cutting-edge technologies. In addition to development and management, I am also involved in customer contact. Since what we provide is a to-business (ToB) enterprise service, technology, product and service are all very important, so we pay great attention to technical support and user feedback.

LiveVideoStack: We have seen that Pano Cloud recently launched the industry’s first “online art audio and video solution”, which includes a video correction technology. Could you introduce this technology and the algorithm behind it?

Zhang Qi: In online art teaching, the video content is mostly the canvas. To present the painting faithfully and preserve its spatial proportions, the requirement on the shooting angle is very strict: the camera has to be aimed straight at the center of the canvas, which is actually quite difficult in practice. Even a slight deviation changes the geometric relationships within the work. To make things easier for our users, we let them place the camera freely and correct the video after shooting. This requires estimating the camera position and angle in real time, solving for the geometric transformation matrix, and then processing the video accordingly. To reduce the delay caused by this large amount of computation, we also carried out GPU optimization, bringing the whole operation down to about 1 millisecond so that the user experience stays smooth.
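For readers unfamiliar with this kind of correction, the sketch below illustrates the general idea with OpenCV: given an estimate of where the canvas corners sit in the frame (assumed here to come from some detector), a 3×3 homography maps the tilted quadrilateral back to an upright rectangle. This is only a minimal illustration of the technique, not Pano Cloud's implementation; the function name, corner values and output size are hypothetical.

```python
# Minimal sketch of keystone (perspective) correction for a canvas shot at an angle.
# Corner detection and real-time camera parameter estimation are assumed to be given.
import cv2
import numpy as np

def correct_canvas(frame, canvas_corners, out_w=1280, out_h=960):
    """Warp the quadrilateral `canvas_corners` (TL, TR, BR, BL in pixels)
    onto an upright out_w x out_h rectangle."""
    src = np.float32(canvas_corners)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    H = cv2.getPerspectiveTransform(src, dst)      # 3x3 homography
    return cv2.warpPerspective(frame, H, (out_w, out_h))

# Example with hypothetical corner estimates for one frame:
# frame = cv2.imread("frame.png")
# corners = [(212, 148), (1098, 180), (1120, 870), (190, 905)]
# upright = correct_canvas(frame, corners)
```

In a real-time pipeline the per-frame warp would typically be moved to the GPU (for example, OpenCV's CUDA module offers a `warpPerspective` variant), which is consistent with the roughly 1 ms budget mentioned above.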

LiveVideoStack: On the “highway” of the Internet, the most common problem is network congestion, and the resulting packet loss, delay and jitter greatly reduce audio and video quality. How does Pano Cloud deal with network congestion?

Zhang Qi: Pano Cloud's main means of coping with network congestion are bandwidth prediction, dynamic bitrate adjustment, and adaptive FEC, ARQ, PLC and related technologies, which keep calls smooth even in extreme scenarios. At the same time, three loss recovery strategies are used against congestion: forward error correction, packet retransmission and packet loss concealment. Beyond weak-network resilience, we have also built Pano Backbone, a global real-time transmission acceleration network, which handles cross-region and cross-border links, reduces the probability of congestion, and guarantees audio and video communication quality.
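To give a flavour of what dynamic bitrate adjustment looks like, here is a deliberately simplified, loss-based controller sketch. The thresholds and step sizes are illustrative assumptions only, not Pano Cloud's actual algorithm, which the interview does not detail.

```python
# A minimal sketch of loss-based send-bitrate adjustment, in the spirit of the
# bandwidth prediction + dynamic bitrate adjustment described above.

def adjust_bitrate(current_bps, loss_rate, min_bps=100_000, max_bps=2_500_000):
    """Return a new target bitrate given the observed packet loss rate (0..1)."""
    if loss_rate > 0.10:
        # Heavy loss: back off multiplicatively to relieve congestion.
        target = current_bps * (1 - 0.5 * loss_rate)
    elif loss_rate < 0.02:
        # Link looks clean: probe upward gently.
        target = current_bps * 1.05
    else:
        # Moderate loss: hold steady and let FEC/ARQ absorb it.
        target = current_bps
    return int(max(min_bps, min(max_bps, target)))

# Example: a 1 Mbps stream seeing 15% loss is throttled to ~925 kbps.
# print(adjust_bitrate(1_000_000, 0.15))
```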

LiveVideoStack: You mentioned earlier that every weak-network countermeasure comes at a cost and can be seen as a trade-off. What is the cost here?

Zhang Qi: The cost here refers to the degradation in other aspects of performance that these countermeasures bring. For example, packet loss in transmission is a random event, and there is no way to know, at the moment data is sent, whether it will be lost along the way. Forward error correction coding can resist packet loss, but its protection of packets is blind, which objectively reduces transmission efficiency. From this point of view, we cannot judge a system's weak-network resilience from a single dimension; it has to be evaluated comprehensively.
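As a concrete, back-of-the-envelope illustration of that cost, consider an idealised erasure code that protects blocks of 10 media packets with 2 parity packets under 5% independent random loss. The numbers are illustrative only, not measurements from Pano Cloud's system.

```python
# Illustration of the FEC trade-off: parity packets lower the residual loss seen
# by the decoder, but they consume bandwidth whether or not anything was lost.
from math import comb

def residual_loss(k, r, p):
    """Probability a (k data + r parity) block cannot be fully recovered,
    assuming independent loss with probability p and an idealised erasure code
    that can repair any r losses per block."""
    n = k + r
    # The block fails if more than r of its n packets are lost.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(r + 1, n + 1))

k, r, p = 10, 2, 0.05                         # 20% redundancy against 5% random loss
print(f"goodput share : {k / (k + r):.2f}")   # 0.83 -> ~17% of bandwidth spent on parity
print(f"residual loss : {residual_loss(k, r, p):.4f}")   # ~0.02 of blocks still unrecoverable
```

The point is exactly the blindness mentioned above: the ~17% of bandwidth spent on parity is paid on every block, even the ones that would have arrived intact.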

LiveVideoStack: What other big changes do you think AI technology will bring to the RTC industry in the future?

Zhang Qi: The impact of AI technology on the entire RTC industry is bound to be very profound. Objectively speaking, the development and maturation of AI technology, represented by deep learning, provides another promising direction for breakthroughs in key audio and video technologies. Some problems that cannot be solved with traditional techniques can be greatly alleviated by integrating AI. With its help, breakthroughs have been made both in audio and video codecs, the core of the RTC industry, and in other audio and video processing and enhancement technologies, such as the currently popular audio noise reduction, video super-resolution, and object segmentation and recognition.

LiveVideoStack: You graduated from Zhejiang University with a major in mathematics. What led you onto the path of audio and video?

Zhang Qi: My master’s research was in digital image processing. During those three years of study I benefited greatly from the guidance of my supervisor, Ye Maodong; unfortunately, my thinking was not mature enough at the time to appreciate the deeper mathematical insight he was trying to pass on. During graduate school I also took part in several research projects led by Professors Liu Jilin and Wang Hsing-kuo of the Department of Information and Communication Engineering, whose team was one of the earliest in China to work on video codec technology. Technically, I was influenced most by Dr. Chen Guobin. You could say they were the people who set me on the path of audio and video development.

LiveVideoStack: Having been in the audio and video industry for so many years, can you predict where the next trend in audio and video technology will be?

Zhang Qi: Deep learning, virtual reality, 3D video and so on. I have already talked about deep learning, so I won’t repeat it here. Virtual reality and 3D video will depend on the maturity of the related hardware, and I believe that day is not far away.

LiveVideoStack: I understand that you are an avid reader. If you were asked to recommend one technical book on audio and video and one book outside the industry, which two would you recommend?

Zhang Qi: For the first book, I would recommend _Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level_. It was written by Randall Hyde, whose other best-known book is _The Art of Assembly Language_. The reason I am not recommending _The Art of Assembly Language_ is that most readers will never have the opportunity to write assembly code. But not having to write assembly does not mean you should have no knowledge of it. Code execution efficiency is a very important metric in audio and video development, yet in practice I have found that a considerable proportion of engineers pay little attention to it; perhaps they lack the knowledge or the intuition. This is where the book comes in handy: it helps you look at high-level languages from an assembly perspective and understand how software runs on the CPU, so that you can write efficient and elegant code.

Write Great Code Volume 2: Thinking Low-Level, Writing High-Level

The second book I’d like to recommend is _The Spirit of the Middle: An Autobiography of Wu Qingyuan_. Wu Qingyuan (known in Japan as Go Seigen) is a famous figure in Go circles, known as the “Showa Go Sage”. He defeated all of Japan’s top players, forcing each of them down in handicap terms, and remained the dominant player in Japan for more than two decades. Yet for such a great man, his material life was quite austere. In contrast, his spiritual world was extraordinarily rich, with nothing in it but Go. Jin Yong once said that the person he admired most in ancient times was Fan Li, and in modern times, Wu Qingyuan. This book is the best portrayal of Wu Qingyuan’s spiritual world.

The Spirit of the Middle: An Autobiography of Wu Qingyuan

LiveVideoStack: You will be speaking at LiveVideoStackCon in Beijing in September. What are you looking forward to bringing to the conference?

Zhang Qi: I will share the design practice behind Pano Cloud’s video encoder and some of the optimization strategies for bringing real-time video systems into real application scenarios.

As we all know, real-time video systems have strict latency requirements, so the video encoder must meet real-time constraints. However, the rate-distortion improvements of modern encoders come at the cost of ever-increasing complexity. At the same time, the fragmentation of today’s devices is severe and their computing power varies greatly. These are the challenges that new technologies face when landing in real-time audio and video systems. So I’d like to share some of the considerations behind our real-time video encoder design, in terms of balancing complexity and real-time performance.
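To make that balancing act concrete, here is a deliberately simplified sketch of how one might map a device capability score and a per-frame latency budget to encoder settings. The tiers, thresholds and preset names are hypothetical illustrations, not Pano Cloud's actual encoder strategy.

```python
# Sketch of trading rate-distortion quality against real-time constraints on
# fragmented devices: stronger devices with slack in the latency budget get
# higher resolution and a slower (higher-quality) preset.
from dataclasses import dataclass

@dataclass
class EncoderConfig:
    resolution: tuple   # (width, height)
    fps: int
    preset: str         # abstract "slower = better rate-distortion" knob

def pick_config(device_score: float, latency_budget_ms: float) -> EncoderConfig:
    """device_score: rough benchmark normalised to 1.0 for a mid-range phone;
    latency_budget_ms: per-frame encode budget."""
    if device_score >= 2.0 and latency_budget_ms >= 20:
        return EncoderConfig((1280, 720), 30, "medium")   # spend cycles on quality
    if device_score >= 1.0:
        return EncoderConfig((960, 540), 30, "fast")
    # Low-end devices: protect real-time behaviour first.
    return EncoderConfig((640, 360), 15, "ultrafast")

# Example: a low-end device with a tight budget gets 360p15 at the fastest preset.
# print(pick_config(0.6, 10))
```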

LiveVideoStack: OK, thank you for the interview. We look forward to your speech at LiveVideoStackCon 2021 Beijing!

Editor: Alex


LiveVideoStackCon 2021 Beijing is now open for registration!
