This article was originally shared by the iQiyi technical team under the title “The Design and Practice of Building a Universal WebSocket Push Gateway”, and has been revised and edited here.

1. Introduction

HTTP is a stateless, TCP-based request/response protocol: requests can only be initiated by the client and answered by the server. In most scenarios this request/response Pull pattern is sufficient. In some cases, however, such as message push (the most common scenario in IM, for example pushing offline messages) and real-time notifications, data must be synchronized to the client in real time, which requires the server to be able to push data proactively.

Traditional Web server push technology has a long history, evolving through stages such as short polling and long polling (see “Beginner’s Guide: The Most Complete Explanation Ever of the Principles of Web Instant Messaging Technology”). These techniques solve the problem to a certain extent, but they have drawbacks such as poor timeliness and wasted resources. The WebSocket specification that arrived with HTML5 essentially ended this situation and is now the mainstream solution for server-side message push.

Integrating WebSocket into a system is simple, and related discussions and material are plentiful. However, there is not yet a mature solution for building a general-purpose WebSocket push gateway. Cloud vendors currently focus on push for mobile platforms such as iOS and Android and offer little support for WebSocket. This article summarizes iQiyi’s experience building a real-time push gateway on Netty-based WebSocket long connections.


(This article is also published at: …)

2. Series contents

This article is the fourth in a series. The contents are as follows:

“Long Connection Gateway Technology Topic (1): A Summary of Production-Grade TCP Gateway Practice at JD Jingmai”

“Long Connection Gateway Technology Topic (2): Zhihu’s High-Performance Long Connection Gateway Practice at Tens of Millions of Concurrent Connections”

“Long Connection Gateway Technology Topic (3): The Technical Evolution of Mobile Taobao’s Billion-Scale Mobile Access Layer Gateway”

“Long Connection Gateway Technology Topic (4): iQiyi WebSocket Real-Time Push Gateway Technology Practice” (this article)

Other related technical articles:

“Absolute Essentials: Key Technical Points for Building a Massive-Access Push Service on Netty”

“Sharing WebSocket Application Practice Based on Netty”

Other articles shared by the iQiyi technical team:

“iQiyi Technology Sharing: A Light and Witty Look at the Past, Present, and Future of Video Codec Technology”

“iQiyi Technology Sharing: A Summary of iQiyi Android Client Startup Speed Optimization”

“iQiyi Mobile Network Optimization Practice: Optimizing the Network Request Success Rate”

3. Technical pain points of the old scheme

The iQiyi creator (we-media) platform is an important part of our content ecosystem. As a front-end system, it places high demands on user experience, which directly affects creators’ enthusiasm.

At present, iQiyi uses WebSocket real-time push in several business scenarios, including:

  • 1) User comments: comments are pushed to the browser in real time;
  • 2) Real-name authentication: before signing a contract, the user must complete real-name authentication. The user scans a QR code and is taken to a third-party authentication page; once authentication is complete, the browser is notified of the result asynchronously;
  • 3) Liveness detection: similar to real-name authentication, once liveness detection is complete, the result is sent to the browser asynchronously.

In actual business development, however, we found several problems with how WebSocket real-time push was being used:

  • 1) First, the WebSocket technology stack is not unified: some implementations are based on Netty and others on the Web container, which makes development and maintenance difficult;
  • 2) Second, WebSocket implementations are scattered across projects and tightly coupled to business systems. Any other business that needs WebSocket faces repeated development, wasted cost, and low efficiency;
  • 3) Third, WebSocket is a stateful protocol. A client connects to only one node in the cluster and communicates with that node for the whole data exchange, so a WebSocket cluster must solve the problem of session sharing. A single-node deployment avoids this problem, but it is a single point of failure and cannot scale horizontally to handle higher load;
  • 4) Finally, there is no monitoring or alerting. The number of WebSocket long connections can be roughly estimated from the number of Linux socket connections, but that figure is inaccurate and says nothing about business-level metrics such as the number of users. Nor can it be integrated with the existing microservice monitoring for unified monitoring and alerting.

PS: For reasons of length, this article does not cover WebSocket itself in detail; interested readers can read “WebSocket from Beginner to Master in Half an Hour!”.

4. Technical objectives of the new scheme

As the previous section shows, solving the problems of the old scheme requires a unified WebSocket long-connection real-time push gateway.

The new gateway needs the following features:

  • 1) Centralized long-connection management and push: a unified technology stack that treats long connections as a shared foundational capability, making iteration, upgrades, and maintenance easier;
  • 2) Decoupling from business: business logic is separated from long-connection communication, so business systems no longer deal with communication details, avoiding repeated development and wasted R&D cost;
  • 3) Ease of use: an HTTP push channel makes the gateway accessible from any programming language, and a business system can push data with a single simple call, improving R&D efficiency;
  • 4) Distributed architecture: a multi-node cluster that scales horizontally to keep up with business growth, where the failure of a single node does not affect overall service availability, ensuring high reliability;
  • 5) Multi-terminal message synchronization: users may be online in several browsers or tabs at the same time, and messages are delivered to all of them;
  • 6) Multidimensional monitoring and alerting: custom metrics feed into the existing microservice monitoring system, so problems are reported promptly and service stability is ensured.
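To make feature 3 concrete, here is a minimal sketch of what a business system’s call to the push channel might look like, using the JDK’s built-in `java.net.http` client (Java 11+). The `PushClient` class, the `/api/push` path, and the JSON fields `uid` and `data` are hypothetical; the article does not publish the gateway’s actual API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/**
 * Hypothetical helper a business system could use to call the gateway's HTTP push channel.
 * The "/api/push" path and the "uid"/"data" JSON fields are illustrative assumptions.
 */
final class PushClient {
    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    private final URI endpoint;

    PushClient(String gatewayBaseUrl) {
        this.endpoint = URI.create(gatewayBaseUrl + "/api/push");
    }

    /** Builds the POST request; kept separate from sending so it can be inspected offline. */
    HttpRequest buildRequest(String uid, String data) {
        String json = "{\"uid\":\"" + escape(uid) + "\",\"data\":\"" + escape(data) + "\"}";
        return HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(3))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Sends one push request and returns the gateway's HTTP status code. */
    int push(String uid, String data) throws IOException, InterruptedException {
        HttpResponse<String> resp = http.send(buildRequest(uid, data), HttpResponse.BodyHandlers.ofString());
        return resp.statusCode();
    }

    /** Minimal JSON string escaping; a real client would use a JSON library instead. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Usage would be a single line such as `new PushClient("http://gateway-host:8080").push(uid, json)`; the business system never touches a WebSocket connection, which is the decoupling point 2 describes.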

5. Technical selection for the new scheme

Among the many WebSocket implementations, we chose Netty after weighing performance, scalability, community support, and other factors. Netty is a high-performance, event-driven, asynchronous, non-blocking network framework used by many well-known open-source projects.

If you are not familiar with Netty, the following article is a good starting point:

  • “The Most Accessible Netty Introduction Ever: Basic Introduction, Environment Setup, and Hands-On Practice”

WebSocket is stateful, so it cannot be load balanced across a cluster the way plain HTTP can. Once a long connection is established, it holds a session with one particular server node, which makes it hard to know which node a given session belongs to.

There are generally two technical approaches to this problem:

  • 1) Use a registry, as in microservices, to maintain a global mapping of sessions to nodes;
  • 2) Use event broadcast, letting each node determine whether it holds the target session.

The two approaches are compared in the following table.

WebSocket cluster solutions:

Considering implementation cost and cluster size, we chose the lightweight event broadcast scheme.

Broadcast can be implemented with RocketMQ message broadcasting, Redis Publish/Subscribe, ZooKeeper notifications, and other mechanisms; their pros and cons are compared in the following table. Weighing throughput, real-time performance, persistence, and implementation difficulty, we chose RocketMQ.

Comparison of broadcast implementation schemes:

6. Implementation of the new scheme

6.1 System architecture

The overall architecture of the gateway is as follows:

The overall process of the gateway is as follows:

1) The client handshakes with any gateway node to establish a long connection, and that node adds the connection to the long-connection queue it keeps in memory. The client sends heartbeat messages to the server periodically; if the server receives no heartbeat within the specified period, it treats the long connection as broken, closes it, and clears the session from memory.

2) When a business system needs to push data to a client, it sends the data to the gateway through the HTTP interface the gateway provides.

3) After receiving the push request, the gateway writes the message to RocketMQ.

4) Each gateway node consumes the message as a consumer in broadcast mode, so all nodes receive it.

5) On receiving the message, each node checks whether the push target is in the long-connection queue it keeps in memory. If so, it pushes the data over that long connection; otherwise, it simply ignores the message.
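Steps 3 to 5 boil down to a fan-out: every node sees every message, and only the node holding the target user’s connections acts on it. The sketch below simulates that in memory under stated assumptions. `BroadcastBus` stands in for a RocketMQ topic consumed in broadcast mode (in the RocketMQ Java client, a `DefaultMQPushConsumer` whose message model is set to `MessageModel.BROADCASTING`); `PushMessage`, `GatewayNode`, and the use of a list as a stand-in for channel writes are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** A push request as it travels through the message queue (hypothetical shape). */
final class PushMessage {
    final String uid;
    final String payload;
    PushMessage(String uid, String payload) { this.uid = uid; this.payload = payload; }
}

/** In-memory stand-in for a broadcast-mode topic: every subscriber receives every message. */
final class BroadcastBus {
    private final List<Consumer<PushMessage>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<PushMessage> node) { subscribers.add(node); }

    /** Step 3: the node that received the HTTP push request publishes it once. */
    void publish(PushMessage msg) {
        for (Consumer<PushMessage> s : subscribers) s.accept(msg); // step 4: all nodes receive it
    }
}

/** One gateway node; it only knows about connections established against itself. */
final class GatewayNode implements Consumer<PushMessage> {
    final String name;
    // uid -> payloads "written" to that user's connections here (stands in for Channel writes)
    final Map<String, List<String>> localSessions = new ConcurrentHashMap<>();

    GatewayNode(String name) { this.name = name; }

    void connect(String uid) { localSessions.putIfAbsent(uid, new CopyOnWriteArrayList<>()); }

    /** Step 5: push if the target user is connected to this node, otherwise ignore. */
    @Override
    public void accept(PushMessage msg) {
        List<String> outbox = localSessions.get(msg.uid);
        if (outbox != null) outbox.add(msg.payload);
    }
}
```

The cost of this design is visible in `publish`: every node processes every message, so broadcast traffic grows with the number of nodes. In exchange, no global session registry has to be kept consistent, which matches the lightweight choice made in section 5.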

The gateway is a multi-node cluster in which each node handles a share of the long connections, which balances the load. When faced with massive numbers of connections, the gateway can spread the pressure by adding nodes, that is, by scaling horizontally.

In addition, when a node goes down, its clients reconnect by handshaking with another node, so the service as a whole remains available.
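Both the heartbeat check in step 1 and fast failover depend on noticing dead connections promptly. Below is a minimal, framework-free sketch of the server-side idle check, assuming a string connection id and a configurable timeout (both illustrative; `HeartbeatMonitor` is a hypothetical name). In a Netty pipeline, `IdleStateHandler` is a common building block for the same job: it fires an `IdleStateEvent` when a channel has read nothing for the configured time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Tracks the last heartbeat per connection and reports connections that have been silent
 * for longer than the timeout, so the caller can close them and clear their sessions.
 */
final class HeartbeatMonitor {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    private final LongSupplier clock; // injectable so the logic can be tested without sleeping

    HeartbeatMonitor(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    /** Call when a connection opens and whenever a heartbeat frame arrives on it. */
    void onHeartbeat(String connectionId) {
        lastSeenMillis.put(connectionId, clock.getAsLong());
    }

    /** Call when a connection closes normally. */
    void forget(String connectionId) {
        lastSeenMillis.remove(connectionId);
    }

    /** Removes and returns connections whose last heartbeat is older than the timeout. */
    List<String> sweepExpired() {
        long now = clock.getAsLong();
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeenMillis.entrySet()) {
            // remove(key, value) only succeeds if no fresh heartbeat replaced the timestamp meanwhile
            if (now - e.getValue() > timeoutMillis && lastSeenMillis.remove(e.getKey(), e.getValue())) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }
}
```

In production the clock would be `System::currentTimeMillis` and `sweepExpired()` would run on a scheduled executor; the conditional `remove(key, value)` keeps a heartbeat that arrives mid-sweep from being lost.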

6.2 Session management

After a WebSocket long connection is established, its session is kept in the memory of the node that accepted it. The SessionManager component manages sessions, internally using a hash table to map each UID to a UserSession.

A UserSession represents the session of one user. Since a user may hold several long connections at once, UserSession in turn uses a hash table to map each Channel to a ChannelSession.

To prevent users from opening unlimited long connections, UserSession closes the earliest ChannelSession once the number of ChannelSessions exceeds a set limit, which caps server resource usage. The following figure shows the relationship between SessionManager, UserSession, and ChannelSession.

Our SessionManager components:
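The three-level structure described above can be sketched as follows. This is our illustration, not iQiyi’s code: a `String` channel id stands in for Netty’s `Channel`, and the per-user cap passed to `SessionManager` is a hypothetical parameter (the article only says the limit is “a certain amount”).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** One WebSocket connection; a String id stands in for Netty's Channel in this sketch. */
final class ChannelSession {
    final String channelId;
    ChannelSession(String channelId) { this.channelId = channelId; }
}

/** All connections of one user (several browsers or tabs), capped at maxChannels. */
final class UserSession {
    private final int maxChannels;
    // LinkedHashMap preserves insertion order, so the first entry is the earliest connection.
    private final LinkedHashMap<String, ChannelSession> channels = new LinkedHashMap<>();

    UserSession(int maxChannels) { this.maxChannels = maxChannels; }

    /** Adds a connection; if over the cap, removes and returns the earliest one so the caller can close it. */
    synchronized Optional<ChannelSession> add(ChannelSession cs) {
        channels.put(cs.channelId, cs);
        if (channels.size() <= maxChannels) return Optional.empty();
        Iterator<ChannelSession> it = channels.values().iterator();
        ChannelSession earliest = it.next();
        it.remove();
        return Optional.of(earliest);
    }

    synchronized void remove(String channelId) { channels.remove(channelId); }
    synchronized boolean isEmpty() { return channels.isEmpty(); }
    synchronized int size() { return channels.size(); }
    synchronized List<ChannelSession> snapshot() { return new ArrayList<>(channels.values()); }
}

/** Node-local registry mapping uid -> UserSession. */
final class SessionManager {
    private final Map<String, UserSession> users = new ConcurrentHashMap<>();
    private final int maxChannelsPerUser;

    SessionManager(int maxChannelsPerUser) { this.maxChannelsPerUser = maxChannelsPerUser; }

    /** Registers a new connection for uid; returns the connection evicted by the cap, if any. */
    Optional<ChannelSession> register(String uid, String channelId) {
        List<ChannelSession> evicted = new ArrayList<>(1);
        users.compute(uid, (u, existing) -> {
            UserSession s = (existing != null) ? existing : new UserSession(maxChannelsPerUser);
            s.add(new ChannelSession(channelId)).ifPresent(evicted::add);
            return s;
        });
        return evicted.stream().findFirst();
    }

    /** Removes a closed connection and drops the UserSession once the user has none left. */
    void unregister(String uid, String channelId) {
        users.computeIfPresent(uid, (u, s) -> {
            s.remove(channelId);
            return s.isEmpty() ? null : s;
        });
    }

    List<ChannelSession> channelsOf(String uid) {
        UserSession s = users.get(uid);
        return (s == null) ? List.of() : s.snapshot();
    }

    int userCount() { return users.size(); }

    int connectionCount() { return users.values().stream().mapToInt(UserSession::size).sum(); }
}
```

`register` hands the evicted `ChannelSession` back to the caller, which would close that connection. Doing the add and the cleanup inside `ConcurrentHashMap.compute`/`computeIfPresent` keeps registration and removal for the same user from racing, so an empty `UserSession` is never left behind or resurrected.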

6.3 Monitoring and alerting

To show how many long connections the cluster holds and how many users they belong to, the gateway provides basic monitoring and alerting.

The gateway integrates Micrometer and exposes the connection count and the user count as custom metrics for Prometheus to scrape, connecting the gateway to the existing microservice monitoring system.

In Grafana you can easily view the number of connections, the number of users, and JVM, CPU, memory, and other metrics to understand the gateway’s current capacity and load. Alert rules can also be configured in Grafana so that abnormal data triggers alerts through Qixin (the internal alerting platform).
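With Micrometer, counts like these are typically registered as gauges (for example `Gauge.builder(name, obj, fn).register(registry)`) that are read on every scrape. To show what Prometheus actually receives without pulling in the library, here is a stdlib-only stand-in that renders gauges in the Prometheus text exposition format. It is not Micrometer’s API, and `GaugeRegistry` and the metric names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

/**
 * Minimal stand-in for a Prometheus-backed registry: gauges are sampled lazily at scrape
 * time and rendered in the Prometheus text exposition format.
 */
final class GaugeRegistry {
    private static final class Gauge {
        final String help;
        final DoubleSupplier value;
        Gauge(String help, DoubleSupplier value) { this.help = help; this.value = value; }
    }

    private final Map<String, Gauge> gauges = new LinkedHashMap<>();

    /** Registers a gauge whose value is read from the supplier on every scrape. */
    synchronized void gauge(String name, String help, DoubleSupplier value) {
        gauges.put(name, new Gauge(help, value));
    }

    /** Renders every gauge with its HELP and TYPE lines, sampling each value now. */
    synchronized String scrape() {
        StringBuilder sb = new StringBuilder();
        gauges.forEach((name, g) -> {
            sb.append("# HELP ").append(name).append(' ').append(g.help).append('\n');
            sb.append("# TYPE ").append(name).append(" gauge\n");
            sb.append(name).append(' ').append(g.value.getAsDouble()).append('\n');
        });
        return sb.toString();
    }
}
```

Because values are read through a supplier at scrape time (for instance a session manager’s current connection count), the exported gauge always reflects live state rather than a stale copy.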

7. Load testing of the new scheme

Test setup:

  • 1) Two 4-core, 16 GB VMs were used, one as the server and one as the client;
  • 2) The gateway opened 20 ports, and 20 clients were started;
  • 3) Each client established 50,000 connections to one server port, for one million concurrent connections in total.

The number of connections (one million) and the memory usage are shown in the figure below:

When one message is pushed to all one million connections at once using a single sending thread on the server, sending completes in about 10 seconds on average, as shown in the figure below.

Server push time:
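The test parameters can be cross-checked with simple arithmetic. The explanation for using 20 server ports is our reading, not stated in the original: a TCP connection is identified by its (source IP, source port, destination IP, destination port) tuple, so one client IP talking to one server port can open at most as many connections as it has usable source ports, which is fewer than 65,536. Spreading the load over 20 ports keeps each port at 50,000 connections.

```latex
\begin{aligned}
\text{connections} &= 20 \text{ ports} \times 50{,}000 \ \tfrac{\text{connections}}{\text{port}} = 1{,}000{,}000 \\
\text{single-thread push rate} &\approx \frac{1{,}000{,}000 \text{ messages}}{10 \text{ s}} = 100{,}000 \ \tfrac{\text{messages}}{\text{s}} \\
\text{time per message} &\approx \frac{10 \text{ s}}{1{,}000{,}000} = 10\ \mu\text{s}
\end{aligned}
```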

In practice, a single user rarely holds more than a handful of simultaneous long connections. Taking 10 connections per user as an example, with 600 concurrent callers over 120 s, the push interface sustains 1,600+ TPS, as shown in the following figure.

Load test data (10 connections per user, 600 concurrent callers, 120 s):
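The push-interface result can be read the same way. With 10 connections per target user, each accepted request fans out into 10 WebSocket writes. The response-time line is an estimate via Little’s law and assumes the 600 concurrent callers send requests back to back with no think time, which the article does not state.

```latex
\begin{aligned}
\text{WebSocket writes} &\approx 1{,}600 \ \tfrac{\text{requests}}{\text{s}} \times 10 \ \tfrac{\text{writes}}{\text{request}} = 16{,}000 \ \tfrac{\text{writes}}{\text{s}} \\
\text{requests over the run} &\approx 1{,}600 \ \tfrac{\text{requests}}{\text{s}} \times 120 \text{ s} = 192{,}000 \\
\text{mean response time} &\approx \frac{600 \text{ requests in flight}}{1{,}600 \ \text{requests/s}} \approx 0.375 \text{ s}
\end{aligned}
```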

These figures meet the needs of our current business scenarios and leave headroom for future business growth.

8. A real-world case of the new scheme

To illustrate the benefits more concretely, we close with a real iQiyi use case of the new WebSocket gateway: adding filter effects to cover images.

When publishing a video on the iQiyi we-media (creator) platform, a creator can apply a filter effect to the cover image, which guides creators toward better covers.

When a user selects a cover image, an asynchronous background processing task is submitted. After the asynchronous task is processed, the images with different filter effects are returned to the browser through WebSocket. The following figure shows the service scenario.

In terms of R&D efficiency, integrating WebSocket directly into the business system would take at least 1-2 days of development.

Using the push capability of the new WebSocket gateway instead, data push takes only a simple interface call, which cuts development time to minutes and greatly improves R&D efficiency.

In terms of operation and maintenance costs, the business system no longer contains communication details unrelated to business logic, so the code is more maintainable, the architecture is simpler, and operation and maintenance costs drop significantly.

9. Closing remarks

WebSocket is the mainstream technology for server push. Used properly, it can effectively improve system responsiveness and the user experience. A WebSocket long-connection gateway can quickly add data push capability to a system, effectively reducing operation and maintenance costs and improving development efficiency.

The value of a long-connection gateway lies in the following:

  • 1) It encapsulates the WebSocket communication details and is decoupled from business systems, so the gateway and the business systems can be optimized and iterated independently, avoiding repeated development and simplifying development and maintenance;
  • 2) It provides an easy-to-use HTTP push channel that any programming language can call, which makes integration simple;
  • 3) It uses a distributed architecture that provides horizontal scaling, load balancing, and high availability;
  • 4) It integrates monitoring and alerting, giving early warning when the system misbehaves so that the service stays healthy and stable.

The new WebSocket long-connection real-time push gateway is now used in iQiyi business scenarios such as image filter result notifications and MCN electronic contract signing.

There is plenty left to explore in the future, such as message retransmission and acknowledgment (ACK), support for WebSocket binary data, and multi-tenancy.

Appendix: More technical information

[1] Web Instant Messaging Technology:

“Beginner’s Guide: The Most Complete Explanation Ever of the Principles of Web Instant Messaging Technology”

“Instant Messaging on the Web: Short Polling, Comet, WebSocket, SSE”

“SSE Technology Explained: A New HTML5 Server-Sent Event Technology”

“Comet Technology Explained: Real-Time Web Communication Based on HTTP Long Connections”

“Quick Start: A WebSocket Tutorial”

“WebSocket in Detail (1): A First Look at WebSocket Technology”

“WebSocket in Detail (2): Technical Principles, Code Demonstrations, and Application Cases”

“WebSocket in Detail (3): An In-Depth Look at the WebSocket Communication Protocol”

“WebSocket in Detail (4): Probing the Relationship between HTTP and WebSocket (Part 1)”

“WebSocket in Detail (5): Probing the Relationship between HTTP and WebSocket (Part 2)”

“WebSocket in Detail (6): Probing the Relationship between WebSocket and Socket”

“Practice and Ideas for Implementing Message Push with Socket.IO”

“LinkedIn’s Web IM Practice: Hundreds of Thousands of Long Connections on a Single Server”

“The Development of Web Instant Messaging Technology and WebSocket and Socket.IO in Practice”

“Web Instant Messaging Security: A Detailed Explanation of the Cross-Site WebSocket Hijacking Vulnerability (with Sample Code)”

“Open-Source Framework Pomelo in Practice: Building a High-Performance Distributed IM Chat Server for the Web”

“Using WebSocket and SSE to Implement Web Message Push”

“The Evolution of Web Communication: From Ajax and JSONP to SSE and WebSocket”

“Why Does the Network Layer of MobileIMSDK-Web Use Socket.IO Instead of Netty?”

“From Theory to Practice: Understanding WebSocket Communication Principles, Protocol Format, and Security from Scratch”

“How to Implement Long Connections with WebSocket in WeChat Mini Programs (with Complete Source Code)”

“WebSocket Protocol: Quick Answers to Hot WebSocket Questions”

“Web IM Practice Tips: How to Make Your WebSocket Reconnect Faster?”

“WebSocket from Beginner to Master in Half an Hour!”

“WebSocket Core: 200 Lines of Code”


[2] Push Technology:

“A Complete Android Push Demo Based on the MQTT Protocol”

“Android Message Push: The Pros and Cons of GCM, XMPP, and MQTT”

“Analysis of Real-Time Message Push Technology on Mobile Terminals”

“Absolute Essentials: Key Technical Points for Building a Massive-Access Push Service on Netty”

“Technical Practice of the Large-Scale, High-Concurrency Architecture of the Aurora Push System”

“Technical Practice of Meizu’s Real-Time Message Push Architecture with 25 Million Long Connections”

“Interview with a Meizu Architect: Experience with a Real-Time Message Push System with Massive Long Connections”

“Message Push Practice for Hybrid Mobile Applications Based on WebSocket (with Code Examples)”

“A Secure and Scalable Subscription/Push Service Based on Persistent Connections”

“Practice Sharing: How to Build a Highly Available Mobile Message Push System?”

“Building a Ten-Million-Level Online High-Concurrency Message Push System in Go (from 360)”

“Tencent Xinge Technology Sharing: Practical Experience of Real-Time Message Push at Ten-Billion Scale”

“The Practical Road of Real-Time Push Technology in Meipai’s Live Streaming Bullet-Comment System”

“The Evolution of the Message Push Architecture of JD Jingmai’s Merchant Open Platform”

“Tech Essentials: How to Design a Million-Level Message Push System from Scratch”

“Long Connection Gateway Technology Topic (4): iQiyi WebSocket Real-Time Push Gateway Technology Practice”


This article has also been published on the “IM Technosphere” official account.

▲ The synchronized release link for this article is: …