The author of this article is Ma Yingying, a web front-end development engineer at NetEase Intelligent Enterprise. The text has been revised and edited to improve the quality of the content.

1. Introduction

WebSocket is a critical link in a complete IM application: it gives web-based IM a full-duplex communication mechanism. However, to keep messages real-time and reliable in practical scenarios such as IM, we have to cope with the instability that WebSocket, and the TCP connection it depends on, shows on complex networks. IM developers therefore usually need to design a complete scheme for keeping the connection alive, verifying that it still works, and reconnecting after a disconnection.

When it comes to reconnecting after a disconnection, how quickly the reconnection responds has a serious effect on the “immediacy” of the upper-layer application and on the user experience. Imagine that a full minute after the network comes back, WeChat still has not noticed that the socket connection can be restored and still cannot send or receive chat messages. Wouldn’t that be maddening?

So in complex network scenarios, it becomes especially important to sense network changes promptly and restore the availability of WebSocket quickly. Based on the author’s development practice, this article shares how to disconnect and reconnect quickly under different WebSocket states and different network states.

![](https://img2020.cnblogs.com/blog/1834368/202008/1834368-20200805152908413-668907636.jpg)

This article is suited to developers who have hands-on experience developing the underlying network layer of an IM system, or who already understand the underlying network implementation fairly well. If you don’t know much about the underlying network, it is recommended to skip this article for now, read the basics listed in the appendix at the end, and then come back.

**Editor’s note:** This article is not highbrow, but it is dense with practical substance and easy to follow, so it is worth reading closely. Although it is about WebSocket, the ideas extend to other technologies built on TCP.

This article has also been published on the “Instant Messaging Technology Circle” official account; you are welcome to follow it:

![](https://img2020.cnblogs.com/blog/1834368/202008/1834368-20200805152919117-416007837.png)

▲ This article is also available on the official account. The original link is: www.52im.net/thread-3098…

2. Preliminary knowledge

This article is a practical summary, so if you are not yet familiar with instant messaging on the web, be sure to read the following first: “Beginner’s Post: The Most Comprehensive Explanation of the Principles of Instant Messaging on the Web” and “Instant Messaging on the Web: Short Polling, Comet, WebSocket, and SSE”.

Due to space limitations, this article does not go into the technical details of WebSocket. If you are interested, the following articles cover it systematically:

  • “Quick Start: WebSocket Tutorial”
  • “WebSocket in Detail (1): A Preliminary Understanding of WebSocket Technology”
  • “WebSocket in Detail (2): Technical Principles, Code Demonstration, and Application Cases”
  • “WebSocket in Detail (3): A Deep Dive into WebSocket Communication Protocol Details”
  • “WebSocket in Detail (4): Probing into the Relationship between HTTP and WebSocket (Part 1)”
  • “WebSocket in Detail (5): Probing into the Relationship between HTTP and WebSocket (Part 2)”
  • “WebSocket in Detail (6): Probing into the Relationship between WebSocket and Socket”

3. Learn WebSocket quickly

WebSocket was born in 2008, became an international standard in 2011, and is now supported by all major browsers (see “Quick Start: WebSocket Tutorial”). It is a new application-layer protocol designed specifically for true full-duplex communication between web clients and servers, and it can be understood by analogy with HTTP.

![](https://img2020.cnblogs.com/blog/1834368/202008/1834368-20200805152927662-1755764013.jpg)

(Picture from “WebSocket in Detail (4): Probing into the Relationship between HTTP and WebSocket (Part 1)”)

Their differences:

  • 1) The URL scheme of HTTP is http, while that of WebSocket is ws;
  • 2) An HTTP request can only be initiated by the client, and the server cannot actively push messages to the client, whereas a WebSocket server can;
  • 3) HTTP requests are subject to the same-origin restriction, so communication between different origins has to be handled as cross-origin, whereas WebSocket has no same-origin restriction.

Their similarities:

  • 1) Both are communication protocols at the application layer;
  • 2) The default ports are the same: 80 or 443;
  • 3) Both can be used for communication between the browser and the server;
  • 4) Both are based on TCP.
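The scheme and port points above can be illustrated with a small sketch. It maps an HTTP(S) URL to its WebSocket counterpart and returns the shared default port for each scheme. The function names `toWebSocketUrl` and `defaultPort` and the example host are illustrative and do not come from the original article.

```typescript
// Default ports shared by HTTP and WebSocket: 80 without TLS, 443 with TLS.
function defaultPort(scheme: string): number {
  switch (scheme) {
    case "http":
    case "ws":
      return 80;
    case "https":
    case "wss":
      return 443;
    default:
      throw new Error("unsupported scheme: " + scheme);
  }
}

// Map http:// to ws:// and https:// to wss://; host, port, and path are kept as-is.
function toWebSocketUrl(httpUrl: string): string {
  if (!/^https?:\/\//.test(httpUrl)) {
    throw new Error("not an HTTP(S) URL: " + httpUrl);
  }
  return httpUrl.replace(/^http/, "ws");
}
```

For example, `toWebSocketUrl("https://example.com/chat")` yields `wss://example.com/chat`, which would use port 443 by default.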

The relationship between the two and TCP:

![](https://img2020.cnblogs.com/blog/1834368/202008/1834368-20200805152933765-1620007406.jpg)

(Image from Quick Start: WebSocket Tutorial)

For more on the relationship between HTTP and WebSocket, read:

  • “WebSocket in Detail (4): Probing into the Relationship between HTTP and WebSocket (Part 1)”
  • “WebSocket in Detail (5): Probing into the Relationship between HTTP and WebSocket (Part 2)”

For the relationship between WebSocket and Socket, read “WebSocket in Detail (6): Probing into the Relationship between WebSocket and Socket”.

4. Breaking down the WebSocket reconnection process

The first question to consider is, when do I need to reconnect?

The most obvious case is that the WebSocket connection is broken: to send and receive messages again, we need to initiate a new connection.

But in many scenarios, even though the WebSocket connection has not been closed, it is actually unusable.

For example:

  • 1) The device switches networks;
  • 2) An intermediate router on the link fails (the network path behind a single socket connection typically passes through many routing devices);
  • 3) The upstream exit of the link is unavailable (for example, on home WiFi the local network connection is normal, but the ISP has suspended the broadband service for non-payment);
  • 4) The server is persistently overloaded and cannot respond.

In these scenarios the WebSocket has not been disconnected, but the upper layer cannot send or receive data normally.

So before reconnecting, we need a mechanism that senses, quickly, whether the connection is usable and whether the service is available, so that we can recover quickly from the unavailable state.

Once the connection is found to be unavailable, we discard and disconnect the old connection and then initiate a new one. These two steps seem simple, but doing them quickly is not so easy.

**First:** disconnecting the old connection. How can the client disconnect quickly? The protocol stipulates that the client must go through a closing handshake with the server before the WebSocket connection is closed. But when the client cannot reach the server and the handshake cannot complete, how can it disconnect and recover quickly?

**Second:** initiating the new connection quickly. “Quickly” here does not mean initiating the connection immediately, which would have an unpredictable impact on the server. Reconnection usually applies a back-off algorithm that delays each attempt for a period of time. But how should the reconnection interval be traded off against the performance cost? How can a connection be initiated quickly “at the right time”?

With these questions in mind, let’s look closely at the three steps:

![](https://img2020.cnblogs.com/blog/1834368/202008/1834368-20200805152940295-1736938666.jpg)

5. Fast reconnection key 1: quickly sensing when reconnection is needed

5.1 Scenarios

Scenarios requiring reconnection can be subdivided into three types:

  • 1) The connection is clearly disconnected;
  • 2) The connection is not broken but unavailable;
  • 3) The peer service is unavailable.

**For the first scenario:** this one is simple. The connection has been closed outright, so a reconnection is certainly needed.

**For the latter two:** whether it is the connection or the service that is unavailable, the effect on the upper-layer application is the same: instant messages can no longer be sent or received.
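The three scenarios can be sketched as a small classifier, which also shows why the socket's `readyState` alone is not enough: in scenarios 2 and 3 the socket still reports OPEN. This is a minimal sketch under stated assumptions. The names `classifyLink`, `needsReconnect`, `lastReplyAt`, and `replyTimeoutMs` are hypothetical; only the numeric `readyState` values come from the browser WebSocket API.

```typescript
// readyState values defined by the browser WebSocket API.
const WS_CONNECTING = 0;
const WS_OPEN = 1;

type LinkHealth = "connecting" | "disconnected" | "unresponsive" | "healthy";

// lastReplyAt: time (ms) of the last data received from the server (assumed
// to be tracked by the caller). replyTimeoutMs: how much silence is tolerated.
function classifyLink(
  readyState: number,
  lastReplyAt: number,
  now: number,
  replyTimeoutMs: number,
): LinkHealth {
  if (readyState === WS_CONNECTING) return "connecting";
  if (readyState !== WS_OPEN) return "disconnected"; // CLOSING or CLOSED: scenario 1
  // Still OPEN but silent for too long: scenarios 2 and 3 look identical from the client.
  if (now - lastReplyAt > replyTimeoutMs) return "unresponsive";
  return "healthy";
}

function needsReconnect(health: LinkHealth): boolean {
  return health === "disconnected" || health === "unresponsive";
}
```

Scenarios 2 and 3 are deliberately folded into one result, since the client cannot tell them apart and handles both the same way.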

5.2 Actively detecting network availability with heartbeat packets

From this point of view, a simple and blunt way to sense that a reconnection is needed is a heartbeat timeout: send a heartbeat packet, and if no reply from the server arrives within a certain time, consider the service unavailable, as shown in the flow on the left of the figure below (the most direct method).

![](https://img2020.cnblogs.com/blog/1834368/202008/1834368-20200805152946310-1344930940.jpg)

To sense this quickly, heartbeats would have to be sent more often. But an overly frequent heartbeat consumes too much traffic and power on mobile devices, so this method cannot by itself provide fast detection. It works well as a fallback mechanism for checking that the connection and the service are available.
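A heartbeat timeout check might look like the sketch below. It is not the author's implementation: the class name `HeartbeatMonitor`, the `"ping"` payload, and the explicit `now` arguments are assumptions made so that the logic stays deterministic and easy to follow. The socket is reduced to a `send` method, which a browser `WebSocket` provides.

```typescript
// Minimal shape of the socket this sketch needs.
interface PingSender {
  send(data: string): void;
}

class HeartbeatMonitor {
  private socket: PingSender;
  private timeoutMs: number;
  private onTimeout: () => void;
  private awaitingSince: number | null = null; // when the unanswered ping was sent

  constructor(socket: PingSender, timeoutMs: number, onTimeout: () => void) {
    this.socket = socket;
    this.timeoutMs = timeoutMs;
    this.onTimeout = onTimeout;
  }

  // Send one heartbeat, unless a previous one is still unanswered.
  ping(now: number): void {
    if (this.awaitingSince !== null) return;
    this.socket.send("ping"); // hypothetical heartbeat payload
    this.awaitingSince = now;
  }

  // Call when the server's heartbeat reply arrives.
  pong(): void {
    this.awaitingSince = null;
  }

  // Call periodically; fires onTimeout once if the reply is overdue.
  check(now: number): void {
    if (this.awaitingSince !== null && now - this.awaitingSince >= this.timeoutMs) {
      this.awaitingSince = null;
      this.onTimeout();
    }
  }
}
```

In a browser, `ping(Date.now())` could be driven by a slow timer and `check(Date.now())` by a faster one, with `onTimeout` starting the reconnection.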

5.3 Passively Listening for Network Status Changes

Besides heartbeat detection, the network status can also be used to detect an unavailable connection. Network drops, WiFi switches, and network changes are the most direct causes of an unavailable connection, so when the network status changes from offline to online, a reconnection is needed in most cases, but not always. WebSocket is built on TCP, and a TCP connection does not promptly notice short-lived network changes, so if the network drops only briefly, the WebSocket connection may still communicate normally once the network is restored.

Therefore, when the network changes from offline to online, immediately send a heartbeat packet to check whether the existing connection is still usable. If the server’s heartbeat reply arrives normally, the connection is still usable; if no reply arrives before the timeout, a reconnection is needed, as shown on the right of the figure above. The advantage of this method is speed: as soon as the network is restored, we can tell whether the connection is usable and, if it is not, recover quickly.
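Listening for the network coming back can be sketched as follows. Browsers fire an `online` event on `window` when connectivity returns. To keep the sketch self-contained, the event source is passed in as a parameter. The names `NetworkEvents` and `probeWhenOnline` are hypothetical, and `probe` stands for whatever function sends the immediate heartbeat and starts its reply timer.

```typescript
// Minimal event-source shape; the browser's `window` can be passed here.
interface NetworkEvents {
  addEventListener(type: string, listener: () => void): void;
}

// Run `probe` as soon as the network reports that it is back online,
// instead of waiting for the next scheduled heartbeat.
function probeWhenOnline(events: NetworkEvents, probe: () => void): void {
  events.addEventListener("online", () => {
    probe();
  });
}

// Possible browser usage (assumes a `sendHeartbeatNow` function exists):
//   probeWhenOnline(window, () => sendHeartbeatNow());
```

If the probe's reply arrives, the existing connection is kept. If it times out, the reconnection starts without waiting for the regular heartbeat interval.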

5.4 Summary

To sum up:

  • 1) Periodic heartbeat detection is stable and covers every scenario, but it is not immediate (the heartbeat interval is fixed);
  • 2) Checking on network status changes is fast and responsive, with no need to wait for the next heartbeat interval, but it covers only a limited set of scenarios.

Therefore, we can combine the two:

  • 1) Send periodic heartbeat packets at a moderate frequency, such as once every 40 s or 60 s, depending on the application scenario;
  • 2) When the network status changes from offline to online, immediately send a heartbeat to check whether the current connection is usable, and if it is not, restore it right away.

In this way, in most cases the upper-layer application can recover quickly from the unavailable state, and in the remaining few scenarios the periodic heartbeat acts as a backstop, so recovery happens within one heartbeat cycle.

6. Fast reconnection key 2: quickly disconnecting the old connection

In general, before initiating the next connection, the old connection should be disconnected if it still exists.

This serves two purposes:

  • 1) It releases resources on both the client and the server;
  • 2) It prevents data from being mistakenly sent or received over the old connection later on.

We know that WebSocket transmits data over TCP, with the server and the client at the two ends of the connection. In TCP, the side that closes the connection first is the one that holds the TIME_WAIT state, and RFC 6455 recommends that this be the server. Therefore, in most normal cases, the server rather than the client should initiate the close of the underlying TCP connection.

In other words:

  • 1) When the server is instructed to close the WebSocket connection, it should immediately initiate the TCP close;
  • 2) When the client is instructed to close the WebSocket connection, it should signal the server and then wait for the server to close the underlying TCP connection, or for a timeout.

So when the client wants to disconnect the old WebSocket, there are two cases: the connection is usable, or it is not.

The details are as follows:

  • 1) When the old connection is usable, the client can send the close signal directly to the server, and the server then initiates the disconnection;
  • 2) When the old connection is unusable, for example because the client has switched WiFi networks, the client sends the close signal but the server never receives it, so the client can only wait for the disconnection to time out.

Disconnecting by timeout takes a relatively long time. Is there a faster way?

The upper-layer application cannot change the protocol rule that the server should initiate the disconnection, so the only place to act is the application logic. For example, the upper-layer business logic can ensure that the old connection is completely invalidated, simulating a disconnection, and then initiate a new connection to restore communication.

This amounts to attempting to close the old connection and, if that does not work, simply abandoning it and moving on to the next step. When using this approach, make sure the old connection is completely invalidated in the business logic.

For example:

  • 1) Discard all data received from the old connection;
  • 2) The old connection must not block the establishment of the new connection;
  • 3) When the old connection finally closes on timeout, it must not affect the new connection or the upper-layer business logic.
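One way to meet these three requirements is to tag every connection with a token and ignore anything that arrives with a stale token. The sketch below shows that idea and is not the author's code. The names `ConnectionSlot`, `adopt`, `abandon`, and `isCurrent` are hypothetical, and the socket is reduced to a `close()` method, which a browser `WebSocket` provides.

```typescript
interface Closable {
  close(): void;
}

class ConnectionSlot {
  private generation = 0;
  private current: Closable | null = null;

  // Install a new socket and return the token that identifies it.
  adopt(socket: Closable): number {
    this.abandon();
    this.current = socket;
    return this.generation;
  }

  // Treat the current socket as dead right away, without waiting for the
  // closing handshake or a timeout. close() is still called as a best effort.
  abandon(): void {
    const old = this.current;
    this.current = null;
    this.generation++; // invalidates every token handed out so far
    if (old !== null) {
      try {
        old.close();
      } catch (_err) {
        // The old socket may already be unusable; nothing more to do.
      }
    }
  }

  // Handlers call this first and drop the event when it returns false.
  isCurrent(token: number): boolean {
    return this.current !== null && token === this.generation;
  }
}

// Possible browser usage (assumes `ws` is a WebSocket and `handle` exists):
//   const token = slot.adopt(ws);
//   ws.onmessage = (ev) => { if (slot.isCurrent(token)) handle(ev.data); };
//   ws.onclose = () => { if (slot.isCurrent(token)) slot.abandon(); };
```

Because stale tokens are rejected, a late `message` or `close` event from the abandoned socket cannot disturb the new connection or the business logic above it.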

7. Fast reconnection key 3: quickly initiating a new connection

Anyone with IM development experience knows that when a reconnection is caused by network problems, a new connection must never be initiated immediately. Otherwise, whenever the network jitters, every device would hit the server with a connection request at the same moment. The effect is the same as a denial-of-service attack in which an attacker exhausts network bandwidth with a flood of requests, and it is a disaster for the server (the so-called server-side avalanche).

Therefore, a back-off algorithm is usually used to delay the reconnection for a period of time, as shown in the flow on the left of the figure below.

![](https://img2020.cnblogs.com/blog/1834368/202008/1834368-20200805152953778-1519829453.jpg)

What if you want to reconnect quickly? The most direct approach is to shorten the retry interval: the shorter the interval, the sooner communication resumes once the network is back. But retrying too often costs a great deal in performance, bandwidth, and power.

How should the two be balanced?

  • 1) A reasonable approach is to increase the retry interval gradually as the number of retries grows;
  • 2) In addition, monitor network changes, and at moments when a reconnection is more likely to succeed, such as when the network status changes from offline to online, shorten the reconnection interval appropriately.

The second approach is shown on the right of the figure above, where the reconnection interval still increases with the number of retries. The two approaches work best when used together.
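The combination of the two approaches might be sketched as an exponential back-off whose counter is reset when the network comes back. The article does not specify a particular algorithm, so everything here is an assumption: the class name `ReconnectBackoff`, the option names, the growth formula, and the random jitter. The jitter is an addition of this sketch, meant to keep many clients from retrying at the same instant.

```typescript
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number;  // upper bound for any delay
  factor: number; // growth per failed attempt, e.g. 2
}

class ReconnectBackoff {
  private attempt = 0;
  private options: BackoffOptions;
  private random: () => number;

  // `random` is injectable so the behavior can be made deterministic.
  constructor(options: BackoffOptions, random: () => number = Math.random) {
    this.options = options;
    this.random = random;
  }

  // Delay before the next attempt; the upper bound grows with every call.
  nextDelay(): number {
    const grown = this.options.baseMs * Math.pow(this.options.factor, this.attempt);
    const capped = Math.min(this.options.maxMs, grown);
    this.attempt++;
    // Jitter: pick a value between half of the capped delay and the full delay.
    return Math.round(capped / 2 + this.random() * (capped / 2));
  }

  // Call after a successful connection, or when the network changes from
  // offline to online, so that the next retry uses the shortest delay again.
  reset(): void {
    this.attempt = 0;
  }
}
```

With `baseMs: 1000`, `factor: 2`, and `maxMs: 8000`, the upper bound of successive delays is 1 s, 2 s, 4 s, 8 s, 8 s, and so on until `reset()` is called.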

You can also factor in the business logic and adjust the interval according to how likely a reconnection is to succeed. For example, lengthen the reconnection interval while the network is offline or the application is in the background, and shorten it appropriately while the network is online, to speed up reconnection.
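Such an adjustment can be sketched as a scaling step applied to whatever delay the back-off produced. The multipliers below are arbitrary placeholders rather than recommended values, and `ClientState` and `adjustDelay` are hypothetical names. In a browser, the two flags could be read from `navigator.onLine` and `document.hidden`.

```typescript
interface ClientState {
  online: boolean;     // whether the device reports a network connection
  background: boolean; // whether the page or app is currently in the background
}

// Stretch the delay when a reconnection is unlikely to succeed or is less urgent.
function adjustDelay(delayMs: number, state: ClientState): number {
  let scale = 1;
  if (!state.online) scale *= 4;    // placeholder: offline, success is unlikely
  if (state.background) scale *= 2; // placeholder: nobody is waiting for messages
  return delayMs * scale;
}
```

An online, foreground client keeps the unscaled delay, so it reconnects at the fastest rate the back-off allows.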

8. Summary

To sum up:

This article divides the WebSocket disconnect-and-reconnect logic into three steps:

  • 1) Determine when reconnection is required;
  • 2) Disconnect the old connection;
  • 3) Initiate a new connection.

It then analyzes how to complete each of the three steps quickly under different WebSocket states and different network states.

The process can be summarized as follows:

  • 1) First: check whether the current connection is usable by sending heartbeat packets periodically, and also listen for network recovery events. When the network recovers, send a heartbeat immediately to sense the current state quickly and decide whether a reconnection is needed;
  • 2) Second: under normal circumstances the server closes the old connection. When contact with the server is lost, discard the old connection directly and simulate the disconnection at the upper layer to achieve a fast disconnect;
  • 3) Finally: when initiating a new connection, use a back-off algorithm to delay the attempt for a period of time. To balance wasted resources against reconnection speed, lengthen the reconnection interval while the network is offline, and shorten it while the network is normal or has just changed from offline to online, so that the connection is restored as soon as possible.

That is my approach to fast WebSocket reconnection. You are welcome to leave a comment and discuss it with me.

9. References

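[1] RFC 6455

[2] Quick Start: WebSocket Tutorial

[3] WebSocket in Detail (4): Probing into the Relationship between HTTP and WebSocket (Part 1)

[4] WebSocket in Detail (5): Probing into the Relationship between HTTP and WebSocket (Part 2)

[5] WebSocket in Detail (6): Probing into the Relationship between WebSocket and Socket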

Appendix: More instant messaging on the Web

“Beginner’s Post: The Most Comprehensive Explanation of the Principles of Instant Messaging on the Web”

“Instant Messaging on the Web: Short Polling, Comet, WebSocket, and SSE”

“SSE Technology: A New HTML5 Server Push Event Technology”

“Comet Technology: Real-Time Communication on the Web Based on HTTP Long Connections”

“Practice and Ideas for Implementing Message Push with Socket.IO”

“LinkedIn’s Web IM Practice: Hundreds of Thousands of Long Connections on a Single Server”

“The Development of Web Instant Messaging Technology and the Technical Practice of WebSocket and Socket.IO”

“Web Instant Messaging Security: A Detailed Explanation of the Cross-Site WebSocket Hijacking Vulnerability (with Sample Code)”

“Practice with the Open Source Framework Pomelo: Building a High-Performance Distributed IM Chat Server on the Web”

“Using WebSocket and SSE Technology to Implement Web Message Push”

“The Evolution of Web Communication: From Ajax and JSONP to SSE and WebSocket”

“Why Does MobileIMSDK-Web’s Network Layer Framework Use Socket.IO Instead of Netty?”

“From Theory to Practice: Understanding WebSocket Communication Principles, Protocol Format, and Security from Scratch”

“How to Use WebSocket in a WeChat Mini Program to Implement a Long Connection (with Complete Source Code)”

“WebSocket Protocol: Quick Answers to Hot WebSocket Questions”

“Quick Introduction to Electron: The Next Generation of Web-Based, Cross-Platform Desktop Technology”

“Understanding the Evolution of Front-End Technology: A 20-Year History of the Web Front End”

“Remedial Lesson on Web Instant Messaging Basics: Understanding All the Cross-Domain Problems”

“Web IM Practice Tips: How to Make Your WebSocket Reconnect Faster?”

