Long connections and heartbeats

Introduction

Long connection: A connection here means a TCP connection, established at the transport layer through the three-way handshake. A long connection is one that is kept open for a long time after it is established, whether or not any data is being sent. Its counterpart is the short connection, which is established only when the two parties have data to exchange. After a few requests, one side closes it actively and the other closes passively.
Heartbeat: Like a human heartbeat, it is used to check whether a system is alive or whether a network link is working. Typically, heartbeat packets are sent to the monitored system periodically, and the monitored system replies to each one.

Heartbeats and long connections are introduced together because a heartbeat provides keep-alive for a long connection: it detects whether the connection is still working. Keep-alive here should not be understood simply as keeping the connection alive. More precisely, it means finding out as soon as possible once the link is dead and unusable, so that other high-availability measures can be taken to keep the system running normally.

Advantages

Advantages of long connections

Less time spent establishing connections: As everyone knows, a TCP connection is established through a three-way handshake, so three packets must be exchanged before the channel can be used. Between machines in the same city the latency is on the order of a millisecond and the impact is small. Between data centers in Beijing and Shanghai, however, a round trip takes about 30 ms, and the saving from reusing a long connection becomes quite significant.
Easier data push: Push-style interaction presupposes a long connection. With a long connection in place, either end can conveniently push data to the other at any time.
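As a rough illustration of the first advantage, the sketch below counts only the time spent waiting for handshakes, at one round trip per new connection, using the 30 ms figure above. The workload of 1000 requests and the function name are made up for illustration.

```python
def handshake_overhead_ms(requests: int, rtt_ms: float, requests_per_connection: int) -> float:
    """Time spent waiting for TCP handshakes, at one round trip per new connection."""
    connections = -(-requests // requests_per_connection)  # ceiling division
    return connections * rtt_ms


RTT_MS = 30       # Beijing <-> Shanghai round trip quoted in the text
REQUESTS = 1000   # hypothetical workload

# Short connections: a new connection (and handshake) for every request.
short_ms = handshake_overhead_ms(REQUESTS, RTT_MS, requests_per_connection=1)
# Long connection: one handshake, then the connection is reused.
long_ms = handshake_overhead_ms(REQUESTS, RTT_MS, requests_per_connection=REQUESTS)

print(f"short connections: {short_ms} ms, long connection: {long_ms} ms")
```

Under these assumptions the short-connection client spends 30 seconds in total just waiting for handshakes, against 30 ms for the long connection.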

Questions

What exactly is a TCP connection? A TCP connection is not a physical connection. It is a logical connection, established through the three-way handshake, for reliable data transmission, and both parties have to maintain state information for it. For example, netstat often shows a connection in the ESTABLISHED state. (Note that ESTABLISHED only means that the operating system currently believes the connection is up.)
Can you rest easy once you have a long connection? After a long connection is established, the operating systems on both sides maintain its state. Does that guarantee data can be sent to the peer? The answer is no. The link may already be broken without the TCP layer having noticed, so the operating system still shows the connection as ESTABLISHED, and because the TCP layer considers it ESTABLISHED, the application layer cannot tell that the link has failed. What problems does this cause? If data needs to be sent at this point, it obviously cannot reach the peer, but TCP retransmits it in order to guarantee reliability. If the cause is merely a loose network cable and the cable is plugged back in promptly, the data still arrives normally and the connection stays ESTABLISHED throughout. But things do not always go that way: sometimes a link is down for good, the peer host crashes, or the system shuts down abnormally. It is dangerous if the application does not notice.
How do you keep a long connection alive? TCP implementations provide a KeepAlive mechanism. (It is not part of the core TCP specification; RFC 1122 describes it as an optional feature, and it is implemented by the operating system.) Once KeepAlive is enabled, if no data has been transmitted on the connection for a certain period (tcp_keepalive_time, 7200 s by default on Linux), the TCP layer sends a KeepAlive probe to check whether the connection is still usable. If a probe gets no reply, it is retried every 75 seconds (tcp_keepalive_intvl), and the connection is considered unavailable only after 9 probes (tcp_keepalive_probes, the Linux default) have all failed. These parameters are machine-level and can be adjusted.
Does the application layer need to do anything? With the TCP KeepAlive mechanism, the default parameters clearly cannot meet the requirements. Is it enough to turn them down? Tuning them does help, but there are two problems. First, the parameters are machine-level, so they are inconvenient to adjust: every time a machine is replaced someone has to remember to tune them again, which adds maintenance cost for whoever runs the system and is easy to forget. Second, KeepAlive only works when the link is idle. If data is being sent while the physical link is broken and the operating system still shows the connection as ESTABLISHED, what happens? TCP's retransmission mechanism takes over, and the default retransmission timeout with exponential backoff is also a lengthy process. Therefore, for a reliable system, keeping long connections alive must rely on an application-layer heartbeat. As an example, the client sends a heartbeat request to the server over the long connection every 3 s and drops the connection after five consecutive failures, so an unusable connection is discovered within about 15 seconds. Once that happens, the client can reconnect or fail over, for example by sending requests to another server. An application-layer heartbeat has a further benefit. Suppose a server is under heavy load for some reason (a CPU spike, an exhausted thread pool, and so on) and cannot respond to any business request. TCP's own mechanisms will not reveal any problem, yet the best choice for the client is to disconnect and connect to another server, rather than keep assuming the current server is available and sending it requests that are bound to fail.
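To put numbers on this: with the Linux defaults (tcp_keepalive_time = 7200, tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9), a dead idle connection is only declared unavailable after roughly 7200 + 9 × 75 = 7875 seconds. On Linux the same three values can also be overridden for a single socket with setsockopt. The following is a minimal sketch, not a recommendation: the helper names and the chosen values are illustrative, and the TCP_KEEP* constants are not available on every platform, hence the guard.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 3) -> None:
    """Turn on TCP KeepAlive for one socket and, where supported, tune it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    per_socket = {"TCP_KEEPIDLE": idle_s, "TCP_KEEPINTVL": interval_s, "TCP_KEEPCNT": probes}
    # Assumption: these constants exist on Linux; skip tuning elsewhere.
    if all(hasattr(socket, name) for name in per_socket):
        for name, value in per_socket.items():
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


def worst_case_detection_s(idle_s: int, interval_s: int, probes: int) -> int:
    """Approximate time until an idle, dead connection is declared unavailable."""
    return idle_s + interval_s * probes


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()

print(worst_case_detection_s(7200, 75, 9))  # Linux defaults
```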
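The "every 3 s, give up after five consecutive failures" rule can be sketched as a small counter. This is only an illustration: HeartbeatMonitor, run_heartbeat, and the injected ping callable are hypothetical names, and a real client would send an actual heartbeat request over the connection inside ping.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Counts consecutive heartbeat failures for one connection."""

    def __init__(self, ping: Callable[[], bool], max_failures: int = 5):
        self._ping = ping              # sends one heartbeat, returns True on reply
        self.max_failures = max_failures
        self.failures = 0

    def beat(self) -> bool:
        """Send one heartbeat; return False once the connection should be dropped."""
        try:
            ok = self._ping()
        except OSError:                # includes socket timeouts
            ok = False
        self.failures = 0 if ok else self.failures + 1
        return self.failures < self.max_failures


def run_heartbeat(monitor: HeartbeatMonitor, interval_s: float = 3.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Beat until the monitor gives up; return how many heartbeats were sent."""
    sent = 1
    while monitor.beat():
        sleep(interval_s)
        sent += 1
    return sent


# Demo with a fake ping: one success, then the server stops answering.
replies = iter([True, False, False, False, False, False])
monitor = HeartbeatMonitor(lambda: next(replies))
sent = run_heartbeat(monitor, sleep=lambda s: None)  # no real waiting in the demo
print(sent)
```

When run_heartbeat returns, the caller would close the connection and reconnect or fail over to another server.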

Design mistakes

No heartbeat: Designs without a heartbeat are also very common, usually to save effort. The reasoning is that the TCP transport layer notifies the application when a long connection breaks, so the application only needs to handle that notification and reconnect when it sees an abnormal connection. But the notification can arrive very late when a machine crashes, the application exits unexpectedly, a link fails, and so on.
Heartbeat detection by the accepting side: A heartbeat involves two things, sending and detection. Heartbeats can be sent by either party or by both. Detection, however, is only safe when done by the end that initiates the connection, because only when the initiator detects a faulty link can it disconnect and reconnect, or connect to another server. Consider the wrong design: the client connects to the server and periodically sends heartbeats, and the server does the detecting. If the server receives no heartbeat from the client for a period of time, it concludes that the link to the client, or the client itself, is faulty. At first glance there seems to be no problem. But suppose only the link between the client and the current server is faulty. In a highly available system there should be another server to fall back on. The problem is that, for some time, the client does not know anything is wrong with the first server, so it does not take the initiative to connect to the second one.
Third-party heartbeat: Another design uses a third party for keep-alive: besides the client and the server, a separate machine periodically sends heartbeats to check whether the server is alive. This can tell whether the server is alive, but it cannot tell whether the link between the client and the server is healthy.
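Why detection belongs on the initiating side can be sketched as follows: the client is the one holding the list of candidate servers, so it is the one that can switch. FailoverClient, the probe callable, and the server names are hypothetical and stand in for a real heartbeat exchange.

```python
from typing import Callable, Sequence


class FailoverClient:
    """Client-side detection: probe the current server, switch when it stops answering."""

    def __init__(self, servers: Sequence[str], probe: Callable[[str], bool]):
        self.servers = list(servers)
        self.probe = probe      # returns True if the server answered a heartbeat
        self.current = 0        # index of the server currently in use

    def check(self) -> str:
        """Return the address of a server that answers, failing over if needed."""
        for _ in range(len(self.servers)):
            addr = self.servers[self.current]
            if self.probe(addr):
                return addr
            # The heartbeat failed: move on to the next candidate server.
            self.current = (self.current + 1) % len(self.servers)
        raise ConnectionError("no server answered the heartbeat")


# Demo: the link to server-a is broken, server-b is reachable.
reachable = {"server-a": False, "server-b": True}
client = FailoverClient(["server-a", "server-b"], probe=lambda addr: reachable[addr])
active = client.check()
print(active)
```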

Reference solutions

Scheme 1: The simplest policy is for the client to send a heartbeat packet every n seconds, with the server replying to each heartbeat it receives. If the client receives no heartbeat reply for m consecutive seconds, it disconnects from the server, stops sending normal service requests over that connection, and reconnects.
Scheme 2: Some may feel that Scheme 1 sends too many packets that carry no useful data and could be optimized. To be honest, I personally do not think it is too many. Still, an optimization is possible. A heartbeat is just a probe request, and a normal business request can serve as a probe in addition to doing its business work. When a request needs to be sent to the server, that request can double as a heartbeat: the server processes it and replies, and as long as the reply arrives, the link is fine. Only when the client is idle and has no requests to send does it fall back to explicit heartbeats. This way normal requests effectively act as heartbeats and fewer packets without useful data are sent.
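Scheme 1 can be tried out on localhost with plain sockets. The sketch below is an assumption-laden toy, not a protocol definition: the PING/PONG line format and the function names are invented for illustration, the server handles a single client, and timeout_s plays the role of m.

```python
import socket
import threading


def serve_heartbeats(listener: socket.socket) -> None:
    """Accept one client and answer every PING line with a PONG line."""
    conn, _ = listener.accept()
    with conn:
        for line in conn.makefile("rb"):
            if line.strip() == b"PING":
                conn.sendall(b"PONG\n")


def heartbeat(sock: socket.socket, timeout_s: float) -> bool:
    """Send one PING; return True only if PONG arrives within timeout_s."""
    try:
        sock.settimeout(timeout_s)
        sock.sendall(b"PING\n")
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = sock.recv(16)
            if not chunk:          # peer closed the connection
                return False
            reply += chunk
        return reply == b"PONG\n"
    except OSError:                # timeout, reset, unreachable, ...
        return False


listener = socket.socket()
listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
listener.listen(1)
threading.Thread(target=serve_heartbeats, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname(), timeout=1.0)
alive = heartbeat(client, timeout_s=1.0)
client.close()
listener.close()
print(alive)
```

A real client would call heartbeat every n seconds and, on a False result, close the socket, stop sending business requests, and reconnect.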
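One way to sketch Scheme 2 is to record when the last reply of any kind arrived and to send an explicit heartbeat only after the connection has been quiet for a while. IdleHeartbeat and its method names are hypothetical, and the clock is injected so the demo does not have to wait.

```python
import time
from typing import Callable


class IdleHeartbeat:
    """Decides when an explicit heartbeat is needed, treating any reply as proof of life."""

    def __init__(self, idle_s: float, clock: Callable[[], float] = time.monotonic):
        self.idle_s = idle_s
        self.clock = clock
        self.last_reply = clock()

    def on_reply(self) -> None:
        """Call whenever a reply (business or heartbeat) arrives from the server."""
        self.last_reply = self.clock()

    def heartbeat_due(self) -> bool:
        """True once no reply has been seen for idle_s seconds."""
        return self.clock() - self.last_reply >= self.idle_s


# Demo with a fake clock.
now = [0.0]
hb = IdleHeartbeat(idle_s=3.0, clock=lambda: now[0])

now[0] = 2.0
hb.on_reply()                       # a business reply arrives at t = 2
now[0] = 4.0
due_while_busy = hb.heartbeat_due() # only 2 s since the last reply
now[0] = 5.0
due_when_idle = hb.heartbeat_due()  # 3 s of silence: send a heartbeat
print(due_while_busy, due_when_idle)
```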
