There is a heartbeat between the Dubbo client and the Dubbo server to maintain a long link between the provider and consumer. Initiated by the Dubbo client, see Dubbo source code HeartbeatTimerTask and ReconnectTimerTask.

When it comes to RPC, TCP communication is inevitable, and the mainstream RPC framework relies on Netty and other communication frameworks. At this time, we should consider whether to use long connection or short connection. The mainstream RPC framework will pursue performance and choose to use long connections, so how to keep the connection alive is an important topic, this article will focus on this to introduce the survival strategy.

KeepAlive mechanism

Communication in Dubbo is based on TCP, which itself does not distinguish between long and short connections.

  • In a short connection, the Socket is created for each communication. When that communication ends, socket.close() is called and the connection needs to be re-created for the next communication. The advantage is that there is no need to manage the connection, no need to live the connection; The downside is that it takes time to create a connection each time.

  • In the long connection, after each communication, the connection will not be closed, so that the connection can be reused to ensure the performance; The advantage of a long connection is that it saves the time needed to create a connection. The disadvantage is that connections need to be managed uniformly and need to be kept alive.

KeepAlive is not part of TCP, but most operating systems implement it. If there is no data transfer over the link for a certain period of time, The TCP layer will send the corresponding Keepalive probe to determine the connection availability. Keepalive has several kernel parameters:

  • Tcp_keepalive_time: indicates the length of time during which no probe request is sent. The default value is 7200s (2h).
  • Tcp_keepalive_probes: the default value is 10.
  • Tcp_keepalive_intvl: The default retry interval is 75 seconds.
Dubbo heartbeat mechanism

The KeepAlive mechanism is often not enough. It also needs to be used in conjunction with the heartbeat mechanism. The heartbeat mechanism simply means that the client starts a timer to send requests periodically. The server responds to the request. If no response is received for several times, the client considers that the connection has been disconnected and can disconnect the half-opened connection or reconnect.

Next we use Dubbo source code to analyze how Dubbo’s heartbeat mechanism is implemented.

  • The first step is to know the entry to the provider binding and consumer connection:
public class HeaderExchanger implements Exchanger {

    public static final String NAME = "header";

    @Override
    public ExchangeClient connect(URL url, ExchangeHandler handler) throws RemotingException {
        return new HeaderExchangeClient(Transporters.connect(url, new DecodeHandler(new HeaderExchangeHandler(handler))), true);
    }

    @Override
    public ExchangeServer bind(URL url, ExchangeHandler handler) throws RemotingException {
        returnnew HeaderExchangeServer(Transporters.bind(url, new DecodeHandler(new HeaderExchangeHandler(handler)))); }}Copy the code
  • Consider the first scheduled task: sending a heartbeat requestHeartbeatTimerTask#doTask
@Override
protected void doTask(Channel Channel) {try {// lastRead time Long lastRead = lastRead(Channel); // lastWrite time Long lastWrite = lastWrite(channel); // Heartbeat sends a heartbeat message if the last read interval or last write interval is longer than heartbeat. // Heartbeat defaults to 60sif((lastRead ! = null && now() - lastRead > heartbeat) || (lastWrite ! = null && now() - lastWrite > heartbeat)) { Request req = new Request(); req.setVersion(Version.getProtocolVersion()); req.setTwoWay(true);
            req.setEvent(HEARTBEAT_EVENT);
            channel.send(req);
            if (logger.isDebugEnabled()) {
                logger.debug("Send heartbeat to remote channel " + channel.getRemoteAddress()
                        + ", cause: The channel has no data-transmission exceeds a heartbeat period: "
                        + heartbeat + "ms");
            }
        }
    } catch (Throwable t) {
        logger.warn("Exception when heartbeat to remote channel "+ channel.getRemoteAddress(), t); }}Copy the code
  • Take a look at the second scheduled task: dealing with reconnections and disconnectionsReconnectTimerTask#doTask
@Override
protected void doTask(Channel Channel) {try {// lastRead time Long lastRead = lastRead(Channel); Long now = now(); // The link cannot be initializedif(! channel.isConnected()) { try { logger.info("Initial connection to " + channel);
                ((Client) channel).reconnect();
            } catch (Exception e) {
                logger.error("Fail to connect to "+ channel, e); } // If the last read interval is greater than idleTimeout, retry // default idleTimeout time 60s * 3}else if(lastRead ! = null && now - lastRead > idleTimeout) { logger.warn("Reconnect to channel " + channel + ", because heartbeat read idle time out: "
                    + idleTimeout + "ms");
            try {
                ((Client) channel).reconnect();
            } catch (Exception e) {
                logger.error(channel + "reconnect failed during idle time.", e);
            }
        }
    } catch (Throwable t) {
        logger.warn("Exception when reconnect to remote channel "+ channel.getRemoteAddress(), t); }}Copy the code
  • Finally, take a look at the timed task to close the connection:CloseTimerTask#doTask
@Override
protected void doTask(Channel Channel) {try {// lastRead time Long lastRead = lastRead(Channel); // lastWrite time Long lastWrite = lastWrite(channel); Long now = now(); // If the last read interval is longer than idleTimeout, or the last write interval is longer than idleTimeout, the link is disconnected. // The default idleTimeout interval is 60s * 3if((lastRead ! = null && now - lastRead > idleTimeout) || (lastWrite ! = null && now - lastWrite > idleTimeout)) { logger.warn("Close channel " + channel + ", because idleCheck timeout: "
                    + idleTimeout + "ms");
            channel.close();
        }
    } catch (Throwable t) {
        logger.warn("Exception when close remote channel "+ channel.getRemoteAddress(), t); }}Copy the code
  • Create timer when connection is established (client)

In HeaderExchange Lient, HashedWheelTimer is used to start heartbeat detection, which is a time wheel timer provided by Netty. HashedWheelTimer performs better than Schedule when there are many tasks and the task execution time is very short. Especially suitable for heartbeat detection.

public class HeaderExchangeClient implements ExchangeClient { private final Client client; private final ExchangeChannel channel; Private static final HashedWheelTimer IDLE_CHECK_TIMER = new HashedWheelTimer(new) private static final HashedWheelTimer IDLE_CHECK_TIMER = new HashedWheelTimer NamedThreadFactory("dubbo-client-idleCheck".true), 1, TimeUnit.SECONDS, TICKS_PER_WHEEL);
    private HeartbeatTimerTask heartBeatTimerTask;
    private ReconnectTimerTask reconnectTimerTask;

    public HeaderExchangeClient(Client client, boolean startTimer) {
        Assert.notNull(client, "Client can't be null");
        this.client = client;
        this.channel = new HeaderExchangeChannel(client);

        if(startTimer) { URL url = client.getUrl(); // The client initiates a reconnection with startReconnectTask(url); // The client sends the heartbeat startHeartBeatTask(url); }} // omit... private void startHeartBeatTask(URL url) {if(! client.canHandleIdle()) { AbstractTimerTask.ChannelProvider cp = () -> Collections.singletonList(HeaderExchangeClient.this); // Default is 60s int heartbeat = getHeartbeat(url); // The frequency of reconnection and disconnection is 1/3 of the detection period. The heartbeat sending task is responsible for sending heartbeat packets to the peer end when the connection is idle. The task of scheduled reconnection and disconnection checks whether the lastRead has not been updated within the timeout period. // If the lastRead has not been updated within the timeout period, the client reconnects and the server disconnects. long heartbeatTick = calculateLeastDuration(heartbeat); This. heartBeatTimerTask = new heartBeatTimerTask (cp, heartbeatTick, heartbeat); IDLE_CHECK_TIMER.newTimeout(heartBeatTimerTask, heartbeatTick, TimeUnit.MILLISECONDS); } } private void startReconnectTask(URL url) {if(shouldReconnect(url)) { AbstractTimerTask.ChannelProvider cp = () -> Collections.singletonList(HeaderExchangeClient.this); // idleTimeout should be at least twice as much as heartBeat because the client may retry. int idleTimeout = getIdleTimeout(url); long heartbeatTimeoutTick = calculateLeastDuration(idleTimeout); this.reconnectTimerTask = new ReconnectTimerTask(cp, heartbeatTimeoutTick, idleTimeout); IDLE_CHECK_TIMER.newTimeout(reconnectTimerTask, heartbeatTimeoutTick, TimeUnit.MILLISECONDS); }}}Copy the code
  • The logic on the server side is similar to that on the client side, so I won’t go into details.
Summary of Dubbo’s heartbeat plan

For each connection, Dubbo starts two timers on both the client and server. One timer is used to periodically send heartbeat messages and the other timer is used to periodically reconnect or disconnect. Both timers are executed at one third of the detection interval. The heartbeat sending task is responsible for sending heartbeat packets to the peer when the connection is idle. The task of scheduled reconnection and disconnection checks whether the lastRead is not updated within the timeout period. If the lastRead is timeout, the client reconnects and the server disconnects.