This is the sixth day of my participation in the August More Text Challenge.

Preface

Both microservices and ordinary API servers are processes. When a service is released it has to be restarted: the old process is killed and a new one is started. But if the old process is still handling connections, killing it by force can produce dirty data and interrupt the clients connected to it. To solve these problems, the service needs to be able to shut down or restart gracefully, without downtime.

The "Writing an RPC framework in practice" series of articles records my thoughts and summaries from writing the framework RAP.

1. Implementation of API services

My implementation logic borrows from Uvicorn, a server for Python asynchronous API services, so let's first look at how a common API service is published. A typical API service architecture looks like this:

The service is simple: the user accesses the Nginx server through the client, and Nginx forwards the request to Api Server 1 or Api Server 2 according to its UpStream configuration.

If we use kill -9 to kill the old process and start the new process during the update or restart of Api Server 1 or Api Server 2, the following problems will occur:

  • 1. A request that is being processed may have executed only half of its changes when the process dies, leaving errors or dirty data behind.
  • 2. An old request has not finished; if the server stops and exits, the client connection is interrupted.
  • 3. A new request is dispatched by Nginx before the service has finished restarting, so Nginx returns an error directly.

So how do you solve the problem?

1.2. Changing Nginx dynamically

When publishing, we usually use a rolling update: restart Api Server 1 first, then Api Server 2. The steps are as follows:

  • 1. Change Nginx's UpStream configuration so that traffic can only be sent to Api Server 2, then restart Api Server 1.
  • 2. After Api Server 1 has restarted, change Nginx's UpStream configuration so that traffic can only be sent to Api Server 1, then restart Api Server 2.
  • 3. After Api Server 2 has restarted, change Nginx's UpStream configuration so that traffic is sent to both Api Server 1 and Api Server 2.

As you can see, this process is tedious; even my description of it is repetitive, so we look for ways to automate it. For example, we can use the combination of Nginx + etcd + confd, but when the service needs to restart frequently this combination does not perform well. In that case we can turn to OpenResty + etcd or Nginx + etcd + Upsync. The general principle is the same: rely on etcd (or any other configuration center) to provide the configuration, let tools such as CI/CD control that configuration, and let confd, Upsync, or OpenResty's Lua scripts dynamically update the Nginx UpStream configuration whenever the configuration center changes.
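To make the self-registration half of this concrete, here is a minimal sketch of a service instance registering itself in etcd with a lease. It assumes the python-etcd3 client and a made-up key layout; confd, Upsync, or an OpenResty Lua script would then watch these keys and rewrite the Nginx UpStream block:

import atexit
import time

import etcd3  # assumes the python-etcd3 package

# Hypothetical key layout; whatever watches etcd must agree on it.
SERVICE_KEY = "/services/api_server/192.168.0.11:8000"

client = etcd3.client(host="127.0.0.1", port=2379)

# Register this instance with a short-lived lease so the key disappears
# automatically if the process is killed without cleaning up.
lease = client.lease(ttl=10)
client.put(SERVICE_KEY, '{"weight": 1}', lease=lease)

# Deregister explicitly on a normal exit so Nginx stops routing to us at once.
atexit.register(lambda: client.delete(SERVICE_KEY))

while True:
    lease.refresh()  # keep the lease alive while the service is healthy
    time.sleep(3)

The rolling update then becomes: delete the key (or let the lease expire), restart the process, and re-register; the UpStream configuration follows automatically.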

1.3. Exit gracefully

Restarting inevitably involves both starting and exiting. Starting is simple: the service just needs to register its information with the registry when it starts. Exiting, however, requires several more steps.

If you are familiar with Supervisor, you will know that it has a stopwaitsecs option, which is the maximum time (in seconds) it waits for a process to shut down. For the program to exit healthily, we need to set this parameter according to our business requirements, and it will come into play in the process's shutdown logic.

Supervisor shuts a process down by sending it SIGINT or SIGTERM. After receiving the signal, the process stops accepting new connections and waits for the existing connections to finish before exiting. Inevitably, though, there are special cases in which closing the connections takes too long. These cases are rare, but we cannot wait forever, so Supervisor waits for stopwaitsecs and then forces the process to shut down.
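The supervisor side of that contract can be sketched like this (a rough illustration of the idea, not Supervisor's actual implementation; pid and stopwaitsecs are placeholders):

import os
import signal
import time

def stop_process(pid: int, stopwaitsecs: float = 10.0) -> None:
    """Ask a process to exit gracefully, then force-kill it after the grace period."""
    os.kill(pid, signal.SIGTERM)  # ask the process to shut down gracefully
    deadline = time.monotonic() + stopwaitsecs
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks whether the process still exists
        except ProcessLookupError:
            return  # the process exited within the grace period
        time.sleep(0.5)
    os.kill(pid, signal.SIGKILL)  # grace period exceeded, force it to stop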

uvicorn.Server is the Uvicorn service class; it is only responsible for starting and stopping the service and is quite simple.

At the beginning of the uvicorn.Server code, the handled signals SIGINT and SIGTERM are defined; they are triggered by Ctrl+C and kill respectively:

# lines 35-38
HANDLED_SIGNALS = (
    signal.SIGINT,  # Unix signal 2. Sent by Ctrl+C.
    signal.SIGTERM,  # Unix signal 15. Sent by `kill <pid>`.
)

These signals are registered by install_signal_handlers in the serve method, which mounts the corresponding handler function handle_exit:

def handle_exit(self, sig: signal.Signals, frame: FrameType) -> None:
    # The first signal requests a graceful exit; a second signal forces an immediate exit.
    if self.should_exit:
        self.force_exit = True
    else:
        self.should_exit = True

This function is as simple as changing the should_exit and force_exit properties of the class.
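For reference, hooking such a handler onto the event loop can be written roughly like this (a simplified sketch of the idea, not Uvicorn's exact install_signal_handlers code; it assumes it is called from inside the running loop):

import asyncio
import signal

HANDLED_SIGNALS = (signal.SIGINT, signal.SIGTERM)

def install_signal_handlers(server) -> None:
    loop = asyncio.get_running_loop()
    for sig in HANDLED_SIGNALS:
        # On each signal, run server.handle_exit(sig, None) inside the event loop.
        loop.add_signal_handler(sig, server.handle_exit, sig, None)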

The serve method is called when the service starts, which means the signal handlers are listening from startup. After initialization, serve executes main_loop. This method uses on_tick to check the self.should_exit or self.force_exit status, depending on how the server was initialized. If the returned state is False the loop continues; once it becomes True the loop exits (of course on_tick also does other things that are not relevant to this analysis):

async def main_loop(self) -> None:
    counter = 0
    should_exit = await self.on_tick(counter)
    while not should_exit:
        counter += 1
        counter = counter % 864000
        # Poll roughly every 0.1s until on_tick reports that an exit was requested.
        await asyncio.sleep(0.1)
        should_exit = await self.on_tick(counter)

After main_loop completes, the shutdown method is executed:

async def shutdown(self, sockets: Optional[List[socket.socket]] = None) -> None:
    logger.info("Shutting down")

    # Stop accepting new connections.
    for server in self.servers:
        server.close()
    for sock in sockets or []:
        sock.close()
    for server in self.servers:
        await server.wait_closed()

    # Request shutdown on all existing connections.
    for connection in list(self.server_state.connections):
        connection.shutdown()
    await asyncio.sleep(0.1)

    # Wait for existing connections to finish sending responses.
    if self.server_state.connections and not self.force_exit:
        msg = "Waiting for connections to close. (CTRL+C to force quit)"
        logger.info(msg)
        while self.server_state.connections and not self.force_exit:
            await asyncio.sleep(0.1)
    # ------------------------------------------------------
    # The following is irrelevant to this analysis

It's clear from the comments that this method does a few things:

  • 1. Stop receiving new connections
  • 2. Call the method that closes the connection (this method does not close the connection immediately, but gradually waits for the request to complete)
  • 3. Wait for all requests to complete

The final integration of the whole process is shown as follows:

2. How to implement it

The logic described above can be integrated into the following diagram. The background control is CI/CD: when code is merged into the master branch, the CI/CD script controls the configuration center to change the configuration and restart the corresponding service processes:

However, when implementing the RPC framework I had to consider a scenario where multiple services call each other and the only third-party system they depend on is the configuration center, so the logic above needs to change. The logic diagram is as follows:

First, the client. The client provides a connection selector, which can be thought of as a simplified gateway that synchronizes the corresponding service information from the configuration center. When the client sends a request, the connection selector picks an available connection (i.e., adaptive load balancing) and sends the request through it to the corresponding server.
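A connection selector might look roughly like this (hypothetical names; the real rap implementation synchronizes from the configuration center and scores connections for adaptive load balancing):

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Connection:
    host: str
    port: int
    available: bool = True   # set to False when the server broadcasts that it is closing
    inflight: int = 0        # number of requests currently using this connection

class ConnectionSelector:
    def __init__(self) -> None:
        self._connections: Dict[str, Connection] = {}

    def sync_from_config_center(self, services: List[dict]) -> None:
        """Rebuild the connection table from the service list pushed by the configuration center."""
        self._connections = {
            f"{s['host']}:{s['port']}": Connection(s["host"], s["port"]) for s in services
        }

    def pick(self) -> Connection:
        """Pick the least-loaded available connection (a stand-in for adaptive load balancing)."""
        candidates = [c for c in self._connections.values() if c.available]
        if not candidates:
            raise ConnectionError("no available server connection")
        return min(candidates, key=lambda c: c.inflight)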

As for the server, at startup it registers an exit notification event, registers a signal-handling callback, registers its connection information with the configuration center, and then starts serving. At this point the client's connection selector can synchronize the server's information and establish a connection with it.

When the server process receives a SIGINT or SIGTERM signal, the callback is triggered: the listening socket stops accepting new requests, and the server broadcasts its impending shutdown to all client connections (so that clients that do not use the configuration center also know about it). It then waits until all in-flight requests have been processed, or until a timeout, before invoking the exit event to make the service exit. At this point the whole graceful-restart logic is complete. (The code is scattered and, like Uvicorn's, will not be posted here.)
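A rough outline of that server-side flow, under my own assumptions (a minimal asyncio sketch; the config-center calls and connection bookkeeping are hypothetical stand-ins for the real rap code):

import asyncio
import signal

class Server:
    """A minimal sketch of a graceful server lifecycle; not the actual rap code."""

    def __init__(self, host: str = "127.0.0.1", port: int = 9000, exit_timeout: float = 10.0) -> None:
        self.host, self.port, self.exit_timeout = host, port, exit_timeout
        self.connections: set = set()  # connections with in-flight requests

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        self._exit_event = asyncio.Event()  # the "exit notification event"
        # Register the signal callback so SIGINT/SIGTERM trigger a graceful exit.
        for sig in (signal.SIGINT, signal.SIGTERM):
            loop.add_signal_handler(sig, lambda: asyncio.create_task(self._graceful_exit()))
        self._server = await asyncio.start_server(self._handle_conn, self.host, self.port)
        await self._register_to_config_center()  # placeholder: e.g. write host:port into etcd
        await self._exit_event.wait()             # serve until the exit event fires

    async def _handle_conn(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        self.connections.add(writer)
        try:
            await reader.read(1024)               # placeholder for real request handling
        finally:
            self.connections.discard(writer)
            writer.close()

    async def _graceful_exit(self) -> None:
        self._server.close()                      # 1. stop accepting new requests
        await self._server.wait_closed()
        await self._deregister_from_config_center()
        # 2. a real implementation would broadcast the impending shutdown to its clients here
        # 3. wait for in-flight requests to finish, or give up after exit_timeout
        deadline = asyncio.get_running_loop().time() + self.exit_timeout
        while self.connections and asyncio.get_running_loop().time() < deadline:
            await asyncio.sleep(0.1)
        self._exit_event.set()                    # 4. fire the exit event so run() returns

    async def _register_to_config_center(self) -> None: ...      # placeholder
    async def _deregister_from_config_center(self) -> None: ...  # placeholder

Running it with asyncio.run(Server().run()) and then sending the process kill <pid> would walk through steps 1-4 above.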