During a server update or restart, if we simply use kill -9 to kill the old process and then start the new one, the following problems may occur:

  1. In-flight requests have not completed; if the server process exits immediately, client connections are broken (the client receives an RST)
  2. New requests arrive before the restarted service is ready, and fail with connection refused
  3. Even when the program really does have to exit, a plain kill -9 still interrupts the requests being processed

During the restart, the service is unavailable to users for a period of time. Shutting a service down abruptly may also corrupt the stateful services, such as databases, that the business depends on.

Therefore, when restarting or redeploying a service, we should switch seamlessly between the old and new instances and guarantee zero downtime for the service being changed.

As a microservices framework, how does Go-Zero help developers exit gracefully? Let’s see.

Graceful exit

Before we can implement a graceful restart, the first problem to solve is how to exit gracefully:

For an HTTP service, the general idea is to close the listening fd, finish the requests already accepted without taking any new ones, and then exit.

Go's native net/http package provides Server.Shutdown(). Let’s see how it works:

  1. Set the inShutdown flag
  2. Close the listeners to make sure no new requests come in
  3. Wait for all active connections to become idle
  4. Return from the function, done

Here’s what each of these steps means:

inShutdown

func (srv *Server) ListenAndServe() error {
    if srv.shuttingDown() {
        return ErrServerClosed
    }
    ....
    // Actually listen on the port and create a Listener
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    // Do the actual request handling, injecting the listener
    return srv.Serve(tcpKeepAliveListener{ln.(*net.TCPListener)})
}

func (s *Server) shuttingDown() bool {
    return atomic.LoadInt32(&s.inShutdown) != 0
}

ListenAndServe is the function you must call to start the HTTP server. The first thing it does is check whether the Server has been shut down.

inShutdown is an atomic variable; a non-zero value means the server is shutting down.
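The flag check can be reproduced outside net/http in a few lines. The following is a minimal sketch, not the standard library's code: miniServer, start, shutdown and errClosed are hypothetical names invented for illustration, and only the atomic load/store pattern mirrors the excerpt above.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errClosed plays the role of http.ErrServerClosed in this sketch.
var errClosed = errors.New("server closed")

// miniServer is a hypothetical stand-in for http.Server.
type miniServer struct {
	inShutdown int32 // accessed atomically; non-zero means shutting down
}

func (s *miniServer) shuttingDown() bool {
	return atomic.LoadInt32(&s.inShutdown) != 0
}

// shutdown sets the flag; later calls to start see it and refuse to run.
func (s *miniServer) shutdown() {
	atomic.StoreInt32(&s.inShutdown, 1)
}

// start mirrors the guard at the top of ListenAndServe.
func (s *miniServer) start() error {
	if s.shuttingDown() {
		return errClosed
	}
	return nil // a real server would begin listening here
}

func main() {
	s := &miniServer{}
	fmt.Println(s.start()) // <nil>
	s.shutdown()
	fmt.Println(s.start()) // server closed
}
```

Because the flag is read and written atomically, the goroutine that calls shutdown and the goroutine that calls start need no extra lock to agree on the state.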

listeners

func (srv *Server) Serve(l net.Listener) error {
    ...
    // Add the injected listener to the internal map,
    // which makes it easy to control requests arriving through this listener later
    if !srv.trackListener(&l, true) {
        return ErrServerClosed
    }
    defer srv.trackListener(&l, false)
    ...
}

Serve registers each listener in an internal listeners map. Shutdown reads the listeners straight from that map and calls Close() on each of them. Once the listening sockets are closed, no new connections can be established, so no new requests come in.
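To make the bookkeeping concrete, here is a simplified sketch of the same idea. The tracker type and its methods are hypothetical and much smaller than what net/http does internally; the sketch only shows that a listener closed through the map stops accepting new connections.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// tracker is a hypothetical, simplified version of the listener bookkeeping.
type tracker struct {
	mu        sync.Mutex
	listeners map[net.Listener]struct{}
}

// add registers a listener so that it can be closed later.
func (t *tracker) add(ln net.Listener) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.listeners == nil {
		t.listeners = make(map[net.Listener]struct{})
	}
	t.listeners[ln] = struct{}{}
}

// closeAll closes every tracked listener and returns the first error seen.
func (t *tracker) closeAll() error {
	t.mu.Lock()
	defer t.mu.Unlock()
	var first error
	for ln := range t.listeners {
		if err := ln.Close(); err != nil && first == nil {
			first = err
		}
		delete(t.listeners, ln)
	}
	return first
}

// dialRefusedAfterClose reports whether a new connection is rejected
// once the tracked listener has been closed.
func dialRefusedAfterClose() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS pick
	if err != nil {
		return false, err
	}
	addr := ln.Addr().String()

	t := &tracker{}
	t.add(ln)
	if err := t.closeAll(); err != nil {
		return false, err
	}

	conn, err := net.Dial("tcp", addr)
	if err == nil {
		conn.Close()
	}
	return err != nil, nil
}

func main() {
	refused, err := dialRefusedAfterClose()
	fmt.Println(refused, err)
}
```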

closeIdleConns

Simply put: it closes the connections recorded in the Server that have become idle. Shutdown calls it repeatedly until no active connections remain, and then returns.

Shutdown

func (srv *Server) Serve(l net.Listener) error {
    ...
    for {
        rw, err := l.Accept()
        // Accept returns an error here because the listener is already closed
        if err != nil {
            select {
            // Another flag: doneChan
            case <-srv.getDoneChan():
                return ErrServerClosed
            default:
            }
        }
    }
}

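Shutdown closes the doneChan channel when it closes the listeners, so when Accept fails, the channel returned by getDoneChan is already closed and Serve returns ErrServerClosed.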
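The same pattern works for any accept loop. The sketch below is an illustration under assumptions, not the net/http source: serve, runAndStop and errServerClosed are hypothetical names, and the done channel is closed before the listener so that the failed Accept always observes it.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

var errServerClosed = errors.New("server closed")

// serve accepts connections until Accept fails. If done is already closed,
// the failure is an expected shutdown and is reported as errServerClosed.
func serve(ln net.Listener, done <-chan struct{}) error {
	for {
		conn, err := ln.Accept()
		if err != nil {
			select {
			case <-done:
				return errServerClosed
			default:
				return err // an unexpected accept error
			}
		}
		conn.Close() // a real server would hand conn to a handler goroutine
	}
}

// runAndStop starts serve in a goroutine and then stops it:
// it marks the shutdown first and closes the listener second.
func runAndStop() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	done := make(chan struct{})
	result := make(chan error, 1)
	go func() { result <- serve(ln, done) }()

	close(done)
	ln.Close()
	return <-result
}

func main() {
	fmt.Println(runAndStop()) // server closed
}
```

Without the done channel, the caller could not tell a deliberate shutdown apart from a real accept failure.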

To summarize: Shutdown can gracefully terminate a service without breaking connections that are still active.
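A small experiment shows this behaviour with the real net/http API. In the sketch below, the /slow route, the 200 ms delay, the 5 s timeout and the demoShutdown helper are all arbitrary choices made for illustration. Shutdown is called while a request is still being handled, and the client should still receive the complete response.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// demoShutdown starts a server with a slow handler, calls Shutdown while a
// request is in flight, and returns the body the client received.
func demoShutdown() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	started := make(chan struct{})
	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		close(started)                     // the request is now in flight
		time.Sleep(200 * time.Millisecond) // simulate work
		fmt.Fprint(w, "done")
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown is called

	type result struct {
		body string
		err  error
	}
	resCh := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
		if err != nil {
			resCh <- result{err: err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		resCh <- result{body: string(b), err: err}
	}()

	<-started
	// Give in-flight requests up to 5 seconds to finish.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return "", err
	}
	res := <-resCh
	return res.body, res.err
}

func main() {
	body, err := demoShutdown()
	fmt.Println(body, err)
}
```

The context passed to Shutdown bounds the wait: if the handlers do not finish in time, Shutdown returns the context's error and the caller decides whether to force the exit.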

But once the service is running, how does the program know that it is being stopped? How is the program notified, so that it can call Shutdown in response? Next, let's look at system signal notification.

Service disruption

This is where the signals provided by the OS come in. In native Go, the Notify function of the signal package provides system signal notification.

Github.com/tal-tech/go…

func init() {
  go func() {
    var profiler Stopper
    
    signals := make(chan os.Signal, 1)
    signal.Notify(signals, syscall.SIGUSR1, syscall.SIGUSR2, syscall.SIGTERM)

    for {
      v := <-signals
      switch v {
      case syscall.SIGUSR1:
        dumpGoroutines()
      case syscall.SIGUSR2:
        if profiler == nil {
          profiler = StartProfile()
        } else {
          profiler.Stop()
          profiler = nil
        }
      case syscall.SIGTERM:
        // Where graceful closure is being performed
        gracefulStop(signals)
      default:
        logx.Error("Got unregistered signal:", v)
      }
    }
  }()
}
  • SIGUSR1 -> dumps the state of the goroutines, which is useful for error analysis

  • SIGUSR2 -> starts or stops profiling, so you control how long profiling runs

  • SIGTERM -> actually triggers gracefulStop, the graceful shutdown

The gracefulStop process is as follows:

  1. Stop listening for signals; the process is about to exit, so there is no need to keep listening
  2. wrap up: finish the requests currently being served and release resources
  3. time.Sleep(): wait for the resource cleanup to complete before the shutdown step
  4. shutdown: notify listeners that the process is exiting
  5. If the main goroutine has still not exited, actively send SIGKILL to force the process to exit

In this way, the service stops accepting new requests, waits for active requests to finish and for resources (database connections and so on) to be closed, and is forced to exit if that takes too long.
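The steps above can be sketched as follows. This is an illustration under assumptions, not go-zero's implementation: the stopper type, its fields and the 50 ms pause are hypothetical. The sketch assumes a Unix-like system, because it sends SIGTERM to its own process to simulate the orchestrator. Step 5 is only described in a comment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/signal"
	"strings"
	"syscall"
	"time"
)

// stopper is a hypothetical helper for this sketch, not go-zero's API.
type stopper struct {
	wrapUpTime time.Duration // pause between wrap-up and shutdown
	wrapUp     []func()      // e.g. finish in-flight requests
	shutdown   []func()      // e.g. close database connections
}

// gracefulStop runs steps 1 to 4 of the list above in order.
func (s *stopper) gracefulStop(signals chan os.Signal) {
	signal.Stop(signals) // 1. stop listening for further signals
	for _, fn := range s.wrapUp {
		fn() // 2. wrap up
	}
	time.Sleep(s.wrapUpTime) // 3. give the wrap-up work time to finish
	for _, fn := range s.shutdown {
		fn() // 4. shutdown
	}
}

// simulateTerm sends SIGTERM to the current process, as an orchestrator
// would, and returns the order in which the callbacks ran.
func simulateTerm() (string, error) {
	signals := make(chan os.Signal, 1)
	signal.Notify(signals, syscall.SIGTERM)

	var order []string
	s := &stopper{
		wrapUpTime: 50 * time.Millisecond,
		wrapUp:     []func(){func() { order = append(order, "wrap-up") }},
		shutdown:   []func(){func() { order = append(order, "shutdown") }},
	}

	if err := syscall.Kill(syscall.Getpid(), syscall.SIGTERM); err != nil {
		return "", err
	}
	select {
	case <-signals:
		s.gracefulStop(signals)
	case <-time.After(2 * time.Second):
		signal.Stop(signals)
		return "", errors.New("no SIGTERM received")
	}
	return strings.Join(order, ","), nil
}

func main() {
	order, err := simulateTerm()
	fmt.Println(order, err)
	// 5. A real service would also arm a timer and force the process to
	// exit if it is still alive when the timer fires.
}
```

Splitting the callbacks into two groups lets request-facing components drain first, while shared resources such as database connections stay open until the drain has had time to finish.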

The whole process

Today our Go programs all run in Docker containers, so during a release K8S sends a SIGTERM signal to the container, and the program inside the container receives the signal and starts executing Shutdown.

That covers the whole graceful shutdown process.

A smooth restart, however, also depends on K8S. The basic process is as follows:

  • Before the old pod exits, the new pod is started
  • The old pod continues processing the requests it has already accepted and accepts no new ones
  • The new pod receives and handles new requests
  • The old pod exits

If the new pod fails to start, the old pod keeps serving, so the service currently online is not affected.

The project address

Github.com/tal-tech/go…

You are welcome to use Go-Zero and to star the project to support us!

Wechat communication group

Follow the “micro-service Practice” official account and click on the exchange group entry to get the QR code of the community group.