This article is based on GRACEFULLY RESTARTING A GOLANG WEB SERVER. You can also get the annotated version of the code here. I split it up so it is easier to read.

The problem

Because Go is a compiled language, changing the configuration of a service written in Go means restarting the service, or even recompiling and redeploying it. If a large number of requests flood in during the restart, you can do little more than divert or block them. Either way, it is not elegant, so Slax0r and his team looked for a smoother, easier way to restart.

The original article is nicely typeset but contains relatively little text and explanation, so I hope to add some here.

The principle

The root of the problem is that we cannot have two services listening on the same port at the same time. The solution is to duplicate the file behind the current listener and pass the parameters and environment the new process needs directly from the old process to the new one over a socket. The new process starts, the old one stops, simple as that.
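To see the root problem for yourself, here is a minimal sketch (not part of the original code). The helper name secondListenFails and the loopback address are my own choices for illustration. It binds a port and then tries to bind the same address again while the first listener is still open.

```go
package main

import (
	"fmt"
	"net"
)

// secondListenFails binds a TCP port, then tries to bind the same address
// again while the first listener is still open. It reports whether the
// second attempt failed.
func secondListenFails() (bool, error) {
	// Port 0 asks the OS to pick any free port.
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		// Typically reported as "address already in use".
		return true, nil
	}
	second.Close()
	return false, nil
}

func main() {
	failed, err := secondListenFails()
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	fmt.Println("second listen failed:", failed)
}
```

This is why a restarted process cannot simply call net.Listen again while the old process is still serving: it has to inherit the socket that is already open.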

Background reading, in case the terms above are unfamiliar

Unix domain socket

Everything is a file

Try it first

Run the program, then open a new console while it is running and type kill -1 [process ID]. You should see the process restart gracefully.

Code outline

func main() {
	// main function: initial configuration, then call serve()
}

func serve() {
	// core run function
	getListener()   // 1. Obtain the listener
	start()         // 2. Start the server using the obtained listener
	waitForSignal() // 3. Listen for external signals to decide whether to fork or shut down
}

func getListener() {
	// get the listener for the port (created new on the first run)
}

func start() {
	// run the HTTP server
}

func waitForSignal() {
	for {
		// wait for an external signal:
		// 1. fork a child process
		// 2. shut down the process
	}
}

The above is a description of the code idea, and basically we are filling in the code around this outline.

Defining the structures

We abstract out two structures that describe the data shared across the program:

var cfg *srvCfg
type listener struct {
	// Listener address
	Addr string `json:"addr"`
	// Listener file descriptor
	FD int `json:"fd"`
	// Listener file name
	Filename string `json:"filename"`
}

type srvCfg struct {
	sockFile string
	addr string
	ln net.Listener
	shutDownTimeout time.Duration
	childTimeout time.Duration
}

listener describes our listener: its address, its file descriptor, and its file name. A file descriptor is simply the index of a file the process has open, a non-negative integer. When Linux creates a process, three files are open by default: stdin for standard input, stdout for standard output, and stderr for standard error. They occupy file descriptors 0, 1, and 2 respectively, so any file the process opens afterwards gets descriptor 3 or higher. The listener structure is the data that is passed between our processes.
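The descriptor numbering can be checked with a small sketch like the one below. It assumes a Unix-like system; stdFDs is a name I made up for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// stdFDs returns the descriptor numbers of stdin, stdout and stderr.
func stdFDs() [3]uintptr {
	return [3]uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd()}
}

func main() {
	fmt.Println("standard descriptors:", stdFDs())

	// A file opened later gets the lowest free descriptor, which is 3 or
	// higher as long as 0, 1 and 2 are still open. The exact number
	// depends on what else the process has open.
	f, err := os.CreateTemp("", "fd-demo-*")
	if err != nil {
		fmt.Println("cannot create temp file:", err)
		return
	}
	defer os.Remove(f.Name())
	defer f.Close()
	fmt.Println("temp file descriptor:", f.Fd())
}
```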

srvCfg is our global configuration: the socket file path, the address the service listens on, the listener object, the parent's shutdown timeout, and the timeout for waiting on the child. Since this is global configuration data, we declare it with var up front.

The entry point

Here is what our main looks like:

func main() {
	serve(srvCfg{
		sockFile:        "/tmp/api.sock",
		addr:            ":8000",
		shutDownTimeout: 5 * time.Second,
		childTimeout:    5 * time.Second,
	}, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`Hello, world!`))
	}))
}

func serve(config srvCfg, handler http.Handler) {
	cfg = &config
	var err error
	// get the TCP listener
	cfg.ln, err = getListener()
	if err != nil {
		panic(err)
	}

	// start the HTTP server and keep a handle to it
	srv := start(handler)

	// block here, waiting for signals
	err = waitForSignals(srv)
	if err != nil {
		panic(err)
	}
}

Very simple: we set up the configuration and register a handler that writes Hello, world! as the response.

The serve function follows the same outline as before, with error handling added.

Next, let’s look at the functions one by one…

Getting the listener

This is our getListener() function

func getListener() (net.Listener, error) {
	// On the first run there is no parent socket, so importListener fails
	ln, err := importListener()
	if err == nil {
		fmt.Printf("imported listener file descriptor for addr: %s\n", cfg.addr)
		return ln, nil
	}
	// So on the first run we fall through and create the listener
	ln, err = createListener()
	if err != nil {
		return nil, err
	}

	return ln, err
}

func importListener() (net.Listener, error) { ... }

func createListener() (net.Listener, error) {
	fmt.Println("Create a listener for the first time", cfg.addr)
	ln, err := net.Listen("tcp", cfg.addr)
	if err != nil {
		return nil, err
	}

	return ln, err
}

Because importListener cannot succeed on the first run (there is no parent to import from), we do not need to know how it is implemented for now. createListener simply creates and returns a new TCP listener.

And then there’s our start function

func start(handler http.Handler) *http.Server {
	srv := &http.Server{
		Addr: cfg.addr,
		Handler: handler,
	}
	// start to serve
	go srv.Serve(cfg.ln)
	fmt.Println("After the server is started, the configuration information is:",cfg.ln)
	return srv
}

As you can see, start takes a handler and runs an HTTP server in a goroutine, using the listener we obtained earlier.
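The key point in start is that http.Server.Serve accepts a listener that already exists instead of opening its own. Here is a small self-contained sketch of that idea; fetchHello and the loopback address are assumptions made for the demo and are not part of the original program.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
)

// fetchHello serves HTTP on an already-open listener, in a goroutine as
// start does, then requests "/" and returns the response body.
func fetchHello() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	srv := &http.Server{
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("Hello, world!"))
		}),
	}
	go srv.Serve(ln) // Serve blocks, so it runs in its own goroutine
	defer srv.Close()

	resp, err := http.Get("http://" + ln.Addr().String() + "/")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetchHello()
	fmt.Println(body, err)
}
```

Because the server does not care where its listener came from, the child process can later hand it an inherited listener in exactly the same way.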

Listening for signals

Signal handling is the entry point to the most important part of this article. Let us look at the code first:

func waitForSignals(srv *http.Server) error {
	sig := make(chan os.Signal, 1024)
	signal.Notify(sig, syscall.SIGTERM, syscall.SIGINT, syscall.SIGHUP)
	for {
		select {
		case s := <-sig:
			switch s {
			case syscall.SIGHUP:
				err := handleHangup() // restart: fork the child, then close
				if err == nil {
					// no error occurred - child spawned and started
					return shutdown(srv)
				}
			case syscall.SIGTERM, syscall.SIGINT:
				return shutdown(srv)
			}
		}
	}
}

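First, a channel is created to receive the signals the system sends to the program. For example, kill -1 [process ID] delivers signal 1 (SIGHUP) to the channel. Note that kill -9 sends SIGKILL, which a program can never catch, so it would not show up here. We use Notify to restrict which signals are delivered. Here we have: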

  • SIGTERM
  • SIGINT
  • SIGHUP

About signals

If you are not sure about the difference between these three signals, it is enough to know that by distinguishing them we let the process decide for itself how to react.
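To watch a signal arrive on a channel, here is a minimal sketch in which the process sends SIGHUP to itself. That is the programmatic equivalent of kill -1 from another console. It assumes a Unix-like system, and hangupDelivered is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// hangupDelivered registers for SIGHUP, sends that signal to the current
// process, and reports whether it arrived on the channel.
func hangupDelivered() (bool, error) {
	sig := make(chan os.Signal, 1) // buffered, because Notify never blocks on send
	signal.Notify(sig, syscall.SIGHUP)
	defer signal.Stop(sig)

	// Same effect as running `kill -1 <pid>` from another console.
	if err := syscall.Kill(os.Getpid(), syscall.SIGHUP); err != nil {
		return false, err
	}

	select {
	case s := <-sig:
		return s == syscall.SIGHUP, nil
	case <-time.After(2 * time.Second):
		return false, fmt.Errorf("no signal received")
	}
}

func main() {
	ok, err := hangupDelivered()
	fmt.Println("SIGHUP delivered:", ok, err)
}
```

Without the Notify call, the default action for SIGHUP would terminate the process, which is why the registration has to happen before the signal is sent.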

Then we start a loop that waits for signals from the system. When the signal is syscall.SIGHUP, we restart the process. When it is syscall.SIGTERM or syscall.SIGINT, we shut the process down directly.
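The shutdown function is called above but never shown in this article. The sketch below is my assumption of what it could look like, built on http.Server.Shutdown with a timeout. The real version presumably reads cfg.shutDownTimeout; here the timeout is a parameter so the example stands alone, and serveAndShutdown exists only to exercise it.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// shutdown is a hypothetical stand-in for the article's shutdown(srv).
// It stops accepting new connections and waits up to timeout for
// in-flight requests to finish.
func shutdown(srv *http.Server, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return srv.Shutdown(ctx)
}

// serveAndShutdown starts a server, shuts it down gracefully, and returns
// nil when both sides ended the way a clean shutdown should.
func serveAndShutdown() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	srv := &http.Server{Handler: http.NotFoundHandler()}

	done := make(chan error, 1)
	go func() { done <- srv.Serve(ln) }()

	if err := shutdown(srv, 5*time.Second); err != nil {
		return err
	}
	// After Shutdown, Serve returns http.ErrServerClosed.
	if err := <-done; err != http.ErrServerClosed {
		return fmt.Errorf("unexpected Serve result: %v", err)
	}
	return nil
}

func main() {
	fmt.Println("clean shutdown error:", serveAndShutdown())
}
```

Using Shutdown rather than Close is what makes the restart graceful: requests already in progress in the old process are allowed to finish while the new process takes the new ones.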

So next we look at what happens inside handleHangup.

A conversation between father and son

A graceful restart can be seen as a pleasant conversation between father and son. The father opens a hotline to the son and uses it to tell him about the port he is currently listening on. Once the son has received the information he needs, he takes over as a brand-new process and tells the father, who then retires.

func handleHangup() error {
	c := make(chan string)
	defer close(c)
	errChn := make(chan error)
	defer close(errChn)
	// Open a hotline channel
	go socketListener(c, errChn)

	for {
		select {
		case cmd := <-c:
			switch cmd {
			case "socket_opened":
				p, err := fork()
				if err != nil {
					fmt.Printf("unable to fork: %v\n", err)
					continue
				}
				fmt.Printf("forked (PID: %d), waiting for spinup\n", p.Pid)

			case "listener_sent":
				fmt.Println("listener sent - shutting down")

				return nil
			}

		case err := <-errChn:
			return err
		}
	}
}

socketListener opens a new Unix socket, listens on it, and reacts to what happens. Put simply, there are only two cases:

  1. The socket is open, which means we can fork. The son will come to pick up the father's message.
  2. Dad has passed the listener file on to his son. Dad's job is done.

handleHangup packs in quite a lot, but don't panic, we will go through it piece by piece. First, socketListener:

func socketListener(chn chan<- string, errChn chan<- error) {
	// Create socket server
	fmt.Println("Create a new socket channel")
	ln, err := net.Listen("unix", cfg.sockFile)
	if err != nil {
		errChn <- err
		return
	}
	defer ln.Close()

	// signal that we created a socket
	fmt.Println("Channel is open, ready to fork.")
	chn <- "socket_opened"

	// accept
	// block and wait for the child to connect
	c, err := acceptConn(ln)
	if err != nil {
		errChn <- err
		return
	}
	defer c.Close()

	// read from the socket
	buf := make([]byte, 512)
	nr, err := c.Read(buf)
	if err != nil {
		errChn <- err
		return
	}

	data := buf[0:nr]
	fmt.Println("Message received from child process:", string(data))
	switch string(data) {
	case "get_listener":
		fmt.Println("The child process requests the listener info. Start sending it.")
		err := sendListener(c) // Send the file description to the new child process, which uses it to import the listener
		if err != nil {
			errChn <- err
			return
		}
		// The transmission is complete
		fmt.Println("The listener info has been sent.")
		chn <- "listener_sent"
	}
}

socketListener creates the Unix socket and then sends socket_opened, which the "socket_opened" case in handleHangup reacts to. Meanwhile, socketListener blocks in acceptConn, waiting for the new process to connect and ask for the original listener's file information. handleHangup is only told listener_sent once that information has been sent.
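The request and reply over the Unix socket can be tried in isolation. The sketch below plays both roles in one process: a goroutine acts as the parent and the calling function acts as the child. The temporary socket path, the handshake name, and the hard-coded JSON reply are assumptions made for the demo.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// handshake runs both ends of the exchange: the "parent" listens on a
// Unix socket and answers "get_listener", the "child" dials, sends the
// request and returns the reply it reads.
func handshake() (string, error) {
	dir, err := os.MkdirTemp("", "sock-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	sockFile := filepath.Join(dir, "api.sock")

	ln, err := net.Listen("unix", sockFile)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	errc := make(chan error, 1)
	go func() { // parent side
		c, err := ln.Accept()
		if err != nil {
			errc <- err
			return
		}
		defer c.Close()
		buf := make([]byte, 512)
		n, err := c.Read(buf)
		if err != nil {
			errc <- err
			return
		}
		if string(buf[:n]) != "get_listener" {
			errc <- fmt.Errorf("unexpected request %q", buf[:n])
			return
		}
		_, err = c.Write([]byte(`{"addr":":8000","fd":3}`))
		errc <- err
	}()

	// child side
	c, err := net.Dial("unix", sockFile)
	if err != nil {
		return "", err
	}
	defer c.Close()
	if _, err := c.Write([]byte("get_listener")); err != nil {
		return "", err
	}
	buf := make([]byte, 1024)
	n, err := c.Read(buf)
	if err != nil {
		return "", err
	}
	if err := <-errc; err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	reply, err := handshake()
	fmt.Println(reply, err)
}
```

Note that only a description of the listener travels over this socket. The listening socket itself reaches the child by inheritance when the child is started.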

Here is the code for acceptConn. There is no complicated logic in it: it waits for the child process to connect and handles timeouts and errors.

func acceptConn(l net.Listener) (c net.Conn, err error) {
	// buffered so the goroutine can still exit after a timeout
	chn := make(chan error, 1)
	go func() {
		defer close(chn)
		fmt.Printf("Accept new connection %+v\n", l)
		c, err = l.Accept()
		if err != nil {
			chn <- err
		}
	}()

	select {
	case err = <-chn:
		if err != nil {
			fmt.Printf("error occurred when accepting socket connection: %v\n",
				err)
		}

	case <-time.After(cfg.childTimeout):
		fmt.Println("timeout occurred waiting for connection from child")
		// return an error so the caller never uses a nil connection
		err = fmt.Errorf("timed out waiting for connection from child")
	}
	return
}
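As an aside, the same timeout can be had without an extra goroutine, because *net.TCPListener and *net.UnixListener both offer SetDeadline. This is an alternative sketch, not what the original code does. acceptWithTimeout and timesOut are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// acceptWithTimeout waits for one connection and gives up after timeout.
// It relies on SetDeadline, which TCP and Unix listeners provide.
func acceptWithTimeout(ln net.Listener, timeout time.Duration) (net.Conn, error) {
	type deadliner interface {
		SetDeadline(time.Time) error
	}
	d, ok := ln.(deadliner)
	if !ok {
		return nil, fmt.Errorf("listener %T does not support deadlines", ln)
	}
	if err := d.SetDeadline(time.Now().Add(timeout)); err != nil {
		return nil, err
	}
	defer d.SetDeadline(time.Time{}) // clear the deadline afterwards
	return ln.Accept()
}

// timesOut reports whether accepting on an idle listener ends with a
// timeout error.
func timesOut() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	_, err = acceptWithTimeout(ln, 50*time.Millisecond)
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout(), nil
}

func main() {
	ok, err := timesOut()
	fmt.Println("accept timed out:", ok, err)
}
```

With this approach a timeout always comes back as a non-nil error, so the caller cannot accidentally read from a connection that was never established.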

Remember the listener structure we defined earlier? This is where it comes in handy:

func sendListener(c net.Conn) error {
	fmt.Printf("Send the old listener file %+v\n", cfg.ln)
	lnFile, err := getListenerFile(cfg.ln)
	if err != nil {
		return err
	}
	defer lnFile.Close()

	l := listener{
		Addr: cfg.addr,
		// The child inherits stdin (0), stdout (1) and stderr (2) first,
		// so the listener file is descriptor 3
		FD:       3,
		Filename: lnFile.Name(),
	}

	lnEnv, err := json.Marshal(l)
	if err != nil {
		return err
	}
	fmt.Printf("Write %+v to connection\n", string(lnEnv))
	_, err = c.Write(lnEnv)
	if err != nil {
		return err
	}

	return nil
}

func getListenerFile(ln net.Listener) (*os.File, error) {
	switch t := ln.(type) {
	case *net.TCPListener:
		return t.File()
	case *net.UnixListener:
		return t.File()
	}

	return nil, fmt.Errorf("unsupported listener: %T", ln)
}

sendListener first makes a copy of the file behind the TCP listener we are using (everything is a file), fills the listener structure with the necessary information, serializes it to JSON, and sends it to the new child process over the Unix socket.
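The serialization step is easy to check on its own. The sketch below repeats the listener struct from earlier and round-trips it through JSON. The file name "example-listener" is a made-up placeholder, not what lnFile.Name() would actually return.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listener mirrors the struct defined earlier in the article.
type listener struct {
	Addr     string `json:"addr"`
	FD       int    `json:"fd"`
	Filename string `json:"filename"`
}

// roundTrip encodes a listener the way the parent does and decodes it the
// way the child does, returning both the wire form and the decoded value.
func roundTrip(in listener) (string, listener, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return "", listener{}, err
	}
	var out listener
	err = json.Unmarshal(data, &out)
	return string(data), out, err
}

func main() {
	wire, got, err := roundTrip(listener{Addr: ":8000", FD: 3, Filename: "example-listener"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(wire) // {"addr":":8000","fd":3,"filename":"example-listener"}
	fmt.Printf("%+v\n", got)
}
```

The struct tags are what give the wire format its lowercase keys, so parent and child must be built from the same struct definition, which they are, since the child is the same executable.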

Having said that, we skipped the creation of the child process. Now let’s look at fork, which is also a big part of the process:

func fork() (*os.Process, error) {
	// Get the original listener file so it can be handed to the child
	lnFile, err := getListenerFile(cfg.ln)
	if err != nil {
		return nil, err
	}
	defer lnFile.Close()
	fmt.Printf("Got the listener file %+v, start creating a new process\n", lnFile.Name())

	// Files the child must inherit. The slice index becomes the
	// descriptor number in the child, so lnFile is descriptor 3
	files := []*os.File{
		os.Stdin,
		os.Stdout,
		os.Stderr,
		lnFile,
	}

	// Get the program name of the new process. Since we are restarting, this is the name of the program currently running
	execName, err := os.Executable()
	if err != nil {
		return nil, err
	}
	execDir := filepath.Dir(execName)

	// The baby is born
	p, err := os.StartProcess(execName, []string{execName}, &os.ProcAttr{
		Dir:   execDir,
		Files: files,
		Sys:   &syscall.SysProcAttr{},
	})
	if err != nil {
		return nil, err
	}
	fmt.Println("Child process created successfully")
	// the parent process will be shut down if this returns a nil error
	return p, nil
}

Once StartProcess runs, the child process begins executing from the very start, that is, from main. This time the importListener call inside getListener has something to import:

func importListener() (net.Listener, error) {
	// Connect to the Unix socket that the parent process set up earlier
	c, err := net.Dial("unix", cfg.sockFile)
	if err != nil {
		fmt.Println("no unix socket now")
		return nil, err
	}
	defer c.Close()
	fmt.Println("Ready to import the original listener file...")
	var lnEnv string
	wg := sync.WaitGroup{}
	wg.Add(1)
	go func(r io.Reader) {
		defer wg.Done()
		// Read the contents of conn
		buf := make([]byte, 1024)
		n, err := r.Read(buf[:])
		if err != nil {
			return
		}

		lnEnv = string(buf[0:n])
	}(c)
	// write get_listener
	fmt.Println("Tell dad I want 'get_listener'.")
	_, err = c.Write([]byte("get_listener"))
	if err != nil {
		return nil, err
	}

	wg.Wait() // Wait for dad to send us the parameters

	if lnEnv == "" {
		return nil, fmt.Errorf("Listener info not received from socket")
	}

	var l listener
	err = json.Unmarshal([]byte(lnEnv), &l)
	if err != nil {
		return nil, err
	}
	if l.Addr != cfg.addr {
		return nil, fmt.Errorf("unable to find listener for %v", cfg.addr)
	}

	// the file has already been passed to this process, extract the file
	// descriptor and name from the metadata to rebuild/find the *os.File for
	// the listener.
	// We now know which descriptor the listener file is on, so we wrap it in an *os.File and use it
	lnFile := os.NewFile(uintptr(l.FD), l.Filename)
	fmt.Println("New filename:", l.Filename)
	if lnFile == nil {
		return nil, fmt.Errorf("unable to create listener file: %v", l.Filename)
	}
	defer lnFile.Close()

	// create a listener with the *os.File
	ln, err := net.FileListener(lnFile)
	if err != nil {
		return nil, err
	}

	return ln, nil
}

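importListener only succeeds once the parent process has created the Unix socket, because net.Dial needs something to connect to. On a normal first start there is no socket, so it fails and getListener falls back to createListener.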
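The hand-over itself can be demonstrated inside a single process, without forking. In the sketch below, the *os.File returned by File() plays the role of the inherited descriptor, which the child would instead obtain with os.NewFile(3, name). handOver is a hypothetical name and a Unix-like system is assumed.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// handOver rebuilds a listener from the file of an existing one, closes
// the original, and shows that the rebuilt listener still accepts
// connections on the same port. It returns the bytes the client sent.
func handOver() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	tcpLn, ok := ln.(*net.TCPListener)
	if !ok {
		ln.Close()
		return "", fmt.Errorf("unexpected listener type %T", ln)
	}

	f, err := tcpLn.File() // duplicate descriptor for the listening socket
	if err != nil {
		ln.Close()
		return "", err
	}
	inherited, err := net.FileListener(f) // rebuild a listener from the file
	f.Close()                             // FileListener works on its own copy
	ln.Close()                            // the "old process" stops listening
	if err != nil {
		return "", err
	}
	defer inherited.Close()

	client, err := net.Dial("tcp", inherited.Addr().String())
	if err != nil {
		return "", err
	}
	defer client.Close()
	if _, err := client.Write([]byte("ping")); err != nil {
		return "", err
	}

	conn, err := inherited.Accept()
	if err != nil {
		return "", err
	}
	defer conn.Close()
	buf := make([]byte, 4)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	msg, err := handOver()
	fmt.Println(msg, err)
}
```

The socket stays open as long as at least one descriptor refers to it, which is why the port never stops accepting connections while the parent and child overlap.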

At this point, the child process starts a new round of listening, and the service...

The end

It is a small amount of code, but it delivers a simple and elegant approach to restarting, and some of it takes a bit of practice to understand (for a novice like me). There are plenty of other ways to restart gracefully described online, so search for them. I hope my brief explanation above helps. Please point out any mistakes and I will correct them.
