“This is the 7th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”

Introduction

Shutdown? And it has to be graceful?

Graceful shutdown also goes by another name: “smooth exit.” If you ever build your own framework from scratch (reinvent the wheel), graceful shutdown is one of the first things you will have to learn.

When a term is hard to understand, it often helps to look at its opposite.

Let’s take a simple example:

  • Graceful shutdown: shut down your computer using the operating system’s shutdown function
  • Ungraceful shutdown: cut the power and restart

Some readers may ask: when my computer freezes, I just cut the power and restart it, and I have never seen any big problem.

If your computer is a server running Linux, a forced restart will probably lose some data. If it is a production environment, you had better be ready to run.

Why is that?

If you have ever tuned MySQL or Redis, you have probably come across the following parameters:

  • MySQL’s sync_binlog parameter, which controls how binlog data is persisted to the storage device
  • Redis’s appendfsync parameter, which controls how Redis persists the AOF log to the storage device

These parameters exist because persisting data to a storage device is slow. As an optimization, Linux first places data written to a file in the system’s page cache and waits for a suitable moment to flush it to the storage device in batches. Only when the process opens the file for direct I/O, which bypasses the cache, or calls fsync, which forces a flush, is the data written to the storage device right away. Data that is still sitting in the cache when the power is cut is lost.

So both MySQL and Redis expose tuning parameters for log persistence, handing the decision (and the blame) back to the programmer.

sync_binlog is typically set to 1, which means binlog data is persisted to the storage device every time a transaction is committed.
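
To make the cache-versus-fsync point concrete, here is a minimal Go sketch. It is not how MySQL or Redis implement persistence; appendDurably, demo, and the app.log file name are hypothetical names chosen for illustration. It relies only on os.File.Sync, which asks the operating system to flush the file to the storage device.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// appendDurably appends line to the file at path and then calls Sync,
// which asks the operating system to flush the file's cached data to
// the storage device before the function returns.
func appendDurably(path, line string) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	// WriteString alone normally leaves the data in the page cache.
	if _, err := f.WriteString(line); err != nil {
		f.Close()
		return err
	}
	// Sync is the step that trades speed for durability.
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo writes two records to a temporary log file and reads them back.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "fsync-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "app.log") // hypothetical log file name
	for _, rec := range []string{"tx-1 committed\n", "tx-2 committed\n"} {
		if err := appendDurably(path, rec); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Calling Sync after every record is loosely comparable to setting sync_binlog to 1: each record survives a power cut, at the cost of one flush per write.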

Graceful shutdown

The ungraceful example above shows what a graceful shutdown has to do:

  • Let the program complete its unfinished work (e.g. commit transactions, persist logs)

However, we need to add one more condition:

  • Once the program decides to shut down gracefully, it must stop accepting new requests.

If it keeps accepting new requests, the unfinished work never ends.
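
The two rules above (finish in-flight work, refuse new requests) can be sketched in Go roughly as follows. Gate, Enter, Leave, and ErrClosing are hypothetical names invented for this sketch, which makes no claim about how any particular server implements the idea.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrClosing is returned when a request arrives after shutdown has begun.
var ErrClosing = errors.New("server is shutting down")

// Gate tracks in-flight requests and rejects new ones once it is closing.
// It is a hypothetical type used only for illustration.
type Gate struct {
	mu       sync.Mutex
	closing  bool
	inFlight sync.WaitGroup
}

// Enter registers a new request, or refuses it if shutdown has started.
func (g *Gate) Enter() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.closing {
		return ErrClosing
	}
	g.inFlight.Add(1)
	return nil
}

// Leave marks a previously admitted request as finished.
func (g *Gate) Leave() { g.inFlight.Done() }

// Shutdown first stops admitting requests, then waits for the
// requests that were already admitted to finish.
func (g *Gate) Shutdown() {
	g.mu.Lock()
	g.closing = true
	g.mu.Unlock()
	g.inFlight.Wait()
}

func main() {
	var g Gate
	work := make(chan struct{})
	if err := g.Enter(); err == nil {
		go func() {
			defer g.Leave()
			<-work // simulate unfinished work
		}()
	}
	close(work)           // let the in-flight request complete
	g.Shutdown()          // returns only after the request has left
	fmt.Println(g.Enter()) // prints "server is shutting down"
}
```

The closing flag is set before waiting, so the number of in-flight requests can only go down while Shutdown waits.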

Graceful shutdown of thread pools

The thread pool (ThreadPoolExecutor) is a core component of the JDK’s concurrency package, so it is worth looking at how such a fundamental component handles graceful shutdown.

The class leaves the choice of whether to shut down gracefully to the programmer and provides two methods:

  • ThreadPoolExecutor.shutdown: sets the state of the thread pool to SHUTDOWN. No new tasks are accepted, but the pool’s threads are allowed to finish every task that has already been submitted.
  • ThreadPoolExecutor.shutdownNow: sets the state of the thread pool to STOP. No new tasks are accepted, all threads in the pool are interrupted immediately, and tasks that were submitted but have not started are not run (they are returned to the caller instead).
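
Since the rest of this article uses Go, here is a rough Go analogue of the two modes. It is a sketch, not a port of ThreadPoolExecutor: Pool, Submit, Wait, and ErrRejected are hypothetical names, and because Go has no way to interrupt a running goroutine, ShutdownNow here only discards tasks that have not started.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrRejected is returned by Submit once the pool has begun shutting down.
var ErrRejected = errors.New("pool is shut down")

const (
	running  = iota // accepting and running tasks
	shutdown        // no new tasks; queued tasks still run
	stopped         // no new tasks; queued tasks are discarded
)

// Pool is a small worker pool, hypothetical and for illustration only.
type Pool struct {
	mu    sync.Mutex
	cond  *sync.Cond
	queue []func()
	state int
	wg    sync.WaitGroup
}

func NewPool(workers int) *Pool {
	p := &Pool{}
	p.cond = sync.NewCond(&p.mu)
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go p.worker()
	}
	return p
}

func (p *Pool) worker() {
	defer p.wg.Done()
	for {
		p.mu.Lock()
		for len(p.queue) == 0 && p.state == running {
			p.cond.Wait() // idle: wait for a task or a state change
		}
		if p.state == stopped || len(p.queue) == 0 {
			p.mu.Unlock()
			return
		}
		task := p.queue[0]
		p.queue = p.queue[1:]
		p.mu.Unlock()
		task()
	}
}

// Submit queues a task unless shutdown has already started.
func (p *Pool) Submit(task func()) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.state != running {
		return ErrRejected
	}
	p.queue = append(p.queue, task)
	p.cond.Signal()
	return nil
}

// Shutdown rejects new tasks but lets the workers drain the queue.
func (p *Pool) Shutdown() {
	p.mu.Lock()
	if p.state == running {
		p.state = shutdown
	}
	p.mu.Unlock()
	p.cond.Broadcast()
}

// ShutdownNow rejects new tasks and returns the tasks that never started.
func (p *Pool) ShutdownNow() []func() {
	p.mu.Lock()
	p.state = stopped
	pending := p.queue
	p.queue = nil
	p.mu.Unlock()
	p.cond.Broadcast()
	return pending
}

// Wait blocks until every worker has exited.
func (p *Pool) Wait() { p.wg.Wait() }

func main() {
	p := NewPool(1)
	ran := 0 // touched only by the single worker, read after Wait
	for i := 0; i < 3; i++ {
		p.Submit(func() { ran++ })
	}
	p.Shutdown()
	p.Wait()
	fmt.Println(ran, p.Submit(func() {})) // prints "3 pool is shut down"
}
```

With Shutdown all three queued tasks still run. Replacing it with ShutdownNow would hand back whichever tasks had not been picked up yet.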

Shutting down processes gracefully

How do we shut down a process gracefully? First we need to work out when a process shuts down:

  • Voluntary exit (uncommon for a process that serves external requests)
  • The program crashes. For example, an exception thrown by a service is not handled, propagates to the outermost layer, and brings the process down
  • The process receives a termination signal from the operating system (e.g. Ctrl+C)

In enterprise applications a process usually contains more than business logic. Supporting services such as logging, MQ, and operations services grow up around it. So when the business logic hits a problem that would crash the process, we need to broadcast the shutdown to the other services and call the graceful shutdown method each of them provides, and exit the process only after all of these steps are done. For example, shutting down the log service gracefully ensures that logs are written to disk, and shutting down the MQ service gracefully ensures that messages are delivered or consumed.

If the process is killed outright (kill -9), there is no graceful shutdown to speak of, because SIGKILL cannot be caught or handled by the process.

We will use Go to show how to shut down a process gracefully. First we abstract the services inside the process so that their lifecycles can be managed: each service implements a Serve method and a Shutdown method.

type Service interface {
	Serve(ctx context.Context) error
	Shutdown() error
}

Next, we define a ServiceGroup to manage the lifecycle of the services. When any service fails, or when the process receives SIGINT (Ctrl+C) or SIGTERM (kill without arguments), ServiceGroup shuts down every managed service by calling its Shutdown method.

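// Package clause and imports for the whole program. In a real file they
// belong at the very top, above the Service interface.
package main

import (
	"context"
	"fmt"
	"math/rand"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

type ServiceGroup struct {
	ctx      context.Context
	cancel   context.CancelFunc
	services []Service
}

func NewServiceGroup(ctx context.Context) *ServiceGroup {
	g := ServiceGroup{}
	g.ctx, g.cancel = context.WithCancel(ctx)
	return &g
}

func (s *ServiceGroup) Add(service Service) {
	s.services = append(s.services, service)
}

// run calls Serve and converts a panic inside the service into an error.
func (s *ServiceGroup) run(service Service) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if e, ok := r.(error); ok {
				err = e
			} else {
				// The panic value is not necessarily an error.
				err = fmt.Errorf("panic: %v", r)
			}
		}
	}()
	return service.Serve(s.ctx)
}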

func (s *ServiceGroup) watchDog() {
	signalChan := make(chan os.Signal, 1)
	signal.Notify(signalChan, syscall.SIGINT, syscall.SIGTERM)
	defer signal.Stop(signalChan)

	select {
	case <-signalChan:
		// A system signal was received: tell every service to stop.
		s.cancel()
	case <-s.ctx.Done():
		// The context was cancelled, e.g. because a service failed.
	}

	// Give every service the chance to finish its outstanding work.
	for _, service := range s.services {
		if err := service.Shutdown(); err != nil {
			fmt.Printf("shutdown failed err: %s\n", err)
		}
	}
}

func (s *ServiceGroup) ServeAll() {
	var wg sync.WaitGroup
	for idx := range s.services {
		service := s.services[idx]
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := s.run(service); err != nil {
				fmt.Println("Abnormal service, exit process!")
				// Cancelling the context wakes up watchDog and the other services.
				s.cancel()
			}
		}()
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		s.watchDog()
	}()
	// Return only after every service and the watchdog have finished.
	wg.Wait()
}

Next, we define a business service that panics at random, and a logging service.

type BusinessService struct{}

// Serve simulates business logic that panics at random.
func (b *BusinessService) Serve(ctx context.Context) (err error) {
	times := 0
	for {
		fmt.Printf("Service running %d\n", times)
		select {
		case <-ctx.Done():
			fmt.Printf("BusinessService receive cancel signal\n")
			return
		default:
			if n := rand.Intn(256); n > 200 {
				panic(fmt.Errorf("random panic on %d", n))
			}
		}
		time.Sleep(time.Millisecond * time.Duration(rand.Intn(1000)))
		times++
	}
}

func (b *BusinessService) Shutdown() error {
	fmt.Println("Business services, shut down!")
	return nil
}

type LogService struct {
	mu     sync.Mutex // guards buffer: Serve and Shutdown run in different goroutines
	buffer []string
}

func (l *LogService) Serve(ctx context.Context) (err error) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
			// Post logs to message queues
			time.Sleep(time.Millisecond * time.Duration(rand.Intn(500)))
			l.mu.Lock()
			l.buffer = append(l.buffer, fmt.Sprintf("Time: %d", time.Now().Unix()))
			l.mu.Unlock()
		}
	}
}

func (l *LogService) Shutdown() (err error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	fmt.Printf("Log service, shut it down! There are [%d] logs to be sent \n", len(l.buffer))
	if len(l.buffer) == 0 {
		return
	}
	for _, log := range l.buffer {
		// Send logs or persist them to hard disks
		fmt.Printf("Send Log [%s]\n", log)
	}
	l.buffer = nil
	fmt.Println("Buffer log cleared.")
	return
}

Run it:

func main() {
	rand.Seed(time.Now().Unix())
	ctx := context.Background()
	g := NewServiceGroup(ctx)
	g.Add(&LogService{})
	g.Add(&BusinessService{})
	g.ServeAll()
}

The output of a run is as follows: [figure: program output]

The code above still leaves plenty of room for improvement, which readers can explore. For example, errgroup can be used to manage the services, and Shutdown can take a context so that shutdown itself is subject to a timeout.
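
As one possible take on the timeout idea, the sketch below gives Shutdown a context parameter. It changes the Service interface used above, so Stopper, slowService, and shutdownWithTimeout are hypothetical names, and the durations are arbitrary values picked for the demonstration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Stopper is a hypothetical variant of the article's Service interface
// whose Shutdown accepts a context, so the caller can bound the wait.
type Stopper interface {
	Shutdown(ctx context.Context) error
}

// slowService pretends to need `flush` time to finish its remaining work.
type slowService struct{ flush time.Duration }

func (s slowService) Shutdown(ctx context.Context) error {
	select {
	case <-time.After(s.flush): // the remaining work finished in time
		return nil
	case <-ctx.Done(): // the caller's deadline expired first
		return ctx.Err()
	}
}

// shutdownWithTimeout gives svc at most d to shut down.
func shutdownWithTimeout(svc Stopper, d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return svc.Shutdown(ctx)
}

// demo shuts down one service that is fast enough and one that is not.
// It returns the first error and whether the second attempt timed out.
func demo() (fastErr error, slowTimedOut bool) {
	fastErr = shutdownWithTimeout(slowService{10 * time.Millisecond}, time.Second)
	err := shutdownWithTimeout(slowService{time.Second}, 10*time.Millisecond)
	return fastErr, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fastErr, slowTimedOut := demo()
	fmt.Println(fastErr, slowTimedOut) // prints "<nil> true"
}
```

A caller that gets context.DeadlineExceeded back can then decide whether to log the failure and exit anyway, which keeps one stuck service from blocking the whole process forever.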

Conclusion

What is a graceful shutdown?

  • Let the program finish work that has been submitted but not yet completed
  • Stop accepting new requests