One. Graceful startup

  1. What is startup warm-up

    Startup warm-up means that a service that has just started does not take its full share of traffic immediately. Instead, the number of calls it receives increases gradually over time until, after a warm-up period, traffic reaches the normal level.

  2. How to implement it

    First, the caller needs to know when the service provider started. There are two ways to obtain the startup time: either the service provider actively sends its startup time to the registry when it starts, or the registry uses the time at which the provider's registration request arrived as the startup time. The two values will differ slightly, but this does not matter because the warm-up period is only approximate: even a 1-minute difference between machine nodes has no real effect. In production environments, NTP time synchronization is also typically enabled to keep machine clocks consistent.

    Through service discovery, the caller obtains not only the list of provider IP addresses but also each provider's startup time. Using a weight-based load balancing strategy, the caller adjusts the weights dynamically, gradually increasing the number of calls sent to a newly started provider over time.

    With this mechanism, a newly started provider is given a lower weight, which reduces its probability of being selected by load balancing and keeps the application from running under high load right after startup. In this way, the service provider can warm up after startup.
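
    To make the effect of a reduced weight concrete, here is a minimal sketch of weight-based random selection. It is not taken from any framework; the class and method names (WeightedPickSketch, pick, pickRandom) are hypothetical, and the weights are assumed to be positive integers.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch only; names are hypothetical, not a framework API.
final class WeightedPickSketch {

    // Returns the index of the node whose weight range contains `offset`,
    // where 0 <= offset < sum(weights).
    static int pick(int[] weights, int offset) {
        for (int i = 0; i < weights.length; i++) {
            offset -= weights[i];
            if (offset < 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset must be smaller than the total weight");
    }

    // Picks a node at random, with probability proportional to its weight.
    // Assumes at least one weight is positive.
    static int pickRandom(int[] weights) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        return pick(weights, ThreadLocalRandom.current().nextInt(total));
    }
}
```

    With weights {100, 100, 10}, the third node (still warming up) owns only 10 of the 210 offsets, so it receives roughly 5% of the calls until its weight grows.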

    The Dubbo framework also provides a "warmup" feature. The core source code is in com.alibaba.dubbo.rpc.cluster.loadbalance.AbstractLoadBalance.java:

    protected int getWeight(Invoker<?> invoker, Invocation invocation) {
        // Get the provider's configured weight first
        int weight = invoker.getUrl().getMethodParameter(invocation.getMethodName(), Constants.WEIGHT_KEY, Constants.DEFAULT_WEIGHT);
        if (weight > 0) {
            // Get the startup timestamp of the provider
            long timestamp = invoker.getUrl().getParameter(Constants.REMOTE_TIMESTAMP_KEY, 0L);
            if (timestamp > 0L) {
                // How long the provider has been running, in milliseconds
                int uptime = (int) (System.currentTimeMillis() - timestamp);
                // Get the warm-up time; the default is 10 minutes
                int warmup = invoker.getUrl().getParameter(Constants.WARMUP_KEY, Constants.DEFAULT_WARMUP);
                // If the provider has been running for less than the warm-up time, recalculate the weight
                if (uptime > 0 && uptime < warmup) {
                    weight = calculateWarmupWeight(uptime, warmup, weight);
                }
            }
        }
        return weight;
    }
    
    static int calculateWarmupWeight(int uptime, int warmup, int weight) {
        // Increase the weight gradually the longer the provider has been running
        int ww = (int) ((float) uptime / ( (float) warmup / (float) weight ) );
        return ww < 1 ? 1 : (ww > weight ? weight : ww);
    }
    

    In Dubbo 2.7.3, see the source of org.apache.dubbo.rpc.cluster.loadbalance.AbstractLoadBalance.

    According to the calculateWarmupWeight() method, the weight increases gradually the longer the provider has been running, with a minimum weight of 1. For example, with the default weight of 100 and the default warm-up time of 10 minutes: 1) if the provider has been running for 1 minute, its weight is 10, so it takes only 10% of the traffic it will ultimately handle; 2) if the provider has been running for 2 minutes, its weight is 20, or 20% of its final traffic; 3) if the provider has been running for 5 minutes, its weight is 50, or 50% of its final traffic.
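
    The arithmetic above can be checked with a small standalone sketch. It copies the calculateWarmupWeight() formula shown earlier into a hypothetical class (WarmupWeightSketch) and assumes a configured weight of 100 and a 10-minute warm-up; it is not part of Dubbo itself.

```java
// Illustrative sketch only; WarmupWeightSketch is a hypothetical class name.
final class WarmupWeightSketch {

    // Same formula as calculateWarmupWeight above: the weight grows linearly
    // with uptime and is clamped to the range [1, weight].
    static int calculateWarmupWeight(int uptime, int warmup, int weight) {
        int ww = (int) ((float) uptime / ((float) warmup / (float) weight));
        return ww < 1 ? 1 : Math.min(ww, weight);
    }

    public static void main(String[] args) {
        int warmup = 10 * 60 * 1000; // assumed warm-up time: 10 minutes, in milliseconds
        int weight = 100;            // assumed configured weight
        for (int minutes : new int[] {1, 2, 5, 9}) {
            int uptime = minutes * 60 * 1000;
            System.out.println(minutes + " min -> weight "
                    + calculateWarmupWeight(uptime, warmup, weight));
        }
    }
}
```

    Because the result is clamped to at least 1, a provider that has only just started still receives a small amount of traffic rather than none.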

Two. Graceful shutdown

  1. Why graceful shutdown is needed

    For the caller, the following situations may occur when the service is shut down:

    • The target service is already offline when the caller sends the request. The caller notices this immediately and removes the node from its health list, so the node is no longer selected by load balancing.

    • The target service is in the middle of shutting down when the caller sends the request. The caller does not know this, and the connection between the two has not been broken, so the node remains in the health list and can still be selected with some probability, which causes calls to fail.

  2. How to achieve graceful shutdown

    You may ask: RPC already has service registration and discovery, and the registry exists to manage service status. When a service shuts down, it first notifies the registry that it is going offline, and the registry then removes the node information. Doesn't that already prevent the service from being invoked?

    Let's take a look at the shutdown process:

    The entire shutdown relies on two RPC calls: one from the service provider to notify the registry that it is going offline, and one from the registry to notify the service callers that the node is offline. In addition, the registry notifies callers asynchronously, so the notification is not guaranteed to arrive in real time. Service discovery alone therefore cannot shut an application down without losing requests.

    Is there a good solution?

    Once the service provider has entered the shutdown process and many of its objects have already been destroyed, we can put up a request "baffle". The purpose of the baffle is to tell callers that the service provider is shutting down and can no longer process requests.

    This is like checking out at a supermarket. At the end of a shift, the cashier puts a sign on the counter saying "this lane is closed", and customers have to move to another open counter to check out.

    The process is as follows:

    While the service provider is shutting down, if it receives a new business request, it immediately returns a specific exception to the caller. The exception tells the caller "I am shutting down and cannot process this request". When the caller receives this response, the RPC framework removes the node from the health list and automatically retries the request on another node. Because the provider never processed the request, it is safe to retry it elsewhere, which makes the shutdown almost lossless for the business. To improve on this, we can also add an active notification mechanism, which makes the notification closer to real time and avoids client retries.
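
    The caller-side half of this process can be sketched as follows. This is only an illustration of the idea, not the code of any RPC framework: RetryOnShutdownSketch, ProviderShuttingDownException, and invoke are hypothetical names, and nodes are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only; all names here are hypothetical.
final class RetryOnShutdownSketch {

    /** Hypothetical exception a closing provider returns instead of processing the request. */
    static final class ProviderShuttingDownException extends RuntimeException {}

    /**
     * Tries the healthy nodes in order. A node that reports it is shutting down
     * is removed from the healthy list and the request is retried on the next node.
     */
    static <R> R invoke(List<String> healthyNodes, Function<String, R> call) {
        for (String node : new ArrayList<>(healthyNodes)) { // iterate over a snapshot
            try {
                return call.apply(node);
            } catch (ProviderShuttingDownException e) {
                // Safe to retry elsewhere: the provider never processed this request.
                healthyNodes.remove(node);
            }
        }
        throw new IllegalStateException("no healthy provider available");
    }
}
```

    Only the dedicated shutdown exception triggers a retry here. Other failures propagate to the caller, since a request that may already have been partially processed is not necessarily safe to repeat.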

    How do we catch the shutdown event?

    A Java application can register a shutdown hook with Runtime.addShutdownHook, and the JVM runs the hook when the process receives a termination signal. When the RPC service starts, we register the hook in advance and add handlers to it that first open the baffle and then notify callers that the service is going offline. When a new request arrives, the baffle intercepts it and throws the specific exception. To complete as many requests as possible, we can add a counter: requests still being processed are counted, the count is decremented each time a request finishes, and the service terminates only after all remaining requests have been processed.
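
    A minimal provider-side sketch of the baffle and counter might look like this. It assumes a simple polling wait with a timeout; ShutdownBaffleSketch, ShuttingDownException, handle, shutdown, and registerHook are hypothetical names, and only Runtime.getRuntime().addShutdownHook is a real JDK API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch only; all names except the JDK classes are hypothetical.
final class ShutdownBaffleSketch {

    /** Hypothetical exception telling the caller that this provider is shutting down. */
    static final class ShuttingDownException extends RuntimeException {}

    private final AtomicBoolean closing = new AtomicBoolean(false); // the "baffle"
    private final AtomicInteger inFlight = new AtomicInteger(0);    // requests being processed

    // Count the request first, then check the baffle, so that shutdown()
    // never misses a request that has already passed the check.
    <R> R handle(Supplier<R> request) {
        inFlight.incrementAndGet();
        try {
            if (closing.get()) {
                throw new ShuttingDownException();
            }
            return request.get();
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Opens the baffle, then waits up to timeoutMillis for in-flight requests.
    // Returns true if all in-flight requests finished within the timeout.
    boolean shutdown(long timeoutMillis) {
        closing.set(true);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (inFlight.get() > 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return inFlight.get() == 0;
    }

    // Registers the shutdown logic so the JVM runs it on a termination signal.
    void registerHook(long timeoutMillis) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> shutdown(timeoutMillis)));
    }
}
```

    The timeout matters: without it, a single stuck request would prevent the process from ever exiting.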

    In the Dubbo framework, graceful shutdown is triggered in the following scenarios:

    • The JVM exits normally (System.exit(int));

    • The JVM exits because of a resource problem (OOM);

    • The application receives a normal process termination signal: SIGTERM or SIGINT.

    Graceful shutdown is enabled by default, with a wait time of 10 seconds. The wait time can be changed by configuring dubbo.service.shutdown.wait.

    A graceful shutdown based on a ShutdownHook cannot guarantee that every shutdown step completes, so Dubbo introduced a multi-stage shutdown to keep the service lossless. Before shutting down an application, run the offline command of QoS (the online operations command tool) to take all services offline, then wait for a period of time so that all requests already received can finish processing. Because the services have been removed from the registry, no new requests are sent to the application. Only then is the actual shutdown (SIGTERM or SIGINT) performed, which keeps the shutdown lossless.

    Dubbo graceful shutdown source code:

    • DubboShutdownHook.register method

      Register the shutdown hook:

      /**
       * Registers the shutdown hook, which is triggered when the service shuts down.
       */
      public void register() {
          if (!registered.get() && registered.compareAndSet(false, true)) {
              Runtime.getRuntime().addShutdownHook(getDubboShutdownHook());
          }
      }

    • DubboShutdownHook.doDestroy method

      Destroy all related resources:

      /**
       * Destroys all resources, including registries and protocol handlers.
       */
      public void doDestroy() {
          if (!destroyed.compareAndSet(false, true)) {
              return;
          }
          // Destroy all registries, including ZooKeeper, etcd, Consul, etc.
          AbstractRegistryFactory.destroyAll();
          // Destroy all protocol handlers, including Dubbo, Hessian, HTTP, JSON, etc.
          destroyProtocols();
      }

This article was created and shared by Mirson. For further discussion, please join QQ group 19310171 or visit www.softart.cn