Today we are going to discuss one topic: the right way to release a project and take it offline

Any reasonably mature company will have an automated release process. The process looks simple, only two steps, “stop the application” and “restart the application”, but there are actually many subtleties, such as:

  1. How to take services offline and bring them online gracefully, which involves things like Dubbo’s graceful shutdown and JVM parameter configuration when the service starts
  2. How to make sure that after a release the application can be rolled back quickly once a problem is found, or that the impact of a problematic new feature is minimized

Let’s walk through a graceful release process, using the deployment of a Spring Boot project as an example

Let’s start with the first question

How to go offline and online gracefully

There are two aspects to this

  1. How to gracefully take the running service offline
  2. How to gracefully bring the service being released online

Let’s start with the first question

How to shut down gracefully

Dubbo is currently used in most microservices architectures in the industry, and our project is no exception.

We know that consumers learn about the existence of providers through the registry, so taking a provider offline also has to be communicated to consumers through the registry

The registry is usually ZooKeeper or Nacos, so the first step is for the service provider to call unregister and delete its ephemeral node from the registry. Because consumers are always listening, they notice that the node is gone and stop sending calls to the provider that is about to go offline. Can we then kill the provider’s process immediately? No, because the provider may still be executing requests initiated by consumers, and killing it right away would likely interrupt those in-flight requests and cause errors. So we wait about 10 seconds to give the service provider time to finish its work

In our offline script we still manually execute unregister and sleep 10s before killing the service process. Graceful shutdown in Dubbo 2.5.x and 2.6.x has some flaws and usage constraints; the solution only became reasonably complete in 2.7, which is the version we have been running for a long time.

After executing unregister and sleeping for 10 seconds, we run “kill <service process ID>” (kill -15, not kill -9). kill -9 terminates the JVM process immediately, whereas with kill -15 the JVM receives the signal and can use its shutdown hooks to clean up system resources, along the lines of the sketch below
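As a minimal, generic sketch (not our project’s actual hook), this is how a JVM shutdown hook is registered; it runs on kill -15 but never on kill -9:

// A minimal sketch of a JVM shutdown hook: it runs when the process receives
// SIGTERM (kill -15), but not when it is killed with kill -9.
public class ShutdownHookDemo {

    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // Typical cleanup work: stop accepting new requests, wait for
            // in-flight tasks, close thread pools / connections, flush logs.
            System.out.println("shutdown hook triggered, cleaning up resources...");
        }, "graceful-shutdown-hook"));

        // Keep the process alive so you can try `kill -15 <pid>` against it.
        Thread.currentThread().join();
    }
}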

After executing kill, we then check whether the JVM process is still alive within the next 10 seconds

Note: 10s is enough for this check. Together with the previous 10s, the service provider gets a total of 20s to finish its tasks and clean up resources. In addition, Dubbo’s graceful shutdown waits 10s by default before forcing the shutdown, so 20s should be enough in theory

If the process is still running after those 10s, we kill it with kill -9. Killing it directly at this point is not a problem, because the service was removed from the registry 20 seconds earlier, so it basically has no impact on online traffic.

To summarize, the service offline process is: unregister from the registry → sleep about 10 seconds so in-flight requests can finish → kill -15 so the JVM can run its shutdown hooks → check whether the process has exited within another 10 seconds → kill -9 if it is still alive
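For illustration only, here is the “kill -15, wait, then kill -9” part of that flow sketched with Java 9’s ProcessHandle API; our real offline script does the same thing with plain kill commands in shell:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the kill-and-wait step of the offline flow (not the real script).
public class GracefulKill {

    public static void stop(long pid) throws Exception {
        ProcessHandle handle = ProcessHandle.of(pid)
                .orElseThrow(() -> new IllegalArgumentException("no such process: " + pid));

        handle.destroy(); // equivalent of kill -15: lets shutdown hooks run

        try {
            // Give the JVM up to 10 more seconds to finish requests and clean up.
            handle.onExit().get(10, TimeUnit.SECONDS);
            System.out.println("process exited gracefully");
        } catch (TimeoutException e) {
            // Still alive after the grace period: force kill, i.e. kill -9.
            handle.destroyForcibly();
            System.out.println("forced kill after the grace period");
        }
    }

    public static void main(String[] args) throws Exception {
        stop(Long.parseLong(args[0]));
    }
}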

With the above steps we have achieved a graceful shutdown. Next, let’s look at how to bring a service online gracefully

How to go online gracefully

Some people may say: isn’t going online simple? Just start the Spring Boot application with a command like the following and you’re done:

java -jar <jar path> --spring.config.location=xxx --spring.pid.file=xxx

If the business is just getting started this is fine, since there is hardly any traffic. But once your traffic grows to a certain scale, such a bare startup will not do: you will keep hitting frequent YGC/Full GC, or OOMs with nothing logged to troubleshoot. So you need to load-test according to your traffic and set JVM parameters along the lines of the following:

-server -Xmx5g -Xms5g -Xmn2g -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -Xss256k -XX:SurvivorRatio=8 \
-XX:+UseParNewGC  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
-XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${APPLICATION_LOG_DIR} -Djava.security.egd=file:/dev/./urandom

Why -Djava.security.egd=file:/dev/./urandom?

Projects often need random numbers. SecureRandom is widely used in Java components and reliably generates strong random numbers, but when random data is generated in large quantities performance can drop, because with the default configuration the JVM may block on /dev/random when system entropy runs low. The setting above points the JVM at /dev/urandom instead, which speeds up random number generation.
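As an illustration (not code from our project), this is the kind of bulk SecureRandom usage that benefits from the setting:

import java.security.SecureRandom;

// Illustrative only: bulk SecureRandom usage such as generating tokens or IDs.
// With the JVM's default entropy source, the initial seeding may block when
// system entropy is low; -Djava.security.egd=file:/dev/./urandom avoids that.
public class SecureRandomDemo {
    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] token = new byte[16];
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100_000; i++) {
            random.nextBytes(token);
        }
        System.out.println("generated 100k random tokens in "
                + (System.currentTimeMillis() - start) + " ms");
    }
}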

Setting JVM parameters alone is not enough; you should also monitor the health of the application. That requires collecting the relevant metrics from the machine and the application and presenting them visually

So you need to install a probe. We use SkyWalking, which is attached as a Java agent, as follows:

-javaagent:$SW_AGENT_JAR -Dskywalking_config=$SW_AGENT_CONFIG

To sum up, the full startup command is as follows (note that the JVM options and -javaagent must come before -jar, while the --spring.* arguments come after the jar path)

java -server -Xmx5g -Xms5g -Xmn2g -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -Xss256k -XX:SurvivorRatio=8 \
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
-XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${APPLICATION_LOG_DIR} -Djava.security.egd=file:/dev/./urandom \
-javaagent:$SW_AGENT_JAR -Dskywalking_config=$SW_AGENT_CONFIG \
-jar <jar path> --spring.config.location=xxx --spring.pid.file=xxx

Once the configuration is done, the service can finally be started. Large projects are usually slow to start, so we wait about 20 seconds and then check whether the startup succeeded; if not, the developer is notified via an alert. That is still not enough, though: you also need a health check, and the program will generally contain health-check logic like the following

@Service(protocol = {"rest"})
public class HealthCheckServiceImpl implements HealthCheckService {

    @Resource
    private TestDAO testDAO;

    @Override
    public String getHealthStatus() {
        // DB check: make sure a simple query still works
        List<TestDO> testDOS = testDAO.getResult(123);
        Assert.isTrue(testDOS != null, "rebateMemberDOS null");

        // Redis check

        // Other checks omitted here

        return "health";
    }
}

After starting the JVM, the startup script calls curl http://ip:port/service/health/deepCheck to determine whether the service is really up; only when the response body is “health” do we consider the service truly available
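For illustration, the post-start check can be sketched in Java like this (in reality it is just a curl call in the startup script; the host, port and timeouts below are placeholders):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch of the post-start deep health check performed by the startup script.
public class StartupHealthCheck {

    public static void main(String[] args) throws Exception {
        // Give the (large) application some time to start.
        Thread.sleep(20_000);

        // Placeholder address; use the service's real host, port and path.
        URL url = new URL("http://127.0.0.1:8080/service/health/deepCheck");

        boolean healthy = false;
        try {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(3_000);
            conn.setReadTimeout(3_000);
            if (conn.getResponseCode() == 200) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                    // Only an exact "health" body counts as a passing deep check.
                    healthy = "health".equals(reader.readLine());
                }
            }
        } catch (Exception e) {
            // Connection refused or timed out: the service is not up yet.
        }

        if (healthy) {
            System.out.println("service started and passed the deep health check");
        } else {
            System.out.println("health check failed, notify the developer / trigger an alert");
        }
    }
}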

Rolling releases versus blue-green releases

Let’s take a look at two release models commonly used in industry: rolling release and blue-green release

What is a rolling release

Rolling release: typically, one or a few servers at a time are taken out of service, updated, and then put back into service, over and over again until every instance in the cluster has been updated to the new version

Rolling release is probably the most common approach in the industry. It is simple, but rolling back is troublesome once a problem is found: you have to stop the instances one by one and restart them with the old package, so the impact lasts longer

What is a blue-green release

To solve the slow rollback of rolling releases, blue-green release was introduced: the new package is first deployed to a new cluster, and once the new cluster is up and verified, the gateway directs traffic to it based on Dubbo routing

This way, if the new feature turns out to be broken, the gateway can immediately switch traffic back to the old cluster according to the tag, which is effectively a real-time rollback. Very powerful, but the one obvious downside is cost: you need to provision as many machines as the original cluster, so it suits companies with deep pockets. Our team used to do this, but we stopped when cash flow got tight
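As a hedged illustration of that tag-based switching (the tag name and configuration key below are assumptions based on Dubbo 2.7’s tag router, not our production setup): providers in the new cluster carry a tag, and the consumer or gateway side chooses the tag per call:

// Assumed setup, for illustration only (Dubbo 2.7 tag router):
// providers in the new ("green") cluster are started with a tag, e.g. in
// application.properties / dubbo.properties:
//   dubbo.provider.tag=green
import org.apache.dubbo.rpc.RpcContext;

public class TagRoutingExample {

    // "dubbo.tag" is the attachment key used by Dubbo 2.7's tag router.
    private static final String TAG_KEY = "dubbo.tag";

    public static void routeToGreenCluster() {
        // The attachment applies to the current invocation context, so the
        // next Dubbo call from this thread prefers providers tagged "green".
        RpcContext.getContext().setAttachment(TAG_KEY, "green");
        // ... invoke the Dubbo reference here ...
        // Switching the value back to the old cluster's tag (or clearing it)
        // is effectively a real-time rollback at the gateway.
    }
}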

Those are the main points to watch when taking services offline and bringing them online. Although this walkthrough is based on a Java project, the considerations for releasing other kinds of projects are similar and boil down to two points

  • When going offline, shut down gracefully and leave the service enough time to clean up its resources
  • When releasing and bringing the service online, load-test in advance according to your traffic and set up the related monitoring to improve service availability