By Xi Weng

Nacos 2.0 improves performance by roughly 10x by upgrading its communication protocol, framework, and data model, addressing performance issues that had been exposed since the release of Nacos 1.0. In this article, we make a comprehensive performance comparison by load-testing Nacos 1.0 and then upgrading the same cluster from Nacos 1.0 to Nacos 2.0, to show the performance improvement of Nacos 2.0 visually.


Pressure test preparation

Environment preparation

To make the Nacos deployment easy to upgrade and to expose the core performance metrics, we purchased a three-node Nacos cluster (2-core CPU + 4 GB memory per node) from Alibaba Cloud Microservices Engine (MSE, https://cn.aliyun.com/product…).

Pressure test model

To demonstrate system performance at different scales, we used a step-up pressure test: the load was divided into three batches applied incrementally, and we observed the cluster's behavior at each step. In addition, a Dubbo service demo was deployed outside the load-generating clusters and invoked continuously by JMeter at 100 TPS, to simulate the impact that different load levels might have on real business calls.
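The step-up model can be sketched as a small script. This is purely illustrative: the batch sizes (6,000, then 4,000, then 4,000 providers) are taken from the test narrative that follows, and the 100 TPS Dubbo probe is modeled as a constant.

```python
# Illustrative sketch of the step-up pressure model used in this test.
# Batch sizes come from the test narrative; PROBE_TPS is the constant
# JMeter load on the Dubbo demo service outside the pressure clusters.

BATCHES = [6000, 4000, 4000]   # providers added in each step-up batch
PROBE_TPS = 100                # constant JMeter load on the Dubbo demo

def cumulative_load(batches):
    """Return the total provider count after each step-up batch."""
    totals, running = [], 0
    for added in batches:
        running += added
        totals.append(running)
    return totals

print(cumulative_load(BATCHES))  # [6000, 10000, 14000]
```

At each plateau (6,000, 10,000, and 14,000 providers) the cluster's CPU, instance count, and the Dubbo probe's error rate are observed before the next batch is applied.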

During the test, the server and clients are upgraded at the appropriate points. The server is upgraded using the one-click upgrade function provided by MSE; the clients are upgraded by restarting them in batches.

Pressure test process

Nacos 1.X Server + Nacos 1.X Client

We started with MSE Nacos 1.2.1 and launched the first load-generating cluster. With 6,000 providers registered, the cluster ran stably at about 25% CPU with 6,000 instances.



We then started the second batch of load, adding 4,000 providers for a total of 10,000. At this point the cluster's peak CPU reached 60%, with steady-state CPU around 45%. The cluster could still run stably.

Under the load of the first two batches, the cluster had no stability problems, so Dubbo calls remained normal and no errors occurred.

When the third load-generating cluster was started, bringing the total to 14,000 providers, the cluster briefly registered 13,000 instances; soon afterwards the instance count dropped and the CPU was saturated. Narrowing the time range shows that, after the drop, the instance count kept jittering within a small range.



As can be seen from the consumer log, Dubbo providers were removed because the server could not sustain this level of load, so "No Provider" errors occurred during invocation.



Nacos 2.X Server + Nacos 1.X Client

During the upgrade, instances are dual-written, so the number of instances stored on the server is twice the actual number. Given the results above, before attempting the upgrade we needed to either roll the load back to the first batch of 6,000 instances or scale up the cluster. This article takes the roll-back approach: stop and then restart the load-generating clusters, and perform the upgrade only after the cluster has recovered.
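The arithmetic behind rolling back the load can be made explicit. This is a minimal sketch, with `stored_instances` a hypothetical helper name; the doubling factor is simply the dual-write behavior described above.

```python
# Sketch of why load must be reduced before the 1.X -> 2.X upgrade:
# while the server dual-writes, each real instance is stored twice,
# so effective storage pressure doubles.

def stored_instances(actual, dual_write):
    """Instances held on the server; doubled while dual-write is active."""
    return actual * 2 if dual_write else actual

# Rolled back to the first batch of 6,000 providers, the upgrade still
# exerts pressure equivalent to 12,000 instances:
print(stored_instances(6000, dual_write=True))   # 12000

# Staying at the full 14,000 would mean 28,000 stored instances --
# far beyond the level at which the 1.X cluster already failed.
print(stored_instances(14000, dual_write=True))  # 28000
```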

As the monitoring chart shows, after the last two batches of load were stopped, the cluster quickly returned to normal and ran stably, and Dubbo calls recovered. We then upgraded using the MSE upgrade function. During the upgrade, the CPU jittered significantly because of the dual-write overhead. And because dual-write doubles the instance count, which is effectively a peak load of 12,000 instances, the server still jittered somewhat, causing some Dubbo errors. This effect would not occur when upgrading under non-extreme load.



When the upgrade completes, the server stops dual-writing, eliminating that overhead; CPU usage drops and stabilizes. At the same time, the instance count stops jittering and Dubbo calls fully recover. As with the 1.X server, we then started the load-generating clusters in two more batches to compare the two versions under the same load.



Because the clients were still on 1.X, the server's utilization remained very high: after the full load started, CPU reached nearly 100%. Although there was no large-scale instance drop as with the 1.X server, a small amount of instance jitter still appeared after running for a while. This shows that upgrading only the Nacos server to 2.0 brings some improvement, but does not fully solve the performance problem.



Nacos 2.X Server + Nacos 2.X Client

To fully unlock the performance of Nacos 2.0, the load-generating clients must also be upgraded to 2.0 or later. The clients were likewise replaced in three batches; during this period, it is normal for the server's instance count to drop and then recover, because the providers restart. As the load clusters were upgraded, CPU dropped very significantly, eventually plateauing at about 20%, down from nearly 100% initially, with the cluster running steadily at 14,000 instances.
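A rough, back-of-the-envelope calculation suggests why the client upgrade matters so much. Nacos 1.X clients keep ephemeral instances alive with periodic HTTP heartbeats (5 seconds by default), so server request load grows with the instance count; Nacos 2.X clients instead hold a gRPC long connection per client process, on which the server tracks liveness, so per-instance heartbeat traffic disappears. The numbers below are illustrative, not measured.

```python
# Rough illustration of the heartbeat load a Nacos 1.X cluster carries.
# 1.X clients send an HTTP heartbeat per ephemeral instance (default
# interval ~5 s); 2.X replaces this with connection-level keep-alives
# on a gRPC long connection, so the per-instance cost goes away.

def heartbeat_qps(instances, interval_s=5):
    """Approximate heartbeat requests/second for a 1.X deployment."""
    return instances / interval_s

# At the full 14,000-instance load of this test, a 1.X server set
# absorbs on the order of thousands of heartbeat requests per second:
print(heartbeat_qps(14000))  # 2800.0
```

On 2.X the comparable steady-state cost scales with the number of client processes rather than instances, which is consistent with the large CPU drop observed once the clients were upgraded.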



Pressure test results

From the above, we can summarize the performance difference of a three-node cluster (2-core CPU + 4 GB memory per node) across versions:




It is clear that Nacos 2.0 delivers a large performance improvement. New users are advised to adopt Nacos 2.0 directly; existing users are advised to upgrade the server first, and then gradually upgrade the clients to realize the full benefit. Finally, monitoring across the whole pressure test gives an intuitive view of each version's performance at each stage:

This article is original content from Alibaba Cloud and may not be reproduced without permission.