Brief introduction: Nacos 2.0 improves performance by about 10 times by upgrading its communication protocol, framework, and data model, addressing the performance problems exposed since the release of Nacos 1.0. This article stress-tests Nacos 1.0, then upgrades it step by step to Nacos 2.0, comparing overall performance along the way to show the improvement that Nacos 2.0 brings.

The author is BBB 0


Stress test preparation

Environment preparation

To simplify deploying and upgrading Nacos and to display the core performance metrics, we purchased a three-node Nacos cluster with 2 CPU cores and 4 GB of memory from the Alibaba Cloud Microservices Engine, MSE (https://cn.aliyun.com/product/aliware/mse).

Stress test model

To show how the system performs at different scales, the stress test applies load in steps: the stress cluster is split into 3 batches that are started one after another, and the behavior of the Nacos cluster is observed after each batch. In addition to the stress cluster, a Dubbo demo service is deployed and called continuously by JMeter at 100 TPS, to simulate the possible impact on real business calls at each load level.
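The stepped load can be summarized with a little arithmetic. The sketch below is only an illustration: the batch sizes are the provider counts reported in the test process of this article, and the function name is hypothetical.

```python
from itertools import accumulate

# Providers started by each batch, as reported in the test process of this article.
BATCH_SIZES = [6000, 4000, 4000]


def cumulative_load(batch_sizes):
    """Return the total number of providers registered after each batch has started."""
    return list(accumulate(batch_sizes))


if __name__ == "__main__":
    for batch, total in enumerate(cumulative_load(BATCH_SIZES), start=1):
        print(f"after batch {batch}: {total} providers")
```

Each step is held until the cluster settles, so the CPU usage and instance count observed at 6,000, 10,000, and 14,000 providers can be compared directly.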

During the test, the server and the clients are upgraded at appropriate points. The server is upgraded with the one-click upgrade feature provided by MSE, and the clients are upgraded by restarting them in rolling batches.

Stress test process

Nacos1.X Server + Nacos1.X Client

Start the first batch of the stress cluster to apply load to MSE Nacos 1.2.1. With 6,000 providers registered, CPU usage is about 25% once the cluster has stabilized at 6,000 instances.
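The article does not show how the stress cluster registers its providers; it most likely runs real Dubbo providers on the Nacos client. As a rough stand-in, the sketch below builds registration requests against the Nacos v1 Open API endpoint `/nacos/v1/ns/instance`. The server address, service names, and IP addresses are assumptions made for illustration, and the requests are only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: address of a Nacos server reachable from the load generator.
NACOS_ADDR = "http://127.0.0.1:8848"


def build_register_request(service_name, ip, port, base_url=NACOS_ADDR):
    """Build (but do not send) a POST to the Nacos v1 instance registration endpoint."""
    query = urlencode({"serviceName": service_name, "ip": ip, "port": port})
    return Request(f"{base_url}/nacos/v1/ns/instance?{query}", method="POST")


def simulated_providers(count, service_prefix="stress.provider"):
    """Yield one registration request per simulated provider (names are hypothetical)."""
    for i in range(count):
        # Spread instances over made-up 10.x.y.z addresses; purely illustrative.
        ip = f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"
        yield build_register_request(f"{service_prefix}.{i}", ip, 20880)
```

Passing a request to `urllib.request.urlopen` would send it. A real 1.x client also keeps sending heartbeats for each ephemeral instance, which this sketch omits, so it approximates only the registration part of the load.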



Then start the second batch, adding 4,000 providers for a total of 10,000. CPU usage peaks at 60% and settles at around 45%, and the cluster keeps running stably.

Under the load of the first two batches the cluster showed no stability problems, so Dubbo calls remained normal and no errors occurred.

When the third batch is started, the load totals 14,000 providers. The cluster briefly registered 13,000 instances, but the instance count dropped soon afterwards and the CPU was exhausted. Narrowing the time range shows that the instance count keeps jittering within a small range after the drop.



The consumer log shows that Dubbo providers were removed because the server could not sustain this level of load, so calls to the Dubbo provider failed with the “No Provider” error.



Nacos2.X Server + Nacos1.X Client

Because instances are double-written while the server is being upgraded, the number of instances stored by the server during the upgrade is twice the actual number. Given the results above, either the load must be rolled back to the 6,000 instances of the first batch, or the upgrade must be attempted after scaling up the machines. This article takes the first approach: the batches that were started last are stopped first, and the cluster is allowed to return to normal before the upgrade is performed.
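The reasoning behind the rollback is simple arithmetic, sketched below. The function name is hypothetical, and the factor of two is the double-write effect described above.

```python
def instances_during_upgrade(actual_instances, double_write=True):
    """Instances the server stores while upgrading; double-write keeps two copies."""
    return actual_instances * 2 if double_write else actual_instances


# Totals after each batch in this test.
for actual in (6000, 10000, 14000):
    print(actual, "->", instances_during_upgrade(actual))
```

With 6,000 real instances the server stores 12,000 during the upgrade, which sits between the 10,000 instances that ran stably on 1.x and the 14,000 that did not. Upgrading at 10,000 real instances would mean storing 20,000, well past the point where the 1.x server failed.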

As the monitoring chart shows, once the last two batches were stopped, the cluster quickly returned to normal, running stably with Dubbo calls succeeding. The server is then upgraded with the MSE upgrade function. Double-writing costs performance during the upgrade, so CPU usage jitters considerably. And because double-writing doubles the number of instances to 12,000, which is effectively the maximum load the cluster can take, the server still shows some jitter, which causes a few Dubbo errors. Upgrading under less than the maximum load avoids this effect.



Once the server upgrade completes, double-writing stops and its performance cost disappears: CPU usage drops and stabilizes, the instance count no longer jitters, and Dubbo calls recover completely. As with the 1.x server, the remaining two batches of the stress cluster are then started to compare the two versions under the same load.



Because the clients are still on 1.x, server resource usage remains very high: at full load the CPU reaches almost 100%. There is no massive instance drop as with the 1.x server, but a small amount of instance jitter still appears after running for a while. Upgrading only the Nacos server to 2.0 therefore brings some improvement but does not fully solve the performance problem.

Nacos2.X Server + Nacos2.X Client

To fully unlock the performance of Nacos 2.0, the clients of the stress cluster must also be upgraded to 2.0 or later. They are likewise replaced in 3 batches. The providers restart during this period, so it is normal for the instance count on the server to drop and then recover. As the stress cluster is upgraded, CPU usage falls markedly. Once the cluster has stabilized, CPU usage has dropped from nearly 100% to 20%, and the cluster runs 14,000 instances stably.



Stress test results

From the tests above, we obtain the following performance comparison across versions for a 3-node cluster with 2 CPU cores and 4 GB of memory:

| Server version | Client version | Stress scale (instances) | Cluster stability | CPU usage |
| --- | --- | --- | --- | --- |
| Nacos1.X | Nacos1.X | 14000 | Completely unstable | 100% |
| Nacos2.X (upgrading) | Nacos1.X | 6000 | Some jitter | 100% |
| Nacos2.X | Nacos1.X | 14000 | Some jitter | 100% |
| Nacos2.X | Nacos2.X | 14000 | Stable | 20% |
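To compare rows of this table programmatically, the results can be written down as data. The sketch below only restates the figures from the table; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    server: str
    client: str
    instances: int
    stability: str
    cpu_percent: int


# Rows copied from the results table in this article.
RESULTS = [
    Result("1.X", "1.X", 14000, "completely unstable", 100),
    Result("2.X (upgrading)", "1.X", 6000, "some jitter", 100),
    Result("2.X", "1.X", 14000, "some jitter", 100),
    Result("2.X", "2.X", 14000, "stable", 20),
]


def cpu_ratio(before, after):
    """How many times lower the CPU usage is in `after` than in `before`."""
    return before.cpu_percent / after.cpu_percent


if __name__ == "__main__":
    # Same 2.X server and same 14,000 instances; only the client version differs.
    print(cpu_ratio(RESULTS[2], RESULTS[3]))
```

At the same 14,000-instance scale on the 2.X server, moving the clients from 1.X to 2.X cuts CPU usage by a factor of 5 in this test. That figure covers CPU usage only and is not the same measure as the overall improvement quoted in the introduction.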

This shows that Nacos 2.0 does deliver a large performance improvement. New users are advised to adopt Nacos 2.0 directly; existing users are advised to upgrade the server first and then realize the gains by gradually upgrading the clients. Finally, the monitoring chart for the whole stress test gives an intuitive view of performance at each stage and version:



For more information

To learn more about MSE Nacos 2.0, visit https://www.aliyun.com/product/aliware/mse.
