Introduction: With the arrival of the cloud-native era, microservices have become the mainstream application architecture, and Nacos has become the preferred registry and configuration center for microservices in China thanks to its ease of use, stability, and strong performance. Nacos 2.0 pushes performance further, so that users with fast-growing businesses do not have to worry about it. Alibaba Cloud MSE also provides a hosted Nacos 2.0 service: enable it with one click to use microservice capabilities built on ten years of Alibaba experience.

The Professional Edition of the microservice engine MSE has been released with support for Nacos 2.0. Compared with the Basic Edition, the Professional Edition offers a higher SLA guarantee, up to ten times the performance, 99.95% availability, and further enhanced configuration capabilities. New users receive 20% off their first purchase.

The author | wind qing

Preface

In January 2020, MSE released version 1.1.3 of the Nacos engine, which supports using Nacos as a registry in a fully managed public cloud environment. Nacos 1.2.1, released in July 2020, supports managing configuration metadata and lets microservice applications modify configuration and routing rules dynamically at runtime. As users kept adopting the Nacos 1.x line, its performance problems became increasingly apparent. By reworking the 1.x kernel, the Nacos 2.0 Professional Edition improves performance about tenfold, which is enough to meet the performance needs of most microservice scenarios.

Beyond the performance gains, the Professional Edition offers a stronger SLA guarantee and better security for configuration data. It also connects to the Istio ecosystem through the MCP protocol, so it can serve as the registry for Istio.

MSE Nacos 1.x basic architecture

The 1.x architecture can be roughly divided into five layers: the access layer, the communication layer, the function layer, the synchronization layer, and the persistence layer.

  • Users reach Nacos through the access layer, for example via the SDK, SCA, Dubbo, or the console. Nacos also exposes an Open API over HTTP.
  • The communication layer includes HTTP and UDP. Nacos communicates mainly over HTTP; UDP is used for a small part of the service push function.
  • The function layer currently has two parts, Naming and Config, which provide service discovery and configuration management respectively.
  • The synchronization layer includes the Distro protocol in AP mode (service registration), the Raft protocol in CP mode (service meta-information), and the Notify mechanism for configuration notifications.
  • The persistence layer uses MySQL, Derby, and local files. Configuration data, user information, and permission data are stored in MySQL or Derby; persistent service data is stored in local files.

Problems with the MSE Nacos 1.x architecture

The current 1.x architecture has several problems:

  • Each service instance renews its registration through heartbeats. In the Dubbo scenario each interface corresponds to a service, so an application with many interfaces requires a high heartbeat-renewal TPS.
  • Heartbeat renewal is slow to detect failures. An instance can be removed only after its renewal times out, which generally takes 15 s, so timeliness is poor.
  • Pushing change data over UDP is unreliable, so clients must reconcile their data with the server periodically to stay correct. The resulting large number of no-op queries keeps the overall service QPS high.
  • Communication is based on short-lived HTTP connections. When the Nacos side closes a connection, it enters the TIME_WAIT state, and at high QPS connections can be exhausted, causing errors.
  • The long-polling mode used by the configuration module causes memory to be allocated and released in the JVM old generation, which triggers frequent CMS GC.
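
The first problem can be made concrete with a little arithmetic. The following is a minimal sketch, not a measurement: the pod count, the number of interfaces per pod, and the 5-second heartbeat interval are all illustrative assumptions, and `HeartbeatLoad` is a hypothetical helper.

```java
// Rough estimate of heartbeat load when every registered instance renews on its own.
// All inputs are illustrative assumptions, not measured values.
final class HeartbeatLoad {

    /** Heartbeat requests per second that the server has to absorb. */
    static double perInstanceTps(int pods, int interfacesPerPod, double intervalSeconds) {
        // In the Dubbo scenario each interface is its own service,
        // so one pod contributes one instance per interface.
        long instances = (long) pods * interfacesPerPod;
        return instances / intervalSeconds;
    }

    public static void main(String[] args) {
        // 1,000 pods with 50 Dubbo interfaces each, one heartbeat every 5 s (assumed)
        System.out.println(perInstanceTps(1000, 50, 5.0)); // prints 10000.0
    }
}
```

Under these assumptions, a thousand pods already generate ten thousand renewal requests per second before any real traffic is served.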

MSE Nacos 2.0 Professional Edition architecture and new model

The core problem of the 1.x architecture lies in its connection model. The 2.0 architecture upgrades to a long-connection model: in the communication layer, gRPC and RSocket provide long-connection data transmission and push.

Issues addressed by the 2.0 architecture:

  • An application pod renews its heartbeat once per long connection, so repeated per-instance renewal requests are no longer needed.
  • A broken long connection is detected quickly, and its instances can be removed without waiting for a renewal timeout.
  • The NIO streaming push mechanism is more reliable than UDP, so applications can reconcile data less often.
  • Connections are no longer created over and over, which greatly reduces the buildup of TIME_WAIT connections.
  • Long connections also solve the CMS GC problem caused by long polling in the configuration module.
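
The first two points can be illustrated with a small sketch in which instances are owned by the long connection that registered them. This is a hypothetical in-memory model for illustration only, not the Nacos server implementation; the class name, method names, and connection IDs are all assumptions.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: every instance is tied to the long connection that registered it.
final class ConnectionScopedRegistry {
    // connectionId -> keys of the instances registered over that connection
    private final Map<String, Set<String>> instancesByConnection = new ConcurrentHashMap<>();

    void register(String connectionId, String instanceKey) {
        instancesByConnection
            .computeIfAbsent(connectionId, id -> ConcurrentHashMap.newKeySet())
            .add(instanceKey);
    }

    /** Called when the transport reports the connection closed; returns what was removed. */
    Set<String> onDisconnect(String connectionId) {
        Set<String> removed = instancesByConnection.remove(connectionId);
        return removed == null ? Collections.<String>emptySet() : removed;
    }

    int instanceCount() {
        return instancesByConnection.values().stream().mapToInt(Set::size).sum();
    }

    public static void main(String[] args) {
        ConnectionScopedRegistry registry = new ConnectionScopedRegistry();
        registry.register("conn-1", "com.example.OrderService@10.0.0.1:20880");
        registry.register("conn-1", "com.example.UserService@10.0.0.1:20880");
        registry.register("conn-2", "com.example.OrderService@10.0.0.2:20880");
        registry.onDisconnect("conn-1");              // both instances of conn-1 are dropped at once
        System.out.println(registry.instanceCount()); // prints 1
    }
}
```

In a model like this, liveness is tracked per connection, so one heartbeat covers every interface a pod exposes, and a dropped connection removes all of its instances immediately.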

Problems with the 2.0 architecture:

  • Compared with the Tomcat HTTP short-connection model, the long-connection model has to manage connection state itself, which adds complexity.
  • Long-lived gRPC connections are built on HTTP/2 streams, which are less observable and less easy to use than the HTTP Open API.

Overall, the 2.0 architecture reduces resource overhead and improves system throughput, which brings significant performance gains at the cost of added complexity.

MSE Nacos 2.0 Professional Edition performance

Nacos consists of a service discovery module and a configuration management module. We first test the performance of the service discovery scenario.

The test uses 200 load generators, each simulating 500 clients. Each client registers 5 services and subscribes to 5 services, so the test can produce a load of up to 100,000 long connections and 500,000 service instances and subscriptions.
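
The scale figures follow directly from the test setup: roughly 100,000 long connections and 500,000 service instances. The sketch below reproduces that arithmetic, assuming each simulated client holds exactly one long connection; `LoadScale` is a hypothetical helper.

```java
// Derives the stress-test scale from the setup described above.
// Assumption: each simulated client holds exactly one long connection.
final class LoadScale {

    static long connections(int generators, int clientsPerGenerator) {
        return (long) generators * clientsPerGenerator;
    }

    static long totalFor(long connections, int perClient) {
        return connections * perClient;
    }

    public static void main(String[] args) {
        long conns = connections(200, 500);
        System.out.println(conns);               // prints 100000 (long connections)
        System.out.println(totalFor(conns, 5));  // prints 500000 (registered instances)
        System.out.println(totalFor(conns, 5));  // prints 500000 (subscriptions)
    }
}
```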

The service discovery stress test has two phases, a changing state and a stable state:

  • Changing state: while the load ramps up, a large number of connections register and subscribe to services on Nacos. Pressure on the server is relatively high in this phase, and the question is whether all registrations and subscriptions eventually succeed.
  • Stable state: once the load generators' requests have succeeded, the system enters a stable state in which the client and server only need to maintain the long-connection heartbeat, so pressure on the server is relatively low. If the server is overloaded in the changing state, requests time out and connections drop, and the system never reaches the stable state.

For service discovery, we also upgrade a lower-version instance on MSE and compare the performance curves before and after the upgrade, which makes the comparison more intuitive.

In practice, the configuration management module serves a write-light, read-heavy workload, and its main bottleneck is single-machine performance. The stress test therefore focuses on single-machine performance and the number of connections supported. It uses 200 load generators, each simulating 200 clients; each client subscribes to 200 configurations and issues requests to subscribe to and read them.

We compared Basic Edition and Professional Edition performance data at the 2C4G, 4C8G, and 8C16G specifications in the service discovery scenario.

The stable-run TPS and instance counts in this comparison are the levels at which the service runs reliably with high availability. They are about half to two-thirds of the maximum, which means the service keeps running even if a single machine goes down.

The scale supported in stable operation increased 7-fold, and the actual maximum supported scale increased 7- to 10-fold.

Another scenario compares a 3-node 2C4G MSE Nacos instance before and after the upgrade. It has three stages:

  • In the first stage, the client runs version 1.x and MSE Nacos runs the Basic Edition. The number of instances rises from 0 to 6,000, then to 10,000, and finally to 14,000, beyond which it cannot grow. Server CPU reaches 80-90% and the client keeps reporting errors, so the number of instances is then reduced to 6,000.
  • In the second stage, MSE Nacos is upgraded from the Basic Edition to the Professional Edition. The number of instances again cannot grow beyond 14,000, and the stress-test performance curves show little difference.
  • In the third stage, with the number of instances held at 14,000, the clients are upgraded to version 2.0 in batches. CPU usage falls steadily to about 20%, and the whole system stays stable with no errors.

The performance curves before and after the upgrade show that MSE Nacos 2.0 Professional Edition improves performance greatly. In the final overall test, the Professional Edition showed a 10-fold improvement over the Basic Edition in service discovery and a 7-fold improvement in configuration management.

Smooth upgrade to MSE Nacos Professional Edition

New users can create a Professional Edition instance directly. Existing users can upgrade by clicking the "Instance Change" button in MSE, and MSE upgrades the pods in the background. Because the V1 and V2 data structures differ, Nacos double-writes data by default at first. During the upgrade, data is synchronized from V1 to V2; after the upgrade, data is synchronized from V2 to V1.

The gRPC port 9848 is also added to the SLB service ports. At this point the application SDK can be upgraded from version 1.x to version 2.0, which brings both client and server onto the 2.0 architecture.

The overall compatibility principle is that a higher-version server is compatible with lower-version clients, but a higher-version client may not be able to access a lower-version server:

  • The 1.x client can access both the Basic Edition and the Professional Edition.
  • The 2.0 client can access the Professional Edition, but not the Basic Edition.

Nacos configuration security management

In the previous article, Daofeng explained configuration permission control: MSE Nacos implements permission control through the Alibaba Cloud RAM main account and sub-account system. This article focuses on the configuration encryption feature of Nacos.

Users may store user information, database passwords, and other sensitive data in Nacos configurations. However, Nacos transmits and stores configuration data in plaintext, so a database leak or a packet capture at the transport layer exposes the sensitive items, which is a serious security risk.

The commonly used HTTPS protocol secures transmission but not storage. Encryption is therefore done directly in the client, so that the data is encrypted both in transit and in storage.

A third-party encryption system (such as Alibaba Cloud KMS) is used to strengthen the security of encryption. Symmetric encryption (the AES algorithm) is used because it is fast. Because the key is transmitted together with the ciphertext, the key itself is also encrypted, so two levels of encryption are used overall.

When the SDK publishes a configuration, it first obtains a plaintext key and the corresponding encrypted key from KMS, encrypts the data with the plaintext key, and then sends the encrypted data together with the encrypted key to Nacos for storage. When reading, the SDK obtains the encrypted data and the encrypted key from Nacos, uses the encrypted key to obtain the plaintext key from KMS, and then decrypts the data with that plaintext key. This protects the data both in transit and in storage.
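
The two-level scheme described above is commonly called envelope encryption. The following is a minimal sketch of the idea using only the JDK's `javax.crypto` classes. It is not the MSE plug-in's implementation: a locally generated master key stands in for KMS, AES-GCM is an assumed cipher mode, and `EnvelopeSketch` and its methods are hypothetical names.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

final class EnvelopeSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in for the KMS master key; in the real flow it never leaves KMS.
    private final SecretKey masterKey = newAesKey();

    private static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-GCM helper: the output is a random 12-byte IV followed by the ciphertext.
    private static byte[] seal(SecretKey key, byte[] plain) throws Exception {
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(plain);
        byte[] out = Arrays.copyOf(iv, iv.length + ciphertext.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }

    private static byte[] open(SecretKey key, byte[] sealed) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, sealed, 0, 12));
        return cipher.doFinal(sealed, 12, sealed.length - 12);
    }

    /** Publish path: returns {encryptedData, encryptedDataKey}, the pair that gets stored. */
    byte[][] encrypt(String config) throws Exception {
        SecretKey dataKey = newAesKey();                               // plaintext data key
        byte[] encryptedKey = seal(masterKey, dataKey.getEncoded());   // the key is encrypted too
        byte[] encryptedData = seal(dataKey, config.getBytes(StandardCharsets.UTF_8));
        return new byte[][] {encryptedData, encryptedKey};
    }

    /** Read path: recover the data key first, then the configuration. */
    String decrypt(byte[] encryptedData, byte[] encryptedKey) throws Exception {
        SecretKey dataKey = new SecretKeySpec(open(masterKey, encryptedKey), "AES");
        return new String(open(dataKey, encryptedData), StandardCharsets.UTF_8);
    }

    /** Demo convenience: encrypt then decrypt, wrapping checked exceptions. */
    static String roundTrip(String config) {
        try {
            EnvelopeSketch kms = new EnvelopeSketch();
            byte[][] stored = kms.encrypt(config);
            return kms.decrypt(stored[0], stored[1]);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("db.password=s3cret"));
    }
}
```

Only the encrypted data and the encrypted data key are ever stored or sent, so neither a database leak nor a packet capture reveals the plaintext without access to the master key.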

To stay compatible with the old logic, and because only sensitive data needs to be encrypted, Nacos encrypts only configurations whose data ID carries a fixed prefix. On the open source side this is implemented as an SPI plug-in, so users can extend it themselves.
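
A prefix check of this kind might look like the sketch below. The prefix value `cipher-` is an assumption inferred from the data ID `cipher-kms-aes-256-dataid` used in this article's SDK example, and `CipherPrefix` is a hypothetical helper rather than part of the Nacos API.

```java
// Hypothetical sketch: decide from the data ID whether a configuration is encrypted.
final class CipherPrefix {
    // Assumed prefix, inferred from the example data ID "cipher-kms-aes-256-dataid".
    static final String PREFIX = "cipher-";

    static boolean needsEncryption(String dataId) {
        return dataId != null && dataId.startsWith(PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(needsEncryption("cipher-kms-aes-256-dataid")); // prints true
        System.out.println(needsEncryption("application.properties"));    // prints false
    }
}
```

Keying the decision on the data ID leaves existing plaintext configurations untouched, which is what keeps the old logic working.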

Users can encrypt and decrypt sensitive data through both the SDK and the MSE console. Each first accesses KMS and then encrypts the configuration data before storing it, and decrypts it before displaying the plaintext, so the workflow is the same as with plaintext storage.

To use encryption and decryption through the SDK, you need SDK version 1.4.2 or later, and you need to add the nacos-client-mse-extension plug-in, which is MSE's own implementation of encryption and decryption.

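    <dependency>
        <groupId>com.alibaba.nacos</groupId>
        <artifactId>nacos-client</artifactId>
        <version>1.4.2</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba.nacos</groupId>
        <artifactId>nacos-client-mse-extension</artifactId>
        <version>1.0.1</version>
    </dependency>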

When initializing the SDK, fill in the AK/SK of a sub-account and grant that account KMS encryption and decryption permissions. For details, see the documentation on creating and using configuration encryption.

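    import java.util.Properties;
    import com.alibaba.nacos.api.NacosFactory;
    import com.alibaba.nacos.api.config.ConfigService;

    Properties properties = new Properties();
    properties.put("serverAddr", "mse-xxxxxx-p.nacos-ans.mse.aliyuncs.com");
    // AK/SK of the sub-account that has been granted KMS encrypt/decrypt permissions
    properties.put("accessKey", "xxxxxxxxxxxxxx");
    properties.put("secretKey", "xxxxxxxxxxxxxx");
    // KMS key and region used for encryption and decryption
    properties.put("keyId", "alias/acs/mse");
    properties.put("regionId", "cn-hangzhou");
    ConfigService configService = NacosFactory.createConfigService(properties);
    // Read the configuration with a 6000 ms timeout; both calls can throw NacosException
    String content = configService.getConfig("cipher-kms-aes-256-dataid", "group", 6000);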

Conclusion

MSE Nacos 2.0 Professional Edition offers significant performance, usability, and security improvements over the Basic Edition. The Basic Edition is recommended for testing and the Professional Edition for production. For user identities, passwords, and other sensitive configuration data, we recommend enabling permission control and storing the data encrypted to strengthen data security.

To learn more about MSE features, you are welcome to join the MSE microservice engine user discussion group (group 2), group number: 34754806
