Preface

By 2021, Lindorm had been under development at Alibaba for ten years, evolving from Lindorm 1.0, a deeply modified version of HBase, to Lindorm 2.0, a comprehensive rewrite with a greatly upgraded architecture. Growing from a single wide table engine into a multi-model engine that handles search, time series, files, and other structured data, Lindorm has kept up a rapid pace of iteration to meet the data storage needs of Alibaba Group’s businesses. Today, Lindorm is one of the largest and most widely used database products in the company.

Last year, guided by the idea of making data visible and affordable, Lindorm upgraded its brand again and pioneered the concept of the multi-model hyper-converged database. Lindorm is not just a stack of wide table, time series, and search engines. The engines are built on a unified distributed storage layer, interconnect with one another, and are exposed through a unified SQL entry point that provides access to all of the models.

In a single database, users can perform stream computing, wide table storage, columnar indexing, spatio-temporal indexing, time series forecasting, data subscription, and complex analysis across the various models. Even when business needs and applications are complex and changing, there is no need to select, operate, and maintain multiple databases. The entire data life cycle can be handled by Lindorm’s components, meeting users’ needs for writing, searching, analyzing, and monitoring.

During the Double 11 shopping festival on November 11, 2021, Lindorm supported core systems such as interactive marketing for mobile shopping, intelligent risk control, media big-screen displays, business advisor, spending decision making, and consumption records. Cluster load levels and status were exposed transparently to the business side, so businesses could scale on demand, which improved preparation efficiency and reduced business support costs by 80%. The upgrade to a cloud-native Serverless architecture made on-demand scaling of resources far more flexible and improved resource management efficiency by more than 10 times, cutting costs while raising efficiency. Storage pooling and transparent compression reduced storage costs by up to 53%. The distributed 3AZ architecture enables second-level recovery across data centers and supports financial-grade high-availability scenarios.

As an important part of the Lindorm multi-model database, the wide table engine has accompanied Lindorm through ten years of development, and many of its innovations in functionality, performance, and stability have been proven by long-term, large-scale production use. It has served dozens of business units (BUs), including Taobao, Tmall, Ant, Cainiao, Alimama, Youku, AutoNavi, and Dawenyu (Alibaba Digital Media and Entertainment).

The Lindorm wide table engine brings together Alibaba’s technical capabilities and ten years of experience with large-scale wide table technology. It builds on cloud infrastructure to become cloud native, has made innovations and breakthroughs in lowering cost, and has further strengthened its competitiveness in massive data processing scenarios. Over those ten years, we implemented cross-AZ disaster recovery (DR) and support for multiple consistency levels to meet the requirements of different services. We support integrated hot and cold data separation and high-ratio compression algorithms to reduce user costs. We built a distributed global secondary index and combined the wide table engine with the search engine to launch SearchIndex, which meets users’ complex query needs. For ten years, our team has focused on the wide table field and kept polishing the Lindorm wide table engine. This year we upgraded and reworked some of its features, replacing old with new and truly following the philosophy of leaving simplicity to the customer and keeping complexity for ourselves.

Easier to use features

Over time, the Lindorm wide table has accumulated a number of features for all types of users, such as SQL and secondary indexes, which make it easier to use. However, as business scenarios have multiplied, users have raised new requirements for these features. For example, SQL previously did not support features such as ORDER BY, which severely limited users. To address these pain points, the Lindorm wide table has been further enhanced.

More powerful SQL capabilities

The Lindorm wide table engine has supported SQL access for a long time. SQL is much easier to use than the API, and developers who use Lindorm love it. The wide table engine also supports building inverted indexes on specified columns in the search engine, accessed through the same SQL. However, Lindorm SQL previously supported only simple DML operations; syntax such as ORDER BY, GROUP BY, and JOIN was not supported. This year, we rebuilt the entire SQL module to serve as a unified SQL entry point for all Lindorm engines, and introduced the Complex Executor module to support previously unsupported syntax such as GROUP BY. As the new SQL engine continues to evolve, our goal is a unified SQL access layer for all Lindorm models that routes requests for different workloads to the appropriate components.

More secure data

Data security is the lifeline of an enterprise. Many Lindorm customers, especially financial customers, store a lot of sensitive data in the Lindorm wide table. Regulatory requirements dictate that all data related to users and orders be encrypted both in transit and at rest. Therefore, in addition to the existing username-and-password permission system, Lindorm has added several kinds of encryption, audit logging, and other features to meet the needs of enterprise users.

Transparent encryption

One difference between customers on the cloud and customers inside the group is that cloud customers bring strong industry-specific requirements. The financial sector and government institutions pay close attention to database security when selecting database products. In addition, in the cloud computing field, Azure Cosmos DB, AWS DynamoDB, and Alibaba Cloud OSS and RDS all support encryption of data at rest, and lacking such security features can directly cost a product its entry ticket to an industry.

Today’s database security threats can be roughly divided into eight categories. Encryption of data at rest is not an all-in-one security solution; it mainly addresses one of these threats: insecure storage media. A database persists its data as files on media such as hard disks. If that data is stored in plaintext, anyone who obtains the files can read the business data by parsing them directly.

Transparent database encryption (TDE) is one way to encrypt data at rest. Compared with client-side encryption, its advantage is that the entire encryption process is carried out by the database itself. Database users do not perceive the process, so their applications need no changes. Compared with file system encryption, its advantage is finer-grained control over what is encrypted, which achieves a better balance between security and performance.

When designing Lindorm, we weighed implementation complexity, performance cost, ease of use, and other factors, and chose to support transparent encryption at table granularity. For algorithms, we support the internationally recognized block cipher AES and the Chinese national commercial cipher SM4 (SMS4). Businesses with data security requirements are welcome to contact us.

Other encryption support

In addition to the transparent encryption supported by the Lindorm wide table kernel, Lindorm supports other encryption methods. One is cloud disk encryption, which encrypts the entire data disk at the block storage level, so that data backups cannot be decrypted even if they are leaked. We have also implemented transport encryption based on the Thrift protocol and SSL, so that the user’s entire access link is encrypted, which further protects user data. Next, we will implement encryption for Lindorm’s own protocol.

Audit log

Many security-conscious organizations need to log every Lindorm operation, such as table creation, table deletion, and user authorization. Some companies that store sensitive data even require an access log for each record, showing who read which user’s information and when, for compliance audits and follow-up investigation. In response to these needs, the Lindorm wide table engine has developed an audit logging capability that can record detailed information about DDL and even DML operations. We are currently integrating audit logs with SLS. Once the integration is complete, audit logs can be delivered by LogTail to the SLS destination specified by the user, and users can customize the retention period of audit logs and query them as needed.

More efficient operation and maintenance

Lindorm runs on tens of thousands of machines inside the group and thousands of instances on the cloud. The businesses running on these instances vary widely, with different loads and models, so no single configuration can meet every user’s needs. For example, on a cluster with high write traffic, the default compaction configuration can lead to a compaction backlog and too many small files, which hurts performance. Tuning compaction requires experienced kernel engineers and takes a lot of time and effort. In addition, although Lindorm is a distributed database, hotspots often occur when a table schema is poorly designed or when traffic surges suddenly. These have to be handled manually by SREs, which is slow and causes stability problems. This year, Lindorm was optimized so that the backlog and hotspot problems seen in production resolve automatically, which improves Lindorm’s self-healing ability, reduces the pressure on O&M personnel, and strengthens system stability.

Offload Compaction

Distributed databases based on the LSM-tree architecture do not update data in place. Instead, data is first written to memory and then periodically flushed to read-only storage files. Reading a value therefore often requires searching multiple files to find the version currently in effect. As the number of files grows, so does the number of I/Os per read, and read performance deteriorates. Such systems typically run a periodic compaction operation that merges multiple files, keeping the file count, and therefore the number of I/Os per read, stable and holding latency within a bounded range.
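The read path and the effect of compaction described above can be illustrated with a toy model. This is a minimal sketch, not Lindorm’s implementation: the class `TinyLSM`, its `memtable_limit` parameter, and the use of plain dictionaries as "files" are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM store: an in-memory table plus a list of read-only 'files'."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, str] = {}
        self.files: List[Dict[str, str]] = []  # ordered oldest to newest
        self.memtable_limit = memtable_limit   # hypothetical flush threshold

    def put(self, key: str, value: str) -> None:
        # Writes never touch existing files; they only go to memory.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Persist the memtable as a new read-only file.
        if self.memtable:
            self.files.append(dict(self.memtable))
            self.memtable = {}

    def get(self, key: str) -> Tuple[Optional[str], int]:
        """Return (value, number of files probed)."""
        if key in self.memtable:
            return self.memtable[key], 0
        probed = 0
        for f in reversed(self.files):  # newest file wins
            probed += 1
            if key in f:
                return f[key], probed
        return None, probed

    def compact(self) -> None:
        # Merge all files; newer values overwrite older ones.
        merged: Dict[str, str] = {}
        for f in self.files:
            merged.update(f)
        self.files = [merged] if merged else []


db = TinyLSM(memtable_limit=2)
for i in range(6):
    db.put(f"k{i}", f"v{i}")
db.put("k0", "v0-new")
db.flush()

print(db.get("k1"))  # ('v1', 4): the oldest file is reached last
db.compact()
print(db.get("k1"))  # ('v1', 1): one merged file after compaction
print(db.get("k0"))  # ('v0-new', 1): the newest version survives the merge
```

The number of files probed stands in for the number of I/Os per read: it grows with the file count and drops back to one after the merge.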

In Lindorm’s existing architecture, compaction and read/write request serving are coupled in a single process. Because compaction consumes bandwidth, I/O, CPU, and memory, it affects the response time of read/write requests. Workloads also differ in how they relate to compaction: some write frequently, some need little or no compaction (metadata storage), and some are sensitive to read and write latency (risk control). This makes it difficult to unify and manage compaction-related parameters. When a large backlog of files builds up, the coupling makes it impossible to scale compaction out independently and work through the files quickly to reduce business risk, which shows the lack of flexibility in the current design.

Compaction can be treated as an offline task that executes periodically, while read/write serving is a real-time online service. The root cause of the problem is that the offline task and the real-time online service are strongly coupled and interfere with each other. Based on this insight, Lindorm implements Offload Compaction, which moves compaction out of the serving process and runs it as a separate process on separate machines. A machine that runs compaction can use all of its resources without compromising online services. In addition, operators can build a large compaction cluster that scales out or in as needed, which simplifies operations.

Quota-based traffic limiting

Any database system, whether a single-node MySQL or a distributed Lindorm, has limited service capacity for a given amount of server hardware. In multi-tenant scenarios, if one tenant’s request traffic exceeds that capacity, the database is likely to lose its ability to serve requests, affecting other tenants on the same database or even bringing the server down, which is very dangerous. For this reason, database systems commonly include a traffic limiting scheme: when a tenant’s request traffic exceeds the service capacity, the tenant is throttled so that its requests do not affect other tenants.

Without distribution, traffic limiting is relatively simple. The limiter does not need to coordinate among machines. It only needs to count the requests that reach the local machine and start limiting when the count exceeds a threshold. In distributed systems, however, traffic limiting schemes are often complicated and involve distributed coordination and related problems. For a high-throughput distributed system like Lindorm, limiting traffic without affecting the response time of normal requests is a further difficulty.
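The single-machine case is simple enough to sketch in a few lines. The following is an illustrative fixed-window counter, not a scheme taken from Lindorm: the class name `LocalRateLimiter` and the parameters `threshold`, `window_seconds`, and `clock` are hypothetical.

```python
import time
from typing import Callable


class LocalRateLimiter:
    """Fixed-window counter: at most `threshold` requests per window."""

    def __init__(
        self,
        threshold: int,
        window_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        now = self.clock()
        # Start a fresh window once the current one has expired.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        # Only local state is consulted; no other machine is involved.
        if self.count >= self.threshold:
            return False
        self.count += 1
        return True


limiter = LocalRateLimiter(threshold=2, window_seconds=1.0)
print([limiter.allow() for _ in range(3)])
```

Everything the limiter needs lives on one machine, which is exactly what stops being true once a tenant’s requests are spread over many servers.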

Lindorm has developed a distributed traffic limiting system that has the following unique features:

  1. Quota is the traffic limiting unit: the more data a user requests, the more Quota is consumed. Quota therefore reflects the pressure a request puts on the system more accurately than QPS does.
  2. A central QuotaCenter manages Quota consumption uniformly, so limiting works even when a user’s traffic is unevenly distributed across machines.
  3. QuotaProxy reports Quota consumption asynchronously and periodically, so QuotaCenter does not become a single-point bottleneck in the system.
  4. A user request never interacts with QuotaCenter directly. Each request only checks the information cached locally by QuotaProxy, so the user’s response time is not affected.
  5. QuotaCenter is only a weak dependency: if QuotaCenter goes down, user requests are not affected.
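The interaction between QuotaProxy and QuotaCenter listed above might be sketched as follows. Only the two component names come from the text; the methods `admit`, `sync`, and `report`, the per-tenant limit table, and the use of a data-size cost as the Quota unit are assumptions made for illustration. In a real system `sync` would run on a background timer rather than being called by hand.

```python
from typing import Dict


class QuotaCenter:
    """Central accountant: sums the Quota reported by all proxies."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = limits  # hypothetical per-tenant Quota per period
        self.used: Dict[str, int] = {tenant: 0 for tenant in limits}

    def report(self, usage: Dict[str, int]) -> Dict[str, bool]:
        # Accumulate usage and return which tenants are now over the limit.
        for tenant, amount in usage.items():
            self.used[tenant] = self.used.get(tenant, 0) + amount
        return {t: self.used.get(t, 0) >= lim for t, lim in self.limits.items()}


class QuotaProxy:
    """Per-server agent: answers requests from a local cache only."""

    def __init__(self, center: QuotaCenter) -> None:
        self.center = center
        self.pending: Dict[str, int] = {}      # usage not yet reported
        self.throttled: Dict[str, bool] = {}   # last decision from the center

    def admit(self, tenant: str, cost: int) -> bool:
        # Request path: no call to QuotaCenter, only local state.
        if self.throttled.get(tenant, False):
            return False
        self.pending[tenant] = self.pending.get(tenant, 0) + cost
        return True

    def sync(self) -> None:
        # Periodic, asynchronous report. If the center is unreachable,
        # keep serving with the cached decision (weak dependency).
        try:
            self.throttled = self.center.report(self.pending)
            self.pending = {}
        except ConnectionError:
            pass


center = QuotaCenter({"tenant_a": 100})
p1, p2 = QuotaProxy(center), QuotaProxy(center)
p1.admit("tenant_a", 80)   # most of the traffic lands on one server
p2.admit("tenant_a", 30)
p1.sync()
p2.sync()
print(p2.admit("tenant_a", 1))  # False: p2 has learned the tenant is over quota
print(p1.admit("tenant_a", 1))  # True: p1's cache is stale until its next sync
```

The trade-off in this sketch is that enforcement lags by up to one reporting interval, which is the price of keeping the central component off the request path.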

Future

We built Lindorm from scratch with a ten-year mindset. This year is Lindorm’s tenth year at Alibaba, and Lindorm is in its prime. Being driven by business needs is the constant principle behind the evolution of the Lindorm wide table engine. We will continue to provide users with seamless scaling, high throughput, continuous availability, stable millisecond-level response times, tunable strong and weak consistency, low storage cost, rich indexing, and mixed real-time and offline data access. Going forward, the Lindorm wide table engine will keep advancing in several directions.

Lower cost of use: Low cost of use is a key feature of Lindorm as a cloud-native database. There is no lowest cost, only lower cost, and we will keep working to reduce it. Having introduced OSS standard storage as cold storage, we will also introduce OSS archive storage to meet users’ storage and query needs for even colder data. In addition, Lindorm’s multi-AZ deployment gives users high availability across availability zones, but the replica data in different availability zones is currently not shared. We hope to use Region Replica technology to share the underlying storage across availability zones and reduce cost further.

Easier user experience: Lindorm has now fully embraced SQL, and we hope to use SQL to provide a more unified, easy-to-use experience. The Lindorm wide table will keep improving its SQL capabilities by integrating existing wide table capabilities such as FeedStream and WideColumn into SQL. We also plan to expose Lindorm’s internals to users through SQL, for example slow request queries, hot key queries, and cluster operation and maintenance, so that Lindorm truly becomes a user-oriented database.

More diversified functions: As the businesses running on Lindorm diversify, the scenarios Lindorm faces are becoming more and more complex. To meet the challenges these businesses bring, we must keep enriching Lindorm’s functions. For example, we will implement Blob storage to solve the wide table model’s previously poor performance when storing large key-value pairs, and introduce a bitmap type for scenarios such as user profiling and audience selection. We will also support table snapshots to meet users’ requirements for integrated query and analysis.

Stronger elasticity: Lindorm’s storage-compute separation architecture and the elasticity of cloud infrastructure give Lindorm very strong elasticity. Online scaling out and in, and scaling up and down, can meet the needs of most users. However, limited by the start and stop speed of ECS and the expansion capability of cloud disks, Lindorm’s elasticity is not yet what we would call “second-level”. Lindorm is therefore building a next-generation deployment architecture that uses large storage pools and the capabilities of ECI and ACK to truly achieve second-level elasticity.

In addition, I am here to announce that the Lindorm wide table engine will be open sourced. This will bring the technology accumulated in the Lindorm wide table to the industry, so that more people can use Lindorm’s advanced technology. We also hope that open source will attract more people to build Lindorm with us and advance the technology. Lindorm is about to enter a new era of open source, but the original mission of the Lindorm wide table remains the same: to be the best wide table database product. We welcome anyone interested in the technology to join us through open source.

Conclusion

Every research and development direction of Lindorm comes from real requirements raised by the business. Your understanding and trust are the driving force that moves us forward. Without your company and support, Lindorm would not have achieved what it has today. Thanks to all the colleagues who have helped, supported, trusted, and encouraged us. We will strive to build the best big data NoSQL product, unite as one, stay true to our original aspiration, and forge ahead!

This article is the original content of Aliyun and shall not be reproduced without permission.