OpenMLDB (Open Source Machine Learning Database), the latest joint research result of 4Paradigm, the National University of Singapore, and Intel, has been accepted by VLDB 2021, a top international database conference.

VLDB (Very Large Data Bases) is an annual international conference for database researchers, vendors, application developers, and users. Together with SIGMOD and ICDE, it is recognized as one of the top three international academic conferences in the field of data management and databases.

 

This is the first time a Chinese AI company has published an innovative paper on machine learning databases in the VLDB Research Track. OpenMLDB is an open-source machine learning database developed by 4Paradigm and the first of its kind in China. The accepted paper, co-authored by 4Paradigm's AI infrastructure R&D team, the National University of Singapore, and Intel, is entitled "Optimizing In-memory Database Engine for AI-powered On-line Decision Augmentation Using Persistent Memory".

Note: FEDB, the feature engineering database described in the paper, has since been renamed OpenMLDB and is open source (github.com/4paradigm/O…

VLDB 2021 is in full swing this week in Denmark, home of the Little Mermaid. In this article we analyze the paper in depth and introduce OpenMLDB, a database engine for AI real-time decision systems optimized with persistent memory.

Real-time decision systems driven by artificial intelligence

With the rapid development of artificial intelligence, more and more online real-time decision systems use AI to help make decisions. Typical applications include real-time credit card anti-fraud and personalized recommendation. A typical AI-powered real-time decision system consists of two subsystems: offline training and online prediction. As shown in Figure 1, massive historical data is fed into the offline training system on the left, which extracts features from the historical data and trains ultra-high-dimensional AI models. The trained model is then deployed into the online prediction system. The online prediction system extracts a user's historical behavior information according to their real-time behavior (such as a card transaction) and feeds it into the AI model for real-time scoring and prediction. The whole online prediction pipeline has stringent real-time requirements, with latency budgets typically at the millisecond level. This paper focuses on performance optimization of the database storage engine of the online prediction system (OpenMLDB in Figure 1), which is the performance bottleneck of the entire online prediction pipeline.


Figure 1. Typical architecture of an AI-driven online real-time decision system

Features and feature extraction

Figure 2. Examples of credit card anti-fraud features

As shown in Figure 2, take a credit card anti-fraud application as an example. The POS machine generates a record every time a card is swiped. From this record alone, it is impossible to accurately determine whether the swipe is legitimate or fraudulent. In an AI system, in addition to the new swipe record itself, we also need to extract the information hidden behind the record for model training. All of this information together constitutes the features, and the process of extracting it is feature engineering. As shown in Figure 2, through the card ID we can query the feature database for the card's information and the account information associated with it. By computing over time windows, we can also obtain, for example, the three merchants with the highest total spending in the last 10 seconds, 1 minute, 5 minutes, and 10 minutes. These features must be extracted in real time from the user's recent historical data. Generally speaking, the more features extracted, the more accurate the model's predictions.
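To make the idea concrete, here is a minimal sketch of time-window feature computation (illustrative C++ only; the record layout and the helper `window_sum` are our own inventions rather than OpenMLDB's API, and a real engine would use an index instead of a scan):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One card-swipe record: timestamp in seconds, amount, merchant id.
// Hypothetical layout for illustration only.
struct Swipe {
    int64_t ts;
    double amount;
    std::string merchant;
};

// Sum of amounts for `card` within the last `window_sec` seconds before `now`.
double window_sum(const std::map<std::string, std::vector<Swipe>>& history,
                  const std::string& card, int64_t now, int64_t window_sec) {
    auto it = history.find(card);
    if (it == history.end()) return 0.0;
    double sum = 0.0;
    for (const Swipe& s : it->second)
        if (s.ts > now - window_sec && s.ts <= now)  // inside the window
            sum += s.amount;
    return sum;
}
```

Calling this once per window size (10 s, 1 min, 5 min, …) yields the kind of window features described above.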

Feature extraction is an expensive operation. First, it involves multiple database queries: taking anti-fraud as an example, a single card swipe can trigger thousands of database query requests. Moreover, many real-time features, such as the difference in amount between the latest swipe record and the previous one, can only be extracted after the new swipe data has been generated and transmitted to the back-end system, so they cannot be precomputed. In addition, most real-time features involve a large number of time-window queries across different dimensions. The characteristics of such feature extraction workloads differ from traditional OLTP, OLAP, or HTAP workloads. We selected two state-of-the-art mainstream commercial in-memory databases (DB-X and DB-Y) and ran performance tests with a typical real-time feature extraction workload. As shown in Figure 3, as the number of time windows increases, the query latency of DB-X and DB-Y grows significantly; once the number of windows exceeds 4, the query latency of both databases exceeds 100 ms, completely failing to meet our performance requirements. In real business scenarios, the number of time windows is much larger than 4. Based on these results, we need to design a new database engine specifically for AI feature extraction.

Figure 3. Performance of mainstream commercial databases on a real-time feature extraction workload

A machine learning database optimized for AI workloads

In order to extract real-time features efficiently, we designed a machine learning database. As shown in Figure 4, OpenMLDB consists of a feature extraction execution engine (FEQL) and a storage engine.

Figure 4. Architecture of the machine learning database OpenMLDB

FEQL uses LLVM to compile and optimize query statements, with dedicated optimizations for multi-window queries, greatly improving the efficiency of parsing and executing feature extraction statements. In addition to supporting SQL-like syntax, FEQL defines special syntax for feature extraction operations. As shown in Figure 5, FEQL allows multiple time windows to be defined in the same statement, and during parsing and optimization it maximizes the reuse of data across windows. For example, to extract a user's total spending over the last 10 seconds, 1 minute, 5 minutes, 30 minutes, 1 hour, and 5 hours, we can define six time windows in a single FEQL statement. The executor then only needs to fetch the data of the largest window (5 hours) once and derive each window from it, instead of repeatedly extracting the data of the other five windows, which greatly improves the efficiency of multi-window operations such as TopN in feature extraction.

Figure 5. Multi-window feature extraction in OpenMLDB

In the storage engine, shown in Figure 6, OpenMLDB uses a two-layer in-memory skip list data structure. The feature's primary key is stored in the first-layer skip list, and the time-ordered records for that key are stored in the second-layer skip list. To query the data within multiple time windows under a primary key, we only need to locate the key in the first-layer skip list once, and then scan the time windows in the second layer.
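The lookup path can be sketched roughly as follows, with `std::map` standing in for the two skip lists (an illustration only; the real engine uses lock-free skip lists, and the names here are hypothetical):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Two-level index sketch: level 1 maps a feature key (e.g. card id) to a
// level-2 time-ordered structure. Locate the key once, then range-scan
// any number of time windows in the second level.
using TimeIndex = std::map<int64_t, double>;        // ts -> amount
using KeyIndex  = std::map<std::string, TimeIndex>; // key -> time index

// Collect amounts for `key` with ts in (now - window_sec, now].
std::vector<double> scan_window(const KeyIndex& idx, const std::string& key,
                                int64_t now, int64_t window_sec) {
    std::vector<double> out;
    auto k = idx.find(key);                            // level-1 lookup
    if (k == idx.end()) return out;
    auto lo = k->second.upper_bound(now - window_sec); // level-2 range scan
    auto hi = k->second.upper_bound(now);
    for (auto it = lo; it != hi; ++it) out.push_back(it->second);
    return out;
}
```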

Figure 6. The two-layer skip list structure of the OpenMLDB storage engine

Figure 7. Performance comparison between the DRAM version of OpenMLDB and mainstream databases

As shown in Figure 7, D-FEDB (D-OpenMLDB) denotes the DRAM version of OpenMLDB. We vary the number of time windows and the number of feature primary keys on the DRAM version of OpenMLDB and on the commercial in-memory databases DB-X and DB-Y, and run typical feature extraction queries to compare performance. As the figure shows, the DRAM version of OpenMLDB delivers significantly higher performance than the traditional commercial databases, with up to an 84x improvement.


Optimizing OpenMLDB with persistent memory

The DRAM version of OpenMLDB meets the real-time requirements of feature extraction well, but in actual deployments we still found the following pain points.

1. Large memory consumption and high hardware cost

To ensure online service performance, OpenMLDB stores indexes and data in memory. Taking the credit card anti-fraud scenario as an example, our test data stores 1 billion card swipe records from the last 3 months for feature extraction, occupying more than 3 TB of memory. With multiple replicas, the data size exceeds 10 TB. When even more historical data is needed for feature extraction (say, one year or even three years), the hardware cost becomes very high.

2. Long recovery time

When a node goes offline (due to a system failure or routine maintenance), OpenMLDB needs to read massive data from disk into memory and rebuild the in-memory data structures. The whole process can take several hours and seriously affects the quality of the online service.

3. Long-tail latency

To ensure data consistency, traditional in-memory databases periodically write data from volatile DRAM to slower disks or flash via logs and snapshots. When the system load is high, these synchronous operations can easily end up on the critical path of upper-layer queries and cause extremely long delays, known as long-tail latency.

We introduce persistent memory as a new way to address the above pain points. Persistent memory offers low cost, large capacity, and data persistence. Low cost and large capacity help address OpenMLDB's high hardware costs. Data persistence brings two benefits: 1) it greatly shortens the recovery time after a node goes offline, ensuring online service quality; 2) because in-memory data is persistent, the database no longer needs to persist logs and snapshots through slow disk devices, effectively eliminating the long-tail latency problem.

Persistent memory can work in two modes: Memory Mode and App Direct Mode. In Memory Mode, persistent memory is transparent to the user as part of main memory. The benefit of this mode is that user programs can use the large memory capacity without modification, but they cannot use the persistence feature or do fine-grained tuning. In App Direct Mode, persistent memory is exposed to the system as a separate device, and programmers use Intel's PMDK library to program against it. In this mode, applications enjoy the large capacity while also taking advantage of in-memory data persistence.

Figure 8. OpenMLDB under different memory modes

As shown in Figure 8, the leftmost version is the original DRAM-based OpenMLDB, where queries are served from DRAM but persistence is achieved by writing logs and snapshots to external storage (SSD/HDD). With Memory Mode (the middle of Figure 8), DRAM is simply replaced with large-capacity persistent memory, but the data persistence feature is not exploited. Therefore, in the right of Figure 8, we use App Direct Mode to redesign the storage engine underlying OpenMLDB, implementing the two-layer skip list structure on persistent memory with PMDK and removing the traditional log and snapshot mechanisms.

The main challenge in developing a persistent-memory-based storage engine is ensuring the correctness and efficiency of data persistence. The two-layer skip list, OpenMLDB's underlying data structure, runs in a multithreaded environment with a large number of concurrent reads and writes. To achieve efficient concurrent access in a DRAM environment, compare-and-swap (CAS) is usually used. CAS is a common hardware-supported technique for concurrency safety: under the hood, the CPU's CAS instructions lock the cache line or bus to achieve atomic operations across processors. However, when we apply CAS to persistent memory, data consistency issues arise.
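A minimal DRAM-style CAS insertion looks like the following sketch (illustrative C++ with `std::atomic`; this is the conventional lock-free pattern, not OpenMLDB's exact code, and it has no persistence semantics, which is precisely the problem described above):

```cpp
#include <atomic>

// Lock-free push onto a singly linked list head via compare-and-swap.
struct Node {
    int value;
    Node* next;
};

void cas_push(std::atomic<Node*>& head, Node* n) {
    n->next = head.load(std::memory_order_relaxed);
    // Retry until no other thread changed `head` between our read and swap.
    while (!head.compare_exchange_weak(n->next, n,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
        // On failure, compare_exchange_weak reloads `head` into n->next.
    }
}
```

On DRAM this is safe; on persistent memory the swap can succeed while the new node's contents are still unflushed in CPU caches.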

Figure 9. Consistency of compare-and-swap (CAS) operations on persistent memory

As shown in Figure 9, a typical scenario is that a new card swipe record T1 is inserted into OpenMLDB via CAS on thread 1, and then thread 2 extracts feature F1 based on the new record T1. Since CAS and the CPU's flush instructions are independent, traditional DRAM-based CAS has no persistence semantics. If the system persists F1 and T1 in the order shown in Figure 9 (the flush instructions in Figure 9) and a power failure occurs between the two flushes, the new record T1 is lost after restart while the feature F1 computed from T1 survives, which is obviously inconsistent.

To solve the data consistency problem on persistent memory, we propose persistent-compare-and-swap (PCAS). First, we use a flush-on-read rule to guarantee correctness: before reading a piece of in-memory data, we flush it to ensure the data being read has been persisted. But excessive flushing incurs additional performance cost, especially the unnecessary repeated flushing of "clean" data that has not been modified. To judge cheaply whether data is clean, we use a special pointer called a smart pointer. On 64-bit x86 CPUs, memory addresses that support 8-byte atomic CAS operations must be 8-byte aligned; since the basic unit of memory addressing is one byte, 8-byte alignment means the lower three bits of a CAS address are always zero.

 

As shown in Figure 10, the smart pointer cleverly uses these always-zero lower three bits of the CAS address to mark data as dirty. With smart pointers, we simply set the dirty bit on a pointer to not-yet-flushed data after each modification. On subsequent reads, the flush instruction is issued only when the dirty bit is set, avoiding unnecessary persistence overhead.
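The tagging arithmetic can be sketched as follows (an illustration of the alignment trick only; real PCAS additionally pairs this with cache-line flush instructions such as CLWB, which we omit, and the names here are hypothetical):

```cpp
#include <cstdint>

// 8-byte-aligned addresses have their lower three bits equal to zero,
// so bit 0 can carry a "dirty" (not yet flushed) flag inside the pointer.
constexpr uintptr_t kDirtyBit = 0x1;

uintptr_t tag_dirty(void* p) {
    return reinterpret_cast<uintptr_t>(p) | kDirtyBit;   // mark unflushed
}
bool is_dirty(uintptr_t tagged) {
    return (tagged & kDirtyBit) != 0;                    // needs a flush?
}
void* untag(uintptr_t tagged) {
    return reinterpret_cast<void*>(tagged & ~uintptr_t{0x7});  // real address
}
```

A reader checks `is_dirty` first and flushes (then clears the bit) only when needed, so clean data is never re-flushed.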

Figure 10. Persistent CAS (PCAS) and smart pointers

As shown in Figure 11, we optimized the pointers of the persistent-memory skip list using PCAS. To further reduce writes to persistent memory, we place only the bottom layer of the skip list, together with the data, on persistent memory. We therefore also designed a new recovery procedure that can rebuild the entire skip list from the information in the bottom layer after a power failure. Our experiments also examine the performance impact of different levels of persistence strategies.
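The recovery idea can be sketched like this (a simplified illustration under our own assumptions: only flat (key, timestamp, value) rows survive on persistent memory, and the volatile index is rebuilt with one scan; `std::map` again stands in for the skip-list levels):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One persisted bottom-layer entry: (key, timestamp, value).
using Row = std::tuple<std::string, int64_t, double>;

// Rebuild the volatile upper index from the persisted bottom layer.
std::map<std::string, std::map<int64_t, double>>
rebuild_index(const std::vector<Row>& persisted_rows) {
    std::map<std::string, std::map<int64_t, double>> idx;
    for (const auto& [key, ts, amt] : persisted_rows)
        idx[key][ts] = amt;  // re-create per-key time ordering in DRAM
    return idx;
}
```

Because the data itself never left persistent memory, recovery is a single index-rebuilding scan rather than a replay of logs and snapshots from disk.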

Figure 11. Persistent skip list structure based on smart pointers

Figure 12. Database abbreviations and system configurations used in the experiments

Experiments and conclusions on persistent memory optimization

We tested with data from a real anti-fraud application containing 1 billion card swipe records over a three-month period, requiring about 10 TB of memory in a real deployment. We compared the DRAM version of OpenMLDB with several persistent-memory variants of OpenMLDB; Figure 12 shows the configurations of the database systems under test. The experiments lead to the following conclusions.

1. TP-9999 comparison (Figure 13): By adopting persistent data structures on persistent memory, OpenMLDB abandons the external-storage-based persistence mechanism of the original pure-memory design and improves long-tail latency (TP-9999) by nearly 20%.

Figure 13. Performance comparison of OpenMLDB based on DRAM and on persistent memory

2. Data recovery time comparison (Figure 14): Fast data recovery is achieved after a service interruption; recovery time is reduced by 99.7%, comprehensively reducing the impact on online service quality. In the test scenario, recovery time drops from about six hours to about one minute.

Figure 14. Recovery time comparison of OpenMLDB based on DRAM and on persistent memory

3. Hardware cost comparison (Figure 15): We compared deployments of the 10 TB credit card anti-fraud service on different clusters, using the machine configurations from the paper's experiments. In this 10 TB scenario, the hardware cost of persistent-memory OpenMLDB is only 41.6% of that of the pure-memory version.

Figure 15. Hardware cost comparison of OpenMLDB based on DRAM and on persistent memory

Conclusion

This paper analyzes the challenges facing AI-driven real-time decision systems and our solutions, focusing on the following:

1. After summarizing the architecture and workloads of AI-driven real-time decision systems, we find that existing commercial in-memory databases cannot meet the stringent real-time requirements of such applications.

2. We designed OpenMLDB, a machine learning database for AI online real-time decision systems, with optimizations in both the execution engine and the storage engine to meet real-time decision performance requirements.

3. To address the pain points of large memory consumption, slow recovery after going offline, and long-tail latency in online prediction systems, we redesigned OpenMLDB's storage engine on persistent memory and proposed the persistent CAS (PCAS) and smart pointer techniques.

4. Experiments show that, compared with DRAM-based OpenMLDB, persistent-memory OpenMLDB reduces long-tail latency by 19.7%, shortens recovery time by 99.7%, and cuts hardware cost by 58.4%.

To learn more

Full text link: vldb.org/pvldb/vol14…

Paper presentation video: www.zhihu.com/zvideo/1410…

Open source projects related to the paper:

  • Machine learning database OpenMLDB: github.com/4paradigm/o…
  • Pskiplist: github.com/4paradigm/p…
  • Persistent memory storage engine with Pskiplist integration: github.com/4paradigm/p…

The paper's project is led by 4Paradigm's MemArk technical community, which focuses on modern storage architecture technologies: memark.io

More open source projects from the 4Paradigm developer community:

  • Pafka, a message queue system optimized with persistent memory: github.com/4paradigm/p…
  • Seamless support for virtual video storage on the cloud: github.com/4paradigm/k…