Relational databases have been in wide use since their birth in the 1970s, and they can be found in virtually every digital information system. In real scenarios, business systems have a very simple requirement for a relational database as a piece of basic software: high reliability and high performance. At the same time, they want to use rich SQL semantics to simplify the implementation of business-layer functionality as much as possible. While traditional database products such as Oracle, SQL Server, MySQL, and PostgreSQL continue to mature, new-generation cloud-native database products such as Aurora, PolarDB, TiDB, and OceanBase are starting to attract more attention. So which database product best suits the business? What is the future of the database, one of the oldest categories of software? This article discusses the practice of, and thinking about, database technology from the perspective of the requirements of a commercial product system.

The full text is about 11,241 words and takes roughly 18 minutes to read.


1. Characteristics of the data storage requirements of commercial product systems

Baidu's main commercial product matrix includes two categories of advertising products, performance advertising (search and feed ads) and display advertising (brand and aggregated-display ads), as well as foundational marketing tools such as Jimuyu and Guanxingpan. The commercial product system is the bridge connecting Baidu's customers to the advertising retrieval system, helping customers express their marketing intent and achieve their marketing goals.

In essence, the commercial product system is a large and complex advertising information management system, covering a variety of toB and toC scenarios with diverse, rich, and frequently iterated requirements.

At the data storage level, these business requirements mainly include:

  • Transactional requirements for ad placement scenarios (OLTP, Online Transaction Processing);

  • Analytical requirements for advertising effectiveness analysis scenarios (OLAP, Online Analytical Processing);

  • High-QPS queries for specific scenarios, such as account structure and permission relationships;

  • Forward and inverted KV lookups in literal scenarios, such as mapping between keyword literals and IDs;

  • Fuzzy queries in material list scenarios.

To cope with such diverse data storage needs using traditional storage technologies, you would at a minimum need a relational database (such as MySQL), a KV store (such as Redis), an OLAP store (such as Palo), full-text retrieval (such as ElasticSearch), and custom in-memory storage structures.

So what are the requirements of a business system for a data storage facility?

  • First, stability and reliability: unavailability means a degraded customer experience and even direct economic loss;

  • Second, data should be as consistent as possible: if a customer sees different data in different places, misunderstandings or even incorrect advertising operations may follow;

  • Third, growing data volumes should be handled as cheaply as possible, without buying large amounts of hardware up front and with easy scaling later;

  • Finally, overall read and write performance should be good, with millisecond-level responses wherever possible, so the customer's experience is unaffected.

And what kind of data storage product do business developers want to use?

  • A single, uniform interface with low learning and migration costs; different data stores should expose the same interface form as far as possible;

  • Understandable data-change behavior: no lost or overwritten data, no anomalous data introduced by concurrency;

  • High scalability, adapting to data-volume and traffic growth from 1 to N, ideally without the service having to be aware of it;

  • High availability with built-in fault tolerance, ideally leaving the service unaware of database failures;

  • Low-cost schema changes;

  • Good performance across a variety of read and write patterns.

In short: ideally it can do everything, carry any load, and require no operations work at all!

2. The development history of BaikalDB

The most critical storage requirement of the commercial system is the advertising library. It stores all advertising material and is used to manage the entire advertising life cycle, help customers complete every advertising function, and drive conversions. As Baidu's Phoenix Nest system developed, the storage facilities of the advertising library went through two important stages: 1) a MySQL cluster evolving from a single database to database-and-table sharding; 2) a heterogeneous composite storage cluster consisting of a primary MySQL cluster plus mirrored secondary stores.

2.1 MySQL cluster based on database-and-table sharding

The earliest Phoenix Nest ad library used standalone MySQL deployed on an independent high-performance disk array. This architecture looks dated given the hardware constraints of the time, yet it is entirely consistent with today's popular cloud-native, storage-compute-separated architecture: AWS's Aurora and Alibaba Cloud's PolarDB deploy standalone databases such as MySQL and PostgreSQL on EBS volumes or on distributed file systems connected by RDMA high-speed networks, achieving 100% SQL compatibility.

As the business grew, single-node MySQL could not keep up with the expansion of data volume and read/write traffic. Database-and-table sharding therefore became the best choice at the time, and arguably still is: through sharding, MySQL achieves high scalability in both capacity and performance.

Since 2010, the Phoenix Nest advertising database has successively gone through resharding from 1 to 4, 4 to 8, 8 to 16, and 16 to 32 databases, growing from a single-machine cluster to 33 shards (the extra shard handles individual customers who purchase very large numbers of keywords), each shard being a cluster of 1 master and 11 slaves. It stored tens of TB of advertising material and served billions of read/write requests per day. The service downtime required for each resharding went from a full day, to six hours, down to the minute level.

2.2 Heterogeneous Composite Storage Cluster

The Phoenix Nest advertising library is read-heavy and write-light, with highly varied query scenarios. The sharded MySQL cluster struggled with some of the more demanding ones: counting the keywords under an account in the account-plan-unit-keyword hierarchy, full-table-scan counts such as the number of keywords under a plan, high-QPS lookups of keyword literals, fuzzy search over creatives, and material list filtering. These requirements are hard to satisfy with MySQL.

To solve this, we used a data stream to synchronize MySQL data to mirrored in-memory stores in real time. These mirror stores adopt memory structures tailored to specific query scenarios to meet performance targets. For the convenience of business application development, we also built a dedicated SQL proxy layer that, without any change to the SQL, routes queries to the mirror indexes according to configured rules and translates them into the request parameters each mirror store needs. So although multiple data sources are in play, business applications still believe a single MySQL-protocol database is serving them and never need to care which data source a query should hit. This forms a heterogeneous composite storage architecture, shown in the figure below:

This is a common architecture. In other business scenarios, OLTP database data is synchronized to an OLAP data warehouse to isolate offline analysis workloads. Its advantage is that multiple systems holding the same data in different storage engines solve complex query scenarios by divide and conquer, with a degree of workload isolation.

The SQL proxy layer markedly improves the experience of business applications: the sharding logic is pushed down from the application layer into the proxy, so applications need not even notice a resharding. To a business application it looks like a standalone MySQL system, free of performance or capacity concerns.

But there are obvious drawbacks to this architecture:

**Operation is more complex:** in addition to MySQL itself, you must also operate the real-time data synchronization stream, the SQL proxy layer, the mirror indexes, and other systems.

**Real-time data synchronization is prone to failure or delay:** customers may perceive significant inconsistencies between data served from mirror indexes and from MySQL. To reduce this, the SQL proxy layer needs some degradation capability (switching to MySQL queries whenever a delay is detected), and facilities are needed to quickly correct mirror index data.

**Resource redundancy:** mirror indexes are in effect extra copies of the data, and MySQL itself already needs a large number of slave libraries to support read traffic and synchronization.

2.3 The choice in 2017

By 2017, the Phoenix Nest advertising library had 33 shards and had moved to NVMe SSDs. Read/write performance met business requirements for the supported scenarios, but another round of resharding would carry enormous costs in both resources and operations.

At this stage we began to wonder whether a less costly solution existed. The new feed advertising business was also developing rapidly; replicating a Phoenix-Nest-style storage architecture for it would be very expensive. Four years on, the Phoenix Nest advertising library, thanks to hardware upgrades including CPUs, memory, and NVMe SSDs of 3 TB per disk, still kept the 33-shard deployment, but performance bottlenecks had begun to show; if advertising material kept growing at a high rate, another resharding was expected to be necessary by the end of 2022.

At that time, the core storage of Google AdWords, the industry benchmark among advertising systems, was F1/Spanner: globally deployed, multi-active across geographically distant data centers, equipped with atomic clocks to implement strongly consistent distributed transactions, with extremely high availability and automatic scalability. Following the design philosophy of Google's storage systems, two routes were visible for the design of an ad storage system:

2.3.1 Deep customization based on MySQL

MySQL is a single-machine architecture with millions of lines of code that is extremely difficult to control and modify. It is unrealistic to transform MySQL from the inside into a system with F1/Spanner capabilities.

There are usually two ways around this, both seeking a breakthrough from the outside:

The first breaks out at the file-system level, as Aurora and PolarDB do: use EBS, or build a distributed file system over a high-speed RDMA network, rather than developing a new database system. Even so, to get better performance, MySQL's storage engine and master-slave synchronization still need deep customization and optimization, and total capacity and performance cannot scale indefinitely: Aurora reaches up to 128 TB at about 5x MySQL performance, and PolarDB up to 100 TB at about 6x.

The second is similar to the Phoenix Nest ad storage design: improve query performance through data synchronization and ever-expanding mirror indexes, at the price of high redundancy costs and weak data consistency.

2.3.2 Adopt a new database system that is distributed, cloud-native, strongly consistent, and supports a diversified index architecture

In 2017, both Google's F1/Spanner and OceanBase were closed systems heavily coupled to their internal infrastructure. Open-source systems fell into two main schools: OLAP systems with SQL support, such as Baidu Palo (since open-sourced as Doris), Impala (which has no storage engine), and ClickHouse; and CockroachDB and TiDB, which follow the F1/Spanner design. OLAP systems clearly could not serve our primarily TP (online transaction) requirements, and at that time CockroachDB and TiDB were still in their infancy with almost no production usage.

With no particularly mature solution available and the MySQL-based approach at a bottleneck, could we develop a new distributed database system ourselves? The decision rested on whether the team could build, from zero, a highly available, high-performance, low-cost database focused on OLTP while also serving OLAP (HTAP, Hybrid Transaction and Analytical Processing).

**Team conditions:** the existing storage team (4 people) works in the C++ stack, has developed the SQL proxy layer and customized stores, is familiar with the MySQL protocol, and has hands-on engineering experience.

Technical conditions:

1. A distributed system needs an effective communication framework. Baidu's BRPC framework was already very mature: an industrial-grade RPC implementation with very large-scale deployments.

2. The mainstream approaches to data consistency at the time were Paxos and Raft. Baidu's BRaft framework implements the Raft protocol on top of BRPC and was developing rapidly with internal support.

3. A single storage node needs a reliable KV store. RocksDB, open-sourced by Facebook and derived from Google's LevelDB, is a high-performance KV engine based on the LSM tree.

After eight months of design and development, version 1.0 of the database was up and running, and the results proved the decision right.

2.4 BaikalDB, a next-generation storage system for commercial product systems

BaikalDB is a distributed database system designed around the requirements of commercial product systems, with three core goals:

**1. Flexible cloud deployment:** designed for containerized environments; can be co-located with business applications, migrates flexibly, supports linear scaling of capacity and performance, is low-cost, and requires no special hardware.

**2. One-stop storage and computing capability:** comprehensively adaptable to complex business requirements, primarily serving OLTP while also covering OLAP, full-text index, and high-performance KV requirements.

**3. Compatibility with the MySQL protocol:** easy for the business to use, with low learning costs.

BaikalDB is named after Lake Baikal, the world's largest freshwater lake by volume: equivalent to all the North American Great Lakes combined, larger than the Baltic Sea, and holding more than 20% of the world's fresh water. A total of 336 Siberian rivers feed into it. In winter, the pale-blue ice ridges along its shore look like rows of data in a distributed database: dense, yet surprisingly orderly.

BaikalDB is a distributed, scalable storage system compatible with the MySQL protocol that supports random real-time reads and writes of PB-scale structured data. The overall system architecture is as follows:

BaikalDB implements single-node storage on RocksDB, guarantees replica consistency with the Multi-Raft protocol (the BRaft library), and handles node communication with BRPC.

  • BaikalStore is responsible for data storage, organized by Region. Three Regions on three Stores form a Raft group, giving three-replica, multi-instance deployment; when a Store instance fails, Region data is migrated automatically.

  • BaikalMeta is responsible for meta information, including partitioning, capacity, permissions, and balancing, with a three-replica deployment guaranteed by Raft. A Meta outage only affects data expansion and migration, not reads and writes.

  • Baikaldb is responsible for frontend SQL parsing and query plan generation and execution. It is deployed as stateless, fully homogeneous multiple instances; instance failures do not matter as long as the remaining instances can still carry the QPS.

The core features of BaikalDB are:

  • Fully autonomous capacity management: automatic scaling and automatic data balancing, transparent to applications and easy to run in the cloud; currently runs on the Opera PaaS platform

  • High availability with no single point: supports automatic fault recovery and migration

  • Query-oriented optimization: supports a variety of secondary indexes including full-text indexes, supports multi-table joins and common OLAP requirements

  • MySQL protocol compatibility with distributed transaction support: provides applications with an SQL interface and supports high-performance schema and index changes

  • Multi-tenancy: meta information is shared while data stores are fully isolated

During development, BaikalDB planned rapid iterations around business requirements, was refined and optimized deeply through business use, and grew along with the business. The timeline of key features is as follows:

Since its launch in 2018, BaikalDB has grown to 1.5K+ data tables, 600+ TB of data, and 1.7K+ storage nodes.

3. Key design thinking and practice in BaikalDB

Distributed data storage systems generally follow one of three architectural patterns: Shared Everything, Shared Disk, and Shared Nothing.

1. Shared Everything: CPU, memory, and disks of a single host are shared completely transparently. This is typical of traditional RDBMS products.

2. Shared Disk: each processing unit has its own private CPU and memory but shares a disk system, separating storage from compute. Examples are Oracle RAC (sharing data via SAN), Aurora (via EBS), and PolarDB (via RDMA).

3. Shared Nothing: each processing unit has its own private CPU, memory, and disks with no resource sharing, similar to the MPP (massively parallel processing) model. Processing units communicate with each other, giving better parallelism and scalability. Hadoop is the typical representative: nodes are independent and each processes its own data, which may afterwards be aggregated at an upper layer or shuffled between nodes.

The main advocates of the Shared Disk architecture are cloud vendors, who want to offer cloud products fully compatible with traditional RDBMSs so that the broad base of database users can migrate at no cost. Implementations differ considerably between vendors, who compete mainly on performance, capacity, and reliability as their selling points. However, the Scale-Out capability of this architecture is limited, which is why cloud vendors usually advertise around 100 TB of capacity.

A sharded MySQL cluster is also a Shared Nothing architecture, with each shard working independently of the others. The biggest limitation of this class of architectures is the difficulty of guaranteeing consistency and availability at the same time, as the well-known CAP theorem dictates. Most NoSQL systems do not support transactions, prioritizing availability instead. But in OLTP scenarios data consistency matters greatly and transactions are indispensable.

Since BaikalDB aims to be a distributed data store with converged capabilities for business needs, and since Scale-Out matters more at large data scale (a 100 TB ceiling is far from enough), it adopts the Shared Nothing architecture.

For a distributed data system, the design of storage, computing, and scheduling matters most.

3.1 Storage Layer Design

The storage layer design is mainly about which data structures describe the stored data. A distributed data system must additionally consider how multiple nodes cooperate to store the same data.

For large-scale data, disk storage should take priority over memory, which is costlier and prone to data loss. RocksDB is a prominent disk-oriented storage engine whose core model is key-value. Using RocksDB means working out how the structure of a data table maps onto the key-value structure.

To spread data across machines, BaikalDB introduces the concept of a Region, the smallest unit of data management. Each data table consists of several Regions distributed across multiple machines. Such an architecture must decide how to split the data, and there are two options: Hash (choose the machine by the hash of the key) and Range (store a contiguous key range on one machine).

The problem with Hash is how to change the hash rule dynamically once a Region grows large enough to need splitting: changing the rule means redistributing large amounts of data, and Region sizes are hard to balance; even consistent hashing improves this only to a limited extent. Range makes splitting easy; it is prone to hot spots, but those are comparatively easy to mitigate. So BaikalDB splits by Range, as the routing sketch below illustrates.
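To make Range routing concrete, here is a minimal sketch (not BaikalDB's actual code; the type names are invented): regions are kept in a map ordered by start key, so the region owning any key is found with one ordered lookup, and splitting a region only inserts one new boundary instead of rehashing data.

```cpp
// Minimal sketch of Range-based region routing (illustrative only):
// regions live in a map ordered by start key, so the region owning a key
// is found with a single upper_bound lookup.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

struct RegionInfo {
    int64_t region_id;
    std::string start_key;  // inclusive
    std::string end_key;    // exclusive; empty means +infinity
};

class RangeRouter {
public:
    void add_region(const RegionInfo& r) { regions_[r.start_key] = r; }

    // Returns the region whose [start_key, end_key) range contains `key`.
    const RegionInfo* route(const std::string& key) const {
        auto it = regions_.upper_bound(key);   // first region starting AFTER key
        if (it == regions_.begin()) return nullptr;
        --it;                                  // candidate region starts <= key
        const RegionInfo& r = it->second;
        if (r.end_key.empty() || key < r.end_key) return &r;
        return nullptr;                        // key falls in a gap
    }

private:
    std::map<std::string, RegionInfo> regions_;  // start_key -> region
};

int main() {
    RangeRouter router;
    router.add_region({1, "", "m"});    // region 1: (-inf, "m")
    router.add_region({2, "m", ""});    // region 2: ["m", +inf)
    std::cout << router.route("apple")->region_id << "\n";  // 1
    std::cout << router.route("zebra")->region_id << "\n";  // 2
}
```

Splitting under this scheme is cheap in metadata terms: only one new boundary entry is added, and only the data of the split Region moves.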

Key-value is not the same thing as a database table. The primary key index (also called the clustered index, storing the primary key plus all columns), the query-optimizing secondary indexes (non-primary-key, non-clustered indexes storing the indexed columns plus the primary key), and the full-text indexes of a data table all have to be mapped onto the key-value model.

  • Primary key index: to distinguish Regions and index types, the key must contain region_id and index_id in addition to the primary key value. Region_id is allocated globally and uniquely within a cluster, so a table_id is unnecessary. A composite primary key built from several fields is stored in field order, and keys need memcomparable encoding so that byte order matches logical order, improving scan performance. The whole row can be encoded with Protobuf and stored in the value, which gives some compression and makes adding columns easier.

  • Secondary index: one must choose between a local secondary index and a global secondary index.

  • Local secondary index: indexes only the data of its own Region. The advantage is that index and data live on the same node, so looking the row back up is fast and no distributed transaction is needed. But queries must always carry a primary key condition; otherwise they can only be broadcast to all partitions.

  • Global secondary index: indexes all partitions of the whole table. The advantage is that no broadcast is needed when there is no primary key condition; but because a global secondary index is an independent table, it cannot sit on the same storage node as all the primary data, and distributed transactions are needed to keep it correct.

In the key-value model, local or global, the key consists of region_id, index_id, the index key, and the primary key (omitted for a unique secondary index), and the value is the primary key. As this shows, retrieving a whole row through a secondary index requires a second fetch from the primary key index (a "back to table" operation); if all the needed columns are in the index key, no back-to-table is needed.

BaikalDB did not have distributed transactions at first (they were too complex), so local secondary indexes were implemented first and global secondary indexes followed once distributed transactions landed. Business applications can prefer local secondary indexes where the usage scenario allows.

  • Full-text index: mainly a matter of index construction and retrieval (see the intersection sketch after this list):

Build: tokenize the indexed field into one or more terms, build an ordered posting list for each term, and store it in a compact format.

Search: perform Boolean retrieval over the posting lists of the query terms.

In the key-value model, the key is region_id, index_id, and the term produced by tokenization, and the value is the sorted list of primary key values.
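The Boolean retrieval step above reduces to intersecting sorted posting lists. Below is a minimal sketch of the classic two-pointer AND, assuming the posting lists are the sorted primary-key values stored under each term; the names and data are hypothetical, not BaikalDB's actual implementation.

```cpp
// Illustrative Boolean AND over two posting lists (sorted primary keys),
// the core step of full-text retrieval described above.
#include <cstdint>
#include <iostream>
#include <vector>

// Intersect two ascending posting lists with the two-pointer technique:
// O(m + n) comparisons.
std::vector<int64_t> intersect(const std::vector<int64_t>& a,
                               const std::vector<int64_t>& b) {
    std::vector<int64_t> out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j])      ++i;
        else if (a[i] > b[j]) ++j;
        else { out.push_back(a[i]); ++i; ++j; }
    }
    return out;
}

int main() {
    // Posting lists for two hypothetical terms, e.g. "flower" and "shop".
    std::vector<int64_t> flower = {3, 7, 11, 20, 42};
    std::vector<int64_t> shop   = {7, 9, 20, 55};
    for (int64_t pk : intersect(flower, shop)) std::cout << pk << " ";  // 7 20
    std::cout << "\n";
}
```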

So at the storage layer, the main logical structures above, along with others such as HLL and T-Digest, all map onto the physical KV structure; a minimal key-encoding sketch follows. For more detail on indexes, see the BaikalDB index design post (my.oschina.net/BaikalDB/bl…
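The sketch below illustrates the kind of memcomparable key layout described above (region_id | index_id | key fields, with the row packed into the value): big-endian bytes with the sign bit flipped make byte-wise comparison agree with numeric order, which is what keeps RocksDB range scans aligned with primary-key order. The exact layout and helper names here are assumptions, not BaikalDB's actual encoding.

```cpp
// Sketch of memcomparable key encoding for the KV layout described above.
// Illustrative only; the field layout and function names are assumptions.
#include <cstdint>
#include <iostream>
#include <string>

// Append a signed 64-bit integer so that memcmp order == numeric order:
// flip the sign bit, then write big-endian bytes.
void append_i64(std::string* buf, int64_t v) {
    uint64_t u = static_cast<uint64_t>(v) ^ (1ULL << 63);  // flip sign bit
    for (int shift = 56; shift >= 0; shift -= 8)           // big-endian bytes
        buf->push_back(static_cast<char>((u >> shift) & 0xFF));
}

// Primary-key index entry: key = region_id | index_id | pk; value = row bytes.
std::string encode_primary_key(int64_t region_id, int64_t index_id, int64_t pk) {
    std::string key;
    append_i64(&key, region_id);
    append_i64(&key, index_id);
    append_i64(&key, pk);
    return key;
}

// Secondary index entry: key = region_id | index_id | index value | pk;
// value = primary key (hence the extra "back to table" lookup for full rows).
std::string encode_secondary_key(int64_t region_id, int64_t index_id,
                                 int64_t index_value, int64_t pk) {
    std::string key = encode_primary_key(region_id, index_id, index_value);
    append_i64(&key, pk);
    return key;
}

int main() {
    // Keys for pk = -1 and pk = 2 compare in numeric order despite the sign.
    std::string a = encode_primary_key(10, 1, -1);
    std::string b = encode_primary_key(10, 1, 2);
    std::cout << (a < b) << "\n";  // 1
}
```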

After data is divided into Regions by primary key range, each Region needs multiple replicas to raise overall availability in the distributed setting. Two kinds of consistency must then be considered: across the replicas of one Region, and across multiple Regions.

  • Replica consistency: replicas require reliable data replication, so that new replicas can be created on failure without corrupting data. This is achieved with the Raft consensus protocol, which provides leader election, membership change, and log replication. Every data change is delivered as a Raft log entry, and log replication synchronizes it safely and reliably to a majority of the replica group. Under strong consistency requirements, both reads and writes go to the Leader node.

  • Cross-Region consistency: operations spanning multiple Regions must succeed or fail together, to avoid the inconsistency of a partial failure. This relies on two-phase commit (2PC) over RocksDB's single-node transactions to implement pessimistic transactions, combined with the Percolator idea of letting a Primary Region act as the transaction coordinator so there is no separate single-point coordinator; a simplified sketch follows. Distributed transactions are far more intricate than this, and at many database companies a dedicated team works on them. For more detail, see the BaikalDB distributed transaction post (my.oschina.net/BaikalDB/bl…
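To make the Primary Region idea concrete, here is a heavily simplified sketch of 2PC in which the first participant plays the coordinator. Real systems layer this over RocksDB transactions, Raft replication, key locks, and failure recovery, all omitted here; every name in the sketch is hypothetical.

```cpp
// Heavily simplified two-phase commit where one participant region acts as
// the Primary Region (coordinator). In-memory "regions" stand in for Raft
// groups over RocksDB; illustrative only.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Region {
    std::map<std::string, std::string> data;
    std::map<std::string, std::string> staged;  // prepared, not yet committed

    bool prepare(const std::map<std::string, std::string>& writes) {
        staged = writes;      // real impl: lock keys, persist a Raft log entry
        return true;          // could fail on conflict or node error
    }
    void commit()   { for (auto& kv : staged) data[kv.first] = kv.second; staged.clear(); }
    void rollback() { staged.clear(); }
};

// The first region in `parts` plays the Primary Region: its commit decision
// IS the transaction's outcome, so no separate coordinator node is needed.
bool run_txn(std::vector<std::pair<Region*, std::map<std::string, std::string>>>& parts) {
    for (auto& p : parts)                       // phase 1: prepare everywhere
        if (!p.first->prepare(p.second)) {
            for (auto& q : parts) q.first->rollback();
            return false;
        }
    parts[0].first->commit();                   // phase 2: primary decides first
    for (size_t i = 1; i < parts.size(); ++i)   // then secondaries follow
        parts[i].first->commit();
    return true;
}

int main() {
    Region r1, r2;
    std::vector<std::pair<Region*, std::map<std::string, std::string>>> parts = {
        {&r1, {{"k1", "v1"}}}, {&r2, {{"k2", "v2"}}}};
    std::cout << run_txn(parts) << " " << r1.data["k1"] << " " << r2.data["k2"] << "\n";
}
```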

3.2 Computing Layer Design

The computing layer must work out how to parse SQL into a concrete query plan, i.e. a distributed computation process, and how to optimize it based on cost.

The SQL layer of BaikalDB is a distributed layered design with the following overall architecture:

BaikalDB is not yet a full MPP architecture, which distinguishes it from OLAP system designs: the final aggregation of data happens on a single Baikaldb node, while filter conditions are pushed down to the BaikalStore module as far as possible to reduce the data that must be aggregated at the end. Given the limited result sizes of the predominantly OLTP workload, this is sufficient. The storage nodes thus have real computing capability and execute work in a distributed manner to reduce transmission pressure, so storage and compute are not strictly separated.

The SQL engine uses the classic volcano model: everything is an operator, including the interaction between Baikaldb and BaikalStore. Each operator provides open, next, and close operations, and operators can be flexibly composed, giving good extensibility; a minimal sketch of the interface follows.
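Below is a minimal sketch of that open/next/close interface, with a scan operator feeding a filter operator; the operator names are invented, and the real BaikalDB operators also execute across nodes.

```cpp
// Minimal sketch of the volcano (iterator) model described above: every
// operator exposes open/next/close and operators compose into a tree.
#include <iostream>
#include <memory>
#include <vector>

using Row = std::vector<int64_t>;

struct ExecNode {
    virtual ~ExecNode() = default;
    virtual void open() = 0;
    virtual bool next(Row* row) = 0;  // pull one row; false when exhausted
    virtual void close() = 0;
};

// Leaf operator: scans an in-memory "table".
struct ScanNode : ExecNode {
    std::vector<Row> table;
    size_t pos = 0;
    explicit ScanNode(std::vector<Row> t) : table(std::move(t)) {}
    void open() override { pos = 0; }
    bool next(Row* row) override {
        if (pos >= table.size()) return false;
        *row = table[pos++];
        return true;
    }
    void close() override {}
};

// Filter operator: pulls from its child and keeps rows where col0 > bound.
struct FilterNode : ExecNode {
    std::unique_ptr<ExecNode> child;
    int64_t bound;
    FilterNode(std::unique_ptr<ExecNode> c, int64_t b) : child(std::move(c)), bound(b) {}
    void open() override { child->open(); }
    bool next(Row* row) override {
        while (child->next(row))
            if ((*row)[0] > bound) return true;
        return false;
    }
    void close() override { child->close(); }
};

int main() {
    auto plan = std::make_unique<FilterNode>(
        std::make_unique<ScanNode>(std::vector<Row>{{1}, {5}, {9}}), 3);
    plan->open();
    Row row;
    while (plan->next(&row)) std::cout << row[0] << " ";  // 5 9
    plan->close();
}
```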

When executing query conditions, if a data table has several indexes, the engine also needs the ability to automatically choose the most suitable one. There are two main designs for query optimizers:

**Rule-Based Optimization (RBO):** the execution plan of a SQL statement is determined by a set of rules hard-coded in the database. In practice, differences in data volume change the performance of the same SQL, which is RBO's weakness: rules are constant while data keeps changing, so a rule-driven decision may not be optimal.

**Cost-Based Optimization (CBO):** multiple candidate execution plans are generated from the optimization rules; CBO then computes the cost of each plan from statistics and a cost model and picks the cheapest one as the actual plan. The accuracy of the statistics determines whether CBO makes the optimal choice.

The query optimizer is itself a deep topic; AI-based query optimizers are now a hot research area, and at many database companies this is a dedicated sub-discipline. BaikalDB uses RBO combined with CBO for query optimization (a toy cost comparison follows); for details on the CBO, see the BaikalDB cost model post (my.oschina.net/BaikalDB/bl…
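As a toy illustration of cost-based index selection (all names and numbers hypothetical): estimate the rows each candidate index would touch from statistics, weight by a per-row cost such as the back-to-table penalty, and keep the cheapest plan.

```cpp
// Toy cost-based index selection in the spirit of the CBO described above.
#include <iostream>
#include <string>
#include <vector>

struct IndexChoice {
    std::string name;
    double selectivity;  // fraction of rows matched, taken from statistics
    double row_cost;     // relative cost per row (back-to-table adds cost)
};

// Pick the plan with the lowest estimated cost = rows_scanned * row_cost.
std::string pick_index(const std::vector<IndexChoice>& candidates, double table_rows) {
    std::string best;
    double best_cost = 1e300;
    for (const auto& c : candidates) {
        double cost = table_rows * c.selectivity * c.row_cost;
        if (cost < best_cost) { best_cost = cost; best = c.name; }
    }
    return best;
}

int main() {
    // A full scan reads every row cheaply; the secondary index reads few
    // rows but pays the extra primary-key lookup (back to table).
    std::vector<IndexChoice> candidates = {
        {"full_scan",       1.0,   1.0},
        {"idx_customer_id", 0.001, 4.0},
    };
    std::cout << pick_index(candidates, 1e8) << "\n";  // idx_customer_id
}
```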

3.3 Scheduling Layer Design

A distributed data system involves many worker nodes whose hardware environments and software loads may differ. To extract the cluster's full performance, each worker node should ideally store a similar amount of data and carry a similar processing load; fault avoidance and recovery for failed nodes must also be handled while keeping the cluster balanced.

A scheduling system usually has a Master role that makes decisions by evaluating information from all nodes in the cluster. BaikalDB's Master role is the BaikalMeta module. Each BaikalStore gathers its own information and reports it to BaikalMeta periodically via heartbeat packets; BaikalMeta thus obtains a detailed view of the whole cluster and generates decisions according to this information and the scheduling policy. Decisions are sent back to BaikalStore in the heartbeat reply, and BaikalStore applies them flexibly and best-effort according to its actual situation, with no guarantee the operation succeeds; the outcome is in turn reported to BaikalMeta through subsequent heartbeats.

Each BaikalMeta decision needs only the heartbeats collected in the current round, with no dependence on earlier rounds, so the decision logic is easy to implement. If BaikalMeta fails and heartbeats go unanswered, all scheduling simply stops and the cluster runs unscheduled. Meanwhile, the Baikaldb module caches the cluster information returned by BaikalMeta, so it knows every Region of every table exactly and can drop or retry failed nodes; reads and writes therefore continue even when BaikalMeta is down.

On the other hand, BaikalMeta's decisions need no tight timeliness, so all BaikalStores can heartbeat at long intervals, keeping request pressure on BaikalMeta under control. In this way one group of BaikalMeta nodes can manage thousands of BaikalStore nodes.

In scheduling storage nodes, attention must be paid to the balance of Leaders and Peers (a per-round balancing sketch follows the two points below):

Leader balancing: each BaikalStore node should hold roughly the same number of Region Leaders. Leaders carry most of the read and write pressure, so balancing keeps CPU and memory load close across BaikalStore nodes. When a BaikalStore's load runs high, which is common in container environments when other containers on the same machine, or other Leaders on the same BaikalStore, consume large amounts of CPU and memory, its Leaders should be switched to other nodes to avoid hot-spot-induced timeouts.

Peer balancing: the replicas of each Raft group should be spread across BaikalStore nodes so that every node holds a similar number of replicas. Since all Regions of a data table are roughly the same size, this keeps storage usage close across BaikalStores and avoids data skew, letting the cluster's disk resources be fully used. Each replica should also sit on a different machine, or even a different network segment; otherwise one machine or network failure could take out the majority of a Region's replicas, leaving it unable to elect a Leader and hence unable to serve reads or writes. When a Peer fails or is actively migrated, new Peers must be created to synchronize the data and unavailable Peers removed, keeping the replica count stable.
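Here is a sketch of what one stateless balancing round might look like, in the spirit described above: from a single round of heartbeat reports, compute the mean Leader count and ask stores above the mean to shed the surplus via the heartbeat reply. This is an illustration, not BaikalMeta's actual policy.

```cpp
// Sketch of one stateless scheduling round: the meta role looks only at
// the current round of heartbeat reports, computes the mean leader count,
// and tells overloaded stores (via the heartbeat reply) how many leaders
// to hand off. Hypothetical simplification.
#include <iostream>
#include <string>
#include <vector>

struct Heartbeat {
    std::string store;
    int leader_count;
};

struct Decision {
    std::string store;
    int leaders_to_transfer;  // the store applies this best-effort
};

std::vector<Decision> balance_leaders(const std::vector<Heartbeat>& reports) {
    int total = 0;
    for (const auto& h : reports) total += h.leader_count;
    int avg = total / static_cast<int>(reports.size());
    std::vector<Decision> out;
    for (const auto& h : reports)
        if (h.leader_count > avg)  // above the mean: shed the surplus
            out.push_back({h.store, h.leader_count - avg});
    return out;
}

int main() {
    std::vector<Heartbeat> round = {{"store1", 90}, {"store2", 30}, {"store3", 30}};
    for (const auto& d : balance_leaders(round))
        std::cout << d.store << " -> transfer " << d.leaders_to_transfer << "\n";
    // store1 -> transfer 40
}
```

Because each round is computed from scratch, a lost heartbeat or an ignored decision is harmless: the next round simply re-derives the target state.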

The Region is the unit of scheduling, and its splittability underpins the scheduling mechanism. When a Region's size exceeds the configured threshold, BaikalDB splits its range in a baseline-plus-increment fashion, producing a new Region and its replicas, which are then placed and balanced through the reporting mechanism; a toy split-trigger sketch follows. This automates the spreading out of data as it grows. Scheduling is a complex topic in its own right, with many possible policies for improving resource utilization, disaster tolerance, hot-spot avoidance, and performance, and it remains a key direction of BaikalDB iteration.
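A toy version of the split trigger just described, under the assumption that a Region is a sorted key-value map and the middle key is chosen as the new boundary; the real baseline-plus-increment split also snapshots the data and replays fresh writes, which is omitted here.

```cpp
// Toy illustration of the Region split trigger: when a Region exceeds the
// size threshold, pick a middle key as the new range boundary. Hypothetical.
#include <cstdint>
#include <iostream>
#include <iterator>
#include <map>
#include <string>

constexpr int64_t kSplitThresholdBytes = 100;  // real systems use ~100 MB+

// Returns the split key if the region should split, or empty if not.
std::string maybe_pick_split_key(const std::map<std::string, std::string>& region) {
    int64_t bytes = 0;
    for (const auto& kv : region) bytes += kv.first.size() + kv.second.size();
    if (bytes < kSplitThresholdBytes) return "";
    auto mid = region.begin();
    std::advance(mid, region.size() / 2);  // everything >= this key moves out
    return mid->first;
}

int main() {
    std::map<std::string, std::string> region;
    for (char c = 'a'; c <= 'j'; ++c)
        region[std::string(1, c)] = std::string(10, 'x');  // ~110 bytes total
    std::cout << "split at: " << maybe_pick_split_key(region) << "\n";  // "f"
}
```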

4. Summary

Starting from the requirements of a large-scale commercial system, this article has summarized what business scenarios expect of a data storage facility. By reviewing the development of the database systems behind the Phoenix Nest advertising library, it has shown how the Commercial Platform R&D department iterated toward BaikalDB, a cheaper, more reliable, and more capable data storage system.

Over four years, BaikalDB has absorbed all the storage systems that existed in the history of the commercial product system, achieving a grand unification. Combining with the needs of business development, BaikalDB built its core feature set quickly with very limited effort, then iterated gradually by the urgency of business needs. It serves not only the advertising scenarios but also new business scenarios including landing pages and e-commerce, while continuing to enrich features, optimize performance, reduce costs, and polish the whole system.

Finally, from the perspectives of storage, computing, and scheduling, it has summarized some key design ideas of BaikalDB for anyone wondering how to build a database. Databases, operating systems, and compilers are the three great pieces of system software, the infrastructure of all computer software. Database technology is broad and deep, and this article, written purely from a business perspective, inevitably has omissions; discussion and corrections are welcome.

Finally, please follow the BaikalDB open source project: github.com/baidu/BaikalDB.

Recruitment Information:

The R&D department of Baidu's Commercial Platform is mainly responsible for building the platforms behind Baidu's commercial products, covering core business directions such as advertising, landing page hosting, and full-domain data insight. It is committed to helping customers and ecosystem partners keep growing through platform-based technical services, and to becoming the business service platform customers depend on most.

Whether you work on backend, frontend, or algorithms, a number of positions await you. You are welcome to submit your resume; the Baidu Commercial Platform R&D department looks forward to your joining!

Resume email: [email protected] (Note [Baidu Business])

