This text is adapted from the 2016 paper "What's Really New with NewSQL?" by Andrew Pavlo, associate professor of computer science at Carnegie Mellon University, and Matthew Aslett, vice president at 451 Research. The following content was translated by the publisher's technical team, then reorganized and edited.

In recent years, a new class of database management systems (DBMSs) called NewSQL has emerged, boasting the ability to scale the workloads of modern online transaction processing (OLTP) systems in ways that previous systems could not.

Given that traditional DBMSs have been around for nearly 40 years, it is worth asking whether NewSQL's claimed advantages are real or merely a marketing exercise. If these systems really do perform better, the next question is whether they represent a genuine technological breakthrough or simply benefit from hardware advances that have removed old bottlenecks.

To explore these questions, we first review the history of databases to understand the background to NewSQL's emergence. We then discuss NewSQL's concept, features, and categories in detail, along with the NewSQL systems under each category.

1. A brief history of database management systems (DBMSs)

The world's first database system, IBM's IMS, was created in 1966. It was used to store parts and supplier information for the Saturn V and Apollo space programs. IMS's main contribution was to demonstrate the idea that application logic and data manipulation logic should be separated: application developers need only care about logical changes to the data, not how they are implemented. After IMS came the first relational databases, chiefly IBM's System R and the University of California, Berkeley's INGRES, the predecessor of PostgreSQL. INGRES was soon adopted by information systems at other universities and was commercialized in the late 1970s. Around the same time, Oracle developed and released the first version of its DBMS using a design similar to System R's. In the early 1980s another wave of companies launched their own commercial database products, among them Sybase and Informix. After System R, IBM released a new relational database, DB2, in 1983; DB2 reused some of the System R code, though neither system was open source.

From the late 1980s to the early 1990s, object-oriented languages became popular, and a number of object-oriented DBMSs emerged to bridge the gap between the database model and those languages. However, lacking a standard interface like SQL, these object-oriented DBMSs were never widely accepted in the market. Some of their design concepts were gradually absorbed into relational databases instead, and many popular relational databases later added support for object, XML, and JSON data. Document-oriented NoSQL databases are also, in a sense, an extension of object-oriented DBMSs.

One of the big events of the 1990s was the release of two open-source relational databases, MySQL and PostgreSQL. MySQL was born in Sweden in 1995, built largely on the ISAM-based mSQL system; PostgreSQL was launched in 1994 by two Berkeley graduate students who reworked the QUEL-based Postgres code to add support for the SQL query language.

Since 2000, Internet applications have mushroomed, and these applications demand far more resources than traditional software services. They must support concurrent access by huge numbers of users with high availability, ideally staying online at all times. In practice, the database became the bottleneck of Internet applications. Many vendors tried to scale vertically by improving single-machine hardware, but the gains were limited and showed clear diminishing marginal returns. Vertical scaling is also disruptive: moving data from one machine to another requires long service outages, which Internet users find unacceptable. To solve this problem, some companies built custom middleware to shard data across multiple commodity single-machine DBMSs:

A logical database is presented to the application above, while the data is dispersed across different physical databases behind it. When the application issues a request, the middleware forwards or rewrites it and distributes it to one or more nodes in the backing database cluster; once those nodes have executed the request and returned their data, the middleware aggregates the results for the application. Two famous systems built on this idea were eBay's Oracle-based cluster and Google's MySQL-based cluster. Facebook later used a similar strategy to build its internal MySQL cluster, which it still uses today. Although simple point reads and point writes are easy to handle under this middleware sharding strategy, updating multiple records or joining multiple tables within a transaction becomes very difficult. None of these early middleware systems supported such operations; eBay, for example, required users of its middleware to implement join logic in the application layer. This clearly contradicts the principle that application logic and data manipulation logic should be separated, pushing data manipulation logic back onto application developers.
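To make the routing idea concrete, here is a minimal Python sketch of such a sharding middleware. It is illustrative only: the node names, the hash-based shard function, and the `users` table are assumptions, not details of any of the systems above.

```python
import hashlib

# Hypothetical backend nodes behind the middleware.
NODES = ["db-node-0", "db-node-1", "db-node-2", "db-node-3"]

def shard_for(key: str) -> str:
    """Map a sharding key to one backend node with a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def route_point_query(user_id: str) -> tuple[str, str]:
    """A point read touches exactly one node."""
    sql = f"SELECT * FROM users WHERE user_id = '{user_id}'"
    return shard_for(user_id), sql

def route_scatter_query(sql: str) -> list[tuple[str, str]]:
    """A query without the sharding key must fan out to every node;
    the middleware then merges the partial results itself."""
    return [(node, sql) for node in NODES]

print(route_point_query("u42"))
print(route_scatter_query("SELECT COUNT(*) FROM users"))
```

Cross-shard joins and multi-record transactions are exactly what this scheme cannot express, which is why early middleware pushed that logic up into the application.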

Eventually, middleware-based sharding schemes were gradually abandoned and companies turned to developing their own distributed databases. Beyond the problems exposed by the middleware approach, traditional database solutions suffer from two further problems:

First, traditional databases focus on consistency and correctness at the expense of availability and performance. That tradeoff runs counter to the needs of Internet applications, which prioritize concurrency and availability.

Second, many features of traditional relational databases are not applicable to Internet applications, and supporting them consumes extra resources; a lighter database might improve overall performance. In addition, the relational model is not necessarily the best way to represent application data, and using full SQL for simple queries can seem like overkill.

These issues were the genesis of the NoSQL movement of 2005 to 2010. NoSQL proponents commonly cite ACID guarantees and the relational model as the barriers keeping traditional databases from horizontal scalability and better availability. The core of the NoSQL movement is therefore to abandon strong transactional consistency and the relational model in favor of eventual consistency and other data models (such as key/value, graphs, and documents). The two most famous NoSQL databases are Google's BigTable and Amazon's Dynamo; since neither was open source, other organizations began launching similar open-source alternatives, including Facebook's Cassandra (based on BigTable and Dynamo) and Powerset's HBase (based on BigTable). A number of startups joined the NoSQL movement, not necessarily inspired by BigTable and Dynamo but echoing the NoSQL philosophy; the most notable is MongoDB.

By the late 2000s there were already plenty of distributed database products on the market. The advantage of using NoSQL is that application developers can focus on the application logic itself rather than on the database's scalability. But many applications, such as financial and order-processing systems, could not adopt NoSQL because they cannot give up transactional consistency requirements. Some organizations, such as Google, found that NoSQL forced many of their engineers to spend too much time handling data consistency themselves, which leaked through the database abstraction and increased code complexity. The only remaining choices were to go back to traditional DBMSs and scale up vertically with more powerful machines, or to go back to middleware and develop support for distributed transactions. Both options were expensive, and so the NewSQL movement was hatched.

2. The rise of NewSQL

This paper considers NewSQL to be a general term for a class of modern relational databases that provide scale-out performance for ordinary OLTP read and write requests while supporting ACID guarantees for transactions. In other words, these systems have the scalability of a NoSQL database while maintaining the transactional properties of a traditional database. NewSQL brings the idea that application logic and data manipulation logic should be separated back into the world of modern databases, proving that history always spirals upward.

In the 2000s a number of data warehouse systems emerged (such as Vertica, Greenplum, and Aster Data) that were designed to handle OLAP requests; these fall outside the scope of NewSQL as defined in this article. OLAP databases focus on large, complex, read-only queries over massive amounts of data, queries that can run for seconds, minutes, or even longer. NewSQL databases, by contrast, are designed for read-write transactions with the following characteristics:

  • Short-lived (they never stall waiting for user input)

  • Touch only a small amount of data via index lookups (no full-table scans or large distributed joins)

  • Repetitive (the same queries executed over and over with different inputs)

Some also define NewSQL systems by their implementation: databases that use lock-free concurrency control and a shared-nothing architecture. All of the database systems we consider NewSQL do share these characteristics.

3. NewSQL classification

Now that NewSQL has been defined, we can survey today's NewSQL landscape. NewSQL databases on the market can be divided into three categories:

  • Entirely new databases, redesigned and built from scratch on new architectures

  • Databases that implement NewSQL features in a middleware layer

  • Database-as-a-service (DBaaS) offerings from cloud computing platforms, which are also often based on new architectures

Before writing this article, both authors had also counted "replacement storage engines for standalone databases" as NewSQL. Typical examples are the alternatives to MySQL's InnoDB, such as TokuDB, ScaleDB, Akiban, deepSQL, and MyRocks. The advantage of swapping in a new storage engine is that the substitution is invisible to the application. The authors have since revised this view, however, arguing that extending standalone database systems by replacing storage engines and using plug-ins is not representative of NewSQL systems, so it lies beyond the scope of this article. As a rule, attempts to improve database performance in OLTP scenarios by replacing the MySQL storage engine have failed.

Next, we’ll discuss each of these three types of NewSQL databases.

1. New architecture

NewSQL systems built from scratch are the most interesting, because they have the greatest design freedom, carrying none of the architectural baggage of older systems. All database systems in this category are shared-nothing distributed architectures and contain the following modules:

  • Multi-node Concurrency Control

  • Multi-replica Data Replication

  • Flow control

  • Distributed Query Processing

Another advantage of building a new database system is that each component can be optimized for a multi-node environment, for example the query optimizer and the inter-node communication protocol. As a concrete example, most NewSQL databases can pass intra-query data directly between nodes rather than routing it through a central node, as some middleware-based solutions do.

With the exception of Google's Spanner, databases in this category manage their own storage modules, which means these DBMSs are responsible for distributing data across nodes themselves rather than using an off-the-shelf distributed file system (such as HDFS) or storage fabric (such as Apache Ignite). This is an important design decision. Managing storage yourself means you can "send the query to the data," while relying on third-party storage means you must "bring the data to the query": the former ships query commands, the latter ships the data itself, so the former is clearly friendlier to network bandwidth. Managing the storage tier also lets the database use more sophisticated and efficient replication strategies than block-based ones. In general, building your own storage tier yields higher performance.

Using a new architecture is not without drawbacks. The biggest is that the technology is so new that users worry about undiscovered pitfalls, which in turn keeps the user base too small for the product to be polished. In addition, the widely accepted ecosystem of operations, monitoring, and alerting tools must be rebuilt from scratch. To sidestep part of this problem, some database systems, such as Clustrix, MemSQL, and TiDB, chose to be compatible with the MySQL wire protocol, while others, such as CockroachDB, chose PostgreSQL compatibility.

Examples: Clustrix, CockroachDB, Google Spanner, H-Store, HyPer, MemSQL, NuoDB, SAP HANA, VoltDB, TiDB, etc.

CockroachDB architecture

2. Transparent data sharding middleware

There are also products on the market that offer middleware solutions similar to those built by eBay, Google, Facebook, and others, but with ACID support. Each database node beneath the middleware typically:

  • Runs the same standalone database system

  • Contains only a portion of the overall data

  • Is not meant to receive reads and writes directly from the application

The centralized middleware is responsible for routing requests, coordinating transaction execution, and distributing, replicating, and partitioning data across the nodes. Each database node usually runs a thin communication module that talks to the middleware, executes requests on its behalf, and returns data. Together, these components present a single logical database to the outside world.

The advantage of middleware is that it is a drop-in replacement for an existing database system, often without the application developers even noticing. The most common standalone database in middleware solutions is MySQL, which means that to be compatible the middleware layer must speak the MySQL wire protocol. Although Oracle provides the MySQL Proxy and MySQL Fabric tools for this, most companies choose to implement their own protocol compatibility layer to avoid potential GPL licensing issues.

The disadvantage of middleware-based solutions is that they depend on traditional databases. Traditional databases use disk-oriented architectures that date from the 1970s, so these solutions cannot adopt the memory-oriented architectures used by some NewSQL systems, and thus cannot benefit from their more efficient storage management and concurrency control modules. Previous studies have shown that disk-centric designs limit how effectively traditional databases can exploit large numbers of CPU cores and large memories. For complex queries, the middleware approach may also introduce redundant query planning and optimization (once in the middleware, once on the database node), although this duplication can also be viewed as a partial optimization of the query.

Examples: AgilData Scalable Cluster, MariaDB MaxScale, ScaleArc, and ScaleBase.

MariaDB logical architecture

3. Database-as-a-Service (DBaaS)

Many cloud computing vendors offer NewSQL database-as-a-service products. With such services the development team does not have to maintain the database on its own hardware or on virtual machines bought from a cloud provider; instead, the provider takes over configuration, tuning, replication, backup, and so on. Users simply access the database through a URL the provider supplies, or manage it through a control panel and API.

Consumers of DBaaS pay according to their resource utilization. Because different database queries can consume very different amounts of computing resources, DBaaS vendors typically offer service guarantees by letting consumers set maximum resource limits (such as storage space, compute, and memory usage), similar to how block storage services are priced, rather than charging per query. The most notable NewSQL DBaaS is Amazon's Aurora, which is compatible with both MySQL and PostgreSQL and uses a log-structured storage manager to improve I/O parallelism.

Some companies also build DBaaS solutions on top of the platforms of the large cloud providers; ClearDB, for example, can be deployed on each major provider's infrastructure. The advantage of this approach is that a database can be spread across different providers in the same region, reducing the risk of failure.

Examples: Amazon Aurora, ClearDB.

Amazon Aurora cloud deployment architecture

4. The current status of NewSQL

In this section, we discuss the design ideas for each module of the NewSQL database system to see if there are any innovations in practice or theory.

1. Main-Memory Storage

Traditional DBMSs use disk-oriented storage designs: data lives primarily on block storage devices such as SSDs or HDDs. Because reads and writes to block storage are slow, these DBMSs cache data blocks in memory for reading and batch writes for efficiency. Since memory was far smaller and more expensive than disk, this design greatly reduced hardware costs. Decades later, memory prices and capacities have improved to the point that most databases, apart from some very large OLTP databases, can fit entirely in memory, allowing a DBMS to read data quickly without a buffer pool manager or heavyweight concurrency control machinery. This shift from disk-oriented to memory-oriented design opens new possibilities for DBMS performance optimization.

Many NewSQL databases use memory-based storage architectures, from academic prototypes such as H-Store and HyPer to commercial systems such as MemSQL, SAP HANA, and VoltDB. All of these systems perform better under OLTP workloads than disk-based storage architectures.

In fact, the idea of storing an entire database in memory is not new. The University of Wisconsin-Madison laid the groundwork for in-memory databases in the 1980s, covering indexing, query processing, data recovery, and more. The first distributed in-memory database, PRISMA/DB, was also developed in that decade. By the 1990s the first commercial in-memory databases had appeared, including Altibase, Oracle TimesTen, and AT&T's DataBlitz.

Altibase in-memory database architecture

The innovation of memory-oriented NewSQL databases is reducing memory usage by evicting less active data. This lets them manage more data than fits in memory without reverting to a disk-oriented storage architecture. Broadly, such schemes need an internal tracking mechanism to find inactive records in memory and flush them to external storage when necessary. In H-Store, for instance, the anti-caching module evicts inactive records and leaves a tombstone in their place; when a transaction tries to access an evicted record, the transaction is aborted and a separate thread asynchronously loads the record back into memory. The same goal can also be achieved with the operating system's virtual-memory paging, as VoltDB demonstrates. To avoid false negatives on index lookups, all secondary indexes must retain the keys of evicted records, so if a table has many secondary indexes, evicting records still wastes memory. Microsoft's Siberia project addressed this by using Bloom filters to test for the existence of target records, reducing memory consumption. Siberia is not a NewSQL database, but the approach is worth noting.
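Below is a toy sketch of the anti-caching idea, loosely modeled on the H-Store description above. The class and method names are hypothetical, and a real implementation would evict in bulk and reload in a background thread rather than synchronously:

```python
TOMBSTONE = object()  # placeholder left behind for an evicted record

class AntiCachingStore:
    def __init__(self):
        self.memory = {}  # hot records
        self.disk = {}    # stand-in for external block storage

    def evict(self, key):
        """Move a cold record out of memory, leaving a tombstone."""
        self.disk[key] = self.memory[key]
        self.memory[key] = TOMBSTONE

    def read(self, key):
        value = self.memory[key]
        if value is TOMBSTONE:
            # A real system aborts the transaction and reloads the
            # record asynchronously; here we reload before signalling.
            self.memory[key] = self.disk.pop(key)
            raise RuntimeError("abort: record was evicted, retry")
        return value

store = AntiCachingStore()
store.memory["cust:1"] = {"name": "Alice"}
store.evict("cust:1")
try:
    store.read("cust:1")
except RuntimeError:
    pass
print(store.read("cust:1"))  # the retry succeeds after the reload
```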

MemSQL, another NewSQL database, handles datasets larger than memory differently, without tracking record-level metadata. It organizes data in a log-structured format to improve write performance, and an administrator can also instruct the database to store a table in columnar format.

Overall, memory-based storage schemes are not significantly innovative and can be seen as an extension of previous schemes.

2. Partitioning/Sharding

All NewSQL databases scale horizontally by splitting the database into disjoint subsets, known as partitions or shards.

Operating a database across distributed machines is nothing new. Phil Bernstein and his colleagues were doing it as early as the late 1970s, and their SDD-1 project made many of the fundamental contributions to distributed transaction processing. In the early 1980s the teams behind System R and INGRES each built distributed database systems: IBM's R* adopted a shared-nothing, disk-oriented design similar to SDD-1's, while the distributed version of INGRES is best remembered for a query optimization algorithm that recursively splits queries into smaller ones. Later, the GAMMA project at the University of Wisconsin-Madison explored different sharding strategies.

However, none of these early distributed database systems took off in the long run, for two main reasons:

First, computer hardware in the 20th century was too expensive for most companies to afford.

Second, high-performance, high-availability Internet applications did not exist in the 20th century; database QPS at the time generally ranged from the tens to the hundreds.

Neither assumption holds anymore. With the help of cloud infrastructure and open-source distributed systems and tools, building a data-intensive application is far easier than it used to be, which has brought distributed databases back onto the stage of history.

In a distributed database, a table is partitioned horizontally into segments based on one or more columns, using either a hash function or value ranges. Related segments from multiple tables are often co-located in the same shard, and the node holding a shard executes any request that needs that shard's data. The DBaaS systems (Amazon Aurora, ClearDB) do not support this sharding strategy. Ideally, a distributed database should automatically distribute a query across multiple shards and merge the results; every NewSQL database except ScaleArc supports this.

The database schemas of many OLTP applications can be converted to a tree structure:

Sharding by a hash of the root table's primary key lets each query land on a single shard. Take a customer database, for example: if all of a customer's order records and account information are stored on the same shard, keyed by customer id, then almost every transaction can execute on a single shard. This reduces communication between the database's nodes and avoids costs such as two-phase commit (2PC), improving overall performance.
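A minimal sketch of this co-location idea, assuming a hypothetical customer/order/account schema and a simple hash function; rows from every table in the tree carry the root key, so they land on the same shard:

```python
import hashlib

N_SHARDS = 4  # illustrative shard count

def shard_of(customer_id: str) -> int:
    """All tables in the tree shard on the root key (customer_id)."""
    return int(hashlib.md5(customer_id.encode()).hexdigest(), 16) % N_SHARDS

customer = {"customer_id": "c7", "name": "Alice"}
order    = {"customer_id": "c7", "order_id": "o1", "total": 99}
account  = {"customer_id": "c7", "balance": 1000}

# Because every row hashes on the same root key, a transaction that
# touches one customer's data never crosses shards and needs no 2PC.
assert shard_of(customer["customer_id"]) \
    == shard_of(order["customer_id"]) \
    == shard_of(account["customer_id"])
print("all rows for c7 live on shard", shard_of("c7"))
```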

In some NewSQL databases, such as NuoDB and MemSQL, nodes are not all equivalent. NuoDB designates one or more nodes in the cluster as storage managers (SMs); each SM stores a slice of the data, which is further divided into blocks. The other nodes in the cluster run transaction engines (TEs), each of which caches the blocks involved in its transactions. This is a classic compute-storage separation design. To execute a transaction, a TE node must obtain all of the transaction's blocks from the SMs or from other TEs; it then acquires write locks on the data to be modified and broadcasts the changes to the SMs and the other TEs. To reduce blocks bouncing back and forth between TE nodes, NuoDB developed a load-balancing strategy that keeps related blocks on the same TE where possible. This lets NuoDB shard data sensibly without making any assumptions about the schema (such as the tree structure above). MemSQL adopts a similar compute-storage separation design, but unlike NuoDB, its compute nodes (aggregators) do not cache any data; instead they break a query into smaller queries for each leaf node to execute, so the compute nodes are stateless. Both NuoDB and MemSQL can scale compute and storage independently. Whether heterogeneous node architectures beat homogeneous ones in performance, operational complexity, and other respects remains an open question.

An important feature of NewSQL databases is live data migration: the database can rebalance data across physical nodes, relieve hot spots, and expand capacity without interrupting its own service. This is harder for NewSQL than for NoSQL because NewSQL must also keep ACID properties intact during the migration. In general, NewSQL systems support online data migration in one of two ways:

First, organize the data into coarse-grained logical shards and hash them onto physical nodes. When rebalancing is needed, whole logical shards are moved between nodes. This scheme is used by Clustrix, AgilData, Cassandra, and DynamoDB.

The other is to rearrange data at a finer granularity, by value ranges over tuples or groups of tuples. MongoDB, ScaleBase, and H-Store adopt this scheme.
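A sketch of the first scheme: keys hash to a fixed set of coarse logical shards ("slots"), and rebalancing moves a whole slot to another node without ever changing the key-to-slot mapping. The slot count and node names are illustrative, not drawn from any of the systems above:

```python
N_SLOTS = 16  # illustrative; real systems use hundreds or thousands

# Initial placement: slots alternate between two physical nodes.
slot_to_node = {slot: f"node-{slot % 2}" for slot in range(N_SLOTS)}

def node_for(key: str) -> str:
    """Keys always hash to the same slot; only slot placement moves."""
    return slot_to_node[hash(key) % N_SLOTS]

def rebalance(slot: int, target: str) -> None:
    """Migrate one logical shard: copy its data, then flip the entry
    in the slot table. Clients keep hashing keys the same way."""
    slot_to_node[slot] = target

before = node_for("cust:42")
rebalance(hash("cust:42") % N_SLOTS, "node-2")
print(before, "->", node_for("cust:42"))
```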

3. Concurrency Control

The concurrency control mechanism is the core of transaction processing in a database system, touching almost every aspect of the whole system. Concurrency control lets different users access the database as if each had exclusive possession of it; it provides the atomicity and isolation of transactions and shapes the overall behavior of the system.

Besides the choice of concurrency control mechanism, another key design decision for a distributed database is whether its transaction coordination protocol is centralized or decentralized. Under a centralized protocol, every transaction operation goes through a central coordinator, which decides whether to approve it. Under a decentralized protocol, each node maintains the state of the transactions that access its own data, and nodes communicate with one another to resolve concurrency conflicts. A decentralized coordinator scales better, but it typically requires the wall clocks of different nodes to be tightly synchronized so that transactions can be totally ordered.

The first distributed database systems, born in the 1970s, generally used two-phase locking (2PL). SDD-1 was the first database to support distributed transactions over a shared-nothing cluster, using a centralized coordinator. IBM's R* was similar to SDD-1 but used a decentralized coordination mechanism, distributed 2PL, in which each transaction directly locks the data it accesses. The distributed version of INGRES also used distributed 2PL, but relied on centralized deadlock detection.

Almost all NewSQL databases have abandoned 2PL because of the complexity of handling deadlocks. A popular alternative is timestamp ordering (TO), in which the database assumes that concurrent transactions will not be executed in an order that violates serializable ordering. The most popular concurrency control protocol in NewSQL systems is decentralized MVCC, where the database creates a new version of each record on every update. Keeping multiple versions of a record means a read transaction neither blocks write transactions nor is blocked by them. NewSQL databases that use decentralized MVCC include MemSQL, HyPer, HANA, and CockroachDB. Although all of them customize the protocol to their needs, the core concept is not novel: MVCC appeared in an MIT PhD dissertation as early as 1979, and the first commercial databases using MVCC arrived in the early 1980s, including Digital's VAX Rdb and Jim Starkey's InterBase. Starkey later founded NuoDB and wrote MySQL's Falcon storage engine.
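A minimal sketch of the MVCC idea itself, not any particular system's implementation: each write creates a timestamped version, and a reader sees the newest version no later than its snapshot, so reads and writes never block each other. Real systems add conflict checks and garbage collection:

```python
import itertools

_clock = itertools.count(1)  # monotonically increasing timestamps

class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value)

    def write(self, key, value):
        """Every update appends a new version instead of overwriting."""
        ts = next(_clock)
        self.versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        visible = [(ts, v) for ts, v in self.versions.get(key, [])
                   if ts <= snapshot_ts]
        return max(visible)[1] if visible else None

store = MVCCStore()
store.write("x", "old")
snapshot = next(_clock)            # a read transaction starts here
store.write("x", "new")            # a later writer does not block it
print(store.read("x", snapshot))   # -> "old": the reader's snapshot holds
```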

Classification of common concurrency mechanisms

Other database systems fuse 2PL and MVCC. In this scheme, write transactions still acquire locks under the 2PL mechanism, and whenever a transaction modifies a record, the database creates a new version of it in MVCC fashion. Read transactions take no locks and therefore do not block write transactions. The best-known implementation of this scheme is MySQL's InnoDB, but it has also been adopted by Google Spanner, NuoDB, and Clustrix. NuoDB improves performance on top of plain MVCC by broadcasting record version information between nodes via a gossip protocol. All middleware and DBaaS offerings inherit the concurrency control mechanism of the standalone database beneath them, and since most of them sit on MySQL, they naturally adopt the hybrid 2PL-plus-MVCC mechanism as well.

This paper considers the concurrency control mechanism of Google Spanner (including its descendants F1 and SpannerSQL) the most novel among all NewSQL systems. Although it is based on the 2PL-plus-MVCC hybrid, what sets Spanner apart is its use of hardware devices (GPS receivers and atomic clocks) to achieve high-precision clock synchronization. It uses these tightly synchronized clocks to assign timestamps to transactions and establish their order, thereby achieving consistency across a multi-version database over a WAN. CockroachDB supports the same transactional consistency guarantees but without such hardware, relying instead on a hybrid clock protocol that combines loosely synchronized clocks with logical counters.
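As a rough illustration of the hybrid-clock idea (based on the published hybrid logical clock algorithm, not CockroachDB's actual code), each timestamp pairs a wall-clock reading with a logical counter that orders causally related events even when physical clocks are skewed:

```python
import time

class HybridClock:
    def __init__(self):
        self.l = 0  # max physical time seen so far (ms)
        self.c = 0  # logical counter for tie-breaking

    def _wall(self):
        return int(time.time() * 1000)

    def now(self):
        """Timestamp a local event or an outgoing message."""
        pt = self._wall()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1  # clock has not advanced; use the counter
        return (self.l, self.c)

    def observe(self, rl, rc):
        """Merge the timestamp on an incoming message so causally
        later events always receive larger timestamps."""
        pt = self._wall()
        new_l = max(self.l, rl, pt)
        if new_l == self.l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)

clock = HybridClock()
t1 = clock.now()
t2 = clock.observe(t1[0] + 5, 0)  # message from a node 5 ms "ahead"
assert t2 > t1                    # causal order survives the skew
print(t1, t2)
```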

As of this writing, the only commercial NewSQL database not using some variant of MVCC is VoltDB, which still uses TO concurrency control: transactions on each shard are scheduled one at a time instead of interleaving their operations as MVCC does. In VoltDB, single-shard transactions are scheduled in a decentralized manner, while cross-shard transactions are scheduled centrally. VoltDB orders transactions by logical timestamp and then executes them sequentially on each shard. While a transaction executes on a shard, it has exclusive access to all of that shard's data, so the system avoids finer-grained locking logic. The drawback of this shard-based concurrency control is that if a transaction spans multiple shards and the network between the coordinator and those shards is slow, the shards sit idle when they could be handling other requests. Shard-based concurrency control is not a new idea either: Hector Garcia-Molina proposed a similar variant in a 1992 paper, and it was implemented in KDB in the late 1990s and later in H-Store, VoltDB's academic predecessor.

Overall, there is no significant innovation in the core concurrency control mechanisms NewSQL databases use; they mainly re-engineer older methods for modern hardware and distributed environments.

4. Secondary Indexes

Supporting secondary indexes is easy in a standalone database because all the data lives on one node, but it is not easy in a distributed database. Suppose, for example, that a customer table is sharded across physical nodes by customer id: when a query looks up a customer by email address, it must run on every node to guarantee a correct result.

For a distributed database to support secondary indexing, two design decisions need to be considered:

  • Where to store the secondary index

  • How to maintain the secondary index within a transaction

If the system has a centralized coordinator, as middleware solutions do, the secondary index can live on the coordinator node as well as on the shard nodes. The advantage of this scheme is that only one version of the index needs to be maintained globally.

NewSQL databases with new architectures typically shard the secondary index itself, with each node storing a portion of it, rather than keeping the whole index on a single node and copying it to others as needed. The tradeoff here is easy to understand: with a partitioned index, an update touches only one node but a lookup may need to visit several, whereas with a replicated index a lookup can be served by one node but every update must run as a distributed transaction across all the nodes holding the index.
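A toy sketch of the partitioned side of this tradeoff, with hypothetical table and column names: each node indexes only its own rows, so an insert touches one node, but a lookup by the indexed column fans out to every partition:

```python
N_NODES = 3

class Node:
    def __init__(self):
        self.rows = {}         # customer_id -> row (primary data)
        self.email_index = {}  # local secondary index: email -> customer_id

    def insert(self, row):
        self.rows[row["customer_id"]] = row
        self.email_index[row["email"]] = row["customer_id"]

nodes = [Node() for _ in range(N_NODES)]

def insert(row):
    """Rows shard by primary key; the index entry stays local."""
    nodes[hash(row["customer_id"]) % N_NODES].insert(row)

def find_by_email(email):
    """Scatter-gather: every partition must be consulted."""
    for node in nodes:
        cid = node.email_index.get(email)
        if cid is not None:
            return node.rows[cid]
    return None

insert({"customer_id": "c1", "email": "a@example.com"})
print(find_by_email("a@example.com"))
```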

Clustrix combines the two schemes: the secondary index is sharded by ranges, and every node stores the mapping from ranges to shards. A query or update is first routed to the appropriate node, which then performs the operation. This two-level design captures the best of both schemes.

If a NewSQL database does not support secondary indexes, the common workaround is for developers to build their own in an external distributed caching system. But relying on an external system allows the index and the data to drift apart, so developers must handle the resulting inconsistencies with care.

5. Replication

Replication at the database level is the best way to ensure application availability and data durability. All modern databases, including NewSQL systems, provide some form of data replication.

With database replication, there are two important design decisions:

How to ensure data consistency across nodes? In a strongly consistent database, a newly written value must be durable on all of the relevant replicas before the transaction is considered committed; any read request can then be sent to any replica and the data it returns can be treated as the latest. Strong consistency is attractive, but maintaining such synchronization requires an atomic commitment protocol such as 2PC, and if a node fails or the network partitions during the protocol, the database service stalls. This is why NoSQL systems usually opt for weakly consistent or eventually consistent models, under which the master node can acknowledge a commit without waiting for every replica to persist the data. All of the NewSQL systems we know of support strongly consistent replication, but there is nothing new about how they achieve it: state machine replication for database systems has been studied since the 1970s, and NonStop SQL, released in the 1980s, was the first distributed database built on strongly consistent replication.

How to propagate data across nodes? There are two main execution modes. The first is active-active replication, in which every replica executes the same request: when a new request arrives, the system runs it on all replicas. The second is active-passive replication, in which the request is executed on one node and the resulting state is then shipped to the other replicas. Most NewSQL systems adopt active-passive replication, mainly because requests arrive at different nodes in different orders, so naive active-active replication could leave replicas inconsistent. By contrast, deterministic DBMSs such as H-Store, VoltDB, and ClearDB use active-active replication, because they guarantee that transactions execute in the same order on every node.
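The contrast can be sketched in a few lines. This toy example is ours, not drawn from any of the systems named; it shows why non-deterministic execution breaks active-active replication while active-passive state shipping stays consistent:

```python
import random

class Replica:
    def __init__(self):
        self.balance = 100

def active_active(replicas, command):
    """Every replica executes the command itself."""
    for r in replicas:
        command(r)

def active_passive(primary, standbys, command):
    """Only the primary executes; its state is shipped to standbys."""
    command(primary)
    for s in standbys:
        s.balance = primary.balance

# A non-deterministic command: each execution may yield a different result.
bump = lambda r: setattr(r, "balance", r.balance + random.randint(1, 5))

a, b = Replica(), Replica()
active_active([a, b], bump)
print("active-active may diverge:  ", a.balance, b.balance)

p, s = Replica(), Replica()
active_passive(p, [s], bump)
print("active-passive stays in sync:", p.balance, s.balance)
```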

NewSQL systems also differ from earlier databases in the engineering of wide-area network (WAN) replication. In the era of cloud services, deploying across multiple regions and data centers is no longer difficult. NewSQL systems can synchronize data over a WAN, but users must ensure adequate network quality between data centers. As of publication, Spanner and CockroachDB were the only two databases offering consistency optimizations for WAN replication: Spanner uses its combination of atomic clocks and GPS clocks, while CockroachDB uses its hybrid clock scheme.

6. Crash Recovery

Another important feature of NewSQL database systems is the failure recovery mechanism. Fault tolerance in traditional databases focuses mainly on ensuring that no data is lost; NewSQL must additionally provide high availability, keeping the database service usable while failures occur.

In traditional databases, crash recovery is usually based on a write-ahead log (WAL): the system restarts after a failure, loads the last checkpoint, and replays the WAL to restore the state right before the crash. This scheme, called ARIES, was invented by IBM database researchers around 1990, and nearly every database system today uses it or some variant of it.

In a distributed database with replicas, the traditional recovery scheme cannot be used directly. After a master node crashes, the system promotes a replica to be the new master; by the time the old master comes back online, the live cluster has written a lot of new data, so its checkpoint and WAL alone cannot catch it up with the cluster, and it must synchronize all new changes from the new master. There are two main ways to implement this:

First, recover from the local checkpoint and WAL, then pull the new log entries from the master or other replicas. As long as the recovering node can apply log entries faster than the cluster writes new data, it will eventually catch up. This is feasible for databases that use physical/physiological logging, since applying data updates locally is much faster than executing the original SQL.

Second, discard the local checkpoint and WAL and have the recovering node fetch a fresh snapshot of the current data from the master or another replica. The advantage of this approach is that the same mechanism can also be used to add new nodes to the cluster.
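A minimal sketch of the first scheme, with a toy key-value state and `(lsn, key, value)` log entries standing in for a real WAL:

```python
def recover(checkpoint, local_wal, master_log):
    state = dict(checkpoint)           # 1. load the last checkpoint
    applied = 0
    for lsn, key, value in local_wal:  # 2. replay the local WAL
        state[key] = value
        applied = lsn
    # 3. catch up: apply everything after our last LSN from the master
    for lsn, key, value in master_log:
        if lsn > applied:
            state[key] = value
            applied = lsn
    return state, applied

checkpoint = {"x": 1}
local_wal  = [(1, "y", 2)]
master_log = [(1, "y", 2), (2, "x", 9), (3, "z", 7)]  # written while down
print(recover(checkpoint, local_wal, master_log))
# -> ({'x': 9, 'y': 2, 'z': 7}, 3): the node has caught up to LSN 3
```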

Typically, middleware- and DBaaS-based NewSQL systems layer extra infrastructure, such as leader election, on top of the standalone database's recovery mechanism to provide the required management capabilities. NewSQL systems built on new architectures abandon the standalone recovery mechanism and instead use off-the-shelf log replication components such as ZooKeeper or etcd, or implement the underlying algorithms, such as Paxos or Raft, themselves.

The solutions and technical components described above have existed since the 1990s.

5. Future trends

Future database systems should be able to run analytical queries and machine learning algorithms over freshly generated data, a capability often called real-time analytics or hybrid transaction-analytical processing (HTAP): extracting insight and knowledge from historical and new data alike. Traditional business analytics and business intelligence usually operate only on historical data, but the value of data is typically highest when it is produced and decays over time, so narrowing the gap between data production and analysis generates more value.

There are three ways to support HTAP workloads:

The most common approach is to deploy two separate databases: one handles the OLTP workload (the front-end database), the other handles the OLAP workload (the back-end database). The front-end database processes the data generated by new transactions, while in the background an administrator uses ETL tools to import data from the OLTP database into the back-end database, usually a data warehouse. The application runs OLAP queries against the back-end database so as not to slow down the front-end OLTP system, and any new data produced in the OLAP system is pushed back to the front-end OLTP database, closing the loop.

Another popular design is the Lambda architecture: a batch processing system such as Hadoop or Spark computes comprehensive views over historical data, while a stream processing system such as Storm or Spark Streaming computes near-real-time views over newly generated data. The batch system typically rescans the data set periodically and feeds its results to the online service through the streaming system.

A common feature of both schemes is that data flows into heterogeneous systems, which brings several problems. First, propagating changes into the other system takes time, typically minutes or hours, which directly undermines real-time analysis. Second, the operational cost of deploying and maintaining two systems is high, often around 50% of the total cost, and developers who want to analyze data across both systems must write two sets of data retrieval logic. Some solutions do try to hide the fact that there are two heterogeneous systems and expose only one interface, usually by copying data from an OLTP system (such as HBase) to an OLAP system (such as Spark) behind the scenes.

The third option is a single HTAP database that supports the high throughput and low latency of OLTP workloads while also running complex, long-lived OLAP workloads over both historical (cold) data and fresh (hot) data. What distinguishes these HTAP systems from the general-purpose databases of the past is that they combine specialized OLTP techniques (in-memory storage, lock-free execution) with specialized OLAP techniques (columnar storage, vectorized execution).

SAP HANA and MemSQL were the first NewSQL databases to claim HTAP support. HANA does so by using different execution engines internally: one engine stores data in rows for OLTP, and another stores data in columns for OLAP workloads. MemSQL uses different storage managers, one for row storage and one for column storage, and combines the two at the execution engine layer. HyPer moved from an H-Store-like concurrency control scheme over row storage to an MVCC, column-store scheme to support more complex OLAP queries. VoltDB has likewise shifted its marketing from pure OLTP toward streaming support, and S-Store attempts to add streaming capabilities on top of the H-Store architecture. Even some OLAP-targeted systems from the 2000s, such as Greenplum, have begun adding OLTP support.

Although the rise of HTAP databases could mean the end of the giant monolithic OLAP data warehouse, that will not happen in the short term: most data warehouses today serve as a company's general back-end repository for all of its historical data. One day, however, serving OLAP workloads will no longer be achieved by moving data around.

6. Summary

The most important takeaway of this article is that NewSQL databases are not a leapfrog innovation over existing database architectures; they are simply the next phase in the continuous development of database technology. Most of the implementation techniques NewSQL databases use have long existed in traditional databases from industry and academia, just never all in a single system. NewSQL's innovation therefore lies mainly in integrating these techniques and ideas into one system. That is no small engineering feat; these systems are by-products of the cloud-native era and an answer to the demands of Internet products.

In the future, NewSQL databases are likely to have a significant impact on the market, but given the maturity and deep pockets of some traditional database vendors, NewSQL systems will have to fight hard for market share. In the five years after NewSQL was proposed, many companies built around NewSQL products emerged, such as GenieDB, Xeround, and Translattice. From our conversations with these companies, we found that the industry has been slow to accept NewSQL databases, nowhere near as fast as NoSQL was adopted. This is because NewSQL databases are mainly designed for the OLTP workloads of Internet products, and the companies running such workloads tend to be conservative in their technology choices.

Since 2000, startups centered on OLAP databases have regularly been acquired. By contrast, the first acquisition of a NewSQL startup did not happen until 2016, when Tableau acquired the company behind HyPer. Large database vendors have instead mostly chosen to improve their own systems rather than acquire startups: in 2014 Microsoft added the in-memory Hekaton engine to SQL Server to boost its OLTP performance, while Oracle and IBM have moved more slowly, adding column-storage extensions to compete with OLAP databases such as HP's Vertica and Amazon's Redshift.

In the longer term, we believe that the following four types of databases will gradually converge:

  • The old traditional databases of the 1980s and 1990s

  • The OLAP data warehouses of the 2000s

  • The NoSQL databases of the 2000s

  • The NewSQL databases of the 2010s

When that convergence happens, the labels will no longer matter.

Appendix

The following table compares the NewSQL databases module by module, as of the time the research was published:
