Brief introduction: Transparent distribution is an upcoming PolarDB-X capability that will let applications run on PolarDB-X as if they were running on a stand-alone database. Unlike traditional middleware-style “distributed databases”, PolarDB-X with transparent distribution no longer requires the application to think about partitioning keys: the CREATE TABLE statements and application code developed for stand-alone MySQL can be migrated to PolarDB-X and run there unchanged. This article introduces the new experience of transparent distribution in PolarDB-X.

PolarDB-X 2.0 video walkthrough: https://yqh.aliyun.com/live/polardbx2021

Installing WordPress on PolarDB-X

WordPress is open source blogging software that uses MySQL as its database. In this walkthrough we install WordPress on PolarDB-X to experience its transparent distribution capabilities.

We’ll follow three simple steps:

  1. Create the tables without modifying the DDL
  2. Run the application without modifying it
  3. Run a stress test, then optimize

In summary:

  1. Using the official WordPress image, the installer automatically creates tables and initializes data on PolarDB-X with standard MySQL syntax, without any changes.
  2. The monitoring data of PolarDB-X shows that load and data volume are balanced across the nodes.
  3. With SQL analysis, DAS and the other tools provided by PolarDB-X, you can easily find the hot SQL in the system.
  4. DBAs can run DDL statements directly to create indexes, change the data distribution and so on, further optimizing system performance without modifying the application.

The tools behind PolarDB-X’s transparent distribution

Here is how PolarDB-X achieves transparent distribution.

Transparent data partitioning

PolarDB-X is a typical shared-nothing distributed database. Its simplified architecture is as follows:

Its core components are the stateless compute node (CN) and the stateful storage node (DN).

To understand PolarDB-X’s transparent distribution capabilities, first understand how data is distributed across PolarDB-X.

In PolarDB-X, a table consists of multiple indexes, including the primary key, secondary indexes, and so on. PolarDB-X partitions each index independently by that index’s key.

For example, in a typical e-commerce scenario the orders table has a primary key (id) and two indexes (on seller_id and buyer_id):

    CREATE TABLE orders (
        id BIGINT,
        buyer_id VARCHAR(64),   -- length assumed; not given in the source
        seller_id VARCHAR(64),  -- length assumed; not given in the source
        PRIMARY KEY (id),
        INDEX sdx (seller_id),
        INDEX bdx (buyer_id)
    );

  • The primary key index is partitioned by id
  • The index sdx is partitioned by seller_id
  • The index bdx is partitioned by buyer_id

As shown in the figure below:

After partitioning the indexes, PolarDB-X distributes the resulting shards across the storage nodes and balances the load according to data volume and other information, as shown in the figure below:
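The two steps described above, partitioning each index by its own key and then spreading the shards over storage nodes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the partition count, the CRC32 hash and the greedy size-based placement are stand-ins chosen for the example, not PolarDB-X’s actual algorithms, and all function names are hypothetical.

```python
import zlib

# Hypothetical illustration: each index of the orders table is partitioned
# independently by its own key. CRC32 and 4 partitions are arbitrary choices.
INDEX_KEYS = {"PRIMARY": "id", "sdx": "seller_id", "bdx": "buyer_id"}
PARTITIONS_PER_INDEX = 4


def partition_of(value, partitions=PARTITIONS_PER_INDEX):
    """Map a key value to a partition number with a stable hash."""
    return zlib.crc32(str(value).encode("utf-8")) % partitions


def route_row(row):
    """Return, for every index, the shard (index name, partition) holding the row."""
    return {name: (name, partition_of(row[col])) for name, col in INDEX_KEYS.items()}


def place_shards(shard_sizes, nodes):
    """Greedy balancing: place the largest shards first, each on the least-loaded node."""
    load = {node: 0 for node in nodes}
    placement = {}
    for shard, size in sorted(shard_sizes.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)
        placement[shard] = target
        load[target] += size
    return placement, load


if __name__ == "__main__":
    row = {"id": 1001, "seller_id": "s42", "buyer_id": "b7"}
    print(route_row(row))

    sizes = {("PRIMARY", 0): 50, ("PRIMARY", 1): 30, ("sdx", 0): 40, ("bdx", 0): 20}
    print(place_shards(sizes, ["dn1", "dn2"]))
```

Note that one logical row ends up in three shards, one per index, and that the three shards may live on different storage nodes. Keeping them consistent is why index maintenance relies on distributed transactions.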

In PolarDB-X, the partitioning key can be omitted from the CREATE TABLE statement, and PolarDB-X will still shard and load-balance the table automatically.

Therefore, when migrating an application to PolarDB-X, you can export the CREATE TABLE statements from stand-alone MySQL and execute them directly in PolarDB-X without modification.

Transparent distributed transactions

Distributed transactions are the most important basic capability of PolarDB-X. They are used widely by the business and spare it from having to rewrite its transaction code. PolarDB-X also uses transactions internally to maintain indexes.

PolarDB-X’s distributed transactions have the following characteristics:

  1. Like Spanner, they provide external consistency, the strongest consistency level
  2. The syntax is fully compatible with MySQL, so no application changes are required
  3. The behavior is compatible with MySQL’s RC (READ COMMITTED) and RR (REPEATABLE READ) isolation levels
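To make the difference between the RC and RR levels in the list above concrete, here is a toy multi-version store in Python. It is a sketch under simplifying assumptions (a single process, integer commit timestamps, hypothetical class names) and says nothing about how PolarDB-X implements distributed transactions. It only shows the read behavior an application can expect from each level.

```python
class VersionedStore:
    """Toy multi-version key-value store with integer commit timestamps."""

    def __init__(self):
        self.clock = 0
        self.versions = {}  # key -> list of (commit_ts, value), oldest first

    def commit(self, key, value):
        """Commit a new version of key and advance the clock."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read_at(self, key, ts):
        """Return the newest value committed at or before ts, or None."""
        result = None
        for commit_ts, value in self.versions.get(key, []):
            if commit_ts <= ts:
                result = value
        return result


class Reader:
    """A read-only transaction at isolation level 'RC' or 'RR'."""

    def __init__(self, store, level):
        if level not in ("RC", "RR"):
            raise ValueError("level must be 'RC' or 'RR'")
        self.store = store
        self.level = level
        # Simplification: the snapshot is fixed when the reader is created.
        # MySQL fixes it at the first consistent read of the transaction.
        self.snapshot_ts = store.clock

    def read(self, key):
        # RC: every statement sees the latest committed data.
        # RR: every statement sees the same snapshot.
        ts = self.store.clock if self.level == "RC" else self.snapshot_ts
        return self.store.read_at(key, ts)


if __name__ == "__main__":
    store = VersionedStore()
    store.commit("balance", 100)
    rc, rr = Reader(store, "RC"), Reader(store, "RR")
    store.commit("balance", 80)  # another transaction commits in the meantime
    print(rc.read("balance"), rr.read("balance"))  # 80 100
```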

The principles behind PolarDB-X distributed transactions have been covered extensively in our column and won’t be repeated here. Readers who are interested can refer to these articles:

https://zhuanlan.zhihu.com/p/329978215

https://zhuanlan.zhihu.com/p/338535541

https://zhuanlan.zhihu.com/p/355413022

Online DDL

PolarDB-X supports a wide variety of Online DDL types. Here are some representative DDL types.

Index maintenance

Indexes in PolarDB-X are global indexes. The following types are supported:

  • Normal index
  • Unique index
  • Clustered index

The clustered index is an index type that PolarDB-X adds relative to MySQL. It contains all the columns of the table, which avoids the cost of going back to the base table to fetch the remaining columns.
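The cost that a clustered index avoids can be illustrated with a small Python model. Dictionaries stand in for the index structures, and the class name and lookup counter are hypothetical. In a distributed database each extra probe may also be a network round trip to another storage node, which this toy does not model.

```python
class Table:
    """Toy table with a primary index, a normal secondary index and a clustered one."""

    def __init__(self, rows):
        self.lookups = 0
        self.primary = {row["id"]: row for row in rows}
        self.normal = {}     # index key -> primary keys only
        self.clustered = {}  # index key -> full copies of the rows
        for row in rows:
            self.normal.setdefault(row["name"], []).append(row["id"])
            self.clustered.setdefault(row["name"], []).append(dict(row))

    def find_via_normal(self, name):
        self.lookups += 1  # one probe of the secondary index
        result = []
        for pk in self.normal.get(name, []):
            self.lookups += 1  # one extra probe of the primary index per match
            result.append(self.primary[pk])
        return result

    def find_via_clustered(self, name):
        self.lookups += 1  # one probe; the index already holds every column
        return self.clustered.get(name, [])


if __name__ == "__main__":
    table = Table([
        {"id": 1, "name": "alice", "city": "Hangzhou"},
        {"id": 2, "name": "alice", "city": "Beijing"},
        {"id": 3, "name": "bob", "city": "Shanghai"},
    ])
    table.find_via_normal("alice")
    print("normal index probes:", table.lookups)
    table.lookups = 0
    table.find_via_clustered("alice")
    print("clustered index probes:", table.lookups)
```

The trade-off is storage and write cost: the clustered index keeps a second full copy of every row, so each write has to maintain it.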

Indexes in PolarDB-X are created through DDL, and the operation is online: it does not block the business.

For example:

  • CREATE INDEX idx1 ON t1(name)
  • CREATE CLUSTERED INDEX idx1 ON t1(name)

INSTANT ADD COLUMN

Adding columns is the most common type of DDL in business systems. In MySQL, the time taken to add a column depends on the amount of data (in MySQL 8.0, adding a column at the very end of a table is instant).

In PolarDB-X, adding a column at any position is INSTANT: it completes in constant time, on the order of seconds, regardless of the amount of data, and has no impact on the business.
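One common way to make column addition independent of data volume is to change only the table metadata and fill in the default value when older rows are read. The source does not describe PolarDB-X’s mechanism, so the Python sketch below is an illustration of that general idea under this assumption, with hypothetical names throughout.

```python
class InstantTable:
    """Toy table where ADD COLUMN touches metadata only."""

    def __init__(self, columns):
        self.columns = list(columns)  # current column order
        self.defaults = {}            # column -> default for rows written earlier
        self.rows = []                # each row holds only the columns known at write time

    def insert(self, **values):
        self.rows.append(dict(values))

    def add_column(self, name, default=None, after=None):
        """Add a column at any position without rewriting a single stored row."""
        position = len(self.columns) if after is None else self.columns.index(after) + 1
        self.columns.insert(position, name)
        self.defaults[name] = default

    def select_all(self):
        """Materialize rows in the current column order, filling defaults lazily."""
        return [
            tuple(row.get(col, self.defaults.get(col)) for col in self.columns)
            for row in self.rows
        ]


if __name__ == "__main__":
    t = InstantTable(["id", "name"])
    t.insert(id=1, name="a")
    t.add_column("age", default=0, after="id")  # cost does not depend on len(t.rows)
    t.insert(id=2, age=30, name="b")
    print(t.select_all())
```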

Partition adjustment

PolarDB-X supports four table distribution policies: Hash, Range, List and Broadcast. It uses the Hash policy by default because hashing avoids hot spots under sequential writes, and in most cases this policy meets the system’s performance requirements well.

However, if the business wants to switch to a more suitable partitioning policy during operation to improve performance, the change can be made with a simple DDL statement, and PolarDB-X will reorganize the table data according to the new policy.

For example:

  • ALTER TABLE t1 PARTITION BY HASH(name)
  • ALTER TABLE t1 PARTITION BY HASH(name) PARTITIONS 32
  • ALTER TABLE t1 BROADCAST
  • ALTER TABLE t1 PARTITION BY RANGE(id)

A table can be converted between any two partitioning policies with a DDL statement.
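What reorganizing the table data under a new partitioning policy involves can be sketched in Python: every row is routed again by the new key and partition count. The CRC32 hash, the helper names and the in-memory lists are assumptions made for the illustration. A real system performs this online and incrementally rather than in one pass.

```python
import zlib


def hash_partition(value, partitions):
    """Stable hash routing; CRC32 is an arbitrary choice for the example."""
    return zlib.crc32(str(value).encode("utf-8")) % partitions


def repartition(old_partitions, key, partitions):
    """Rebuild the partition list by routing every row on `key`."""
    new_partitions = [[] for _ in range(partitions)]
    for part in old_partitions:
        for row in part:
            new_partitions[hash_partition(row[key], partitions)].append(row)
    return new_partitions


def broadcast(old_partitions, copies):
    """Broadcast policy: every storage node keeps a full copy of the table."""
    all_rows = [row for part in old_partitions for row in part]
    return [list(all_rows) for _ in range(copies)]


if __name__ == "__main__":
    rows = [{"id": i, "name": "user%d" % (i % 5)} for i in range(20)]
    by_id = repartition([rows], "id", 4)        # hash on id, 4 partitions
    by_name = repartition(by_id, "name", 8)     # switch to hash on name, 8 partitions
    print([len(p) for p in by_id])
    print([len(p) for p in by_name])
    print([len(p) for p in broadcast(by_name, 3)])
```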

Adaptive backfill speed

Many of you have probably had this experience: a DDL operation on a large table cannot finish within a day because of the data volume. To avoid affecting the business, someone has to adjust parameters by hand to slow down the DDL backfill when the daytime business peak arrives, then speed it up again at night once the peak is over.

Backfill in PolarDB-X automatically adjusts its speed based on the current system load.

For example:

In this example, there are four stages:

  1. At first there is no business load, and the DDL backfill rate rises to 250,000 rows/second
  2. The business load starts to rise, and the DDL backfill rate quickly drops to 130,000 rows/second
  3. Business TPS stabilizes at 15,000, and the DDL backfill rate holds at 130,000 rows/second
  4. After the DDL finishes, business TPS stabilizes at 16,000

From this example, we can see that the backfill rate of PolarDB-X DDL adjusts automatically to the business load, and that the DDL has little impact on business TPS while it runs.
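The behavior in this example resembles a feedback loop: speed up while the system has headroom, back off when the business load rises. A minimal Python sketch of such a controller follows. The load metric, thresholds, step sizes and rate limits are all hypothetical values picked for the illustration, since the source does not say how PolarDB-X computes its backfill rate.

```python
def next_backfill_rate(current_rate, system_load,
                       high_load=0.8, low_load=0.5,
                       min_rate=10_000, max_rate=250_000):
    """Return the backfill rate (rows/second) for the next interval.

    system_load is a utilization figure between 0 and 1 (hypothetical metric).
    """
    if system_load > high_load:
        new_rate = current_rate // 2      # back off quickly under pressure
    elif system_load < low_load:
        new_rate = current_rate + 20_000  # ramp up gently when there is headroom
    else:
        new_rate = current_rate           # hold steady in between
    return max(min_rate, min(max_rate, new_rate))


def simulate(loads, start_rate=10_000):
    """Apply the controller to a sequence of load samples."""
    rates, rate = [], start_rate
    for load in loads:
        rate = next_backfill_rate(rate, load)
        rates.append(rate)
    return rates


if __name__ == "__main__":
    # Idle system, then business traffic arrives, then a steady state.
    loads = [0.1] * 12 + [0.9] + [0.6] * 3
    for load, rate in zip(loads, simulate(loads)):
        print(load, rate)
```

Halving on overload while increasing additively when idle is a deliberate asymmetry in this sketch: it gives priority to business traffic over finishing the DDL sooner.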

Making Online DDL more online

To further reduce the impact on the business during DDL, PolarDB-X also uses a number of other techniques, such as:

  • Multi-version metadata, see: https://zhuanlan.zhihu.com/p/347885003
  • DDL that can be paused and cancelled
  • MDL deadlock detection
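Deadlock detection for metadata locks (MDL) is usually framed as finding a cycle in a wait-for graph. The source leaves the details to a later article, so the Python sketch below shows only that generic technique, with a hypothetical graph representation, and should not be read as PolarDB-X’s implementation.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    wait_for maps each transaction to the transactions whose locks it waits on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for holder in wait_for.get(txn, ()):
            state = color.get(holder, WHITE)
            if state == GREY:  # edge back into the current path: a cycle
                return path[path.index(holder):]
            if state == WHITE:
                cycle = visit(holder)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # A DDL waits for txn1, txn1 waits for txn2, and txn2 waits for the DDL.
    print(find_deadlock({"ddl": ["txn1"], "txn1": ["txn2"], "txn2": ["ddl"]}))
    print(find_deadlock({"ddl": ["txn1"], "txn1": []}))
```

Once a cycle is found, one participant has to be chosen as the victim and rolled back. Which one to pick is a policy decision that this sketch leaves out.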

We will cover these techniques in detail in the next article. Please follow our Zhihu column: https://www.zhihu.com/org/polardb-x

Conclusion

PolarDB-X’s transparent distribution capabilities will significantly reduce the cost of migrating from a stand-alone database to a distributed database. We will also keep making it more transparent in the future. Some of the things we are working on include:

  • More sophisticated scheduling strategies
  • Visual display of hot data, intelligent diagnosis linked with SQL audit analysis
  • Support for partition-level TRUNCATE in the presence of global indexes
  • Time-based rolling and cleanup of data
  • And more
