• Review: compute-storage separation, and the advantages and disadvantages of local storage

  • Achieving zero data loss with MySQL on local storage

  • Performance comparison

  • Implementation based on Docker + Kubernetes

Compute-storage separation

  • Architecture is clear

  • Independent expansion of computing and storage resources

  • Improve instance density and optimize hardware utilization

  • Simplified instance switchover: stateful data is persisted to the storage layer, so the scheduler does not need to know the storage media of compute nodes; it only has to place the instance on a node that meets its computing-resource requirements. When a database instance starts, it simply mounts its mapped volume from the distributed file system. This improves the deployment density and compute-resource utilization of database instances.
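As a purely illustrative sketch of this mount-on-start flow, a Kubernetes PersistentVolumeClaim backed by a distributed-storage StorageClass could look like the following; the class name `distributed-storage` and the requested size are assumptions, not values from the original setup:

```yaml
# Illustrative only: a MySQL pod claims a volume from a distributed-storage
# backend, so it can be rescheduled to any compute node and re-mount its data.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: distributed-storage   # hypothetical StorageClass name
  resources:
    requests:
      storage: 100Gi
```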

    Take MySQL as an example:

  • More general: the approach works for both Oracle and MySQL; see "Containerized RDS: 'Split-Brain' under a Compute-Storage Separation Architecture".

  • Introducing distributed storage increases architectural complexity; once distributed-storage issues are involved, DBAs can no longer troubleshoot problems end-to-end on their own.

  • Distributed storage selection:

    Choosing a commercial product carries the risk of storage vendor lock-in.

    Choosing open source: GlusterFS and Ceph have been tested by many users (including Wallach); for a latency-sensitive workload such as a database, the performance was completely unacceptable.

Local storage

  • The physical capacity is limited by the single-node capacity.

  • Scheduling is more complex. If a database instance requires a specific storage type (such as SSD), then on failover it can only be rescheduled to physical nodes that actually have SSDs. As a result, the scheduler must be aware of the physical topology of the nodes.

  • Density is hard to increase, a side effect of being "physical topology aware".

  • Because database solutions differ greatly from one another, generality cannot be guaranteed.
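The "physical topology aware" constraint above can be illustrated with a minimal scheduler sketch in Python; the node list, labels, and resource figures are invented for illustration:

```python
# Minimal sketch of topology-aware scheduling with local storage: an
# instance pinned to SSD can only fail over to nodes that have SSDs,
# so the scheduler must filter on node topology, not just free resources.
nodes = [
    {"name": "node-1", "disk": "ssd", "free_cpu": 8},
    {"name": "node-2", "disk": "hdd", "free_cpu": 16},
    {"name": "node-3", "disk": "ssd", "free_cpu": 2},
]

def schedulable(instance, nodes):
    return [n["name"] for n in nodes
            if n["disk"] == instance["disk"]        # topology constraint
            and n["free_cpu"] >= instance["cpu"]]   # resource constraint

# An SSD instance needing 4 cores can only land on node-1, even though
# node-2 has far more spare CPU.
assert schedulable({"disk": "ssd", "cpu": 4}, nodes) == ["node-1"]
```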

  • Transaction throughput (TPS) per unit of time is inversely proportional to the number of cluster members.

  • Adding cluster members significantly and unpredictably increases transaction response times

  • Increases the likelihood of data-replication conflicts and deadlocks between cluster members.

  • Replication changes from binlog-based to write-set-based; a write-set contains the modified data, a Global Transaction ID (GTID), and the primary keys.

    A GTID looks like: 45eec521-2f34-11e0-0800-2a36050b826b:94530586304

    94530586304 is a 64-bit signed integer representing the transaction's position in the sequence.
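A small Python sketch (the helper name `parse_gtid` is ours, not a MySQL/Galera API) shows how such a GTID string splits into the source UUID and the 64-bit sequence number:

```python
# Hypothetical helper: split a "uuid:seqno" GTID into its two parts.
# The UUID identifies the replication source/group; the seqno is a
# 64-bit signed integer giving the transaction's position in the sequence.
def parse_gtid(gtid: str) -> tuple[str, int]:
    source_id, _, seqno = gtid.rpartition(":")
    return source_id, int(seqno)

uuid, seqno = parse_gtid("45eec521-2f34-11e0-0800-2a36050b826b:94530586304")
assert seqno == 94530586304
assert seqno < 2**63  # fits in a 64-bit signed integer
```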

  • Traditional synchronous replication is replaced with deferred update replication, and the whole process breaks roughly into four phases: local, send, validation, and application, where:

    Local phase: executed optimistically before the transaction commits, assuming the transaction will not conflict when replicated across the cluster.

    Send phase: the synchronization window is optimized; only global ordering and obtaining the GTID are synchronous, while conflict validation and transaction application are asynchronous, which greatly improves replication efficiency.

    Validation phase: a transaction can be validated (concurrently with its pre-ordered transactions) only after all of its pre-ordered transactions have been received; otherwise global ordering cannot be guaranteed. Serialization is therefore introduced at the cost of some efficiency.

    (Figure: validation must wait for pre-ordered transaction 3.)
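The validation (certification) idea can be sketched as a primary-key intersection between write-sets. This is a toy model under our own simplifying assumptions, not the actual Galera certification algorithm:

```python
# Toy sketch of write-set certification: a replicated transaction passes
# only if its primary keys do not overlap with any concurrent transaction
# that was globally ordered before it but not yet seen when it executed
# locally (optimistic execution in the local phase).
def certify(write_set, concurrent_write_sets):
    """write_set: set of primary keys modified by the incoming transaction.
    concurrent_write_sets: write-sets of pre-ordered, unseen transactions."""
    for other in concurrent_write_sets:
        if write_set & other:   # primary-key conflict -> roll back locally
            return False
    return True

assert certify({("t1", 1)}, [{("t1", 2)}, {("t2", 1)}]) is True
assert certify({("t1", 1)}, [{("t1", 1)}]) is False
```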

  • 3 database nodes:

  • 4 database nodes: set the weights to avoid split-brain (⅙ + ⅙ + ⅓ + ⅓)

  • 5 database nodes:

  • 6 database nodes:

  • 7 database nodes: two topologies are supported
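The weighted-quorum arithmetic above can be checked with a short Python sketch; the node names are invented, and the weights are the ⅙ + ⅙ + ⅓ + ⅓ values from the bullet:

```python
# Sketch of weighted quorum: a partition keeps quorum only if it holds
# strictly more than half of the total weight, so the two "light" nodes
# (1/6 each) can never form a second primary component (no split-brain).
from fractions import Fraction

weights = {"a": Fraction(1, 6), "b": Fraction(1, 6),
           "c": Fraction(1, 3), "d": Fraction(1, 3)}

def has_quorum(partition):
    return sum(weights[n] for n in partition) > Fraction(1, 2)

assert has_quorum({"c", "d"})        # heavy pair: 2/3 > 1/2, keeps quorum
assert not has_quorum({"a", "b"})    # light pair: 1/3 <= 1/2, loses quorum
```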

  • Based on Corosync (Totem protocol); installed as a plug-in; MySQL's official native plug-in.

  • Cluster architecture supporting multiple writers (a single writer is recommended).

  • Tolerates failure of a minority of nodes; low synchronization latency, strong consistency, zero data loss.

  • Throughput per unit of time is affected by flow control.
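Flow control can be sketched as a bounded receive queue: once any node's queue exceeds a limit (cf. Galera's `gcs.fc_limit` parameter), the cluster pauses replication until the queue drains. A toy model under our own assumptions:

```python
# Toy model of flow control: the cluster pauses while any node's
# receive queue is above FC_LIMIT, capping cluster-wide throughput
# at the pace of the slowest node.
FC_LIMIT = 16

def step(queue_len, incoming, applied):
    """Apply one tick: enqueue `incoming` write-sets, apply `applied`.
    Returns (new_queue_len, paused)."""
    queue_len = max(0, queue_len + incoming - applied)
    return queue_len, queue_len > FC_LIMIT

q, paused = step(queue_len=14, incoming=6, applied=2)
assert (q, paused) == (18, True)    # slow node triggers flow control
q, paused = step(q, incoming=0, applied=10)
assert (q, paused) == (8, False)    # queue drains, replication resumes
```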

Vitess

  • Open-sourced by YouTube; extremely powerful, well documented, and highly productized.

  • Joined CNCF as the second storage project (after Rook); it is still at the incubation level.

  • The author has no hands-on experience with it and does not know of any domestic users, so no comment.

  • MGR 5.7.17 / PXC 5.7.14-26.17

  • MGR 5.7.17 / PXC 5.7.17-29.20 / MariaDB 10.2.5RC

  • Local storage vs. compute-storage separation

Performance Comparison 1: MGR 5.7.17 / PXC 5.7.14-26.17

  • PXC 5.7.14-26.17 (Implemented based on Galera 3)

  • Load model: OLTP Read/Write (RW)

  • Durability: sync_binlog = 1, innodb_flush_log_at_trx_commit = 1

  • Non-durability: sync_binlog = 0, innodb_flush_log_at_trx_commit = 2
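The two profiles map directly to `my.cnf` settings; a sketch, with the values taken from the bullets above:

```ini
# Durability profile used in the comparison: flush the binlog and the
# InnoDB redo log on every commit.
[mysqld]
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1

# Non-durability profile: let the OS / background thread flush,
# trading crash safety for throughput.
# sync_binlog = 0
# innodb_flush_log_at_trx_commit = 2
```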

Performance Comparison 2: MGR 5.7.17 / PXC 5.7.17-29.20 / MariaDB 10.2.5RC

  • Added MariaDB for comparison

  • PXC is updated to 5.7.17-29.20, which improves MySQL write-set replication layer performance [3].

  • Load model: Still use OLTP Read/Write (RW)

  • Durability: sync_binlog = 1

  • Non-durability: sync_binlog = 0

Performance Comparison 3: Local storage vs. compute-storage separation

Test report by @Hatano and @Hanjie

Docker + Kubernetes + MGR / Galera Cluster

  • Docker + Kubernetes + PXC

  • Docker + Kubernetes + MGC

  • Docker + Kubernetes + MGR

Docker + Kubernetes + Vitess

  • O&M: deployment and backup

  • Elasticity: compute/storage capacity expansion and cluster scaling

  • High availability: e.g., subtle differences in failover behavior that affect services

  • Fault tolerance: e.g., the impact of the network on the cluster, especially under network jitter or significant latency

  • Community activity

  1. https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html

  2. http://mysqlhighavailability.com/performance-evaluation-mysql-5-7-group-replication/

  3. https://www.percona.com/blog/2017/04/19/performance-improvements-percona-xtradb-cluster-5-7-17/

  4. https://github.com/kubernetes/kubernetes/tree/master/examples/storage/mysql-galera

  5. https://github.com/kubernetes/kubernetes/tree/master/examples/storage/vitess
