Database cluster

Introduction: Not long ago, I contacted a customized APP company through an online shopping platform, and asked it to customize an APP for the school campus meal ordering system. The technology used in the front end and back end was satisfactory, but when it came to the deployment mode of the back-end database, I immediately saw the advantages and disadvantages of the back-end function performance. Although the technology used by the front and back end is very high, but in the deployment mode of the back-end database, it is the simplest and most weak way of single-node deployment. Considering that the concurrent performance of APP on the backend may greatly reduce the user’s fluency, I decided not to choose this product. So this article will briefly summarize the deployment options for back-end databases.

1. Database of single machine and single node

A single-node database is the simplest back-end database deployment mode. The so-called single node is that all the data of a class of applications or services are stored in a database node, and a database node is a database system service. The number of concurrent requests handled by a database node is fairly limited. If network applications or services rely on only one node for data, they are likely to crash during heavy network traffic.

2, Single-node database multi-concurrent performance test ‘

--concurrency number of concurrent accesses --number-of-queries Number of CONCURRENT SQL statements for a single database.Copy the code
mysqlslap -hLocalhost (or host IP)-uRoot (database username)-p1234(password)-P3306(Port) --concurrency=50 --iterations=1 --auto-generate-sql
--auto-generate-sql-load-type=mixed
--auto-generate-sql-add-autoincrement
--engine=innodb
--number-of-queries=50
Copy the code

50 concurrent 50 SQL statements is not too much of a problem, obviously this standard is too low, let’s slightly adjust the parameter higher

— Concurrency 1000 –number-of-queries 1000

It can be seen that when the database node faces 1000 concurrent connections, it will face overload, which means that many connections cannot be served, which is the main factor causing the lag of App or other network products. How to solve the database overload caused by the lag? You can use database clustering technology to enhance the ability of the database to handle large concurrent connection requests.

Single-node database

1. Well-developed Internet applications have a large user base, so the architecture must maintain high reliability under extreme circumstances

2. The single-node database cannot meet the performance requirements of actual production

3. The single-node database does not have redundancy design and cannot meet high availability requirements

3. PXC database cluster

A single PXC database cluster

  1. This is a typical PXC cluster database. Compared with other database clusters, its biggest characteristic is that all nodes in the cluster keep strict synchronization, strong consistency of read and write, suitable for storing high-value and important data. The program reads the same data from each node at any time. The data written by any node to the database will be synchronized to other nodes and then returned to the client. If any node fails to write to the database, all operations will be rolled back.

  2. A cluster system also has a load balancing function to balance access operations at the network back end. This load balancer is a middleware common with HA. This load balancer middleware will pass SQL statements from the client to different nodes for them to execute separately.

Multiple database clusters

  1. In the actual application, a single cluster is far from enough. There is a performance bottleneck for the database, that is, the performance of the single table records exceeds about 2kW. This should be avoided when storing data. Since all the nodes in a PXC cluster are synchronized, additional PXC clusters are required to store data.

  2. As shown in the figure, two PXC clusters, it should be noted that there is no data synchronization between these two PXC clusters, and the synchronization operation exists within the PXC cluster. The operation between two PXC clusters is data sharding, and the function of performing this operation is also a middleware. The most commonly used is MyCat data sharding middleware based on Ali open source. MyCat will calculate the weights of each field in the records after receiving SQL statements and send them to which cluster for them to execute respectively. It works like a router on the Internet.

4. Replication cluster deployment

Compared with PXC’s strong consistency, mysql cluster has a non-strong consistency solution called Replication. It does not sacrifice performance to form a cluster, which also causes it to fail to ensure strong consistency of read and write. We can say its disadvantages even its advantages.

As shown in the above for the two Replication clusters, in one of the clusters of data written to a database node, and submit the transaction, the transaction as long as in the node submit successful can be returned to the client to submit information, don’t have to wait for all return the data synchronization database node, at this time if you have other synchronous write a database node failure, The synchronization failure information is not returned to the client, so the transaction committed by the client is not rolled back. This is the main difference from the PXC cluster solution, where the PXC cluster waits until all nodes are synchronized and then returns to the client. This also leads to a loss in PXC performance, but this is a sacrifice for PXC read and write consistency.

It’s a simple way to think about it

  • In a PXC cluster, concurrent read/write performance is sacrificed for the sake of high read/write consistency of cluster nodes.
  • A Replication cluster sacrifices strong read/write consistency of cluster nodes for the sake of cluster concurrent read/write performance.

5. How to select a database cluster

  • PXC clustering emphasizes read and write consistency, so it is suitable for storing critical data such as money, accounts, and goods, which are prone to financial disputes.

  • A Replication cluster does not emphasize read and write consistency and is therefore not suitable for storing valuable or critical data. It is suitable for processing large and robust data that is typically used for statistical purposes. For example, the relationship between goods and customers’ preferences.