Any investment is judged by its input-output ratio, and choosing a database is no different: is it actually cost-effective? POLARDB, a database developed in-house by Alibaba, is often met with the same questions: How does it perform? Can it support my business? Is it expensive? In the early stage of an evaluation, when indicators such as stability and reliability are hard to quantify, performance becomes a critical decision factor.

From the outset, POLARDB's design wrote performance into the product requirements specification as a key indicator. From architecture design to new hardware selection to code implementation, and from the drivers to distributed storage to the distributed file system and the database engine, the entire technology stack is optimized collaboratively, with the goal of an order-of-magnitude gain in performance.

The technology behind high performance



This architecture diagram, shared at the 2018 Computing Conference in Hangzhou, shows the internals of POLARDB. From the bottom up, POLARDB consists of four parts: __shared distributed storage PolarStore__, __distributed file system PolarFS__, __multi-node database cluster PolarDB__, and __proxy PolarProxy__, which provides a unified entry point.

PolarFS

The PolarFS design uses the following techniques to maximize I/O performance:

  • PolarFS handles I/O with a single-threaded finite state machine bound to a CPU core, avoiding the context-switching overhead of multithreaded I/O pipelines.
  • PolarFS optimizes memory allocation, using a memory pool to reduce the overhead of constructing and destroying memory objects, and huge pages to reduce the overhead of paging and TLB updates.
  • PolarFS combines central control with local autonomy: all metadata is cached in the memory of each system component, which largely avoids additional metadata I/O.
  • PolarFS uses a full user-space I/O stack, including RDMA and SPDK, to avoid the overhead of the kernel networking and storage stacks.

In comparison tests on the same hardware, the write latency of three-replica data blocks in PolarFS was close to that of a single-replica local SSD. As a result, the single-instance TPS of POLARDB improves greatly while data reliability is preserved.

PolarDB

PolarDB pioneered the use of the physical log (redo log) in place of the traditional logical log (binlog). This not only greatly improves the efficiency and accuracy of replication, but also cuts I/O operations by 50%. For databases with frequent writes or updates, performance can improve by more than 50%.



PolarProxy

The purpose of PolarProxy is to pool the resources of the underlying compute nodes behind a unified entry point for applications. This greatly reduces the cost of using the database for applications and eases migration and cutover from legacy systems to POLARDB. In essence, PolarProxy is a stateless, distributed database proxy cluster whose capacity adapts to load. Because it scales out dynamically, it takes full advantage of POLARDB's ability to add and remove read nodes quickly, raising the throughput of the whole database cluster. The higher the concurrency of the ECS instances that access it, the more pronounced the advantage.
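The idea of a unified entry point in front of one primary and a changing set of read nodes can be sketched as follows. This is a minimal illustration, not PolarProxy's actual logic: the class `ReadWriteRouter`, the node names, and the round-robin policy are all assumptions made for the example.

```python
from itertools import count
from typing import List


class ReadWriteRouter:
    """Hypothetical stand-in for a stateless proxy in front of one primary
    and N read nodes. It only decides which node a statement goes to."""

    def __init__(self, primary: str, read_nodes: List[str]) -> None:
        self.primary = primary
        self.read_nodes = list(read_nodes)
        self._counter = count()  # drives the round-robin choice

    def add_read_node(self, node: str) -> None:
        """Scale out: new read nodes take traffic on the next read."""
        self.read_nodes.append(node)

    def remove_read_node(self, node: str) -> None:
        """Scale in: the node stops receiving traffic immediately."""
        self.read_nodes.remove(node)

    def route(self, sql: str) -> str:
        """Return the name of the node that should execute `sql`.

        Simplification (assumption): anything starting with SELECT is a read.
        A real proxy must also handle transactions, locking reads, and so on.
        """
        is_read = sql.lstrip().lower().startswith("select")
        if not is_read or not self.read_nodes:
            return self.primary
        index = next(self._counter) % len(self.read_nodes)
        return self.read_nodes[index]


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["ro1", "ro2"])
    for statement in ["UPDATE t SET a = 1", "SELECT 1", "SELECT 2"]:
        print(statement, "->", router.route(statement))
```

The application only ever talks to the router, so read nodes can be added or removed without any change on the application side.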

About the cost

Talking about performance without talking about cost is disingenuous.

First of all, POLARDB's architecture separates storage from compute, so CPU, memory, and disk no longer constrain one another. Compute and storage are managed and allocated as separate resource pools, which greatly reduces resource fragmentation and improves overall resource utilization. With different machine models for compute and for storage, we can also customize and optimize each more precisely to reduce the cost per unit of resource.

More generally, economies of scale keep driving marginal costs down. Building on Alibaba's massive infrastructure, we can continuously reduce costs along several dimensions, including the global supply chain, low-energy data centers, and in-house server research and development.

Price/performance

No matter how advanced the technology is or how low the cost is, it still has to win users' approval.

From the user's point of view, what matters most is price/performance: for the same cost, can we get better performance?

Let's compare the price/performance ratio of POLARDB and RDS MySQL.

To be fair, we used the same database configuration, test data set, and test method for both, and then calculated the price and performance of each separately.

Specifically:

  • Database configuration, a common one in production environments: 8 cores, 32 GB of memory, 500 GB of storage
  • Test cases: Sysbench OLTP (--oltp-tables-count=10 --oltp-table-size=500000)
  • Metrics: QPS (queries per second, the number of requests processed per second) and TPS (transactions per second, the number of transactions processed per second)

In addition, both RDS MySQL and POLARDB can scale out by adding read-only nodes. The difference is that POLARDB incurs no additional storage cost as the number of nodes grows. We therefore compare several architectures, with the number of read nodes ranging from 1 to 3:

| Architecture | POLARDB monthly price in yuan (compute + storage) | RDS MySQL monthly price in yuan | POLARDB price vs. RDS | POLARDB performance (QPS) | RDS MySQL performance (QPS) | POLARDB performance gain | QPS per thousand yuan (RDS vs. POLARDB) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 primary, 1 standby | 2000*2+1575=5575 | 4100+400=4500 | +24% | 53879.46 | 18625.49 | +190% | 4139 vs. 9664 |
| 1 primary, 2 standby | 2000*3+1575=7575 | (4100+400)+(5.13+0.001*400)*24*30=8481 | -10% | n/a | n/a | n/a | n/a vs. 8795 |
| 1 primary, 3 standby | 2000*4+1575=9575 | (4100+400)+(5.13+0.001*400)*24*30*2=12463 | -23% | 80268.27 | 43087.33 | +86% | 3457 vs. 8383 |

The base prices used in the table (as of November 8, 2018):

  • POLARDB storage monthly price: 1575 yuan/month = 500 GB * 3.5 yuan/GB/month
  • POLARDB node monthly price: 2000 yuan per node
  • MySQL high-availability edition instance (two nodes) monthly price: 4100 yuan
  • MySQL high-availability instance storage monthly price: 400 yuan/month = 500 GB * 0.8 yuan/GB/month
  • MySQL read-only instance price: 5.13 yuan/hour
  • MySQL read-only instance storage price: 0.001 yuan/GB/hour
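The arithmetic behind the monthly prices and the QPS-per-thousand-yuan column can be reproduced with a short script. This is a sketch for checking the figures, not an official pricing tool: the function names are made up for illustration, the storage totals are taken as quoted rather than recomputed from the per-GB rates, and the read-only storage term uses the `0.001*400` expression exactly as it appears in the table.

```python
# Prices in yuan, copied from the list and table above (November 8, 2018).
POLARDB_NODE_MONTHLY = 2000.0        # one POLARDB node
POLARDB_STORAGE_MONTHLY = 1575.0     # POLARDB storage, total as quoted
RDS_HA_MONTHLY = 4100.0              # RDS MySQL high-availability instance (two nodes)
RDS_HA_STORAGE_MONTHLY = 400.0       # storage for the high-availability instance
RDS_RO_HOURLY = 5.13                 # one RDS MySQL read-only instance
RDS_RO_STORAGE_HOURLY = 0.001 * 400  # read-only storage term as written in the table
HOURS_PER_MONTH = 24 * 30


def polardb_monthly(nodes: int) -> float:
    """Every POLARDB node is billed, but storage is billed only once."""
    return POLARDB_NODE_MONTHLY * nodes + POLARDB_STORAGE_MONTHLY


def rds_monthly(read_only: int) -> float:
    """High-availability instance plus hourly-billed read-only instances,
    each of which carries its own storage charge."""
    ha = RDS_HA_MONTHLY + RDS_HA_STORAGE_MONTHLY
    ro = (RDS_RO_HOURLY + RDS_RO_STORAGE_HOURLY) * HOURS_PER_MONTH * read_only
    return ha + ro


def qps_per_thousand_yuan(qps: float, monthly_price: float) -> float:
    """Price/performance metric used in the last column of the table."""
    return qps / (monthly_price / 1000.0)


if __name__ == "__main__":
    # (label, POLARDB nodes, RDS read-only instances, POLARDB QPS, RDS QPS)
    rows = [
        ("1 primary, 1 standby", 2, 0, 53879.46, 18625.49),
        ("1 primary, 3 standby", 4, 2, 80268.27, 43087.33),
    ]
    for label, nodes, read_only, polar_qps, rds_qps in rows:
        polar_price = polardb_monthly(nodes)
        rds_price = rds_monthly(read_only)
        print(label)
        print(f"  price: POLARDB {polar_price:.0f} vs. RDS {rds_price:.0f}")
        print(f"  QPS per thousand yuan: "
              f"RDS {qps_per_thousand_yuan(rds_qps, rds_price):.0f} vs. "
              f"POLARDB {qps_per_thousand_yuan(polar_qps, polar_price):.0f}")
```

The key structural difference is visible in the two price functions: the RDS price grows with a per-instance storage charge, while the POLARDB storage term stays constant as nodes are added.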

The figure below shows this more clearly; the gray standby database does not serve traffic. POLARDB offers a much better price/performance ratio, and because all of its nodes serve traffic, its resource utilization is also higher than that of RDS.



Easter egg: complex SQL query acceleration, free to use

In practice, customers' workloads are complex. Business access is often mixed with a large number of complex statistical and analytical SQL statements (ad hoc queries), and MySQL's model of running each query on a single thread is overwhelmed by them.

POLARDB addresses this scenario with a built-in parallel query engine that delivers up to 8 times better performance for complex queries on large tables (such as the TPC-H benchmark). It is especially effective for slow SQL, such as report queries that take more than 1 minute to execute. It also supports advanced syntax such as set operations, WITH, and the window function clause OVER. This feature is currently in public beta and is free to use. For details, see SQL Acceleration.

The chart below shows a comparative test. With SQL acceleration, query efficiency is more than 8 times that of direct queries without SQL acceleration. The test cases include:

  • Point query: a single-row lookup on a non-indexed column
  • Aggregate query
  • TPC-H



The SQL acceleration feature currently exposes an additional connection address that serves non-transactional complex queries. The underlying compute nodes and storage reuse POLARDB's existing resources: one copy of the data serves multiple access paths, which removes the need for data migration and requires no additional cost.

The technical implementation includes the following points:

  • A pluggable compute engine, responsible for distributed operator computation
  • Single-table computation is pushed down to the database engine
  • Large SQL statements are computed in parallel
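The combination of pushed-down single-table work and parallel execution follows a general scatter/gather pattern, which can be sketched as below. This is only an illustration of the pattern under stated assumptions, not POLARDB's engine: the function names are hypothetical, the "table" is an in-memory list, and Python threads stand in for workers to show the structure rather than to deliver real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partial_sum_count(chunk: Sequence[float]) -> Tuple[float, int]:
    """Work pushed down to one worker: aggregate one slice of the column."""
    return sum(chunk), len(chunk)


def parallel_avg(values: Sequence[float], workers: int = 4) -> float:
    """Compute AVG(column) by splitting the scan into slices, aggregating
    each slice in a worker, and merging the partial results."""
    if not values:
        raise ValueError("cannot average an empty column")
    size = -(-len(values) // workers)  # ceiling division: rows per slice
    chunks: List[Sequence[float]] = [
        values[i:i + size] for i in range(0, len(values), size)
    ]
    # Scatter: each slice is aggregated independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    # Gather: the coordinator merges (sum, count) pairs into the final answer.
    total = sum(s for s, _ in partials)
    rows = sum(n for _, n in partials)
    return total / rows


if __name__ == "__main__":
    column = list(range(1, 101))
    print(parallel_avg(column, workers=4))
```

AVG is merged from (sum, count) pairs rather than from per-slice averages, because slices may contain different numbers of rows.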




This article is original content from the Yunqi Community and may not be reproduced without permission.