Abstract

How can a database service deliver both high capacity and high concurrency? As the company's business grows rapidly and a single database instance becomes a bottleneck, how should the database be designed for distribution so that it provides high-concurrency, high-performance service that keeps up with business growth?

This talk covers the design ideas behind a distributed database architecture, the principles of splitting, the difficulties of the transformation, and the solutions, so that the database no longer becomes a bottleneck for business development.

On June 4, 2017, Hongyi, chief database architect of Kangaroo Cloud, spoke at the "Road of Enterprise Internet Architecture Optimization and Upgrading" event on the topic "Let the Database No Longer Become the Bottleneck of Business Development: Distributed Database Architecture Design". IT Said, the exclusive video partner, publishes this talk with the authorization and review of the organizers and the speaker.

Word count: 1705 | Reading time: 3 minutes

Video of the talk:
t.cn/RQCVFyS

Why go distributed?

High concurrency: Distributed applications generate far more database requests.

High volume: Business growth produces large amounts of online data.

Limited vertical scaling: Scaling up has a ceiling, because the resources of a single machine can only be upgraded so far.

Growth: Distribution supports rapid business development and smooth capacity expansion.

Split principle: Step by step



In the early stage of a business, there are relatively few customers, so all services and data can be stored in a single instance, which is enough to support the business.

As the customer base grows, data volume and concurrency rise, and the database easily becomes a bottleneck. We suggest first carrying out a service-oriented transformation: separate the business modules vertically, isolate the databases of different services from one another, and let the services interact through business interfaces. This distributes the databases across different instances, enabling relatively high concurrency and capacity.

If, as the business keeps growing, a single instance is still a bottleneck, consider horizontal splitting: distribute the data of one service across different instances to provide scalable, high-concurrency, high-capacity database services.
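
To make horizontal splitting concrete, here is a minimal Python sketch of hash-based sharding: a shard key (here a hypothetical user_id) is hashed and taken modulo the number of instances, so all rows with the same key land on the same instance. The function names and the choice of CRC32 modulo N are illustrative assumptions, not a method prescribed in the talk.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index in [0, num_shards)."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str,
    # which is randomized per process and therefore unusable for routing.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def split_rows(rows, key_field, num_shards):
    """Group rows by the shard a horizontal split would place them on."""
    shards = {i: [] for i in range(num_shards)}
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


if __name__ == "__main__":
    # Hypothetical order rows keyed by user_id, split across four instances.
    orders = [{"user_id": u, "amount": 10 * u} for u in range(1, 9)]
    placement = split_rows(orders, "user_id", num_shards=4)
    for idx, rows in placement.items():
        print(idx, [r["user_id"] for r in rows])
```

Plain modulo routing is the simplest choice, but changing the number of shards remaps most keys, so deployments that need smooth expansion often pre-allocate many logical shards or use consistent hashing instead.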

Splitting should proceed step by step: first sort out the services (vertical splitting), and only then split horizontally.

Horizontal splitting introduces complexity and requires a lot of development work, so it should be done carefully. Database splitting is tightly coupled to the business architecture, and sometimes a small change in the business design is enough to take the pressure off the database.

The difficulties of horizontal splitting

After a horizontal split, a service's data is distributed across different underlying databases, which introduces complexity.

The system architecture has to be changed to fit the distributed database. The main technical challenge is that applications must handle complex distributed logic, such as distributed transactions and cross-database queries. Stability also poses some challenges, but it is not the main one. Because the data lives in different databases, cross-database joins, distributed transactions, and global sequences are not supported natively.

Solution: client-side data routing

Routing is implemented through configuration directly in the client. Advantages: no additional modules are introduced and the overall architecture stays unchanged; the program has strong control over routing; and it is simple and convenient to use in simple scenarios. Disadvantages: it is intrusive to the application code, and configuration management is complex.

This solution introduces no additional components and is architecturally lightweight, which suits simple scenarios. However, configuration management becomes complex, so this solution is not recommended.
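
The following sketch shows what client-side routing typically looks like, assuming range-based splitting on a hypothetical user_id and hypothetical datasource names (db_order_0 and so on). The routing table lives inside the application, which illustrates both sides of the trade-off above: no extra component is needed, but every client carries this configuration and must be updated whenever the data layout changes.

```python
import bisect

# Hypothetical routing table compiled into the client: (exclusive upper bound
# of a user_id range, name of the datasource that owns that range).
ROUTING_CONFIG = [
    (1_000_000, "db_order_0"),
    (2_000_000, "db_order_1"),
    (3_000_000, "db_order_2"),
]


class ClientRouter:
    """Range-based routing performed inside the application itself."""

    def __init__(self, config):
        self._config = sorted(config)  # order ranges by upper bound
        self._bounds = [upper for upper, _ in self._config]

    def datasource_for(self, user_id: int) -> str:
        # bisect_right finds the first range whose upper bound exceeds user_id.
        idx = bisect.bisect_right(self._bounds, user_id)
        if idx == len(self._bounds):
            raise KeyError(f"no datasource configured for user_id={user_id}")
        return self._config[idx][1]
```

For example, ClientRouter(ROUTING_CONFIG).datasource_for(1_500_000) returns "db_order_1". Adding a fourth range means shipping a new configuration to every application that embeds this router.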

Solution: database middleware

Middleware splits databases and tables automatically and transparently to the application; the barrier to adoption is very low and only small application changes are needed; dynamic horizontal scaling is convenient; and it offers features tailored to distributed scenarios, such as heterogeneous indexes and small-table broadcast. Most importantly, with database middleware the application still sees a single database.

Middleware shields the application from as much of the complexity of a distributed database as possible and greatly lowers the development barrier.







Introduction to DRDS features

Database sharding and table partitioning, the core feature of DRDS, support data splitting and routed access along multiple dimensions. Built-in read/write splitting allows access weights to be configured flexibly. A global unique ID component is built in. The query engine recognizes complex queries and pushes them down, and it is compatible with 98% of MySQL syntax. Flexible scaling components enable automatic online horizontal expansion.
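
The talk does not describe how the DRDS global unique ID component works internally. To illustrate one widely used technique for generating globally unique IDs without a central sequence, here is a minimal Snowflake-style generator: each 64-bit ID combines a millisecond timestamp, a worker ID, and a per-millisecond counter. The bit layout and the custom epoch are assumptions chosen for this sketch.

```python
import threading
import time


class SnowflakeIdGenerator:
    """Snowflake-style IDs: ms timestamp | 10-bit worker ID | 12-bit sequence."""

    EPOCH_MS = 1_483_228_800_000  # 2017-01-01T00:00:00Z, an arbitrary custom epoch
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER = (1 << WORKER_BITS) - 1
    SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= self.MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Refuse to issue IDs that could collide with earlier ones.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.SEQUENCE_MASK
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )
```

Each node needs a distinct worker_id (for example, assigned from configuration). IDs from one generator are strictly increasing, which keeps them friendly to B-tree primary keys, but the scheme depends on reasonably monotonic clocks.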



DRDS physical architecture

Data splitting that can combine 1K MySQL instances; a distributed SQL query engine with high SQL compatibility; smooth expansion of data storage; elastic scaling of processing capacity; read/write splitting that is transparent to the application; small-table broadcast, cross-database joins, and global sequences.



The primary and read-only databases are kept in sync through the database's native replication, and the data is strongly consistent. DRDS automatically inspects each request and dispatches it: all transactional operations are routed to the primary database, while some read operations are served by the read-only databases.
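
A minimal sketch of the dispatch rule just described, assuming a hypothetical router that sees each SQL string and knows whether it runs inside a transaction. Writes, locking reads, and anything inside a transaction go to the primary; other reads are spread over read-only replicas according to configurable weights. Classifying statements by their leading keyword is a deliberate simplification; real middleware parses the SQL.

```python
import random


class ReadWriteRouter:
    """Hypothetical read/write splitting router (illustrative only)."""

    def __init__(self, primary, replica_weights, rng=None):
        self.primary = primary
        # Replica names and their relative read weights, e.g. {"ro1": 3, "ro2": 1}.
        self.replicas = list(replica_weights)
        self.weights = [replica_weights[name] for name in self.replicas]
        self.rng = rng or random.Random()

    def route(self, sql, in_transaction=False):
        text = sql.strip().upper()
        # Crude classification: plain SELECTs are reads; locking reads are not.
        is_plain_read = text.startswith("SELECT") and "FOR UPDATE" not in text
        if in_transaction or not is_plain_read or not self.replicas:
            return self.primary
        # Weighted random choice among the read-only replicas.
        return self.rng.choices(self.replicas, weights=self.weights, k=1)[0]


if __name__ == "__main__":
    router = ReadWriteRouter("primary", {"ro1": 3, "ro2": 1}, rng=random.Random(7))
    print(router.route("UPDATE t SET a = 1 WHERE id = 5"))  # primary
    print(router.route("SELECT * FROM t WHERE id = 5", in_transaction=True))  # primary
    print(router.route("SELECT * FROM t WHERE id = 5"))  # ro1 or ro2
```

Routing every statement of a transaction to the primary avoids reading stale replica data in the middle of a transaction, at the cost of some read capacity.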



Joins are pushed down from the DRDS layer and executed in the MySQL layer; cross-database joins should be avoided in the business design.



Distributed SQL specification design: Best practices

Queries should include the sharding condition (the shard key) whenever possible. If a table is split across ten underlying databases and a query carries the shard-key condition, DRDS can route the request precisely to the database that holds the data instead of querying all ten.
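
The routing decision described above can be sketched as follows, assuming ten databases, CRC32-modulo sharding, and a query reduced to a dict of equality (or IN-list) conditions. These representations are hypothetical simplifications of what a query engine such as DRDS actually does.

```python
import zlib

NUM_SHARDS = 10  # the ten underlying databases from the example


def shard_of(value) -> int:
    """Hypothetical sharding rule: CRC32 of the key's string form, modulo NUM_SHARDS."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_SHARDS


def target_shards(conditions: dict, shard_key: str) -> list:
    """Return the shards a query must visit, given its WHERE-clause conditions.

    conditions maps column -> value (equality) or column -> list/tuple/set (IN list).
    """
    if shard_key not in conditions:
        # No shard-key condition: the query must fan out to every database
        # and the partial results must be merged.
        return list(range(NUM_SHARDS))
    value = conditions[shard_key]
    values = value if isinstance(value, (list, tuple, set)) else [value]
    # Each shard-key value pins the query to exactly one database.
    return sorted({shard_of(v) for v in values})


if __name__ == "__main__":
    print(target_shards({"user_id": 42}, "user_id"))  # a single shard
    print(target_shards({"status": "paid"}, "user_id"))  # all ten shards
```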

There are several options for joins. One is to shard both tables by the same key and join on it, which confines the join to one database. Another is to use broadcast tables. When joining on other fields, have each table carry its own shard-key condition so that the join is still confined to the same database.
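
The join options above can be summarized as a check for whether a two-table equi-join can run entirely inside individual databases. This is a hypothetical sketch: TableRule, join_is_shard_local, and CRC32-modulo sharding over ten databases are assumptions, and case 2 additionally assumes both tables use the same sharding rule.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

NUM_SHARDS = 10


def shard_of(value) -> int:
    """Hypothetical sharding rule shared by all sharded tables."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_SHARDS


@dataclass(frozen=True)
class TableRule:
    name: str
    shard_key: Optional[str] = None  # None for broadcast tables
    broadcast: bool = False  # True: a full copy lives in every database


def join_is_shard_local(left: TableRule, right: TableRule, on: tuple, filters: dict) -> bool:
    """Decide whether an equi-join can be pushed down instead of joined across databases."""
    # 1. A broadcast table exists in every database, so each shard can join locally.
    if left.broadcast or right.broadcast:
        return True
    # 2. Both tables are split by the same rule and joined on their shard keys,
    #    so matching rows always live in the same database.
    left_col, right_col = on
    if left_col == left.shard_key and right_col == right.shard_key:
        return True
    # 3. Joined on other columns, but each table is pinned by its own shard-key
    #    condition and both conditions land on the same database.
    left_filters = filters.get(left.name, {})
    right_filters = filters.get(right.name, {})
    if left.shard_key in left_filters and right.shard_key in right_filters:
        return shard_of(left_filters[left.shard_key]) == shard_of(right_filters[right.shard_key])
    return False
```

Any join that fails all three checks would need a cross-database join, which is exactly what the business design should avoid.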



EasyDB: an automated database management platform

Resources: Monitors database and server space usage in real time.

High availability: High-availability architecture designed for the cloud, with automatic failover.

Backup: Regular full and incremental database backup, which can be flexibly configured.

Monitoring: Automatically detects exceptions and raises alerts, with notifications via SMS, email, and WeChat.

Performance: Collects performance trends for more than 50 counters, along with SQL statements, to monitor the database's running status in real time.

Log: Collects database error logs.

Security: Auditing of database accounts and operations, plus server-level security design.

That’s all for my share today. Thank you!