This article comes from my talk “Database Engineering Architecture Practice” at this year’s System Architect Conference. It covers the first 3 slides, which introduce some basic concepts in database architecture design. Voice-over: I spoke for nearly 4 hours at the conference, drawing on ten years of experience.

All concepts are illustrated with a “user center” service. Voice-over: this is a common service that provides user registration, login, and querying and modifying user information.

1. Single-database architecture

The single-database architecture is the most common database architecture in the early stage of a business:

  • User-service: the user center service, which provides a friendly RPC interface for callers

  • User-db: a single database that stores all the data
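To make the two roles concrete, here is a minimal Python sketch of a user-service whose only storage is one database. It assumes an in-memory SQLite database standing in for user-db. The class, table, and method names (UserService, users, register, login, get_age, set_age) are hypothetical, and a real service would expose these operations over RPC.

```python
import hashlib
import sqlite3


class UserService:
    """Hypothetical user-service whose only storage is a single database."""

    def __init__(self) -> None:
        # An in-memory SQLite database stands in for user-db.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE users ("
            " uid INTEGER PRIMARY KEY, uname TEXT UNIQUE, passwd TEXT, age INTEGER)"
        )

    @staticmethod
    def _hash(password: str) -> str:
        # Illustration only: production code should use a salted, slow KDF.
        return hashlib.sha256(password.encode()).hexdigest()

    def register(self, uname: str, password: str, age: int) -> int:
        cur = self.db.execute(
            "INSERT INTO users (uname, passwd, age) VALUES (?, ?, ?)",
            (uname, self._hash(password), age),
        )
        self.db.commit()
        return cur.lastrowid  # SQLite assigns the uid

    def login(self, uname: str, password: str) -> bool:
        row = self.db.execute(
            "SELECT passwd FROM users WHERE uname = ?", (uname,)
        ).fetchone()
        return row is not None and row[0] == self._hash(password)

    def get_age(self, uid: int):
        row = self.db.execute("SELECT age FROM users WHERE uid = ?", (uid,)).fetchone()
        return None if row is None else row[0]

    def set_age(self, uid: int, age: int) -> None:
        self.db.execute("UPDATE users SET age = ? WHERE uid = ?", (age, uid))
        self.db.commit()
```

Every read and write in this sketch hits the same database. That is exactly what the later architectures split apart.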

2. Grouping architecture

The database grouping architecture is the familiar one-master, multiple-slave setup with master/slave synchronization and read/write separation:

  • User-service: the user center service

  • User-db-m (master): the master database, which serves writes

  • User-db-s (slave): the slave databases, which serve reads

A database cluster consisting of master and slave databases is called a group.

Within the same group:

  • Data is synchronized from the master to the slaves via the binlog

  • All instances have an identical schema

  • All instances store exactly the same data; in essence, the data is replicated
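The following toy model sketches how a group behaves. It is a simulation, not real MySQL replication. The master applies each write and appends it to an in-memory list that stands in for the binlog. When sync() is called, each slave replays the binlog from the position it last applied. Reads are spread round-robin across the slaves. All class and instance names are hypothetical.

```python
import itertools


class Instance:
    """One database instance, modeled as an in-memory key/value store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict = {}


class Group:
    """Toy model of one group: one master, n_slaves slaves, read/write separation."""

    def __init__(self, n_slaves: int = 2) -> None:
        self.master = Instance("user-db-m")
        self.slaves = [Instance(f"user-db-s{i + 1}") for i in range(n_slaves)]
        self.binlog: list = []          # ordered (key, value) write events on the master
        self.applied = [0] * n_slaves   # binlog position each slave has replayed up to
        self._next_slave = itertools.cycle(range(n_slaves))

    def write(self, key, value) -> None:
        # All writes go to the master, which records them in its binlog.
        self.master.data[key] = value
        self.binlog.append((key, value))

    def read(self, key):
        # Reads are spread round-robin over the slaves, never the master.
        slave = self.slaves[next(self._next_slave)]
        return slave.data.get(key)

    def sync(self) -> None:
        # Each slave replays the binlog events it has not applied yet.
        for i, slave in enumerate(self.slaves):
            for key, value in self.binlog[self.applied[i]:]:
                slave.data[key] = value
            self.applied[i] = len(self.binlog)
```

In this model, a read issued right after a write but before sync() returns nothing. This mirrors replication lag: MySQL replication is asynchronous by default, so a read routed to a slave immediately after a write can return stale data.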

What problem does the database grouping architecture solve?

Most Internet services are read-heavy and write-light, so reads are often the first database performance bottleneck. If you want to:

  • Improve database read performance linearly

  • Improve database write performance by eliminating read/write lock contention on the master

  • Achieve “read high availability” through redundant slave databases

then the grouping architecture is the right choice. Note, however, that in a grouping architecture the master is still a single point for writes.

In short, grouping is the architecture commonly used to solve the problem of “high database read/write concurrency”.

3. Sharding architecture

When people talk about a database sharding architecture, they usually mean horizontal sharding:

  • User-service: the user center service

  • User-db1: the first of the two horizontal shards

  • User-db2: the second of the two horizontal shards

After sharding, multiple database instances also form a database cluster.

For horizontal sharding, should you split into separate databases or into separate tables?

Separate databases are strongly recommended, because:

  • Tables split inside one database still live on the same instance and disk, so they still compete for disk I/O

  • Separate databases can easily be migrated to different database instances, or even different machines, which scales better

Voice-over: of course, splitting into separate databases increases the number of database connections.

How do you shard horizontally?

The two common methods are “range” and “hash”:

With the range method, data is split horizontally into two database instances according to ranges of the user center’s primary key, uid.
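Here is a minimal sketch of range routing. The actual ranges are not given here, so it assumes hypothetical boundaries of 10,000,000 uids per shard. A binary search over the sorted upper bounds finds the shard whose range contains a uid.

```python
import bisect

# Hypothetical boundaries: user-db1 holds [0, 10M), user-db2 holds [10M, 20M).
RANGE_UPPER = [10_000_000, 20_000_000]  # exclusive upper bound of each shard
SHARDS = ["user-db1", "user-db2"]


def range_shard(uid: int) -> str:
    """Return the shard whose uid range contains uid."""
    if uid < 0:
        raise ValueError(f"invalid uid {uid}")
    i = bisect.bisect_right(RANGE_UPPER, uid)
    if i >= len(SHARDS):
        # No range covers this uid yet: a new shard (range) must be added.
        raise ValueError(f"uid {uid} is beyond the last range")
    return SHARDS[i]
```

A range scheme makes it easy to add capacity by appending a new range. However, new uids tend to increase monotonically, so the newest shard receives most new writes.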

With the hash method, data is likewise split horizontally into two database instances, this time by hashing uid. Voice-over: the hash function in this example is simply “modulo”. Hashing is widely used in Internet database architecture.
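The modulo scheme can be sketched in a few lines. The shard names are hypothetical. The helper moved_fraction is an illustrative addition that measures how many uids change shards when the shard count changes.

```python
from __future__ import annotations

SHARDS = ["user-db1", "user-db2"]


def hash_shard(uid: int, shards: list[str] = SHARDS) -> str:
    """Route a uid to a shard by modulo, as in the two-shard example."""
    return shards[uid % len(shards)]


def moved_fraction(uids: range, old_n: int, new_n: int) -> float:
    """Fraction of uids whose shard index changes when old_n shards become new_n."""
    moved = sum(1 for uid in uids if uid % old_n != uid % new_n)
    return moved / len(uids)
```

Modulo spreads sequential uids evenly across shards. However, changing the shard count remaps most keys: going from 2 to 3 shards moves two thirds of them. This is one reason the number of shards is usually planned ahead.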

In a sharding architecture, the shards in the same cluster:

  • Have no direct relationship with one another; unlike a master and its slaves, they do not synchronize via the binlog

  • Have exactly the same schema

  • Store disjoint data, and the union of the data on all instances forms the global data
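A small sketch can check the three properties above. It assumes the modulo scheme, with each shard held as a plain Python dict keyed by uid as a stand-in for a real shard database. The function names are hypothetical. Partitioning yields disjoint shards, and merging them recovers the global data. A lookup by a column other than uid cannot be routed, so it has to query every shard.

```python
from __future__ import annotations


def partition(users: dict[int, dict], n_shards: int) -> list[dict[int, dict]]:
    """Split a global {uid: row} dict into n_shards disjoint shards by uid % n_shards."""
    shards: list[dict[int, dict]] = [{} for _ in range(n_shards)]
    for uid, row in users.items():
        shards[uid % n_shards][uid] = row
    return shards


def merge(shards: list[dict[int, dict]]) -> dict[int, dict]:
    """Union of all shards; since they are disjoint, nothing is overwritten."""
    merged: dict[int, dict] = {}
    for shard in shards:
        merged.update(shard)
    return merged


def find_by_uname(shards: list[dict[int, dict]], uname: str) -> list[dict]:
    """uname is not the shard key, so the query fans out to every shard."""
    return [row for shard in shards for row in shard.values() if row["uname"] == uname]
```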

What problem does sharding solve?

Most Internet services hold large amounts of data, and the capacity of a single database easily becomes a bottleneck. In this case, sharding can:

  • Improve database write performance linearly. Note that the grouping architecture cannot improve write performance linearly

  • Reduce the amount of data stored in each database

In short, sharding is the architecture commonly used to solve the problem of a “large volume of database data”.

4. Grouping + sharding architecture

If a service has both high read/write concurrency and a large volume of data, a combined grouping + sharding architecture is usually required:

  • Sharding reduces the amount of data in a single database and linearly improves the write performance of the database

  • Grouping improves database read performance linearly and provides high availability for the read databases
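Combining the two means routing in two steps. First, a request is routed to a group by uid. Then, within that group, it goes to the master for a write or to a slave for a read. The sketch below assumes modulo sharding and round-robin slave selection. The router class and the instance names (such as user-db1-m and user-db1-s1) are hypothetical.

```python
import itertools


class ShardedGroupRouter:
    """Routes requests in a grouping + sharding cluster (hypothetical names)."""

    def __init__(self, n_shards: int = 2, n_slaves: int = 2) -> None:
        self.groups = []
        for s in range(n_shards):
            master = f"user-db{s + 1}-m"
            slaves = [f"user-db{s + 1}-s{j + 1}" for j in range(n_slaves)]
            # Each group keeps its own round-robin cursor over its slaves.
            self.groups.append((master, itertools.cycle(slaves)))

    def route(self, uid: int, is_write: bool) -> str:
        # Step 1: sharding picks the group by uid (modulo, as in the hash method).
        master, slave_cycle = self.groups[uid % len(self.groups)]
        # Step 2: grouping sends writes to the master and reads to a slave.
        return master if is_write else next(slave_cycle)
```

Writes for a given uid always land on one master, so write capacity grows with the number of shards. Reads within a shard are spread over its slaves.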

Voice-over: most real production systems are built like this.


5. Vertical sharding

Vertical sharding is another common database architecture design, and it is closely tied to the business.

Taking the user center as an example, the data can be split vertically like this:

User_Base(uid, uname, passwd, sex, age,…)

User_EX(uid, intro, signs,…)

  • Both vertically split tables use uid as the primary key

  • Login name, password, gender, age, and similar attributes go into one vertical table (database)

  • Attributes such as the self-introduction and personal signature go into another vertical table (database)
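Here is a minimal sketch of the split. It uses only the columns listed above for User_Base and User_EX and leaves out the elided ones. The function names are hypothetical. Each full row is split by column, and the original row is recovered by joining the two parts on uid.

```python
from __future__ import annotations

# Only the columns listed in the text; the elided ones are omitted.
BASE_COLUMNS = ("uid", "uname", "passwd", "sex", "age")  # short, frequently accessed
EX_COLUMNS = ("uid", "intro", "signs")                    # long, rarely accessed


def split_vertically(user: dict) -> tuple[dict, dict]:
    """Split one full user row into a User_Base row and a User_EX row."""
    base = {col: user[col] for col in BASE_COLUMNS}
    ex = {col: user[col] for col in EX_COLUMNS}
    return base, ex


def join_on_uid(base_rows: list[dict], ex_rows: list[dict]) -> list[dict]:
    """Rebuild full rows by joining the two vertical parts on uid."""
    ex_by_uid = {row["uid"]: row for row in ex_rows}
    return [{**base, **ex_by_uid.get(base["uid"], {})} for base in base_rows]
```

The only column the two parts share is uid, which is what makes the join possible.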

How do you split vertically?

When splitting data vertically by business, two properties of each attribute are generally considered, its “length” and its “access frequency”:

  • Attributes that are short and frequently accessed are put together

  • Attributes that are long and rarely accessed are put together

This is because the database caches data in a memory buffer, and rows are loaded into it in fixed-size pages. With limited memory, short and frequently accessed rows let the buffer hold more of the data that is actually needed. This raises the hit ratio, reduces disk I/O, and improves database performance.
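A rough back-of-the-envelope sketch shows the size of the effect. It assumes InnoDB’s default 16 KB page and 128 MB buffer pool, plus hypothetical row sizes: 100 bytes for a compact base row and 1,100 bytes for a row that also carries long text columns. It ignores page headers and per-row overhead, so the numbers are only indicative.

```python
PAGE_BYTES = 16 * 1024             # InnoDB default page size (16 KB)
BUFFER_BYTES = 128 * 1024 * 1024   # InnoDB default innodb_buffer_pool_size (128 MB)

SHORT_ROW = 100    # hypothetical User_Base-style row, in bytes
LONG_ROW = 1_100   # hypothetical row that also carries intro/signs text


def rows_cached(row_bytes: int,
                page_bytes: int = PAGE_BYTES,
                buffer_bytes: int = BUFFER_BYTES) -> int:
    """Rough count of rows that fit in the buffer pool, ignoring page and row overhead."""
    rows_per_page = page_bytes // row_bytes   # whole rows per page
    pages = buffer_bytes // page_bytes        # pages the buffer pool can hold
    return rows_per_page * pages
```

Under these assumptions, the buffer holds 1,335,296 short rows but only 114,688 long ones, more than 11 times fewer. The hot, short attributes therefore enjoy a much higher hit ratio once they are split out.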

Vertical sharding is similar to horizontal sharding, but not quite the same:

  • Likewise, there is no direct relationship between instances, i.e., no binlog synchronization

  • Each instance has a different schema

  • The data on the instances must share at least one column, usually the business primary key, and combining the data of all instances forms the global data

What problem does vertical sharding solve?

Vertical sharding reduces the amount of data in a single database and reduces disk I/O, improving throughput. However, it is closely tied to the business, and not every business can be split vertically.

The article is long, so here is a brief summary:

  • Early in the business, use a single database

  • For heavy read load and read high availability, use grouping

  • For large data volumes and linear write scaling, use sharding

  • For vertical sharding, put short, frequently accessed attributes together

I hope you got something out of this. Voice-over: writing this many slides is really tiring. What about the other 91?
