


In this article, we will talk about how to design the database architecture of a highly concurrent system that supports millions of daily active users.


When people see this topic, the first reaction of many is: shard the database and tables! But in fact, I suspect many engineers are not really clear about what database and table sharding is for, or how its different forms address different scenarios.


Using the growth of a startup as the backdrop


Suppose we are a small startup: 200,000 registered users, 10,000 daily active users, 1,000 new rows per day in the core table, and at most 10 requests per second at peak times.


Oh my god! With a system like this, you can hire one senior engineer with a few years of experience, add a few junior engineers, and do more or less whatever you want.


At this stage the priority is simply to build business features quickly: a monolithic application deployed on a single server, connected to a single database, is all you need.


Then everyone just keeps piling all kinds of business code into that one project, as quickly as possible, to support the company's business.


As shown below:


As a result, we were lucky enough to meet a great CEO who took us on the road to success!


The company's business grew rapidly, and within a few months the number of registered users reached 20 million, with 1 million daily active users. The core table now gains 500,000 new rows every day, and peak traffic reaches 10,000 requests per second!


At the same time, the company closed two rounds of financing at a staggering valuation of several hundred million dollars. A young unicorn in the making!


Okay, now you are starting to feel the pressure. Why? Because 500,000 rows added to a single table every day is roughly 15 million more rows a month, and well over 100 million rows in that single table within a year.


After running for a while, the single table now holds 23 million rows, and the system can barely keep up.


You can see that database access performance is getting worse and worse: the single table keeps growing, dragging down the performance of the more complex query SQL.


Then there are the 10,000 peak requests per second. The application tier has 20 machines deployed, each comfortably handling around 500 requests per second, so that part is not a big problem. But what about the database layer?


If a single database server is still shouldering tens of thousands of requests per second, I can tell you with confidence that every peak period will bring the following problems:

  • The disk I/O, network bandwidth, CPU load, and memory consumption of your database server will all spike; the overall load on the server will be very heavy, and it may even be overwhelmed.

  • At peak times, SQL performance is already poor because of the large single table, and it gets even worse under the high load on the database server.

  • The most obvious symptom is that the system runs very slowly during peak hours and the user experience is terrible: clicking a button may take tens of seconds to return a result.

  • If you are unlucky and the database server is not particularly well provisioned, the excessive load may even bring the database down entirely.


Sharding across multiple database servers to support high-concurrency reads and writes


Let's start with the first question: how can the database layer support tens of thousands of concurrent requests per second?


To understand this, you need to know the kind of server a database is typically deployed on. Generally speaking, an ordinary server used for a database will have at least a 16-core, 32 GB configuration.


With this very common machine configuration, the general rule of thumb in production is: do not let a single instance handle more than 2,000 requests per second; keep it around 2,000.


At that level, the database load stays reasonable, there is not too much pressure, and there is no great risk of it going down.


So the first step, for a workload of tens of thousands of concurrent requests, is to deploy five servers, each running one database instance.


Then create the same library on each database instance, such as the order library. At this point, each of the five servers has an order library, named db_order_01, db_order_02, and so on.


Each order library then contains the same tables. For example, if the order library has an order information table, each of the five order libraries has its own copy of that table.


For example, the db_order_01 library has a table tb_order_01, and the db_order_02 library has a table tb_order_02.


This implements the basic idea of database and table sharding: one database server becomes five, one library becomes five libraries, and one table becomes five tables.


Then, when writing data, you need database middleware such as Sharding-JDBC or MyCat to do the routing.


For example, hash the order ID and take it modulo 5. If 500,000 rows are added to the order table every day, roughly 100,000 of them will land in tb_order_01 of db_order_01, another 100,000 in tb_order_02 of db_order_02, and so on.


In this way, the data is spread evenly over the five servers. When querying, you hash the order ID and take the same modulo to find the right server, the right database, and the right table.
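
To make the routing concrete, here is a minimal plain-Java sketch of the idea. The class and method names are mine, and this is not the API of Sharding-JDBC or MyCat, just an illustration of hash-then-modulo routing over the five example databases:

public class OrderDbRouter {
    private static final int DB_COUNT = 5;

    // Hash the order ID and take it modulo 5 to pick one of db_order_01 .. db_order_05
    static String routeDatabase(long orderId) {
        int bucket = (Long.hashCode(orderId) & Integer.MAX_VALUE) % DB_COUNT;
        return String.format("db_order_%02d", bucket + 1);
    }

    public static void main(String[] args) {
        for (long orderId = 1001; orderId <= 1005; orderId++) {
            System.out.println("order " + orderId + " -> " + routeDatabase(orderId));
        }
    }
}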


The diagram for this idea looks like the following, have a look:


What is the advantage of doing this? The first is that the order data, which used to live in a single table, is now spread over five tables, so each table holds only one fifth of the data.


If the order table has 100 million entries a year, then each of the five tables has 20 million entries a year.


So, if the current order table already has 20 million rows, after this split each table holds only 4 million.


And if 500,000 rows are added every day, each table gains only 100,000. Doesn't that already go a long way toward easing the performance problems caused by a huge single table?


In addition, 10,000 requests per second spread over 5 databases means each database handles 2,000 requests per second. Doesn't that immediately bring the concurrency on each database server back into the safe range?


In this way, the peak load of the database is reduced and the peak performance is guaranteed.


Splitting into many tables to keep queries fast at massive data volumes


However, there is still a problem with the architecture above: the single-table data volume is still too large. The order table is now split into 5 tables, so with 100 million orders a year, each table still gets 20 million rows, which is too many.


So we should keep splitting, and split aggressively. For example, the order table can be split into 1,024 tables, so that 100 million rows scatter into only about 100,000 rows per table, with those 1,024 tables spread across the 5 databases.


When writing data, you now route twice: first hash the order ID and take it modulo the number of databases to pick a database, then take it modulo the number of tables in that database to pick a table.


Done this way, the data volume in each table stays very small: with 100 million new rows per year, each table grows by only about 100,000 rows per year, so even after 10 years of operation each table holds only around a million rows.
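
Here is a small sketch of that two-level routing, again in plain Java. The 5-database split matches the example above, while the assumption that each database holds 200 of the split tables is mine, purely for illustration:

public class OrderShardRouter {
    private static final int DB_COUNT = 5;
    private static final int TABLES_PER_DB = 200; // assumption: how many split tables sit in each database

    // Two-level routing: hash the order ID, mod the database count to pick the database,
    // then mod the per-database table count to pick the table inside that database
    static String route(long orderId) {
        int hash = Long.hashCode(orderId) & Integer.MAX_VALUE;
        int dbIndex = hash % DB_COUNT;
        int tableIndex = hash % TABLES_PER_DB;
        return String.format("db_order_%02d.tb_order_%03d", dbIndex + 1, tableIndex + 1);
    }

    public static void main(String[] args) {
        System.out.println(route(123456789L)); // prints the target database and table for this order
    }
}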


This prepares the system well for years of future operation. Take a look at the following figure to get a feel for it:

How are globally unique IDs generated


One of the questions you will have to face after splitting the tables is: how do I generate my IDs? If one table is split into many tables and each table's ID auto-increments from 1, that is bound to cause trouble.


For example, if your order table is split into 1,024 order tables and each table's ID counts up from 1 on its own, there will definitely be collisions!


Your system would no longer be able to look up an order by primary key, because an order with id = 50 could exist in every one of those tables!


Therefore, a distributed architecture needs globally unique IDs. After sharding the database and tables, the core IDs inserted into the database cannot simply come from each table's auto-increment; you must generate a globally unique ID first and then insert it, so that any given ID in any table is globally unique.


For example, the order table is split into 1024 tables, but the order id = 50 will only exist in one table.


So how do you implement globally unique ids? There are several options:


Solution 1: Auto-increment IDs from a dedicated database


In this scheme, every time your system needs an ID, it inserts a meaningless placeholder row into a single table in a dedicated library, reads back the auto-incremented ID the database assigned, and then uses that ID when writing to the actual sharded table.


For example, you might have an auto_id library containing a single table, also called auto_id, with one auto-incrementing id column.


Every time you need a globally unique ID, you insert a row into this table, take the generated ID, and use it as the ID for the row you insert into the sharded order tables.


The advantage of this scheme is that it is simple and convenient; anyone can use it. The disadvantage is that a single library generates the auto-increment IDs, so under high concurrency it becomes a bottleneck: expecting the auto_id library to absorb tens of thousands of requests per second is simply not realistic.
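
A minimal JDBC sketch of this idea, assuming an auto_id library with a single table auto_id(id BIGINT AUTO_INCREMENT PRIMARY KEY, stub CHAR(1)); the connection settings and the table schema here are assumptions for illustration only:

import java.sql.*;

public class AutoIdGenerator {
    // Hypothetical connection settings for the dedicated auto_id library
    private static final String URL = "jdbc:mysql://auto-id-host:3306/auto_id";
    private static final String USER = "app";
    private static final String PASSWORD = "secret";

    // Insert a placeholder row and read back the auto-incremented key as the global ID
    public static long nextId() throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO auto_id (stub) VALUES ('a')",
                     Statement.RETURN_GENERATED_KEYS)) {
            ps.executeUpdate();
            try (ResultSet rs = ps.getGeneratedKeys()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}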


Solution 2: UUID


As everyone probably knows, this means using a UUID as the globally unique ID.


The advantage is that each system generates it locally, with no dependency on a database. The downside is that a UUID is too long, and as a string it performs poorly as a primary key.


If you need to randomly generate a file name, a reference number, and so on, a UUID is fine; but do not use a UUID as a primary key.
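
For completeness, generating one in Java takes a single line:

import java.util.UUID;

public class UuidExample {
    public static void main(String[] args) {
        // Produces a 36-character string such as "550e8400-e29b-41d4-a716-446655440000",
        // handy for file names or reference numbers, but too long and random to be a good primary key
        String id = UUID.randomUUID().toString();
        System.out.println(id);
    }
}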


Solution 3: Obtain the current system time


The idea is to use the current timestamp as the globally unique ID. The problem is that under high concurrency, say thousands of requests per second, duplicates will occur, so the timestamp alone is definitely not enough.


In practice, if you use this scheme, you concatenate the current time with other business fields to form the ID; if that is acceptable for your business, it works.


You can combine other business field values with the current time into a globally unique number, such as an order number: timestamp + user ID + business code.
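
A minimal sketch of such an order number; the exact layout (millisecond timestamp + user ID + a two-digit business code) is just an assumption, and any combination that is unique for your business works:

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class OrderNoGenerator {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Order number = millisecond timestamp + user ID + business code
    static String nextOrderNo(long userId, String bizCode) {
        return TS.format(LocalDateTime.now()) + userId + bizCode;
    }

    public static void main(String[] args) {
        System.out.println(nextOrderNo(10086L, "01"));
    }
}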


Solution 4: The SnowFlake algorithm


SnowFlake is a distributed ID generation algorithm open-sourced by Twitter. The idea is to use a 64-bit long as the globally unique ID.


Of the 64 bits, 1 bit is unused, 41 bits hold the timestamp in milliseconds, 10 bits hold the worker machine ID, and 12 bits hold the sequence number.

To give you an example, consider the following 64 bit long:

  • The first part, 1 bit: 0, which carries no meaning.

  • The second part, 41 bits: the timestamp.

  • The third part, 5 bits: the machine room ID, e.g. 10001.

  • The fourth part, 5 bits: the machine ID, e.g. 11001.

  • The fifth part, 12 bits: the sequence number, i.e. the ordinal of the IDs generated within the same millisecond on one machine in one machine room, e.g. 0000 0000 0000.


① 1 bit: not used. Why not?


Because if the first bit of a binary number is 1, the number is negative, but the generated IDs must be positive, so the first bit is always 0.


② 41 bits: the timestamp in milliseconds.


41 bits can represent up to 2^41 - 1 milliseconds, which works out to about 69 years.


③ 10 bits: the worker machine ID, meaning the service can be deployed on at most 2^10 = 1,024 machines.


In practice, 5 of those bits represent the machine room ID and 5 represent the machine ID, i.e. at most 2^5 = 32 machine rooms, each with at most 2^5 = 32 machines.


④ 12 bits: used to distinguish the different IDs generated within the same millisecond.


12 bits can represent 2^12 = 4,096 different values (0 to 4,095), so they can distinguish up to 4,096 different IDs within the same millisecond.


In simple terms, suppose one of your services needs a globally unique ID: it sends a request to the system running the SnowFlake algorithm, and that system generates the ID.


The SnowFlake service first needs to know its own machine room ID (say 17) and machine ID (say 12).


When it receives the request, it assembles a 64-bit long ID with bit operations. The first of the 64 bits is meaningless.


It is followed by 41 bits holding the current timestamp in milliseconds, then 5 bits for the machine room ID and 5 bits for the machine ID.


Finally, it checks how many IDs have already been generated in the current millisecond on this machine in this machine room, and uses the next sequence number as the last 12 bits.


The result is a 64-bit ID that looks something like this:


This algorithm guarantees that a unique ID is generated on one machine in one machine room within the same millisecond. Many IDs may be generated within one millisecond, but the last 12 bits, the sequence number, keep them distinct.


Let's take a quick look at a code implementation of the SnowFlake algorithm. It is just an example; once you understand it, you can adapt the algorithm yourself.


In a word, each group of bits within the 64-bit number carries a different flag, and together they distinguish every ID.


SnowFlake algorithm implementation code is as follows:

public class IdWorker {
    private long workerId;
    private long datacenterId;
    private long sequence;
    private long twepoch = 1288834974657L;
    private long workerIdBits = 5L;
    private long datacenterIdBits = 5L;
    private long sequenceBits = 12L;
    // Max values that fit in 5 bits (31) and the 12-bit sequence mask (4095)
    private long maxWorkerId = -1L ^ (-1L << workerIdBits);
    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    private long sequenceMask = -1L ^ (-1L << sequenceBits);
    // Bit offsets of each part within the 64-bit ID
    private long workerIdShift = sequenceBits;
    private long datacenterIdShift = sequenceBits + workerIdBits;
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private long lastTimestamp = -1L;

    public IdWorker(long workerId, long datacenterId, long sequence) {
        // Sanity check: worker ID and datacenter ID must be within 0..31
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(
                String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(
                String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }

    public long getWorkerId() { return workerId; }
    public long getDatacenterId() { return datacenterId; }
    public long getTimestamp() { return System.currentTimeMillis(); }

    // Core method: calling nextId() makes the snowflake algorithm on the current machine
    // generate a globally unique ID
    public synchronized long nextId() {
        long timestamp = timeGen();
        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards. Rejecting requests until %d.", lastTimestamp);
            throw new RuntimeException(String.format(
                "Clock moved backwards. Refusing to generate id for %d milliseconds",
                lastTimestamp - timestamp));
        }
        if (lastTimestamp == timestamp) {
            // Another request within the same millisecond: increment the sequence number;
            // the mask guarantees it stays within the 4096 values available per millisecond
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = timestamp; // millisecond timestamp of the last generated ID
        // Core bit operation: shift the timestamp left by 22 bits, the machine room ID by 17,
        // the machine ID by 12, then OR in the 12-bit sequence; the result, read as decimal, is one long
        return ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }

    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return System.currentTimeMillis();
    }

    // ---------------- test ----------------
    public static void main(String[] args) {
        IdWorker worker = new IdWorker(1, 1, 1);
        for (int i = 0; i < 30; i++) {
            System.out.println(worker.nextId());
        }
    }
}



The SnowFlake algorithm could be improved a little bit in real development.

Think about it: when we generate a unique ID, we usually need it for a particular table, for example a unique ID for the order table.


Therefore, in the 64 bits above, the 5 bits representing the machine room could instead carry a business-table code; for example, 00001 could represent the order table.


Because in most cases a company does not have that many machine rooms, those 5 bits may not be very meaningful as a machine room ID.
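
As a rough sketch of that variant (my own illustration, not part of Twitter's original algorithm): keep the 41/5/5/12 layout, but put a table code where the machine room ID used to be.

public class TableCodeSnowflakeSketch {
    public static void main(String[] args) {
        long twepoch   = 1288834974657L;       // same custom epoch as the IdWorker above
        long tableCode = 1L;                   // hypothetical: 00001 stands for the order table
        long workerId  = 12L;                  // machine ID, still 5 bits
        long sequence  = 0L;                   // per-millisecond sequence number, 12 bits
        long timestamp = System.currentTimeMillis();

        // 41-bit timestamp | 5-bit table code (formerly machine room ID) | 5-bit machine ID | 12-bit sequence
        long id = ((timestamp - twepoch) << 22)
                | (tableCode << 17)
                | (workerId << 12)
                | sequence;
        System.out.println(id);
    }
}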


And that is how it works: each machine running the SnowFlake algorithm generates unique IDs for a business table, and if many IDs are produced within the same millisecond, the last 12 bits keep them apart.


Read/write separation supports on-demand capacity expansion and performance improvements


At this point the overall architecture is already quite good. Splitting into a large number of tables guarantees that each table's data volume stays modest for the next 10 years, which keeps single-table SQL execution efficient.


And splitting across multiple databases ensures that each database server carries only part of the read and write traffic, keeping the load on each server down.


But there is still a problem. Suppose each database server is carrying 2,000 requests per second, of which 400 are writes and 1,600 are reads.


In other words, only 20% of the SQL statements are inserts, updates, or deletes, while 80% are queries. Now suppose that, as the user base keeps growing, traffic rises to 4,000 requests per server.


That would be 800 writes and 3,200 reads per second. If you were to scale exactly as before, you would have to add another database server.


However, that may involve data migration, because some of the tables would have to move to the new database server.


This is not necessary. Databases generally support read/write separation, that is, a master/slave architecture.


Writes go to the master database server and queries go to the slave database server, so the read and write requests for a table are executed on different databases.


So, let’s say write requests to the main library are 400 per second and query requests to the slave library are 1600 per second.


The graph looks something like this:


When data is written to the master library, it is automatically replicated to the slave library, keeping the master and slave data consistent.


Queries then go to the slave library, and this master/slave setup is what achieves read/write separation.
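
At the application layer, the routing can be as simple as the following sketch: writes take the master DataSource, reads rotate over the slaves. In practice middleware such as Sharding-JDBC or MyCat can do this for you; this hand-rolled router is only to illustrate the idea.

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ReadWriteRouter {
    private final DataSource master;
    private final List<DataSource> slaves;
    private final AtomicInteger counter = new AtomicInteger();

    public ReadWriteRouter(DataSource master, List<DataSource> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    // All inserts, updates, and deletes go to the master
    public Connection writeConnection() throws SQLException {
        return master.getConnection();
    }

    // Queries are spread round-robin across the slave libraries
    public Connection readConnection() throws SQLException {
        int index = Math.floorMod(counter.getAndIncrement(), slaves.size());
        return slaves.get(index).getConnection();
    }
}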


The advantage now is this: if write traffic to the master grows to 800 per second, that is fine and no expansion is needed; but if read traffic to the slave grows to 3,200 per second, more capacity is required.


In that case you simply attach another slave library to the master. With two slave libraries, each handles 1,600 read requests per second, and there is no need to expand the master just because reads have grown.


In fact, in online production, you will find that the growth rate of read requests is much higher than that of write requests, so after read/write separation, most of the time it is just to expand from the library to support higher read requests.


In addition, if you write to a table (which involves locking) while also querying it, lock contention and similar problems can hurt both write and read performance.


Once reads and writes are separated, the tables in the master library only take writes, unaffected by queries, and the tables in the slave library only take reads.


Summary of database architecture design under high concurrency


Looking at it from a highly simplified perspective: in high-concurrency scenarios, the database layer of the architecture must be carefully designed.


In particular, it involves sharding across databases to support high-concurrency requests, splitting into many tables to keep each table's data volume manageable, and read/write separation so that slave libraries can be scaled on demand and performance is guaranteed.


This article has laid out the ideas from a high level; you can combine them with your own company's business and projects to work out how your own system should shard its databases and tables.


In addition, when actually implementing the sharding and read/write separation, you will need database middleware. You can refer to the official documentation of Sharding-JDBC or MyCat, which describes their usage in detail.




