Hello, I’m Leo. I currently work on Java backend development in Changzhou. This article is part of my MySQL study-notes series, written around interview topics from ByteDance, NetEase, Alibaba, Tencent, Meituan, and Kuaishou. I hope it helps more friends get into a big tech company soon!

After reading this article, don’t say you have learned MySQL.



17. Separate databases and tables

When the data volume reaches a certain scale, code-level optimization can no longer meet real performance requirements. The next step is to consider splitting databases and tables.

I’ve seen many people reach for sharding no matter what the problem is, which is the wrong instinct. WeChat public accounts are full of articles with amusing titles along the lines of “The boss asked me to optimize the database; I went straight to splitting databases and tables, and he came over and gave me a kick.”

The following introduces the two kinds of splits and when to apply them.

17.1 Horizontal split

Horizontal splitting mainly addresses the problem of sheer data volume. Once a table exceeds roughly 5 million rows, it is time to consider splitting it. Horizontal splitting is similar to how a hospital separates male and female departments.

Suppose 100 people show up to register: 50 men and 50 women. The 50 men must register at the men’s department, and the 50 women at the women’s department. The example may not be perfect, but the general idea is the same.

If the table already contains data, you can shard by taking the ID modulo 2: even IDs go to table A, odd IDs go to table B. This is a classic example.
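The even/odd routing above can be sketched as follows. This is a minimal illustration, not Mycat’s implementation; the function and table names (`route_table`, `user_0`, `user_1`) are hypothetical, and real setups usually take the ID modulo the shard count:

```python
def route_table(row_id: int, shard_count: int = 2) -> str:
    """Pick a shard table for a row by taking the ID modulo the shard count.

    With shard_count=2 this is exactly the even/odd split described above:
    even IDs land in shard 0 (table A), odd IDs in shard 1 (table B).
    """
    shard = row_id % shard_count
    return f"user_{shard}"  # hypothetical table naming scheme

# Even ID -> shard 0, odd ID -> shard 1
print(route_table(42))   # user_0
print(route_table(301))  # user_1
```

Note that once data exists, changing `shard_count` re-routes almost every ID, which is why the sharding rule must be chosen carefully up front.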

Advantages of horizontal splitting:

  • Table joins can almost all be done on the database side, and there is no bottleneck for very large, high-load tables;
  • The overall application architecture changes relatively little, and transaction processing is relatively simple;
  • As long as the sharding rules are well defined, it is generally hard to run into scalability limits;

Disadvantages of horizontal splitting:

  • The sharding rules are relatively complex, and it is hard to abstract a single rule that fits the whole database;
  • Later data maintenance becomes harder, and locating data manually is more difficult;
  • High coupling between modules in the application may make later data migration and splitting difficult.

17.2 Vertical Split

Vertical splitting divides tables into different databases by business domain, such as a membership database, an order database, a payment database, a message database, and so on. It is commonly used in large e-commerce projects.

Advantages: clear business boundaries, clear rules, and easy integration or expansion between systems.

Disadvantages: some business tables can no longer be joined, cross-database queries become cumbersome (services must communicate through interfaces such as HTTP + JSON), and distributed transactions may arise, which increases system complexity. For example, the order service cannot connect directly to the member service’s database.

17.3 Splitting solution

I use the Mycat middleware for splitting. Mycat supports ten sharding strategies, including:

  • 1. Modulo algorithm
  • 2. Sharding by enumeration
  • 3. Range-based sharding
  • 4. Sharding by date
  • 5. Fixed-shard hash algorithm
  • 6. Modulo sharding
  • 7. ASCII-code modulo matching
  • 8. Programmatic specification
  • 9. String-split hash parsing

I won’t go through each of them in detail here; I’ll pick one and write a dedicated, detailed article on database and table splitting later. That’s about it.

18. MySQL dirty page flushing

When a data page in memory is identical to the corresponding page on disk, it is called a clean page.

When a data page in memory differs from the corresponding page on disk, it is called a dirty page.

MySQL queries may stall because of the dirty-page flushing mechanism. Why? Consider a scenario. My family ran a supermarket, and the dirty-page mechanism is like our account book. When someone bought on credit during business hours, we would jot the entry down on a temporary notepad, then transfer the entries into the supermarket’s main ledger at the end of the night.

If the temporary notepad fills up during the day, we must stop what we are doing and transfer everything from the notepad into the main ledger before we can continue; during that transfer, the shop is essentially stalled.

How dirty pages are flushed

You need to tell MySQL how much IO the current machine can handle (via the innodb_io_capacity parameter), so that InnoDB can flush dirty pages at full speed, maximizing performance. The value should be neither too high nor too low: if it is too high, flushing drags down query performance; if it is too low, dirty-page flushing cannot keep up with incoming writes, and system performance ultimately suffers.

How the dirty-page flushing rate is controlled

The parameters involved are innodb_io_capacity and innodb_max_dirty_pages_pct.

The second parameter controls the maximum dirty-page percentage. The default value is 75, i.e. 75%.

Let the current dirty-page ratio be M; from it, InnoDB computes a number F1(M) in the range 0 to 100. Every write to the redo log gets a sequence number; let N be the gap between the current sequence number and the one at the last checkpoint. From N, InnoDB computes another number F2(N), also between 0 and 100, via a rather complicated algorithm; all we need to know is that the larger N is, the larger F2(N) is.

InnoDB then takes R, the larger of F1(M) and F2(N), and flushes dirty pages at a rate of innodb_io_capacity × R%.

The ratio of dirty pages is calculated by Innodb_buffer_pool_pages_dirty/Innodb_buffer_pool_pages_total
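The rate calculation above can be sketched as follows. The max() combination and the innodb_io_capacity × R% formula are as described; the bodies of `f1` and `f2` are simplified stand-ins I made up for illustration, since InnoDB’s real functions are internal and more complex:

```python
def f1(dirty_pct: float, max_dirty_pct: float = 75.0) -> float:
    """Stand-in for InnoDB's F1: maps the dirty-page ratio M onto 0..100,
    reaching 100 when M hits innodb_max_dirty_pages_pct (default 75)."""
    return min(100.0, dirty_pct / max_dirty_pct * 100.0)

def f2(lsn_gap: float) -> float:
    """Stand-in for F2: per the text it yields a number between 0 and 100
    that grows with N, the log-sequence-number gap since the checkpoint."""
    return min(100.0, lsn_gap)

def flush_rate(io_capacity: int, dirty_pct: float, lsn_gap: float) -> float:
    """Flush speed = innodb_io_capacity * R%, where R = max(F1(M), F2(N))."""
    r = max(f1(dirty_pct), f2(lsn_gap))
    return io_capacity * r / 100.0

# 30% dirty pages (F1 = 40) but a log gap scoring 60 (F2 = 60):
# R = max(40, 60) = 60, so flushing runs at 60% of innodb_io_capacity.
print(flush_rate(io_capacity=200, dirty_pct=30, lsn_gap=60))  # 120.0
```

The practical takeaway is that even with few dirty pages, a large redo-log gap alone can push the flush rate up, which is one way queries suddenly feel slow.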

One more parameter

Innodb_flush_neighbors controls a linkage mechanism: a value of 1 enables it, and a value of 0 disables it.

What is this linkage mechanism? If dirty page AA is being flushed and the data page next to AA is also dirty, the neighboring page gets flushed along with it.

The exact calculation is quite involved; it is fine to skim it, as long as you can describe the general process. More detailed technical articles are on my WeChat public account.

19. Deleting data does not shrink the tablespace

This question has probably been asked to death. It should be easy to understand if you have learned C.

MySQL deletes data by marking it as deleted. The corresponding records are not physically removed, so it looks as if the data is gone, but the space on the data page is still occupied.

This leads to the problem of space reuse.

Suppose records with IDs between 300 and 700 are deleted. If a newly inserted row has an ID in that range, the freed space is reused; if the ID falls outside the range, it is not. Only when an entire page of records is deleted at once can the page be 100% reused by later inserts, which is relatively rare.

If the space is not 100% reused, holes are left behind! To repeat: among records 1 to 5, if the slots of 1, 2, and 5 are reused but those of 3 and 4 are not, then 3 and 4 are holes.

Inserts can also cause holes. The main effect of holes is that the data is not compact, which slows down queries.
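The delete-marking and in-range reuse described above can be illustrated with a toy model of one data page. This is a deliberately simplified sketch I wrote for this article, not InnoDB’s actual page format; the `Page` class and its reuse rule are hypothetical:

```python
# Toy model of delete-marking and in-range space reuse on one data page.
# Each slot is [key, is_live]; deleting only flips the flag, freeing no space.

class Page:
    def __init__(self, keys):
        self.slots = [[k, True] for k in keys]

    def delete(self, key):
        for slot in self.slots:
            if slot[0] == key:
                slot[1] = False  # mark deleted; the slot still occupies space

    def insert(self, key):
        # A dead slot is reused only if the new key falls into the gap it
        # left between its neighbors, since keys stay ordered on the page.
        for i, slot in enumerate(self.slots):
            if not slot[1]:
                lo = self.slots[i - 1][0] if i > 0 else float("-inf")
                hi = self.slots[i + 1][0] if i + 1 < len(self.slots) else float("inf")
                if lo < key < hi:
                    self.slots[i] = [key, True]  # space reused
                    return "reused"
        self.slots.append([key, True])           # out of range: hole remains
        return "new slot"

page = Page([300, 500, 700])
page.delete(500)
print(page.insert(600))  # falls between 300 and 700 -> reused
print(page.insert(900))  # outside the range -> new slot, hole stays
```

The second insert shows the point of this section: the dead slot is still there taking up space, but the out-of-range key cannot use it, so the page grows anyway.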

The solution

  1. Rebuild the table
  2. Rebuild the table’s indexes

20. Querying 200GB of data with 100GB of memory: will it OOM?

The answer is definitely no, it will not OOM.

First, let’s walk through what happens when we query 200GB of data:

  • Fetch a row and write it into net_buffer. The size of this memory area is defined by the parameter net_buffer_length, which defaults to 16K.
  • Rows are fetched repeatedly until net_buffer is full, then the network interface is called to send it out.
  • If the send succeeds, net_buffer is cleared, and the next rows are fetched and written into it.
  • If the send function returns EAGAIN or WSAEWOULDBLOCK, the local socket send buffer is full, so MySQL waits and continues sending once the network stack is writable again.
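The sending loop above can be sketched like this. It is a simulation, not MySQL’s source: the 16-byte buffer stands in for the real 16K net_buffer, and `send_rows` is a name I made up; the point is that peak memory is bounded by the buffer size, not the result-set size:

```python
def send_rows(rows, net_buffer_length=16):
    """Sketch of MySQL's send loop: fill net_buffer row by row, hand it to
    the network when the next row won't fit, then clear it and keep going.
    Returns the flushed chunks; no chunk exceeds net_buffer_length."""
    chunks, net_buffer = [], b""
    for row in rows:
        if len(net_buffer) + len(row) > net_buffer_length:
            chunks.append(net_buffer)  # buffer full: "send" it
            net_buffer = b""           # clear, then continue reading rows
        net_buffer += row
    if net_buffer:
        chunks.append(net_buffer)      # flush the final partial buffer
    return chunks

rows = [b"row%03d;" % i for i in range(6)]  # six 7-byte "rows", 42 bytes total
chunks = send_rows(rows, net_buffer_length=16)
print([len(c) for c in chunks])  # [14, 14, 14]
```

However large `rows` gets, the loop never holds more than one buffer’s worth of result data at a time, which is exactly why a 200GB query does not need 200GB of memory.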

From this process we can see that during sending, the maximum memory a query occupies inside MySQL is net_buffer_length, not 200GB. The socket send buffer cannot reach 200GB either (its default size is set in /proc/sys/net/core/wmem_default). If the socket send buffer fills up, the process of reading data is paused.

In other words, MySQL sends query results as it reads them: read a bit, send a bit!