Original source: Liang Guizhao’s blog

Blog address: blog.720ui.com

Feel free to follow the public account “server-side thinking”: a group of like-minded people who grow and progress together and break through the limits of their understanding.

Today, we discuss an interesting topic: how many rows can a single MySQL table hold before you need to consider splitting databases and tables? Some say 20 million rows, others say 5 million. So, what do you think the right number is?

It is widely claimed in Chinese Internet technology circles that MySQL performance drops significantly once a single table grows beyond 20 million rows. This claim is said to have originated at Baidu: reportedly, Baidu's DBAs tested MySQL and found that SQL performance dropped sharply once a single table reached roughly 20 million rows. Baidu engineers who later moved to other companies are said to have brought this figure with them, and so it spread across the industry.

Later, Alibaba's Java Development Manual recommended splitting databases and tables only when a single table exceeds 5 million rows or 2GB in size. Because Alibaba's guidelines are treated almost as an iron law, many people who design large-scale data storage adopt this threshold as their standard for splitting tables.

So, what do you think the right number is? Why 5 million rows, rather than 3 million or 8 million? You might say it is simply the best trade-off Alibaba settled on. But then again, how was this number arrived at? Hold on a second and think about it for a minute.

In fact, the threshold is not determined by the row count itself; it depends mainly on the MySQL configuration and the machine's hardware. To improve performance, InnoDB caches table indexes and data pages in memory, in its buffer pool. If the buffer pool (sized by innodb_buffer_pool_size) is large enough to hold the whole index, queries perform well. Once a single table grows past a certain size, however, its index no longer fits in memory, so subsequent SQL queries incur disk I/O and performance degrades. How large a table can grow before this happens also depends on its structure, such as row width and index design, because that determines how much memory the index occupies. In this situation, upgrading the hardware, for example by adding memory, may bring an immediate performance boost.
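
To make the memory argument concrete, here is a rough Python sketch that estimates how tall an InnoDB clustered index (the B+ tree that stores the rows) becomes and how many bytes it occupies for a given row count. Everything beyond the 16 KB default innodb_page_size is an assumption for illustration: the function name estimate_btree is hypothetical, pages are treated as completely full, page and record headers and secondary indexes are ignored, and each internal entry is modeled as an 8-byte BIGINT key plus a 6-byte child pointer, a commonly quoted simplification rather than InnoDB's exact record format.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size (innodb_page_size = 16 KB)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def estimate_btree(rows: int, row_bytes: int = 1024, key_bytes: int = 8,
                   pointer_bytes: int = 6, page_size: int = PAGE_SIZE):
    """Return (height, total_bytes) of an idealized clustered B+ tree.

    Assumption: pages are completely full and page/record headers are
    ignored, so a real InnoDB index is larger than this estimate.
    """
    rows_per_leaf = max(1, page_size // row_bytes)             # rows per leaf page
    fanout = max(2, page_size // (key_bytes + pointer_bytes))  # children per internal page
    level_pages = max(1, ceil_div(rows, rows_per_leaf))        # pages at the leaf level
    total_pages = level_pages
    height = 1
    # Add internal levels until a single root page remains.
    while level_pages > 1:
        level_pages = ceil_div(level_pages, fanout)
        total_pages += level_pages
        height += 1
    return height, total_pages * page_size


if __name__ == "__main__":
    buffer_pool_bytes = 8 * 1024**3  # hypothetical 8 GiB innodb_buffer_pool_size
    for row_bytes in (1024, 200):
        height, size = estimate_btree(20_000_000, row_bytes=row_bytes)
        fits = size <= buffer_pool_bytes
        print(f"row={row_bytes}B height={height} "
              f"size={size / 1024**3:.1f} GiB fits={fits}")
```

Under these assumptions, 20 million rows of 1 KB each give a 3-level tree of roughly 19 GiB, which would not fit in the hypothetical 8 GiB buffer pool, while 200-byte rows need only about a fifth of that. Comparing such an estimate with innodb_buffer_pool_size on your own server is a quick sanity check, keeping in mind that the real index is larger than this lower bound.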

Therefore, my view is that the decision to split databases and tables should be based on actual needs, without over-design. Do not adopt a split design at the start of a project. Instead, as the business grows and the system can no longer be optimized in other ways, consider splitting databases and tables to improve performance. Alibaba's Java Development Manual adds on this point: if the data volume is not expected to reach this level within three years, do not split databases and tables when creating them. So, back to the original question: what is the right number? My suggestion is to make a comprehensive evaluation based on your machine. If you have no benchmark in mind, 5 million rows can serve as a temporary, unified standard; it is a compromise value.
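
The rule of thumb above can be written down as a small check. This is a hedged sketch: should_consider_splitting and its parameters are hypothetical names, the inputs are the data volumes you project for three years from now, and reading "2GB" as 2 GiB is an assumption. The row limit defaults to the 5-million compromise value, but it is meant to be replaced by a figure you have evaluated for your own machine.

```python
DEFAULT_ROW_LIMIT = 5_000_000      # compromise value from the Alibaba Java Development Manual
DEFAULT_SIZE_LIMIT = 2 * 1024**3   # "2GB", interpreted here as 2 GiB (assumption)


def should_consider_splitting(projected_rows: int, projected_bytes: int,
                              row_limit: int = DEFAULT_ROW_LIMIT,
                              size_limit: int = DEFAULT_SIZE_LIMIT) -> bool:
    """Return True if the table's three-year projection exceeds either limit.

    Exceeding a limit only means splitting is worth evaluating; other
    optimizations should be exhausted first.
    """
    return projected_rows > row_limit or projected_bytes > size_limit


if __name__ == "__main__":
    # Hypothetical projection: 3 million rows and 1.5 GiB in three years.
    print(should_consider_splitting(3_000_000, int(1.5 * 1024**3)))
    # 6 million rows, with a limit raised after benchmarking a larger machine.
    print(should_consider_splitting(6_000_000, 1024**3, row_limit=20_000_000))
```

Raising row_limit after measuring on a machine with more memory reflects the point that the right number depends on the hardware rather than on a universal constant.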
