Moment For Technology

How to optimize tables with tens of millions of data

Posted on Dec. 2, 2022, 5:48 p.m. by 樊龍
Category: The back-end Tag: The back-end

Explain here first, a lot of people on the Internet said that ali provisions of 500w data will be divided into sub-database sub-table. In fact, the 500W is not defined, but depends on the configuration of MySQL and the hardware of the machine. MySQL loads table indexes into memory to improve performance. However, when the amount of table data reaches a certain level, the memory cannot store the indexes. If the indexes cannot be stored, disk I/O can only be performed, resulting in performance degradation.

Practical tuning

So here I have a table with 1000W of data, and so far there's only one primary key index

CREATE TABLE `user` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `uname` varchar(20) DEFAULT NULL COMMENT 'account'.`pwd` varchar(20) DEFAULT NULL COMMENT 'password'.`addr` varchar(80) DEFAULT NULL COMMENT 'address'.`tel` varchar(20) DEFAULT NULL COMMENT 'phone'.`regtime` char(30) DEFAULT NULL COMMENT 'Registration Time'.`age` int(11) DEFAULT NULL COMMENT 'age',
  PRIMARY KEY (`id`))ENGINE=InnoDB AUTO_INCREMENT=10000003 DEFAULT CHARSET=utf8;
Copy the code

Query all in about 16s. It's pretty slow. Usually we have a background system, such as this is an e-commerce platform, this is the user table. Background management system, generally query the user information, do some operations, such as background directly add users ah, or delete users ah these operations.

So there are two requirements, one for querying count and one for paging queries

Let's test the time for count and the time for paging queries, respectively

select * from user limit 1.10// When hardly usedselect * from user limit 1000000.10  //0.35s
select * from user limit 5000000.10  //1.7s
select * from user limit 9000000.10  //2.8s
select count(1) from user  //1.7s
Copy the code

As you can see from the query time above, if the query is a paging query, the later the data is queried, and the query count takes 1.7s. This is clearly not in line with our requirements. So, here we need to optimize. First let's try index optimization here

First look at the execution plan with only the primary key index:

alter table `user` add INDEX `sindex` (`uname`.`pwd`.`addr`.`tel`.`regtime`.`age`)
Copy the code

All -index; sindex = all-index;

The purpose of creating a federated index is to make conditional queries faster than full table queries

select * from user where uname='6.445329111484186' //3.5S (no joint index)select * from user where uname='6.445329111484186' //0.003S (with joint index)Copy the code

So that's the difference between a federated index and no index

This basically proves that full table queries, with or without indexes, are simply slow

Since the index result is no longer working, we have to find another solution. Now, as I said in my mysql interview, we can store count in a separate table

CREATE TABLE `attribute` (
  `id` int(11) NOT NULL.`formname` varchar(50) COLLATE utf8_bin NOT NULL COMMENT 'the name of the table'.`formcount` int(11) NOT NULL COMMENT 'Total table data',
  PRIMARY KEY (`id`))ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Copy the code

In this case, the table does not look up all tables, but only one table, so when you build a table, you can build a hash

select formcount from attribute where formname='user'// When hardly usedCopy the code

So count is optimized. If there is a selection criterion above, you can create an index and filter through the index so that you don't have to read the count.

So, count is fine, but how does paging query optimization optimize? You can use subqueries here to optimize

select * from user where
id = (select id from user limit 9000000.1) limit 10 //1.7s
Copy the code

Actually the subquery this orthography, judge id, is to query by covering index. Efficiency will be greatly increased. However, my test is 1.7s. In the past, when the company optimizes this aspect, the query time is lower than this. You can also generate the data and test yourself

But if that's a lot of data, I would suggest that we go to ES or we do some default selection, and count can be listed separately

At this point, the optimization of a multi-million-level data paging query is complete.

Search
About
mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.