This series keeps summarizing the Java technology stack: Java, MySQL, assorted middleware, and more. Follow the public account “Asmaru Notes” to get updates first.

1. Background

A question that comes up again and again: count(*) is slow, so how can it be optimized?

The harsh answer is that if you are already using the right indexes, there is basically nothing left to optimize. If the query is still slow, it is simply slow; the only ways out are to maintain the count yourself or to hand the job to another search platform.

Today, let’s take a look at why this is the case and answer some of the most common puzzles you encounter on a daily basis.

2. Implementation of count(*)

MyISAM keeps the total number of rows of a table on disk, so when it executes count(*) it returns that number directly, which is very efficient.
How does InnoDB implement the count operation?
InnoDB has more work to do. When it executes count(*), it reads the rows from the engine one by one and accumulates the count.
So as the table holds more and more rows, count(*) gets slower and slower.
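
The difference between the two approaches can be sketched with a toy model. This is only an illustration of the cost difference described above, not how either engine is actually implemented; the class names `StoredCountTable` and `ScanCountTable` are made up for this example.

```python
class StoredCountTable:
    """Toy model: the row count is maintained on every write (the MyISAM approach)."""

    def __init__(self):
        self.rows = []
        self.row_count = 0

    def insert(self, row):
        self.rows.append(row)
        self.row_count += 1  # keep the stored total up to date

    def count_all(self):
        return self.row_count  # answered without reading any row


class ScanCountTable:
    """Toy model: the count is accumulated by reading rows one by one (the InnoDB approach)."""

    def __init__(self):
        self.rows = []
        self.rows_read = 0  # how many rows count_all() has had to touch

    def insert(self, row):
        self.rows.append(row)

    def count_all(self):
        total = 0
        for _ in self.rows:
            self.rows_read += 1
            total += 1
        return total


stored, scanned = StoredCountTable(), ScanCountTable()
for i in range(1000):
    stored.insert(i)
    scanned.insert(i)

print(stored.count_all(), scanned.count_all(), scanned.rows_read)
```

Both tables report the same total, but the scanning version has to touch every row to get it, so its cost grows with the table.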

Of course, this only applies to count(*) without a WHERE condition. With a WHERE condition, MyISAM is not fast either.

3. The right way to use it

First of all, using count(*) for statistics on MySQL is not recommended, especially when the tables are very large.
If the business is small and needs to ship quickly, at least make sure that count(*) carries a sensible WHERE condition and that the table has sensible indexes.
1) If count(*) carries a WHERE condition and the query can be served by a covering index, it is still acceptable for occasional use.
2) If count(*) carries a WHERE condition that can use an index but not a covering one, use it with caution.
3) If it is a bare count(*), or the WHERE condition cannot use any index, it is not recommended.
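
To see whether a given count(*) falls into case 1) or case 3), look at the query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, because it runs without a server; the `orders` table, its columns, and the index name are hypothetical. SQLite's planner is not MySQL's, so treat this only as an illustration of the idea. On MySQL the equivalent check is to run EXPLAIN and look for "Using index" in the Extra column.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL, amount INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")
conn.executemany(
    "INSERT INTO orders (status, amount) VALUES (?, ?)",
    [("paid" if i % 3 else "new", i) for i in range(300)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Case 1: the WHERE column is indexed and nothing else is needed, so the index covers the query.
covered_plan = plan("SELECT count(*) FROM orders WHERE status = 'paid'")
# Case 3: the WHERE column has no index, so every row of the table must be examined.
unindexed_plan = plan("SELECT count(*) FROM orders WHERE amount > 100")

print(covered_plan)
print(unindexed_plan)
```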
For statistics-type workloads, a few recommended practices:
1) If the table has an auto-increment ID, the total can be approximated by the maximum ID
2) Maintain the count yourself
3) Do the aggregation on a separate data analysis platform
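
A minimal sketch of practices 1) and 2), again using sqlite3 as a stand-in for MySQL. The `users` table, the `row_counts` counter table, and the helper function names are all hypothetical. The key assumption is that the counter is updated in the same transaction as the data change, so the two cannot drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection (assumption)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)")
conn.execute("CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, total INTEGER NOT NULL)")
conn.execute("INSERT INTO row_counts VALUES ('users', 0)")
conn.commit()


def add_user(name):
    with conn:  # one transaction: both statements commit, or both roll back
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        conn.execute("UPDATE row_counts SET total = total + 1 WHERE table_name = 'users'")


def remove_user(user_id):
    with conn:
        cur = conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
        # Subtract only the rows that were actually deleted.
        conn.execute(
            "UPDATE row_counts SET total = total - ? WHERE table_name = 'users'",
            (cur.rowcount,),
        )


def counted_total():
    """Practice 2: read the self-maintained counter instead of scanning the table."""
    return conn.execute(
        "SELECT total FROM row_counts WHERE table_name = 'users'"
    ).fetchone()[0]


def max_id_estimate():
    """Practice 1: approximate the total by the largest auto-increment ID."""
    return conn.execute("SELECT COALESCE(MAX(id), 0) FROM users").fetchone()[0]


for name in ["a", "b", "c", "d", "e"]:
    add_user(name)
remove_user(2)
remove_user(4)

print(counted_total(), max_id_estimate())  # 3 5
```

The maximum ID overestimates as soon as rows have been deleted, which is why it is only an approximation, while the self-maintained counter stays exact at the cost of an extra write on every insert and delete.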

Can table statistics replace count(*)?

In daily use, some readers have asked whether the statistics in the system tables can be used in place of count(*).

The answer is no. The row count reported there is only a reference value.

These table statistics are what show table status returns. How is the value obtained? To answer that, we need to understand how MySQL collects statistics by sampling.
Why sample at all? Reading the whole table and counting row by row would give an exact result, but it is too expensive, so MySQL falls back to sampling.
By default, InnoDB selects N data pages, counts the distinct values on each of them, takes the average, and multiplies it by the number of pages in the index to get the cardinality of the index.
Tables are updated continuously, so index statistics do not stay fixed. When the number of changed rows exceeds 1/M of the table, the index statistics are automatically recalculated.
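
The sampling arithmetic above can be sketched as follows. This is a simplified model under stated assumptions, not InnoDB's actual code: an index is represented as a list of pages, each page a list of index values, and the sample size of 8 pages and the helper names are chosen only for illustration.

```python
import random


def estimate_cardinality(pages, sample_pages, rng):
    """Average the distinct-value count of the sampled pages, then scale by the page count."""
    sampled = rng.sample(pages, min(sample_pages, len(pages)))
    avg_distinct = sum(len(set(page)) for page in sampled) / len(sampled)
    return avg_distinct * len(pages)


def to_pages(sorted_values, page_size):
    """Split a sorted list of index values into fixed-size pages."""
    return [sorted_values[i:i + page_size] for i in range(0, len(sorted_values), page_size)]


# Evenly distributed index: 10,000 distinct values, each stored 4 times, 100 entries per page.
uniform = to_pages([v for v in range(10_000) for _ in range(4)], 100)
uniform_estimate = estimate_cardinality(uniform, 8, random.Random(1))

# Skewed index: half of the entries share one value, the other half are all distinct.
skewed = to_pages([-1] * 20_000 + list(range(20_000)), 100)
skewed_estimates = [estimate_cardinality(skewed, 8, random.Random(seed)) for seed in range(5)]

print(uniform_estimate)   # every page holds 25 distinct values, so 25 * 400 pages = 10000.0
print(skewed_estimates)   # true cardinality is 20001; the estimate depends on which pages were drawn
```

When every page looks alike the estimate is exact, but on skewed data the result swings with the sample, which is the source of the inaccuracy.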

So this sampled estimate can be very inaccurate. How inaccurate? According to the official documentation, it can be off by 40 to 50 percent.

4. About those odd count(?) variants

When reading queries in old code, we often see count(1), count(id), count(field), and so on. Is any of them better than the others, and do they differ in performance?

To answer that, we need to understand the semantics of count().
count() is an aggregate function. It evaluates the returned result set row by row: if the argument of count is not NULL for a row, the running total is incremented; otherwise it is not. At the end it returns the accumulated total.
1) count(primary key id)
The InnoDB engine traverses the whole table, fetches the id value of each row, and returns it to the server layer. The server layer receives the id, determines that it cannot be NULL, and accumulates the count row by row.
2) count(1)
The InnoDB engine traverses the whole table but does not fetch any values. For each row returned, the server layer puts in the number “1”, determines that it cannot be NULL, and accumulates the count.
3) count(field)
If the field is defined as NOT NULL, the column is read from each record row by row.
If the field is allowed to be NULL, the value must additionally be extracted and checked, and only non-NULL values are counted.
4) count(*)
This does not pull out all the fields. It is specifically optimized so that no values are fetched at all: count(*) can never be NULL, so the rows are simply accumulated.
In order of efficiency: count(field) < count(primary key id) < count(1) ≈ count(*).
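
The NULL-handling semantics of count() described above can be checked directly. The sketch below uses sqlite3 as a stand-in for MySQL (the rule that count(expr) skips NULLs is standard SQL), and the `members` table and its columns are hypothetical. It shows only which rows each form counts; the performance differences are specific to InnoDB and are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption for this sketch)
conn.execute(
    "CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT NOT NULL, nickname TEXT)"
)
conn.executemany(
    "INSERT INTO members (name, nickname) VALUES (?, ?)",
    [("ann", "a"), ("bob", None), ("cat", "c"), ("dan", None)],
)

# count(*), count(1), count(id) and count(name) can never see a NULL, so they count every row.
# count(nickname) skips the two rows where nickname is NULL.
row = conn.execute(
    "SELECT count(*), count(1), count(id), count(name), count(nickname) FROM members"
).fetchone()

print(row)  # (4, 4, 4, 4, 2)
```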