Contents

00 Conclusion
01 Use of DISTINCT
02 Use of GROUP BY
03 Principles of DISTINCT and GROUP BY
04 Reasons for recommending GROUP BY

00 Conclusion

First, the general conclusion (the full conclusion is at the end of the article):

With the same semantics and an index available, GROUP BY and DISTINCT can both use the index and are equally efficient. With the same semantics and no index, DISTINCT is more efficient than GROUP BY. The reason is that both DISTINCT and GROUP BY perform a grouping operation, but GROUP BY may also sort the result and trigger a filesort, which slows down the query.

Based on this conclusion, you might ask: why are GROUP BY and DISTINCT equally efficient when the semantics are the same and an index is available? And under what circumstances does GROUP BY sort?

To answer these two questions, let’s first look at the basic usage of DISTINCT and GROUP BY.

01 Use of DISTINCT

DISTINCT usage

SELECT DISTINCT columns FROM table_name WHERE where_conditions;


The DISTINCT keyword returns only distinct (unique) rows. It must be placed before the first column in the SELECT list, and it applies to all columns in that list.

If a column has a NULL value and a DISTINCT clause is used for that column, MySQL keeps one NULL value and removes the others because the DISTINCT clause treats all NULL values as the same value.
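The NULL handling described above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere; SQLite also treats NULLs as equal for DISTINCT), and the `student` table and its rows are hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (name, age) VALUES (?, ?)",
    [("Ann", 10), ("Bob", None), ("Cid", 10), ("Dee", None), ("Eve", 12)],
)

# DISTINCT collapses the two 10s into one row and the two NULLs into one row.
rows = conn.execute("SELECT DISTINCT age FROM student").fetchall()
distinct_ages = {row[0] for row in rows}

print(len(rows), distinct_ages)
```

The query returns three rows: one for 10, one for 12, and a single row for NULL (shown as `None` in Python). The row order is not guaranteed because the query has no ORDER BY.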

DISTINCT on multiple columns

When DISTINCT is applied to multiple columns, deduplication is based on the combination of the specified columns. Two rows are considered duplicates only if all of the specified columns have the same values.

SELECT DISTINCT column1,column2 FROM table_name WHERE where_conditions;
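As a minimal sketch of multi-column deduplication, the following runs the statement above against a hypothetical `student` table. SQLite, via Python's sqlite3 module, is used only as a convenient stand-in for MySQL, and the ORDER BY is added just to make the output order predictable.

```python
import sqlite3

# SQLite stands in for MySQL; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (name, age) VALUES (?, ?)",
    [("Ann", 10), ("Ann", 10), ("Ann", 11), ("Bob", 10)],
)

# Rows are duplicates only when BOTH name and age match.
pairs = conn.execute(
    "SELECT DISTINCT name, age FROM student ORDER BY name, age"
).fetchall()

# With a single column, only that column decides what counts as a duplicate.
names = conn.execute(
    "SELECT DISTINCT name FROM student ORDER BY name"
).fetchall()

print(pairs)
print(names)
```

Here ("Ann", 10) and ("Ann", 11) both survive the two-column query because their ages differ, while the single-column query keeps only one row per name.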


02 Use of GROUP BY

For basic deduplication, GROUP BY is used much like DISTINCT:

Single-column deduplication

Syntax:

SELECT columns FROM table_name WHERE where_conditions GROUP BY columns;


Multi-column deduplication

Syntax:

SELECT column1,column2 FROM table_name WHERE where_conditions GROUP BY column1,column2;
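To illustrate that GROUP BY and DISTINCT return the same rows for plain deduplication, here is a small sketch. It assumes a hypothetical `student` table and uses SQLite through Python's sqlite3 module in place of MySQL; results are sorted in Python because neither query specifies an order.

```python
import sqlite3

# SQLite stands in for MySQL; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (name, age) VALUES (?, ?)",
    [("Ann", 10), ("Ann", 10), ("Ann", 11), ("Bob", 10)],
)

def run(sql):
    """Run a query and return its rows sorted, so result sets can be compared."""
    return sorted(conn.execute(sql).fetchall())

# Single-column deduplication: both forms return one row per name.
single_distinct = run("SELECT DISTINCT name FROM student")
single_group = run("SELECT name FROM student GROUP BY name")

# Multi-column deduplication: both forms return one row per (name, age) pair.
multi_distinct = run("SELECT DISTINCT name, age FROM student")
multi_group = run("SELECT name, age FROM student GROUP BY name, age")

print(single_group)
print(multi_group)
```

The sketch only compares result sets. It says nothing about how MySQL executes either statement.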


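Example of the difference

The difference between the two is that GROUP BY can deduplicate on a single column while the query still returns other columns. GROUP BY groups the rows by the columns listed after it and returns one row per group, so deduplication is driven by those columns alone. Note that MySQL does not guarantee that the values of non-grouped, non-aggregated columns come from the first row of each group; the chosen row is nondeterministic, and with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) such a query is rejected unless those columns are aggregated or functionally dependent on the grouping columns.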
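The following sketch contrasts the two approaches on a hypothetical `student` table, with SQLite (Python's sqlite3 module) standing in for MySQL. The extra columns are wrapped in aggregate functions, which is an assumption made here so that the query stays valid under ONLY_FULL_GROUP_BY and returns deterministic values.

```python
import sqlite3

# SQLite stands in for MySQL; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (name, age) VALUES (?, ?)",
    [("Ann", 10), ("Ann", 11), ("Bob", 10), ("Bob", 10)],
)

# GROUP BY deduplicates on name alone and still returns other columns,
# here computed per group with aggregate functions.
grouped = conn.execute(
    "SELECT name, MIN(age) AS min_age, COUNT(*) AS n "
    "FROM student GROUP BY name ORDER BY name"
).fetchall()

# DISTINCT applies to every selected column, so Ann appears once per age.
distinct_rows = conn.execute(
    "SELECT DISTINCT name, age FROM student ORDER BY name, age"
).fetchall()

print(grouped)
print(distinct_rows)
```

GROUP BY yields exactly one row per name, whereas DISTINCT keeps two rows for Ann because the (name, age) combinations differ.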

03 Principles of DISTINCT and GROUP BY

In most cases, DISTINCT can be regarded as a special case of GROUP BY: both are implemented as grouping operations, and both can be executed through a loose index scan or a tight index scan. The difference is that before MySQL 8.0, GROUP BY implicitly sorted by the grouping columns by default.

Implicit sorting

For implicit sorting, see the MySQL documentation:

MySQL :: MySQL 5.7 Reference Manual :: 8.2.1.14 ORDER BY Optimization

GROUP BY implicitly sorts by default (that is, in the absence of ASC or DESC designators for GROUP BY columns). However, relying on implicit GROUP BY sorting (that is, sorting in the absence of ASC or DESC designators) or explicit sorting for GROUP BY (that is, by using explicit ASC or DESC designators for GROUP BY columns) is deprecated. To produce a given sort order, provide an ORDER BY clause.

To paraphrase:

GROUP BY sorts implicitly by default (that is, the GROUP BY columns are sorted even when no ASC or DESC designator is given). However, relying on either implicit or explicit GROUP BY sorting is deprecated. To produce a given sort order, provide an ORDER BY clause.

So, before MySQL 8.0, GROUP BY sorted the result by the grouping columns (the columns that follow GROUP BY) by default. If an index can provide the order, GROUP BY needs no additional sorting. When no suitable index is available, however, the MySQL optimizer has to implement GROUP BY by building a temporary table and then sorting it. In addition, if the result set exceeds the in-memory temporary table size configured for the system, MySQL copies the temporary table to disk and continues working on it there, which makes the statement very slow. This is why MySQL chose to deprecate implicit sorting.
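To see why an implicit sort adds work on top of grouping, consider the toy model below. It is purely conceptual and is not MySQL's actual implementation; the function names `dedup_hash` and `dedup_sorted` are hypothetical.

```python
def dedup_hash(values):
    """Group without sorting: a single pass, output in first-seen order."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out


def dedup_sorted(values):
    """Group by sorting first, then collapsing adjacent equal values."""
    out = []
    for v in sorted(values):  # the extra O(n log n) step
        if not out or out[-1] != v:
            out.append(v)
    return out


data = [3, 1, 3, 2, 1]
unsorted_groups = dedup_hash(data)
sorted_groups = dedup_sorted(data)

print(unsorted_groups)
print(sorted_groups)
```

Both functions find the same groups. The second one pays for a sort that the caller may not need, which mirrors the cost of implicit GROUP BY sorting when no index supplies the order.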

For these reasons, MySQL changed this behavior in 8.0:

MySQL :: MySQL 8.0 Reference Manual :: 8.2.1.16 ORDER BY Optimization

Previously (MySQL 5.7 and lower), GROUP BY sorted under certain conditions. In MySQL 8.0, that no longer occurs, so specifying ORDER BY NULL at the end to suppress implicit sorting (as was done previously) is no longer necessary. However, query results may differ from previous MySQL versions. To produce a given sort order, provide an ORDER BY clause.

To paraphrase:

Previously (MySQL 5.7 and earlier), GROUP BY sorted implicitly under certain conditions. In MySQL 8.0 this no longer happens, so it is no longer necessary to append ORDER BY NULL to suppress implicit sorting. However, query results may differ from previous MySQL versions. To get results in a given order, specify the sort columns in an ORDER BY clause.
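A minimal sketch of the recommended pattern, stating the order explicitly instead of relying on GROUP BY, is shown below. The `student` table is hypothetical, and SQLite (via Python's sqlite3 module) is used in place of MySQL; the same statement is valid in both.

```python
import sqlite3

# SQLite stands in for MySQL; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (name, age) VALUES (?, ?)",
    [("Ann", 12), ("Bob", 10), ("Cid", 11), ("Dee", 10), ("Eve", 12)],
)

# The order is requested explicitly, so it does not depend on whether
# the server happens to sort while grouping.
rows = conn.execute(
    "SELECT age FROM student GROUP BY age ORDER BY age DESC"
).fetchall()
ages = [row[0] for row in rows]

print(ages)
```

With an explicit ORDER BY, the result order is part of the query's contract rather than a side effect of the grouping strategy.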

So, here’s our conclusion:

With the same semantics and an index available, GROUP BY and DISTINCT can both use the index and are equally efficient. This is because the two are almost equivalent: DISTINCT can be regarded as a special case of GROUP BY.

With the same semantics and no index, DISTINCT is more efficient than GROUP BY before MySQL 8.0. Both perform a grouping operation, but before MySQL 8.0 GROUP BY also sorted implicitly, which triggered a filesort and slowed down the query. Since MySQL 8.0, implicit sorting has been removed, so GROUP BY and DISTINCT are almost equally efficient with the same semantics and no index.

04 Reasons for recommending GROUP BY

So why do people prefer GROUP BY?

1. GROUP BY has clearer semantics.
2. GROUP BY can perform more complex data processing.

Compared with DISTINCT, GROUP BY states its intent explicitly. In addition, because the DISTINCT keyword applies to all selected columns, GROUP BY is more flexible in complex business logic. GROUP BY can process data further according to the grouping, for example by filtering groups with HAVING or computing values with aggregate functions.
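As a sketch of the extra processing that GROUP BY allows, the query below combines aggregate functions with HAVING. The `exam` table, its columns, and its rows are hypothetical, and SQLite (via Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exam (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO exam (name, score) VALUES (?, ?)",
    [("Ann", 80), ("Ann", 90), ("Bob", 70), ("Cid", 60), ("Cid", 100)],
)

# One row per student, with per-group aggregates; HAVING then keeps only
# the students who have at least two exam rows.
rows = conn.execute(
    """
    SELECT name, COUNT(*) AS exams, AVG(score) AS avg_score
    FROM exam
    GROUP BY name
    HAVING COUNT(*) >= 2
    ORDER BY name
    """
).fetchall()

print(rows)
```

DISTINCT alone cannot express this: it can remove duplicate rows, but it cannot count rows per group or filter on a per-group condition.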