MySQL handles GROUP BY and DISTINCT queries similarly in most cases, and in fact, optimizations sometimes switch between the two. Both types of queries can benefit from indexes, and this is often the most important way to optimize them.

MySQL has two strategies for GROUP BY queries when indexes are unavailable: use temporary tables or filesort to perform grouping. Neither approach is more efficient for a given query. We can configure SQL_BIG_RESULT and SQL_SMALL_RESULT to specify which way the optimizer chooses.

In general, it is more efficient to group query tables by ID rather than by value, such as the following query:

SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.file_actor
INNER JOIN sakila.actor USING(actor_id)
GROUP BY actor.first_name, actor.last_name;
Copy the code

The following query is more efficient:

SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.file_actor
INNER JOIN sakila.actor USING(actor_id)
GROUP BY film_actor.actor_id;
Copy the code

Using actor_id for grouping is more efficient than film_actor_id.

The advantage of this query is that the actor name is dependent on actor_id, so it will return the same result, but not if the result is different. Sometimes the server even disables GROUP BY through the SQL_MODE configuration. At this point, if you don’t care about the values taken, and the values of the columns used for grouping are unique, you can use MIN and MAX to solve the problem.

SELECT MIN(actor.first_name), MAX(actor.last_name), ... ;Copy the code

For perfectionists, they will think that your grouping is wrong and they are right. The result of a virtual MIN or MAX is that the query does not assemble correctly. However, sometimes you just want MySQL to perform queries faster. A perfectionist would be happy with the following query:

SELECT actor.fisrt_name, actor.last_name, c.cnt
FROM sakila.actor
	INNER JOIN (
    SELECT actor_id, COUNT(*) AS cnt
    FROM sakila.film_actor
    GROUP BY actor_id
  ) AS c USING(actor_id);
Copy the code

However, the cost of creating and populating temporary tables in a subquery can be higher than what appears to be a dead end in theory. Keep in mind that temporary tables built by subqueries do not have indexes, which can lead to performance degradation.

In general, in grouped queries, it is a bad idea to select columns that are not grouped. This is because the query results are uncertain and can be changed if the index is changed or if the optimizer uses a different strategy. In fact, we recommend setting the SQL_MODE on the server to ONLY_FULL_GROUP_BY, where writing a bad grouped query will generate an error instead of executing it directly. After ONLY_FULL_GROUP_BY is enabled, the SELECT field can only be the field specified BY GROUP BY. In this case, you can build a step query or sub-query to query GROUP columns and then perform a second query.

MySQL automatically groups columns based on the ORDER specified BY GROUP BY, unless the ORDER specified BY is used. If you do not care about ORDER and find that this results in a filesort, you can use ORDER BY NULL to skip automatic sorting. You can also specify that the results are sorted in the specified direction BY adding DESC or ASC after GROUP BY.

Sometimes you can ask MySQL to do a super aggregate in the results when grouping queries. This can be done BY adding the WITH ROLLUP clause after GROUP BY, but this does not necessarily achieve the desired optimization. You can check the execution method through EXPLAIN, noting whether grouping is done via filesort or temporary tables. The same query is then compared after the WITH ROLLUP is removed. The comparison may lead to optimization.

Sometimes it is more efficient to add aggregated queries, even though more rows are returned. You can also hold intermediate query results by nesting subqueries after FROM, and then use UNION to get the final results.

Note, however, that it is best to remove the WITH ROLLUP in your application and optimize for grouped queries.

Conclusion: When using GROUP BY, it is best to use index column grouping. If you do not need to specify the ORDER, you can use ORDER BY NULL to optimize. If you do not group by index column, you need to consider workarounds and consider whether to use subqueries or WITH ROLLUP to check performance and then optimize. In addition, you are advised to enable ONLY_FULL_GROUP_BY to prevent unexpected errors in group queries.