Last article talked about MySQL’s unpopular knowledge – custom variables, it is estimated that few people use, but if used well can also aid in performance optimization. It is important to note that variables are session-based, and there may be some unexpected situations that need to be used with care. This article describes how to use custom variables for query optimization to improve efficiency.

Optimized sort query

An important feature of a custom variable is that you can assign the result of a mathematical calculation to the variable at the same time, similar to the way we use I = I + 1. Here is an example used to calculate the row number of a data table:

SET @rownum := 0;
SELECT actor_id, @rownum := @rownum + 1 AS rownum
FROM sakila.actor LIMIT 3;
Copy the code
actor_id rownum
1 1
2 2
3 3

The result may not seem significant because the primary key increments from 1, so the line number and primary key value are the same. However, this approach can be used for sorting. For example, if you want to query the top 10 actors who have starred in the most films, the usual way is to write something like this:

SELECT actor_id, COUNT(*) as cnt
FROM sakila.film_actor
GROUP BY actor_id
ORDER BY cnt DESC
LIMIT 10;
Copy the code

If we want to get the corresponding ranking value, we can introduce variables to do this:

SET @curr_cnt := 0.@prev_cnt := 0.@rank := 0;
SELECT actor_id,
	@curr_cnt := cnt AS cnt,
  @rank 		:= IF(@prev_cnt <> @curr_cnt, @rank+1.@rank) as rank,
  @prev_cnt	:= @curr_cnt AS dummy
FROM (
  SELECT actor_id, COUNT(*) AS cnt
  FROM sakila.film_actor
	GROUP BY actor_id
	ORDER BY cnt DESC
	LIMIT 10
) as der;
Copy the code

Here we assign the number of movie roles to the curr_cnt variable, using prev_cnt to store the previous actor’s number of roles. If the number of actors behind is not the same as the number of actors before it, the ranking goes down (+1). If they are the same, the ranking is the same as the previous actor. In this way, actors’ rankings can be obtained directly from the query results, without the need for secondary processing from the database query (this can also be done through program code).

Avoid fetching the row that you just modified twice

If you want to retrieve information about a row when it is updated, you often need to read the database again. This is because MySQL, unlike PostgreSQL’s UPDATE RETURNING function, can return both updated rows and only the number of rows affected by the UPDATE. However, we can do this by customizing variables. For example, to get a row whose update time has just been changed, an additional query is required without using a custom variable:

UPDATE tb1 SET lastUpdated = NOW() WHERE id = 1;
SELECT lastUpdated FROM tb1 WHERE id = 1;
Copy the code

This can be avoided with custom variables:

UPDATE tb1 SET lastUpdated = NOW() WHERE id = 1 AND @now  := NOW();
SELECT @now;
Copy the code

There is still a query operation, but subsequent query operations no longer require access to the database.

Lazy loaded federated queries

Suppose we need to write a federated query that looks for a matching row on one branch of the union and skips the other branches if it finds one. Y This occurs when you need to look up from hot or low frequency access data (such as recent and historical orders). This is plain SQL for the following user query:

SELECT id FROM users WHERE  id = 123
UNION ALL
SELECT id FROM users_archived WHERE id = 123;
Copy the code

This query looks for the user with ID 123 from the currently in use user table and then looks for the user with the same ID from the archived user table. However, this method is inefficient. Even if you find the desired user in the Users table, you still need to find the user from the users_archived table again, and the actual user ID 123 is only in one of the tables, or the data in both tables is the same. This can be avoided by lazy-loaded union queries, where the second branch queries only if the first branch does not find the data. Therefore, you can use MySQL’s GREATEST method as a container for query results to avoid returning multiple columns of data.

SELECT GREATEST(@found := - 1, id) AS id, users.name, 'users' as which_tb1
FROM users WHERE id = 123
UNION ALL
	SELECT id, users_archived.name, 'users_archived'
  FROM users_archived WHERE id = 123 AND @found IS NULL
UNION ALL
	SELECT 1.' '.'reset' FROM DUAL WHERE ( @found := NULL) IS NOT NULL;
Copy the code

If the query above has a result on the first row, @found will not be assigned and will therefore be NULL, and the second query will be executed. The third UNION actually has no effect, except to restore @found to NULL so that the SQL can be repeated. Another way to verify this is to perform this operation on the same table and see that only one row of data is actually returned or no data is returned (when no data is queried).

SELECT GREATEST(@found := - 1, `id`) AS `id`, `infocenter_city`.`name`, 'city' as which_tb1 
FROM `infocenter_city` WHERE `id` = 460100 
UNION ALL 
	SELECT `id`, `infocenter_city`.`name`, 'infocenter_city' 
	FROM `infocenter_city` WHERE id = 460100 AND @found IS NULL 
UNION ALL 
	SELECT 1.' '.'reset' FROM DUAL WHERE ( @found := NULL) IS NOT NULL
Copy the code