Abstract

IN must go to the index? Of course. Can we scan all of them without going through the index? I think I’ve seen some discussion of Exist IN index before. But it seems to have been too long and forgotten. If you’ve forgotten how IN is queried IN MySQL, let’s review.

The problem

The problem starts with the number of shops concerned

SELECT shop_id, count(user_Id) as attentionNumber
FROM shop_attention
WHERE shop_id IN
<foreach collection="shopIds" item="shopId" separator="," open="(" close=")">
    #{shopId}
</foreach>
GROUP BY shopId
Copy the code

We were looking at optimization from a cache perspective. Interested in reading this article how to do caching after microservitization

After the query converges and the application does caching, there is really no big problem. But as the number of store followers increased, slow SQL began to appear

In our business, SQL queries of 100ms are defined as slow queries that need to be optimized. Can not optimize must control the query frequency. At the same time, database operations that exceed 5s will be killed to prevent the entire database from crashing and causing related applications to be affected.

This SQL execution time has taken hundreds of ms and must be optimized. Ali cloud on this SQL detection report

  1. The ratio of scanned rows to returned rows exceeds 100
  2. The group_by function is used. Check to see if group_by uses an index

Analysis of the

The shop_ID field of group by must be indexed, so why is the ratio of scanned rows to returned rows so large?

Let’s review the three elements of analyzing queries

  1. Response time, that’s pretty clear. I won’t explain it much
  2. Number of rows scanned Indicates the number of rows scanned during the query
  3. Return rows The number of rows that a query hits is generally the best number of rows scanned and returned, but this is ideal, not the case. The number of scanned rows is greater than the number of returned rows in the case of associated query/range sort query. Generally, this ratio should be controlled under 10, otherwise there may be performance problems.

As an aside, I’ve always found mysql Explain’s display fields less intuitive than Mongo’s. The mongo Index principle is the same as mysql Index

Now the question is why the ratio of rows scanned to rows returned by this query is so large.

Then explain

Experiment 1

SELECT shop_id, count(user_Id) as attentionNumber
FROM shop_attention
WHERE shop_id IN(1,2,3)
GROUP BY shopId
Copy the code
type possible_keys key key_length ref rows Extras
range idx_shop idx_shop 8 null 16000 Using index condition

As I expected, the type is range and shopId index, nothing wrong. How can the ratio of rows scanned to rows returned be so large?

Experiment 2

Let’s try it again. I’m increasing the range of IN.

SELECT shop_id, Count (user_Id) as attentionNumber FROM shop_attention WHERE shop_id IN(1,2,3,4,5,6,7,8,9) GROUP BY shopIdCopy the code
type possible_keys key key_length ref rows Extras
index idx_shop idx_shop 8 null 303000 Using where

The result is different, the type is index, so instead of going through a range scan, you’re going through an index scan.

Experiment 3

Forced walk index

SELECT shop_id, Count (user_Id) as attentionNumber FROM shop_attention force index(idx_shop) WHERE shop_id IN(1,2,3,4,5,6,7,8,9) GROUP BY  shopIdCopy the code
type possible_keys key key_length ref rows Extras
range idx_shop idx_shop 8 null 29000 Using Index Condition

This is a range scan, not an index scan. But you’ll notice that the execution time is no shorter than the last one.

Mysql has optimized this query so that it does not go through a range scan. Instead, you go through an index scan. Then it is inevitable that with more and more IN conditions, the more rows scanned, the longer the execution time.

So the way to optimize this problem, is to do the application side of the cut, batch. Query N entries at a time to ensure that each query is fast.

conclusion

According to the actual situation, you need to control the scope of IN query. There are several reasons

  1. If there are too many IN conditions, the index fails and the index is scanned
  2. If there are too many conditions IN, too much data will be returned, which may cause memory overflow IN the application heap.

Therefore, the number of IN queries must be well controlled

Follow the public account [Abbot’s Temple], receive the update of the article in the first time, and start the road of technical practice with Abbot