Abstract

IN must go to the index? Of course. Can we scan all of them without going through the index? I think I’ve seen some discussion of Exist IN index before. But it seems to have been too long and forgotten. If you’ve forgotten how IN is queried IN MySQL, let’s review.

The problem

The problem starts with the number of shops concerned

SELECT shop_id, count(user_Id) as attentionNumber
FROM shop_attention
WHERE shop_id IN
<foreach collection="shopIds" item="shopId" separator="," open="(" close=")">
    #{shopId}
</foreach>
GROUP BY shopId
Copy the code

We were looking at optimization from a cache perspective. Interested in reading this article how to do caching after microservitization

After the query converges and the application does caching, there is really no big problem. But as the number of store followers increased, slow SQL began to appear

In our business, SQL queries of 100ms are defined as slow queries that need to be optimized. Can not optimize must control the query frequency. At the same time, database operations that exceed 5s will be killed to prevent the entire database from crashing and causing related applications to be affected.

This SQL execution time has taken hundreds of ms and must be optimized. Ali cloud on this SQL detection report

The ratio of scanned rows to returned rows exceeds 100

The group_by function is used. Check to see if group_by uses an index

Analysis of the

The shop_ID field of group by must be indexed, so why is the ratio of scanned rows to returned rows so large?

Let’s review the three elements of analyzing queries

Response time, that’s pretty clear. I won’t explain it much

Number of rows scanned Indicates the number of rows scanned during the query

Return rows The number of rows that a query hits is generally the best number of rows scanned and returned, but this is ideal, not the case. The number of scanned rows is greater than the number of returned rows in the case of associated query/range sort query. Generally, this ratio should be controlled under 10, otherwise there may be performance problems.

As an aside, I’ve always found mysql Explain’s display fields less intuitive than Mongo’s. The mongo Index principle is the same as mysql Index

Now the question is why the ratio of rows scanned to rows returned by this query is so large.

Then explain

Experiment 1

SELECT shop_id, count(user_Id) as attentionNumber
FROM shop_attention
WHERE shop_id IN(1,2,3)
GROUP BY shopId
Copy the code

type	possible_keys	key	key_length	ref	rows	Extras
range	idx_shop	idx_shop	8	null	16000	Using index condition

As I expected, the type is range and shopId index, nothing wrong. How can the ratio of rows scanned to rows returned be so large?

Experiment 2

Let’s try it again. I’m increasing the range of IN.

SELECT shop_id, Count (user_Id) as attentionNumber FROM shop_attention WHERE shop_id IN(1,2,3,4,5,6,7,8,9) GROUP BY shopIdCopy the code

type	possible_keys	key	key_length	ref	rows	Extras
index	idx_shop	idx_shop	8	null	303000	Using where

The result is different, the type is index, so instead of going through a range scan, you’re going through an index scan.

Experiment 3

Forced walk index

SELECT shop_id, Count (user_Id) as attentionNumber FROM shop_attention force index(idx_shop) WHERE shop_id IN(1,2,3,4,5,6,7,8,9) GROUP BY  shopIdCopy the code

type	possible_keys	key	key_length	ref	rows	Extras
range	idx_shop	idx_shop	8	null	29000	Using Index Condition

This is a range scan, not an index scan. But you’ll notice that the execution time is no shorter than the last one.

Mysql has optimized this query so that it does not go through a range scan. Instead, you go through an index scan. Then it is inevitable that with more and more IN conditions, the more rows scanned, the longer the execution time.

So the way to optimize this problem, is to do the application side of the cut, batch. Query N entries at a time to ensure that each query is fast.

conclusion

According to the actual situation, you need to control the scope of IN query. There are several reasons

If there are too many IN conditions, the index fails and the index is scanned
If there are too many conditions IN, too much data will be returned, which may cause memory overflow IN the application heap.

Therefore, the number of IN queries must be well controlled

Follow the public account [Abbot’s Temple], receive the update of the article in the first time, and start the road of technical practice with Abbot

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

SQL IN must go to the index?

Abstract

The problem

Analysis of the

Experiment 1

Experiment 2

Experiment 3

conclusion

SQL IN must go to the index?

Abstract

The problem

Analysis of the

Experiment 1

Experiment 2

Experiment 3

conclusion

Related Posts

CPU for performance analysis whiz – whiz

Why should Python code be beautiful and unambiguous

Gradle in Android Studio configures different server addresses to generate different installation packages.