Why don't SQL statements have too many joins?

This is the 28th day of my participation in the August Challenge

Send subtopic

Interviewer: Have you ever operated Linux?

Me: Yes

Interviewer: What command should I use to check memory usage

Me: Free or top

Interviewer: Tell me what you can see with the free command

Me: Well, you can see the memory and cache usage as shown below

Total total memory
Used Used memory
Free Free memory
Buff /cache Used cache
Avaiable Available memory

Interviewer: Do you know how to clean the used cache (buff/cache)

I: em… I don’t know

Interviewer: Sync; Echo 3 > /proc/sys/vm/drop_caches buff/cache

I :(send points, heart great joy) benefits greatly have, clear out the cache we have more available memory space, like the PC above xx guard’s small rocket, click, release a lot of memory

Interviewer: em…. Go back and wait for notice

Talk about SQL Join

Interviewer: Change the subject and talk about your understanding of join

Me: Ok (get it wrong again and you’re done, take your chance)

review

[` of SQL join `] (HTTP: / / http://mp.weixin.qq.com/s?__biz=MzI3ODcxMzQzMw%3D%3D&chksm=eb538c37dc240521dd10df0498404a37fb81c61aef665ef Cae142831ca7b7bbfadfcdb53ff4e & independence idx = 1 & mid = 2247485697 & scene = 21 & sn = # aa41e25d02a92d0d83a597074f6d579c wechat_redirect) may, in accordance with certain conditions Joins the specified tables and returns the data to the client

inner joinIn the connection

left joinLeft connection

right joinThe right connection

full joinAll connection

Interviewer: If you need to use join statement in project development, how can you optimize and improve performance?

Me: Divided into two cases, the data scale is small, the data scale is large.

Interviewer: And?

Data size is small, all dry into memory is done
Large data scale

You can optimize the speed of the JOIN statement by adding indexes. You can reduce the number of joins by using redundant information. Minimize the number of table joins

Interviewer: Can be summed up in [` join `] (HTTP: / / http://mp.weixin.qq.com/s?__biz=MzI3ODcxMzQzMw%3D%3D&chksm=eb538c37dc240521dd10df0498404a37fb81c61aef665ef Cae142831ca7b7bbfadfcdb53ff4e & independence idx = 1 & mid = 2247485697 & scene = 21 & sn = # aa41e25d02a92d0d83a597074f6d579c wechat_redirect) statement is relatively consumption Expensive, right?

I: yes,

Interviewer: Why?

The buffer

Me: There must be a comparison process when executing the JOIN statement

Interviewer: Yes

I: It is slow to compare two tables one by one, so we can read the data from two tables into a block of memory. For example, MySQL InnoDB engine uses the following statement to check the related memory area.

As shown in the figure below, the size of join_BUFFer_SIZE will affect the performance of our join statement

Interviewer: What else?

A big premise

Me: Any project will eventually come online and inevitably generate data on a scale that can’t be too small

Interviewer: That’s right

Me: Most of the data in the database is eventually saved to the hard disk and stored as files.

Take MySQL’s InnoDB engine as an example

InnoDB uses page as the basic IO unit, and each page is 16KB in size
InnoDB creates.ibD files for each table to store data

validation

Me: That means we have to read as many files as we want to join, and while we can take advantage of indexes, we still have to constantly move the head of the hard disk

Interviewer: So frequent movement of the head will affect performance

Me: Yes, don’t open source frameworks like to say they have greatly improved performance with sequential reads and writes, such as hbase and Kafka

Interviewer: That’s right. Do you think Linux has optimized for that? Hint, you can run the free command again to see

Me: Wonder why the cache takes up 1.2 gigabytes

Interviewer: Have you ever thought about it

What is stored in the buff/cache?
Why does the buff/cache take up so much memory, availlable and 1.1GB of available memory?
Why is it that you can clean up a buff/cache with two commands, while releasing used can only be done by terminating the process?

Product, you fine product

After thinking about it for a few minutes

Me: The buff/cache is not important, so clearing it will not affect the operation of the system

Interviewer: Not exactly

Me: Is it? I remember a sentence from CSAPP (In-depth Understanding of computer Systems)

The essence of a storage hierarchy is that each tier of storage devices is a cache for the lower tier

Translation: Linux treats memory like a cache for your hard drive

Related information: tldp.org/LDP/sag/htm…

Interviewer: Now do you know how to answer the scoring question

I: I…

The Join algorithm

Interviewer: Given another chance, what would you do if you were to implement the Join algorithm?

Me: Without an index, the nested loop is done. If there are indexes, they can be used to improve performance.

Interviewer: Back to join_buffer, what do you think is stored in join_buffer?

Me: During the scan, the database selects a table and puts the data it wants to return and compare with other tables into join_buffer

Interviewer: What happens when there is an index?

I: this is relatively simple, directly read the index tree of two tables for comparison on the completion of the ao, I introduced here no index processing mode

Nested Loop Join

The outerTable has 100,000 rows of data and the innerTable has 100 rows of data. The outerTable has 100 rows of data and the innerTable has 100 rows of data. It needs to be read 10,000 times.

Of course no database engine currently uses this algorithm (too slow)

Block nested loop

Block blocks, which means that one Block of data is fetched to memory at a time to reduce I/O overhead

MySQL InnoDB uses this algorithm when no indexes are available

Consider the following tables T_A and t_b

InnoDB automatically uses the Block nested loop algorithm when a join cannot be performed using an index

conclusion

When I was in school, the database teacher liked to test the database paradigm most. Until I went to work, I learned that everything should be based on performance. If redundancy is possible, then join if redundancy is impossible. Try increasing your JOIN_buffer_size or switching to a solid state drive.

The resources

Delve into computer systems. chapter 6 memory hierarchies
www.linuxatemyram.com/play.html the author through several examples to illustrate how hard disk cache effect on the performance of program execution
www.linuxatemyram.com/ Free Parameter description
www.thegeekdiary.com/how-to-clea… At the beginning of the article, the explanation of the command is given
Juejin. Im/book / 5 BFFCB… How does MySQL run: From the root understand MySQL
Mariadb.com/kb/en/block… The official documentation from MariaDB explains the implementation of the block-nested-loop algorithm

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Why don’t SQL statements have too many joins?

Send subtopic