Why does the code specification require SQL statements not to have too many joins?

Send subtopic

Interviewer: Have you ever operated Linux?

Me: Yes

Interviewer: What command should I use to check memory usage

Me: Free or top

Interviewer: Tell me what you can see with the free command

Me: Well, you can see the memory and cache usage as shown below

Total total memory
Used Used memory
Free Free memory
Buff /cache Used cache
Avaiable Available memory

Interviewer: Do you know how to clean the used cache (buff/cache)

I: em… I don’t know

Interviewer: Sync; Echo 3 > /proc/sys/vm/drop_caches buff/cache

I :(send points, heart great joy) benefits greatly have, clear out the cache we have more available memory space, like the PC above xx guard’s small rocket, click, release a lot of memory

Interviewer: em…. Go back and wait for notice

Talk about SQL Join

Interviewer: Change the subject and talk about your understanding of join

Me: Ok (get it wrong again and you’re done, take your chance)

review

A JOIN in SQL can join a specified table based on certain criteria and return the data to the client

There are two ways to join

inner joinIn the connection

left joinLeft connection

right joinThe right connection

full joinAll connection

The above pictures are from here

Interviewer: If you need to use join statement in project development, how can you optimize and improve performance?

Me: Divided into two cases, the data scale is small, the data scale is large.

Interviewer: And?

Data size is small, all dry into memory is done
Large data scale

You can optimize the speed of the JOIN statement by adding indexes. You can reduce the number of joins by using redundant information. Minimize the number of table joins

Interviewer: It can be concluded that the JOIN statement is relatively performance-expensive, right?

I: yes,

Interviewer: Why?

The buffer

Me: There must be a comparison process when executing the JOIN statement

Interviewer: Yes

I: It is slow to compare two tables one by one, so we can read the data from two tables into a block of memory. For example, MySQL InnoDB engine uses the following statement to check the related memory area.

As shown in the figure below, the size of join_BUFFer_SIZE will affect the performance of our join statement

Interviewer: What else?

A big premise

Me: Any project will eventually come online and inevitably generate data on a scale that can’t be too small

Interviewer: That’s right

Me: Most of the data in the database is eventually saved to the hard disk and stored as files.

Take MySQL’s InnoDB engine as an example

InnoDB topage(page) is the basic IO unit, and each page is 16KB in size
InnoDB creates a table for each table to store data.ibdfile

validation

Me: That means we have to read as many files as we want to join, and while we can take advantage of indexes, we still have to constantly move the head of the hard disk

Interviewer: So frequent movement of the head will affect performance

Me: Yes, don’t open source frameworks like to say they have greatly improved performance with sequential reads and writes, such as hbase and Kafka

Interviewer: That’s right. Do you think Linux has optimized for that? Hint, you can run the free command again to see

Me: Wonder why the cache takes up 1.2 gigabytes

Image source here

Interviewer: Have you ever thought about it

buff/cacheWhat’s in it?
whybuff/cacheIt takes up that much memory, so the available memory is 1, 2availlableThere are1.1 G?
Why can you clean it up with two commandsbuff/cacheOccupied memory that you want to releaseusedCan only be done by terminating the process?

Product, you fine product

After thinking about it for a few minutes

Me: The buff/cache is not important, so clearing it will not affect the operation of the system

Interviewer: Not exactly

Me: Is it? I remember a sentence from CSAPP (In-depth Understanding of computer Systems)

The essence of a storage hierarchy is that each tier of storage devices is a cache for the lower tier

Translation: Linux treats memory like a cache for your hard drive

Related information tldp.org/LDP/sag/htm…

Interviewer: Now do you know how to answer the scoring question

I: I…

The Join algorithm

Interviewer: Given another chance, what would you do if you were to implement the Join algorithm?

Me: Without an index, the nested loop is done. If there are indexes, they can be used to improve performance.

Interviewer: Back to join_buffer, what do you think is stored in join_buffer?

Me: During the scan, the database selects a table and puts the data it wants to return and compare with other tables into join_buffer

Interviewer: What happens when there is an index?

I: this is relatively simple, directly read the index tree of two tables for comparison on the completion of the ao, I introduced here no index processing mode

Nested Loop Join

The outerTable has 100,000 rows of data and the innerTable has 100 rows of data. The outerTable has 100 rows of data and the innerTable has 100 rows of data. It needs to be read 10,000 times.

Of course no database engine currently uses this algorithm (too slow)

Block nested loop

Block blocks, which means that one Block of data is fetched to memory at a time to reduce I/O overhead

MySQL InnoDB uses this algorithm when no indexes are available

Consider the following tables T_A and t_b

InnoDB automatically uses the Block nested loop algorithm when a join cannot be performed using an index

conclusion

When I was in school, the database teacher liked to test the database paradigm most. Until I went to work, I learned that everything should be based on performance. If redundancy is possible, then join if redundancy is impossible. Try increasing your JOIN_buffer_size or switching to a solid state drive.

The appendix

Next announcement => Self-description of the mechanical hard disk

Review of “Xiao Ke’s MySQL Learning Road”

Why should we try to avoid FileSort?

The resources

Delve into computer systems. chapter 6 memory hierarchies
Experiments and Fun with the Linux Disk Cache are a few examples of how hard disk caching affects application performance
Linux ATE My RAM Free parameter description
How to clear the buffer/pagecache (disk cache) under Linux
How does MySQL run: From the root understand MySQL
Block Bested loop From the official Documentation of MariaDB explains the implementation of the block-nested-loop algorithm