The previous section covered database fundamentals such as the ACID properties. Part 2 of the Database Interview series walks through the common steps of database tuning, the principles behind indexing, the page mechanism, and more. The overall mind map is shown below.



1 Locating problems

Performance problems can be located from three angles: user feedback, log analysis, and server resource monitoring.

  • User feedback

Users are the most direct source of feedback, and their reports are an important driver of further system optimization.

  • Server Resource Monitoring

Don't wait for user feedback before scrambling. A project usually has a fairly complete server monitoring system in place; as the saying goes, "monitoring not in place, two lines of tears from the leadership." So what does monitoring look like? Besides basic server metrics such as CPU and I/O, an access-trend dashboard typically shows the service's overall traffic, response time, and error count, while performance reports show which resources or services are at fault.

Monitoring indicators
  • Log analysis

In addition to Linux system logs, database logs can be used to locate problems.

Linux logs

Several directions for database tuning

  • Database selection

Select a database based on your application and business requirements: whether transactions are needed, whether row-oriented or column-oriented storage fits, and so on.

  • Optimize the table

(1) If the workload is dominated by queries and analysis, you can trade space for time: add redundant fields to improve query efficiency (see the sketch after this list).

(2) The data types chosen for fields directly affect query efficiency and storage size.

(3) Conversely, adopting third normal form keeps the structure clear and reduces redundant fields.
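
As a minimal sketch of points (1) and (2), here is a hypothetical orders table that picks compact data types and adds one redundant field to avoid a join (the table and column names are illustrative, not from the article):

CREATE TABLE orders (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id    BIGINT UNSIGNED NOT NULL,
  status     TINYINT NOT NULL,           -- small enum: TINYINT instead of INT or VARCHAR
  amount     DECIMAL(10,2) NOT NULL,     -- exact type for money rather than FLOAT
  user_name  VARCHAR(64) NOT NULL,       -- redundant copy of users.name: space traded for a join
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;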

  • Use indexes reasonably

(1) More indexes are not always better. Indexes occupy storage space, and having many of them increases the time the optimizer spends choosing among them as well as the cost of maintaining them on writes.

(2) Columns whose values repeat heavily (low cardinality) are not suitable for indexing.

(3) Where an indexed column appears in the query matters. For example, wrapping an indexed field in an expression or function in the WHERE clause invalidates the index.

  • Use Redis or similar as a cache

Caches can be classified as static caches, distributed caches, and hot-spot caches. My understanding of caching is that it bridges the speed gap between different layers of hardware, coordinating them so that resources are fully used. Examples are everywhere, from the TLB managed by the Linux kernel to HTTP caching. In general, caches are used in many places to speed up access and minimize direct interaction with the database.

  • Database-level optimization

(1) When reads far outnumber writes, read/write splitting reduces the load on the database.

(2) Split databases and tables (sharding): distribute the data across multiple servers.


2 Principles of indexes

An index is like a dictionary's table of contents: it lets you locate content quickly instead of scanning from beginning to end. Does adding an index guarantee queries take off? Not necessarily.

(1) With a small amount of data, do not add an index

(2) With a large volume of data and heavy traffic, indexes should be considered

Types of indexes

(1) Ordinary index

No constraints attached; used purely to speed up queries

(2) Unique index

Adds a uniqueness constraint; a table can have multiple unique indexes

(3) Primary key index

Builds on the unique index by adding a NOT NULL constraint; a table can have only one primary key.

(4) Full-text index

MySQL supports full-text indexes (natively for English text); in practice Elasticsearch (ES) is often used instead. All four index types appear in the sketch below.
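
A minimal sketch showing the four types on one hypothetical table (names are illustrative; InnoDB has supported FULLTEXT since MySQL 5.6):

CREATE TABLE articles (
  id    INT NOT NULL AUTO_INCREMENT,
  email VARCHAR(128),
  title VARCHAR(200),
  body  TEXT,
  PRIMARY KEY (id),              -- primary key index: unique + NOT NULL
  UNIQUE KEY uk_email (email),   -- unique index
  KEY idx_title (title),         -- ordinary index, no constraint
  FULLTEXT KEY ft_body (body)    -- full-text index
) ENGINE=InnoDB;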

Clustered indexes and non-clustered indexes

(1) Clustered index

The index and the data are stored together, so locating the key directly yields the desired row. As shown in the figure below.

(2) Non-clustered index

Index entries are stored in order, but the content they point to is scattered. So after the first lookup finds the index entry, a second lookup is needed to fetch the row. As shown in the figure below.

(3) Differences between the two

  • Clustered index leaf nodes hold the data rows themselves; non-clustered index leaf nodes hold the locations of the data rows
  • A table can have only one clustered index but can have multiple non-clustered indexes
  • Clustered index queries are more efficient; non-clustered index queries need an extra lookup and are therefore slower

Fields suitable for indexing

  • Fields with a uniqueness property

We know that both unique indexes and primary key indexes carry a uniqueness constraint, which makes lookups highly selective.

  • Fields frequently used in GROUP BY and ORDER BY
  • An index keeps data stored and retrieved in order, which speeds up sorting and grouping (see the sketch after this list)
  • Fields queried with DISTINCT also benefit from an index
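
A hedged sketch of the GROUP BY/ORDER BY case, assuming the hypothetical orders table from earlier: an index whose leading column is user_id lets MySQL read rows in index order instead of sorting them afterwards.

CREATE INDEX idx_user_created ON orders (user_id, created_at);
-- Grouping and ordering follow the index order, avoiding an extra sort:
SELECT user_id, MAX(created_at)
FROM orders
GROUP BY user_id
ORDER BY user_id;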

When do you not need to create an index

  • Fields that never appear in WHERE conditions do not need an index
  • Tables with too few records
  • Fields whose values repeat heavily
  • Fields that are updated frequently. Every update of the field must also update the index, and with many indexes this maintenance becomes a burden.

When indexes fail

  • Using an expression on an indexed column makes the index fail; you can confirm this by viewing the execution plan with EXPLAIN, e.g.

EXPLAIN SELECT name from..

  • Applying a function to an indexed column also makes the index fail
  • In fuzzy queries with LIKE, do not start the pattern with "%", otherwise the index fails (see the sketch below)
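
A minimal sketch of these failure modes, assuming a hypothetical users table with an index idx_name on the name column; run each statement under EXPLAIN and compare the key column of the plan:

EXPLAIN SELECT * FROM users WHERE name = 'Tom';          -- can use idx_name
EXPLAIN SELECT * FROM users WHERE UPPER(name) = 'TOM';   -- function on the column: index not used
EXPLAIN SELECT * FROM users WHERE name LIKE '%om';       -- leading %: index not used
EXPLAIN SELECT * FROM users WHERE name LIKE 'To%';       -- prefix pattern can still use the index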

From binary trees to B+ trees

Binary search is efficient, with O(log2 n) time complexity, but in the worst case a binary search tree degenerates into a linked list, making lookups O(n). Balanced binary trees solve that, yet their depth is still O(log2 n), and since each level of the tree costs a disk I/O, the structure was generalized into an m-way tree. In a B-tree of order 1000, for example, three levels are enough to hold 10 million (1000W) index entries, because three levels can address up to 1000^3 = one billion keys; the height is far lower than a binary tree's. To make query cost more uniform, the B+ tree was introduced.

So here’s an interview question

B-tree vs. B+ tree

  • B+ tree queries are more stable, because every lookup ends at a leaf node; in a B-tree, non-leaf nodes also store data, so lookup depth varies
  • B+ trees are shorter and fatter, requiring fewer disk I/Os per query. Within the same disk page, a B+ tree node can hold more keys, because its internal nodes store no row data

3 Page structure

In a database, reading even one row loads the entire page that contains it; the page is the basic unit of storage that the database manages.

In a database, there are concepts such as pages, extents, and segments, and the relationship between them is shown in the figure below.

Database page extents

From the figure above: a tablespace holds multiple segments, a segment holds multiple extents, an extent holds multiple pages, and each page holds multiple rows. So what does each level do?

Extent

In InnoDB, an extent is 64 consecutive pages. The default page size is 16KB, so an extent is 64 × 16KB = 1MB.
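
You can check the page size on your own instance (this is a standard MySQL variable; 16384 bytes is the default):

SHOW VARIABLES LIKE 'innodb_page_size';   -- 16384 = 16KB by default, so one extent = 64 * 16KB = 1MB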

Segment

A segment is composed of multiple extents, and different database objects use different segments. Creating a table creates a data segment; creating an index creates an index segment.

Tablespace

A logical container. It holds many segments, but a segment can belong to only one tablespace. A database consists of multiple tablespaces, including system tablespaces, user tablespaces, and so on.

Page

The page is the smallest unit of database I/O. The detailed structure of a page is shown below.


To show what each field in the page means, the following diagram summarizes them.


The file header and file trailer wrap the page's content and guarantee its integrity through checksums. Pages are also linked to one another in a linked list. As shown in the figure below.


Now look at the records. The record section contains the minimum and maximum pseudo-records (infimum and supremum in InnoDB), the user records, and free space that can be flexibly allocated to newly inserted records.


The index part

Records within a page are stored as a singly linked list. Singly linked lists are easy to insert into and delete from, but slow to search. Therefore the page directory is introduced: it enables binary search and improves retrieval efficiency. So how does it work?

  • Records are first divided into groups; the first group contains only one record (the minimum), and the last group contains the maximum record
  • The last record of each group stores the total number of records in its group
  • The page directory stores the address offset of each group's last record; each entry is called a slot, and its pointer points to the last record of the group

Suppose a user searches for key 6, the page directory slots are numbered from 0, and there are four slots (low = 0, high = 3). Binary search proceeds as follows: (1) mid = (low + high) / 2 = 1; the largest key in slot 1 is 4, and 4 < 6, so search the upper half (low = mid). (2) mid = (low + high) / 2 = 2; the largest key in slot 2 is 8, and 8 > 6, so the target lies in slot 2's group: walk that group's records and take out the row.

Pessimistic lock and optimistic lock

Locks appear in many contexts. In the familiar case of multithreading, locks are used for thread synchronization: they order execution and maintain consistency. In a database, by granularity there are row locks, page locks, and table locks.

(1) Row locks

Advantages: fine granularity and a low probability of lock conflicts, so concurrency is high. Disadvantages: higher locking overhead; prone to deadlock.

(2) Page locks

We know from the previous section that pages contain rows, so a page lock covers more data than a row lock. Its overhead and its tendency to deadlock sit between those of row locks and table locks.

(3) Table locks

Advantages: low overhead and fast locking. Disadvantages: coarse granularity and a high probability of lock conflicts, so concurrency is low.

From the perspective of database management, locks can be divided into shared locks and exclusive locks.

(1) Shared locks

Also called read locks or S locks. Data under a shared lock can be read by other transactions but cannot be modified.

(2) Exclusive locks

Also called write locks or X locks. Only the transaction holding the lock can use the data; other transactions can neither modify it nor acquire a lock to read it. This is why UPDATE operations take an exclusive lock: it prevents other transactions from changing the row at the same time.
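
A minimal sketch of taking both lock types explicitly in MySQL (FOR SHARE is the MySQL 8.0 spelling; older versions use LOCK IN SHARE MODE; the accounts table is hypothetical):

-- Inside a transaction:
SELECT * FROM accounts WHERE id = 1 FOR SHARE;    -- shared lock: others may read-lock, not modify
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;   -- exclusive lock: blocks other locks on the row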

From the perspective of developers, there are optimistic locks and pessimistic locks

Optimistic locking: the application detects conflicts itself, using a version number or timestamp mechanism. Pessimistic locking: exclusive access to the data is guaranteed by the database's own locking mechanism.
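
A hedged sketch of optimistic locking with a version column (the accounts table and version column are illustrative, not from the article):

-- Read the row and remember its version, say version = 42, then:
UPDATE accounts
SET balance = balance - 100, version = version + 1
WHERE id = 1 AND version = 42;
-- If zero rows were affected, another transaction updated the row first: re-read and retry.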

4 Common procedures for SQL analysis

(1) Check whether the slowness follows any periodic pattern. If it does, consider updating the caching strategy or adding a cache.

(2) If not, turn to the query statements themselves and analyze them. Try any of the query optimization methods described earlier. At the same time, enabling the slow query log tells you which statements are executing slowly.
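
Enabling the slow query log is a standard MySQL setting (the one-second threshold below is an arbitrary example):

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;            -- log statements that take longer than 1 second
SHOW VARIABLES LIKE 'slow_query_log_file'; -- where the log is written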

(3) Once you find the slow statement, use EXPLAIN to check its execution plan. EXPLAIN tells you the order in which tables are read, the index actually used, the estimated number of rows examined, and so on.
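
For example (the tables are hypothetical; the named columns are standard EXPLAIN output fields):

EXPLAIN SELECT u.name
FROM users u JOIN orders o ON o.user_id = u.id
WHERE o.status = 2;
-- Inspect type, possible_keys, key, and rows to see which index is chosen and how many rows are scanned.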

(4) Finally, use SHOW PROFILE to understand the execution cost. Profiling is off by default and is enabled with SET profiling = 1.
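
A minimal session sketch (SHOW PROFILE is deprecated in recent MySQL versions but still works; the query is just a placeholder):

SET profiling = 1;            -- enable profiling for this session
SELECT COUNT(*) FROM orders;  -- run the statement you want to measure
SHOW PROFILES;                -- lists recent queries with their Query_ID and duration
SHOW PROFILE FOR QUERY 1;     -- stage-by-stage cost breakdown for query 1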

That's the end of this article. Thanks for reading, and please give it a thumbs-up. Bye, see you next time.