Small knowledge, big challenge! This article is participating in the “Essential Tips for Programmers” creation activity.

  • What is an index?

In relational databases, an index is a separate, physical storage structure that sorts the values of one or more columns of a table. It consists of the values of those columns together with a list of logical pointers to the data pages where the corresponding rows physically reside. In short, an index is an ordered data structure for fast lookups.

  • Common index types

    • Single-column index: the index covers only a single column; a table can have multiple single-column indexes.
    • Unique index: values in the indexed column must be unique, though NULL values are allowed.
    • Composite index: the index covers multiple columns.
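As an illustration, the three index types above might be created like this (the `users` table and its columns are hypothetical, used only for the example):

```sql
-- Hypothetical table, used only to illustrate the index types above.
CREATE TABLE users (
    id         INT PRIMARY KEY,   -- a primary key index is created automatically
    email      VARCHAR(100),
    last_name  VARCHAR(50),
    first_name VARCHAR(50)
);

-- Single-column index: covers exactly one column; a table may have several.
CREATE INDEX idx_last_name ON users (last_name);

-- Unique index: values must be unique, but NULLs are still allowed.
CREATE UNIQUE INDEX idx_email ON users (email);

-- Composite index: covers multiple columns, ordered left to right.
CREATE INDEX idx_name ON users (last_name, first_name);
```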
  • Advantages and disadvantages of using indexes

    • Advantages

      • Lookup: improves data-retrieval efficiency and reduces IO cost.
      • Sorting: data can be sorted via the index, reducing sorting cost and CPU consumption.
    • Disadvantages

      • An index is itself a table that stores the primary key and the indexed fields and points to the records of the base table, so index columns also take up storage space.
      • Every write to the table (INSERT, UPDATE, DELETE) must update not only the data but also the index file, which makes writes slower.
  • When is it appropriate to create an index?

    • The primary key automatically gets a primary key index
    • Fields that are frequently queried should be indexed
    • Foreign-key fields used to join to other tables in queries should be indexed
    • Under high concurrency, composite indexes are usually preferable to multiple single-column indexes
    • A sort field in a query that can be accessed through an index greatly improves sorting speed
    • Fields used for statistics or grouping in queries
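For example, a query that filters and sorts on the same columns can be served entirely by one composite index (the `orders` table and column names here are illustrative):

```sql
-- A composite index whose leftmost column matches the WHERE condition
-- and whose next column matches the ORDER BY lets MySQL return rows
-- already in sorted order, skipping a separate sort step.
CREATE INDEX idx_status_created ON orders (status, created_at);

SELECT id, status, created_at
FROM orders
WHERE status = 'paid'
ORDER BY created_at;   -- can be satisfied by idx_status_created
```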
  • Which situations are not suitable for indexing?

    • Fields that are frequently updated
    • Fields that never appear in WHERE conditions
    • Tables with too few records
    • Tables that are frequently inserted into, updated, or deleted
    • Fields with many duplicate values, where an index makes little sense. If a table has 100,000 rows and a field has only two values, T and F, each appearing in roughly 50% of rows, an index on that field generally does not improve query efficiency. Index selectivity is the ratio of the number of distinct values in the indexed column to the number of records in the table: if a table has 2,000 records and the indexed column holds 1,980 distinct values, its selectivity is 1980/2000 = 0.99. The closer the selectivity is to 1, the more efficient the index.
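The selectivity described above can be measured directly before deciding whether to index a column (table and column names here are illustrative):

```sql
-- Selectivity = distinct values / total rows; the closer to 1,
-- the more useful an index on the column tends to be.
SELECT COUNT(DISTINCT flag) / COUNT(*) AS selectivity
FROM t;

-- For a two-valued T/F column in a 100,000-row table this is
-- 2 / 100000 = 0.00002, so an index on it adds little.
```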

(3) About SQL optimization

There is no fixed recipe for SQL optimization. In essence it is subtraction: reducing IO, CPU, and other consumption so that SQL executes faster and ultimately meets our performance requirements.

SQL optimization is mainly carried out from three aspects:

  • Use indexes whenever possible;
  • Avoid full table scans as much as possible;
  • Minimize queries that fetch unneeded data;

Some optimization methods are as follows:

  • First consider indexing the columns involved in WHERE and ORDER BY
  • Try to avoid NULL values for fields used in the WHERE clause
  • Avoid using the != or <> operators in WHERE clauses
  • Try to avoid using OR to join conditions in WHERE clauses
  • Use IN and NOT IN with caution, otherwise a full table scan may occur
  • Avoid fuzzy matches with a leading wildcard, such as LIKE ‘%xxx’
  • Try to avoid applying expressions or functions to fields in the WHERE clause
  • Do not perform function, arithmetic, or other expression operations on the left side of “=” in the WHERE clause
  • Minimize SELECT * by using covering indexes (queries that access only the index, i.e. the index contains all the queried columns)
  • ORDER BY columns should match the index used by the WHERE conditions, otherwise ORDER BY will not sort via the index
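A sketch of the “no functions on the left side of =” rule above: wrapping an indexed column in a function prevents index use, while rewriting the predicate as a range on the bare column keeps the index usable (table and column names are illustrative):

```sql
-- BAD: DATE() is applied to every row, so an index on created_at
-- cannot be used and the table is scanned in full.
SELECT * FROM orders WHERE DATE(created_at) = '2021-10-01';

-- BETTER: the bare column is compared against a range, so an index
-- on created_at can be used to locate the rows directly.
SELECT id, status, created_at
FROM orders
WHERE created_at >= '2021-10-01'
  AND created_at <  '2021-10-02';
```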

ORDER BY optimization

  • Avoid filesort

  • Sort via indexes as much as possible, following the leftmost-prefix rule

  • Filesort has two algorithms:

    • Two-pass sort: reads the sort keys and row pointers first, sorts them, then reads the rows again, scanning the disk twice
    • Single-pass sort: reads all needed columns into the sort buffer at once and sorts in memory, so rows are read only once

Single-pass sort is the newer algorithm and is generally better than two-pass sort.
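Whether a query falls back to filesort can be checked with EXPLAIN (the exact output varies by MySQL version; the table name is illustrative):

```sql
-- If the Extra column of the EXPLAIN output shows "Using filesort",
-- the sort is not served by an index; if it shows "Using index" or
-- no sort note, the leftmost-prefix rule was satisfied.
EXPLAIN SELECT id FROM orders ORDER BY created_at;
```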

  • Optimization strategy:

    • Increase the sort_buffer_size parameter
    • Increase the max_length_for_sort_data parameter

Reason: give the sort as much memory as possible so it does not spill to disk.
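Assuming the defaults are too small for a given workload, both parameters can be raised per session (the values below are illustrative, not recommendations):

```sql
-- Give the in-memory sort more room before it spills to disk.
SET SESSION sort_buffer_size = 2 * 1024 * 1024;   -- 2 MB

-- Raise the per-row size threshold below which MySQL uses the
-- single-pass sort (column data is placed in the sort buffer
-- together with the sort keys).
SET SESSION max_length_for_sort_data = 4096;
```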