MYSQL

The data structure

1. Data structure of B + tree. Compared with balanced binary tree, each parent node can have more child nodes, the height of the tree is lower, and the nodes are arranged in order. 2. Compared with B tree, B + tree puts the data stored by each node into leaf nodes. Mysql storage unit is page.


The storage engine

Myisam:

Non-clustered index (leaf node stores ID, leaf stores data pointer, index file and data file are separated)

Innodb:

Cluster index (leaf node is ID, leaf holds complete data)


The index

According to data structure:

Hash index (by hash value, hash conflict traverses the list, does not support range lookup) B tree index (friendly support range lookup and data traversal)

According to physical structure:

Clustered index, non-clustered index

According to logic:

Primary key index, secondary index single column index, composite index

When to create an index:

Fields commonly used for query conditions 2. Fields associated with tables as foreign keys 3. Fields used for grouping and sorting


The data type

Integer type:

BIT, BOOL, TINY INT, SMALL INT, MEDIUM INT, INT, BIG INT

Floating point types:

FLOAT, DOUBLE, DECIMAL

String type:

CHAR, VARCHAR, TINY TEXT, TEXT, MEDIUM TEXT, LONGTEXT, TINY BLOB, BLOB, MEDIUM BLOB, LONG BLOB

Date type:

Date、DateTime、TimeStamp、Time、Year

Other data types:

BINARY, VARBINARY, ENUM, SET, Geometry, Point, MultiPoint, LineString, MultiLineString, Polygon, GeometryCollection, etc

The transaction

Principle:

A (atomicity) C (consistency) I (isolation) D (persistence)

How to ensure atomicity:

Undo_log: Each record hides two fields, trx_id (transaction ID) and roll_pointer (previous transaction address). Undo_log maintains a version chain for each record, and when a transaction fails, it rolls back to the previous one. If it is a new record, it does not have a version chain and can be deleted during rollback.

How to ensure consistency:

Redo_log: Database changes are first recorded in redo_log and changes in memory. Because logging is sequential I/O, there is no need for redundant disk seek, and the I/O speed must be random. Fast I/O is written to data files only when the engine is idle. If the system crashes, the redo_log will also be used to look for the changes before the crash, and then decide whether to roll back or commit and synchronize to the data file.

How to ensure isolation:

  1. Optimistic locking:

MVCC version control 2. Pessimistic locks: row level locks, gap locks, table locks and other locks

MVCC:

  1. Current read and snapshot read:
  • Current read: Locks are used to ensure that the latest record is read
  • Snapshot read: Read records of one version without locking using MVCC
  1. Hidden fields:
  • Roll_pointer: Address of the record that saved the last transaction
  • Trx_id: transaction id
  1. Undolog logs:
  • Save a record of each change and their hidden fields, forming a version chain according to roll_pointer
  1. The read – the view:
  • ReadView: is a list of version numbers of currently uncommitted transactions
  1. Transaction visibility:
  • If the transaction ID is less than the minimum ID of readView, the transaction is visible.
  • If the transaction ID is greater than the readView’s maximum ID, the transaction is not visible.
  • The transaction ID is in the readView’s maximum and minimum ID. If the transaction is not in the readView array, it is visible; otherwise, it is invisible.
  • Rc: create a readVew for each query within a transaction
  • Rr: Each query in a transaction uses only the first readView created

Lock:

  1. By type of operation on data:
  • Shared lock: The shared lock is used for reading
  • Exclusive lock: Use exclusive lock when making changes
  1. From the granularity classification of data operations:
  • Record Locks: Locks added to an index that Record only one row
  • Gap Locks: Locks records in an interval, but does not prevent other Gap Locks from acquiring the Locks in Gap Locks
  • Next-key Locks: A combination of record Locks and gap Locks, which are Locks that are placed on a record and an interval for that record

Isolation level:

  1. Ru:
  • Read: unlocked, unblocked write
  • Write: lock, block write, do not block read
  1. Rc:
  • Select, update, delete, all use record lock, so phantom read is still generated
  • A transaction creates a new readVew for each query, which can cause unrepeatable reads
  1. Rr:
  • Select, update, and DELETE all use next_key_lock

To lock records within the index range, avoiding data insertion into the lock range, thus solving the phantom read problem

  • Each query in a transaction uses only the first readView created, eliminating the problem of non-repeatable reads
  1. The serializable.
  • Read and shared lock, write and exclusive lock, read and write mutually exclusive lock

Mysql tuning

  1. Causes of poor SQL performance:
  • The SQL statement is not written properly
  • No index walk
  • Too many table associated joins
  • Database parameter setting issues, such as buffering and thread pools
  • The CPU and memory bottleneck of the server. The CPU is needed to read data from hard disk to memory, and there is a memory bottleneck when there is not enough memory to load data
  1. Analysis method:

  • Slow Query logs

The slow log function is disabled by default. You can manually enable the slow log function. After the slow log function is enabled, the SQL whose time is longer than long_query_time is recorded

  • Explain statement analysis

Id: specifies the serial number. If the id is large, execute select_type first. The same as select_type from top to write: Query type, simple (simple query), primary (outer query with subquery), subQuery (subquery), Union table: the name of the query table type: All (non-index scan) index (full index scan) range (index scan) ref (query based on non-unique index, Possible multiple results) eq_ref (based on unique index) const (found once through index) System (in case of only one row) Possible_keys: Key: the index field that is actually used key_len: Ref: Which column of the index is being used, possibly a constant. Rows: The approximate number of rows that need to be read Using filesort using temporary tables to hold intermediate results using index Using join buffer using join buffer impossible WHERE (where condition is always false)

  1. The show command displays the system status

Mysql> show status mdash; mdash; Display status information (extended show status like lsquo; XXXrsquo;) Mysql> show variables mdash; mdash; Display system variables (extend show variables like lsquo; XXXrsquo;) Mysql> show innodb status mdash; mdash; Mysql> show processList mdash; mdash; Shell> mysqladmin variables -u username -p passwordmdash; mdash; Display system variables Shell> mysqladmin extended-status -u username -p passwordmdash; mdash; Display status information 4. Optimization methods:

  • The index optimization

Use the left-most matching rule not to do function operations on index fields and build overlays if possible

  • . Query optimization

Small tables drive large tables (in is small, exists can be large)

  • Data type optimization

Use the smallest fields that apply. Use simple data types, such as integers, which are more CPU-friendly than strings, and avoid null values

Depots table

Table:

  1. Vertical table:
  • You can put the most popular fields in one table and the rest of the less used fields in another table. The relationship between the primary and secondary tables should be one-to-one.
  1. Level table:
  • When the data in a single table is larger than 500W, the data can be split into different tables. Of course, the separate tables can be placed in the same database or in different databases
  • Split by time
  • Split by heat
  • Split by ID size
  • After splitting, take modules according to hash_mod to distribute evenly

Depots:

  1. Why to separate library:

In the database cluster environment, there are many slaves, which basically meet the reading operation. However, write operations or large data and frequent write operations have a great impact on master performance. In this case, a single library cannot solve the problem of large-scale concurrent write, so separate libraries are considered. 2. How to divide databases: There are too many tables in one database, which leads to massive data and system performance degradation. Split the tables originally stored in one database and store them in multiple libraries, usually dividing the tables according to function modules and relationship degree and deploying them to different libraries

Distributed transaction

  1. Selection: can not use distributed transactions, if you have to use, combined with their own business analysis, to see which business is more suitable for their own, is care about strong consistency, or the final consistency can be
  2. Solution:
  • 2 PC:

First stage: inform all affairs related to the server to perform commit operation, and returns the success The second stage: according to the results of the first phase of the decision is to perform or rollback faults: transaction manager single point problem Accrued in the end is blocked, the concurrent unfriendly The second stage network problems can lead to inconsistent data

  • TCC (try-confirm-cancel) :

Cancle: Release the reserved try resources and retry after a failure. Idempotency is guaranteed. Advantages: 1. The single point of coordinator is resolved, and the business activity is initiated and completed by the main business. The business activity manager has also become multi-point, introducing clustering. 2. Synchronization block: A timeout is introduced and the system compensates after the timeout. The entire resource is not locked and converted to service logic with smaller granularity. 3. Data consistency. With compensation, consistency is controlled by the business activity manager

  • Local message table:

Service A maintains A local message table, obtains unexecuted tasks from the local message table with A scheduled task after A transaction operation, and then pushes them to the message queue. Service B also maintains the local message table, updates the service table and the local message table after receiving the message, and notifies service A to update the local message table. Mq transactions are implemented by encapsulating local message tables

A distributed ids

  1. Uuid:

Unordered string, query efficiency Slow length excessive length Autoincrement mode: Setting the start bit and the autoincrement step are not suitable for expansion 2. Number segment mode: create a new number segment table, including the maximum ID, corresponding to the service, version number, each time from the number segment table, and maintain the number segment table, have the version number, concurrency friendly, because it can be updated like optimistic lock 3. Use redis inRC command, but need to use AOF persistence, and every commit should be persistent, otherwise after downtime will cause the ID duplication, so that the AOF file records a lot of INRC command, restart will be a long time to rewrite aOF, will take a long time 4. Snowflake algorithm: the first bit is positive and negative, generally positive 0. 41-bit timestamp is accurate to millisecond. 10-bit machine ID can define its own 12-bit serial number identification. Many countries have daylight saving time (DST). In winter, the id of the DST will be duplicated. Therefore, different algorithms can be used in summer and winter

Three paradigm

  • First paradigm: Each field should be an indivisible unit
  • Second normal Form: Each field must be associated with an ID, not part of it
  • Third normal Form: each field should be directly associated with ID, not indirectly. For example, the head teacher and the head teacher age field in the student table, the head teacher age is not directly associated with ID, but indirectly associated with student ID through the head teacher

Reference:

  • “Next time someone asks you about distributed transactions, throw this to them.” by Coffee Latte
  • 9 Ways to Generate Distributed ids in one go, and The Interviewer is a little Confused
  • The school recruit mysql binlog those thing | log module/redolog undolog “, author: mo coding
  • “MySQL 30 thousand words essence summary + interview 100 questions, and the interviewer more than enough to debate (collection series)” — by Jia Not false
  • Integrate the knowledge point, thank you very much