Heartfelt, and full of practical content. Search WeChat for [Santaizi Aobing] and follow this different kind of programmer.

This article is included on GitHub at github.com/JavaFamily, where you'll find a complete map of big-company interview test points, reference materials, and my series of articles.

Bored at home over the National Day holiday, I casually picked up some database books lying around and got hooked: it turned out I had forgotten even the database normal forms I knew well in university, so out of curiosity I read my way through a good chunk of the holiday.

Along the way I took some short notes. They may not be as entertaining as my usual systematic articles, but they are packed with solid material, well suited for review or a last-minute interview cram. Without further ado, let's get into it.

The storage engine

InnoDB

InnoDB is MySQL's default transactional storage engine; consider other storage engines only when you need a feature it does not support.
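As a quick check, you can list the engines a server supports and pick the engine per table (the table name below is just for illustration):

SHOW ENGINES;  -- lists available storage engines and marks the default

CREATE TABLE demo (id INT PRIMARY KEY) ENGINE = InnoDB;  -- engine chosen per table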

InnoDB uses MVCC to support high concurrency and implements all four standard isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE). The default is REPEATABLE READ, at which level phantom reads are prevented by MVCC plus Next-Key Locking.

Its primary index is a clustered index, which stores the row data inside the index itself to avoid a separate disk read, giving primary-key queries high performance.

Internal optimizations include read-ahead when fetching data from disk, an adaptive hash index that automatically builds hash indexes in memory to speed up reads, and an insert buffer that speeds up insert operations.

InnoDB supports true online hot backup; MySQL's other storage engines do not. With those engines, obtaining a consistent view requires stopping all writes to the tables, and in a mixed read/write workload, stopping writes may effectively mean stopping reads as well.

MyISAM

The design is simple and data is stored in a compact format. It is still useful for read-only data, or for tables small enough that occasional repair operations are tolerable.

Provides a number of features, including compressed tables, spatial data indexes, and more.

Transactions are not supported.

Row-level locking is not supported; only the entire table can be locked. Shared locks are taken on all tables being read, and exclusive locks on tables being written. However, new records can be inserted into a table while reads are running against it (CONCURRENT INSERT).

Check and repair operations can be performed manually or automatically, but unlike transaction recovery or crash recovery, a repair may lose some data, and the operation is very slow.

If the DELAY_KEY_WRITE option is specified, modified index data is not written to disk immediately after a modification completes, but only to the key buffer in memory. The corresponding index blocks are flushed to disk only when the key buffer is full or the table is closed. This can greatly improve write performance, but if the database or host crashes, the index may be corrupted and need repair.
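For illustration, a sketch of a MyISAM table created with this option (the table and column names are hypothetical):

CREATE TABLE delayed_log (
  id INT NOT NULL,
  msg VARCHAR(255),
  KEY idx_id (id)              -- index writes go to the key buffer first
) ENGINE = MyISAM DELAY_KEY_WRITE = 1;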

InnoDB vs. MyISAM

  • Transactions: InnoDB is transactional and can use Commit and Rollback statements.

  • Concurrency: MyISAM only supports table level locking, while InnoDB also supports row level locking.

  • Foreign keys: InnoDB supports foreign keys.

  • Backup: InnoDB supports online hot backup.

  • Crash recovery: after a crash, MyISAM is far more likely to be corrupted than InnoDB, and its recovery is slower.

  • Other features: MyISAM supports compressed tables and spatial data indexes.

The index

Principles of the B+ Tree

The data structure

A B Tree is a balanced search tree in which all leaf nodes sit at the same level.

A B+ Tree is a variant of the B Tree in which leaf nodes are linked by sequential-access pointers. It is commonly used in databases and operating system file systems.

B+ trees have two types of nodes: internal nodes (also known as index nodes) and leaf nodes. Internal nodes are the non-leaf nodes; they store only keys (indexes), not data.

Keys in the internal node are arranged in ascending order. For a key in the internal node, all keys in the left subtree are smaller than it, and all keys in the right subtree are greater than or equal to it. Records of leaf nodes are also arranged in ascending order.

Each leaf node holds a pointer to the adjacent leaf node.

Operations

Lookup

Lookup proceeds as in a typical binary search tree: starting at the root, the tree is traversed top-down, at each node choosing the child pointer whose key range brackets the value being sought. Within a node, binary search is typically used to determine the position.

Insert

  • Perform a search to determine which bucket the new record should go into.

  • If the bucket is not full (at most b-1 entries after the insertion, where b is the node capacity), add the record.

  • Otherwise, before inserting the new record:

    • Split the bucket:
      • the original node keeps ⌈(L+1)/2⌉ items;
      • the new node takes the remaining ⌊(L+1)/2⌋ items.
    • Move the ⌈(L+1)/2⌉-th key up to the parent, and add a pointer to the new node in the parent.
    • Repeat until a parent is found that does not need to split.
  • If the root splits, treat it as if it had an empty parent and split as outlined above.

B-trees grow at the root, not at the leaves.

Delete

Deletion is similar to insertion, but uses a bottom-up merge operation instead.

Common tree structures compared

AVL tree

An AVL tree is a strictly balanced binary search tree: the heights of the left and right subtrees of any node differ by at most 1, with balance determined by a balance factor and restored by rotations. Compared with a red-black tree, its balance condition is very strict (height difference of at most 1), so any insert or delete that violates it triggers a rotation to restore balance. Because rotations are time-consuming, AVL trees suit scenarios with few inserts/deletes but many lookups.

Red-black tree

By constraining the colors of nodes along any path from root to leaf, a red-black tree ensures that no path is more than twice as long as any other, making it approximately balanced. Compared with the strictly balanced AVL tree, it needs fewer rotations to stay balanced, so it suits scenarios with frequent inserts/deletes and fewer lookups. (Some systems now use a skip list instead of a red-black tree; search "Why does Redis use skiplist instead of red-black tree?")

B/B+ tree

A multiway search tree with high fan-out and therefore low disk I/O; commonly used in database systems.

Comparison of B+ trees and red-black trees

Balanced trees such as red-black trees can also be used to implement indexes, but file systems and database systems generally use B+ Tree as an index structure, mainly for the following two reasons:

(1) Disk I/O count

A node in a B+ tree can store multiple elements, so compared with a red-black tree the tree height is lower and fewer disk I/O operations are needed.

(2) Disk prefetch

To reduce disk I/O, disks often read ahead rather than strictly on demand. During read-ahead, data is read sequentially with no disk seeks, an integer multiple of the page size at a time.

The operating system generally divides the memory and disk into fixed size blocks, each called a page, and the memory and disk exchange data on a page basis. The database system sets the size of one node of the index to the size of a page, so that a node can be fully loaded in one I/O.
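For reference, InnoDB's page size (16KB by default) can be checked directly:

SHOW VARIABLES LIKE 'innodb_page_size';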

B+ tree versus B tree

B+ trees need less disk I/O

Internal nodes of a B+ tree carry no pointers to the row data behind the keys, so they are smaller than B-tree nodes. If all keys of one internal node are stored in the same disk block, the block can hold more keys, each read into memory covers more keys, and the number of I/O operations drops accordingly.

The query efficiency of a B+ tree is more stable

Because non-leaf nodes never point to the actual row contents, only to keys that live in the leaf nodes, every lookup must walk a path from root to leaf. All lookup paths have the same length, so every query costs the same.

B+ tree element traversal is more efficient

The B tree improves disk I/O performance but does not make traversing all elements efficient; the B+ tree was created to solve exactly this problem. A B+ tree can be traversed in full just by walking its leaves. Moreover, range queries are very frequent in databases, and B trees support them poorly (or not at all).

MySQL index

Indexing is implemented at the storage engine layer, not the server layer, so different storage engines have different index types and implementations.

B+ Tree index

Is the default index type for most MySQL storage engines.

  • Because you no longer need to do a full table scan, you just need to search the tree, so lookups are much faster.

  • Because of the order of B+ Tree, it can be used for sorting and grouping in addition to searching.

  • Multiple columns can be specified as index columns, which together form a key.

  • Lookups can use the full key value, a key range, or a key prefix, where prefix lookups work only on the leftmost prefix. The index cannot be used if the lookup does not follow the order of the index columns.

InnoDB's B+Tree indexes are divided into the primary index and secondary indexes. In the primary index, the leaf node's data field holds the complete data record, which is why it is called a clustered index. Because rows cannot be stored in two places at once, a table can have only one clustered index.

In a secondary index, the leaf node's data field holds the primary key value. A lookup through a secondary index therefore first finds the primary key and then searches the primary index, a step known as going back to the table.
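A minimal sketch of the difference (the table, column names, and data are hypothetical):

CREATE TABLE user_demo (
  id INT PRIMARY KEY,          -- clustered index: leaves store the whole row
  name VARCHAR(50),
  age INT,
  KEY idx_name (name)          -- secondary index: leaves store only (name, id)
) ENGINE = InnoDB;

SELECT * FROM user_demo WHERE name = 'Aobing';   -- idx_name lookup, then back to the table for age
SELECT id FROM user_demo WHERE name = 'Aobing';  -- answered by the secondary index alone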

The hash index

Hash indexes allow O(1) lookups, but they give up ordering:

  • Cannot be used for sorting and grouping;
  • Supports only exact lookups, not partial-key or range lookups.

InnoDB storage engine has a special feature called “adaptive hash index”. When an index value is used very frequently, it creates a hash index on top of the B+Tree index. This gives the B+Tree index some advantages of hash index, such as fast hash lookup.
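Two related sketches: the MEMORY engine supports declaring a hash index explicitly, while InnoDB's adaptive hash index is automatic and exposes only an on/off switch:

CREATE TABLE hot_lookup (
  k INT,
  v VARCHAR(50),
  INDEX USING HASH (k)         -- explicit hash index, MEMORY engine only
) ENGINE = MEMORY;

SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';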

Full-text indexing

The MyISAM storage engine supports full-text indexing, which is used to find keywords in text rather than directly compare for equality.

Search criteria use MATCH ... AGAINST instead of a normal WHERE comparison.
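A minimal sketch (the articles table is hypothetical):

CREATE TABLE articles (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT (title, body)       -- full-text index over both columns
);

SELECT id, title FROM articles
WHERE MATCH(title, body) AGAINST('storage engine');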

Full-text indexing is implemented using inverted indexes, which record the mapping of keywords to the documents in which they reside.

InnoDB storage engine also began to support full-text indexing in MySQL 5.6.4.

Spatial data index

The MyISAM storage engine supports spatial data indexes (R-Tree), which can be used for geographic data storage. A spatial index indexes the data across all dimensions and can efficiently combine queries on any of them.

GIS related functions must be used to maintain the data.

The index optimization

Separate columns

An indexed column cannot be part of an expression or an argument to a function in the query; otherwise the index cannot be used.

For example, the following query cannot use the index on the actor_id column:

SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;
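Rewritten so the indexed column stands alone on one side of the comparison, the index can be used:

SELECT actor_id FROM sakila.actor WHERE actor_id = 4;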

Multi-column indexes

Multi-column indexes perform better than single-column indexes when a query's conditions involve multiple columns. For example, for the following statement it is best to make (actor_id, film_id) a multi-column index.

SELECT film_id, actor_id FROM sakila.film_actor
WHERE actor_id = 1 AND film_id = 1;
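In the sakila sample database, (actor_id, film_id) happens to be film_actor's primary key already; on a table without such a key, a composite index could be added like this (the index name is assumed):

ALTER TABLE sakila.film_actor ADD INDEX idx_actor_film (actor_id, film_id);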

Order of index columns

Put the most selective index columns first.

Index selectivity is the ratio of distinct index values to the total number of records. Its maximum is 1, in which case every record has a unique index value. The higher the selectivity, the better the index distinguishes records and the more efficient the query.

For example, in the results shown below, customer_id is more selective than staff_id, so it is best to put the customer_id column first in the multi-column index.

SELECT COUNT(DISTINCT staff_id)/COUNT(*) AS staff_id_selectivity,
COUNT(DISTINCT customer_id)/COUNT(*) AS customer_id_selectivity,
COUNT(*)
FROM payment;
staff_id_selectivity: 0.0001
customer_id_selectivity: 0.0373
COUNT(*): 16049

The prefix index

For BLOB, TEXT, and long VARCHAR columns, a prefix index must be used, which indexes only the leading part of the characters.

The selection of prefix length depends on index selectivity.
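A sketch, assuming a table city_demo with a long city_name column: estimate the selectivity of a candidate prefix length first, then create the prefix index:

-- Compare the selectivity of a 10-character prefix with the full column:
SELECT COUNT(DISTINCT LEFT(city_name, 10)) / COUNT(*) AS prefix_sel,
       COUNT(DISTINCT city_name) / COUNT(*) AS full_sel
FROM city_demo;

-- Index only the first 10 characters:
ALTER TABLE city_demo ADD INDEX idx_city_name (city_name(10));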

Covering indexes

The index contains the values of all the fields to be queried.

Has the following advantages:

  • An index is typically much smaller than a data row, so reading only the index greatly reduces the amount of data accessed.
  • Some storage engines, such as MyISAM, cache only the index in memory and leave data caching to the operating system, so a query served entirely from the index avoids system calls (which are usually expensive).
  • With the InnoDB engine, if a secondary index covers the query, there is no need to access the primary index at all (see the EXPLAIN sketch below).
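Whether a query is covered shows up in EXPLAIN as "Using index" in the Extra column. For example, in sakila, film_actor has a secondary index on film_id, and its leaves also carry the primary key column actor_id:

EXPLAIN SELECT film_id, actor_id
FROM sakila.film_actor
WHERE film_id = 1;
-- Extra: Using index  (the query never touches the clustered index)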

Advantages of indexes

  • Greatly reduces the number of rows the server has to scan.

  • Helps the server avoid sorting, grouping, and the temporary tables they would otherwise create (the B+Tree index is ordered, so it can serve ORDER BY and GROUP BY directly).

  • Turns random I/O into sequential I/O (the B+Tree index is ordered and stores adjacent data together).

Conditions for the use of indexes

  • For very small tables, a simple full table scan is more efficient than indexing in most cases;

  • For medium to large tables, indexes are very effective;

  • For very large tables, however, the cost of building and maintaining indexes also grows. In that case a technique is needed that can identify the required set of rows directly instead of matching record by record, such as partitioning.

Why is a simple full table scan more efficient than indexing in most cases for very small tables?

If a table is small, traversing it directly is obviously faster than walking an index, because the index lookup would also need to go back to the table.

Note: this answer implicitly assumes the data being queried is not part of the index, so a table lookup is required. It also assumes the query condition is not the primary key; otherwise the row could be fetched straight from the clustered index.

Query performance optimization

Analyze SELECT statements with EXPLAIN

EXPLAIN is used to analyze SELECT statements; developers can optimize their queries by studying its output.
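Using it is just a matter of prefixing the query; for example, against the sakila sample database:

EXPLAIN SELECT first_name, last_name
FROM sakila.actor
WHERE actor_id = 1;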

select_type

Common values are SIMPLE (simple query), UNION (union query), and SUBQUERY (subquery).

table

The table to query

possible_keys

The indexes MySQL could choose from

key

The index actually chosen

rows

The estimated number of rows to be examined (scanned)

type

The join (access) type. Commonly seen values:

  • ref: a non-unique index is used.
  • range: a range query using a primary key, a single-column secondary index, or the last column of a multi-column secondary index.
  • index: the index tree is scanned.

system

Trigger condition: The table has only one row, which is a special case of const type

const

Trigger condition: only one row matches a query using a primary key or unique index.

SELECT * FROM tbl_name WHERE primary_key=1;

SELECT * FROM tbl_name
  WHERE primary_key_part1=1 AND primary_key_part2=2;

eq_ref

Trigger condition: when a join query uses a primary key or unique index and matches only one row of records

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column_part1=other_table.column
  AND ref_table.key_column_part2=1;

ref

Trigger condition: Use non-unique index

SELECT * FROM ref_table WHERE key_column=expr;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column_part1=other_table.column
  AND ref_table.key_column_part2=1;

range

Trigger condition: a range query using a primary key, a single-column secondary index, or the last column of a multi-column secondary index

SELECT * FROM tbl_name
  WHERE key_column = 10;

SELECT * FROM tbl_name
  WHERE key_column BETWEEN 10 AND 20;

SELECT * FROM tbl_name
  WHERE key_column IN (10,20,30);

SELECT * FROM tbl_name
  WHERE key_part1 = 10 AND key_part2 IN (10,20,30);

index

The index join type is the same as ALL, except that only the index tree is scanned. This occurs in two ways:

1) The queried columns are part of the index, so the index covers the query.
2) A full index scan is performed to read rows in index order, for example when sorting by the primary key.

all

Trigger condition: a full table scan, with no index used

Optimize data access

Reduce the amount of data requested

  • Return only necessary columns: it is best not to use SELECT * statements.
  • Return only necessary rows: Use the LIMIT statement to LIMIT the data returned.
  • Cache repeated queries: caching avoids hitting the database at all; when the same data is queried frequently, the performance gain can be significant.

Reduce the number of lines scanned by the server

The most effective way is to use an index to cover the query.

Restructuring queries

Split large queries

A large query executed all at once may lock a great deal of data, fill the entire transaction log, exhaust system resources, and block many small but important queries.

DELETE FROM messages WHERE create < DATE_SUB(NOW(), INTERVAL 3 MONTH);

can instead be run in batches, deleting a bounded number of rows at a time:

rows_affected = 0
do {
    rows_affected = do_query(
        "DELETE FROM messages WHERE create < DATE_SUB(NOW(), INTERVAL 3 MONTH) LIMIT 10000")
} while rows_affected > 0

Decompose large join queries

The benefits of breaking a large join query into a single table query for each table and then associating it in the application are:

  • Make caching more efficient. For join queries, if one of the tables changes, the entire query cache becomes unusable. When multiple queries are decomposed, even if one table changes, the query cache for the other tables can still be used.
  • Split into multiple single-table queries, whose cached results are more likely to be used by other queries, thereby reducing the number of redundant record queries.
  • Reduce lock contention;
  • Connecting at the application layer makes it easier to split the database, making it easier to achieve high performance and scalability.
  • The query itself may also become more efficient. In the example below, using IN() instead of a join lets MySQL query IDs in order, which may be more efficient than a random join.
SELECT * FROM tag
JOIN tag_post ON tag_post.tag_id=tag.id
JOIN post ON tag_post.post_id=post.id
WHERE tag.tag='mysql';
SELECT * FROM tag WHERE tag='mysql';
SELECT * FROM tag_post WHERE tag_id=1234;
SELECT * FROM post WHERE post.id IN (123,456,567,9098,8904);

The transaction

A transaction is a set of operations that satisfies the ACID properties; it can be committed with COMMIT or rolled back with ROLLBACK.
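A minimal sketch of commit and rollback (the account table is hypothetical):

START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;      -- makes both updates permanent
-- ROLLBACK; -- would instead undo everything since START TRANSACTION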

ACID

The four basic properties of a transaction are ACID:

  • Atomicity
  • Consistency
  • Isolation
  • Durability

Atomicity

A transaction is considered the smallest unit that is indivisible, and all operations of the transaction either all succeed or all fail and roll back.

Consistency

The database remains in a consistent state before and after a transaction. In a consistent state, all transactions see the same result when reading the same data.

Isolation

Changes made by one transaction are not visible to other transactions until they are finally committed.

Durability

Once a transaction commits, its changes are stored in the database forever. Even if the system crashes, the results of the transaction execution cannot be lost.

ACID

The ACID properties are simple to state but harder to understand, mainly because the four properties are not parallel concepts:

  • A transaction's result is correct only if consistency holds.
  • Without concurrency, transactions run serially and isolation holds trivially, so consistency follows from atomicity alone. With concurrency, transactions run in parallel and must satisfy both atomicity and isolation for consistency to hold.
  • Durability ensures the transaction's results survive even a database crash.

Isolation level

READ UNCOMMITTED

Changes in a transaction are visible to other transactions even if they are not committed.

READ COMMITTED

A transaction can only read changes made by committed transactions. In other words, changes made by one transaction are not visible to other transactions until they are committed.

REPEATABLE READ

Ensure that the same data is read multiple times in the same transaction with the same result.

SERIALIZABLE

Force transactions to be executed serially.

A locking implementation is required, whereas the other isolation levels usually do not need one.

Isolation level     Dirty read   Non-repeatable read   Phantom read
READ UNCOMMITTED    Yes          Yes                   Yes
READ COMMITTED      No           Yes                   Yes
REPEATABLE READ     No           No                    Yes
SERIALIZABLE        No           No                    No
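For reference, the current level can be inspected and changed per session (the variable is tx_isolation before MySQL 8.0 and transaction_isolation from 8.0 on):

SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;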

The lock

Locking is a key feature that distinguishes database systems from file systems. The locking mechanism is used to manage concurrent access to shared resources.

The lock type

Shared Lock (S Lock)

Allows a transaction to read a row of data

Exclusive Lock (X Lock)

Allows a transaction to delete or update a row of data

Intention Shared Lock (IS Lock)

A transaction wants to acquire shared locks on rows in a table

Intention Exclusive Lock (IX Lock)

A transaction wants to acquire exclusive locks on rows in a table

MVCC

Multi-Version Concurrency Control (MVCC) is InnoDB's concrete way of implementing the READ COMMITTED and REPEATABLE READ isolation levels. READ UNCOMMITTED always reads the latest row version, so it does not need MVCC. SERIALIZABLE requires locking every row read, which MVCC alone cannot provide.

Basic concept

The version number

  • System version number: is an incrementing number that is automatically incremented with each new transaction started.
  • Transaction version number: the system version number at the start of the transaction.

Hidden columns

MVCC keeps two hidden columns after each row to store two version numbers:

  • Create version number: indicates the system version number when a snapshot is created for a data row.
  • Delete version number: the system version number when the row snapshot was deleted. A snapshot whose delete version number is undefined or greater than the current transaction's version number is still valid; otherwise it has been deleted.

The Undo log

The snapshots used by MVCC are stored in the Undo log, which concatenates all snapshots of a Record through the rollback pointer.

The implementation process

The following implementation is for the repeatable read isolation level.

A key fact: when a transaction starts, its version number is greater than the creation version number of every existing row snapshot. A row snapshot's creation version is the system version at the time the snapshot was made, and the system version number increments with each new transaction, so a newly started transaction's version number exceeds them all.

SELECT

All transactions reading a row must see the same snapshot, the most recent valid one, with one exception: a transaction that is modifying the row can read its own changes, even though this differs from what other transactions read.

Call a transaction that is not modifying the row T. The snapshot T reads must have a creation version number less than or equal to T's version number; a greater number would mean the snapshot is a newer modification by another transaction and must not be read. In addition, the snapshot's delete version number must be undefined or greater than T's version number; otherwise the snapshot has already been deleted and should not be read.

INSERT

Use the current system version number as the creation version number of the data row snapshot.

DELETE

Use the current system version number as the deleted version number of the data row snapshot.

UPDATE

The current system version number becomes the delete version number of the pre-update row snapshot and the creation version number of the post-update snapshot; in effect, a DELETE is performed first, followed by an INSERT.

Snapshot read and current read

At the REPEATABLE READ level, MVCC makes reads repeatable, but the data read may be historical, not the database's current state! This can be a problem for business logic that is sensitive to data freshness.

This method of reading historical data is called snapshot read, and the method of reading the current version of the database is called current read. Obviously, in MVCC:

Snapshot read

An MVCC SELECT reads data from the snapshot and needs no lock.

SELECT * FROM table ...;

Current read

MVCC's other operations, the ones that modify the database (INSERT, UPDATE, DELETE), need to read the latest data and therefore take locks. As you can see, MVCC does not eliminate locking entirely; it only spares SELECT from locking.

INSERT;
UPDATE;
DELETE;

When performing a SELECT, you can explicitly force locking. The first statement below takes an S lock, the second an X lock.

SELECT * FROM table WHERE ? LOCK IN SHARE MODE;
SELECT * FROM table WHERE ? FOR UPDATE;

Transaction isolation levels are actually defined in terms of the current read. To shorten lock-holding time (including time spent waiting for other locks) and improve concurrency, MySQL introduced snapshot reads so that plain SELECT needs no lock, while current reads such as UPDATE and INSERT still achieve isolation through locking.

Lock algorithm

Record Lock

Locks an index on a record, not the record itself.

If the table has no index defined, InnoDB automatically creates a hidden clustered index on the primary key, so Record Locks can still be used.

Gap Lock

Locks the gaps between index entries, but not the entries themselves. For example, while one transaction executes the statement below, other transactions cannot insert the value 15 into column t.c.

SELECT c FROM t WHERE c BETWEEN 10 and 20 FOR UPDATE;

Next-Key Lock

It is a combination of Record Locks and Gap Locks, locking not only the indexes on a Record, but also the gaps between the indexes. For example, if an index contains the following values: 10, 11, 13, and 20, then the following ranges need to be locked:

(-∞, 10]
(10, 11]
(11, 13]
(13, 20]
(20, +∞)

In the InnoDB storage engine, the problem of unrepeatable reads for SELECT operations is solved by MVCC, and the problem of unrepeatable reads for UPDATE and DELETE operations is solved by Record Lock. INSERT’s non-repeatable read problem is solved by next-key Lock (Record Lock + Gap Lock).

The lock problem

Dirty read

A dirty read means that, across transactions, the current transaction can read data that another transaction has not yet committed.

Such as:

T1 modifies a piece of data and T2 then reads it. If T1 rolls back the modification, the data T2 read is dirty.

Non-repeatable read

A non-repeatable read means that reading the same data multiple times within one transaction yields different results.

Such as:

T2 reads a piece of data, then T1 modifies it. If T2 reads the data again, the result differs from its first read.


Phantom Problem

The so-called phantom problem occurs when, within a transaction, the same query produces different sets of rows at different times. For example, if a SELECT is executed twice and the second run returns a row the first did not, that row is a "phantom" row. In other words, running the same SQL statement twice in a row within the same transaction may return different results, including rows that did not exist before.

A phantom read is a special case of the non-repeatable read problem.

Lost update

The update operation of one transaction is overwritten by the update operation of another transaction.

Such as:

Transactions T1 and T2 both modify the same data; T1 writes first, T2 writes later, and T2's update overwrites T1's.

This can be solved by taking an exclusive lock in the SELECT (e.g. SELECT ... FOR UPDATE), though that may introduce performance problems, depending on the business scenario.

Sharding: splitting databases and tables

Horizontal partitioning

Horizontal partitioning, also known as Sharding, splits the records of one table across multiple tables with identical structure.

Sharding is the natural choice when a table's data keeps growing: the data can be distributed across different nodes of a cluster, easing the pressure on a single database.

Vertical partitioning

Vertical partitioning splits a table into multiple tables by columns, usually according to how closely the columns are related. It can also separate frequently used columns from rarely used ones.

At the database level, vertical partitioning deploys tables to different databases according to how tightly they belong together. For example, an e-commerce database can be vertically split into a product database and a user database.

Sharding strategy

  • Hash (key)%N
  • Range: Can be an ID range or a time range
  • Mapping table: Use a separate database to store mapping relationships

Problems with Sharding

Transaction issues

Use distributed transactions, such as the XA interface.

Joins

The original join can be broken down into multiple single-table queries and then joined in a user program.

ID uniqueness

  • Use globally unique IDs (GUIDs)
  • Specify an ID range for each shard
  • Distributed ID generators (e.g. Twitter’s Snowflake algorithm)

Replication

Master-slave replication

Replication mainly involves three threads: the binlog thread, the I/O thread, and the SQL thread (a replica status check is sketched after the list).

  • Binlog thread: Responsible for writing data changes on the primary server to the Binary log.
  • I/O thread: responsible for reading the binary log from the primary server and writing it to the secondary server's relay log.
  • SQL thread: Is responsible for reading the relay log, parsing out the data changes that have been made by the primary server and replaying them in the secondary server.
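On a replica, the state of the I/O and SQL threads can be checked like this (from MySQL 8.0.22 on, the statement is named SHOW REPLICA STATUS):

SHOW SLAVE STATUS\G
-- Key fields: Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master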

Reading and writing separation

The primary server handles write operations and reads with high real-time requirements, while the secondary server handles read operations.

Read/write separation improves performance because:

  • Master and slave servers are responsible for their own read and write, which greatly alleviates lock contention.
  • Secondary servers can use MyISAM to improve query performance and save system overhead.
  • Increase redundancy and improve availability.

Read/write separation is usually implemented by proxy. The proxy server receives the read/write request from the application layer and decides which server to forward the request to.

JSON

JSON data type is often used in actual business. There are two main requirements in the query process:

  1. Filtering the returned rows in the WHERE condition by a field inside the JSON document
  2. Returning only some fields of a JSON column as the result (to reduce memory footprint)

JSON_CONTAINS

JSON_CONTAINS(target, candidate[, path])

Returns 1 if the candidate value is found in the JSON document target at the specified path, and 0 otherwise

To check only whether any data exists at a path, use JSON_CONTAINS_PATH() instead.

mysql> SET @j = '{"a": 1, "b": 2, "c": {"d": 4}}';
mysql> SET @j2 = '1';
mysql> SELECT JSON_CONTAINS(@j, @j2, '$.a');
+-------------------------------+
| JSON_CONTAINS(@j, @j2, '$.a') |
+-------------------------------+
|                             1 |
+-------------------------------+
mysql> SELECT JSON_CONTAINS(@j, @j2, '$.b');
+-------------------------------+
| JSON_CONTAINS(@j, @j2, '$.b') |
+-------------------------------+
|                             0 |
+-------------------------------+

mysql> SET @j2 = '{"d": 4}';
mysql> SELECT JSON_CONTAINS(@j, @j2, '$.a');
+-------------------------------+
| JSON_CONTAINS(@j, @j2, '$.a') |
+-------------------------------+
|                             0 |
+-------------------------------+
mysql> SELECT JSON_CONTAINS(@j, @j2, '$.c');
+-------------------------------+
| JSON_CONTAINS(@j, @j2, '$.c') |
+-------------------------------+
|                             1 |
+-------------------------------+

JSON_CONTAINS_PATH

JSON_CONTAINS_PATH(json_doc, one_or_all, path[, path] …)

Returns 1 if data exists at the specified path, 0 otherwise

mysql> SET @j = '{"a": 1, "b": 2, "c": {"d": 4}}';
mysql> SELECT JSON_CONTAINS_PATH(@j, 'one', '$.a', '$.e');
+---------------------------------------------+
| JSON_CONTAINS_PATH(@j, 'one', '$.a', '$.e') |
+---------------------------------------------+
|                                           1 |
+---------------------------------------------+
mysql> SELECT JSON_CONTAINS_PATH(@j, 'all', '$.a', '$.e');
+---------------------------------------------+
| JSON_CONTAINS_PATH(@j, 'all', '$.a', '$.e') |
+---------------------------------------------+
|                                           0 |
+---------------------------------------------+
mysql> SELECT JSON_CONTAINS_PATH(@j, 'one', '$.c.d');
+----------------------------------------+
| JSON_CONTAINS_PATH(@j, 'one', '$.c.d') |
+----------------------------------------+
|                                      1 |
+----------------------------------------+
mysql> SELECT JSON_CONTAINS_PATH(@j, 'one', '$.a.d');
+----------------------------------------+
| JSON_CONTAINS_PATH(@j, 'one', '$.a.d') |
+----------------------------------------+
|                                      0 |
+----------------------------------------+

Actual use:

$conds = new Criteria();
$conds->andWhere('dept_code', 'in', $deptCodes);
if (!empty($aoiAreaId)) {
    $aoiAreaIdCond = new Criteria();
    $aoiAreaIdCond->orWhere("JSON_CONTAINS_PATH(new_aoi_area_ids, 'one', '$.\"$aoiAreaId\"')", '=', 1);
    $aoiAreaIdCond->orWhere("JSON_CONTAINS_PATH(old_aoi_area_ids, 'one', '$.\"$aoiAreaId\"')", '=', 1);
    $conds->andWhere($aoiAreaIdCond);
}

column->path and column->>path

Gets the value of the specified path

-> vs ->>

Whereas the -> operator simply extracts a value, the ->> operator in addition unquotes the extracted result.

mysql> SELECT * FROM jemp WHERE g > 2;
+-------------------------------+------+
| c                             | g    |
+-------------------------------+------+
| {"id": "3", "name": "Barney"} |    3 |
| {"id": "4", "name": "Betty"}  |    4 |
+-------------------------------+------+
2 rows in set (0.01 sec)

mysql> SELECT c->'$.name' AS name
    ->     FROM jemp WHERE g > 2;
+----------+
| name     |
+----------+
| "Barney" |
| "Betty"  |
+----------+
2 rows in set (0.00 sec)

mysql> SELECT JSON_UNQUOTE(c->'$.name') AS name
    ->     FROM jemp WHERE g > 2;
+--------+
| name   |
+--------+
| Barney |
| Betty  |
+--------+
2 rows in set (0.00 sec)

mysql> SELECT c->>'$.name' AS name
    ->     FROM jemp WHERE g > 2;
+--------+
| name   |
+--------+
| Barney |
| Betty  |
+--------+
2 rows in set (0.00 sec)

Actual use:

$retTask = AoiAreaTaskOrm::findRows(
    ['status', 'extra_info->>"$.new_aoi_area_infos" as new_aoi_area_infos', 'extra_info->>"$.old_aoi_area_infos" as old_aoi_area_infos'],
    $cond
);

Relational database design theory

Functional dependencies

A->B means that A functionally determines B; equivalently, B functionally depends on A.

If a set of one or more attributes {A1, A2, ..., An} of a relation functionally determines all other attributes of the relation, and the set is minimal, then that set is called a key.

For A->B, if some proper subset A' of A exists such that A'->B, then the dependency is a partial functional dependency; otherwise it is a full functional dependency.

If A->B and B->C, then A->C is a transitive functional dependency.

Anomalies

{Sno, Cname} -> {Sname, Sdept, Mname, Grade}; the key is {Sno, Cname}. In other words, once the student and the course are determined, the rest of the information is determined.

Sno  Sname      Sdept      Mname   Cname     Grade
1    Student-1  College-1  Dean-1  Course-1  90
2    Student-2  College-2  Dean-2  Course-2  80
2    Student-2  College-2  Dean-2  Course-1  100
3    Student-3  College-2  Dean-2  Course-2  95

Relations that do not conform to the normal forms exhibit anomalies, mainly the following four:

  • Redundant data: for example, Student-2 appears twice.
  • Modification anomaly: information is changed in one record, but the same information in another record is not.
  • Deletion anomaly: deleting one piece of information loses others. For example, deleting Course-1 requires deleting rows 1 and 3, so Student-1's information is lost.
  • Insertion anomaly: for example, a student's information cannot be inserted until the student has enrolled in a course.

Normal forms

Normal form theory is designed to eliminate the four anomalies above.

Each higher normal form builds on the ones below it; 1NF is the lowest.

First Normal Form (1NF)

Attributes are atomic and cannot be subdivided.

Second normal Form (2NF)

Every non-prime attribute is fully functionally dependent on the key.

It can be satisfied by decomposition.

Before decomposition

Sno  Sname      Sdept      Mname   Cname     Grade
1    Student-1  College-1  Dean-1  Course-1  90
2    Student-2  College-2  Dean-2  Course-2  80
2    Student-2  College-2  Dean-2  Course-1  100
3    Student-3  College-2  Dean-2  Course-2  95

In the student-course relation above, {Sno, Cname} is the key, and the following functional dependencies hold:

  • Sno -> Sname, Sdept
  • Sdept -> Mname
  • Sno, Cname-> Grade

Grade is fully functionally dependent on the key. It carries no redundancy: each student has one specific grade per course.

Sname, Sdept, and Mname are only partially dependent on the key. When a student takes more than one course, these values appear repeatedly, producing a lot of redundant data.

After decomposition

Relation-1

Sno  Sname      Sdept      Mname
1    Student-1  College-1  Dean-1
2    Student-2  College-2  Dean-2
3    Student-3  College-2  Dean-2

There are the following function dependencies:

  • Sno -> Sname, Sdept
  • Sdept -> Mname

Relation-2

Sno  Cname     Grade
1    Course-1  90
2    Course-2  80
2    Course-1  100
3    Course-2  95

There are the following function dependencies:

  • Sno, Cname -> Grade

Third normal Form (3NF)

No non-prime attribute is transitively functionally dependent on the key.

Relation-1 above contains the following transitive functional dependency:

  • Sno -> Sdept -> Mname

It can be decomposed as follows:

Relation-11

Sno  Sname      Sdept
1    Student-1  College-1
2    Student-2  College-2
3    Student-3  College-2

Relation-12

Sdept      Mname
College-1  Dean-1
College-2  Dean-2
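As a sketch, the final decomposition could be expressed as table definitions like these (names and types are assumed for illustration):

CREATE TABLE student (
  Sno   INT PRIMARY KEY,
  Sname VARCHAR(50),
  Sdept VARCHAR(50)          -- Mname moved out to remove Sno -> Sdept -> Mname
);

CREATE TABLE department (
  Sdept VARCHAR(50) PRIMARY KEY,
  Mname VARCHAR(50)          -- the dean depends only on Sdept
);

CREATE TABLE score (
  Sno   INT,
  Cname VARCHAR(50),
  Grade INT,
  PRIMARY KEY (Sno, Cname)   -- Grade fully depends on the whole key
);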

ER diagrams

An entity-relationship (ER) diagram has three components: entities, attributes, and relationships.

It is used for the conceptual design of relational database systems.

Three kinds of relationships between entities

There are three types: one-to-one, one-to-many, and many-to-many.

  • If A to B is one-to-many, then draw A line segment with an arrow pointing to B;
  • If it’s one to one, draw two line segments with arrows;
  • If it’s many-to-many, draw two line segments without arrows.

The relationship between Course and Student is one-to-many.

Representing an entity that appears more than once in a relationship

If an entity appears several times in one relationship, it must be connected by that many lines.

The figure below shows the prerequisite relationship for Course: the relationship involves the Course entity twice, once as the prerequisite course and once as the course that follows it, so two lines are needed to represent it.

Multi-way relationships

Although one teacher can teach multiple courses and multiple students, for a given student and a given course there is exactly one teacher, so Teacher, Course, and Student form a three-way (ternary) relationship.

Representing subclasses

A triangle and two lines connect a class to its subclass. Attributes and relationships specific to the subclass attach to the subclass; those shared with the parent attach to the parent class.

The resources

  • Jiang Chengyao. MySQL Technical Insider: InnoDB Storage Engine [M]. China Machine Press, 2011.
  • CS-Notes-MySQL
  • B+ tree
  • Red-black trees, B(+) trees, skip lists, and AVL trees: data structures, application scenarios, and analysis
  • Comparison of B trees, B+ trees, red-black trees, and AVL trees
  • MySQL Reference Manual, 8.8.2 EXPLAIN Output Format
  • The most official MySQL EXPLAIN type field explanation
  • MySQL Reference Manual, 12.18.3 Functions That Search JSON Values

conclusion

This is all fairly basic knowledge. I didn't expect to have forgotten most of it, but I'm glad a holiday like this gave me the chance to pick it up again.

To be honest, since I started doing self-media I've had much less time to recharge and to study my tech stack in depth. The long National Day holiday gave me time to reflect: when I'm back I'll keep compressing my spare time, for example cutting down on phone and Bilibili time, and recharging on schedule. I've already managed a routine of early to bed and early to rise. Let's keep at it together.


This article is continuously updated. Search "Santaizi Aobing" on WeChat to read new posts first; reply [Information] to get the big-company interview materials and resume templates I've prepared.