Reference: www.nowcoder.com/discuss/573… Source: Niuke.com (nowcoder)

Functional dependencies and database normal forms

Functional dependencies

  • Example: name is functionally dependent on student ID: student ID → name
  • Full functional dependency: X →F Y (Y depends on all of X, not on any proper subset of X)
  • Partial functional dependency: X →P Y (Y is not fully dependent on X; X contains redundant attributes)
  • Transitive functional dependency: X →T Z (Z depends on Y, and Y depends on X)

In the illustrated example: score is fully functionally dependent on (student ID, course name); name and department name are partially functionally dependent on (student ID, course name); dean is transitively functionally dependent on student ID. See: What exactly do normal forms 1, 2, and 3 say? – Liu Wei's article – Zhihu

Key (code)

  • Candidate key: K is a candidate key of a table if all attributes other than K are fully functionally dependent on K
  • Candidate keys are referred to simply as keys; one of them is chosen as the primary key
  • Prime attribute: an attribute that appears in at least one candidate key
  • Non-prime attribute: any attribute that is not a prime attribute

Database normal forms

  • 1NF: every attribute is atomic (columns cannot be subdivided)
  • 2NF: via schema decomposition, eliminates partial functional dependencies of non-prime attributes on the key (decomposition sketched after this list)
  • 3NF: on top of 2NF, eliminates transitive functional dependencies of non-prime attributes on the key
  • BCNF: on top of 3NF, also eliminates partial and transitive functional dependencies of prime attributes on the keys (i.e., on other candidate keys)
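
As a concrete illustration of 2NF and 3NF, here is a minimal sketch decomposing the student/score example from the functional-dependency section above; the table and column names are assumptions for illustration, not from the original post.

```sql
-- A single table with key (student_id, course) would violate 2NF, since
-- name, department, and dean depend on student_id alone (part of the key).

-- 2NF: split off the attributes that depend on only part of the key.
CREATE TABLE score (
  student_id INT,
  course     VARCHAR(50),
  score      INT,
  PRIMARY KEY (student_id, course)   -- score fully depends on the whole key
);

-- 3NF: dean depends on department, which depends on student_id (transitive),
-- so dean moves into its own table.
CREATE TABLE student (
  student_id INT PRIMARY KEY,
  name       VARCHAR(50),
  department VARCHAR(50)
);

CREATE TABLE department (
  department VARCHAR(50) PRIMARY KEY,
  dean       VARCHAR(50)
);
```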

ER diagrams

  • Basics (entity: rectangle; attribute: ellipse; relationship: diamond)
    • How exactly do you draw an ER diagram? – answer on Zhihu
  • Advanced (weak entity; associative entity; composite attribute; multi-valued attribute; derived attribute; optional attribute; relationship attribute)
    • Database ER diagram: basic concepts, organized

2 Indexes

Index features

A table can have any number of indexes, and each index can combine any number of fields. An index may speed up the queries that use it, but it always slows down writes, because the indexed column values must be copied and maintained, which also adds storage overhead. On short tables an index lookup may be no faster than a sequential scan; on very long tables the index may not fit in memory, which causes problems of its own.

  • Index advantages: 1. natural ordering; 2. faster lookups.
  • Index disadvantages: 1. extra storage space; 2. slower table updates.

Classification of indexes (by form)


  • Single-column index: an index containing only one field. (Creation statements are sketched after this list.)
  • Composite index: an index containing two or more fields (also called a compound index). When building a composite index, the order of the fields is extremely important.
  • Unique index: values in the indexed column must be unique, though NULLs are allowed; for a composite unique index, the combination of column values must be unique. Uniqueness is checked on every INSERT or UPDATE. (A non-unique index, after matching a value, must still check whether the next entry also matches.)
  • Primary key index: unique + NOT NULL
  • Clustered index: unique + NOT NULL (defaults to the primary key; each table has exactly one clustered index)
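
A minimal sketch of the index types above, reusing the websites table from the SQL examples later in this post (the url column is assumed for illustration):

```sql
CREATE INDEX idx_name ON websites (name);                     -- single-column index
CREATE INDEX idx_country_alexa ON websites (country, alexa);  -- composite index:
                                                              -- column order matters
CREATE UNIQUE INDEX uk_url ON websites (url);                 -- unique index (NULLs allowed)
ALTER TABLE websites ADD PRIMARY KEY (id);                    -- primary key: unique + NOT NULL
```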

Classification of indexes (by storage)

What is a back-to-table lookup in an index, and how can it be avoided? – answer on Zhihu

  • Clustered index: the order of the clustered index is the physical order of the data on disk, so a table can have only one clustered index. By default it is the primary key; if the table has no primary key, InnoDB picks a unique non-null index instead, and failing that, it implicitly creates a hidden primary key.
  • Non-clustered index (also called secondary index): if the index does not contain all the fields the query needs, the engine uses the primary key value stored in the index leaf node to look the remaining fields up in the clustered index. This extra lookup is called going back to the table.

Covering indexes, back-to-table queries, and the leftmost-prefix principle

  • Back to table: if the index does not contain all the fields the query needs, the engine must fetch them from the clustered index using the primary key value stored in the index leaf node. This process is called going back to the table.
  • Covering index: the secondary index already contains all the information the query needs, so the result is returned straight from the index without a back-to-table lookup, which can greatly improve query speed. It is not an index type but a query optimization; in general, the point of a composite index is to cover queries and reduce back-to-table lookups. (see the sketch after this list)
  • Leftmost-prefix rule: a composite index can be used if the query condition matches one or more consecutive columns starting from the leftmost column of the index. Matching stops at the first range predicate (>, <, BETWEEN, LIKE).
  • Index tree scan: if a query selects only columns of a composite index (e.g., SELECT c; SELECT a,b; SELECT a,c; SELECT a,b,c), MySQL may use a type = index scan. This avoids the back-to-table step but is far less efficient than a normal index lookup. Unlike type = ALL, an index scan walks the index tree, while ALL scans the table data on disk.
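
A sketch of these three ideas on a hypothetical table t with a composite index on (a, b, c):

```sql
CREATE INDEX idx_abc ON t (a, b, c);

-- Leftmost prefix (a, b) matches; the index also covers the selected columns,
-- so no back-to-table lookup is needed.
SELECT a, b, c FROM t WHERE a = 1 AND b = 2;

-- The range predicate on b stops the prefix match: the index serves a and the
-- range on b; c is filtered afterwards.
SELECT * FROM t WHERE a = 1 AND b > 2 AND c = 3;

-- No leftmost prefix: MySQL may still walk the whole index tree (type = index),
-- which avoids the back-to-table step but is far slower than an index lookup.
SELECT a, c FROM t WHERE c = 3;
```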

Index data structures

  • Interviewer: why do MySQL indexes use B+ trees instead of other trees, such as B-trees?
  • Why B+ trees
    • Compared with a B-tree or binary tree, each B+ tree node holds more keys, so the tree has fewer levels.
    • Compared with a B-tree, every query descends all the way to a leaf, so query cost is more stable.
    • Compared with a B-tree, binary tree, or hash, the leaf nodes form a doubly linked list, which makes range queries cheap.
    • A hash index wins decisively on equality lookups, but it cannot serve range queries or sorting, and it does not support leftmost-prefix matching.
  • Pros and cons of B-trees vs B+ trees: don't dismiss the B-tree as worthless
    • Why does MongoDB index with B-trees while MySQL indexes with B+ trees? – Lonely Smoke's article – Zhihu
    • For single-record lookups, a B-tree performs better on average, since a match can be found before reaching a leaf. But B-tree nodes have no sibling pointers, so B-trees are poor at traversal. MongoDB sees few range queries and many single-document lookups, hence B-trees.
    • In a B+ tree, data lives only in the leaf nodes, so single-record lookup cost is very stable, though on average slightly worse than a B-tree. The pointers between leaf nodes make B+ trees well suited to range queries.

3 Transactions

Programmer, do you know how ACID works in MySQL? – Lonely Smoke's article – Zhihu

Atomicity

A transaction is an indivisible unit of work: either all of its operations are performed or none are. A transfer either fully succeeds or fully fails; there is no intermediate state.

  • Undo log: the rollback log, the key to atomicity. When a transaction rolls back, the undo log reverses all SQL statements that had already executed. (see the transfer sketch below)
  • When rollback happens: the transaction calls ROLLBACK explicitly, or fails for an unexpected reason.
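
A minimal transfer sketch showing atomicity; the account table is hypothetical:

```sql
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 'A';
UPDATE account SET balance = balance + 100 WHERE id = 'B';
COMMIT;      -- both updates become visible together ...
-- ROLLBACK; -- ... or the undo log reverses the executed statements,
             -- as if the transaction never ran
```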

Consistency

Consistency means the data is in a legal state, semantically rather than merely syntactically, both before and after the transaction executes. For example: primary keys are non-null, referential integrity holds, balances stay non-negative.

  • At the database level, consistency is achieved by implementing A, I, and D.
  • At the developer level, take care not to write transactions that violate business constraints (e.g., a transfer that debits account A but never credits account B).

Isolation

When multiple transactions are executed concurrently, the operations inside the transaction are isolated from other transactions, and the concurrently executed transactions cannot interfere with each other.

  • MVCC and next-key locks are InnoDB's tools for achieving isolation at the REPEATABLE READ level.

Durability

Once a transaction commits, its changes to the database are permanent; subsequent operations or failures must not affect them. (Without durability, a crash right after COMMIT would lose every change that existed only in memory, leaving the database unmodified.)

  • When data is modified, the change is made in memory and also recorded in the redo log.
    • The redo log records only the changes a transaction makes to data pages, so it is small and written with sequential I/O; it can also be buffered in memory before being flushed to disk. In short, it is far cheaper than flushing the data pages themselves in real time.
  • When the database restarts, changes from committed transactions in the redo log are written to the data files. (see the sketch below)
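
For reference, the redo log's flush policy is configurable; a quick way to inspect it (the variable name is a standard InnoDB setting):

```sql
-- 1 (default): flush the redo log to disk at every COMMIT, for full durability.
-- 0 or 2: flush roughly once per second, which is faster but can lose up to
--         about a second of committed transactions on a crash.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
```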

Isolation: problems caused by concurrent transactions

Software Architecture Design: fusing technical and business architecture for large websites; see also MySQL transaction isolation levels and how they are implemented (just read this one!) – Lao Liu's article – Zhihu

  • Dirty read: reading data another transaction has not yet committed (it may be rolled back and never stored in the database).
  • Non-repeatable read: the same rows read at different times within one transaction return different values (e.g., another transaction changed and committed them in between).
  • Phantom read: two identical queries within one transaction return different row sets. For example, transaction A updates some rows; transaction B then inserts and commits rows identical to A's pre-update rows; when A queries again, its own update appears not to have taken effect.
  • Lost update: two transactions modify the same record concurrently and one result overwrites the other. Developers must prevent this with explicit locking.

Transaction isolation level

  • Read uncommitted: takes no locks, so it performs best, but it solves none of the concurrency problems.
  • Read committed: reads only committed data, solving dirty reads. Other transactions can still commit mid-transaction, so non-repeatable reads remain possible.
  • Repeatable read: a transaction does not see changes other transactions make to existing rows, even after they commit. In MVCC databases, RR generally corresponds to snapshot isolation, under which phantom reads do not occur. (see the sketch after this list)
  • Serializable: reads take shared locks and writes take exclusive locks. Solves phantom reads and lost updates.
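
A sketch of switching levels and observing a non-repeatable read, using the char_encode table from the example further down:

```sql
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
SELECT codepoint FROM char_encode WHERE glyph = 'a';
-- If another session commits an update to this row now, repeating the SELECT
-- inside this transaction returns the new value: a non-repeatable read.
SELECT codepoint FROM char_encode WHERE glyph = 'a';
COMMIT;

-- Under REPEATABLE READ (InnoDB's default) both SELECTs return the same value.
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```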

MVCC

  • Each row in InnoDB can have multiple versions; trx_id records which transaction generated each version.
  • The old versions are not stored physically; each is reconstructed on demand from the undo log.
  • At the heart of MVCC is the read view: an array of active transaction IDs that each transaction maintains for itself.

MVCC implements repeatable snapshot reads

  • If trx_id is below the low watermark, the version was committed before this transaction started: visible.
  • If trx_id is above the high watermark, the version was generated after this transaction started: not visible.
  • If trx_id lies between the low and high watermarks, there are two cases:
    • trx_id is in the array: that version's transaction was still uncommitted when this transaction started: not visible.
    • trx_id is not in the array: that version was already committed when this transaction started: visible.
  • How does MySQL implement repeatable reads? – Super super can’t fly article – Zhihu

Repeatable read vs read committed

  • Repeatable read generates its snapshot at the transaction's first snapshot read.
  • Read committed generates a fresh snapshot at every snapshot read.

MySQL phantom read problem at RR isolation level

  • Is phantom read at MySQL's RR isolation level resolved by MVCC or by next-key locks? – answer by Deng Xiaoxian – Zhihu
  • Since InnoDB uses MVCC, why can't REPEATABLE READ eliminate phantom reads? – answer by IN355Hz – Zhihu

MySQL does not implement strict snapshot isolation: it allows concurrent transactions to update data committed after the snapshot was taken. As a result, MySQL shows an anomaly other MVCC databases do not. A current read (e.g., UPDATE) performed after a snapshot read sees and updates the latest committed version, so it may observe data that differs from the earlier snapshot read; snapshot reads after that point then see the updated version. Strictly speaking this is not a phantom read but a multi-versioning anomaly.

```
mysql> SELECT * FROM char_encode WHERE glyph = 'a';
+-------+-----------+
| glyph | codepoint |
+-------+-----------+
| a     |        97 |
+-------+-----------+
1 row in set (0.03 sec)

-- (another transaction commits an update to this row here)

-- UPDATE implicitly performs a SELECT ... FOR UPDATE: a current read.
mysql> UPDATE char_encode SET codepoint = codepoint + 1 WHERE glyph = 'a';
Query OK, 1 row affected (0.07 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM char_encode WHERE glyph = 'a';
+-------+-----------+
| glyph | codepoint |
+-------+-----------+
| a     |       101 |
+-------+-----------+
```

4 Locks

Optimistic locks and pessimistic locks

Optimistic and pessimistic locks are concepts, not concrete locks.

  • Pessimistic lock: lock before reading. Can cause deadlocks and is inefficient under high concurrency. Row locks, table locks, read locks, and write locks, all taken before the operation, are pessimistic.
  • Optimistic lock: do not lock on read; at update time, check whether the data changed in the meantime (e.g., a version bump), and redo the work if it did. Implemented with version numbers or the CAS algorithm. Suitable for read-heavy workloads; otherwise frequent retries make it inefficient. (see the sketch below)
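
A common optimistic-locking sketch with a version column (hypothetical account table; the application re-reads and retries when no row is updated):

```sql
-- Read without locking; remember the version.
SELECT balance, version FROM account WHERE id = 'A';   -- say it returns version 3

-- Update only if nobody changed the row since the read.
UPDATE account
SET    balance = 900, version = version + 1
WHERE  id = 'A' AND version = 3;
-- 0 rows affected => another transaction got there first: re-read and retry.
```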

Classification by lock granularity

  • Table locks: the granularity used by DDL statements. Intention locks are also placed at the table level.
  • Row locks: lock a single record; the usual granularity.
  • Gap locks: lock a range (gap), preventing other transactions from inserting into it, so phantom reads cannot occur.
  • Next-key lock: record lock + gap lock; an interval open on the left and closed on the right.
  • Insert intention locks (II gap locks): taken before executing an INSERT statement; they conflict with gap and next-key locks, which is exactly what blocks inserts into a locked gap.

Classification by lock mode

  • Shared lock: implemented by adding LOCK IN SHARE MODE to a SELECT. (A plain SELECT uses a snapshot and takes no lock; locks apply only to current reads of the latest version.)
  • Exclusive lock: implemented by adding FOR UPDATE to a SELECT. (see the sketch after this list)
  • Intention locks: before a row inside a table is locked, an intention lock of the matching type is placed on the table. Transactions that want to lock the whole table thus learn of existing row locks in advance and can judge compatibility cheaply.
  • AUTO_INC locks are mutually incompatible: only one can be held on a table at a time. An allocated auto-increment value is not reclaimed if the transaction rolls back, so auto-increment sequences can have gaps.
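
A sketch of the two lock modes on current reads (syntax as in MySQL 5.7; MySQL 8.0 also accepts FOR SHARE):

```sql
SELECT * FROM Websites WHERE id = 1 LOCK IN SHARE MODE;  -- shared (S) lock
SELECT * FROM Websites WHERE id = 1 FOR UPDATE;          -- exclusive (X) lock
-- A plain SELECT is a snapshot read and takes no lock at all.
```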

How locking works

  • MySQL locks and their principles, illustrated – Zhihu. # TODO: the analysis is very detailed, but skip it when time is short.
  • MySQL transaction isolation levels and how they are implemented (just read this one!) – Lao Liu's article – Zhihu

In MySQL, row-level locks do not lock records directly; they lock index entries.

  • If a statement operates on a primary key index: Mysql locks the primary key index;
  • If a statement operates on a non-primary key index: MySQL locks the non-primary key index and then the related primary key index.
  • If there is no index: InnoDB locks all data in the table by hiding the clustered index, which works like a table lock. Because there is no index, to find a record, you have to scan the entire table, and to scan the entire table, you have to lock the table.

The necessary conditions for deadlock formation

  • Mutual exclusion: only one process can use a resource at a time.
  • Hold and wait: a process holds at least one resource while waiting to acquire resources held by other processes.
  • No preemption: a resource is released only voluntarily, after its holder finishes its task.
  • Circular wait: a cycle of n processes exists, each waiting for a resource held by the next. (see the sketch below)
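
A minimal two-session sketch that satisfies all four conditions and trips InnoDB's deadlock detector (hypothetical table t with rows id = 1 and id = 2):

```sql
-- Session 1:
BEGIN;
UPDATE t SET v = v + 1 WHERE id = 1;   -- locks row 1

-- Session 2:
BEGIN;
UPDATE t SET v = v + 1 WHERE id = 2;   -- locks row 2
UPDATE t SET v = v + 1 WHERE id = 1;   -- blocks, waiting for session 1

-- Session 1:
UPDATE t SET v = v + 1 WHERE id = 2;   -- circular wait; InnoDB picks a victim:
-- ERROR 1213 (40001): Deadlock found when trying to get lock
```

Having both sessions update the rows in the same order would break the circular-wait condition.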

Avoid deadlock

# TODO: review the textbook and fill this in: acquire locks in a fixed order, acquire all locks at once, etc.

5 Concrete SQL usage

See for yourself: www.runoob.com/sql/sql-cre…

LIKE

The following SQL selects all customers whose name starts with the letter "G" (the "%" wildcard matches any sequence of characters before or after the pattern):

```sql
SELECT * FROM Websites
WHERE name LIKE 'G%';
```

Names ending with the letter "k":

```sql
SELECT * FROM Websites
WHERE name LIKE '%k';
```

Names containing the pattern "oo":

```sql
SELECT * FROM Websites
WHERE name LIKE '%oo%';
```

The underscore wildcard "_" matches exactly one character:

```sql
SELECT * FROM Websites
WHERE name LIKE 'G_o_le';
```

With REGEXP, select sites whose name does not start with a letter from A to H:

```sql
SELECT * FROM Websites
WHERE name REGEXP '^[^A-H]';
```

COUNT

```sql
-- Count all records
select count(*) from access_log;

-- Count the records whose alexa column is not NULL in websites
select count(alexa) from websites;

-- Count the distinct values of country in websites
select count(distinct country) from websites;
```

DROP, TRUNCATE, and DELETE

  • Delete:
    • DELETE is DML (Data Manipulation Language). The operation is written to the rollback segment and takes effect only when the transaction commits; any associated trigger fires at execution time. (A side-by-side sketch follows this list.)
    • Use DROP and TRUNCATE with caution. To delete some rows, use DELETE and constrain the scope with a WHERE clause; the rollback segment must be large enough.
    • For tables referenced by a FOREIGN KEY constraint, use DELETE: TRUNCATE is not logged row by row and does not fire triggers.
  • Drop:
    • Deletes both the table data and the table structure, freeing all space the table occupied.
  • Truncate:
    • Deletes only the data in the table, not the table itself. After TRUNCATE, the space occupied by the table and its indexes is reset to the initial size.
    • TRUNCATE applies only to tables, not views.
    • TRUNCATE and DROP are DDL (Data Definition Language): they take effect immediately, write nothing to the rollback segment, cannot be rolled back, do not fire DELETE triggers, and run fast.
  • MySQL: DROP, TRUNCATE, and DELETE compared
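
A side-by-side sketch on the access_log table from the COUNT examples:

```sql
DELETE FROM access_log WHERE aid = 1;  -- DML: row by row, logged, can ROLLBACK
TRUNCATE TABLE access_log;             -- DDL: empties the table instantly, no rollback
DROP TABLE access_log;                 -- DDL: removes data and table definition
```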

ORDER BY

```sql
SELECT * FROM Websites
ORDER BY country ASC, alexa DESC;
```

  • ORDER BY A, B: A and B both ascending (the default)
  • ORDER BY A DESC, B: A descending, B ascending
  • ORDER BY A, B DESC: A ascending, B descending

GROUP BY

```sql
SELECT Websites.name, COUNT(access_log.aid) AS nums
FROM access_log LEFT JOIN Websites
ON access_log.site_id = Websites.id
GROUP BY Websites.name;
```

The aggregate function is computed per GROUP BY group; without a GROUP BY, the aggregate treats the whole result set as one big group.

HAVING

```sql
SELECT Websites.name, SUM(access_log.count) AS nums
FROM Websites INNER JOIN access_log
ON Websites.id = access_log.site_id
WHERE Websites.alexa < 200
GROUP BY Websites.name
HAVING SUM(access_log.count) > 200;
```

HAVING was added to SQL because WHERE cannot be used with aggregate functions; a HAVING clause filters the groups produced by GROUP BY.

JOIN: the various join types

  • Self-join: a single table joined with itself under multiple aliases; on the surface, a multi-table join query

```sql
SELECT b.lastStation, b.nextStation, a.lastStation, a.nextStation
FROM bus_sche a, bus_sche b
WHERE b.nextStation = a.lastStation;
```
  • Inner join: only rows matching the join condition are listed; written INNER JOIN, or simply JOIN. An inner join without a join condition keeps every row combination (a Cartesian product), behaving much like the cross join below.
    • Equijoin: SELECT * FROM Table_A A JOIN Table_B B ON A.id = B.id;
    • Non-equijoin: SELECT * FROM Table_A A JOIN Table_B B ON A.id > B.id;
    • Natural join: SELECT * FROM Table_A NATURAL JOIN Table_B;
      • Joins on the identically named columns with '=' and removes the duplicate columns from the result
      • In join queries, ON is usually more efficient than WHERE: ON filters during the join, while WHERE filters only after all row combinations are produced.
  • Cross join
    • Without a WHERE clause, returns the Cartesian product of all rows of the two joined tables
    • SELECT * FROM Table_A CROSS JOIN Table_B;
  • Outer join
    • An outer join does not list only the rows that match the join condition
    • All rows of the left table (left outer join), the right table (right outer join), or both tables (full outer join) that match the search criteria are included as well
    • Fields with no match in the other table are filled with NULL (see the sketch below)
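
A sketch with the post's Websites/access_log tables: LEFT JOIN keeps every website, filling NULL where no log row matches:

```sql
SELECT w.name, l.count
FROM Websites w
LEFT JOIN access_log l ON w.id = l.site_id;
-- Websites with no access_log rows appear once, with l.count = NULL.
```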

Execution sequence of various keywords

SQL writing order and execution order – Zhang Xiao’s article – Zhihu

```sql
(8)  SELECT (9) DISTINCT <select_list>
(1)  FROM <left_table> (3) <join_type> JOIN <right_table>
(2)  ON <join_condition>
(4)  WHERE <where_condition>
(5)  GROUP BY <group_by_list>
(6)  WITH {CUBE|ROLLUP}
(7)  HAVING <having_condition>
(10) ORDER BY <order_by_list>
(11) LIMIT <limit_number>
```
  • JOIN happens first, filtered by ON; for an outer join, unmatched rows are added back afterwards
  • WHERE filters records
  • GROUP BY groups; HAVING filters groups
  • SELECT picks the output columns; DISTINCT deduplicates
  • Finally, ORDER BY sorts and LIMIT truncates

MySQL data types

  • Three categories: numeric; string; date/time
  • What data types does MySQL have? – 51Testing's article – Zhihu

Optimization techniques and hands-on experience

Diagnosing and fixing slow queries

(Explain, slow query logs, show profile, etc.)

  • Explain: what you need to know about EXPLAIN before a MySQL interview – Java3y
  • Slow query log: effectively traces queries that run too long or use no index, helping you find the statements to optimize. MySQL Core Technology and Best Practice – Zhihu
  • Show profile: finds the stage that takes the longest, i.e., the slowest part, to analyze why a slow SQL statement is slow. The SQL performance analysis tool show profile – Ju Qian's article – Zhihu

EXPLAIN output fields

This one really saved me: MySQL index optimization and EXPLAIN, explained very clearly – Zhihu

  1. id: larger values execute earlier; for equal values, execution order is top to bottom.
  2. select_type:
    • simple: a simple SELECT, with no UNION and no subquery.
    • primary: if the statement contains subqueries, the outermost SELECT is marked primary.
    • subquery: a subquery inside SELECT or WHERE is marked subquery.
    • derived: a subquery in FROM is marked derived; its result goes into a temporary table.
    • union / union result: with two SELECTs joined by UNION, the second is marked union and the merged result is marked union result.

  3. table: the table name (the alias, if one is used).
  4. type: the access/index type MySQL used; different types differ greatly in efficiency. From best to worst: system, const, eq_ref, ref, fulltext, ref_or_null, unique_subquery, index_subquery, range, index_merge, index, ALL (no index).
  5. possible_keys: the indexes that exist on the fields involved in the query: candidates, not necessarily used.
  6. key: the index actually used by the query.
  7. key_len: the number of index bytes used for the query; shorter is better, as long as accuracy is not sacrificed.
  8. ref: what is compared against the index; const for a constant equality lookup, or the joined column in a join query.
  9. rows: the estimated number of rows scanned in the execution plan, not an exact value.
  10. filtered: the percentage of rows returned by the storage engine that survive the Server-layer filter.
  11. extra: additional information. (A usage sketch follows this list.)
    • Using filesort: bad
    • Using temporary: bad
    • Using index: good
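
A usage sketch of the EXPLAIN fields above (the query and index are illustrative):

```sql
EXPLAIN SELECT name FROM websites WHERE country = 'CN' ORDER BY alexa;
-- Read the output roughly as:
--   type  : ref/range is healthy; ALL means a full table scan
--   key   : the index actually chosen (NULL = none)
--   rows  : estimated rows to examine
--   Extra : "Using index" is good; "Using filesort" / "Using temporary" are bad
```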

Using indexes wisely

  • www.cnblogs.com/orchidbaby/…
  1. If the table is small, it may not need an index at all. Indexes must be maintained whenever records change, so weigh the gains against the costs (space + maintenance time) before creating one.
  2. Which fields deserve an index? Those used in WHERE, ORDER BY, or GROUP BY.
  3. Take Select * from students where name='zhang3' and age=18; as an example:
    1. Separate indexes on name and age: MySQL generally picks one of the two, more likely name, because it tracks the number of duplicates in each index and prefers the column with fewer duplicates.
    2. A composite index on (name, age): the best match for this query, but compared with a single index it costs more to maintain and its index data occupies more storage. But!!! Is the composite index necessary? Usually not: out of 10,000 students, will more than five be named Xie Chunhua? Picking one row out of five is far cheaper than maintaining a composite index.
  4. When is a composite index worthwhile? For example, a college course system needs a relation table with two fields, student_id and teacher_id, to query whether a given teacher teaches a given student. (see the sketch after this list)
    • Say a student picks 50 teachers and a teacher leads 200 students.
    • With an index on student_id alone, 50 records are fetched via the index and the rest filtered in memory; with an index on teacher_id alone, 200 records are fetched and the non-matching students filtered in memory.
    • Neither is optimal, because the candidate set is still large after using the index. Here a composite index fits best: the matching record is found directly through the index, roughly doubling efficiency.
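
A sketch of the mapping-table example (the names are illustrative):

```sql
CREATE TABLE student_teacher (
  student_id INT NOT NULL,
  teacher_id INT NOT NULL,
  PRIMARY KEY (student_id, teacher_id)  -- composite index on (student_id, teacher_id)
);

-- Both columns hit the composite index: the row is located directly.
SELECT 1 FROM student_teacher WHERE student_id = 42 AND teacher_id = 7;

-- Leftmost prefix: the same index still serves student_id-only lookups (~50 rows).
SELECT teacher_id FROM student_teacher WHERE student_id = 42;
```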

Big table optimization

  • How to optimize MySQL for tables with tens of millions of rows? – Zhuqz's answer – Zhihu
  • First, optimize your SQL and indexes.
  • Second, add a cache: memcached, redis.
  • Third, if it is still slow after all of the above, add master-slave or master-master replication with read/write splitting; this can be done at the application layer and is efficient.
  • Fourth, if it is still slow, do not jump straight to sharding: try MySQL's built-in partitioned tables first. Partitioning is transparent to the application and needs no code changes, but SQL statements must be optimized for the partitioning key.
  • Fifth, if all of the above is done, split vertically: based on module coupling, break one large system into several smaller systems, i.e., a distributed system.
  • Sixth, for tables with truly huge data volumes, split horizontally (sharding). This step is the most troublesome and the real test of skill: choose a reasonable sharding key, adjust the table structure with some redundancy for good query efficiency, and change the application so queries go through the sharding key instead of scanning all tables.

Redis

# TODO: needs fleshing out. Why use Redis? Why is Redis so fast? – Fat Dog's article – Zhihu

  1. Pub/sub
  2. Data eviction policies (the several kinds)
  3. Dictionaries and progressive rehash
  4. RDB and AOF
  5. Why it is fast (in-memory database, non-blocking I/O, I/O multiplexing, single-threaded, hash tables, skip lists, etc.)
  6. Redis's main data structures