Reference since the problem: www.zhihu.com/question/44… Mysql, a table with 300 million data, undivided table, the requirement is to add a column of data to this large table. The database can not stop, and there are add, delete and change operations. How do I operate it? The answer is personally original

MySQL > add a column to MySQL

ALTER TABLE ADD COLUMN char(128);Copy the code

Will cause the lock table, simple process is as follows:

  1. Let’s create a new Table2 that’s completely isomorphic to Table1
  2. Alter TABLE Table1 add write lock
  3. Execute on table Table2ALTER TABLE ALTER TABLE ADD COLUMN char(128)
  4. Copy the data in Table1 to Table2
  5. Rename Table2 to Table1 and remove Table1, releasing all associated locks

If the amount of data is very, very large, the table lock will take a long time, during which all table updates will be blocked, and online services cannot be normally executed.

For MySQL 5.6 (not included), the trigger is used to repeat the update of one table on another table and synchronize the data. When the data synchronization is complete, the name of the table is changed and the new table is published. Business will not be suspended. Trigger Settings are similar to:

Create trigger person_trigger_update AFTER UPDATE on for each row begin set @x = "trigger UPDATE"; Replace into new table SELECT * from old table where old table. Id = old table. Id; END IF; end;Copy the code

MySQL 5.6 (included) later introduced online DDL functionality:

Alter table, your watch ALGORITHM [=] {the DEFAULT | INSTANT | INPLACE | COPY}, LOCK [=] {the DEFAULT | NONE | SHARED | EXCLUSIVE}Copy the code

Parameters:

  • ALGORITHM:
    • DEFAULT: The DEFAULT mode. In MySQL 8.0, if ALGORITHM is not displayed, INSTANT ALGORITHM is preferred. If INPLACE ALGORITHM is not available, COPY ALGORITHM is used
    • INSTANT: 8.0 new added algorithm, add column is returned immediately. But it cannot be a virtual column. This principle is very simple, to create a new column and table all the original data and not change immediately, just inside the table dictionary to record this column and a default value, for the default Dynamic row format (in fact, the Compressed varieties), if the update this column is original data marked as deleted after appended to update the record. This means that the column space is not reserved in advance, and subsequent updates may often result in row record space changes. But for most businesses, it is the most recent record that changes, so it is not a problem.
    • INPLACE: directly modify the original table, do not copy the temporary table, can be modified one by one, does not generate a large number of Undolog and redolog, does not occupy a lot of buffer. You can avoid the IO and CPU cost of rebuilding tables while maintaining good performance and concurrency.
    • COPY: COPY to a temporary new table for modification. Due to record copying, a large number of Undolog and RedolOG will be generated and occupy a large amount of buffer, affecting business performance.
  • The LOCK:
    • DEFAULT: similar to the DEFAULT of ALGORITHM
    • NONE: No lock, allowing concurrent reading and updating of tables
    • SHARED: a SHARED lock that can be read but cannot be updated
    • EXCLUSIVE: Reads and updates are not allowed

Comparison of algorithms used for online DDL modifications supported by each version:

Reference Documents:

  • MySQL 5.6: dev.mysql.com/doc/refman/…
  • MySQL 5.7 :dev.mysql.com/doc/refman/…
  • MySQL 8.0 :dev.mysql.com/doc/refman/…

It can be done through:

Char (128), ALGORITHM=INSTANT, LOCK=NONE;Copy the code

Similar statement to achieve online increase field. It is best to specify ALGORITHM and LOCK so that DDL execution will have a clear idea of the impact on online business.

In the meantime, the process of executing online DDL is like:

Picture reference from:zhuanlan.zhihu.com/p/162073721

As you can see, metadata locking is required in the beginning. Metadata locking was introduced to mysql in 5.5, and there is a similar mechanism to protect metadata before, but the concept of metadata locking is not explicitly introduced. However, there is a significant difference in metadata protection between versions before 5.5 (for example, 5.1) and those after 5.5. 5.1 protects metadata at the statement level, while 5.5 protects metadata at the transaction level. The so-called statement level, that is, after the statement execution, regardless of whether the transaction is committed or rolled back, its table structure can be updated by other sessions; At the transaction level, metadata locks are released after the transaction ends.

After metadata lock is introduced, two problems are solved. One is transaction isolation. For example, in the repeatable isolation level, session A modiates the table structure during the two query sessions, and the two query results are inconsistent. Another problem is data replication. For example, when session A executes multiple update statements, another session B changes the table structure and submits it first. As A result, the slave performs ALTER first and then update again, resulting in replication errors.

If there are many transactions currently executing and you have a transaction that contains a large query, for example:

START TRANSACTION; Select count(*) from your table...Copy the code

Such similar transactions, which take longer to execute, also block.

Therefore, in principle:

  • Avoid big things
  • Make structural changes at the peak of business