1. The background

As the basic service of social applications, IM is responsible for instant chat, group chat, live broadcast, multi-voice and other functions. Instant messaging is one of the most important in many types of functional modules, the need to maintain two user’s session (many-to-many relationships, with broad user level exponentially growing session relations, session information interface call frequency exponentially, because application project is in a hurry, for application development speed forecast more pessimistic, In addition, limited human resources at the early stage did not introduce sub-database and sub-table. As the data of single table grows exponentially, the response speed becomes slower visible to the naked eye, and the optimization of session table is imminent

Relevant concepts

  • The session

  • Pull the session

2. Current situation of the problem

Assuming that the number of sessions per active user is 10, the total number of sessions is about 1000 when the daily activity is 100. Live on the same day arrived in millions, daily conversation is a large number of orders of magnitude, in the actual business scenarios in dialogue and the session list as long as the user without the current chat session would call interface (QPS for 1000, rt is about 400 ms) session information, called high frequency response speed slow lead to the user experience is very poor, This interface often causes the server CPU to be in a high load state for a long time during the peak user activity period, or directly causes the entire system to break down in serious cases

3. Solutions

Trouble spots

  • Uid is the id of the session initiator and to_uid is the ID of the session receiver. In actual services, UID plays the role of the payer and to_uid plays the role of the beneficiary. It costs a certain amount of money for uid to chat with to_UID. In such a business context, the UID and to_UID table structures are designed. Although this structure meets the business requirements, there are still business defects, such as: Query session information of user 1(id = 1), user 2(id = 2), and user 3(id = 3). The uid of the two users cannot be determined

// SQL

SELECT * FROM chat_table 


    (uid = 1 AND to_uid IN (2.3)) 


    (uid IN (2.3) AND to_uid = 2);
Copy the code
  • Because it is a single table and the amount of data reaches 3000W +, only SQL query takes about 300ms, plus the service processing time, a single interface call takes 400ms+
  • The optimization process can not stop the maintenance, can not affect the normal use of users

Optimization idea

  • Since the original table structure could not determine which uid was given two user ids, the only way to query both sessions was through OR. Select * from splic_user_id; select * from splic_user_id; select * from splic_user_id; That is, the purpose of quickly identifying a session by uid and to_UID is achieved

  • After the optimization idea is clear, it is necessary to consider that the maintenance can not be shut down. Directly adding fields to the table through DDL statements will lock the table and lead to the direct breakdown of IM services. How to complete the optimization without stopping the maintenance and affecting services?

    • Solution a: Create a new table (chat_table_new) with splic_user_ID (splIC_user_ID) and write data to chat_table_new in real time
    • Solution 2: Find a tool that can modify the table structure online without locking the table, so that no code changes are required and the cost is small

Percona-toolkit is a tool that allows you to modify the MySQL table structure online without locking the table

Scheme comparison


Percona-toolkit is derived from Maatkit and Aspersa, two of the most famous tools to manage MySQL, but Maatkit is no longer maintained, all merged into Percona-Toolkit. Percona Toolkit is a set of advanced command line tools used to manage MySQL and system tasks. It includes:

  • Verify consistency between the master node and the replicated data
  • Effectively file the line of record
  • Find duplicate indexes
  • MySQL Server
  • Analyze queries from logs and tcpdump
  • Collect important system information when a problem occurs
  • Modify the table structure online

In the context of continuous service iteration, DDL operations on tables, such as modifying, adding, and deleting fields and indexes, are inevitable. DDL is a function that needs to be used with great caution for MySQL, because tables will be locked when DDL is performed in MySQL. The greater the amount of table data, the greater the impact. Before 5.1 DDL was very time-consuming, after 5.1 with the advent of Plugin InnoDB online indexing speed has improved a lot, but there are still impacts (time reduced); The above situation can be avoided in 5.6, currently InnoDB engine DDL through the following steps:

  • Create a temporary table (tMP_TABLE) that is not visible according to the original table structure and DDL statement.
  • Block all update operations (INSERT, delete, update, etc.) by placing a write lock on the original table
  • performinsert into tmp_table select * from original_table
  • Rename original_table and tmp_table, and finally drop original_table
  • Release write Lock.

We can see that when InnoDB performs DDL, the original table can only be read but not written. For this purpose, Perconal has introduced a tool, PT-online-schema-change, which features no read and write blocking during the modification process.

pt-online-schema-changeThe working principle of

  • If there are foreign keys, according toalter-foreign-keys-methodParameter value, detect foreign key related table, do the corresponding setting processing. The tool does not execute without specifying a specific value with alt-foreign-keys-method
  • Create a new empty table named underline + old table name +_new— –_ Old table name _new
  • Update the table structure of the new table according to the ALTER statement;
  • Create triggers to record the data modification operations on the source data table after the data copy is started. After the data copy is complete, these operations can be performed to prevent data loss. The tool will not work if triggers are already defined in the table.
  • Copy data from the source table to the new table.
  • Modify subtables associated with foreign keys. Modify the subtables associated with foreign keys based on the modified data.
  • If the source TABLE rename is an old TABLE, rename uses the new TABLE as the name of the source TABLE. The new TABLE processes two tables simultaneously using a RENAME TABLE to implement atomic operations. (RENAME TABLE dbteamdb. chat_table TO dbteamDB. _chat_table_old, dbteamDB. _chat_table_new TO dbteamdb. chat_table)
  • Drop old table, drop trigger.

  • The new scheme directly hits the index, can achieve accurate query, query speed is very fast
- new

SELECT * FROM chat_table WHERE splic_user_id IN ('1 _2'.'1 _3'); 
Copy the code

4. Effect comparison

Before optimization:

After optimization: query speed increased by more than 10 times

5. To summarize

  • Encountered problems as far as possible not behind closed doors, hard to think, can first look for other people have encountered similar problems on the Internet, so that you can avoid falling into the pit of deeper and deeper, standing on the basis of previous experience to solve the problem can get twice the result with half the effort
  • Practice has proved that percona-Toolkit real-time modification of the table structure is really powerful, tens of millions of data tables to add fields, no impact on business