I. Written at the beginning

Vivo Cloud Service lets users back up contacts, SMS messages, notes, browser bookmarks and other phone data. The underlying storage is a MySQL database.

As vivo's cloud service business has developed, the number of cloud service users has grown rapidly and the amount of data stored in the cloud keeps increasing. This massive volume of data puts great pressure on back-end storage, and the biggest pain point of the cloud service business in recent years has been how to store users' massive data.

II. Facing the challenges

From 2017 to 2018, the core product metric of cloud service was user growth, and the product strategy was adjusted accordingly: the cloud service data synchronization switch is now enabled by default once a user logs in to a vivo account.

This product strategy brought explosive growth: the number of cloud service users jumped from millions to tens of millions, and the amount of data stored in the back end leapt from tens of billions to hundreds of billions of records.

To solve the massive data storage problem, cloud service eventually put all four classic sharding moves into practice: horizontal table sharding, vertical table sharding, horizontal database sharding and vertical database sharding.

1. Horizontal table sharding

**Hard road No. 1:** the browser bookmark and note modules each used a single database with a single table, and the single-table row count had already exceeded 100 million. What to do?

Anyone familiar with database and table sharding will answer immediately: if a single table holds too much data, split it into multiple tables. That is exactly what we did, splitting the browser bookmark and note module tables into 100 tables each.

The existing browser bookmark and note data was migrated into the 100 sub-tables, each table carrying roughly 10 million rows.

This is the first move everyone knows: horizontal table sharding. A sketch of how such table routing can work follows.
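The article does not spell out the routing rule, so the following is only an illustrative sketch: a common approach is to derive the table suffix from the user id modulo the number of tables (the table and column names here are hypothetical).

```sql
-- Hypothetical sharding: t_bookmark_00 .. t_bookmark_99, routed by user_id MOD 100
SELECT LPAD(MOD(12345678, 100), 2, '0') AS table_suffix;  -- => '78'

-- The application then queries the corresponding physical table:
SELECT * FROM t_bookmark_78 WHERE user_id = 12345678;
```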

![](https://static001.geekbang.org/infoq/e3/e366bf90f6a5cdb060f5c6c68f66a162.webp)

2. Horizontal database sharding

**Hard road No. 2:** the contact and SMS data had already been split into tables, but only 50 tables at the start, and the database itself had not been split. After the explosive user growth, the single database held several billion contact records in total, with up to 50 million rows in a single table, and continued growth would seriously hurt MySQL performance. What to do?

The second move, horizontal database sharding: when one database can no longer carry the load, split it into several. We split the original single database into 10 databases and expanded the original 50 contact and SMS tables to 100 tables each. During the same period, billions of rows of existing data had to be migrated and re-routed, which was very painful. A sketch of the resulting two-level routing follows.
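The exact database/table mapping is not given in the article, so the layout below is only an assumption; one common arrangement keeps ten tables per database and derives both indexes from the user id.

```sql
-- Hypothetical layout: 100 tables (t_contact_00 .. t_contact_99) spread over
-- 10 databases (db_contact_0 .. db_contact_9), ten tables per database
SELECT MOD(12345678, 100)        AS table_suffix,  -- => 78 -> t_contact_78
       MOD(12345678, 100) DIV 10 AS db_suffix;     -- => 7  -> db_contact_7

-- Final physical location of this user's contacts:
SELECT * FROM db_contact_7.t_contact_78 WHERE user_id = 12345678;
```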

![](https://static001.geekbang.org/infoq/84/846e7db590d1cdd3100e959783c60f28.webp)

3. Vertical database and table sharding

Initially, the data of all cloud service modules was stored together in the same database.

When a space bottleneck appeared, we analysed how the storage space was distributed across the modules. The situation was as follows:

The disk capacity of a single database host was 5 TB. Contact data occupied 2.75 TB (55%) and SMS data 1 TB (20%); all other modules together used about 500 GB, and roughly 1 TB remained available. Contact and SMS data alone accounted for 75% of the total storage space.

The remaining 1 TB could not support the continued growth of user data, and the outlook was not good: once space ran out, every module would become unavailable. What to do?

(The figure below shows the distribution of cloud service data storage space at the time.)

![](https://static001.geekbang.org/infoq/8f/8f84a159c291501f3c428f53091edd8d.webp)

We therefore decoupled the storage of contact data, SMS data and the other modules, splitting contact data and SMS data off into their own dedicated databases.

![](https://static001.geekbang.org/infoq/3c/3c78da46d13f870f36b5a502a538c899.webp)

At this point, cloud service had put all four sharding moves into practice: whatever could be split had been split, and whatever could be sharded had been sharded.

4. Dynamic capacity expansion based on a routing table

As described above, the contact data was split across a fixed set of 10 databases. The initial assessment was that 10 databases × 100 tables would cover the growth of business data. We thought we could finally relax, but the growth of contact data exceeded expectations.

Nine months after the contact database was split out on its own, the storage usage of each database had climbed from 35% to 65%. At that growth rate, the newly split contact databases would run out of space again within another six months.

How to solve it? Further expansion was inevitable; the core question was which expansion strategy to adopt. A conventional expansion scheme would mean migrating and re-routing the massive stock of stored data, at far too high a cost.

After discussion within the technical team, and considering the characteristics of the contact service (an old user's contact list is basically stable, rarely gains a large number of new contacts, and therefore grows at a controllable rate), we finally adopted a dynamic capacity expansion scheme based on a routing table.

The scheme has the following features (a sketch of such a routing table follows the list):

  • A user routing table is added, recording in which database and which table each user's contact data lives;
  • New users' contact data is routed to the newly added databases, so it puts no storage pressure on the existing ones;
  • Old users' data is not moved and stays in the original databases;
  • In this way the old databases only have to absorb the growth of old users' data, while all new users are carried by the newly added databases.
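Below is a minimal sketch of what such a routing table might look like; the field names and types are assumptions, since the article does not give the actual schema.

```sql
-- Hypothetical routing table: records which database and table hold a user's contacts
CREATE TABLE t_user_route (
  user_id     BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  db_index    TINYINT UNSIGNED NOT NULL,   -- old users keep their original index; new users get a newly added database
  table_index TINYINT UNSIGNED NOT NULL,   -- 0..99
  created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- Look up the physical location before reading or writing contact data:
SELECT db_index, table_index FROM t_user_route WHERE user_id = 12345678;
```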

Although the growth of old users' contacts is controllable, we wanted the old databases to keep 60% of their storage space in reserve for that growth. At the time, the old databases had only 35% space left, which did not meet the requirement.

To reduce the space occupied by the old databases, we naturally turned to data compression.

III. Pre-study of compression schemes

Cloud service evaluated the following three schemes for compressing database data:

Scheme 1: the program compresses the data itself and stores the compressed result in the database

Advantage:

No change to the database is needed; the modification is entirely contained within the program, which can freely decide which fields to compress.

Disadvantage:

An extra compression task has to be developed to compress the stock data, and because the stock data volume is so large, the time the program needs to compress it is hard to control.

Once the data is stored compressed, the fields returned by SELECT queries on the DB platform are no longer human-readable, which makes troubleshooting harder.
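Scheme 1 is implemented in application code, but MySQL's built-in COMPRESS()/UNCOMPRESS() functions illustrate the same trade-off: once a value is stored compressed, an ad-hoc SELECT returns unreadable binary data unless it is explicitly decompressed (the table and column names below are hypothetical).

```sql
-- Store a compressed value (the application could equally compress with zlib itself);
-- assumes content is a VARBINARY/BLOB column
INSERT INTO t_note (user_id, content) VALUES (1, COMPRESS('note body ...'));

-- A plain SELECT now returns opaque binary data:
SELECT content FROM t_note WHERE user_id = 1;

-- The value is only readable when explicitly decompressed:
SELECT UNCOMPRESS(content) FROM t_note WHERE user_id = 1;
```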

Scheme 2: use the data compression capability of the MySQL InnoDB storage engine

Advantage:

Uses InnoDB's existing compression capability; no modification of the upper-layer program is needed, and subsequent SELECT queries are not affected.

Disadvantage:

This approach suits tables with a large data volume and relatively few reads and writes; it is not suitable for services that demand high query performance.

Scheme 3: switch the storage engine from InnoDB to TokuDB and use TokuDB's native data compression capability

Advantage:

TokuDB natively supports data compression with several compression algorithms to choose from, handles write-heavy workloads well, and has natural advantages for storing large volumes of data.

Disadvantage:

MySQL needs an additional plug-in to support the TokuDB engine, no team in the company has mature operational experience with TokuDB, the risks after adoption are unknown, and ongoing maintenance by the DBAs would also be a challenge.

After weighing the options, we finally chose the second scheme: InnoDB's own compression capability.

The main reasons are as follows:

  • **Simple to operate:** the data can be compressed just by changing the file format of the existing InnoDB tables;
  • **Controllable compression time:** in our tests, a table with 20 million rows can be fully compressed this way in 1-2 days;
  • **Low transformation cost:** the whole change only requires the DBA to execute the relevant SQL to alter the table file format; the upper-layer program code does not change at all;
  • **Fits the cloud service business:** user data backup and restore is not a high-performance, high-QPS scenario, and most cloud service tables consist largely of string fields, which makes them well suited to compression.

IV. Compression scheme verification

1. Introduction to InnoDB compression capability

Before MySQL 5.1.38, MySQL shipped only the built-in InnoDB (innodb-base) storage engine, with Antelope as the default file format. Antelope supports two row formats, COMPACT and REDUNDANT, neither of which supports data compression.

From MySQL 5.1.38 onwards, the innodb-plugin was introduced, and with it the Barracuda file format. Barracuda is fully compatible with Antelope and additionally supports the DYNAMIC and COMPRESSED row formats.

![](https://static001.geekbang.org/infoq/f9/f985bc76e33abecd4e179892777db1c9.webp)

2. Compression environment preparation

Modify the database configuration: change the file format of the database from the default, Antelope, to Barracuda.

```sql
SET GLOBAL innodb_file_format = Barracuda;
SET GLOBAL innodb_file_format_max = Barracuda;
SET GLOBAL innodb_file_per_table = 1;
```

Note: innodb_file_per_table must be set to 1, because the InnoDB system tablespace cannot be compressed. The system tablespace holds not only user data but also InnoDB's internal system information, so it can never be compressed; compressed tables therefore need their own file-per-table tablespaces.

After the settings are applied, run SHOW GLOBAL VARIABLES LIKE '%file_format%' and SHOW GLOBAL VARIABLES LIKE '%file_per%' to check whether the change has taken effect.

![](https://static001.geekbang.org/infoq/15/155eaf9b98ca718b0ce9177201b47aa9.webp)

(These SET GLOBAL settings affect only the running instance and are lost after the MySQL instance restarts. To make them permanent, add them to the MySQL global configuration file.)

3. Compression effect test and verification

Prepare two tables with identical fields: one that supports compression and one that does not.

The compressed table:

![](https://static001.geekbang.org/infoq/68/68bcdf0031ac1d3e29751220507ab883.webp)

ROW_FORMAT=compressed specifies the compressed row format. KEY_BLOCK_SIZE defaults to 16, and the optional values are 16, 8 and 4; the smaller the value, the stronger the compression. Weighing CPU cost against compression ratio, we recommend setting it to 8 in production.
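The actual table definitions appear only as screenshots above and below; the following is a minimal sketch of a compressed table in the same spirit (the columns are illustrative, not the real cloud service schema).

```sql
CREATE TABLE t_compress (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id BIGINT UNSIGNED NOT NULL,
  content VARCHAR(2000)   NOT NULL,
  KEY idx_user_id (user_id)
) ENGINE=InnoDB
  ROW_FORMAT=COMPRESSED
  KEY_BLOCK_SIZE=8;
```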

The non-compressed table:

![](https://static001.geekbang.org/infoq/b4/b44ac42697589a5bd1b32e778f1a71d6.webp)

Prepare data: use a stored procedure to insert 100,000 identical rows into both the t_nocompress and t_compress tables. The space occupied by the two tables is then as follows:

![](https://static001.geekbang.org/infoq/10/10e9994789c654a767100812018b9915.webp)

The t_compress table occupies 10 MB and the t_nocompress table 20 MB, a compression ratio of 50%.
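For reference, the per-table footprint can also be read from information_schema rather than from the DB platform; this is only a rough check, since data_length and index_length are estimates maintained by InnoDB.

```sql
-- Approximate on-disk size and row format of the two test tables
SELECT table_name,
       row_format,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name IN ('t_compress', 't_nocompress');
```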

Note: the compression effect depends on the types of the fields in the table. Data that contains many repeated values compresses well, so string fields such as CHAR, VARCHAR, TEXT and BLOB usually compress effectively, while binary data (integers, floating-point numbers) and already-compressed data (JPEG or PNG images) generally do not.

V. Online practice

The tests above suggest that, with a 50% compression ratio, the space usage of the old contact databases can drop from 65% to roughly 33%, leaving more than 60% of the space in reserve.

However, online data must be treated with awe. Before going live, the scheme had to be verified offline, and the following questions also had to be considered:

1. Does data compression and decompression affect the performance of the DB server?

We used load testing to evaluate the impact of compression on the database server's CPU. Below is a comparison of the DB server's CPU before and after compression, inserting data into a contact table that already holds more than 20 million rows.

**Before compression:** insert 50 contacts per request, 200 concurrent threads, for 10 minutes: TPS 150, CPU 33%

![](https://static001.geekbang.org/infoq/c4/c48757bad0d608a4a77109661d58c286.webp)

**After compression:** insert 50 contacts per request, 200 concurrent threads, for 10 minutes: TPS 140, CPU 43%

![](https://static001.geekbang.org/infoq/51/51816b18aa950fbecefd950dbaeab34e.webp)

Conclusion: frequent inserts do raise CPU usage after the table is compressed, but TPS is not significantly affected. Across repeated load tests the database server's CPU stabilised at around 40%, which is acceptable for the service.

2. Will changing the table file format affect business SQL reads and writes, and thus normal service functions?

We verified this both offline and online:

**Offline verification:** in the test environment, all contact tables were converted to the compressed format, and test engineers ran a full regression of the contact features; everything worked normally.

The pre-production environment then went through the same steps as the test environment, and all function checks passed.

**Online verification:** we chose the call-record module, to which users are less sensitive, compressed a single table in one database, and then watched that table's reads and writes as well as user complaints.

After a week of continuous observation, the call-record data in that table could be read and written normally, and no abnormal feedback was received from users during that period.

3. The volume of online contact data is huge. How do we keep the service stable while compressing it?

Our approach was based on the following ideas:

  • Select one contact data table, compress it first, and estimate how long a single table takes (the per-table compression statement itself is sketched after this list);
  • Select a single database and compress several tables concurrently while watching CPU usage. The DBA capped CPU at 55% and gradually adjusted the number of concurrently compressed tables until CPU held steady at around 55%, which gave the maximum number of tables one database can compress at the same time;
  • Combining the first two steps, we could estimate roughly how long it would take to compress all tables in all databases. After syncing with the project team and the responsible owners, the compression was carried out step by step.
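The per-table conversion is a single DDL statement executed by the DBA (the table name below is hypothetical). The statement rebuilds the table, which is why the number of tables compressed concurrently had to be throttled as described above; compression effectiveness can be watched via information_schema while the job runs.

```sql
-- Convert one existing contact sub-table to the compressed row format
ALTER TABLE t_contact_00 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Instance-wide compression statistics (operations and time spent compressing pages)
SELECT * FROM information_schema.INNODB_CMP;
```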

The result of compressing the online contact database data is shown below:

![](https://static001.geekbang.org/infoq/93/93543dabf1e2f00d54b2f72ca572f153.webp)

VI. Written at the end

This article has described the massive data storage challenges cloud service faced as the business grew, and shared our experience with database and table sharding and with database data compression, in the hope that it can serve as a reference.

InnoDB data compression is suitable for the following scenarios:

  • Services that have a large amount of data and space pressure on database disks.

  • Services that do not have high performance or high QPS requirements.

  • Tables whose structure contains a large proportion of string data, since such tables usually compress well.

Finally:

  • When choosing a sharding scheme, estimate future data growth thoroughly, because the data migration forced by a later expansion is excruciating.

  • Be in awe of online data. Solutions must be validated offline before they can be applied online.

Author: Vivo platform product development team