This article introduces a unified KV storage service built on MySQL and Redis: the common deployment mode and its characteristics, the Cluster Manager, the MySQL and Redis cluster schemes, and the Sync data synchronization service.

1. MySQL+Redis common deployment mode

1.1 Topology

1.2 Characteristics

The business layer double-writes to both MySQL and Redis. Reads normally go to Redis; on a cache miss, the data is read from MySQL and then written back to Redis. Redis typically relies on expire or its default LRU policy to evict data.
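As a minimal sketch of this double-write / read-through flow, with MySQL and Redis stubbed as dicts (a real system would use database and Redis clients, and would set an expire on the cached entry):

```python
mysql_store = {}   # stands in for the MySQL table
redis_cache = {}   # stands in for the Redis cache

def write(key, value):
    # business-layer double write: both stores are updated
    mysql_store[key] = value
    redis_cache[key] = value

def read(key):
    # read Redis first; on a miss, fall back to MySQL and backfill the cache
    if key in redis_cache:
        return redis_cache[key]
    value = mysql_store.get(key)
    if value is not None:
        redis_cache[key] = value   # backfill so the next read hits the cache
    return value
```

The inconsistency risk described below comes from the two unsynchronized writes in `write()`: if one store fails or the two writers race, the stores diverge.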

This method of use has the following problems:

1) MySQL and Redis can drift out of sync, especially in long-running systems

2) The business layer has to bridge the gap between the MySQL SQL schema and the Redis KV data structures

3) There is no unified operations and maintenance

4) Capacity expansion and reduction are inconvenient

2. KV storage concept

2.1 MySQL is a great NoSQL

Reference Documents:

http://www.aviransplace.com/2015/08/12/Mysql-is-a-great-nosql/

Why use MySQL:

“An important consideration when building scalable systems is the maturity of the technology used. A mature technology means you can recover quickly when things go wrong. Developers can of course adopt the latest NoSQL database, and in theory it should work fine, but when something breaks in production, how long does recovery take? Existing knowledge and experience with a technology, including what can be found by searching, is critical to mitigating problems.

By contrast, relational databases have existed for more than 40 years, and the industry has accumulated deep experience operating them. For these reasons, MySQL is usually chosen over NoSQL for new projects, unless NoSQL offers a very, very clear advantage.”

2.2 KV concept

For stores holding hundreds of millions of rows, especially once data is split horizontally across machines, databases, and tables, online database access has to stay as simple as possible: GROUP BY / ORDER BY / pagination / generic JOINs / transactions are not suitable for a MySQL system at this scale.

In practice, today's proxy-style MySQL solutions can only serve reads and writes routed by a split key at real production scale, which effectively makes them KV systems keyed on the split key.

If complex SQL processing is needed, the most reasonable deployment is to abstract a MySQL binlog pipeline synchronization service and replicate the data in real time to an OLAP-style system.

So for mass-storage services it is feasible to design MySQL as a KV system from the start, with the value stored in a mediumblob column holding serialized XML/JSON/Protobuf/Thrift data.

2.3 Usage mode of MySQL KV

1. MySQL is queried by primary key only (e.g. SELECT value FROM table WHERE key = ?)

2. All other non-primary-key, non-index fields are packed into the value, a mediumblob column holding serialized XML/JSON/Protobuf/Thrift data

3. Reads and writes operate on whole rows addressed by key. The business layer parses the structure inside the value and adds, deletes, or changes its internal fields without ever altering the MySQL schema itself
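A sketch of this usage mode, with JSON chosen as the serialization format (the field names are made up for illustration):

```python
import json

def encode_value(fields):
    # All non-key business fields live inside the mediumblob value column.
    return json.dumps(fields).encode("utf-8")

def decode_value(blob):
    return json.loads(blob.decode("utf-8"))

# Adding a field changes only the serialized payload, never the MySQL schema.
blob = encode_value({"name": "alice", "age": 30})
fields = decode_value(blob)
fields["email"] = "alice@example.com"   # schema-free field addition
blob = encode_value(fields)
```

Protobuf or Thrift would follow the same pattern, with the added benefit of compact encoding and explicit field evolution rules.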

2.4 Not Applicable Scenarios

1. Business scenarios where the data volume and access volume are not large and the business logic depends on the MySQL database for processing

2. Processing that involves multi-table JOINs

Within this limitation, KV can still cope by denormalizing the joined tables into a single wide table along the join conditions.

3. Processing that involves transactions

3. Designing MySQL+Redis as a unified KV storage service

3.1 Goals

1) The business layer accesses MySQL and Redis through a single unified interface, rather than separate MySQL and Redis clients

2) MySQL cluster / Redis cluster deployment

3) Replace business-layer double writes with MySQL-to-Redis data synchronization

4) The heterogeneous stores provide eventually consistent read and write services

5) Storage-layer capacity expansion and failover without service awareness

6) Support for 10 billion QPS/TPS per cluster per day (large business categories are split into separate clusters as appropriate)

3.2 Final Implementation

UniStore based on MySQL+Redis =

MySQL cross-machine sharded cluster (split by database and table)

+ Redis cluster

+ MySQL->Redis real-time data synchronization service

+ unified external data access interface

+ complete internal operations support system (online capacity expansion/reduction, failover, etc.)

3.3 Architecture diagram

3.4 Architecture Description – Storage is designed as a service

1. Make MySQL+Redis a unified KV storage service

2. Provide a unified data access interface through ACC Proxy, supporting cross-language access over a unified access protocol (custom protocol, Protobuf, Thrift, etc.)

3. MySQL cluster supports cross-machine sharding by database and table, a schemaless design, and KV design for all business tables

4. Redis Cluster supports instance splitting across machines

5. The Sync data synchronization service provides unified MySQL-to-Redis synchronization, across IDCs or within one IDC, with latency under 100 ms

6. The whole system does not involve distributed transaction processing

3.5 Three Deployment modes

1. Pure MySQL cluster deployment

This deployment mode is the same as other MySQL Proxy cross-machine database and table solutions, and all reads and writes are in MySQL

2. Pure Redis cluster deployment

This deployment mode is equivalent to other Redis Proxy cross-machine sub-database and sub-table schemes, and all reads and writes are in Redis

3. MySQL+Redis heterogeneous deployment

Writes go to MySQL; reads can come from MySQL or Redis, depending on how fresh the data the service reads must be.

3.6 Interface Description

1. int get(int appid, string key, string& value)

Read operation served from Redis

2. int get_with_version(int appid, string key, string& value, int64& version)

Read operation served from MySQL; each row carries a version number to prevent write overwrites

3. int set(int appid, string key, string value, int64 version)

The appid distinguishes MySQL from Redis; both support writes

4. int delete(int appid, string key)

The appid distinguishes MySQL from Redis. Batch deletion is not supported

5. int multiget(int appid, vector<string> keys, map<string, string>& key_value_pairs)

Batch read; callers need not care about internal data routing or result merging

6. int multiset(int appid, map<string, string>& key_value_pairs)

Supported but not recommended, because it involves cross-machine transactions and ACID cannot be guaranteed

7. int redis_op(string cmd, …)

Wrappers for the other native Redis commands (incr/expire/list/setnx, etc.)
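To illustrate how get_with_version() and set() combine into a safe read-modify-write, here is a sketch with the storage backend stubbed in memory; the `Store` class and `cas_update` helper are hypothetical, not part of the real service:

```python
class Store:
    """In-memory stand-in for the MySQL side of the KV service."""
    def __init__(self):
        self._rows = {}                        # (appid, key) -> (value, version)

    def get_with_version(self, appid, key):
        return self._rows.get((appid, key), (None, 0))

    def set(self, appid, key, value, version):
        # Succeed only if the caller saw the current version (CAS semantics).
        _, current = self._rows.get((appid, key), (None, 0))
        if version != current:
            return -1                          # conflict: another writer won
        self._rows[(appid, key)] = (value, current + 1)
        return 0

def cas_update(store, appid, key, mutate, retries=3):
    # Read-modify-write loop built on get_with_version()/set().
    for _ in range(retries):
        value, version = store.get_with_version(appid, key)
        if store.set(appid, key, mutate(value), version) == 0:
            return True
    return False
```

The version check is what prevents two concurrent business-layer instances from silently overwriting each other's writes.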


4. Cluster Manager service

4.1 Cluster Manager is a Service

The Cluster Manager provides the following functions

1) MySQL/Redis shard routing information management

1. MySQL database and table sharding routing information

2. Redis slot routing information

3. Change management for routing information

2) Redis instance health probing, and data migration for Redis capacity expansion/reduction

For example, if a Redis instance fails to respond to pings three consecutive times at 30-second intervals, it is considered down and an alarm is raised or an automatic failover is triggered

3) It is not recommended that Cluster Manager manage primary/secondary relationships within a MySQL group

MySQL primary/secondary cluster management solutions:

1. MHA+VIP (the most common choice at Internet companies)

2. WeChat PhxSQL: https://github.com/tencent-wechat/phxsql (financial-grade reliability)

5. MySQL cluster solution

5.1 Architecture diagram

5.2 Design Principles

1) Unified schemaless table structure

2) Cross-machine data distribution

Supports horizontal splitting of a single logical table across multiple MySQL servers

3) Other instructions

1. High data-storage reliability. All business data is serialized into the value column

2. Every row carries its own version number. The business layer uses CAS on it to prevent write overwrites when multiple instances write the same row simultaneously

The globally unique version number is implemented as: local microsecond timestamp + server_id + process_id

3. Fixed split into 100 databases / 100 tables, deployed across multiple machines and MySQL instances
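One plausible packing of the version number described in point 2 into a 64-bit integer; the bit widths below are assumptions, since the text does not specify them:

```python
import os
import time

SERVER_BITS, PROC_BITS = 7, 5   # assumed widths; 52 bits remain for the time part

def make_version(server_id):
    # local microsecond timestamp + server_id + process_id, packed high-to-low;
    # with these widths the result stays below 2**63 until roughly the year 2041
    ts_us = int(time.time() * 1_000_000)
    sid = server_id & ((1 << SERVER_BITS) - 1)
    pid = os.getpid() & ((1 << PROC_BITS) - 1)
    return (ts_us << (SERVER_BITS + PROC_BITS)) | (sid << PROC_BITS) | pid
```

Because the timestamp occupies the high bits, later writes get larger versions, which is what the CAS check relies on.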

5.3 Routing Policy

1) Consistent hashing

2) Routing calculation algorithm

CRC32 / MD5 / various string-based hash algorithms

3) Format of routing information

CREATE TABLE `Mysql_shard_info` (
  `appid` int(32) NOT NULL,
  `begin` int(32) NOT NULL,
  `end` int(32) NOT NULL,
  `ip` varchar(20) NOT NULL DEFAULT '',
  `port` int(11) NOT NULL DEFAULT '0',
  `user` varchar(50) NOT NULL DEFAULT '',
  `pwd` varchar(50) NOT NULL DEFAULT '',
  PRIMARY KEY (`appid`, `begin`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
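Putting 1)–3) together, here is a sketch of how a key could be routed to a MySQL shard. The hash-range rows and addresses are made up, and the exact production formula is not given in the text:

```python
import zlib

def route_hash(key):
    # 2) routing calculation: CRC32 over the key string
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF

# 3) routing info, shaped like rows of Mysql_shard_info: (begin, end, ip, port)
SHARDS = [
    (0x00000000, 0x7FFFFFFF, "10.0.0.1", 3306),
    (0x80000000, 0xFFFFFFFF, "10.0.0.2", 3306),
]

def shard_for(key):
    # Find the shard whose [begin, end] hash range covers this key.
    h = route_hash(key)
    for begin, end, ip, port in SHARDS:
        if begin <= h <= end:
            return ip, port
    raise KeyError("no shard covers hash %d" % h)
```

Storing ranges rather than a modulus is what lets migration split or move a range without rehashing every key.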

5.4 Data Migration/Automatic Expansion

Data migration:

STEP1: replicate the data to the new shard using MySQL primary/standby replication

STEP2: when the replication lag drops below a threshold, stop writes to the old shard (read-only)

STEP3: wait until the new shard has fully caught up

STEP4: change the routing rules; Cluster Manager pushes the updated routing information to all Access Proxies

STEP5: remove the old shard

Automatic extension:

The process is similar to data migration

6. Redis cluster scheme

6.1 Deployment Modes

1. Heterogeneous read/write separation: write to MySQL, read from Redis

1) Writes go to MySQL, reads go to Redis

2) Data is parsed and synchronized from MySQL to Redis by the Sync service

3) The synchronization delay (under 100 ms) yields eventual consistency

Application scenario: data must be highly reliable and read-heavy, and briefly stale reads are acceptable. To read the latest data, use the get_with_version() interface to read from MySQL

2. Independent Redis cluster service

1) Reading and writing are all in Redis, providing independent KV storage service

2) Users do not need to pay attention to expansion, reduction, and fault recovery

3) Multiple services are stored in the cluster to improve memory usage

Application scenario: a standalone Redis cluster service, similar to Twemproxy/Codis

6.2 Design Essentials

1. Consistent hashing

Data is split across Redis instances using a fixed number of slots

2. Single-node multi-instance deployment

1) Each physical machine supports multiple Redis instances

2) Each Redis instance only serves a single business

3) The memory size of Redis instance depends on business requirements, considering both business access and data volume

A Redis instance is uniquely identified by IP + port. A 128 GB machine can be configured as, for example:

3 Redis instances * 30 GB each

or 10 Redis instances * 10 GB each

or 20 Redis instances * 5 GB each

Split rule: the maximum memory of a single instance < the machine's free memory

3. Smooth capacity expansion or reduction by Slot

4. Redis instance failover
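The slot routing described in point 1 can be sketched as follows; the slot count of 1024 and the routing-table shape are illustrative assumptions, not values given in the text:

```python
import zlib

SLOT_COUNT = 1024   # assumed fixed slot count

def slot_of(key):
    # key -> fixed slot, via CRC32
    return zlib.crc32(key.encode("utf-8")) % SLOT_COUNT

# Routing table: contiguous slot ranges mapped to Redis instances (ip:port).
ROUTES = [(0, 511, "10.0.0.1:6379"), (512, 1023, "10.0.0.2:6379")]

def instance_for(key):
    s = slot_of(key)
    for begin, end, addr in ROUTES:
        if begin <= s <= end:
            return addr
    raise KeyError("no route for slot %d" % s)
```

Expansion and reduction then only move whole slots between instances and update ROUTES, which is what makes them smooth for callers.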

6.3 Smooth Capacity Expansion or Reduction

The main steps are as follows:

STEP1: confirm capacity expansion/reduction

Cluster Manager raises alarms on system load and data volume, confirming the need to expand or shrink capacity

STEP2: modify the routing table

1) Modify the routing table, change the status of shards to Migrate, and push the new routes to all access layers

2) ACC Proxy sends writes to the new Redis instance. By default, reads try the new instance first; if the key does not exist there, ACC Proxy falls back to reading the old instance

STEP3: migrate data

1) Cluster Manager migrates the data with an automated tool: it scans the relevant keys with the Redis SCAN command and moves them with MIGRATE

2) Multiple scan passes ensure that all data in the slot has been migrated

STEP4: modify the routing table; migration is complete

Cluster Manager switches all reads and writes to the new Redis instance, and the old instance is no longer accessed
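The read/write rule that applies while a shard is in Migrate status (STEP2–STEP3) can be sketched like this, with two dicts standing in for the old and new Redis instances:

```python
old_shard = {"a": "1", "b": "2"}   # keys not yet migrated live here
new_shard = {}

def write(key, value):
    # While a shard is in Migrate status, all writes go to the new instance.
    new_shard[key] = value

def read(key):
    # Reads try the new instance first, then fall back to the old one.
    if key in new_shard:
        return new_shard[key]
    return old_shard.get(key)
```

This ordering (new first, old as fallback) is what keeps reads correct while SCAN/MIGRATE gradually drains the old instance.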


7. Sync data synchronization service

7.1 Architecture

7.2 Application Scenarios

Sync can be abstracted into an independent data synchronization and distribution service. The SQL capability lost by going KV can be recovered by synchronizing data through this service into an OLAP system. Besides Redis, data can also be synchronized to Elasticsearch or HBase, or written as HDFS files for complex computation and analysis on the Hadoop ecosystem.

7.3 Design Essentials

1. Real-time data synchronization between clusters

The MySQL binlog must be in ROW format

2. DDL processing is not involved

The schemaless MySQL design removes the need for DDL handling and simplifies the synchronization service (whether or not it crosses IDCs)

3. Synchronization delay monitoring based on timestamp

MySQL ROW-format binlog events carry timestamps, which are used to monitor the synchronization delay

4. Synchronization position management based on binlog file name + offset

The binlog file name and offset currently being synchronized are persisted periodically, so synchronization can resume after failures of any kind

5. Row-based parallel synchronization

In multi-threaded synchronization mode, the main thread hashes the table ID or row key to distribute each binlog event to the queue of the corresponding worker thread; each worker consumes the binlog events from its queue in order
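The dispatch step can be sketched as follows; deques stand in for the per-worker-thread queues, and CRC32 is used as the hash for determinism:

```python
import zlib
from collections import deque

WORKER_COUNT = 4
queues = [deque() for _ in range(WORKER_COUNT)]   # one queue per worker thread

def dispatch(event):
    # Hash table id + row key so every event for a given row lands on the
    # same worker: rows sync in parallel, but per-row order is preserved.
    h = zlib.crc32(("%s:%s" % (event["table_id"], event["key"])).encode())
    idx = h % WORKER_COUNT
    queues[idx].append(event)
    return idx
```

Hashing only by table ID gives coarser parallelism but the same ordering guarantee; hashing by row key maximizes parallelism while still serializing updates to any single row.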

7.4 Implementation Principles

The principle is relatively simple:

1) Sync implements the MySQL slave protocol: posing as a slave, it sends a dump request to the MySQL master

2) The MySQL master receives the dump request and starts pushing binary logs to the slave

3) Sync parses the binary log (a raw byte stream) and converts the events into the corresponding data operations on Redis or other stores (HDFS/HBase/ES), or publishes them as messages to an MQ (RocketMQ or Kafka)
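As a sketch of step 3, here is how parsed ROW events might be translated into Redis commands. The event dict layout is invented for illustration; real binlog parsers such as canal expose richer structures:

```python
def to_redis_command(event):
    # Map ROW-format binlog events onto equivalent Redis operations.
    etype = event["type"]
    if etype in ("WRITE_ROWS_EVENT", "UPDATE_ROWS_EVENT"):   # INSERT / UPDATE
        return ("SET", event["key"], event["value"])
    if etype == "DELETE_ROWS_EVENT":                          # DELETE
        return ("DEL", event["key"])
    return None   # QUERY/TABLE_MAP/XID events carry no row data to apply
```

Because every business table is KV-shaped, each row event maps cleanly onto a single SET or DEL, which is what keeps the synchronization service simple.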

7.5 ROW Format Events

MySQL 5.5 binlog has many event types; this section describes only the events relevant to ROW mode

1) QUERY_EVENT: stores SQL statements not tied to row data, such as BEGIN and DROP TABLE

2) TABLE_MAP_EVENT: records the table that the next event applies to, storing the database name and table name

3) WRITE_ROWS_EVENT: operation type INSERT

4) UPDATE_ROWS_EVENT: operation type UPDATE

5) DELETE_ROWS_EVENT: operation type DELETE

6) XID_EVENT: identifies a transaction commit (COMMIT)

A typical INSERT statement produces the following four events: QUERY_EVENT (BEGIN), TABLE_MAP_EVENT, WRITE_ROWS_EVENT, and XID_EVENT (COMMIT).

7.6 Other Open Source Synchronization Schemes

1. tungsten-replicator(JAVA)

http://code.google.com/p/tungsten-replicator/

2. linkedin databus(JAVA)

https://github.com/linkedin/databus

3. Alibaba canal(JAVA)

https://github.com/alibaba/canal