1. Key issues

  • The data distribution

For storage systems, the most important problem is data distribution, which data is placed on which nodes. When distributing data, you need to consider whether data is balanced and whether it is easy to expand in the future. Different data distribution methods also have different advantages and disadvantages, which need to be selected according to their own data characteristics.

1) Hash distribution => random read

Mold direct hashing:

Key: find a hash function with good hash properties

Problem: Massive data migration as servers are added and removed

Solution: 1) Store < hash, server > metadata in metadata server; 2) Consistent hashing

Consistency hashing:

Key: The hash value becomes a range, and the data stored on each physical node is the data whose hash value is in the previous range.

Advantages: The addition or deletion of nodes affects only the neighboring nodes in the Hash ring.

Maintain the position of each machine in the hash ring: 1) record the position information of the previous and the next node, and each search may traverse the whole hash ring all servers; 2) O (logN) location information, search time complexity is O (logN); 3) Each server maintains the location information of all servers in the whole cluster, and the time complexity of searching for servers is O (1).

Virtual node:

Advantages: The traditional hash distribution of one (physical nodes) to one (hash values) distribution of one (physical nodes) to many (hash values). The distribution of data can be adjusted according to the capabilities of the physical nodes.

2) Sequential distribution => Sequential scan

The data in the table is ordered by primary key

  • Load balancing

1) When data is written, the choice of writing nodes (space capacity? CPU load?

2) Data migration during operation

If a new machine is added during the running, the amount of data stored on each machine is different, you need to automatically discover and adjust the data. However, in the process of adjustment, we should also control the speed, so as not to affect the business.

  • Copy & multiple backups

1) Maximum protection mode

Strong synchronous replication: The replication is successfully performed on at least one standby database

Success is returned only when at least two backups are stored successfully.

2) Maximum performance mode

Asynchronous replication: Returns when the primary database executes successfully

Success is returned as long as 1 backup was successfully saved.

3) Maximum availability mode

There is a compromise between the two modes: maximum protection mode in normal cases, maximum performance mode in the event of a failure

  • Data consistency

Version number: Generates the corresponding version number when a data write request is received.

Delete the old version number. Ensure that the data read is the latest version number; When writing, ensure that the version number of the data to be written is new and stored.

  • Fault tolerance
1) Fault detection

Heartbeat: S sends a heartbeat packet to C at regular intervals

Lease mechanism: Authorization with a timeout period

2) Fault recovery

Master: indicates the active/standby mechanism and persistent indexes

Datanode: permanent fault, add backup

  • scalability
1) Whether the total control node becomes the bottleneck

Not a bottleneck: the processing of small files is abandoned, the read and write control of data is transferred to the working machine, and the access to the master controller node is reduced through the client cache metadata

Memory becomes the bottleneck: a two-level structure is adopted, with a layer of metadata nodes added between the master controller and the worker

2) Isomorphic system

Storage nodes are divided into several groups. The nodes in each group serve the same data

3) Heterogeneous system

Data is divided into slices of similar size, and multiple copies of each fragment are distributed to any storage node in the cluster. If a node fails, the original services are restored by the whole cluster instead of a few fixed storage nodes