Original article from Little Sister Taste (WeChat public account: XJjdog). Feel free to share; please keep attribution when reprinting.

Distributed systems keep data safe through redundancy. When building one, there is a big hurdle you cannot get around: data synchronization.

The word "synchronization" alone has tormented plenty of engineers.

Synchronous or asynchronous? Push or pull? Who is the master and who is the slave? What happens when a node goes offline? What happens when it comes back? Centralized, or peer-to-peer?

These questions haunt every designer of distributed systems.

Next, let's look at how several major storage services tackle data synchronization.

How does MySQL master-slave synchronization work?

In MySQL, the primary server is called the master and the replica is called the slave.

The master records changes in the binlog; the slave copies those records via dedicated threads (an I/O thread fetches them, a SQL thread replays them).

There are three binlog formats: STATEMENT, ROW, and MIXED.

  • STATEMENT writes the changed SQL statements to the binlog; nondeterministic statements can hurt accuracy
  • ROW writes the change to each affected row to the binlog
  • MIXED combines the two: MySQL decides when to use STATEMENT and when to use ROW
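The accuracy problem with STATEMENT format can be made concrete. The sketch below is not MySQL internals, just a toy Python illustration: a nondeterministic expression (think RAND() or UUID()) re-evaluated on the slave can produce a different value than it did on the master, while ROW format ships the concrete value and replays it byte-for-byte.

```python
import random

def rand_value(seed):
    # stand-in for a nondeterministic SQL function like RAND()
    return random.Random(seed).random()

def replay_statement(slave_seed):
    # STATEMENT format logs "INSERT ... VALUES (RAND())"; the slave
    # re-executes the statement with its own RNG state
    return rand_value(slave_seed)

def replay_row(master_value):
    # ROW format logs the concrete value the master produced,
    # so the slave applies exactly the same data
    return master_value

master_value = rand_value(seed=1)
print(replay_row(master_value) == master_value)    # ROW: always matches
print(replay_statement(slave_seed=2) == master_value)  # STATEMENT: can diverge
```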

Since the slave replicates through an asynchronous thread, it can easily lag behind. If the master fails while the slave is lagging, the un-replicated data is lost.

To address this, MySQL introduced semi-synchronous replication in version 5.5. It sits between asynchronous and fully synchronous replication: the master does not return immediately after executing a transaction, but waits until at least one slave has acknowledged receiving it. With at least one slave round-trip on every commit, performance inevitably suffers compared to pure asynchronous replication.
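The three commit strategies can be sketched in a few lines. This is a toy model, not MySQL code: async returns without waiting, semi-sync waits for at least one slave acknowledgment, and full sync waits for all of them.

```python
def commit(mode, slave_acks):
    """slave_acks: one boolean per slave, True if that slave acknowledged."""
    if mode == "async":
        return True                  # return immediately, wait for nobody
    if mode == "semi-sync":
        return any(slave_acks)       # at least one slave has the transaction
    if mode == "full-sync":
        return all(slave_acks)       # every slave has the transaction
    raise ValueError(mode)

acks = [True, False, False]          # only one of three slaves is caught up
print(commit("async", acks))         # True
print(commit("semi-sync", acks))     # True
print(commit("full-sync", acks))     # False
```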

Fully synchronous mode, of course, waits for every slave node to finish replicating. It is the safest and also the slowest. In the degenerate case of a single slave, semi-synchronous replication is equivalent to full synchronization.

After 5.7, MySQL added the Group Replication protocol. It supports both single-primary and multi-primary modes, though the two cannot coexist in the same group. As fancy as that sounds, it is actually built on the Paxos protocol.

How does Kafka synchronize replicas?

Because Kafka is a message queue, it does not have to worry about random deletes and updates; it only cares about appends. Structurally, Kafka's synchronization unit is fine-grained: a cluster has multiple topics, each topic is split into multiple partitions, and replication happens per partition.

The primary replica of a partition is called the leader, and its 1..N replicas are called followers. A producer must find the partition's leader and send data to it. Followers exist only as backups, ready to be promoted if the leader runs into trouble.

Leader/follower synchronization in Kafka is governed by the In-Sync Replica (ISR) mechanism.

When is a send considered successful? That depends on the producer's acks setting.

  • acks=0: fire-and-forget; the send is considered successful as soon as the message is sent
  • acks=1: the send succeeds once the leader replica has written the message
  • acks=-1: after sending to the leader, every replica in the ISR must also acknowledge

In the first two cases, Kafka can lose messages. With acks=-1, message safety requires that at least one follower has successfully committed. If a follower cannot keep up with the leader, it is removed from the ISR list. That's right, simply removed. When the ISR shrinks to the leader alone, a Kafka partition is no safer than a standalone machine, so Kafka provides the min.insync.replicas parameter to enforce a minimum ISR size.

  • What happens when min.insync.replicas is not satisfied? Kafka does not lose messages, of course, because the commit fails and the message never enters the system
  • What happens when all replicas are unavailable? The partition simply becomes unavailable
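The interaction between acks and min.insync.replicas described above can be sketched as a small decision function. This mirrors the described behavior, not Kafka's actual broker code: with acks=0 or 1 no ISR check applies (and data may be lost on failover), while acks=-1 rejects the write outright when the ISR has shrunk below the configured minimum.

```python
def produce(acks, isr_size, min_insync):
    """Decide whether a produce request succeeds (toy model, not Kafka code)."""
    if acks in (0, 1):
        return "ok"             # no ISR check; data may be lost on leader failure
    if acks == -1:
        if isr_size < min_insync:
            # analogous to a NotEnoughReplicas error: fail the write
            # rather than accept a message that cannot be made durable
            return "rejected"
        return "ok"             # all in-sync replicas acknowledged
    raise ValueError(acks)

print(produce(acks=-1, isr_size=2, min_insync=2))  # ok
print(produce(acks=-1, isr_size=1, min_insync=2))  # rejected
print(produce(acks=1, isr_size=1, min_insync=2))   # ok (but unsafe)
```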

Data is replicated between copies using a follower-pull model.

Master-slave replication in Redis

Redis is an in-memory KV store, far faster than disk-based databases, so in theory master/slave synchronization should be easier. Yet under heavy traffic and high QPS, master-slave replication can still run into problems.

When a Redis slave first connects, a full synchronization is performed: the slave sends the psync command to the master, the master runs bgsave to generate an RDB file, and the full sync copies that RDB snapshot to the slave.

What about writes that arrive in the middle of the full copy? They have to be buffered. The master keeps a buffer, records the new writes generated during the full sync, and streams that incremental data to the slave once the snapshot transfer completes.

After a brief disconnection, a slave does not necessarily need another full sync. Redis introduces the replication offset, the replication backlog buffer, and run_id to support incremental catch-up: run_id identifies the master instance, the offset marks the replication position, and the backlog buffer holds the recent write stream.

After the initial sync, psync keeps replication going. It is still asynchronous replication.
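The psync decision described above can be sketched as follows (a hypothetical helper, not Redis source): a partial resync is possible only when the slave is still talking to the same master (run_id matches) and its offset is still covered by the replication backlog buffer; otherwise Redis falls back to a full sync.

```python
def resync_mode(slave_runid, slave_offset, master_runid,
                backlog_start, master_offset):
    """Return 'partial' if the slave can catch up from the backlog buffer."""
    same_master = slave_runid == master_runid
    in_backlog = backlog_start <= slave_offset <= master_offset
    return "partial" if same_master and in_backlog else "full"

print(resync_mode("abc", 900, "abc", 800, 1000))  # partial
print(resync_mode("abc", 700, "abc", 800, 1000))  # full: offset aged out of backlog
print(resync_mode("old", 900, "abc", 800, 1000))  # full: master changed
```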

As you can see, Redis's master-slave consistency depends heavily on memory and its guarantees are weak. But it is fast, and speed solves a lot of problems; the intended use cases are simply different.

Primary/replica replication in Elasticsearch

ES is a search engine built on Lucene. A data node holds multiple indexes, each index contains multiple shards, and each shard has multiple replicas.

As described above, these concepts closely mirror Kafka's; the replication unit in ES is the shard.

ES also maintains a list of in-sync replicas (InSyncAllocationIds). Unhealthy (yellow or red) copies are not in this list.

The primary waits for all healthy in-sync copies to finish a write before responding to the client, so the consistency level is high; it has to be, because the replica shards serve reads and ES is a near-real-time system.

Since ES is a database, deletes and updates still happen. The translog is the equivalent of a WAL, keeping data safe across power loss, consistent with other RDBMSs.

Cassandra cluster mode

Cassandra is a famous practical embodiment of CAP theory, leaning toward AP, and it currently ranks high on db-engines.com.

Storage is organized into tables, and one table can span many machines. Partitioning is driven by partition keys, so data distribution relies heavily on hash functions. What happens when a node fails? That is where consistent hashing comes in.

Cassandra is interesting because its replicas are not master/slave like the others, but closer to multi-master: all replicas serve traffic at the same time, so no active/standby failover is needed when a node drops offline.

Why is this possible? Because Cassandra aims for eventual consistency. With replicas in a distributed system, replication must be either synchronous or asynchronous, so what level is appropriate? The quorum R+W scheme is the tradeoff.

quorum = (sum_of_replication_factors / 2) + 1

What does that mean? Imagine 5 drawers into which you randomly put W balls; R is the number of drawers you must open to be guaranteed to find one. With a single ball, you may need to open all 5 (W=1, R=5); with 2 balls, 4 opens suffice; with 5 balls, a single read is enough.

When R+W > N, you get strong consistency; when R+W <= N, only eventual consistency.
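The quorum arithmetic above can be written out as a runnable check: for N replicas, quorum = N // 2 + 1 (integer division, matching the formula above), and a read/write pair (R, W) is strongly consistent exactly when R + W > N, because the read set must then overlap the write set in at least one replica.

```python
def quorum(n):
    # (sum_of_replication_factors / 2) + 1, with integer division
    return n // 2 + 1

def is_strongly_consistent(r, w, n):
    # reads and writes overlap in at least one replica iff R + W > N
    return r + w > n

n = 5
q = quorum(n)
print(q)                                  # 3
print(is_strongly_consistent(q, q, n))    # True: QUORUM reads + QUORUM writes
print(is_strongly_consistent(1, 1, n))    # False: ONE/ONE is only eventual
```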

Interestingly, Cassandra's cluster metadata is propagated with a push-pull Gossip protocol.

MongoDB master/slave replication

MongoDB offers three kinds of data redundancy: master-slave (no longer recommended), replica sets, and sharding.

MongoDB's replica set is a standard failover implementation that needs no human intervention: after the primary is lost, a new primary is elected from the replica set and the other nodes are redirected to it.

For its election algorithm, MongoDB uses Bully.
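The core Bully idea can be shown in a toy sketch: among the reachable nodes, the one with the highest ID wins the election. Real MongoDB elections also weigh things like priority and oplog freshness; this illustrates only the basic rule.

```python
def bully_elect(node_ids, alive):
    """Return the highest-ID node that is still alive, or None if none are."""
    candidates = [n for n in node_ids if alive.get(n)]
    return max(candidates) if candidates else None

nodes = [1, 2, 3, 4, 5]
print(bully_elect(nodes, {n: True for n in nodes}))     # 5: highest ID wins
print(bully_elect(nodes, {1: True, 2: True, 3: True}))  # 3: nodes 4 and 5 are down
```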

Changes on the primary are stored in a special system collection (the oplog). Secondaries periodically pull these changes and apply them. From this description it is clear that MongoDB can lose data under replication lag or a single-node failure.
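The pull-and-apply loop just described can be sketched as follows (a toy model, not MongoDB code): the secondary remembers the timestamp of the last entry it applied, and on each poll it pulls only newer entries from the primary's change log and applies them in order.

```python
class Secondary:
    def __init__(self):
        self.applied = []
        self.last_ts = -1   # timestamp of the last applied oplog entry

    def pull_and_apply(self, primary_log):
        # pull only entries newer than what we have already applied
        for ts, op in primary_log:
            if ts > self.last_ts:
                self.applied.append(op)
                self.last_ts = ts

primary_log = [(1, "insert a"), (2, "update a"), (3, "delete b")]
s = Secondary()
s.pull_and_apply(primary_log[:2])  # first poll sees two entries
s.pull_and_apply(primary_log)      # next poll picks up only the new one
print(s.applied)                   # ['insert a', 'update a', 'delete b']
```

If the primary dies between two polls, everything written after `last_ts` is lost, which is exactly the lag-induced data loss mentioned above.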

Conclusion

Distribution solves the capacity limits of a single machine, but it introduces a new problem: data synchronization.

Data synchronization must balance consistency, fault recovery, and timeliness.

There are two main types of data that need to be synchronized.

  • Metadata information
  • Real data

For metadata, the current mainstream practice is to distribute it with the Raft protocol. For the actual data, Raft is somewhat inefficient, so asynchronous replication is commonly used instead.

In that design, the list of asynchronous replicas itself becomes key metadata: the cluster must track the state of those nodes. In the worst case, every asynchronous replica is unavailable and the master runs alone in a very untrustworthy environment.

To make data placement more flexible, these replication units mostly operate at the shard level, which in turn makes the metadata explode.

There are many distributed systems, yet no single unified pattern. Interestingly, even the least efficient distributed systems have huge followings. Don't believe it? Just look at what BTC is doing.

About the author: Little Sister Taste (XJjdog), a public account where programmers take no detours. Focused on infrastructure and Linux. Ten years of architecture work and tens of billions of daily requests; come explore the high-concurrency world for a different taste. My personal WeChat is xJJdog0; feel free to add me for further discussion.