1. Background

In today's database ecosystem, most systems support some form of data synchronization across multiple node instances: MySQL master/slave replication, Redis AOF-based master/slave synchronization, and MongoDB ReplicaSets of three or more nodes. These mechanisms provide data redundancy and high availability within a single logical unit (a group of master and slave node instances). Data synchronization across logical units, and even across data centers, matters just as much at the business layer: it enables load balancing across equipment rooms in multiple cities, mutual backup between rooms, and disaster recovery across data centers against low-probability events such as severed fiber links or earthquakes. Against this background, the Alibaba Cloud MongoDB database service has officially launched “MongoDB Cloud Disaster Recovery” (also known as BLS), a bidirectional synchronization product between MongoDB instances that helps enterprises quickly replicate Alibaba's geo-disaster-recovery and multi-active architecture.

2. Product introduction

MongoDB Cloud Disaster Recovery fetches Oplog (operations log) entries from the source database (the Oplog is stored in the local database of a replica set) and transmits them to the destination database for replay. Two such replication links, one in each direction, provide bidirectional synchronization; on top of this, disaster recovery (DR) and active-active deployments can be built.



The figure above shows the overall replication and synchronization process. Currently, MongoDB Cloud Disaster Recovery supports ReplicaSet instances as both source and destination; Sharding mode will be supported soon, and single-node mode is not supported. To relieve pressure on the Primary node, the Oplog is pulled from a Secondary node.
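The core of each replication link is simple in principle: tail the source's Oplog and re-apply each entry on the destination. The sketch below illustrates that idea with pymongo; it is a minimal illustration, not the product's actual code, and the connection strings, namespace handling, and the classic insert/update/delete Oplog format it assumes are simplifications.

    from pymongo import CursorType, MongoClient

    # Read from a Secondary to keep load off the Primary, as described above.
    source = MongoClient("mongodb://source-secondary:27017", readPreference="secondary")
    dest = MongoClient("mongodb://dest-primary:27017")

    # Oplog entries live in the "local" database of every replica set member.
    oplog = source.local.oplog.rs
    cursor = oplog.find(cursor_type=CursorType.TAILABLE_AWAIT)

    for entry in cursor:
        if entry["op"] not in ("i", "u", "d"):   # skip no-ops and commands
            continue
        db, coll = entry["ns"].split(".", 1)     # namespace, e.g. "appdb.orders"
        if entry["op"] == "i":                   # insert
            dest[db][coll].insert_one(entry["o"])
        elif entry["op"] == "u":                 # update (classic modifier format assumed)
            dest[db][coll].update_one(entry["o2"], entry["o"])
        else:                                    # delete
            dest[db][coll].delete_one(entry["o"])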

Full plus incremental

The synchronization model takes a full + incremental approach: when the link is created, the source database is synchronized in full; subsequent changes are then synchronized incrementally.
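A minimal sketch of the full + incremental model follows (pymongo again, with placeholder host and collection names): the newest Oplog timestamp is recorded before the full copy starts, so that every change made while the full copy is running is replayed afterwards and nothing is lost.

    from pymongo import CursorType, DESCENDING, MongoClient

    source = MongoClient("mongodb://source-secondary:27017")
    dest = MongoClient("mongodb://dest-primary:27017")

    # 1. Remember the current Oplog position before the full copy begins.
    start_ts = source.local.oplog.rs.find_one(sort=[("$natural", DESCENDING)])["ts"]

    # 2. Full phase: copy the existing documents of each synchronized collection.
    for doc in source["appdb"]["orders"].find():
        dest["appdb"]["orders"].insert_one(doc)

    # 3. Incremental phase: replay every Oplog entry newer than the recorded position,
    #    applying each one to the destination as in the previous sketch.
    cursor = source.local.oplog.rs.find(
        {"ts": {"$gt": start_ts}}, cursor_type=CursorType.TAILABLE_AWAIT
    )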

Active-active model

Because replication is asynchronous, in active-active/multi-active mode users must ensure that the same unique key is not modified on both sides at the same time; concurrent changes to the same key may lead to data errors. The conflict policies currently available are override and ignore. An online verification program will be provided later that returns an interface error when a user operates on the same unique key from both sides.
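The two conflict policies can be pictured as follows. This is a minimal sketch under the assumption that conflicts surface as duplicate-key errors on the destination's unique index; the collection and field names are placeholders.

    from pymongo import MongoClient
    from pymongo.errors import DuplicateKeyError

    dest = MongoClient("mongodb://dest-primary:27017")["appdb"]["orders"]

    def apply_insert(doc, policy="override"):
        try:
            dest.insert_one(doc)
        except DuplicateKeyError:
            if policy == "override":
                # Override: the incoming document wins and replaces the existing one.
                dest.replace_one({"_id": doc["_id"]}, doc, upsert=True)
            else:
                # Ignore: keep the existing document and drop the incoming change.
                pass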

High efficiency guarantee

The synchronization delay depends on region and network, but TPS can theoretically approach 200,000 (that is, 200,000 Oplog entries per second). To keep batch transmission efficient, data is sent through a buffering mechanism, so in the extreme case a single Oplog entry may be delayed by about 1 second; under normal conditions the end-to-end synchronization delay is below 3 seconds. Parallelism is used at every stage (a minimal batching sketch follows the list below):

  • The source side pulls Oplog entries in parallel while resolving dependencies between conflicting entries
  • Entries are sent to the Kafka channel in parallel
  • The destination side writes to the database in parallel, again resolving dependencies
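The buffering behaviour mentioned above can be illustrated with a small standard-library sketch (not the product's Collector code): entries are batched and flushed either when the batch is full or roughly one second after the first entry arrived, which is where the worst-case single-entry delay comes from. Each flushed batch would then be handed to the Kafka channel in one produce call.

    import queue
    import threading
    import time

    BATCH_SIZE = 128        # flush when this many entries are buffered
    FLUSH_INTERVAL = 1.0    # ...or at most this long after the first buffered entry

    entries = queue.Queue() # filled by the Oplog pull thread(s)

    def sender(send_batch):
        batch, first_seen = [], None
        while True:
            # Block forever for the first entry, then only until the flush deadline.
            timeout = None if first_seen is None else max(
                0.0, FLUSH_INTERVAL - (time.time() - first_seen))
            try:
                batch.append(entries.get(timeout=timeout))
                if first_seen is None:
                    first_seen = time.time()
            except queue.Empty:
                pass
            if batch and (len(batch) >= BATCH_SIZE
                          or time.time() - first_seen >= FLUSH_INTERVAL):
                send_batch(batch)   # e.g. one Kafka produce call per batch
                batch, first_seen = [], None

    threading.Thread(target=sender, args=(print,), daemon=True).start()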

Circular replication

In a bidirectional link, data copied from the source to the destination could be copied back to the source and loop forever. To prevent this circular replication, a gid (group id) is added to each Oplog entry, so that an entry is forwarded only by the instance on which the write originally happened.
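A minimal sketch of the gid check follows; the field name and values here are purely illustrative and do not reflect the actual Oplog schema extension used by the product.

    LOCAL_GID = "instance-A"   # identifies the MongoDB instance this collector reads from

    def should_forward(oplog_entry):
        # Forward only writes that originated on the local instance. Entries that were
        # replicated in from the peer carry the peer's gid and must not be sent back,
        # which breaks the source -> destination -> source loop.
        return oplog_entry.get("gid", LOCAL_GID) == LOCAL_GID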

Reliable transmission

Data synchronization is not affected when an instance is restarted.
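One way to picture this, as a sketch under the assumption that the link persists a checkpoint of the last applied Oplog timestamp (collection names are placeholders):

    from pymongo import MongoClient

    ckpt = MongoClient("mongodb://dest-primary:27017")["bls_meta"]["checkpoint"]

    def save_checkpoint(ts):
        # Overwrite the single checkpoint document with the newest applied timestamp.
        ckpt.replace_one({"_id": "oplog_ts"}, {"_id": "oplog_ts", "ts": ts}, upsert=True)

    def load_checkpoint():
        # After a restart, pulling resumes from the persisted position (None = from scratch).
        doc = ckpt.find_one({"_id": "oplog_ts"})
        return doc["ts"] if doc else None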

High availability

Link synchronization is highly available: if the synchronization process fails, a standby process takes over the service.

Limitations

  • DDL statement synchronization is not yet supported, so index changes have to be applied separately on both the source and the destination.
  • The destination database must be newly created; building a channel between two existing MongoDB instances is not supported.
  • Currently, only the active-active function (synchronizing data between two MongoDB instances) is supported; online multi-active synchronization across more than two MongoDB instances will be supported in the future.

3. Architecture

MongoDB Cloud Disaster Recovery consists of three components:

  • BLS Manager. The central control module, responsible for scheduling and monitoring the tasks of the Collector and the Receiver.
  • BLS Collector. The data acquisition module, responsible for fetching Oplog data from the source MongoDB database and sending it to the Kafka channel.
  • BLS Receiver. The data playback module, responsible for reading data from the Kafka channel and writing it to the destination MongoDB database.

The following figure shows the overall architecture of the system.

4. Customer use case

Amap (AutoNavi) is the leading map and navigation application in China. The Alibaba Cloud MongoDB database service provides part of its storage, holding data at the hundreds-of-millions scale. Amap runs a three-data-center strategy within China, routing each user to the nearest center based on geographical location and other information to improve service quality, as shown in the figure below. No computation in one equipment room depends on data held in another, and data is synchronized between the centers using MongoDB Cloud Disaster Recovery.



The three cities span China from north to south, which poses challenges for cross-data-center replication and disaster recovery. If the equipment room or network in one region has a problem, traffic can be switched smoothly to another region so that users barely notice.

Currently, the equipment rooms are connected pairwise in the topology: data in each equipment room is asynchronously synchronized to the other two through MongoDB Cloud Disaster Recovery. Amap's routing layer then directs user requests to the different data centers, with reads and writes sent to the same data center to preserve a degree of transactionality. As a result, each data center holds the full data set (with eventual consistency), and if any equipment room fails, one of the other two can provide read and write services after a switchover. The following figure shows the synchronization between the rooms in City 1 and City 2.



If a unit becomes unreachable, the Manager can obtain the synchronization offset and timestamp of each equipment room through the management interface of MongoDB Cloud Disaster Recovery. By comparing the collected (fetched) position with the written (applied) position, it can determine whether asynchronous replication has caught up to a given point in time. Combined with DNS-based traffic switching on the business side, traffic to the failed unit is cut off and its requests can then be read and written in the new unit, as shown in the following figure.
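The catch-up check itself can be pictured as a simple comparison (a sketch only; the two timestamps are assumed to be the newest Oplog position reported by the Collector and the checkpoint last applied by the Receiver):

    def replication_caught_up(last_collected_ts, last_applied_ts):
        # Traffic can be switched to the surviving unit once every entry that was
        # collected from the failed unit has also been applied there.
        return last_applied_ts >= last_collected_ts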