Deploying applications across multiple data centers is one of the hottest topics of the moment: as the number of service windows continues to grow, a business disruption can be devastating for many enterprises.

Today, best practice for deploying applications across multiple data centers makes the database responsible for handling reads and writes in multiple geographic regions, replicating data changes, and providing the highest possible levels of availability, consistency, and durability.

But not every technology is an equal choice. For example, one database technology may provide stronger availability guarantees while offering weaker guarantees of data consistency and durability than another.

This paper first analyzes the database requirements of modern multi-data center applications, then discusses the categories of database architecture along with their advantages and disadvantages, and finally examines how MongoDB fits these categories and can be used to realize an active-active application architecture.

Active-active requirements

When organizations consider deploying applications across multiple data centers (or cloud regions), they often want an “active-active” architecture, in which application servers in all data centers process requests simultaneously.




Figure 1: Active-active application architecture

As shown in Figure 1, this architecture aims to achieve the following goals:

  • Serve a geographically distributed audience with local processing, and therefore low latency

  • Keep the application available even if a node, data center, or region fails

  • Make use of the resources in every data center, rather than leaving some of them idle

The alternative to the active-active architecture is the master-DR (also known as master-slave) architecture, which consists of one master data center (region) and multiple DR (disaster recovery) regions (Figure 2).




Figure 2: Master-DR architecture

Under normal operating conditions, the primary data center processes requests while the DR site is idle. If the primary data center fails, the DR site immediately starts processing requests (and becomes active).

Typically, data is replicated from the primary data center to the DR site so that the DR site can take over quickly if the primary data center fails.

Today, there is no widely agreed-upon definition of an active-active architecture in the industry, and master-DR application architectures are sometimes also counted as “active-active”.

The difference lies in whether failover from the master site to the DR site is fast (usually within seconds) and automated (requiring no human intervention). Under this interpretation, an active-active architecture implies application downtime close to zero.

It is a common misconception that an active-active application architecture requires a multi-master database. This is a mistake, because it misunderstands the consistency and durability guarantees that multi-master databases can actually provide.

Consistency ensures that reads reflect the results of previous writes, while data durability ensures that committed writes are never lost to conflicting writes or node failures.

Database requirements for active-active applications

When designing an active-active application architecture, the database layer must meet the following four architectural requirements (in addition, of course, to standard database capabilities such as a rich query language with secondary indexes, low-latency data access, native drivers, and comprehensive operational tooling):

  • Performance: low-latency reads and writes, typically by processing requests on nodes local to the application.

  • Data durability: writes are applied to multiple nodes, so committed data is not lost when a node fails.

  • Consistency: reads see the results of previous writes.

  • Availability: the database must continue running when a node, data center, or network connection fails, and recovery from such failures should be as short as possible, typically a few seconds.

Distributed database architectures

For active-active application architectures, there are three categories of distributed database architecture:

Distributed transactions using two-phase commit.



Multi-master database architectures, sometimes referred to as “masterless” architectures.



Sharded (partitioned) databases with multiple primary shards, each responsible for a unique partition of the data.


Let’s take a look at the pros and cons of each architecture.

Distributed transactions with two-phase commit

The distributed transaction approach updates all nodes that contain a record as part of a single transaction, instead of writing to one node and then (asynchronously) replicating to the others.

The transaction guarantees that all nodes receive the update; otherwise, if the transaction fails, all nodes roll back to their previous state.

While the two-phase commit protocol ensures durability and multi-node consistency, it sacrifices performance.

The two-phase commit protocol requires two rounds of communication among all participating nodes in a transaction: requests and acknowledgements are exchanged in each phase to ensure that every node commits the same write at the same time.

When database nodes are distributed across multiple data centers, this often pushes write latency from the millisecond range into the multi-second range.

In most applications, especially those where the client is a user device (mobile device, web browser, client application, etc.), this level of responsiveness is unacceptable.
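
To make the cost concrete, here is a minimal, illustrative sketch of a two-phase commit coordinator; it is not any particular database's implementation, the Participant class is hypothetical, and in a real deployment every prepare/commit call below is a cross-data-center round trip:

```python
# Minimal two-phase commit sketch. Participant stands in for a remote
# database node; in practice each call crosses a data center boundary.

class Participant:
    def __init__(self, name):
        self.name = name
        self.staged = None      # value staged during the prepare phase
        self.committed = None   # value made durable during the commit phase

    def prepare(self, value):
        """Phase 1: stage the write and vote yes/no."""
        self.staged = value
        return True             # vote "yes"; a failure here would return False

    def commit(self):
        """Phase 2: make the staged write durable."""
        self.committed = self.staged

    def abort(self):
        """Failure path: discard the staged write."""
        self.staged = None


def two_phase_commit(participants, value):
    # Phase 1: every node must vote yes before anyone commits.
    if not all(p.prepare(value) for p in participants):
        for p in participants:
            p.abort()
        return False
    # Phase 2: all nodes apply the same write.
    for p in participants:
        p.commit()
    return True


nodes = [Participant("nyc"), Participant("lon"), Participant("syd")]
print(two_phase_commit(nodes, {"_id": 1, "balance": 100}))  # True
```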

Multi-master database

A multi-master database is a distributed database that allows a record to be updated on any node in the cluster. Writes typically replicate the record to other nodes in multiple data centers.

On the surface, a multi-master database looks like the ideal solution for an active-active architecture: it lets every application server read and write a local copy of the data without restriction. It has serious limitations, however, when it comes to data consistency.

Because two (or more) copies of the same record may be updated simultaneously by different sessions in different locations, two versions of the same record can arise, and the database (and sometimes the application itself) must resolve the inconsistency through conflict resolution.

Common conflict resolution strategies include “the most recent update wins” and “the record with the most changes wins.” More sophisticated resolution strategies can be used, but they significantly hurt performance.

It also means that, in the window between a write and the completion of conflict resolution, different data centers may read different, conflicting values for the same record.
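
To illustrate how coarse these strategies are, here is a small, hypothetical last-write-wins resolver; the document shape and the updated_at field are assumptions made for the example:

```python
from datetime import datetime, timezone

# Two versions of the same record, written concurrently in different regions.
version_nyc = {"_id": 1, "email": "a@example.com",
               "updated_at": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)}
version_lon = {"_id": 1, "email": "b@example.com",
               "updated_at": datetime(2024, 1, 1, 12, 0, 1, tzinfo=timezone.utc)}


def last_write_wins(a, b):
    """Keep whichever version has the later timestamp; the other write is silently lost."""
    return a if a["updated_at"] >= b["updated_at"] else b


winner = last_write_wins(version_nyc, version_lon)
print(winner["email"])  # "b@example.com" -- the New York write is discarded
```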

Partitioned (sharded) databases

A partitioned database divides the data set into distinct partitions, or shards. Each shard is implemented by a set of servers, each of which holds a complete copy of that partition's data. The key point is that each shard has exclusive control over its partition of the data.

Within each shard, at any given time one server acts as the primary and the other servers act as replicas. Reads and writes are issued to the primary.

If the primary server fails for any reason, such as hardware or network failure, one of the secondary servers automatically takes over the role of the primary server.

Each record in the database belongs to exactly one shard and is written only through that shard's primary, which is what guarantees consistency.

Since a cluster contains multiple shards, it contains multiple primaries (one per partition), and these primaries can be assigned to different data centers so that writes happen locally in each data center, as shown in Figure 3:




Figure 3: Partitioned database
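
As a rough illustration of how records map to shards, here is a hypothetical region-based routing function; the region values and shard names are assumptions for the example, not part of any particular product:

```python
# Hypothetical shard map: each region's records are owned by one shard,
# and each shard's primary lives in the matching data center.
SHARD_MAP = {
    "NYC": "shard-nyc",   # primary in New York
    "LON": "shard-lon",   # primary in London
    "SYD": "shard-syd",   # primary in Sydney
}


def route_write(record):
    """Return the shard whose primary must accept this write."""
    return SHARD_MAP[record["region"]]


print(route_write({"_id": 42, "region": "LON", "name": "Ada"}))  # shard-lon
```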

A sharded database can be used to implement an active-active application architecture by deploying at least as many shards as there are data centers and assigning primaries so that every data center hosts at least one primary, as shown in Figure 4:




Figure 4: Active-active architecture with a sharded database


In addition, sharding is configured so that each shard keeps at least one replica (a copy of its data) in each of the data centers.

For example, the diagram in Figure 4 depicts a distributed database architecture across three data centers:

  • New York City (NYC)

  • London (LON)

  • Sydney (SYD)

The cluster has three shards, each with three replicas:

  • Shard NYC: primary in New York, with replicas in London and Sydney

  • Shard LON: primary in London, with replicas in New York and Sydney

  • Shard SYD: primary in Sydney, with replicas in New York and London

In this arrangement, every data center holds a replica of every shard, so a local application server can read the entire data set locally and can write locally through whichever shard primaries reside in that data center.

A sharded database can meet the consistency and performance requirements of most application scenarios. Because reads and writes are served by local servers, performance is very good.

When reading from a primary, consistency is guaranteed, because each record is owned by exactly one primary.

For example, suppose we have two data centers in the US, one in New Jersey and one in Oregon. We can partition the data set geographically (East and West), route East Coast users' traffic to the New Jersey data center, which holds the primaries for the East partition, and route West Coast users' traffic to the Oregon data center, which holds the primaries for the West partition.

We can see that a sharded database gives us all the benefits of a multi-master database without the complexity that comes with inconsistent data.

Application servers can read and write against their local primaries, and because each record is owned by exactly one primary, no inconsistencies can arise. By contrast, a multi-master solution can suffer data loss and inconsistent reads.

Database architecture comparison




Figure 5: Database architecture comparison

Figure 5 summarizes how well each database architecture meets the requirements of an active-active application. When choosing between a multi-master database and a sharded database, the deciding factor is whether the application can tolerate inconsistent reads and data loss.

If it can, a multi-master database may be slightly easier to deploy. If it cannot, a sharded database is the better choice.

Since inconsistent reads and data loss are unacceptable for most applications, a sharded database is usually the best choice.

MongoDB Active-active applications

MongoDB is an example of a sharded database architecture. In MongoDB, the set of primary and secondary servers backing a shard is called a replica set; replica sets provide high availability for each shard.

A mechanism called zone sharding configures which set of data each shard manages. As mentioned earlier, zone sharding makes this kind of geographic partitioning possible.

MongoDB multi-data center deployment:




Zone Sharding
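
As a sketch of how zone sharding might be set up, the pymongo commands below tag each shard with a geographic zone and pin a shard-key range to it; the mongos address, database, collection, shard names, and the region-prefixed shard key are all assumptions for illustration:

```python
from pymongo import MongoClient
from bson.min_key import MinKey
from bson.max_key import MaxKey

# Hypothetical mongos address, database, collection, and shard names.
client = MongoClient("mongodb://mongos.example.com:27017")
admin = client.admin

# Shard the collection on a region-prefixed shard key.
admin.command("enableSharding", "app")
admin.command("shardCollection", "app.users", key={"region": 1, "_id": 1})

# Tag each shard with the zone of the data center hosting its primary.
admin.command("addShardToZone", "shard-nyc", zone="NYC")
admin.command("addShardToZone", "shard-lon", zone="LON")
admin.command("addShardToZone", "shard-syd", zone="SYD")

# Pin each region's key range to its zone so that writes land locally.
admin.command("updateZoneKeyRange", "app.users",
              min={"region": "LON", "_id": MinKey()},
              max={"region": "LON", "_id": MaxKey()},
              zone="LON")
# ...repeat updateZoneKeyRange for the NYC and SYD ranges.
```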




In fact, many organizations, including eBay, YouGov, and Ogilvy and Mather, use MongoDB to implement active-active application architectures.

In addition to standard sharded database capabilities, MongoDB provides fine-grained control over write durability and read consistency, making it ideal for multi-data center deployments. For writes, a write concern can be specified to control durability.

The write concern lets an application specify how many replicas, including replicas in one or more remote data centers, a write must be applied to before MongoDB acknowledges it. In this way, it ensures that committed changes are not lost if a node or a data center fails.
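
As a small pymongo sketch, assuming a replica set whose members are spread across several data centers (the host name, database, and collection names are illustrative):

```python
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://mongos.example.com:27017")

# Require the write to be applied on a majority of replica set members
# (which, with members spread over three data centers, spans more than one
# of them) before MongoDB acknowledges it; fail if that takes over 5 seconds.
orders = client.app.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority", wtimeout=5000),
)
orders.insert_one({"order_id": 1001, "status": "created"})
```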

In addition, MongoDB compensates for a potential shortcoming of sharded databases: write availability is not 100%.

Because each record has only one primary, if that primary fails, there is a period during which its partition cannot accept writes.

MongoDB greatly reduces the impact of this window with retryable writes. With retryable writes, MongoDB automatically retries writes that fail because of transient system errors such as network failures, which greatly reduces the amount of error-handling code applications need.
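
A minimal pymongo sketch of enabling this behavior (retryable writes are on by default in recent drivers; the URI and collection names are illustrative):

```python
from pymongo import MongoClient

# retryWrites=True tells the driver to transparently retry a failed write
# once, for example when a primary election happens mid-request.
client = MongoClient("mongodb://mongos.example.com:27017", retryWrites=True)

client.app.orders.update_one(
    {"order_id": 1001},
    {"$set": {"status": "shipped"}},
)
```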

Another notable MongoDB feature for multi-data center deployments is the speed of automatic failover.

In the event of a node failure, data center failure, or network outage, MongoDB can fail over within 2-5 seconds (depending, of course, on its configuration and on the reliability of the network itself).

When a failure occurs, the remaining members of the replica set elect a new primary, and the MongoDB drivers automatically discover it. Once the failover completes, subsequent writes are routed to the new primary.

For reads, MongoDB provides two capabilities to specify the desired level of consistency.

First, when reading from secondaries, an application can specify a maximum staleness value (maxStalenessSeconds).

This ensures that a secondary whose replication lag behind the primary exceeds the specified value is not used, so the data returned by secondaries stays reasonably fresh.
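
A pymongo sketch, assuming secondary reads are acceptable for this collection (the 120-second bound is illustrative; the driver requires at least 90 seconds):

```python
from pymongo import MongoClient
from pymongo.read_preferences import SecondaryPreferred

client = MongoClient("mongodb://mongos.example.com:27017")

# Prefer local secondaries, but never one lagging more than 120 seconds
# behind the primary.
profiles = client.app.get_collection(
    "profiles",
    read_preference=SecondaryPreferred(max_staleness=120),
)
doc = profiles.find_one({"user_id": 7})
```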

In addition, a read concern can be attached to reads to control the consistency of the data a query returns.

For example, a read concern of majority tells MongoDB to return only data that has been replicated to a majority of the nodes in the replica set.

This ensures that queries read only data that cannot be lost to a node or data center failure, and it gives the application a consistent view of the data over time.
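
A pymongo sketch using the majority read concern level (host and collection names, as before, are assumptions):

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://mongos.example.com:27017")

# Only return documents whose writes have been acknowledged by a majority
# of the replica set, i.e. data that cannot be rolled back after a failover.
orders = client.app.get_collection(
    "orders",
    read_concern=ReadConcern("majority"),
)
doc = orders.find_one({"order_id": 1001})
```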

MongoDB 3.6 also introduced “causal consistency,” which guarantees that every read in a client session observes the results of that session's previous writes, regardless of which replica serves the request.

By strictly ordering a session's operations according to their causal relationships, causal consistency provides guarantees such as monotonic reads in a distributed system, something many multi-node databases cannot offer.

Causal consistency lets developers keep the strict data consistency they are used to from traditional single-node relational databases, while still taking full advantage of a scalable, highly available distributed data platform.
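
A pymongo sketch of a causally consistent session (requires MongoDB 3.6+ servers and drivers; the host and collection names are assumptions):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.com:27017")

# All operations in this session are causally ordered: the read is guaranteed
# to observe the preceding write, even if a secondary serves it.
with client.start_session(causal_consistency=True) as session:
    client.app.orders.update_one(
        {"order_id": 1001},
        {"$set": {"status": "delivered"}},
        session=session,
    )
    doc = client.app.orders.find_one({"order_id": 1001}, session=session)
    print(doc["status"])  # "delivered"
```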





