Author: Gu Gu Man – Junjun

MongoDB is a widely used non-relational database, known for its high performance, high availability and sharding support. Its high availability comes mainly from replica sets, which can be loosely understood as a cluster with one primary and several secondaries. This article covers replica sets from three angles: an introduction to replica sets, building one locally, and reading and writing data in one.

First, an introduction to MongoDB replica sets

A replica set consists of one primary node and a number of secondary nodes.

There can be only one primary node, and all write requests are handled by it. Secondary nodes keep a copy of the primary's data by replicating its operation log (oplog).

When the primary node fails, the secondary nodes that have voting rights automatically hold an election and choose a new primary from among themselves.
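The majority rule behind such an election can be sketched in a few lines of JavaScript. This is a simplified model rather than MongoDB's actual election code: it assumes that a new primary can be elected only while a strict majority of all voting members is reachable. The member objects and the names majorityOf and canElectPrimary are made up for illustration.

```javascript
// Votes needed to win: a strict majority of ALL voting members,
// whether or not they are currently reachable.
function majorityOf(votingCount) {
  return Math.floor(votingCount / 2) + 1;
}

// Simplified check: can the reachable voters still elect a primary?
function canElectPrimary(members) {
  const voters = members.filter((m) => m.votes === 1);
  const reachable = voters.filter((m) => m.up);
  return reachable.length >= majorityOf(voters.length);
}

// Hypothetical three-member set whose primary has just failed.
const threeNodes = [
  { host: "node-a", votes: 1, up: false }, // failed primary
  { host: "node-b", votes: 1, up: true },
  { host: "node-c", votes: 1, up: true },
];

console.log(canElectPrimary(threeNodes));
```

With two of the three members down, the same check fails, which is why a three-member set tolerates the loss of only one member.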

Secondary nodes can be configured with specific properties, such as whether they vote, whether they are hidden, and whether they replicate with a delay. A replica set can have at most 50 members, of which at most 7 can vote in elections. Although secondary nodes cannot handle writes, they can serve read requests, as covered below.

A replica set needs at least three nodes: one primary and two secondaries. If the three nodes are distributed sensibly, 99.9% availability of the online data can be guaranteed. The three-node architecture is shown below:

If there is only one primary and one secondary, and no resources are available for a second secondary, an arbiter node can be added instead. An arbiter holds no data and only votes in elections, as shown in the following figure:

When the primary node fails, the two secondary nodes hold an election to choose a new primary, as follows:

Several replica set member attributes deserve special attention: priority, hidden, slaveDelay, tags and votes.

  • priority

For a secondary node, this attribute raises or lowers the chance of being elected primary. The value ranges from 0 to 1000 (for arbiters it can only be 0 or 1); the larger the value, the more likely the node is to become primary. A node with priority 0 can never be elected primary and cannot initiate an election.

This attribute is typically used across multiple data centers. Suppose there is a main data center and a backup data center, and the main one is faster. If the primary goes down, we want the new primary to be in the main data center as well, so we set the priority of the nodes in the backup data center to 0, as shown in the figure below:

  • hidden

    A hidden node replicates data from the primary but is invisible to clients, and it does not appear in the output of db.isMaster() in the mongo shell. A hidden node must have priority 0, so it cannot be elected primary. If it is configured with voting rights, however, it can still vote in elections.

    Because a hidden node is invisible to clients and never interacts with them, it can be used for data backups or back-end scheduled jobs. In the figure below, four secondaries replicate from the primary, and one of them is hidden:

  • slaveDelay

    A delayed node deliberately lags behind the primary. For example, if the delay is set to 1 hour and the current time is 09:52, the delayed node holds the primary's data as of 08:52. Note that a delayed node must have priority 0 and should also be hidden.

    So what is a delayed node good for? Developers who have been through a painful database mistake will know the answer: it protects against operator error. Before a service update, for instance, a database update script is usually run first. If the script has a bug and no backup was taken beforehand, the data may be unrecoverable. With a delayed node configured, the data can still be rescued from that node after the mistake, which makes this feature very handy. A delayed node is shown in the figure below:

  • tags

    Members can be labeled with tag sets, which are used when querying: the client finds the members carrying a given tag and reads from them. This is very useful, because nodes can be grouped by tag and the clients of different services can specify the tag of the nodes they should read from. Adding or removing nodes under one tag does not affect services that use other tags. The use of tags is covered in more detail later in this article.

  • votes

    Indicates whether the node is allowed to vote in elections (0 or 1). At most seven members of a replica set can be voting members.
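To make the five attributes concrete, here is a sketch of a members array that combines them, together with a small checker for the rules described above. The host names, the two-data-center layout and the validateMembers function are all hypothetical, and the checker covers only the limits mentioned in this article, not every rule MongoDB enforces.

```javascript
// Hypothetical five-member layout spread over two data centers.
const members = [
  { _id: 0, host: "dc1-a.example.net:27017", priority: 2, votes: 1, tags: { dc: "main" } },
  { _id: 1, host: "dc1-b.example.net:27017", priority: 1, votes: 1, tags: { dc: "main" } },
  // Backup data center: replicates data but can never become primary.
  { _id: 2, host: "dc2-a.example.net:27017", priority: 0, votes: 1, tags: { dc: "backup" } },
  // Hidden member for backups and scheduled jobs.
  { _id: 3, host: "dc2-b.example.net:27017", priority: 0, hidden: true, votes: 0 },
  // Delayed member that stays one hour (3600 s) behind the primary.
  { _id: 4, host: "dc2-c.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600, votes: 0 },
];

// Returns a list of rule violations; an empty list means the layout
// passes the checks described in the text.
function validateMembers(list) {
  const errors = [];
  if (list.length > 50) errors.push("more than 50 members");
  const voters = list.filter((m) => (m.votes ?? 1) === 1);
  if (voters.length > 7) errors.push("more than 7 voting members");
  for (const m of list) {
    const priority = m.priority ?? 1; // priority defaults to 1
    if (priority < 0 || priority > 1000) errors.push(`${m.host}: priority out of range`);
    if (m.hidden && priority !== 0) errors.push(`${m.host}: hidden member needs priority 0`);
    if ((m.slaveDelay ?? 0) > 0 && priority !== 0) errors.push(`${m.host}: delayed member needs priority 0`);
  }
  return errors;
}

console.log(validateMembers(members));
```

Only three of the five members vote here, which keeps the number of voters odd while the hidden and delayed members stay out of elections.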

Second, building and testing a replica set

MongoDB installation tutorial: docs.mongodb.com/manual/inst…

Let's build a replica set with a P-S-S structure (1 primary node, 2 secondary nodes). Roughly, the process is: start three mongod processes on different ports, then run commands in the mongo shell to initialize the replica set.

The command to start a single mongod instance is:

mongod --replSet rs0 --port 27017 --bind_ip localhost,<hostname(s)|ip address(es)> --dbpath /data/mongodb/rs0-0 --oplogSize 128

Parameter description:

Parameter | Description | Example
replSet | Replica set name | rs0
port | Port of the mongod instance | 27017
bind_ip | Addresses the instance listens on. localhost or 127.0.0.1 is enough for local access; internal domain names are recommended in production | localhost
dbpath | Data storage location | /data/mongodb/rs0-0
oplogSize | Operation log size, in MB | 128

The steps are as follows:

  1. Create three directories to store the data on each node

    mkdir -p /data/mongodb/rs0-0 /data/mongodb/rs0-1 /data/mongodb/rs0-2

  2. Start three mongod processes on ports 27018, 27019 and 27020

mongod --replSet rs0 --port 27018 --bind_ip localhost --dbpath /data/mongodb/rs0-0 --oplogSize 128

mongod --replSet rs0 --port 27019 --bind_ip localhost --dbpath /data/mongodb/rs0-1 --oplogSize 128

mongod --replSet rs0 --port 27020 --bind_ip localhost --dbpath /data/mongodb/rs0-2 --oplogSize 128

  3. Connect to the first mongod instance with the mongo shell and initialize the replica set with rs.initiate()

Connect to 27018: mongo localhost:27018

Run:

rsconf = {
    _id: "rs0",
    members: [
      {
        _id: 0,
        host: "localhost:27018"
      },
      {
        _id: 1,
        host: "localhost:27019"
      },
      {
        _id: 2,
        host: "localhost:27020"
      }
    ]
}

rs.initiate( rsconf )

That completes the replica set. Run rs.conf() in the mongo shell to see each member's attributes, such as host, arbiterOnly, hidden, priority, votes and slaveDelay. Pretty simple, isn't it?
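The three start commands and the rsconf document used in the steps above follow one pattern per port, so they can be generated rather than typed by hand. The sketch below does that in plain JavaScript; buildLocalReplicaSet is a hypothetical helper, not part of MongoDB, and it only produces strings and a config object without starting any process.

```javascript
// Build the mongod command lines and the rs.initiate() document for a
// list of local ports. Flags and paths mirror the ones used in the steps.
function buildLocalReplicaSet(setName, ports, baseDir) {
  const commands = ports.map(
    (port, i) =>
      `mongod --replSet ${setName} --port ${port} --bind_ip localhost ` +
      `--dbpath ${baseDir}/${setName}-${i} --oplogSize 128`
  );
  const rsconf = {
    _id: setName,
    members: ports.map((port, i) => ({ _id: i, host: `localhost:${port}` })),
  };
  return { commands, rsconf };
}

const plan = buildLocalReplicaSet("rs0", [27018, 27019, 27020], "/data/mongodb");
plan.commands.forEach((c) => console.log(c));
console.log(JSON.stringify(plan.rsconf, null, 2));
```

The generated rsconf object has the same shape as the one passed to rs.initiate() in step 3.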

Run rs.conf() and the result is displayed as follows:

rs.conf()
{
    "_id" : "rs0",
    "version" : 1,
    "protocolVersion" : NumberLong(1),
    "writeConcernMajorityJournalDefault" : true,
    "members" : [
      {
        "_id" : 0, "host" : "localhost:27018", "arbiterOnly" : false, "buildIndexes" : true,
        "hidden" : false, "priority" : 1, "tags" : {}, "slaveDelay" : NumberLong(0),
        "votes" : 1
      },
      {
        "_id" : 1, "host" : "localhost:27019", "arbiterOnly" : false, "buildIndexes" : true,
        "hidden" : false, "priority" : 1, "tags" : {}, "slaveDelay" : NumberLong(0),
        "votes" : 1
      },
      {
        "_id" : 2, "host" : "localhost:27020", "arbiterOnly" : false, "buildIndexes" : true,
        "hidden" : false, "priority" : 1, "tags" : {}, "slaveDelay" : NumberLong(0),
        "votes" : 1
      }
    ],
    "settings" : {
      "chainingAllowed" : true, "heartbeatIntervalMillis" : 2000, "heartbeatTimeoutSecs" : 10,
      "electionTimeoutMillis" : 10000, "catchUpTimeoutMillis" : -1, "catchUpTakeoverDelayMillis" : 30000,
      "getLastErrorModes" : {},
      "getLastErrorDefaults" : {
        "w" : 1,
        "wtimeout" : 0
      },
      "replicaSetId" : ObjectId("5f957f12974186fc616688fb")
    }
}

Note that the mongo shell provides two helper objects, rs and db.

  • rs.initiate(), rs.conf(), rs.reconfig(), rs.add() and similar methods operate on the replica set
  • db.isMaster(), db.collection.find(), db.collection.insert() and similar methods operate on the database

Next, let's test automatic failover

  1. Stop the primary node localhost:27018 directly to simulate a primary failure; the secondary nodes then elect a new primary. This is automatic failover.

After killing the primary on 27018, the election shows up in the log output of 27019: 27019 initiates the election and successfully becomes the primary:

2020-10-26T21:43:58.156+0800 I REPL [replexec-304] Scheduling remote command request for vote request: RemoteCommand 100694 -- target:localhost:27018 db:admin cmd:{ replSetRequestVotes: 1, setName: "rs0", dryRun: false, term: 17, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1603719830, 1), t:
2020-10-26T21:43:58.156+0800 I REPL [replexec-304] Scheduling remote command request for vote request: RemoteCommand 100695 -- target:localhost:27020 db:admin cmd:{ replSetRequestVotes: 1, setName: "rs0", dryRun: false, term: 17, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1603719830, 1), t:
2020-10-26T21:43:58.159+0800 I ELECTION [replexec-301] VoteRequester(term 17) received an invalid response from localhost:27018: ShutdownInProgress: In the process of shutting down; response message: { operationTime: Timestamp(1603719830, 1), ok: 0.0, errmsg: "In the process of shutting down", code: 91, codeName: "ShutdownInProgress", $clusterTime: { clusterTime: Timestamp(1603719830, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId:
2020-10-26T21:43:58.164+0800 I ELECTION [replexec-305] VoteRequester(term 17) received a yes vote from localhost:27020; response message: { term: 17, voteGranted: true, reason: "", ok: 1.0, $clusterTime: { clusterTime: Timestamp(1603719830, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1603719830, 1) }
2020-10-26T21:43:58.164+0800 I ELECTION [replexec-304] election succeeded, assuming primary role in term 17
  2. Then run rs.status() to check the replica set: 27019 has become the primary, and 27018 is shown as failed with health = 0
rs.status()
{
    "set" : "rs0",
    "date" : ISODate("2020-10-26T13:….071Z"),
    "myState" : 1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "majorityVoteCount" : 2,
    "writeMajorityCount" : 2,
    "members" : [
      {
        "_id" : 0, "name" : "localhost:27018", "ip" : "127.0.0.1", "health" : 0, "state" : 8,
        "stateStr" : "(not reachable/healthy)", "uptime" : 0,
        "optime" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) },
        "optimeDurable" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) },
        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
        "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
        "lastHeartbeat" : ISODate("2020-10-26T13:44:20.202Z"),
        "lastHeartbeatRecv" : ISODate("2020-10-26T13:43:57.861Z"),
        "pingMs" : NumberLong(0),
        "lastHeartbeatMessage" : "Error connecting to localhost:27018 (127.0.0.1:27018) :: caused by :: Connection refused",
        "syncingTo" : "", "syncSourceHost" : "", "syncSourceId" : -1, "infoMessage" : "",
        "configVersion" : -1
      },
      {
        "_id" : 1, "name" : "localhost:27019", "ip" : "127.0.0.1", "health" : 1, "state" : 1,
        "stateStr" : "PRIMARY", "uptime" : 85318,
        "optime" : { "ts" : Timestamp(1603719858, 1), "t" : NumberLong(17) },
        "optimeDate" : ISODate("2020-10-26T13:44:18Z"),
        "syncingTo" : "", "syncSourceHost" : "", "syncSourceId" : -1, "infoMessage" : "",
        "electionTime" : Timestamp(1603719838, 1),
        "electionDate" : ISODate("2020-10-26T13:43:58Z"),
        "configVersion" : 1, "self" : true, "lastHeartbeatMessage" : ""
      },
      {
        "_id" : 2, "name" : "localhost:27020", "ip" : "127.0.0.1", "health" : 1, "state" : 2,
        "stateStr" : "SECONDARY", "uptime" : 52468,
        "optime" : { "ts" : Timestamp(1603719858, 1), "t" : NumberLong(17) },
        "optimeDurable" : { "ts" : Timestamp(1603719858, 1), "t" : NumberLong(17) },
        "optimeDate" : ISODate("2020-10-26T13:44:18Z"),
        "optimeDurableDate" : ISODate("2020-10-26T13:44:18Z"),
        "lastHeartbeat" : ISODate("2020-10-26T13:44:20.200Z"),
        "lastHeartbeatRecv" : ISODate("2020-10-26T13:….517Z"),
        "pingMs" : NumberLong(0),
        "lastHeartbeatMessage" : "",
        "syncingTo" : "localhost:27019", "syncSourceHost" : "localhost:27019", "syncSourceId" : 1,
        "infoMessage" : "", "configVersion" : 1
      }
    ]
}
  3. Start 27018 again:

mongod --replSet rs0 --port 27018 --bind_ip localhost --dbpath /data/mongodb/rs0-0 --oplogSize 128

In the log of node 27019 you can see that 27018 has been detected and has become a secondary; rs.status() shows the same result.

2020-10-26T21:52:06.871+0800 I REPL [replexec-305] Member localhost:27018 is now in state SECONDARY
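When repeating the failover test, scanning the full rs.status() output by eye gets tedious. The sketch below condenses such a document into the three facts checked above: who is primary, who is secondary, and who is unreachable. The summarizeStatus function is a hypothetical helper, and the sample document is trimmed to the name, health and stateStr fields that appear in the output shown earlier.

```javascript
// Condense an rs.status()-shaped document into primary, secondaries
// and unreachable members. Only name, health and stateStr are used.
function summarizeStatus(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return {
    primary: primary ? primary.name : null,
    secondaries: status.members
      .filter((m) => m.stateStr === "SECONDARY")
      .map((m) => m.name),
    unhealthy: status.members.filter((m) => m.health === 0).map((m) => m.name),
  };
}

// Trimmed copy of the state right after 27018 was stopped.
const afterFailover = {
  set: "rs0",
  members: [
    { name: "localhost:27018", health: 0, stateStr: "(not reachable/healthy)" },
    { name: "localhost:27019", health: 1, stateStr: "PRIMARY" },
    { name: "localhost:27020", health: 1, stateStr: "SECONDARY" },
  ],
};

console.log(summarizeStatus(afterFailover));
```

Since the mongo shell is a JavaScript environment, the same function could in principle be pasted there and called as summarizeStatus(rs.status()).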

Third, write and read features of replica sets

Write Concern

With a write concern, after the primary has processed a write, the client receives the success response only once the required number of other data-bearing nodes have also confirmed the write.

This addresses the problem of data being lost when the primary goes down before the data has been replicated to the secondaries.

The number of nodes that must confirm is configurable. The default is { "w": 1 }, meaning only the primary has to confirm the write. If "w" is set to 2, the primary and one secondary must both report a successful write. "w" can also be set to "majority", meaning that a majority of the data-bearing voting nodes must report a successful write.
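The meaning of the different "w" values can be written down as a small function. This is a simplified sketch, not the server's implementation: it assumes every member is a data-bearing voting member (no arbiters), and the names requiredAcks and isAcknowledged are made up for illustration.

```javascript
// How many data-bearing members must confirm a write before the client
// is told it succeeded. "majority" is computed over the voting,
// data-bearing members, assumed here to be all members of the set.
function requiredAcks(w, votingDataMembers) {
  if (w === "majority") return Math.floor(votingDataMembers / 2) + 1;
  if (Number.isInteger(w) && w >= 0) return w;
  throw new Error(`unsupported w value: ${w}`);
}

// True once enough members (the primary included) have confirmed.
function isAcknowledged(w, votingDataMembers, confirmedCount) {
  return confirmedCount >= requiredAcks(w, votingDataMembers);
}

// P-S-S set: the primary has written, no secondary has confirmed yet.
console.log(isAcknowledged(1, 3, 1));
console.log(isAcknowledged("majority", 3, 1));
```

For the three-member set built earlier, "majority" works out to 2, which matches the writeMajorityCount value in the rs.status() output.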

As shown in the figure below, in a P-S-S structure (one primary node, two secondary nodes) the write request carries w: "majority". The data is replicated from the primary to the first secondary, and the primary replies to the client only after that secondary has confirmed the write.

In practice, there are two ways to set the write concern:

  1. Specify writeConcern parameters in the write request as follows:
db.products.insert(
    { item: "envelopes", qty : 100, type: "Clasp" },
    { writeConcern: { w: "majority" , wtimeout: 5000 } }
)
  2. Modify the replica set getLastErrorDefaults configuration as follows:
cfg = rs.conf()
cfg.settings.getLastErrorDefaults = { w: "majority", wtimeout: 5000 }
rs.reconfig(cfg)

Read Preference

Reads differ from writes. For consistency, writes can go only through the primary, but reads can go to either the primary or a secondary. The primary always has the latest data, while a secondary may lag because of replication delay; on the other hand, reading from secondaries takes load off the primary.

Since several nodes hold data, how does the client choose which one to read from? Through 3 conditions (tag sets, maxStalenessSeconds, hedged reads) and 5 modes (primary, primaryPreferred, secondary, secondaryPreferred, nearest).

First, the five modes. Their characteristics are listed in the following table:

Mode | Characteristics
primary | All read requests are read from the primary node
primaryPreferred | If the primary node is healthy, all read requests are read from it. If the primary node fails, they are read from an eligible secondary node
secondary | All read requests are read from secondary nodes
secondaryPreferred | All read requests are read from secondary nodes, but if all secondary nodes have failed, they are read from the primary node
nearest | Reads from the node with the lowest network latency, whether it is the primary or a secondary
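The table above can be read as a selection function over the members of the set. The sketch below is a simplified model of that selection, not driver code: the member objects with role, up and pingMs fields are hypothetical, and for nearest it simply returns the member with the lowest latency, whereas real drivers choose among all members inside a latency window.

```javascript
// Return the members a read may go to under the given mode.
function candidatesFor(mode, members) {
  const up = members.filter((m) => m.up);
  const primary = up.filter((m) => m.role === "primary");
  const secondaries = up.filter((m) => m.role === "secondary");
  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary.length > 0 ? primary : secondaries;
    case "secondary":
      return secondaries;
    case "secondaryPreferred":
      return secondaries.length > 0 ? secondaries : primary;
    case "nearest": {
      // Simplification: keep only the lowest-latency member(s).
      if (up.length === 0) return [];
      const best = Math.min(...up.map((m) => m.pingMs));
      return up.filter((m) => m.pingMs === best);
    }
    default:
      throw new Error(`unknown read preference mode: ${mode}`);
  }
}

// Hypothetical view of the local three-member set.
const cluster = [
  { host: "localhost:27018", role: "primary", up: true, pingMs: 5 },
  { host: "localhost:27019", role: "secondary", up: true, pingMs: 2 },
  { host: "localhost:27020", role: "secondary", up: true, pingMs: 9 },
];

for (const mode of ["primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"]) {
  console.log(mode, candidatesFor(mode, cluster).map((m) => m.host));
}
```

The two "preferred" modes differ from their strict counterparts only in the fallback branch, which is what keeps reads working when the preferred kind of node is unavailable.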

Next, the three conditions. They are applied on top of the mode: among the nodes that satisfy the mode, the conditions further filter which nodes are eligible

  1. Tag Sets

As the name suggests, this attaches tags to nodes. When querying, the client can select nodes by tag and read from those nodes. In the mongo shell, rs.conf() shows the tags of each node. Tags are modified or added the same way getLastErrorDefaults was modified above: set cfg.members[n].tags = { "region": "South", "datacenter": "A" } and then call rs.reconfig(cfg).

  2. maxStalenessSeconds (maximum allowed replication lag)

As the name implies, this value limits how far a secondary may lag: the time of the last write the secondary has replicated from the primary is compared with the time of the primary's actual last write. If the primary is down, the comparison is made against the most recent write in the replica set instead.

Setting this value is recommended, so that a secondary that has been unable to replicate from the primary for a long time, for example because of network problems, does not serve stale data. Note that the value must be at least 90 seconds: clients check the replication lag of secondaries only periodically, so the estimate is not very precise. A value below 90 seconds causes an error.

  3. Hedged Read

This option is supported only on sharded clusters, starting with MongoDB 4.4. The mongos instance routes the read request to two eligible replica set members at the same time and returns the result from whichever responds first.
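The first two conditions can be sketched as filters over the candidate members. This is a simplified model under stated assumptions, not driver code: the member objects and the lastWrite field (a millisecond timestamp) are hypothetical, tag sets are tried in order with the empty set {} matching every member, and the lag is taken as the plain difference between last write times, whereas real drivers also factor in their heartbeat interval.

```javascript
// A tag set matches a member when every key/value pair in it is present
// in the member's tags; the empty tag set {} matches every member.
function matchesTagSet(memberTags, tagSet) {
  return Object.entries(tagSet).every(([key, value]) => memberTags[key] === value);
}

// Tag sets are tried in order; the first one matching any member wins.
function filterByTagSets(members, tagSets) {
  if (tagSets.length === 0) return members;
  for (const tagSet of tagSets) {
    const matched = members.filter((m) => matchesTagSet(m.tags || {}, tagSet));
    if (matched.length > 0) return matched;
  }
  return [];
}

// Keep secondaries whose lag behind the primary's last write is within
// maxStalenessSeconds; -1 disables the check.
function filterByStaleness(secondaries, primaryLastWrite, maxStalenessSeconds) {
  if (maxStalenessSeconds === -1) return secondaries;
  if (maxStalenessSeconds < 90) {
    throw new Error("maxStalenessSeconds must be at least 90");
  }
  return secondaries.filter(
    (s) => (primaryLastWrite - s.lastWrite) / 1000 <= maxStalenessSeconds
  );
}

// Hypothetical secondaries: one is 10 s behind, the other 300 s behind.
const primaryLastWrite = 1000000;
const secondaryNodes = [
  { host: "localhost:27019", tags: { region: "South", datacenter: "A" }, lastWrite: 990000 },
  { host: "localhost:27020", tags: { region: "North", datacenter: "A" }, lastWrite: 700000 },
];

const fresh = filterByStaleness(secondaryNodes, primaryLastWrite, 120);
console.log(filterByTagSets(fresh, [{ datacenter: "B" }, {}]).map((m) => m.host));
```

In the usage line no member carries datacenter "B", so the trailing {} acts as a fallback that accepts any remaining member instead of leaving the read with no target.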

So how are these modes and conditions used in query requests?

  1. When connecting to the database from code with a connection string URI, you can add these three parameters

Parameter | Description
readPreference | Mode; one of primary (default), primaryPreferred, secondary, secondaryPreferred, nearest
maxStalenessSeconds | Maximum replication lag in seconds; must be 90 or greater. -1 (the default) means no maximum
readPreferenceTags | Tags; if the tag set is {"dc": "ny", "rack": "r1"}, the URI contains readPreferenceTags=dc:ny,rack:r1

For example:

mongodb://db0.example.com,db1.example.com,db2.example.com/?replicaSet=myRepl&readPreference=secondary&maxStalenessSeconds=120&readPreferenceTags=dc:ny,rack:r1

  2. In the mongo shell, you can use cursor.readPref() or Mongo.setReadPref()

The parameters of cursor.readPref() are mode, tag set, and hedge options, for example

db.collection.find({ }).readPref(
    "secondary",                      // mode
    [ { "datacenter": "B" }, { } ],   // tag set
    { enabled: true }                 // hedge options
)

Mongo.setReadPref() is similar, but it sets the read preference on the connection up front, so you don't have to append readPref() to every query.
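For the first approach, the connection string can be assembled from its parts, which makes it harder to mistype a parameter name. The sketch below is plain string building; buildReadPreferenceUri and its options object are hypothetical and not part of any MongoDB driver.

```javascript
// Assemble a connection string URI carrying the three read preference
// parameters described above.
function buildReadPreferenceUri(hosts, replicaSet, options) {
  const params = [`replicaSet=${encodeURIComponent(replicaSet)}`];
  if (options.readPreference) {
    params.push(`readPreference=${options.readPreference}`);
  }
  if (options.maxStalenessSeconds !== undefined) {
    const s = options.maxStalenessSeconds;
    if (s !== -1 && s < 90) {
      throw new Error("maxStalenessSeconds must be -1 or at least 90");
    }
    params.push(`maxStalenessSeconds=${s}`);
  }
  if (options.tags) {
    // { dc: "ny", rack: "r1" } becomes dc:ny,rack:r1
    const pairs = Object.entries(options.tags).map(([k, v]) => `${k}:${v}`);
    params.push(`readPreferenceTags=${pairs.join(",")}`);
  }
  return `mongodb://${hosts.join(",")}/?${params.join("&")}`;
}

const uri = buildReadPreferenceUri(
  ["db0.example.com", "db1.example.com", "db2.example.com"],
  "myRepl",
  { readPreference: "secondary", maxStalenessSeconds: 120, tags: { dc: "ny", rack: "r1" } }
);
console.log(uri);
```

The colons and commas in the tag list are written literally, following the readPreferenceTags=dc:ny,rack:r1 form shown in the parameter table.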

You can quickly test this on the cluster built above

  1. Log in to the primary node: mongo localhost:27018

  2. Insert a document: db.nums.insert({name: "num0"})

    Query it on the current node: db.nums.find()

    { "_id" : ObjectId("5f958687233b11771912ced5"), "name" : "num0" }

  3. Log in to a secondary node: mongo localhost:27019

Query: db.nums.find()

The default read preference mode is primary, so the query on the secondary node fails with the following error:

Error: error: {
  "operationTime" : Timestamp(1603788383, 1),
  "ok" : 0,
  "errmsg" : "not master and slaveOk=false",
  "code" : 13435,
  "codeName" : "NotMasterNoSlaveOk",
  "$clusterTime" : {
    "clusterTime" : Timestamp(1603788383, 1),
    "signature" : {
      "hash" : BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      "keyId" : NumberLong(0)
    }
  }
}

Specify the secondary mode in the query: db.nums.find().readPref("secondary")

Now the inserted data is returned: { "_id" : ObjectId("5f958687233b11771912ced5"), "name" : "num0" }

Conclusion

This article is a summary of the simple but important points I picked out while reading the official MongoDB documentation. If you are interested in MongoDB, I recommend digging into the official documentation directly.