Author: Ren Kun

Based in Zhuhai, I have worked as a full-time Oracle and MySQL DBA, and I am now mainly responsible for the maintenance of MySQL, MongoDB, and Redis.

Source of this article: original contribution

* Produced by the ecoson open source community. The original content may not be used without authorization; to reprint, please contact the editor and indicate the source.


1 Background

There are many published PITR cases based on a single MongoDB instance, and the official documentation also covers recovery steps for a MongoDB cluster, but there are almost no PITR cases for a MongoDB cluster. Based on an experimental environment, this article simulates an online environment and walks through a complete PITR of a MongoDB cluster.

Distribution of the original cluster instances:

172.16.129.170 shard1 27017 shard2 27018 config 37017 mongos 47017
172.16.129.171 shard1 27017 shard2 27018 config 37017 mongos 47017
172.16.129.172 shard1 27017 shard2 27018 config 37017 mongos 47017

Considering that the amount of data in a real online environment would be large, we restore each shard as a single instance, which is sufficient for developers to query the data. Instance distribution after recovery:

172.16.129.173 shard1 27017 shard2 27018 config 37017 mongos 47017
172.16.129.174 config 37017
172.16.129.175 config 37017

The cluster runs Percona Server for MongoDB 4.2.13 (MongoDB 4.2.13). A hot backup script is deployed for each shard instance and for the config server, and a scheduled oplog backup script is deployed alongside it; the commands behind these scripts are sketched after the test data below. To build the test data, log in to the cluster via mongos, create a hashed sharded collection, and insert 10 documents:

use admin
db.runCommand({ "enablesharding": "renkun" })
sh.shardCollection("renkun.user", { id: "hashed" })
use renkun
var tmp = [];
for (var i = 0; i < 10; i++) {
    tmp.push({ "id": i, "name": "Kun " + i });
}
db.user.insertMany(tmp);
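
For reference, the hot backup and oplog backup scripts mentioned above boil down to roughly the following commands. This is a minimal sketch, assuming Percona Server for MongoDB's createBackup command, illustrative root credentials, and illustrative paths; real scripts would add scheduling, compression, and retention.

# Physical hot backup of one shard instance via Percona Server for MongoDB's
# createBackup command (credentials and backup directory are illustrative).
mongo 127.0.0.1:27017/admin -u root -p 'rootpwd' \
    --eval 'db.runCommand({ createBackup: 1, backupDir: "/data/hotbackup/27017" })'

# Scheduled oplog backup: dump local.oplog.rs with mongodump; the output lands in
# <out-dir>/local/oplog.rs.bson, which is the file replayed later in this article.
mongodump -h 127.0.0.1 --port 27017 -u root -p 'rootpwd' --authenticationDatabase admin \
    -d local -c oplog.rs -o /data/backup/202106111849_27017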

Take a physical hot backup of shard1, shard2, and the config server, then continue inserting data:

use renkun
var tmp = [];
for (var i = 10; i < 20; i++) {
    tmp.push({ "id": i, "name": "Kun " + i });
}
db.user.insertMany(tmp);

Take oplog backups of shard1, shard2, and the config server. The oplog backup files contain all of the operations above. At this point the user collection has 20 documents, with ids from 0 to 19. Our requirement is to restore the cluster to the snapshot point at which max(id) = 15.

The steps given in the official documentation are to restore the config server, the shards, and mongos in turn, but they do not include a roll-forward of the oplog.

In our case, we first need to parse the shard oplog files to find the point in time to restore to, so the order is adjusted: restore the shards first, then the config server.

2 Restore the shards as single instances

Transfer the physical backup and the oplog backup of each shard to the target machine, extract the physical backup directly into the data directory, and start the instance.
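
A concrete sketch of this step (file names, data directories, and config file paths below are assumptions, not taken from the original environment):

# Extract each shard's physical hot backup straight into its data directory (illustrative paths).
tar xzf shard1_hotbackup.tar.gz -C /data/mongo27017
tar xzf shard2_hotbackup.tar.gz -C /data/mongo27018

# Start each shard as a plain standalone instance; the replication/sharding sections
# are left out of (or commented out in) the config files at this stage.
mongod -f /etc/mongod_27017.conf
mongod -f /etc/mongod_27018.conf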

2.1 Confirm the snapshot point to be restored

Run the following against the oplog backup of each of the two shards:

bsondump oplog.rs.bson > oplog.rs.json
more oplog.rs.json | egrep "\"op\":\"i\",\"ns\":\"renkun\.user\"" | grep "\"Kun 15\""

The matching record is found in the oplog backup of shard2:

{"ts":{"$timestamp":{"t":1623408268,"i":6}},"t":{"$numberLong":"1"},"h": {"$numberLong":"0"},"v":{"$numberInt":"2"},"op":"i","ns":"renkun.user","ui":{"$binary": {"base64":"uhlG0e4sRB+RUfFOzpMCEQ==","subType":"04"}},"wall":{"$date": {"$numberLong":"1623408268694"}},"o":{"_id":{"$oid":"60c33e8c1c3edd59f25eecb5"},"id": {" $numberDouble ":" 15.0 "}, "name" : "Kun 15}}"

The entry for id = 15 has timestamp 1623408268:6. However, the --oplogLimit parameter of mongorestore is an open (exclusive) upper bound, that is, the oplog entry with exactly the specified timestamp is not replayed. To restore up to and including this point in time, add one to the increment, giving 1623408268:7.

2.2 Create a temporary account

Log in to each shard instance as root and create a user named internal_restore with the __system role. This user is used not only to replay the oplog, but also to modify admin.system.version and to drop the local database, operations that the root role does not have permission to perform by default.

use admin
db.createUser({ user: "internal_restore", pwd: "internal_restore", roles: ["__system"] })

Log in to both shard instances as internal_restore and drop the local database:

use local
db.dropDatabase()

2.3 Roll forward oplog to the specified time point

mongorestore -h 127.0.0.1 --port 27017 -uinternal_restore -p"internal_restore" --authenticationDatabase admin --oplogReplay --oplogLimit "1623408268:7" /data/backup/202106111849_27017/local/oplog.rs.bson

mongorestore -h 127.0.0.1 --port 27018 -uinternal_restore -p"internal_restore" --authenticationDatabase admin --oplogReplay --oplogLimit "1623408268:7" /data/backup/202106111850_27018/local/oplog.rs.bson

Log in to shard1 and shard2 respectively to view the data:

-- shard1 27017
> db.user.find().sort({"id":1})
{ "_id" : ObjectId("60c330322424943565780766"), "id" : 3, "name" : "Kun 3" }
{ "_id" : ObjectId("60c330322424943565780769"), "id" : 6, "name" : "Kun 6" }
{ "_id" : ObjectId("60c33032242494356578076b"), "id" : 8, "name" : "Kun 8" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb1"), "id" : 11, "name" : "Kun 11" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb2"), "id" : 12, "name" : "Kun 12" }

-- shard2 27018
> db.user.find().sort({"id":1})
{ "_id" : ObjectId("60c330322424943565780763"), "id" : 0, "name" : "Kun 0" }
{ "_id" : ObjectId("60c330322424943565780764"), "id" : 1, "name" : "Kun 1" }
{ "_id" : ObjectId("60c330322424943565780765"), "id" : 2, "name" : "Kun 2" }
{ "_id" : ObjectId("60c330322424943565780767"), "id" : 4, "name" : "Kun 4" }
{ "_id" : ObjectId("60c330322424943565780768"), "id" : 5, "name" : "Kun 5" }
{ "_id" : ObjectId("60c33032242494356578076a"), "id" : 7, "name" : "Kun 7" }
{ "_id" : ObjectId("60c33032242494356578076c"), "id" : 9, "name" : "Kun 9" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb0"), "id" : 10, "name" : "Kun 10" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb3"), "id" : 13, "name" : "Kun 13" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb4"), "id" : 14, "name" : "Kun 14" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb5"), "id" : 15, "name" : "Kun 15" }

The data has been restored: 16 documents in total, with max(id) = 15.

2.4 Modify admin.system.version

Each shard has changed from a 3-node replica set to a single instance, and the host IP has also changed, so the corresponding metadata must be updated. Log in to both shard instances as internal_restore and execute:

use admin
db.system.version.deleteOne({ _id: "minOpTimeRecovery" })
db.system.version.find({ "_id" : "shardIdentity" })
db.system.version.updateOne(
    { "_id" : "shardIdentity" },
    { $set : { "configsvrConnectionString" : "configdb/172.16.129.173:37017,172.16.129.174:37017,172.16.129.175:37017" } }
)

Before being deleted and modified, these two documents stored the old config server information, as follows:

{ "_id" : "shardIdentity", "shardName" : "repl", "clusterId" : ObjectId("60c2c9b44497a4f2e02510fd"), "configsvrConnectionString" : "Configdb / 172.16.129.170:37017172.16. 129.171:37017172.16. 129.172:37017"} {" _id ": "minOpTimeRecovery", "configsvrConnectionString" : "Configdb / 172.16.129.170:37017172.16. 129.171:37017172.16. 129.172:37017", "minOpTime" : {" ts ": Timestamp(1623383008, 6), "t" : NumberLong(1) }, "minOpTimeUpdaters" : 0, "shardName" : "repl" }

At this point, the two shard servers have been restored, and the next step is to restore the config server.

3 Restore the config server

Transfer the physical backup of the config server to the target machine and extract it directly into the data directory. First start the config server as a single instance; after logging in with the root account, create a user with the __system role (same as above).
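
A sketch of this step, with assumed file names and config file paths:

# Extract the config server backup into its data directory and start it as a standalone
# instance (the sharding/replication sections stay disabled in the config file for now).
tar xzf config_hotbackup.tar.gz -C /data/mongo37017
mongod -f /etc/mongod_37017.conf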

3.1 Roll forward the oplog

mongorestore -h 127.0.0.1 --port 37017 -uinternal_restore -p"internal_restore" --authenticationDatabase admin --oplogReplay --oplogLimit "1623408268:7" /data/backup/202106111850_37017/local/oplog.rs.bson

3.2 Modify metadata

Log in to the instance as internal_restore and modify the cluster's shard metadata as follows:

use local
db.dropDatabase()
use config
db.shards.find()
db.shards.updateOne({ "_id" : "repl" }, { $set : { "host" : "172.16.129.173:27017" } })
db.shards.updateOne({ "_id" : "repl2" }, { $set : { "host" : "172.16.129.173:27018" } })
db.shards.find()

3.3 Start the cluster

Shut down the config server and restart it in cluster mode, with the following parameters enabled in its configuration file (a restart sketch follows the snippet):

sharding:
  clusterRole: configsvr
replication:
  oplogSizeMB: 10240
  replSetName: configdb
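
A minimal restart sketch, assuming the config file path used in the earlier sketches:

# Cleanly shut down the standalone config server, then restart it with the
# clusterRole/replSetName settings above enabled.
mongod -f /etc/mongod_37017.conf --shutdown
mongod -f /etc/mongod_37017.conf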

After logging in, execute:

rs.initiate()
rs.add("172.16.129.174:37017")
rs.add("172.16.129.175:37017")

At this point the config server has become a 3-member replica set and its recovery is complete.
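
To double-check the replica set state, a quick verification can be run from the shell (root credentials here are illustrative):

# Print each member's host and state; expect one PRIMARY and two SECONDARY members.
mongo 127.0.0.1:37017/admin -u root -p 'rootpwd' \
    --eval 'rs.status().members.forEach(function(m) { print(m.name, m.stateStr) })'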

4 Configure mongos

The mongos configuration file can simply be copied from the original environment; only the sharding and bindIp parameters need to be modified (a startup sketch follows the snippet):

sharding:
  configDB: "configdb/172.16.129.173:37017,172.16.129.174:37017,172.16.129.175:37017"
net:
  port: 47017
  bindIp: 127.0.0.1,172.16.129.173
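
Then start mongos with this file (the config file path is an assumption):

mongos -f /etc/mongos_47017.conf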

After startup, log in through mongos and query the user collection: 16 documents in total and max(id) = 15, matching the expected result.

mongos> use renkun
switched to db renkun
mongos> db.user.find().sort({"id":1})
{ "_id" : ObjectId("60c330322424943565780763"), "id" : 0, "name" : "Kun 0" }
{ "_id" : ObjectId("60c330322424943565780764"), "id" : 1, "name" : "Kun 1" }
{ "_id" : ObjectId("60c330322424943565780765"), "id" : 2, "name" : "Kun 2" }
{ "_id" : ObjectId("60c330322424943565780766"), "id" : 3, "name" : "Kun 3" }
{ "_id" : ObjectId("60c330322424943565780767"), "id" : 4, "name" : "Kun 4" }
{ "_id" : ObjectId("60c330322424943565780768"), "id" : 5, "name" : "Kun 5" }
{ "_id" : ObjectId("60c330322424943565780769"), "id" : 6, "name" : "Kun 6" }
{ "_id" : ObjectId("60c33032242494356578076a"), "id" : 7, "name" : "Kun 7" }
{ "_id" : ObjectId("60c33032242494356578076b"), "id" : 8, "name" : "Kun 8" }
{ "_id" : ObjectId("60c33032242494356578076c"), "id" : 9, "name" : "Kun 9" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb0"), "id" : 10, "name" : "Kun 10" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb1"), "id" : 11, "name" : "Kun 11" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb2"), "id" : 12, "name" : "Kun 12" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb3"), "id" : 13, "name" : "Kun 13" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb4"), "id" : 14, "name" : "Kun 14" }
{ "_id" : ObjectId("60c33e8c1c3edd59f25eecb5"), "id" : 15, "name" : "Kun 15" }

At this point, a complete PITR of a MongoDB cluster has been performed.

5 Conclusion

MongoDB 4.x introduces transactions, and 4.2 in particular supports cross-shard transactions. According to the official documentation, data involved in transactions must be recovered with the designated tools.