How can WiredTiger restore MongoDB data?

In a previous article on restoring data when MongoDB cannot be started, I introduced several ways to recover data from a MongoDB node that fails to start, mainly:

1. If the replica set has multiple members, recover from another node (keeping at least 2 copies of important data is strongly recommended).
2. Restore from the most recent backup set. Important production data usually calls for continuous full/incremental backups.
3. Recover with repair mode. If the metadata is damaged, repair mode cannot work.
4. Use the tools that ship with WiredTiger to recover as much data as possible when the metadata is damaged.
5. Extract the data by parsing the BSON in the data files. This only works for the MMAPv1 engine without compression; WiredTiger enables snappy compression by default, so the data cannot be extracted by parsing BSON.

Methods 1-3 are relatively simple, whereas method 4 is hard to even begin if you are not familiar with how the WiredTiger engine works. This article describes in detail how to extract valid data with the WiredTiger tools.

Background knowledge

Each MongoDB collection/index corresponds to a WiredTiger table. The mapping between the collection name and the table name is stored in the metadata, and it can only be obtained from the metadata.
- The data of a collection corresponds to the file collection-uniqueid-hash.wt
- The data of an index corresponds to the file index-uniqueid-hash.wt
The metadata of all MongoDB collections is stored in a special WiredTiger table named _mdb_catalog. In other words, for WiredTiger the MongoDB metadata is just ordinary data.
As long as the data of a single table is valid, WiredTiger's wt tool can extract it. The MongoDB source build does not compile the wt tool by default, so you need to download the WiredTiger source code and compile it yourself. The rest of this article assumes that you have a compiled wt tool.

Recovering a single collection

If your database has only a small number of collections, you can usually work out from the file sizes which WiredTiger file belongs to which collection, for example:

somedb.collection ===> somedb/collection-10--6822964274931136278 (no somedb/ prefix if the directoryPerDB option is not specified)
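
To compare sizes quickly, the collection files can be listed largest first. This is only a small sketch: the function name list_collection_files is made up for illustration, and it assumes the damaged dbpath is still readable at the file system level.

```shell
# List every collection-*.wt file under a dbpath as "<bytes> <path>",
# largest first. Index files (index-*.wt) are deliberately left out.
list_collection_files() {
    find "$1" -type f -name 'collection-*.wt' | while IFS= read -r f; do
        # wc -c pads its output on some systems, so strip the spaces.
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -rn
}
```

Called as `list_collection_files some_db_home`, it prints one line per collection file, which you can then match against the collection sizes you remember.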

As long as the data of this collection is intact and not damaged, we can restore it through the following steps

Step1: dump the collection data to a file using the wt tool

 ./wt -v -h some_db_home -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]"  -R dump -f collection.dump  somedb/collection-10--6822964274931136278 

The data is dumped to collection.dump. If this step fails, WiredTiger cannot parse the file and the data cannot be recovered this way.
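
Before moving on, it can be worth checking that the dump file looks sane. The sketch below assumes the usual text layout of a wt dump: a "WiredTiger Dump" banner on the first line and a line containing only "Data" before the records. The function name looks_like_wt_dump is hypothetical, and the check says nothing about how many records were actually recovered.

```shell
# Succeed only if the file starts with the "WiredTiger Dump" banner and
# contains the "Data" marker line that precedes the key/value records.
# Assumption: the dump was written in wt's default text format.
looks_like_wt_dump() {
    head -n 1 "$1" | grep -q '^WiredTiger Dump' && grep -q '^Data$' "$1"
}
```

For example, `looks_like_wt_dump collection.dump || echo "dump looks broken"`.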

Step2: Create a temporary collection on the new instance

mkdir some_dest_db_home
mongod --dbpath some_dest_db_home --port some_port
mongo --port some_port
> use somedb
> db.createCollection("collection")      // create the temporary collection
> db.collection.stats().wiredTiger.uri   // check the WiredTiger table name corresponding to the collection

Assume that after a temporary collection is created, the mapping between the collection name on the new temporary instance and the WiredTiger table name is as follows

somedb.collection ===> somedb/collection-2--6822964274931136278 (no somedb/ prefix if the directoryPerDB option is not specified)

Step3: Load the dumped data to a temporary collection

Stop the temporary instance and load the dump data from Step1 into the temporary collection

./wt -v -h some_dest_db_home -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R load -f  collection.dump -r somedb/collection-2--6822964274931136278

At this point, reading somedb.collection on the target instance returns the data of somedb.collection from the damaged source instance. The _id index and the collection statistics on the target instance do not match that data, but this does not affect a full scan of the collection.

Step4: mongodump/mongorestore the temporary collection to fix the statistics and index information

Back up somedb.collection from the target instance with mongodump; mongodump only triggers sequential access to the collection data.

Then import the backup again with mongorestore. The restored data is the recovered data.
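
Step4 can be sketched as two commands. The helper below only prints them so they can be reviewed first. Its name and argument order are made up for illustration, and it assumes the data is restored into a separate final instance listening on its own port. The --forceTableScan flag is my addition, not part of the original steps: it asks mongodump to scan the collection directly instead of walking the _id index, which seems safer here because the _id index does not match the loaded data.

```shell
# Print (do not run) the dump and restore commands for one collection.
# $1 = port of the temporary instance   $2 = port of the final instance
# $3 = database   $4 = collection       $5 = scratch directory for BSON files
print_dump_restore_cmds() {
    printf 'mongodump --port %s --db %s --collection %s --forceTableScan --out %s\n' \
        "$1" "$3" "$4" "$5"
    # mongodump --out writes <dir>/<db>/<collection>.bson
    printf 'mongorestore --port %s --db %s --collection %s %s/%s/%s.bson\n' \
        "$2" "$3" "$4" "$5" "$3" "$4"
}
```

Once the two printed lines look right, pipe the output to sh to run them.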

Recovering a large number of collections

If a damaged MongoDB instance has a large number of collections, restoring them one by one is impractical. To automate the process, the key problem is determining the mapping between MongoDB collection names and WiredTiger table names. A small change to the MongoDB source is enough: with it, the damaged mongod prints the mapping when it is started in repair mode.

diff --git a/src/mongo/db/storage/kv/kv_database_catalog_entry.cpp b/src/mongo/db/storage/kv/kv_database_catalog_entry.cpp
index 91afa40026..523cb1fa95 100644
--- a/src/mongo/db/storage/kv/kv_database_catalog_entry.cpp
+++ b/src/mongo/db/storage/kv/kv_database_catalog_entry.cpp
@@ -260,6 +260,8 @@ void KVDatabaseCatalogEntry::initCollection(OperationContext* opCtx,
 
     const std::string ident = _engine->getCatalog()->getCollectionIdent(ns);
 
+    log() << "metadata mapping " << ns << " " << ident;
+
     RecordStore* rs;
     if (forRepair) {
         // Using a NULL rs since we don't want to open this record store before it has been

Step1: Get the mapping between the MongoDB collection name of the corrupted instance and the WiredTiger table name, for example

somedb.collection1   collection-2--4775156767705741267
somedb.collection2   collection-4--4775156767705741267
.... 
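
The mapping lines can be pulled out of the mongod log with a few lines of awk. This sketch makes no assumption about the log prefix (timestamp, component and so on); it only looks for the words "metadata mapping" written by the patch above and prints the two fields that follow. The function name extract_mapping is hypothetical.

```shell
# Read a mongod log on stdin and print "<namespace> <ident>" for every
# line that contains "metadata mapping <namespace> <ident>".
extract_mapping() {
    awk '{
        for (i = 1; i + 3 <= NF; i++)
            if ($i == "metadata" && $(i + 1) == "mapping") {
                print $(i + 2), $(i + 3)
                break
            }
    }'
}
```

For example, `extract_mapping < mongod.log > source.map`, where both file names are placeholders.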

Step2: Dump the collection data one by one according to the mapping above

This is easy to automate with a shell script that names each dump file after its collection.
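
One possible shape for such a script is sketched below. It reads mapping lines in the "name ident" form shown in Step1 and prints one wt dump command per collection, with the same options as the single-collection command earlier. The function name and argument order are my own, and the commands are only printed so that you can inspect them before running anything against the damaged files.

```shell
# Read "<namespace> <ident>" lines on stdin and print one wt dump command
# per collection; each dump file is named <namespace>.dump.
# $1 = damaged dbpath   $2 = path to libwiredtiger_snappy.so   $3 = output dir
print_dump_cmds() {
    while read -r ns ident; do
        # Skip blank lines and lines without a second field.
        [ -n "$ident" ] || continue
        printf './wt -v -h %s -C "extensions=[%s]" -R dump -f %s/%s.dump %s\n' \
            "$1" "$2" "$3" "$ns" "$ident"
    done
}
```

Feed it the mapping file on stdin, check the printed commands, then pipe them to sh. A collection whose dump fails is probably damaged beyond what wt can read.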

Step3: create the target instance and pre-create all the collections

Write a JS script and create the collections in batch through the mongo shell.
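
The JS script itself can be generated from the Step1 mapping. The sketch below is one way to do it; the function name print_create_js is made up for illustration. Each namespace is split at its first dot, since a collection name may contain dots while a database name may not.

```shell
# Read "<namespace> <ident>" lines on stdin and print one createCollection
# statement per namespace.
print_create_js() {
    while read -r ns rest; do
        db=${ns%%.*}      # text before the first dot
        coll=${ns#*.}     # text after the first dot
        # Skip blank lines and anything that is not "<db>.<collection>".
        if [ -z "$db" ] || [ -z "$coll" ] || [ "$db" = "$ns" ]; then
            continue
        fi
        printf 'db.getSiblingDB("%s").createCollection("%s");\n' "$db" "$coll"
    done
}
```

Save the output to a file and pass that file to the mongo shell connected to the target instance. You will probably want to filter out system namespaces (for example those in the local database) before doing so.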

Step4: obtain the mapping between the MongoDB collection name of the target instance and the WiredTiger table name, for example

somedb.collection1   collection-6--4335156767705741253
somedb.collection2   collection-8--4335156767705741253
.... 

Step5: load dump data to the target instance one by one

Similar to Step2, you need to write a shell script
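
A sketch of the load script: it joins the source mapping (Step1) and the target mapping (Step4) on the collection name and prints one wt load command per collection, using the same options as the single-collection load earlier. It assumes both mappings are stored in files with "name ident" lines and that each dump file is named after its collection, as in Step2; the function name and argument order are hypothetical.

```shell
# Join two mapping files on the namespace and print one wt load command for
# every collection present in both.
# $1 = source mapping file   $2 = target mapping file   $3 = target dbpath
# $4 = path to libwiredtiger_snappy.so   $5 = directory holding the dumps
print_load_cmds() {
    awk -v home="$3" -v ext="$4" -v dir="$5" '
        # First file: remember which namespaces were dumped.
        NR == FNR { if (NF >= 2) seen[$1] = 1; next }
        # Second file: $2 is the table name on the target instance.
        NF >= 2 && ($1 in seen) {
            printf "./wt -v -h %s -C \"extensions=[%s]\" -R load -f %s/%s.dump -r %s\n", home, ext, dir, $1, $2
        }
    ' "$1" "$2"
}
```

As with the single collection, stop the target instance before running the printed commands.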

Step6: mongodump/mongorestore the whole target instance (or only some of its databases)

This fixes the _id indexes, statistics, and so on.

Wish you success ^_^