MongoDB


MongoDB is a document-oriented NoSQL database. Why use MongoDB? The story starts with big data: one of the classic problems is capturing data from the Internet.

From the Internet we can grab a huge amount of data, which immediately raises problems of storage, update, search, and error handling. In a nutshell:

1. how to save, update and find?

2. how to deal with failure?

3. how to deal with large data?

1. how to save, update and find

1) save

The first question is how to store the vast amounts of data scraped from the Internet. This starts with the data format: JSON, a lightweight data-interchange format built on collections of name/value pairs. It is easy to read, small in size, and fast to parse.

MongoDB stores data in documents, a JSON-like structure of name/value pairs. In other words, each block of data, saved in JSON-like form, is called a document. Many documents together form a collection, which is similar to a MySQL table.

Using JSON name/value pairs, documents, and collections, a huge amount of data can be saved.
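For concreteness, here is a minimal sketch of saving and finding such a document with pymongo; the connection string and the "crawler"/"pages" names are made up for illustration:

```python
from pymongo import MongoClient

# Connect to a local mongod (hypothetical address and names).
client = MongoClient("mongodb://localhost:27017")
pages = client["crawler"]["pages"]  # a collection of crawled pages

# Each record is a document: JSON-like name/value pairs.
pages.insert_one({
    "url": "https://example.com",
    "time": "2015-01-01T00:00:00Z",
    "author": "unknown",
    "title": "Example",
    "content": "...",
})

print(pages.find_one({"url": "https://example.com"}))
```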

2) update


The data captured from the Internet has many attributes, such as URL, time, author, title, and content. How do we add a new attribute after the data has already been saved? Say we grab the URL and the content and store them, and then discover we also need to grab and save the title. If the existing blocks of data are stored consecutively on disk, there is no room for a block to grow in place. An immediate idea is to delete the block being updated from its original location, add the new attribute, and append the enlarged block at the end.


However, this approach is problematic in MongoDB: as data is moved away from its original location, fragmentation occurs, leaving empty spaces behind. To mitigate this, reserve space: after storing a block, leave some padding so that a new attribute can be written directly into the padding without moving the block. The padding size is a tradeoff: too small and it doesn't help, too large and it wastes space.

So how do we size the padding? One way is to reserve 10% of the document size, so the bigger the document, the bigger the padding. In addition, the percentage grows each time a block has to move, say from 10% to 15%, and then to 20%. This is similar to the retry/backoff algorithm used for TCP/IP connections.
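A toy sketch of that idea (illustrative only, not MongoDB's actual code) might look like this:

```python
# Adaptive padding factor: reserve extra space per document, and reserve
# more whenever a document outgrows its slot and has to be relocated.
padding_factor = 1.10  # start by reserving 10% extra space

def record_size(doc_size: int) -> int:
    """Bytes to allocate for a document of doc_size bytes."""
    return int(doc_size * padding_factor)

def on_document_moved() -> None:
    """Called whenever an update no longer fits and the document relocates."""
    global padding_factor
    # Each move is evidence that documents grow; reserve more next time.
    padding_factor = min(padding_factor + 0.05, 2.0)  # 1.10 -> 1.15 -> 1.20 ...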


In MongoDB, a document on disk is typically smaller than 32KB. So when designing the padding, round the allocation up to the next power of 2: if the document itself is 28KB, allocate a 32KB record (the padding is 4KB). Record sizes are therefore always powers of two. The advantage is that when a record is removed, it leaves behind a power-of-2 hole (say 32KB) into which a new record can be placed directly. The disk is carved into regular sizes, greatly reducing fragmentation. Also, with power-of-2 sizes, address arithmetic can be done with fast bitwise operations. This strategy is used for space allocation in MMAPv1 (a storage engine used by MongoDB).
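A sketch of the power-of-2 rounding, with a hypothetical minimum record size:

```python
def power_of_two_record_size(doc_size: int) -> int:
    """Round an allocation up to the next power of 2 (illustrative sketch)."""
    size = 32  # hypothetical minimum record size in bytes
    while size < doc_size:
        size <<= 1  # doubling keeps every record size a power of 2
    return size

# A 28KB document gets a 32KB record, i.e. 4KB of padding; a freed 32KB
# slot can later hold any new record of up to 32KB.
assert power_of_two_record_size(28 * 1024) == 32 * 1024
```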

In addition to adding new attributes as discussed above, another kind of update changes the data itself, such as changing an id from 123 to 1234. In plain JSON text we would have to move data around, because three characters have become four. An alternative is to store the id as an int: both 123 and 1234 fit within 32 bits, so the value can be modified in place. This is what MongoDB's storage format does: BSON is a binary data format based on JSON, and one of its benefits is that it includes data types.
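We can see the fixed-width effect directly with the bson package that ships with pymongo (assuming a recent pymongo, where bson.encode is available):

```python
import json

import bson  # ships with pymongo; bson.encode needs pymongo >= 3.9

# Both 123 and 1234 fit in a 32-bit integer, so the encoded documents are
# the same size and the value can be overwritten in place:
assert len(bson.encode({"id": 123})) == len(bson.encode({"id": 1234}))

# As JSON text, the same change grows the record by one character:
assert len(json.dumps({"id": 1234})) == len(json.dumps({"id": 123})) + 1
```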

3) find

In addition to save and update, another important operation is find. First, the basic find: a scan, that is, a traversal that examines data one piece at a time. For example, to find a URL, we go block by block, document by document. But a naive traversal is inefficient, because we scan a lot of information we don't need, such as content. To skip what doesn't need to be scanned, we can store the length of each piece of data, then use the length to compute the location of the next URL, scanning only the URLs and skipping content and other irrelevant fields. This is the second benefit of BSON, and a major improvement over JSON: it stores the length of each element in the element header, so once the length is read, the scanner can jump directly to the next element.
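The length header is easy to observe: every BSON document begins with its total size as a little-endian int32. A small sketch:

```python
import struct

import bson  # ships with pymongo

doc = bson.encode({"url": "https://example.com", "content": "x" * 1000})

# A BSON document starts with its total size as a little-endian int32, so a
# scanner that only wants the next document can jump over this one without
# parsing any of its fields.
(total_size,) = struct.unpack_from("<i", doc, 0)
assert total_size == len(doc)
```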

How do we make find faster still? MongoDB provides multiple index types, such as the B-tree. A B-tree is similar to a binary search tree, but each node can have many children instead of just two. This reduces the depth of the tree, and with it the number of disk reads per lookup.
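Creating such an index from pymongo is one call; the connection and names below are hypothetical, as before:

```python
from pymongo import MongoClient

pages = MongoClient("mongodb://localhost:27017")["crawler"]["pages"]

# Build a B-tree index on "url"; lookups by url now descend the tree
# instead of scanning every document in the collection.
pages.create_index("url")
doc = pages.find_one({"url": "https://example.com"})
```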

For finding data we now have a better approach, but there is also the disk layout to consider: we want data to be stored contiguously on disk. To achieve this, fixed-size space can be preallocated. How much should we preallocate? Doubling works: allocate a small file first, and when it is not enough, allocate one twice as large, and continue. For example, first allocate 64MB; when that fills up, allocate 128MB, then 256MB, and so on, until the size caps at 2GB.
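A sketch of the doubling strategy (illustrative only):

```python
def preallocation_sizes(total_mb: int, start_mb: int = 64, cap_mb: int = 2048):
    """File sizes to allocate until total_mb is covered (illustrative sketch)."""
    sizes, size = [], start_mb
    while sum(sizes) < total_mb:
        sizes.append(size)
        size = min(size * 2, cap_mb)  # double each time, stop growing at 2GB
    return sizes

# Covering 5TB-ish of hypothetical data in MB terms:
print(preallocation_sizes(5000))  # [64, 128, 256, 512, 1024, 2048, 2048]
```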

2. how to deal with failure

In cloud computing, "failure is normal", because the scale is so large. With a huge amount of data, even if the probability of any single piece failing is small, failure is common overall. MongoDB may face failures such as disk full, power off, and disk failure. Here we focus on disk failure.

Before we talk about failure, let's go back to storing data. Besides disk, we have memory, and memory reads are much faster than disk reads.

1) How to deal with the disk failure?

For example, suppose A=3 is stored in both disk and memory, and we want to change A to 5. We need to modify the data in both places, but this is slow because every change pays the cost of a disk write.

Solution: Rewrite A in memory as 5.

New problem: if the machine crashes, the 5 written to A is gone.

Solution: Write log/journal and save the log to disk.

Although logs are also written to disk, appending to a log is faster than writing data to random locations on disk: log writes are sequential, while in-place data writes spend most of their time seeking to new locations. A further trick is to use two disks, one for data and one for the log, so the two workloads don't compete for the same disk head.
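Putting the pieces together, here is a toy write-ahead-log store (illustrative only, not MongoDB's actual journal format):

```python
import json

class TinyStore:
    """Toy write-ahead log: append the change first, then update memory."""

    def __init__(self, journal_path: str):
        self.data = {}                           # in-memory copy (fast reads)
        self.journal = open(journal_path, "a")   # sequential on-disk log

    def set(self, key, value):
        # 1. Append the change to the journal first (sequential, fast).
        #    A real journal would also fsync before acknowledging.
        self.journal.write(json.dumps({"op": "set", "key": key, "value": value}) + "\n")
        self.journal.flush()
        # 2. Then update memory; data files can be written back lazily.
        self.data[key] = value

    def recover(self, journal_path: str):
        # After a crash, replay the journal to rebuild the lost state.
        with open(journal_path) as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]
```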

At this point we have another question: what exactly do we write in the log? There are two styles: the behavior (operation) log and the binary (physical) log. Say we want to change A=3 to A=5. A behavior log records the full operation: time, update, A, 3, 5. A binary log is simpler and records only the disk location and the updated bytes. MongoDB uses the first, for reasons explained below.

As mentioned earlier, the machine can crash at any time, so to keep the data readable we need a backup: if one machine goes down, the backup can still serve. But this raises a new problem: how do we keep the copies synchronized?

2) how to sync the primary and the secondary?

To synchronize data from the primary (P) with the secondary (S), P passes its log to S, and S applies the changes the log describes. This is why MongoDB uses a behavior log: in a binary log the addresses are local, so a log produced on P contains P's disk addresses, and even if it is shipped to S, S cannot locate the data.
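A sketch of the difference (all names and values hypothetical):

```python
# A behavior-log entry describes the logical change:
behavior_entry = {"op": "update", "doc_id": "A", "old": 3, "new": 5}

# A binary-log entry describes bytes at a local disk location:
binary_entry = {"file": "data.0", "offset": 0x4A3F10, "bytes": b"\x05"}

def apply_behavior(db: dict, entry: dict) -> None:
    # Any replica can apply this: it only needs the document id, not a
    # physical location, which differs from machine to machine.
    db[entry["doc_id"]] = entry["new"]

secondary = {"A": 3}
apply_behavior(secondary, behavior_entry)
assert secondary["A"] == 5
# binary_entry, by contrast, points at an offset that is only meaningful
# within the primary's own disk layout.
```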

3. how to deal with large data

1) how to save 100 TB of documents?

Today's mainstream computer hardware is cheap and scalable, so large amounts of data (say, 100 terabytes) can be spread across the machines of a cluster.

In MongoDB, sharding is used to store data on different machines. Each shard is an independent database, and together the shards form a single logical database. For example, a 1 TB collection can be divided into 4 shards, each storing 256 GB. Split into 40 shards, each shard only needs to manage 25 GB of data.
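With pymongo, talking to a mongos router, enabling sharding looks roughly like this (hypothetical address and names, and assuming the shard cluster is already set up):

```python
from pymongo import MongoClient

# Connect to a mongos router (hypothetical address).
client = MongoClient("mongodb://localhost:27017")

# Shard the "crawler.pages" collection on its "url" field; mongos splits
# the collection into chunks and spreads them across the shards.
client.admin.command("enableSharding", "crawler")
client.admin.command("shardCollection", "crawler.pages", key={"url": 1})
```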

2) how to save a document of 100 TB?

If a single document is 100 terabytes, how do we store it? We can split it into smaller chunks of 255KB each. Why not 256KB? Because each chunk is stored together with its metadata: with a 255KB payload the whole chunk still fits in the space allocated for it, while a full 256KB payload would leave no room for the metadata.
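This is the scheme MongoDB's GridFS uses for large files, with a default chunk size of exactly 255KB. A minimal pymongo sketch (hypothetical names and payload):

```python
import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["crawler"]

# GridFS stores the payload as 255KB chunk documents plus one metadata
# document describing the file (name, length, chunk size, ...).
fs = gridfs.GridFS(db)
file_id = fs.put(b"x" * (10 * 1024 * 1024), filename="big.bin")

# Reading streams the chunks back in order.
assert len(fs.get(file_id).read()) == 10 * 1024 * 1024
```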

As you can see from the discussion above, each data structure and technique exists for a reason. The same is true of MongoDB itself: as data volumes keep growing, traditional SQL databases show their limits in handling massive data, and MongoDB grew up to meet these new problems.