MongoDB Schema Design

Reference links: learnmongodbthehardway.com/schema/sche…

Overview

MongoDB is a document database. Because it is not a relational database, it does not have to follow the three normal forms, and it has no JOIN keyword for joining tables. As a result, schema design in MongoDB differs considerably from Oracle and MySQL. The following examples cover several common relationship models:

One-to-one relationship model

In relational databases, a one-to-one relationship is usually modeled with a foreign key. Take a writer and an address as an example. Assuming the relationship between the two entities is one-to-one, we might build tables like this

In practice, however, we do not strictly follow the three normal forms when designing tables. For convenience we store some redundant data, so the actual table may look like this

Back to the main topic: a non-relational database has no foreign key standard. We can establish the link manually, but such relationship fields exist only at the application level; the database itself has no concept of foreign key constraints. How, then, do we establish and handle relationships between documents?

  1. Establish a link

    This can be understood as a foreign key: one of the documents holds a field with the ID of the other document.

    User information document design

     {
       _id: 1,
       name: "Peter Wilkinson",
       age: 27
     }

    Address document design, which keeps the user ID as a foreign key

     {
       user_id: 1,
       street: "100 some road",
       city: "Nevermore"
     }
  2. Embedded document

    Store the address document directly as a field of the user document

     {
       name: "Peter Wilkinson",
       age: 27,
       address: {
         street: "100 some road",
         city: "Nevermore"
       }
     }

    The advantage of embedding is that a single read operation returns both the user document and the address of that user. This only makes sense when the user information and the address information are strongly related.

    The official documentation recommends embedding for one-to-one data models wherever possible, since it makes reads more efficient and retrieves the document faster.
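The difference between the two designs can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the arrays stand in for MongoDB collections, the helper names `readLinked` and `readEmbedded` are hypothetical, and the `_id` on the embedded user is added for the lookup. It shows that the linked design needs two lookups joined by the application, while the embedded design needs one.

```javascript
// In-memory stand-ins for collections; the data comes from the examples above.
const users = [{ _id: 1, name: "Peter Wilkinson", age: 27 }];
const addresses = [{ user_id: 1, street: "100 some road", city: "Nevermore" }];

// Linked design: one lookup per collection, combined in application code.
function readLinked(userId) {
  const user = users.find((u) => u._id === userId);
  if (!user) return null;
  const address = addresses.find((a) => a.user_id === userId) || null;
  return { user, address, lookups: 2 };
}

// Embedded design: the address travels with the user document.
// The _id field is an assumption added so the document can be looked up.
const embeddedUsers = [
  {
    _id: 1,
    name: "Peter Wilkinson",
    age: 27,
    address: { street: "100 some road", city: "Nevermore" },
  },
];

function readEmbedded(userId) {
  const user = embeddedUsers.find((u) => u._id === userId);
  return user ? { user, address: user.address, lookups: 1 } : null;
}

console.log(readLinked(1).lookups, readEmbedded(1).lookups);
```

Against a real database each lookup would be a separate round trip, which is where the read-efficiency argument for embedding comes from.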

One-to-many relationship model

For the one-to-many relationship model, we can take a blog and its comments as an example

The corresponding MongoDB document model is as follows

Document design of blog information

{
  title: "An awesome blog",
  url: "http://awesomeblog.com",
  text: "This is an awesome blog we have just started"
}

Document design for comment information

{
  name: "Peter Critic",
  created_on: ISODate("2014-01-01T10:01:22Z"),
  comment: "Awesome blog post"
}
{
  name: "John Page",
  created_on: ISODate("2014-01-01T11:01:22Z"),
  comment: "Not so awesome blog"
}

In a relational database, we would typically create two separate tables, a Blog table and a Comments table (a child table with a blog_id foreign key), and then associate the two with a join operation

However, since MongoDB has no JOIN keyword, we can arrive at the following three solutions based on how MongoDB works:

  1. Embedding

    Blog document design with the comments embedded

     {
       title: "An awesome blog",
       url: "http://awesomeblog.com",
       text: "This is an awesome blog we have just started",
       comments: [{
         name: "Peter Critic",
         created_on: ISODate("2014-01-01T10:01:22Z"),
         comment: "Awesome blog post"
       }, {
         name: "John Page",
         created_on: ISODate("2014-01-01T11:01:22Z"),
         comment: "Not so awesome blog"
       }]
     }

    The advantage of this document design is that we can retrieve the comments of a given blog post directly. When a user adds a comment, we simply append a new value to the comments array field of the blog document.

    But this design has at least three potential problems to keep in mind:

    1. The comments array of a blog post may grow until the document exceeds the maximum document size of 16MB.
    2. The second problem is write performance. Because comments are continuously appended to the blog document, the document keeps growing, and it becomes hard for MongoDB to keep it in its original location. The database then has to allocate additional space, copy the original blog document there, and update all indexes. This requires more IO and can hurt write performance.

       Note that this matters mainly under high write traffic; applications with low write traffic are affected far less, so it depends on the workload.
    3. The third problem is pagination. With a regular find query you have to read the entire document (including all comments) and then paginate the comments in the application.
  2. Linking

    The second approach is to associate documents by creating foreign key-like ids

    Blog document design

     {
       _id: 1,
       title: "An awesome blog",
       url: "http://awesomeblog.com",
       text: "This is an awesome blog we have just started"
     }

    Document design for comments

     {
       blog_entry_id: 1,
       name: "Peter Critic",
       created_on: ISODate("2014-01-01T10:01:22Z"),
       comment: "Awesome blog post"
     }
     {
       blog_entry_id: 1,
       name: "John Page",
       created_on: ISODate("2014-01-01T11:01:22Z"),
       comment: "Not so awesome blog"
     }
    One advantage of this design is that the blog document is unaffected as the comments grow, which avoids a single document exceeding 16MB. It also makes it easier to return paginated comments. The downside is that when a blog document has a very large number of comments (say, 1000), fetching all of them causes a lot of reads against the database.
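Why pagination gets easier with linked comments can be sketched in plain JavaScript. The array below stands in for the comments collection and `commentsPage` is a hypothetical helper, not part of any MongoDB API. It mimics what a shell query such as `find({blog_entry_id: 1}).sort({created_on: 1}).skip(n).limit(m)` would return on a collection of comment documents.

```javascript
// In-memory stand-in for the comments collection (data from the example above).
const comments = [
  {
    blog_entry_id: 1,
    name: "John Page",
    created_on: new Date("2014-01-01T11:01:22Z"),
    comment: "Not so awesome blog",
  },
  {
    blog_entry_id: 1,
    name: "Peter Critic",
    created_on: new Date("2014-01-01T10:01:22Z"),
    comment: "Awesome blog post",
  },
];

// Return one page of comments for a blog entry, oldest first.
// pageNumber is 1-based; only the requested slice is returned.
function commentsPage(blogEntryId, pageNumber, pageSize) {
  return comments
    .filter((c) => c.blog_entry_id === blogEntryId)
    .sort((a, b) => a.created_on - b.created_on)
    .slice((pageNumber - 1) * pageSize, pageNumber * pageSize);
}

console.log(commentsPage(1, 1, 1).map((c) => c.name));
```

Because each comment is its own document, the database can skip and limit on its side, and the application never has to load the full comment list just to show one page.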
  3. Chunking

    The third approach is a mixture of the two: we try to balance the embedding strategy against the linking pattern. For example, depending on the situation, we might split the comments into chunks of at most 50 comments each

    Blog document design

     {
       _id: 1,
       title: "An awesome blog",
       url: "http://awesomeblog.com",
       text: "This is an awesome blog we have just started"
     }

    Comment chunk document design

     {
       blog_entry_id: 1,
       page: 1,
       count: 50,
       comments: [{
         name: "Peter Critic",
         created_on: ISODate("2014-01-01T10:01:22Z"),
         comment: "Awesome blog post"
       }, ...]
     }
     {
       blog_entry_id: 1,
       page: 2,
       count: 1,
       comments: [{
         name: "John Page",
         created_on: ISODate("2014-01-01T11:01:22Z"),
         comment: "Not so awesome blog"
       }]
     }

    The biggest benefit of this design is that we can fetch 50 comments in a single read operation, which makes it easy to paginate comments.

    When should the chunking strategy be used? It speeds up document retrieval whenever the documents can be split into batches. A typical example is paging comments by hour, by day, or by count.
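The chunking strategy described above can be sketched in plain JavaScript. This is a minimal in-memory illustration, not a MongoDB implementation: the `commentPages` array stands in for the chunk collection, and `addComment` and `readPage` are hypothetical helper names. A new comment goes into the newest chunk of its blog entry, and a fresh chunk is opened once the current one holds 50 comments.

```javascript
const PAGE_SIZE = 50; // chunk size taken from the example above

// In-memory stand-in for the collection of comment chunks.
const commentPages = [];

// Append a comment to the newest chunk of the blog entry,
// opening a new chunk when none exists or the newest one is full.
function addComment(blogEntryId, comment) {
  const pages = commentPages.filter((p) => p.blog_entry_id === blogEntryId);
  let last = pages[pages.length - 1];
  if (!last || last.count >= PAGE_SIZE) {
    last = {
      blog_entry_id: blogEntryId,
      page: pages.length + 1,
      count: 0,
      comments: [],
    };
    commentPages.push(last);
  }
  last.comments.push(comment);
  last.count += 1;
  return last.page;
}

// Reading one page of comments means fetching exactly one chunk.
function readPage(blogEntryId, page) {
  const chunk = commentPages.find(
    (p) => p.blog_entry_id === blogEntryId && p.page === page
  );
  return chunk ? chunk.comments : [];
}

// 51 comments end up as one full chunk plus one chunk with a single comment.
for (let i = 1; i <= 51; i++) {
  addComment(1, { name: "Reader " + i, comment: "Comment " + i });
}
console.log(commentPages.map((p) => p.count)); // [ 50, 1 ]
```

The `count` field is what lets the writer decide whether the newest chunk still has room without inspecting the array itself.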

Many-to-many relationship model

For the many-to-many relationship model, let's take authors and the books they have written as an example


  1. Two-way nesting

    In MongoDB we can use bidirectional nesting to add the foreign keys of two documents to each other through array fields

    Author document design

     {
       _id: 1,
       name: "Peter Standford",
       books: [1, 2]
     }
     {
       _id: 2,
       name: "Georg Peterson",
       books: [1]
     }

    Book document design

     {
       _id: 1,
       title: "A tale of two people",
       categories: ["drama"],
       authors: [1, 2]
     }
     {
       _id: 2,
       title: "A tale of two space ships",
       categories: ["scifi"],
       authors: [1]
     }
    Fetch all books written by a given author

     var db = db.getSisterDB("library");
     var booksCollection = db.books;
     var authorsCollection = db.authors;

     // Look up the author, then fetch every book whose _id is in the author's books array
     var author = authorsCollection.findOne({name: "Peter Standford"});
     var books = booksCollection.find({_id: {$in: author.books}}).toArray();

    Fetch all authors of a given book

     var db = db.getSisterDB("library");
     var booksCollection = db.books;
     var authorsCollection = db.authors;

     // Look up the book, then fetch every author whose _id is in the book's authors array
     var book = booksCollection.findOne({title: "A tale of two space ships"});
     var authors = authorsCollection.find({_id: {$in: book.authors}}).toArray();
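With two-way nesting, every relationship is stored twice, so the application has to keep both array fields in sync. The sketch below illustrates this in plain JavaScript on a small, consistent data set modeled on the documents above. The arrays stand in for collections, and `linkAuthorToBook` and `isConsistent` are hypothetical helpers, not MongoDB functions.

```javascript
// In-memory stand-ins for the authors and books collections.
const authors = [
  { _id: 1, name: "Peter Standford", books: [1, 2] },
  { _id: 2, name: "Georg Peterson", books: [1] },
];
const books = [
  { _id: 1, title: "A tale of two people", categories: ["drama"], authors: [1, 2] },
  { _id: 2, title: "A tale of two space ships", categories: ["scifi"], authors: [1] },
];

// Record the relationship on both sides, skipping ids that are already present.
function linkAuthorToBook(authorId, bookId) {
  const author = authors.find((a) => a._id === authorId);
  const book = books.find((b) => b._id === bookId);
  if (!author || !book) throw new Error("unknown author or book");
  if (!author.books.includes(bookId)) author.books.push(bookId);
  if (!book.authors.includes(authorId)) book.authors.push(authorId);
}

// Every reference on one side must have a matching reference on the other side.
function isConsistent() {
  const forward = authors.every((a) =>
    a.books.every((id) =>
      books.some((b) => b._id === id && b.authors.includes(a._id))
    )
  );
  const backward = books.every((b) =>
    b.authors.every((id) =>
      authors.some((a) => a._id === id && a.books.includes(b._id))
    )
  );
  return forward && backward;
}

linkAuthorToBook(2, 2);
console.log(isConsistent()); // true
```

In a real deployment the two updates would be two separate writes, so a check like `isConsistent` is a useful way to detect references that were written on only one side.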
  2. One-way nesting

    The one-way nesting strategy optimizes read performance in a many-to-many model by turning the two-way references into one-way references, similar to one-to-many. It suits specific scenarios. In our example, the author document holds its books as an array field. In reality, however, the number of books can grow quickly, and the author document may well exceed the 16MB limit for a single document.

    Here the number of books grows quickly, but the book categories are essentially fixed and rarely change. So we store each book category as a separate document, and let the book document reference both its authors and its categories through array fields. The categories, which change little, act as the primary table; the books, which change a lot, act as the secondary table, and the secondary table stores the id of the primary table as a foreign key.

    Category document design

     {
       _id: 1,
       name: "drama"
     }

    Book document design, which uses foreign keys to reference the corresponding categories

     {
       _id: 1,
       title: "A tale of two people",
       categories: [1],
       authors: [1, 2]
     }

    The corresponding queries are as follows

    Fetch the categories of a given book

     var db = db.getSisterDB("library");
     var booksCol = db.books;
     var categoriesCol = db.categories;

     var book = booksCol.findOne({title: "A tale of two space ships"});
     var categories = categoriesCol.find({_id: {$in: book.categories}}).toArray();
    Fetch all books in a given category

     var db = db.getSisterDB("library");
     var booksCollection = db.books;
     var categoriesCollection = db.categories;

     var category = categoriesCollection.findOne({name: "drama"});
     // Matching a scalar against an array field returns documents whose array contains it
     var books = booksCollection.find({categories: category._id}).toArray();

    When one side of a many-to-many relationship is very large (say, up to 500,000) and the other side is very small (say, up to 3), as in this case where about 3 book categories may correspond to up to 500,000 books, the one-way nesting strategy should be used. If both sides are small (perhaps five books at most), a two-way nesting strategy may be better.
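The one-way design can also be sketched in plain JavaScript. The arrays stand in for collections, `categoriesOfBook` and `booksInCategory` are hypothetical helpers, and the second category and book are assumptions added for illustration. The point to notice is that only the book documents carry references, so a category document stays the same size no matter how many books belong to it.

```javascript
// In-memory stand-ins for the categories and books collections.
// The "scifi" category with _id 2 and the second book are illustrative assumptions.
const categories = [
  { _id: 1, name: "drama" },
  { _id: 2, name: "scifi" },
];
const books = [
  { _id: 1, title: "A tale of two people", categories: [1], authors: [1, 2] },
  { _id: 2, title: "A tale of two space ships", categories: [2], authors: [1] },
];

// Book -> categories: follow the ids stored on the book document.
function categoriesOfBook(title) {
  const book = books.find((b) => b.title === title);
  if (!book) return [];
  return categories.filter((c) => book.categories.includes(c._id));
}

// Category -> books: search the books for the category id.
// The category document itself holds no list of books.
function booksInCategory(name) {
  const category = categories.find((c) => c.name === name);
  if (!category) return [];
  return books.filter((b) => b.categories.includes(category._id));
}

console.log(categoriesOfBook("A tale of two space ships").map((c) => c.name));
console.log(booksInCategory("drama").map((b) => b.title));
```

The cost of this choice is that the category-to-books direction becomes a search over the books, which in a real database is where an index on the `categories` field would usually be considered.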