Using the Aggregate Concept to Guide MongoDB Schema Design

Habits are powerful but often imperceptible. We tend to fall into the trap of habit without even noticing.

In our project, we needed to store analysis reports and the report query criteria that users set, so we saved this information in MongoDB as report metadata. The metadata includes:

Report category (ReportCategory)
Report (Report)
Report query condition (QueryCondition)

A report category contains multiple reports. A report can belong to only one category. Each report provides multiple standard query criteria and user-defined query criteria.

I needed to design the MongoDB schema for this metadata. My first thought was to store all three concepts together as a single document in one metadata collection. Then it occurred to me that a report's query conditions are added and deleted frequently, so perhaps the query conditions should be split out on their own. What about report categories and reports? Should reports be split out as well? In a document database such as MongoDB, embedding Report as a property of ReportCategory is also feasible, and at least it does not bring the data redundancy this would cause in a relational database. But if they are separated, querying all reports under a category requires an extra lookup.
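To make the trade-off concrete, here is a minimal sketch of the two candidate shapes, written as plain Python dictionaries standing in for MongoDB documents. The field names (`_id`, `name`, `reports`, `categoryId`) and the sample values are assumptions for illustration, not the project's actual schema.

```python
# Option A: Report embedded as a property of ReportCategory (one document).
embedded_category = {
    "_id": "cat-1",
    "name": "Sales",
    "reports": [
        {"name": "Monthly revenue"},
        {"name": "Top customers"},
    ],
}

# Option B: Report stored separately, pointing back at its category by id.
categories = [{"_id": "cat-1", "name": "Sales"}]
reports = [
    {"_id": "rep-1", "name": "Monthly revenue", "categoryId": "cat-1"},
    {"_id": "rep-2", "name": "Top customers", "categoryId": "cat-1"},
    {"_id": "rep-3", "name": "Open tickets", "categoryId": "cat-2"},
]


def reports_embedded(category):
    """Option A: the reports arrive together with the category in one read."""
    return [r["name"] for r in category["reports"]]


def reports_by_lookup(category_id, all_reports):
    """Option B: a second lookup over the report documents is needed."""
    return [r["name"] for r in all_reports if r["categoryId"] == category_id]
```

With a real database, option B would presumably translate into a second query on the report collection filtered by the category id, which is the extra lookup mentioned above.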

What a mess! It seems that any design is feasible, but there is always something wrong with it.

While thinking it over, it struck me that for a document-oriented NoSQL database like this, it would be more appropriate to look at the records as aggregates. The idea hit me like a flash of lightning and sparked my design thinking.

An aggregate is not an expression of the aggregation relationship between objects in object orientation, but a reflection of object boundaries in domain-driven design (DDD). Based on my past experience, I have distilled five principles for aggregate design:

An aggregate is a boundary whose main purpose is to maintain business integrity by upholding the invariants defined in the business rules.
A non-root entity inside an aggregate boundary that other callers may need to access on its own should be split out as a separate aggregate.
A non-root object inside the aggregate boundary should be related to the aggregate root directly or indirectly and be reachable by object reference. If it has to be referenced by ID, the referenced object is not part of the aggregate.
If an object cannot exist without another object as its owner, it must sit inside the aggregate boundary of that owner.
If an entity may be referenced by multiple aggregates, first consider making it a separate aggregate.

These principles are some of my own thoughts from exploring aggregate design. Having applied them in practice many times, I find them quite instructive. I will not expand on them here; a future article will cover them in detail. For this example, how can we use these principles to think about the relationship between ReportCategory, Report, and QueryCondition?

Applying these principles clears up the earlier confusion. From the perspective of business integrity, a Report belongs to a ReportCategory, but there is no strong constraint between them, that is, no business invariant. For example, a ReportCategory can be empty, with no Report in it, and we can query all reports on their own without going through ReportCategory. If we put Report inside the ReportCategory aggregate, the protection offered by the aggregate boundary becomes an unreasonable barrier, because Report may be accessed separately.

Therefore, we can draw the first conclusion: ReportCategory and Report should belong to two different aggregates.

Based on the fourth principle, we can ask: does a QueryCondition make sense without a Report? The answer is clear: no Report, no QueryCondition. With no skin, what would the hair cling to? The second conclusion follows naturally: Report and QueryCondition should belong to the same aggregate. And so the model came out:
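Expressed in code, the two conclusions might look like the following minimal sketch. The class fields, the method names, and the rule that condition names are unique within a report are all assumptions made for illustration; only the three concept names come from the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReportCategory:
    """Root of its own aggregate; it holds no Report objects."""
    id: str
    name: str


@dataclass
class QueryCondition:
    """Lives only inside a Report and has no id for outsiders to reference."""
    name: str
    expression: str
    user_defined: bool = False


@dataclass
class Report:
    """Aggregate root that owns its QueryConditions by object reference."""
    id: str
    name: str
    category_id: str  # reference by id: the category is outside this aggregate
    conditions: List[QueryCondition] = field(default_factory=list)

    def add_condition(self, condition: QueryCondition) -> None:
        # Assumed rule for illustration: condition names are unique per report.
        if any(c.name == condition.name for c in self.conditions):
            raise ValueError(f"duplicate condition: {condition.name}")
        self.conditions.append(condition)

    def remove_condition(self, name: str) -> None:
        self.conditions = [c for c in self.conditions if c.name != name]


category = ReportCategory(id="cat-1", name="Sales")  # valid even with no reports
report = Report(id="rep-1", name="Monthly revenue", category_id=category.id)
report.add_condition(QueryCondition("last-30-days", "date >= today - 30"))
report.add_condition(QueryCondition("my-region", "region = 'EU'", user_defined=True))
report.remove_condition("last-30-days")
```

Conditions are added and removed only through the Report root, which is where any such rule would be enforced, while the category is reached by id because it sits outside the boundary.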

The figure above is a domain model rather than a data model. From a domain-driven design perspective, this is the right way to start. But does using the domain model to guide MongoDB schema design mix the domain with the technical implementation? In terms of design direction, the domain model should come first, and the database implementation should be designed to satisfy it. Only when the domain model gets in the way of the technical implementation, or when the schema derived from it fails to meet performance or other quality attribute requirements, should the domain model be adjusted in return. For a document-oriented database like MongoDB, using the aggregate concept to guide schema design is a natural fit: not only is there no sense of conflict, it also makes the Repository implementation easier and more natural.
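As a rough illustration of why the Repository becomes simpler, the sketch below maps one Report aggregate to one document and loads or saves it as a whole. It uses a plain dictionary in place of a real MongoDB collection, and every name in it (`report_to_document`, `InMemoryReportRepository`, the `queryConditions` field, and so on) is hypothetical rather than taken from the project.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryCondition:
    name: str
    expression: str


@dataclass
class Report:
    id: str
    name: str
    category_id: str
    conditions: List[QueryCondition] = field(default_factory=list)


def report_to_document(report: Report) -> dict:
    """One aggregate becomes one document; its conditions are embedded."""
    return {
        "_id": report.id,
        "name": report.name,
        "categoryId": report.category_id,
        "queryConditions": [asdict(c) for c in report.conditions],
    }


def document_to_report(doc: dict) -> Report:
    """Rebuild the whole aggregate from a single document."""
    return Report(
        id=doc["_id"],
        name=doc["name"],
        category_id=doc["categoryId"],
        conditions=[QueryCondition(**c) for c in doc["queryConditions"]],
    )


class InMemoryReportRepository:
    """Stand-in for a MongoDB collection: a dict keyed by _id."""

    def __init__(self) -> None:
        self._collection: Dict[str, dict] = {}

    def save(self, report: Report) -> None:
        # The whole aggregate is written in one step, replacing the old version.
        self._collection[report.id] = report_to_document(report)

    def find_by_id(self, report_id: str) -> Optional[Report]:
        doc = self._collection.get(report_id)
        return document_to_report(doc) if doc is not None else None

    def find_by_category(self, category_id: str) -> List[Report]:
        return [document_to_report(d) for d in self._collection.values()
                if d["categoryId"] == category_id]


repo = InMemoryReportRepository()
repo.save(Report("rep-1", "Monthly revenue", "cat-1",
                 [QueryCondition("last-30-days", "date >= today - 30")]))
loaded = repo.find_by_id("rep-1")
```

Because the aggregate boundary and the document boundary coincide, `save` and `find_by_id` each touch exactly one document. With a real driver, `save` would presumably become a single replace-by-`_id` call, for example pymongo's `replace_one` with `upsert=True`.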

During the project, I let the technology choice take over from the start and habitually went straight to schema design for MongoDB, forgetting the guiding principles of domain-driven design. Technical people tend to get absorbed in the technical implementation and overlook the driving force of domain design. Be careful!

Author: Zhang Yi
Links to this article: Zhangyi. Xyz/mongo – SCH…
Copyright Notice: All articles on this blog are licensed under a CC BY-NC-SA 3.0 license unless otherwise stated. Please credit the source when reprinting!
