Domain-Driven Design Practice at Ali Hema

Preface

Design is a double-edged sword. There is no best and no better; all roads lead to Hangzhou. Both no design and over-design cause problems, and a design that is just right is the ultimate pursuit.

DDD (Domain-Driven Design) is one school of design. It is far from dominant and far from perfect. What I want to share is that caring about design at all matters more than which school you follow: any deliberate design is good.

Judging from the code I have seen, most of the code inside Ali Group is not DDD-style. There is not much design; it reads more like “spaghetti code”, where a single operation runs in one line from the entry point straight down to the database, and what design exists is concentrated in the database. We rely on heavy testing to guarantee the external quality of the software (hats off to the long-suffering testers), while internal quality is often neglected under tight project schedules and buried under day-to-day technical debt.

I always wanted to write something to raise everyone’s design awareness, but never knew where to start. I moved to Hema last year, which gave me more opportunities to write code and build a system from scratch. Hema differs from most of the Group’s businesses: it is more B-end oriented, running from the supply chain to the distribution chain, with tightly integrated processes and complex relationships. Nobody can understand what is going on unless things are sorted out clearly, so design matters a great deal here. Code written without design today will be dead tomorrow. However long we stay at Hema, we must not dig holes for those who come after us. In the module I was responsible for, we applied DDD throughout the whole system, with our own thinking and adaptations. I want to share that experience here in the hope that, as the saying goes, stones from other hills can polish jade.

Discussion on the Domain Model

1. Domain model design: database based vs. object based

We usually approach design from two dimensions:

Data Modeling: abstracting system relationships through data, also known as database design.
Object Modeling: abstracting system relationships through objects, also known as object-oriented design.

Most architects start designing a software system with Data Modeling, while a few start with Object Modeling. The two approaches do not conflict and both are important, but the direction from which you start can make a big difference to the final shape of the system.

Data Model

“Domain model” (here meaning the data model) is a familiar term to every software practitioner. The internal quality of a software product may well be determined by the clarity of its domain model. A good domain model makes the product’s structure clear, easier to modify, and cheaper to evolve.

In a development team, the architect matters: the architect decides the structure of the software, and that structure determines its future readability, extensibility, and evolvability. Typically, the architect designs the domain model and developers build on it. “Domain model” is the fashionable term; a decade or so ago the same thing would have been called a “data dictionary”. Put bluntly, in this approach the domain model is the database design.

Architects keep evolving and updating this data dictionary during requirements discussions, and some designers write it out as SQL statements. Those statements form the development history of the product or project database, much like the development of a human embryo: one cell (one table), many cells (many tables), a tail appears (a design mistake), the tail disappears (the design is corrected), and finally the baby arrives with a cry.

In a traditional project, the architect hands the developers a thick high-level design document which, apart from prose, contains little more than a domain-specific database table design. Database design is the foundation, and all development revolves around this data dictionary, forming an architecture diagram similar to the following:

The Service layer handles most of the logic through our beloved Manager classes. POJOs are treated as plain data, constantly transformed and combined in the Manager’s hand (the hand of God). The Service layer here is a huge processing plant (a very heavy layer) that completes the business logic around the DNA of the database.

To take a somewhat crude example, if we had two tables, father and son, the resulting POJOs would be:

public class Father { ... }

public class Son {
    private String fatherId; // the son only stores the father's id
    public String getFatherId() { return fatherId; }
    ...
}

Now suppose the son misbehaves and the father slaps him: the father’s hand hurts and the son’s face hurts. A Manager usually handles it like this:

public class SomeManager {
    public void fatherSlapSon(Father father, Son son) {
        // assuming painOnHand and painOnFace are fields on the two POJOs
        father.setPainOnHand();
        son.setPainOnFace();
    }
}

Here the Manager plays God: it reaches in and delivers the slap on the father’s behalf.

Object Model

In 2004, Eric Evans published “Domain-Driven Design: Tackling Complexity in the Heart of Software” (Evans DDD). I recommend this book for its groundbreaking theory of domain-driven design.

When talking about DDD, I often pose a hypothetical: if your machine had unlimited memory and never crashed, we would not need to persist data, and so would not need a database. How would you design your software then? This is what we call Persistence Ignorance: persistence is not a design concern.

Without a database, the domain model is designed around the program itself, and colleagues who love design patterns can show off their skills here. Among procedural, functional, and object-oriented languages, object orientation is undoubtedly the best fit for domain modeling.

Classes and tables are somewhat similar, and many people assume that a table corresponds to a class and a row to an object. I strongly disagree with that equivalence; accepting it leads directly to meaningless software design.

There are several significant differences between classes and tables, and they greatly affect how richly a domain can be modeled. With references, encapsulation, inheritance, and polymorphism, we can express domain models far more vividly and follow the SOLID principles far more rigorously:

Reference: a relational table has to express a many-to-many relationship through a third table, which makes the domain model less concrete and hard for business colleagues to follow.
Encapsulation: a class can define methods, whereas data alone cannot fully express the domain model. A table can record a person’s body measurements, but not that “a person can run”.
Inheritance and polymorphism: classes can be polymorphic. Apart from their measurements, a table cannot tell how a person and a pig differ in behavior; it does not know that “a person running is different from a pig running”.
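
The last two points can be illustrated with a minimal Java sketch. The names Runner, Person and Pig are hypothetical and chosen only to mirror the example above; the point is that the run behavior lives in the classes, while a table could hold only the measurement fields.

```java
public class RunningDemo {
    // Behavior that a table row cannot carry.
    interface Runner {
        String run();
    }

    static class Person implements Runner {
        private final int heightCm; // plain data: a table column could hold this

        Person(int heightCm) {
            this.heightCm = heightCm;
        }

        int getHeightCm() {
            return heightCm;
        }

        @Override
        public String run() {
            return "a person runs upright on two legs";
        }
    }

    static class Pig implements Runner {
        private final int weightKg; // plain data again

        Pig(int weightKg) {
            this.weightKg = weightKg;
        }

        int getWeightKg() {
            return weightKg;
        }

        @Override
        public String run() {
            return "a pig trots on four legs";
        }
    }

    public static void main(String[] args) {
        Runner[] runners = { new Person(175), new Pig(80) };
        for (Runner runner : runners) {
            // Same call, different behavior depending on the concrete class.
            System.out.println(runner.run());
        }
    }
}
```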

Here is the angry father slapping his son again, this time in the object model:

public class Father {
    // the father delivers the slap himself; no Manager is involved
    public void slapSon(Son son) {
        this.setPainOnHand();
        son.setPainOnFace();
    }
}

With this in mind, we gradually build lifelike domain models in the object-oriented world, and the Service layer performs business operations on top of them (it becomes thinner, with much of the behavior handed over to domain objects). The domain model itself does not complete the business; each domain object performs its own behavior (single responsibility). Take running: Person.run is a behavior that is independent of any business, but when a manager or service calls some Person.run, it can complete a 100-meter race or a food delivery run. This results in an architecture diagram similar to the following:

Now let’s drop the earlier assumption: nobody’s machine has infinite memory or never goes down, so we do need a database. But the database no longer carries the heavy burden of expressing the domain model. It returns to its essential role and does two things:

Storage: persisting object data to a storage medium.
Retrieval: loading queried data back into memory efficiently.

Since the database no longer has to express the domain model, its design becomes unconstrained: any technique that speeds up storage and retrieval is fair game. We can use a column store or a document store, and we can design elaborate intermediate tables to serve big-data queries. In short, database design is all about efficient access, not about perfectly representing the domain model. Let’s look at the architecture diagram:

Here is what I want to emphasize:

The domain model is for domain operations. It can of course serve queries (reads) too, but at a cost: an aggregate may contain several pieces of data, and apart from getById-style access that data is not suited to varied queries. Domain-driven design was never meant for varied queries.
Queries are database-based: all complex, exotic queries should bypass the domain layer and go straight to the database.
To simplify further: domain operations -> objects, data queries -> table rows.
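
The split between the two paths can be sketched as follows. This is an illustration under stated assumptions, not Hema’s code: Order, OrderRow, ORDER_TABLE and both method names are hypothetical, and an in-memory list stands in for a database table. The write path goes through a domain object that guards its own rules; the read path filters flat rows directly and never builds a domain object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ReadWriteSplit {
    // A flat row, as it might sit in a table (hypothetical columns).
    static final class OrderRow {
        final long id;
        final String shopId;
        final int totalCents;

        OrderRow(long id, String shopId, int totalCents) {
            this.id = id;
            this.shopId = shopId;
            this.totalCents = totalCents;
        }
    }

    // Domain object: owns the behavior and guards its own invariants.
    static final class Order {
        private final long id;
        private final String shopId;
        private int totalCents;

        Order(long id, String shopId) {
            this.id = id;
            this.shopId = shopId;
        }

        void addItem(int priceCents, int quantity) {
            if (priceCents < 0 || quantity <= 0) {
                throw new IllegalArgumentException("invalid item");
            }
            totalCents += priceCents * quantity;
        }

        OrderRow toRow() {
            return new OrderRow(id, shopId, totalCents);
        }
    }

    // In-memory stand-in for a database table.
    static final List<OrderRow> ORDER_TABLE = new ArrayList<>();

    // Write path: the change goes through the domain object, then is stored.
    static void save(Order order) {
        ORDER_TABLE.add(order.toRow());
    }

    // Read path: filter rows directly, without building any domain objects.
    static List<OrderRow> findByShopWithMinTotal(String shopId, int minTotalCents) {
        return ORDER_TABLE.stream()
                .filter(row -> row.shopId.equals(shopId) && row.totalCents >= minTotalCents)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Order order = new Order(1L, "shop-1");
        order.addItem(250, 2);
        save(order);
        System.out.println(findByShopWithMinTotal("shop-1", 400).size());
    }
}
```

In a real system the read path would be a SQL or search query; the sketch only shows that it has no need to pass through the domain layer.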

2. Domain model: blood loss, anemia, congestion

The blood-loss, anemic, congestion and bloated models are usually credited to Martin Fowler; they classify a model by how “full” the domain model is, roughly: skinny, average, sturdy and obese. The bloated model is simply too fat, so we will not discuss it here.

Blood-loss model: in Java, for example, the POJOs have only simple field-based setters and getters. Relationships between POJOs are hidden in object IDs and interpreted by an external Manager. For example, with son.fatherId, Son does not know it is related to Father, but the Manager uses son.fatherId to fetch a Father.

Anemic model: a son who does not know his own father is unrealistic, and he should not need a DNA test through the Manager (via son.fatherId) every time he looks for him. So the Son class holds a direct reference to Father:

public class Son {
    private Father father;
    public Father getFather() { return this.father; }
}

The Son class is now richer, but one small inconvenience remains: Father cannot reach Son. How can a father not know who his son is? Let’s add the attribute to Father:

public class Father {
    private Son son;
    public Son getSon() { return this.son; }
}

Now the two classes look much fuller. This is what we call the anemic model: the family is complete, and father and son know each other. A closer look, however, reveals a problem. An object is usually obtained from a repository (database query) or a factory (created in memory):

Son someSon = sonRepo.getById(12345);

This call fetches a Son object from the database. To build a complete Son, sonRepo needs a fatherRepo to build the Father to assign to son.father. And to build that Father, fatherRepo needs sonRepo to build a Son to assign to father.son. The object graph now contains a cycle. The circular call can be resolved, but doing so makes the domain model ugly and compromised. A directed acyclic graph is our design goal. To break the cycle, can we drop one of the references between Father and Son? Let’s modify the Father class:

public class Father {
    // private Son son;            // delete this reference
    private SonRepository sonRepo; // add a repository for Son
    public Son getSon() { return sonRepo.getByFatherId(this.id); }
}

This way we no longer have to construct a Son whenever we construct a Father. The price is that Father now holds a SonRepository: a domain object references a persistence operation. This is what we call the congestion model.

Congestion model: the congestion model costs the domain object its purity. It is no longer a pure in-memory object; a database operation is buried inside it, which is unfriendly to testing. To keep the model complete, however, the congestion model is necessary in some cases. For example, a Hema store can sell thousands of products, each with hundreds of attributes. Loading all of those products every time a store is constructed would be far too inefficient:

public class Shop {
    // private List<Product> products; // this list is too large to load at construction time
    private ProductRepository productRepo;
    public List<Product> getProducts() {
        // return this.products;
        return productRepo.getShopProducts(this.id);
    }
}

3. Domain model: Dependency injection

A quick word about dependency injection:

At runtime, injected objects are singletons by default, and only objects within the scope of Spring’s component scan (@Component) can receive dependencies through annotations (@Autowired). Objects created with new cannot be injected by annotation.
I recommend constructor injection: it is test-friendly, it guarantees that an object is complete once constructed, and it tells you explicitly which objects you must mock or stub.
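
To make both points concrete, here is a minimal plain-Java sketch with no Spring on the classpath. All class and method names are hypothetical. It contrasts a field-injection style class, which is left with a null dependency when created with new, against a constructor-injection style class, which cannot be created without its dependency.

```java
import java.util.Objects;

public class InjectionStyles {
    // Hypothetical dependency, standing in for any repository.
    interface SonRepository {
        String findNameByFatherId(long fatherId);
    }

    // Field-injection style: a container would fill the field reflectively,
    // so an instance created with `new` is left with a null dependency.
    static class FieldInjectedService {
        SonRepository sonRepo; // would carry @Autowired in a Spring application

        String sonName(long fatherId) {
            return sonRepo.findNameByFatherId(fatherId);
        }
    }

    // Constructor-injection style: the dependency is explicit and mandatory.
    static class ConstructorInjectedService {
        private final SonRepository sonRepo;

        ConstructorInjectedService(SonRepository sonRepo) {
            this.sonRepo = Objects.requireNonNull(sonRepo, "sonRepo");
        }

        String sonName(long fatherId) {
            return sonRepo.findNameByFatherId(fatherId);
        }
    }

    public static void main(String[] args) {
        // A lambda is enough to stub the single-method repository.
        SonRepository stub = fatherId -> "son-of-" + fatherId;
        System.out.println(new ConstructorInjectedService(stub).sonName(7L));

        try {
            new FieldInjectedService().sonName(7L);
        } catch (NullPointerException e) {
            System.out.println("field-injected service built with new has no repository");
        }
    }
}
```

The constructor signature doubles as documentation: a test author sees at a glance that SonRepository is the one collaborator to replace.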

With dependency injection in mind, let’s look at the congestion model again:

public class Father {
    private SonRepository sonRepo;
    public Son getSon() { return sonRepo.getByFatherId(this.id); }
    public Father(SonRepository sonRepo) { this.sonRepo = sonRepo; }
}

Passing a SonRepository every time we construct a Father is tedious. Can SonRepository be injected into Father through dependency injection? Father cannot be a singleton here: it is created with new in two scenarios, creation and query, and in neither does the construction of Father get SonRepository injected. This is where the factory pattern comes in (many people treat the factory pattern as mere decoration):

@Component
public class FatherFactory {
    private final SonRepository sonRepo;

    @Autowired
    public FatherFactory(SonRepository sonRepo) {
        // keep the injected repository so createFather can pass it on
        this.sonRepo = sonRepo;
    }

    public Father createFather() {
        return new Father(sonRepo);
    }
}

Since FatherFactory is a singleton created by the container, SonRepository is naturally injected into the factory. The createFather method hides the injected sonRepo, so creating a new Father stays clean.

4. Domain model: test friendly

The blood-loss and anemic models are naturally test-friendly (though the blood-loss model has little worth testing) because they are pure in-memory objects. In practice, however, the congestion model does exist; the alternative is to pull the domain object apart, which makes it slightly less elegant (and of course the war between the anemic and congestion camps never ends). When an object is congested with persistence, mock or stub out the database dependency:

public class Father {
    private SonRepository sonRepo; // not created here with = new SonRepository()
    public Son getSon() { return sonRepo.getByFatherId(this.id); }
    public Father(SonRepository sonRepo) { this.sonRepo = sonRepo; }
}

The whole point of putting SonRepository in the constructor is to be test-friendly, and unit testing can be done with mock/stub repositories.
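
A unit test along these lines might look like the following sketch. It uses a hand-written stub rather than a mocking library, and it makes two assumptions that are not in the original snippet: Father receives its id through the constructor, and Son has a name. No database is touched.

```java
public class FatherStubDemo {
    // Minimal stand-ins for the domain types used in this article.
    static final class Son {
        private final String name; // hypothetical attribute, for the assertion only

        Son(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    interface SonRepository {
        Son getByFatherId(long fatherId);
    }

    static final class Father {
        private final long id; // assumption: the id is passed in at construction
        private final SonRepository sonRepo;

        Father(long id, SonRepository sonRepo) {
            this.id = id;
            this.sonRepo = sonRepo;
        }

        Son getSon() {
            return sonRepo.getByFatherId(id);
        }
    }

    public static void main(String[] args) {
        // Stub repository: answers from memory instead of querying a database.
        SonRepository stub = fatherId -> new Son("son-of-" + fatherId);

        Father father = new Father(42L, stub);
        System.out.println(father.getSon().getName());
    }
}
```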

5. Domain model: how Repository is implemented at Hema

Following the object-based approach, the domain model lives in in-memory objects, and those objects must eventually be persisted to a database. Freed from the constraints of the domain model, the database design can be flexible. So how do domain objects get into the database at Hema?

At Hema we designed Tunnel, a dedicated interface through which domain objects can be stored in and loaded from different types of databases. A Repository does not persist domain objects directly. Instead, it translates them into POJOs and hands those to a Tunnel for persistence. A Tunnel can be implemented in any package, so domain objects and Repositories are completely separated from persistence, and the domain package becomes a plain set of in-memory objects.
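
The article does not show Tunnel’s actual API, so the following is only a sketch of the idea under assumed names: Product, ProductDO, ProductTunnel, InMemoryProductTunnel and every method signature here are hypothetical. The Repository translates between the domain object and a flat POJO, and the Tunnel is the only piece that knows about a concrete store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class TunnelSketch {
    // Domain object: a pure in-memory model with behavior.
    static final class Product {
        private final long id;
        private String name;

        Product(long id, String name) {
            this.id = id;
            this.name = name;
        }

        long getId() {
            return id;
        }

        String getName() {
            return name;
        }

        void rename(String newName) {
            if (newName == null || newName.isEmpty()) {
                throw new IllegalArgumentException("name required");
            }
            this.name = newName;
        }
    }

    // Persistence-side POJO: flat data only, shaped for storage.
    static final class ProductDO {
        final long id;
        final String name;

        ProductDO(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Tunnel: the only abstraction that talks to a concrete store.
    interface ProductTunnel {
        void save(ProductDO row);

        Optional<ProductDO> findById(long id);
    }

    // One possible implementation; another could wrap SQL or a document store.
    static final class InMemoryProductTunnel implements ProductTunnel {
        private final Map<Long, ProductDO> rows = new HashMap<>();

        @Override
        public void save(ProductDO row) {
            rows.put(row.id, row);
        }

        @Override
        public Optional<ProductDO> findById(long id) {
            return Optional.ofNullable(rows.get(id));
        }
    }

    // Repository: translates between domain objects and POJOs.
    static final class ProductRepository {
        private final ProductTunnel tunnel;

        ProductRepository(ProductTunnel tunnel) {
            this.tunnel = tunnel;
        }

        void save(Product product) {
            tunnel.save(new ProductDO(product.getId(), product.getName()));
        }

        Optional<Product> getById(long id) {
            return tunnel.findById(id).map(row -> new Product(row.id, row.name));
        }
    }

    public static void main(String[] args) {
        ProductRepository repo = new ProductRepository(new InMemoryProductTunnel());
        Product product = new Product(1L, "apple");
        product.rename("green apple");
        repo.save(product);
        System.out.println(repo.getById(1L).map(Product::getName).orElse("not found"));
    }
}
```

Swapping the store then means supplying a different ProductTunnel implementation; neither Product nor ProductRepository changes.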

6. Domain model: Deployment architecture

Hema’s business is highly integrated: from purchasing from suppliers to delivering goods to users, the relationships between objects are fairly clear. In principle we can adopt one large, complete domain model, or use bounded contexts to split it into subdomains and handle data transfer at the boundaries. Here is a diagram borrowed from “Lao Ma” (Martin Fowler):

I personally prefer the large-domain approach. My preferred deployment structure (preferred, so not what is actually deployed) is:

Conclusion

Hema is still exploring architecture design. Under the brand-new “2B + Internet” business model there are many details worth further discussion. DDD has taken a solid first step at Hema and has been tested in terms of business extensibility and system stability. The Internet-based distributed workflow engine (Noble) and the fully Internet-based graphics rendering engine (Ivy) are both in progress, and we look forward to more designs from Hema engineers in the coming months.

About the author

Zhang Qunhui is Architecture Director at Ali Hema. With more than ten years of hands-on experience in technology and management, he was formerly director of engineering efficiency in Ali’s Infrastructure Department and has long guided the architecture design of large, complex systems. He is among the earliest practitioners of DevOps, microservices architecture, and domain-driven design in China. He believes real knowledge comes from practice and remains on the technical front line.

Thanks to Yuta Tian Guang for planning and reviewing this article.