Refactoring: Improving the Design of the Ele.me Trading System

I joined the trading department of Ele.me in May 2017 and successively took charge of the search, order, timeout, compensation, treaty, delivery, amount calculation, and evaluation systems. Later, I began to upgrade the system as a whole. This article was written after the first phase of the trading system refactoring and mainly reflects on the thinking behind the decisions made along the way. I deliberately avoid the word “architecture” because it carries a sense of power and mystery: talking about “architecture” makes people feel that they are making weighty decisions or doing in-depth technical analysis.

As Bi Xuan wrote in an article on system design routines:

Looking back at the system designs I have done, I find that I really do follow a routine when designing a system. The routine is: the purpose of the system -> the goals of the system design -> the core design around those goals -> the design principles formed around the core design -> the detailed design of each subsystem and module.

Knowing the purpose and forming measurable goals is the first step in system design.

“Soft” ware

Uncle Bob: software is “soft” ware.

The first version of the trading system code dates back 8 years before this refactoring, and it had already been split up and refactored during that time. When I arrived in 2017, the main systems were as follows:

System name   Main function
Bos           Order management on the C (user) end: details and list
Nevermore     Order management on the B (merchant) end
Booking       Shopping cart, placing orders
Eos           Order center
Loki          Order cancellation
Blink         Order delivery and fulfillment, merchant order center

This set of systems carried the business from millions of orders to tens of millions of orders. Judging from stress test results, it could support several times the current volume. In other words, if nothing changed, it could keep running stably; if things did change, the answer was less certain.

In the two years since I joined, the business kept changing: from takeout catering alone to three parallel lines (catering takeout, new retail, and brand catering), and from home delivery to in-store service, followed by ever more differentiated business customization and requirements that had to go live in parallel. At the same time, as the company's organizational structure changed, some projects had to be coordinated across three sites, which multiplied the cost of communication and collaboration. Together, these factors left developers without the energy to plan the evolution of most systems.

A few months ago, the business raised a seemingly simple request: automatic review of, and penalties for, transaction evaluations. The core “domain model” of the evaluation system at the time was as follows:

We will not discuss the pros and cons of that design here. It simply illustrates that meeting this request required changes to several evaluation sub-modules, and the estimated workload was far beyond what the business expected. The business side was not satisfied, and similar conflicts often occurred in other systems. In fact, nobody on the team was lazy; everyone worked as hard as before. But no matter how much time people spent, how many fires they put out, or how much overtime they worked, output could not keep up, because most development time went into patching the system rather than building new features. We kept robbing Peter to pay Paul, over and over again.

Why did this happen? I think it is because most of the systems had evolved to the point where they could hardly respond to changing requirements. What the business saw as a small change was major surgery on the system for developers. Systems should not evolve in this direction: the big difference between software and hardware is that software should be simple and flexible to change.

So we settled on the core goal of the design: “Use good software architecture to save the labor cost of building and maintaining the project, make each change short, simple, easy to implement, and free of defects, and meet the requirements of functionality and flexibility to the greatest extent at the lowest cost.”

Source code is the design

When it comes to software design, people may picture a clearly structured architecture diagram and assume that all the secrets of the software architecture are hidden in it. After working on a few projects, though, one finds that this is often not enough. In his 1992 essay “What Is Software Design?”, Jack Reeves argued that the source code is the design, and made this point:

High-level structural design is not a complete software design; it is just a structural framework for the detailed design. Our ability to rigorously validate a high-level design is very limited. The detailed design will ultimately influence (or should be allowed to influence) the high-level design at least as much as other factors. Refining all aspects of a design is a process that should go on throughout the design cycle.

After falling into a few pits myself, this emphasis on detailed design strikes me as very down to earth. Put simply: top-down design is often unreliable, and coding is part of the design process. Personally, I think system design should proceed from the bottom up, and a good high-level design emerges as the level of abstraction rises.

Programming paradigm

Going bottom-up means starting from the code. The Ele.me trading system was originally written in Python, which was flexible enough to produce an MVP version of the system very quickly. That fit the company's situation at the time: rapid product iteration and great pressure from new projects.

The recent refactoring is written in Java, in line with the direction of the group. Before that there was a small interlude: at the end of 2017, we predicted that the existing system framework would hit a bottleneck when order volume reached the next order of magnitude, so some new services gradually began to be written in Go. During that period I often heard the complaint that writing business logic in Go is uncomfortable. Why? Roughly because there is no framework, no generics, and no try/catch. It is true that Go is not the optimal choice for solving business problems in general, but its syntax is simple, which greatly reduces the chance that an average programmer makes mistakes.

As for Python, everything is a double-edged sword. Python is expressive, but many people abuse its flexibility: the code becomes rough, and as a dynamic language it is prone to bugs and errors. There is some truth in the claim that flexibility is overrated: constraint is liberation.

I am not trying to start a language war. I bring this up only to say that, in moving from C++ to Go and from Python to Java, I came to realize that the programming paradigm is perhaps the most important thing to understand when learning any programming language. Put simply, it is the way a programmer should view a program, yet it is easily overlooked. The code of the old trading system, whatever the business logic, was procedural (POP) almost all the way down, and similar code was everywhere in the system.

I am not saying that everything must be OOP. I am more of a fan of a “problem-oriented” paradigm. Java, for example, is inherently OOP, but business processes do not necessarily need OOP. Some trading logic is simply step one followed by step two, and the procedural paradigm handles it well. In such cases, elaborate class designs are often unnecessary and can cause trouble.

In addition, the same problem can be broken down into levels, and each level can use the approach that suits it. For example, OOP can be used at the high level and FP within a specific piece of execution logic. For order amount calculation, we wrote the underlying calculation service in Go in an FP style. High performance, simple syntax, and fewer errors are advantages of the language, but the core reason is that this kind of problem is inherently suited to FP.

For the transaction domain as a whole, however, proper use of OOP design ideas has been proven to support large, complex software for complex and diverse business scenarios. So our first decision was to adopt a “hybrid” paradigm dominated by OOP.
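The Go service itself is not shown in this article, so the following is only a minimal Python sketch of why amount calculation suits an FP style: each step is a pure function over an immutable value, and the final amount is a fold over the steps. All names here (Amount, add_items, add_delivery_fee, apply_discount, calculate) are hypothetical and are not taken from the real service.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Tuple


@dataclass(frozen=True)
class Amount:
    """Immutable running total, kept in cents to avoid float rounding."""
    total: int


# A calculation step is a pure function from one Amount to a new Amount.
Step = Callable[[Amount], Amount]


def add_items(prices: Tuple[int, ...]) -> Step:
    return lambda a: Amount(a.total + sum(prices))


def add_delivery_fee(fee: int) -> Step:
    return lambda a: Amount(a.total + fee)


def apply_discount(discount: int) -> Step:
    # A discount never pushes the payable amount below zero.
    return lambda a: Amount(max(a.total - discount, 0))


def calculate(steps: Tuple[Step, ...]) -> Amount:
    """Fold the steps over a zero amount; no shared state is mutated."""
    return reduce(lambda acc, step: step(acc), steps, Amount(0))


payable = calculate((
    add_items((2500, 1200)),   # two dishes (hypothetical prices)
    add_delivery_fee(500),
    apply_discount(1000),      # for example, a red envelope
))
print(payable.total)  # prints 3200
```

Because every step is side-effect free, steps can be tested in isolation and reordered or extended without touching the others.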

Principles and Patterns

The difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships. — Linus Torvalds

Whatever programming paradigm or language you use, the building blocks are like bricks: if the bricks are poor, the building will not be strong. Linus's remark about data structures and their relationships is, as I understand it, about the interactions between classes. The quality of these “relationships” is usually equivalent to the quality of the software design. Most poorly designed software structures share some common characteristics:

Rigidity: The software is hard to change, and one change often sets off a chain of others. For example, adding a new marketing type at order placement has to be perceived and changed by the order center and the related upstream and downstream systems
Fragility: A simple change can cause unexpected problems elsewhere, even in conceptually unrelated places
Immobility: The design contains parts that would be useful to other systems, but separating them out is risky and costly. For example, the payment capability the order center built for takeout scenarios cannot support payment for virtual goods such as membership cards
Needless complexity: This usually refers to over-design
Opacity: Modules become harder to understand over time and the code harder to read. For example, the core code of the shopping cart stage has grown into a function of nearly a thousand lines

With the right paradigm chosen, we need to move up a level and look at the logic above the code. Years of software engineering have distilled some basic, proven principles and patterns that guide how to package data and functions and then organize them into programs.

SOLID

These principles have been arranged so that their initials spell SOLID: SRP, OCP, LSP, ISP, DIP. Below I use examples to discuss some of them.

SRP (Single Responsibility Principle): The principle is simple: any software module should be responsible to only one type of user, so code and data belong together when they serve the same type of user. In practice, most of our work is finding responsibilities and separating them.

I think the core of this principle lies in how “user” is defined. At QCon in 2018 I heard a talk by Yu Jun, and one of his remarks explains what a user is: “A user is not a person, but a collection of needs.” During our refactoring we debated the delivery link of the trading system. Ele.me supports merchant self-delivery, platform-managed delivery, and optional delivery (errand running, for example). These delivery modes differ in pricing, delivery logic, and usage scenarios, so we split the code along these lines. Everyone agreed with this first decomposition.

Later, however, the merchant organization was restructured: new retail merchants and catering merchants were separated, reflecting the different ways the business sides operate, and each delivery mode now faced different demands from each group. Given these changes, we eventually chose to split the code a second time.

Here is a tip for applying single responsibility: if responsibilities are hard to analyze, look for code that frequently conflicts when branches are merged. It is likely that several people changed the same module at the same time for different needs.

DIP (Dependency Inversion Principle): Some say dependency inversion is the dividing line between OOP and procedural programming (POP). In procedural design, policy depends on detail, that is, the high level depends on the low level, which often leaves the policy vulnerable to changes in detail. For example, in the takeout scenario, when a user fails to receive a meal for some reason, the merchant compensates the user with a voucher. A procedural implementation might look like this:
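The original illustration is not available here, so this is a minimal Python sketch of the procedural shape being described, in which the high-level compensation policy calls the voucher detail directly. The function names and the string return values are hypothetical stand-ins for real voucher issuing.

```python
def issue_voucher(user_id: str, shop_id: str, cents: int) -> str:
    """Low-level detail: pretend to issue a voucher valid at one shop."""
    return f"voucher:{shop_id}:{user_id}:{cents}"


def compensate_undelivered_order(user_id: str, shop_id: str, cents: int) -> str:
    """High-level policy, hard-wired to one concrete compensation method."""
    # Any new compensation method means editing this function.
    return issue_voucher(user_id, shop_id, cents)


print(compensate_undelivered_order("u1", "s9", 500))
```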

But after a while, because a voucher usually cannot be used across stores, the platform wants to compensate with a universal red envelope instead, so that the user keeps purchasing on the platform. Now the old code has to be changed and a dependency on the red envelope logic added. With DIP, the problem can be solved more elegantly:
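The original illustration is not available here, so this is a minimal Python sketch of the dependency-inverted shape, assuming a hypothetical Compensation abstraction that the policy owns. ShopVoucher and RedEnvelope are illustrative details that depend on that abstraction; none of these names come from the real system.

```python
from abc import ABC, abstractmethod


class Compensation(ABC):
    """Abstraction owned by the policy layer; details implement it."""

    @abstractmethod
    def grant(self, user_id: str, cents: int) -> str:
        ...


class ShopVoucher(Compensation):
    def __init__(self, shop_id: str) -> None:
        self.shop_id = shop_id

    def grant(self, user_id: str, cents: int) -> str:
        return f"voucher:{self.shop_id}:{user_id}:{cents}"


class RedEnvelope(Compensation):
    def grant(self, user_id: str, cents: int) -> str:
        return f"red-envelope:{user_id}:{cents}"


def compensate_undelivered_order(user_id: str, cents: int,
                                 method: Compensation) -> str:
    """Policy: depends only on the abstraction, never on a concrete method."""
    return method.grant(user_id, cents)


print(compensate_undelivered_order("u1", 500, ShopVoucher("s9")))
print(compensate_undelivered_order("u1", 500, RedEnvelope()))
```

Adding a new compensation method now means adding a class, while compensate_undelivered_order stays unchanged.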

Of course, this example is simplified, and real scenarios are far more complex, but the essence is the same: OOP lets the policy break free of the details and makes the details depend on abstractions, and the service interface is often owned by its client. The key to this process is doing the abstraction well.

OCP (Open-Closed Principle): On close analysis, this principle is really the system design goal we set at the beginning, and it is also the ultimate aim of the other principles. For example, SRP separates the modules of each business line to isolate change, but the platform also needs to do some abstraction: distill the core business process and open it up for each business line to customize, which in turn calls for DIP.

I will not go through the remaining SOLID principles, but there are other kinds of principles as well, such as IoC (Inversion of Control). Take a takeout platform as an example: the merchant sells food, and the user pays and receives it. Without a platform, the user and the merchant are strongly coupled (they must meet to trade). The Ele.me platform steps in as a guarantor: the user deposits money with the platform, the platform asks the merchant to accept the order and prepare the meal, and after the user receives the meal, the platform pays the merchant. This is inversion of control: buyer and seller give up their direct dependence on, and control over, each other, and both rely instead on the interface of a standard transaction model.

If you keep summarizing, you will always find one principle or another, but applying a principle is never done once and for all; the code has to keep changing with actual needs. Principles are also not a cure-all and should not be followed unconditionally, because following them too closely creates unnecessary complexity. One often sees code that insists on the factory pattern everywhere, on the grounds that any “new” violates DIP.

Evolve into patterns

The patterns here are what we call design patterns. I say “evolve” because I think a pattern is where design ends, not where it starts. The content of Design Patterns was not invented by its authors; it was abstracted from a large number of real systems. Most of the patterns already existed and were widely used but had never been systematically catalogued. In other words, as long as some of the principles above are followed, these patterns are likely to emerge naturally in system code. The book Agile Software Development has a section describing how a piece of code evolves into the Observer pattern as it is adjusted.

Patterns are good to have. In the search system, for example, the Template Method pattern can define a complete template for parsing search parameters, and different query needs can be met just by adding configuration. What I most want to stress, though, is: do not practice design-pattern-driven programming. Take the state machine in the trading system as an example. State machines are everywhere (a desk lamp at home has on and off states), but they are more complicated in trading scenarios. A catering takeout transaction has the following state flow model:

The most direct way to implement such a finite state machine is nested switch/case statements. In abbreviated form:

public class Order {
    // States
    public static final int ACCEPT = 5;
    public static final int SETTLED = 9;
    // ... other states
    // Events
    public static final int ARRIVED = 1;
    // ... other events
    private int state = ACCEPT;
    public void event(int event) {
        switch (state) {
            case ACCEPT:
                switch (event) {
                    case ARRIVED:
                        state = SETTLED;
                        // TODO: perform the action for this transition
                        break;
                    // case ...: other events
                }
                break;
            // case ...: other states
        }
    }
}

Because the flow here is simplified, the code above looks acceptable. For a state machine as complex as order status, however, the switch/case statements grow without bound and become very hard to read. Another problem is that the state logic and the actions are not separated. Design Patterns offers the State pattern, which works as follows:

This pattern does separate the actions from the logic of the state machine, but as states are added, the growing number of state classes makes the system very complex, and OCP is not well supported: adding a new class for a state switch forces changes to the classes that switch to it. Worst of all, this approach hides the overall state machine logic in scattered code. The old trading system used an interpreted transition table instead. A simplified version looks like this:
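The original diagram is not available here, so this is a minimal Python sketch of the State pattern applied to the same ACCEPT -> SETTLED transition. The class and method names (OrderState, Accepted, Settled, arrived) are assumptions for illustration only.

```python
class OrderState:
    """Base state: by default, an event is rejected."""
    name = "UNKNOWN"

    def arrived(self, order: "Order") -> None:
        raise ValueError(f"ARRIVED is not valid in state {self.name}")


class Settled(OrderState):
    name = "SETTLED"


class Accepted(OrderState):
    name = "ACCEPT"

    def arrived(self, order: "Order") -> None:
        # The action and the transition both live inside the state class.
        order.log.append("settle order")
        order.state = Settled()


class Order:
    def __init__(self) -> None:
        self.state: OrderState = Accepted()
        self.log = []

    def arrived(self) -> None:
        # The order delegates the event to whatever state it is in.
        self.state.arrived(self)


order = Order()
order.arrived()
print(order.state.name)
```

Note that Accepted has to know about Settled, so the overall flow can only be reconstructed by reading every state class.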

# Order completion
add_transition(
    trigger=ARRIVED,
    src=ACCEPT,
    dest=SETTLED,
    on_start=_set_order_settled_at,
    set_state=_set_state_with_record,  # change state
    on_end=_push_to_transcore,
)
...
# engine
def event_fire(event, current_state):
    for transition in transitions:
        # Match on the source state and the triggering event.
        if transition.src == current_state and transition.trigger == event:
            transition.on_start()
            current_state = transition.dest
            transition.on_end()
            break
    return current_state

This version is very easy to understand: the state logic is centralized, not coupled to the actions, and easy to extend. The only drawback is the time spent traversing the table, which can be optimized with a dictionary lookup. Overall, the benefits are clear.

But as the business grew, the trading system had to support several state machines at once, which means several transition tables, plus custom extension requirements. Handling this per business made the code complicated. In the refactoring we used two-level orchestration plus a process engine to address this, which is outside the scope of this article. Here I only want to stress our second decision: code should stay flexible. Analyze the problem with design principles and then solve it with a suitable design pattern, rather than letting design patterns drive the programming. Sometimes, for example, a global variable can replace a so-called Singleton.
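As a minimal, self-contained sketch of the dictionary optimization mentioned above, the table can be keyed by the (current state, event) pair so that one lookup replaces the scan. The names and the callbacks are simplified assumptions, not the old system's real code.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

ACCEPT, SETTLED = "ACCEPT", "SETTLED"   # states
ARRIVED = "ARRIVED"                     # events


class Transition(NamedTuple):
    dest: str
    on_start: Callable[[], None]
    on_end: Callable[[], None]


# (current state, event) -> transition
TABLE: Dict[Tuple[str, str], Transition] = {}


def add_transition(trigger: str, src: str, dest: str,
                   on_start: Callable[[], None],
                   on_end: Callable[[], None]) -> None:
    TABLE[(src, trigger)] = Transition(dest, on_start, on_end)


def event_fire(event: str, current_state: str) -> str:
    """Run the hooks for the matching transition and return the new state."""
    transition = TABLE.get((current_state, event))
    if transition is None:
        raise ValueError(f"no transition for {event} in state {current_state}")
    transition.on_start()
    transition.on_end()
    return transition.dest


calls: List[str] = []   # records hook calls so the flow is visible
add_transition(ARRIVED, ACCEPT, SETTLED,
               on_start=lambda: calls.append("set settled_at"),
               on_end=lambda: calls.append("push downstream"))
print(event_fire(ARRIVED, ACCEPT))
```

The whole state machine is still readable in one place (the add_transition calls), and each lookup takes constant time on average.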

Rich domain meaning

Once you try to explain beauty without referring to something that has that quality, you can’t explain it at all

To borrow a not entirely apt comparison: if the earlier strategies addressed a static problem, we now need a solution for a dynamic one. Nobody considers a leaf stable, even when there is no wind, because a single puff will make it sway. Stability is therefore defined not by how often something changes but by what it costs to change. Besides writing code that is clear and reasonable today, we need to be able to write “leaf” code that responds to changing requirements.

Designing for business change starts with understanding the core problems of the business and breaking them down into subdomains. DDD (domain-driven design) has proven to be a good starting point. We adopted it as our third decision, not as a technology but as a methodology to guide development. I am still at an early stage with it, so I will only cover a few points that left a deep impression.

Ubiquitous language

One of the most important traits of a well-designed architecture is that it clearly and explicitly reflects the intent of the system. Put simply, when you pull down the code of a service, you should be able to say at a glance: hmm, this “looks like” a trading system. We cannot discuss the business in one vocabulary and then type out code in another. Compare these two ways of organizing packages and ask which is easier to understand:

One purpose of discovering the ubiquitous language of a domain is to respond to changing requirements by capturing what the domain really means. This depends on many external conditions, such as having a domain expert on the team. I once saw a programmer friend who worked at Dingxiang Garden buy a pile of medical books. Without asking, I guessed he must have become a DDD believer.

To this end, we also did some work in this refactoring to make the source code serve as the design: visualizing domain elements. Once a concept in the system's domain has been agreed on with the product team, an agreed annotation is added in the code. At compile time the annotations are scanned and collected, then sent to a front end that draws them.

Back to the evaluation domain model mentioned earlier: after repeated discussions with the product team, we realized that product did not actually want so many kinds of evaluation. Merchants, goods, and riders are all simply objects being evaluated. From a domain-model point of view, the earlier design was oriented toward scenarios rather than behavior, so a more reasonable domain model would be:
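The real mechanism uses annotations scanned at compile time and is not shown in this article. As a rough Python analogue only, a decorator can register each agreed concept when its class is defined, and the registry can then be exported for drawing. The decorator name, its fields, and the JSON shape are all assumptions.

```python
import json
from typing import Dict, List

# Filled in as classes are defined; stands in for the compile-time scan.
DOMAIN_ELEMENTS: List[Dict[str, str]] = []


def domain_element(kind: str, description: str):
    """Mark a class as an agreed domain concept and record it."""
    def decorator(cls):
        DOMAIN_ELEMENTS.append(
            {"name": cls.__name__, "kind": kind, "description": description}
        )
        return cls
    return decorator


@domain_element(kind="entity", description="An order placed by a user")
class Order:
    pass


@domain_element(kind="value_object", description="Amount payable, in cents")
class PayableAmount:
    pass


def export_domain_model() -> str:
    """Serialize the collected elements, for example for a front end to draw."""
    return json.dumps(DOMAIN_ELEMENTS, indent=2)


print(export_domain_model())
```

The point of the design is that the diagram is generated from the code, so the two cannot drift apart.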

Bounded context

Bounded contexts come up all the time in our development. Take a user system as an example. From the user's own perspective, a User object can log in, log out, and change its nickname. From the perspective of other ordinary users, only things like the nickname are visible. From the perspective of a back-office administrator, the user can be deactivated or forced offline. So we need to define a scope that indicates which context a given User belongs to. This is the concept of a bounded context in DDD.

A bounded context isolates the different meanings the same thing can carry. By strictly controlling which object models may enter a context, it keeps the abstract behavior of the business consistent. Back to trading: when Ele.me first began to support the Super Member program, the trading system had to be integrated to handle the corresponding settlement. To reduce complexity, we decomposed the problem domain into a member domain and a transaction domain. To keep the Super Member card from disrupting the business logic inside the transaction domain when it entered, we made a mapping:
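The original mapping diagram is not available here. The following Python sketch only illustrates the general idea of translating a member-context object into a transaction-context object at the boundary; every class and field name is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


# --- member context ---
@dataclass(frozen=True)
class SuperMemberCard:
    card_id: str
    plan: str                   # for example "monthly"
    price_cents: int
    benefits: Tuple[str, ...]   # member-only detail


# --- transaction context ---
@dataclass(frozen=True)
class TradeItem:
    sku_id: str
    name: str
    price_cents: int
    needs_delivery: bool


def to_trade_item(card: SuperMemberCard) -> TradeItem:
    """Translate at the boundary so member concepts stay out of trading."""
    return TradeItem(
        sku_id=f"member-card:{card.card_id}",
        name=f"Super Member ({card.plan})",
        price_cents=card.price_cents,
        needs_delivery=False,   # virtual goods are not delivered
    )


card = SuperMemberCard("c42", "monthly", 1500, ("free delivery",))
print(to_trade_item(card))
```

Inside the transaction context only TradeItem is visible, so member-specific fields such as benefits cannot leak into trading logic.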

Segmentation

Once the code is written, the program grows and more people join in. To make collaboration easier, it becomes important to divide the code into groups that individuals or teams can maintain. Based on how quickly each part changes, the code discussed above can be split into several components:

Extension: the extension package, which holds the business customization mentioned earlier. The core contribution of object-oriented thinking is polymorphism, which makes program logic pluggable and switchable. The history of software development technology is in fact the story of making plug-ins easier to add, and thereby building scalable, maintainable system architectures.
Domain: the domain package, which holds the core business logic expressed in the ubiquitous language. It is the most stable.
Business: the business package. The difference between the Domain package and the Business package is that the Domain package might provide a people.run() method, while the Business package uses it to deliver food or to go to the gym.
Infra: the infrastructure package, which holds the dependencies on databases and various middleware. These are details outside the business logic.

Next come the layer dependencies. Martin Fowler has described a classic set of layering patterns. Taking a simplified order module as an example:

However, if you want to avoid all the type conversions and do not insist on strict layer dependencies, you may decide that some queries (meaning Query here; Query != Read) can bypass the domain layer directly, which leads to the CQRS pattern:

Ideally, however, the domain layer, as the core business logic, should not depend on infrastructure details. Keeping it that way also improves the testability of the code.

Once the components of a single application are split, we move up one level and look at the four core services: Booking is divided into Cart, Buy, and Calculate; Eos is divided into Process, Query, and Timeout; the merchant-order functions of Blink are divided into Process and Query; and logistics delivery is split out separately as Delivery. The final breakdown of the core trading services is as follows:
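One common way to keep the domain layer free of infrastructure details is for the domain to declare the repository interface it needs and for the infrastructure layer to implement it. The Python sketch below assumes hypothetical names (Order, OrderRepository, InMemoryOrderRepository, settle_order) and uses an in-memory implementation to show the testability benefit.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


# --- domain layer: business rules plus the abstraction they need ---
@dataclass
class Order:
    order_id: str
    status: str = "ACCEPT"

    def settle(self) -> None:
        if self.status != "ACCEPT":
            raise ValueError(f"cannot settle an order in status {self.status}")
        self.status = "SETTLED"


class OrderRepository(ABC):
    @abstractmethod
    def get(self, order_id: str) -> Optional[Order]:
        ...

    @abstractmethod
    def save(self, order: Order) -> None:
        ...


def settle_order(repo: OrderRepository, order_id: str) -> Order:
    """Use case: knows nothing about databases or middleware."""
    order = repo.get(order_id)
    if order is None:
        raise KeyError(order_id)
    order.settle()
    repo.save(order)
    return order


# --- infrastructure layer (or a test double): implements the abstraction ---
class InMemoryOrderRepository(OrderRepository):
    def __init__(self) -> None:
        self._rows: Dict[str, Order] = {}

    def get(self, order_id: str) -> Optional[Order]:
        return self._rows.get(order_id)

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order


repo = InMemoryOrderRepository()
repo.save(Order("o1"))
print(settle_order(repo, "o1").status)
```

A database-backed repository would implement the same interface, and the domain code and its tests would not change.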

System name   Main function
Bos           Order management on the C (user) end: details and list
Nevermore     Order management on the B end
Calculate, Calculate, Calculate, Calculate, manage the life cycle of an order

Counting this segmentation approach, that makes four decisions in total. They do not have to come in this order; all of them revolve around the core goal of software flexibility. From coding paradigm to components and then to layering, we deliberately chose or avoided certain doctrines and constraints. In a sense, business architecture is also about constraining how programmers behave in a given domain and steering them to code in the intended direction, so that the whole system ends up both flexible and reliable.

“No Silver Bullet”

“Individuals and interactions over processes and tools”, the first value of the Agile Manifesto

What the architecture looks like right now does not matter much, because it may be split into something else over time. What matters is realizing that there is no silver bullet for building a flexible trading system.

Look closely and the current system still has many problems waiting to be solved. For example, some changes ripple across the whole chain: a field added to one service's interface forces upstream and downstream services to change with it. More awkwardly, services were split to decouple them, yet they sometimes still depend on each other at release time. System evolution is a protracted war. “Individuals and interactions over processes and tools”: people are the core factor in winning it.

Over the past two years we never stopped thinking and practicing. You can often see members of the trading team arguing, over things as small as a change to an interface field or as large as the boundary between domains, and we held many discussions to arrive at reasonable technical solutions. It reminds me of the “Quality” discussed in Zen and the Art of Motorcycle Maintenance. As someone once remarked, a programmer may have had the experience of writing great code and thinking: “You didn't write the code; it was there all along, and you found it.”

Author: Sheng He, flower white tea


This article is original content of the Yunqi Community and may not be reproduced without permission.