The pain of ERP

Once upon a time, I worked in the e-commerce and jewelry industries for more than four years and built a large business system (ERP) for each of them. The main functional modules of an ERP system are nothing more than order management, commodity management, production and procurement, warehouse management, logistics management, financial management, and so on. For a management system like this, the general development habit is to use .NET or Java to build a monolithic (single-process) application backed by a single SQL Server or MySQL database, divide the modules within one project, organize the code in a three-tier structure, and finally test, deliver, and go live.

At first, with little data, the system performed well: list queries, report queries, and Excel exports were all smooth. But as the company's business grew, orders accumulated day by day, and the business departments demanded more report queries and data exports, so the system gradually felt slower and slower. The first solution that comes to mind is to optimize the bottleneck, the database: move it to a dedicated server to separate the database from the application, build various table indexes, optimize the program code, and so on. After a round of study and optimization, some functions may indeed improve greatly, yet the data export on some list pages stays very slow, or exports that used to be fast grow slower and slower as data keeps accumulating. We tried all kinds of methods and never reached the system performance we wanted.

To improve performance, we might study the techniques of Internet companies, such as high concurrency, high performance, big data, and read/write separation, and find no way to apply them. The reason, we think, is that the business characteristics are different. An ERP system's concurrency is not high; its difficulty is complex business. The coupling between its services is far higher than in typical Internet applications, which makes the system hard to split. Its data query logic is also far more complex: a single list page often joins four or five tables to produce its results, and some reports join even more. On top of that, the transactional nature of the business operations and the strict requirements for data consistency often leave us unable to optimize the system any further.

There was a time when, frustrated for one reason or another, I thought ERP systems were a special, incurable case. But then…

I don't think so anymore, and there seems to be a new way out.

First light of dawn

Before describing the specific plan, let me state my own thinking. Before we build an ERP system, we first need today's Internet mindset: stop trying to build one monolithic system. Break the big system into smaller systems, and let these small systems communicate with each other through system interfaces. Composing a large system this way is, concretely, the "distributed", "service-oriented" Internet approach, and it makes the system highly extensible by architectural design.

So how do we do that? Concretely, order management, commodity management, production and procurement, warehouse management, logistics management, and financial management each become a subsystem. These subsystems expose data interfaces to meet the needs of the other subsystems and can be designed and developed separately. Each subsystem has its own database. The subsystems can even be developed and maintained by different teams, using different technology stacks and different databases. Nothing is integrated into one big system with one big database anymore.

What are the advantages of this new architecture?

The first and most important one is solving the system's performance problem. In the past there was only one database instance, so under performance pressure we could not scale out to multiple instances for load balancing. Some will say you can use read/write separation, but the characteristics of ERP systems often make that impractical. When handling inventory, for example, you cannot read the stock from a read replica and then write it back to the primary: master-slave replication has lag, so a freshly written stock value may not yet have reached the replica. This scenario is common in ERP. Besides, the write library cannot be scaled out; only one can exist. In the new design, each subsystem has its own database, including its own write library.

Secondly, updates become very convenient. Each subsystem exists as a backend microservice, and the frontend is a single Web project that calls these subsystems' service interfaces. With such a design, when one business subsystem needs an update, it can be updated on its own. Under the old single-process architecture, even a small update required restarting the whole system, losing user sessions and forcing everyone to log in again. The current design does not have that problem.

Overall system design

System physical deployment view

The detailed design

Split application layer

Splitting the application layer means practicing the "microservices" architectural concept. The original large, all-in-one single-process application is divided by business module into independently deployable applications, enabling smooth system updates and upgrades and convenient load expansion. Technically, we can use RESTful-style interfaces, or a framework like Dubbo in Java to simplify development. The ERP Web frontend, or any mobile frontend, is likewise a separate application acting as the presentation layer: it is very thin, simply accepting parameters and calling the backend microservices' interfaces to obtain the data to display. The microservices act as the business logic layer, each one an independently deployable, independently releasable program that exposes data access interfaces.

The microservices can use any of the popular RPC frameworks, such as Dubbo, which supports multiple call protocols (HTTP, TCP, and so on). These frameworks make coding easier: they encapsulate the underlying data communication details, so that calling a remote method on the client feels as simple as calling a local one.
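The core of that "remote call feels local" idea is that client and server compile against one shared Java interface, and the framework substitutes a network proxy for the implementation. A minimal sketch of the pattern, with hypothetical names (InventoryService is illustrative, not a Dubbo API):

```java
public class RpcSketch {

    // Shared contract, published as a plain interface artifact that both
    // the client and the server depend on.
    public interface InventoryService {
        int quantityOnHand(String materialId, String warehouseId);
    }

    // Server-side implementation. In a real deployment, Dubbo would register
    // this with a ZooKeeper registry and expose it over TCP or HTTP; here it
    // is a stand-in with hard-coded data instead of a database lookup.
    public static class InventoryServiceImpl implements InventoryService {
        @Override
        public int quantityOnHand(String materialId, String warehouseId) {
            if ("Z0001".equals(materialId) && "W1".equals(warehouseId)) return 10;
            return 0;
        }
    }

    public static void main(String[] args) {
        // On the client, the framework would hand us a generated proxy;
        // here we use the implementation directly to show that the call
        // site looks identical either way.
        InventoryService service = new InventoryServiceImpl();
        System.out.println(service.quantityOnHand("Z0001", "W1")); // prints 10
    }
}
```

The client code never changes when the implementation moves to another machine; only the wiring that produces the `InventoryService` reference does.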

A Dubbo-style microservice architecture also supports service governance, load balancing, and similar functions. This improves not only the system's availability but also, dynamically, the application layer's performance. For example, if the warehouse management service is very busy and consumes a lot of CPU and memory, we can add another machine and deploy a second warehouse management service on it. The whole system then has two warehouse management services sharing the load, all coordinated automatically through a service registry such as ZooKeeper.

A microservice architecture also naturally supports update and upgrade operations. For example, when the financial module has a new requirement going live, we only need to restart the service that hosts the financial module. Logged-in users are barely affected and do not need to log in again, and the other modules' services keep running.

Split data layer

Database bottlenecks are a chronic ailment of ERP systems: complex multi-table join queries permeate the whole system. The key to a successful vertical split of the database is redesigning the coupling between the system's modules at the data layer. Solve that problem, and the chronic ailment can be cured.

Let's start with a typical data-layer coupling problem. The requirement is to display material inventory, with the list fields: material ID, material name, category, warehouse, quantity.

Material table:

Material ID | Name | Category ID
Z0001 | iPhone 6 red case | Z
Z0002 | iPhone 6 black case | Z

Inventory table:

Material ID | Warehouse ID | Quantity
Z0001 | W1 | 10
Z0002 | W1 | 20

Category and warehouse tables omitted…

Obviously, in a traditional single database we only need a simple join to associate these two tables, plus the category and warehouse tables, to query the data we want. But in the new architecture, the material table and the inventory table no longer live in the same database instance, so we cannot use a join. How, then, do we implement the requirement?

The new architecture only lets us obtain another subsystem's data through its service interface; we must not associate directly with another service's private database. At least architecturally, servitization forbids direct access to another service's database. Suppose the Web subsystem calls the warehouse subsystem for this list. We then create a service method in the warehouse module that assembles the data and returns it to the Web subsystem. As shown in the figure below, the warehouse method first reads the material IDs from its local inventory table and the warehouse name from its warehouse table, pages the result down to the 20 rows it will return, then calls the commodity subsystem with those 20 material IDs as a parameter. The commodity subsystem returns the commodity information for those 20 material IDs, and the warehouse module merges in the material name and category fields the list needs, producing the data it returns to the Web subsystem.
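The steps just described amount to an application-level join: page locally, batch-fetch by ID from the other service, merge in memory. A sketch under assumed names (CommodityClient, InventoryRow, and so on are illustrative, not part of any real framework):

```java
import java.util.*;
import java.util.stream.*;

public class ApplicationJoin {

    public record InventoryRow(String materialId, String warehouse, int quantity) {}
    public record Commodity(String materialId, String name, String category) {}
    public record ListRow(String materialId, String name, String category,
                          String warehouse, int quantity) {}

    // Stand-in for the commodity subsystem's batch query interface.
    public interface CommodityClient {
        List<Commodity> findByIds(Collection<String> materialIds);
    }

    // 1) the caller pages the local inventory table, 2) we batch-fetch the
    // commodity data for just that page's material IDs, 3) we merge the two
    // result sets in memory, keyed by material ID.
    public static List<ListRow> inventoryPage(List<InventoryRow> page,
                                              CommodityClient commodities) {
        Set<String> ids = page.stream()
                              .map(InventoryRow::materialId)
                              .collect(Collectors.toSet());
        Map<String, Commodity> byId = commodities.findByIds(ids).stream()
                .collect(Collectors.toMap(Commodity::materialId, c -> c));
        return page.stream()
                .map(r -> {
                    Commodity c = byId.get(r.materialId());
                    return new ListRow(r.materialId(),
                            c == null ? "?" : c.name(),
                            c == null ? "?" : c.category(),
                            r.warehouse(), r.quantity());
                })
                .collect(Collectors.toList());
    }
}
```

Note the remote call is made once per page with a batch of IDs, never once per row, which keeps the cost at two round trips regardless of page size.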

You might object: this is too much trouble, it surely cannot perform as well as a direct join, and so the performance problem is not solved. At first glance that seems right. With low concurrency, a small data volume, and an idle system, this approach is indeed slower than a traditional join inside one database. But think about the future: this design splits one database into several, each of which can run on its own server, so the database load can be spread out later. In short, it keeps the database from becoming the performance bottleneck in the busy times to come. Exciting to think about, isn't it?

Someone will then ask: what if, later on, the data volume and business grow so large that even these few split databases are not enough? My answer is that, on top of the split, each library can add read/write separation, caching, and so on. You can even split a subsystem further into several grandchild systems. It all depends on how busy that business module is.

Reporting system

Some list query logic is very complex, joining more than ten tables; splitting the data the way described above would be a disaster there. True. For such complex, report-level data queries, my plan is a separate reporting system. Its database follows data warehouse design: for better read performance, the tables carry many redundant fields (a deliberately denormalized, anti-normal-form design) along with many composite indexes.

The key to this system's success is synchronizing its data with the main ERP system's business libraries. Generally, a periodic synchronization program selects and transforms the data of the ERP business systems into the final or intermediate data the report views need, which simplifies the associated queries. The reporting system can itself be designed with a microservices architecture. As shown below:
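One simple shape for that periodic synchronization is a watermark-based incremental pull: each cycle reads only the rows changed since the last run, denormalizes them into wide report rows, and upserts them into the report store. A sketch with hypothetical names (SourceRow, syncOnce, and the in-memory store stand in for the real ERP tables and report database):

```java
import java.util.*;

public class ReportSync {

    // A row as read from the ERP business library, already joined by the
    // extraction query; updatedAt is its change timestamp.
    public record SourceRow(String materialId, String name, String category,
                            String warehouse, int qty, long updatedAt) {}

    // The wide, redundant row stored in the report database: every field a
    // report needs, so report queries require no joins.
    public record ReportRow(String materialId, String name, String category,
                            String warehouse, int qty) {}

    private long watermark = 0L;                        // last synced position
    private final Map<String, ReportRow> reportStore = new HashMap<>();

    // One sync cycle: pull rows changed since the watermark, denormalize,
    // upsert keyed by material + warehouse, then advance the watermark.
    public int syncOnce(List<SourceRow> source) {
        int synced = 0;
        long max = watermark;
        for (SourceRow r : source) {
            if (r.updatedAt() > watermark) {
                reportStore.put(r.materialId() + "/" + r.warehouse(),
                        new ReportRow(r.materialId(), r.name(), r.category(),
                                      r.warehouse(), r.qty()));
                max = Math.max(max, r.updatedAt());
                synced++;
            }
        }
        watermark = max;
        return synced;
    }

    public ReportRow get(String key) { return reportStore.get(key); }
}
```

In production the `syncOnce` call would run on a schedule (a cron job or a `ScheduledExecutorService`), with the watermark persisted so the job can resume after a restart.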

If a report needs real-time data, the ERP system can trigger a data synchronization request during the business operation and push it to the report library in real time.

Distributed transaction

Some may ask again: many ERP operations require transactions, so after you split the system, how do you keep them transactional and ensure data consistency?

That's a good question, and the last one I thought through before deciding to write this article. In a microservice architecture, cross-service transactions are not easy to implement; at least they are nowhere near as convenient as a local application using local database transactions, which offer both good performance and strong data consistency.

You've probably heard of distributed transactions. There are two scenarios. In one, a single application uses multiple databases and needs distributed transactions to keep the data consistent. Our architecture presents the other: distributed transactions in a microservice environment. For example, purchase receiving is an operation that belongs to the warehouse management service, but after goods are received, the received quantity on the purchase order must be updated in the procurement subsystem. This process requires data consistency: once the receipt succeeds and the quantity is written into the inventory table, the quantity in the purchase order table must be updated as well. The warehouse service cannot reach into the procurement service's database directly; it must go through the service interface the procurement service provides. So how do we ensure consistency? It is entirely possible that the inventory write succeeds but the call to the procurement service fails to write the purchase order data, perhaps because of a network problem, leaving the data inconsistent.

In distributed transaction technology there is a term for this: eventual consistency. It means I don't need a single transaction, as long as I can guarantee that the data on both sides is ultimately consistent. So here is a plan. When handling a purchase receipt, the warehouse subsystem must insert the receipt data and update the inventory table, among others. These tables all live in the warehouse subsystem, so a local transaction keeps them consistent. The warehouse subsystem then calls the procurement subsystem to update the received quantity on the purchase order. To cope with that call suddenly failing, we add a message queue middleware such as ActiveMQ: if the interface call fails, we write the processing request to MQ, and when the procurement subsystem recovers, MQ delivers the message and the procurement subsystem performs the update. Because there is no further notification once a message is consumed, an exception during the procurement subsystem's processing would still lose the update; such failures must be written to a local log library so an administrator can be notified for subsequent compensation. In this way, through a combination of methods, the data reaches eventual consistency. It sounds a little clumsy, but this is the solution; there is nothing better. Alternatively, if the update fails, call the warehouse subsystem back to roll back the receipt and inventory data, again reaching eventual consistency. As shown in the figure:
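The failure-handling path above can be sketched as follows. The queue and the compensation log are in-memory stand-ins for ActiveMQ and the local log library, and all names (ProcurementService, onGoodsReceived, redeliver) are illustrative:

```java
import java.util.*;

public class EventualConsistency {

    // Stand-in for the procurement subsystem's remote interface.
    public interface ProcurementService {
        void updateReceivedQty(String purchaseOrderId, int qty);
    }

    private final ProcurementService procurement;
    private final Deque<String[]> retryQueue = new ArrayDeque<>(); // stand-in for ActiveMQ
    private final List<String> compensationLog = new ArrayList<>(); // stand-in for log library

    public EventualConsistency(ProcurementService procurement) {
        this.procurement = procurement;
    }

    // Step 1 (elided): the warehouse-local transaction committing the receipt
    // and inventory tables. Step 2: notify procurement; on failure, park the
    // message on the queue instead of rolling back the local commit.
    public void onGoodsReceived(String purchaseOrderId, int qty) {
        try {
            procurement.updateReceivedQty(purchaseOrderId, qty);
        } catch (RuntimeException networkOrServiceFailure) {
            retryQueue.add(new String[]{purchaseOrderId, String.valueOf(qty)});
        }
    }

    // Redelivery: when procurement recovers, drain the queue. A failure during
    // reprocessing goes to the compensation log for an administrator.
    public void redeliver() {
        while (!retryQueue.isEmpty()) {
            String[] msg = retryQueue.poll();
            try {
                procurement.updateReceivedQty(msg[0], Integer.parseInt(msg[1]));
            } catch (RuntimeException again) {
                compensationLog.add("PO " + msg[0] + " qty " + msg[1] + " needs manual fix");
            }
        }
    }

    public int pending() { return retryQueue.size(); }
    public List<String> compensationLog() { return compensationLog; }
}
```

One design caveat this sketch inherits from the plan itself: the remote update should be idempotent, because a message may be redelivered after a partial failure.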

I feel very lucky to share knowledge and experience; it is everyone's selfless sharing that lets us all grow and progress. I have shared little in recent years, sometimes because work kept me too busy to write, sometimes out of laziness or for lack of anything new to offer. Finally, I hope you will point out and correct my shortcomings, so that we can make progress together!