Preface

Microservices are a very popular architectural style today. By breaking services into small, atomic units and relying on the elastic scaling and high availability of a distributed architecture, they achieve loose coupling between services, flexible recombination of services, and high system availability. This provides a good foundation for business innovation and business continuity. This article covers the following topics.

1. Design of multi-layer data architecture in microservices technical framework

2. Key points in data architecture design

3. Point 1: Data ease of use

4. Point 2: Primary and secondary data and data decoupling

5. Point 3: Separate databases and tables

6. Point 4: Multi-source data adaptation

7. Point 5: Multi-source data caching

8. Point 6: Data marts

For ease of understanding, this article uses a simplified sales model as a running example. Figure 1 shows the relationships between customer, seller, commodity, pricing, and order (other elements, such as payment and logistics, are omitted here).

Figure 1. Sales model

In this sales model, sellers provide goods and set prices, while customers choose products to purchase, which creates sales orders. Following the microservices design approach, the system can be divided into customer services, seller services, commodity services, pricing services, order services, and public services (such as authentication, permissions, and notifications), as shown in Figure 2.

Figure 2. Microservices

Multi-layer data architecture design in microservices architecture

A distributed architecture is generally divided into Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) layers. The SaaS layer provides business services externally, the PaaS layer provides the basic application platform, and the IaaS layer provides the infrastructure. Microservices cut vertically through these three layers and remain independent of each other. The data architecture design therefore needs to take into account both the data concerns of the three layers and the independence of the microservices.

Hierarchical design of data architecture

Figure 3. Microservices technical framework

Figure 3 shows the overall framework. The IaaS layer provides the physical infrastructure on which everything runs (a lot of hardware and networking is involved, which is omitted in this article). The PaaS layer is subdivided into three layers: the basic service layer, which is mainly responsible for data storage and processing; the transaction framework layer, which is mainly responsible for microservice registration, scheduling management, and distributed transaction processing; and the application service layer, which implements the API of each microservice, to be called directly by other microservices and by the SaaS layer.

SaaS services are business services that are publicly available.

The data architecture is divided, from bottom to top, into the Raw Data layer, the Logic Data (Inner) layer, and the Logic Data (Outer) layer. (The IaaS layer is mainly the basic hardware environment, which is omitted in this article.)

The Raw Data layer holds data content stored in databases, files, or other forms. The Logic Data (Inner) layer is the logical data used by the microservice APIs, such as customer data and order data.

The Logic Data (Outer) layer is the data provided to external services, such as customer order data. The resulting layered data architecture is shown in Figure 4.

Figure 4. Layered data architecture

In addition, much information is presented as charts or reports. Therefore, on top of the Logic Data (Outer) layer, information blocks (commonly used groupings of information) can be built, and a view type (display mode) can be set for displaying them.

As shown in Figure 4, the closer a layer is to the external service layer, the more the customer influences the design, and the more usability, ease of use, and applicability need to be considered. Conversely, the farther a layer is from the external service layer, the more the design is concerned with data storage.

The advantage of the three-layer data architecture is that data transitions layer by layer from the system implementation to the business implementation, which keeps business data and system data loosely coupled. It also allows both the business and the system to be extended flexibly.
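The three layers can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tuple layout of the raw rows, the `Customer` and `Order` record types, and the `customer_order_view` function are hypothetical and do not come from a real system.

```python
from dataclasses import dataclass

# Raw Data layer: rows as they might come out of storage (hypothetical layout).
RAW_CUSTOMERS = {1: ("C001", "Alice")}          # pk -> (customer number, name)
RAW_ORDERS = [
    (100, 1, "2024-01-05", 2500),               # (order id, customer pk, date, amount in cents)
    (101, 1, "2024-02-11", 1200),
]


# Logic Data (Inner) layer: typed records used inside the microservice APIs.
@dataclass(frozen=True)
class Customer:
    pk: int
    number: str
    name: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_pk: int
    date: str
    amount_cents: int


def load_customer(pk):
    number, name = RAW_CUSTOMERS[pk]
    return Customer(pk, number, name)


def load_orders(customer_pk):
    return [Order(*row) for row in RAW_ORDERS if row[1] == customer_pk]


# Logic Data (Outer) layer: a business-facing shape that hides storage details.
def customer_order_view(customer_pk):
    customer = load_customer(customer_pk)
    orders = load_orders(customer_pk)
    return {
        "customer": customer.name,
        "order_count": len(orders),
        "total_amount": sum(o.amount_cents for o in orders) / 100,
    }
```

In this sketch, a change to how raw rows are stored only affects the two `load_*` functions, while callers of `customer_order_view` keep seeing the same business-level shape.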

Key points in data architecture design

The previous section described the layered design of the data architecture. The following sections describe the key points of data architecture design.

Point 1: Data ease of use

No matter how the data is implemented, its ultimate purpose is for business (or customer) use. Therefore, ease of use of data is critical when providing services externally.

Figure 5. Data ease of use

As shown in Figure 5, customer information is stored in the Logic Data (Inner) layer by splitting it into several sub-tables, for flexibility and to avoid redundancy. For example, a customer address table can store any number of addresses for each customer. The advantage is that each address update generates a new address record instead of overwriting the existing one, and the original address is kept as historical data, which makes it easy to recover data quickly and to trace history.

However, when the Logic Data (Outer) layer provides data externally, the first consideration is to provide enough information in one request (after all, queries are far more frequent than modifications) while leaving out information that the business scenario does not need. For example, if ordinary customers are offered only three common addresses, then addresses 1, 2, and 3 are placed in a single table in the data design.
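The contrast between the two layers can be sketched with Python's built-in `sqlite3` module. The table names, the `slot` and `is_current` columns, and the two helper functions are assumptions chosen for illustration, not a schema taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Inner layer: one row per address version; updates insert, never overwrite.
CREATE TABLE customer_address (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    slot        INTEGER NOT NULL,            -- which address (1, 2, 3, ...) the row belongs to
    address     TEXT NOT NULL,
    is_current  INTEGER NOT NULL DEFAULT 1   -- 0 marks a historical version
);
""")


def set_address(customer_id, slot, address):
    """Keep the old row as history and insert the new address as current."""
    with conn:
        conn.execute(
            "UPDATE customer_address SET is_current = 0 "
            "WHERE customer_id = ? AND slot = ? AND is_current = 1",
            (customer_id, slot),
        )
        conn.execute(
            "INSERT INTO customer_address (customer_id, slot, address) VALUES (?, ?, ?)",
            (customer_id, slot, address),
        )


def outer_customer_record(customer_id):
    """Outer layer: one flat record holding at most three current addresses."""
    name = conn.execute(
        "SELECT name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT slot, address FROM customer_address "
        "WHERE customer_id = ? AND is_current = 1 AND slot BETWEEN 1 AND 3",
        (customer_id,),
    ).fetchall()
    record = {"name": name, "address1": None, "address2": None, "address3": None}
    for slot, address in rows:
        record[f"address{slot}"] = address
    return record


conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
set_address(1, 1, "1 Old Street")
set_address(1, 1, "2 New Road")          # supersedes the first row, which stays as history
set_address(1, 2, "Office, 5 Main Ave")
```

The inner table keeps every version of every address, while `outer_customer_record` returns everything a typical query needs in one flat record.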

Point 2: Primary and secondary data and data decoupling

It is not practical for each microservice API to have completely independent data. An order, for example, requires product, customer (including consignee), seller, and price data. If all of this data were managed in the order service API, then every change in customer information, every price adjustment, and so on would have to be synchronized to the order API data, and the data would become tightly coupled.

When designing data, you need to reduce the interdependence between data. First, determine the primary and secondary data for each microservice API. Primary data is the core data of the microservice API; additions, deletions, and modifications of this data are concentrated in that one microservice API. Order data in the order service API is an example. Secondary data is data that references or maps to data in other microservice APIs, such as commodity data and price data in the order service API.

Second, to reduce coupling between data, a data association table is used to represent the relationship between data. To remove an association between data, you can remove the association table directly, which has no impact on the data itself. See Figure 6 for details.

Figure 6. Primary and secondary data and data decoupling
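A minimal sketch of the association-table idea, using Python's built-in `sqlite3` module. The `orders` and `order_product_link` tables and their columns are hypothetical; the point is only that the order's primary data holds no product or price columns of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Primary data of the order service.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);

-- Association table: the only place that links an order to data owned elsewhere.
CREATE TABLE order_product_link (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL,   -- key owned by the commodity service (assumed)
    price_id   INTEGER NOT NULL,   -- key owned by the pricing service (assumed)
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

INSERT INTO orders VALUES (100, '2024-03-01');
INSERT INTO order_product_link VALUES (100, 7, 31, 2), (100, 9, 40, 1);
""")


def product_refs(order_id):
    """Return (product_id, price_id, quantity) references for an order."""
    return conn.execute(
        "SELECT product_id, price_id, quantity FROM order_product_link "
        "WHERE order_id = ? ORDER BY product_id",
        (order_id,),
    ).fetchall()


def unlink_products(order_id):
    """Remove the association only; the order row itself is untouched."""
    with conn:
        conn.execute("DELETE FROM order_product_link WHERE order_id = ?", (order_id,))
```

Because the order row stores only its own fields, deleting the link rows (or dropping the link table) leaves both the order data and the referenced commodity and price data intact.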

Point 3: Separate databases and tables

As business volume grows, a large amount of data, such as order data, accumulates in a single database or a single table. As time passes and the number of customers increases, more and more order data is generated. Once data has accumulated beyond a certain point, the performance of data operations drops significantly; this is what people mean when they say the database "can't move". Therefore, splitting data across multiple databases and tables should be considered at the data architecture design stage.

As shown in Figure 7, we divide order data into a current application database, historical databases, and historical archive databases. The current application database supports the creation of new orders and the creation, deletion, modification, and querying of orders in progress. Historical databases (for example, one for the last 3 months and one for the last year) are used when customers want to see past orders. Historical archive data (archived by year) is kept for future reference and statistical analysis and, in principle, is not exposed directly to customers.

The current application database can be split further by customer number range, which keeps the size of each database under control. Splitting a table means storing one piece of information across two or more tables. For example, order information can be divided into basic information and detailed information, serving basic order queries and order detail queries respectively. In short, the core of splitting databases and tables is to control the load on a single database (data volume and information volume) and to handle growth in business data through multiple tables and databases.

Figure 7. Splitting databases and tables
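The routing decision described above can be sketched as two small Python functions. The shard names, the customer number ranges, and the one-year cutoff between the historical and archive stores are assumptions for illustration; a real system would choose its own boundaries.

```python
from datetime import date, timedelta

# Hypothetical customer-number ranges; each upper bound is exclusive.
CUSTOMER_SHARDS = [
    (0, 1_000_000, "orders_current_0"),
    (1_000_000, 2_000_000, "orders_current_1"),
]


def shard_for_customer(customer_no):
    """Pick the current application database by customer number range."""
    for low, high, name in CUSTOMER_SHARDS:
        if low <= customer_no < high:
            return name
    raise ValueError(f"no shard configured for customer {customer_no}")


def store_for_order(customer_no, order_date, completed, today):
    """Pick the store for an order from its status and age.

    Assumed policy: orders in progress stay in the current database,
    completed orders up to one year old go to the historical database,
    and older ones go to a per-year archive.
    """
    if not completed:
        return shard_for_customer(customer_no)
    if today - order_date <= timedelta(days=365):
        return "orders_history"
    return f"orders_archive_{order_date.year}"
```

For example, with `today = date(2024, 6, 1)`, an unfinished order for customer 1,500,000 is routed to `orders_current_1`, while a completed order from March 2021 is routed to `orders_archive_2021`.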

Point 4: Multi-source data adaptation

Besides traditional relational databases, there are many other data sources: multimedia files or data streams such as images, sound, and video, and heterogeneous data in formats such as CSV, TXT, DOC, Excel, PDF, and XML. This data needs to be processed and converted into manageable information. Therefore, the data architecture should configure a corresponding read/write adapter for each kind of data source and schedule the adapters in a unified way, as shown in Figure 8.

Figure 8. Multi-source data adaptation
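One possible shape for the read side of such adapters is a small registry keyed by format, sketched below in Python. The `register` decorator, the `read_source` dispatcher, and the choice of CSV and JSON as example formats are all assumptions for illustration; write adapters and scheduling are left out.

```python
import csv
import io
import json

# Registry of read adapters, keyed by a format name (hypothetical design).
_READERS = {}


def register(fmt):
    """Decorator that registers a reader function for one source format."""
    def decorator(func):
        _READERS[fmt] = func
        return func
    return decorator


@register("csv")
def read_csv(text):
    # Each CSV row becomes a dict keyed by the header line.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@register("json")
def read_json(text):
    # Expects a JSON array of objects; values are normalized to strings
    # so that every adapter returns the same record shape.
    return [{key: str(value) for key, value in item.items()} for item in json.loads(text)]


def read_source(fmt, text):
    """Single entry point: pick the adapter for the format and run it."""
    try:
        reader = _READERS[fmt]
    except KeyError:
        raise ValueError(f"no adapter registered for format {fmt!r}") from None
    return reader(text)
```

Callers only ever use `read_source`, so supporting a new format means registering one more function rather than changing existing code.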

Point 5: Multi-source data caching

Apart from the complexity of the processing logic, a large part of data processing time is spent operating on the target data (including reading from and writing to disk and transmitting over the network). Network speed has improved greatly, especially with optical fiber, but disk read and write efficiency has not improved significantly, so reducing disk reads and writes is an important way to improve efficiency.

Data caching means keeping commonly used data (data that doesn't change very often) and recently used data in memory. This can significantly reduce the system's disk overhead and improve the performance of the overall data system, as shown in Figure 9.

Figure 9. Data cache
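As a minimal sketch of the idea, the class below keeps recently used values in memory and falls back to a loader function on a miss, evicting the least recently used entry when it is full. The class name, the `loader` callback, and the eviction policy are assumptions for illustration; the article does not prescribe a specific caching strategy.

```python
from collections import OrderedDict


class ReadThroughCache:
    """In-memory cache that loads missing keys from a slower source."""

    def __init__(self, loader, capacity=128):
        self._loader = loader          # function that reads from disk, a database, etc.
        self._capacity = capacity
        self._items = OrderedDict()    # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)        # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._loader(key)               # the slow path
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)     # evict the least recently used entry
        return value

    def invalidate(self, key):
        """Drop a key after the underlying data changes."""
        self._items.pop(key, None)
```

Repeated calls to `get` with the same key reach the loader only once until the entry is evicted or invalidated, which is where the saving in disk reads comes from.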

Point 6: Data marts

Data marts are a big topic. When existing data cannot be made usable by the business simply by joining a few tables and applying simple processing, it is time to consider building a data mart. A data mart analyzes and processes data from the point of view of how the data will be used. Through a series of data operations, such as multi-source data import, cleaning, processing, and view creation, it provides stable, ready-to-use data sources for the business.

For example, the data mart concept is used in multi-dimensional sales analysis: which kinds of customers like which kinds of goods, how price influences sales amount, and how sales amount correlates with region and date, as shown in Figure 10.

Figure 10. Data mart
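A very small version of the import, cleaning, and view steps can be sketched with Python's built-in `sqlite3` module. The `sales_fact` table, the `sales_by_segment_category` view, and the cleaning rule (normalize text, drop rows with a missing or non-positive amount) are assumptions for illustration; a real data mart would involve far more dimensions and processing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flattened sales facts prepared for analysis (hypothetical columns).
CREATE TABLE sales_fact (
    customer_segment TEXT,
    product_category TEXT,
    region           TEXT,
    sale_date        TEXT,
    amount           REAL
);

-- A ready-made view answering: which customers buy which goods?
CREATE VIEW sales_by_segment_category AS
SELECT customer_segment,
       product_category,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM sales_fact
GROUP BY customer_segment, product_category;
""")


def load_sales(rows):
    """Import step with minimal cleaning; returns the number of rows kept."""
    cleaned = [
        (seg.strip().lower(), cat.strip().lower(), region.strip().lower(), day, float(amount))
        for seg, cat, region, day, amount in rows
        if amount is not None and float(amount) > 0
    ]
    with conn:
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", cleaned)
    return len(cleaned)
```

Business users then query `sales_by_segment_category` directly instead of joining and cleaning the source tables themselves.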

Data carries information, and good data architecture design makes business systems run more smoothly and makes them easier to understand and maintain. This article summarizes some experience from real projects to share with everyone. If anything is missing, additions and advice are welcome.
