Source: The author previously wrote an article called "FunData — Evolution of an Esports Big Data System Architecture" (portal: http://t.cn/RdgKWGW), but felt it did not go deep enough, and spent a few evenings writing this different article instead. This article was compiled by IT Da Jia Shuo (WeChat ID: itdakashuo) and reviewed and authorized by the speaker.

Word count: 3497 | Reading time: 9 minutes

Abstract

This article takes a completely different angle from the previous one: it explains why the FunData system needed optimization, and sorts out some ideas on how to optimize the architecture, hoping to bring you some inspiration and food for thought.

Let’s start with FunData’s two architecture diagrams.

Figure 1: ETL Architecture 1.0

Figure 2: Overall ETL Architecture 2.0

Architectures 1.0 and 2.0 are both microservice architectures. Architecture 2.0 improves on the original in its data transmission model, its data storage and presentation, and its service governance.

Data transfer model

We chose Python as the primary language and a master-slave model for the system, building the data transmission model on a Python in-memory queue library. The processing logic: the master pulls match data from the Steam public interface, assigns match tasks to the Slave nodes for processing, and writes the match metadata and the processed data to MySQL.
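The 1.0 model can be sketched as follows. This is a minimal illustration, not the production code: the Steam API call and the MySQL writes are stubbed out, and all names here are illustrative.

```python
import queue
import threading

# Minimal sketch of the 1.0 master-slave model: the master pushes match
# IDs onto an in-memory queue and slave threads process them.
task_queue = queue.Queue()
results = {}                 # stand-in for the MySQL tables
results_lock = threading.Lock()

def fetch_match(match_id):
    """Stub for pulling one match from the Steam public interface."""
    return {"match_id": match_id, "duration": 1800 + match_id}

def slave_worker():
    while True:
        match_id = task_queue.get()
        if match_id is None:          # poison pill: shut this slave down
            task_queue.task_done()
            break
        data = fetch_match(match_id)
        with results_lock:
            results[match_id] = data  # "write processed data to MySQL"
        task_queue.task_done()

def run_master(match_ids, n_slaves=4):
    slaves = [threading.Thread(target=slave_worker) for _ in range(n_slaves)]
    for s in slaves:
        s.start()
    for mid in match_ids:             # master assigns match tasks
        task_queue.put(mid)
    for _ in slaves:                  # one poison pill per slave
        task_queue.put(None)
    for s in slaves:
        s.join()

run_master(range(10))
```

Note how tightly coupled this is: the queue lives in the master's memory, so a master restart loses it, and a slave restart can drop a task it has already dequeued. That coupling is exactly the problem described next.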

The in-memory queue brought a lot of inconvenience to system maintenance. When updating and restarting a Slave node, because there is no service registration to ensure the node goes offline only after its current task completes, restarts are effectively random and in-flight tasks cannot be guaranteed to finish. When the Master node is updated and restarted, every Slave node must also be restarted to establish a new connection. A reconnection mechanism could of course be used here (we will return to this in the context of Service Mesh later).

Therefore, Architecture 2.0 uses an MQ service to decouple the upstream and downstream systems, pushes messages in a message-bus mode, and transforms the original master-slave model into a controller-worker model. The advantages: on one hand, system updates no longer create dependencies, so the whole system never needs a full restart; on the other, finer-grained workers let us write a worker per data requirement, making data processing more controllable (pluggable), worker scaling more targeted, and resource utilization more accurate.
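The controller-worker split can be illustrated with a toy in-process message bus. A real deployment would use an actual MQ service (RabbitMQ, Kafka, etc.); this stand-in only shows the shape of the decoupling, and the topic and worker names are made up.

```python
from collections import defaultdict

# Toy message bus illustrating the 2.0 controller-worker split: workers
# subscribe to the topics they care about, so a new data requirement
# means adding a worker, not restarting the system.
class MessageBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)

bus = MessageBus()
processed = []

# Two pluggable workers, each written for a different data requirement.
bus.subscribe("match.finished", lambda m: processed.append(("kda", m["match_id"])))
bus.subscribe("match.finished", lambda m: processed.append(("economy", m["match_id"])))

# The controller just publishes; it has no dependency on worker lifecycles.
bus.publish("match.finished", {"match_id": 42})
```

With a real MQ, the controller and each worker are separate processes, so any of them can restart independently without the whole-system restarts of 1.0.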

Data landing mode

At that time there was no distributed MySQL or read-write separation. When a capacity bottleneck was reached, we could only add another MySQL primary-replica pair, and configuring one more DB entry in the ETL and API layers for data storage and aggregation was genuinely troublesome; the configuration had to be maintained very accurately. In addition, the team's data-point requirements kept changing, and structured storage is not conducive to extending data points.

In 2.0 we resolutely chose NoSQL and distributed storage. For details, see "Design and Thinking of Data Storage in the FunData Esports Big Data System".

Service governance

The 1.0 system was the product of rapid iteration to get online; almost no bypass (supporting) systems were built, and the system was often "running naked" in production. To ensure service stability, we introduced a lot of bypass construction:

  • K8s – reduces service operations and maintenance pressure, increases system scalability

  • Logging system – system exceptions can be traced

  • Grayscale release system and load balancing – interface updates and request flow control

  • Harbor + Registry – unified image-repository management, code versioned by image

  • CI/CD – fast releases, unified deployment and testing

  • Serverless – offload resource-intensive algorithms to a Serverless service; no system to maintain independently, pay as you go

We continue to optimize FunData's microservice architecture to handle more esports data with better architectures and computational models.

After looking at FunData's architecture optimization, you may wonder why we couldn't get the system's architecture right in one step. Is microservice architecture a "cure-all"?

In my opinion, there is no universal formula for system design. The key is to arrive, through reasonable layering and recombination during system iteration, at an architecture that fits the development of the business. Here are some ideas on system architecture design for your reference.

Single point system

Early in a project, or at the demo stage, a single-point service design is more appropriate. On one hand it validates ideas quickly; on the other there is little cost pressure, so there is no need to consider HA or multi-AZ disaster recovery. Self-verifying the functional implementation and the business logic is the core.


For a small team, a single-point service with the code maintained in a single repo is sufficient, and only one server is needed to run the service.

Microservices

Adopting a microservice architecture can be considered from the following points:

  • As the team grows, the code in the single repo becomes bloated, covering a dozen or more business functions such as mall login, user system, shopping cart, and payments. Such a huge all-in-one mall design can no longer keep up with business development.

  • Resource utilization is uneven. You can split the logic into finer-grained systems that scale independently, so a bottleneck in one business is less likely to drag down the whole site.

  • The system needs layering and abstraction. For example, after splitting out some logic, the whole system needs a unified access layer for request scheduling; and as data volume grows, it needs an independent data-proxy layer to handle data aggregation and batch inserts.

Microservices architecture reference

Microservice architectures are generally designed in layers. The main logical link can be divided into an access layer, a logic layer, and a storage layer, with the various bypass systems serving the services on the main link.

For example, the access layer introduces load balancing and a grayscale system for flow control, rate limiting, and peak degradation; the logic layer connects to registration and scheduling services to handle the reasonable allocation and high availability of internal RPC; the storage layer uses a storage proxy for read-write separation, batch writes, and data aggregation.
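The batch-write role of the storage proxy can be sketched like this. This is an illustration of the idea only: the flush target here is a plain list, where in production it would be a bulk insert against the database, and the class and parameter names are invented.

```python
# Sketch of the storage-proxy batching idea: writes are buffered and
# flushed to the backing store in batches, reducing round trips.
class StorageProxy:
    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn      # e.g. a bulk INSERT in production
        self.batch_size = batch_size
        self._buffer = []

    def write(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            self.flush_fn(self._buffer)  # one bulk write per batch
            self._buffer = []

batches = []                             # stand-in for the database
proxy = StorageProxy(batches.append, batch_size=3)
for i in range(7):
    proxy.write({"id": i})
proxy.flush()                            # flush the partial tail batch
```

Seven single-row writes become three bulk writes (3 + 3 + 1), which is the whole point of putting a proxy between the logic layer and the store.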

Systems such as the log service, monitoring service, underlying application-management platform, and cloud platform are essential modules for any form of system, serving the whole.

FunData monitoring

FunData Log analysis

Besides using bypass systems to protect the whole, there is also a lot of work at the code level. For example, to improve the success rate and stability of internal system communication, a retry mechanism is needed; a unified SDK encapsulates the queues, object storage, and registration services used internally, and provides mechanisms such as connection pooling, periodic refresh of the service list, and master registration.
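A retry mechanism of the kind such an internal SDK might provide can be sketched as a decorator with exponential backoff. The parameter names and the flaky RPC are illustrative, not FunData's actual SDK.

```python
import functools
import time

# Minimal retry decorator: re-invoke a failed call up to `times` times,
# backing off exponentially between attempts.
def retry(times=3, base_delay=0.01):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == times - 1:
                        raise                     # out of attempts: give up
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator

calls = {"n": 0}

@retry(times=3)
def flaky_rpc():
    """Simulates an internal RPC that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = flaky_rpc()
```

Packaging this in a shared SDK keeps the protective code out of each service's business logic, which is exactly the tension the next paragraph describes.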

To date, microservice architecture has been practiced continuously, and technology stacks around it have emerged one after another: the Spring Cloud family of microservice frameworks (shown below), the classic ELK logging stack, Prometheus/Falcon monitoring systems, etcd distributed storage, Kubernetes container services, and so on.

However, microservice architecture is not a panacea. As the number of systems grows, management cost gradually increases and the dependencies between services become more complex. In a large microservice system (with hundreds of nodes), engineers spend a lot of time just understanding call dependencies. Another point: when developing business logic, the retry mechanisms, intelligent RPC scheduling, and rate limiting mentioned above are sometimes not what I care about most. The core is the business logic, yet under a microservice architecture I have to inject protective code everywhere, or pull in a large general-purpose SDK, to get these functions. For this reason, the concept of Service Mesh was proposed around 2017.

"The term service mesh is typically used to describe the network of microservices that make up such applications and the interactions between them. As a service mesh grows in size and complexity, it becomes harder to understand and manage. Its requirements include service discovery, load balancing, failure recovery, metrics, and monitoring, and often more complex operational requirements such as A/B testing, canary releases, rate limiting, access control, and end-to-end authentication."

Portal: What Is a Service Mesh?

Service Mesh is not a new architecture but an upgrade of microservice architecture. It abstracts service discovery, load balancing, and failure recovery out to the system level of the architecture design, serving as a layer of infrastructure that carries the various business logics. The grand idea: let algorithm people focus on algorithms and business people serve the business. Through Service Mesh, the optimization and protection of system robustness are separated from business development, which may become the dividing line between system engineers and business development engineers in the future.

Serverless

Finally, Serverless architecture is something you can consider and choose; we also have a Serverless system in practice internally.

Portal for the Serverless concept and system architecture design: https://myslide.cn/slides/8295

There are several main considerations when adopting a Serverless architecture:

  • Cost – Serverless is a more fine-grained architectural pattern; such services are generally billed by core-hours consumed, with no need to care about server deployment.

  • Time-consuming or bursty computation – A Serverless platform generally runs on a large fleet of machines offering more than 100,000 cores. If some algorithms are compute-intensive, or traffic has large sudden bursts, consider moving the service logic and algorithms onto the Serverless system to reduce operations pressure.

  • Event-triggered pipeline – For example, we upload static resources such as images and videos to object storage. A single image may be used in multiple scenarios, such as cropping, format conversion, or enhancement. In this case a Serverless service can trigger the various image-processing algorithms whenever an image is uploaded.
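The event-triggered pipeline above can be sketched as a single handler that the platform invokes per upload event. The event shape, function names, and output-key convention are all hypothetical; real Serverless platforms define their own event schemas.

```python
# Sketch of the event-triggered pipeline: the object store fires an
# event when an image is uploaded, and a Serverless handler fans the
# image out to the processing algorithms.
def crop(key):
    return f"cropped/{key}"      # stub: real version crops the image

def convert(key):
    return f"converted/{key}"    # stub: real version converts formats

def enhance(key):
    return f"enhanced/{key}"     # stub: real version enhances the image

PIPELINE = [crop, convert, enhance]

def handle_upload(event):
    """Entry point the platform would invoke on each upload event."""
    key = event["object_key"]
    return [step(key) for step in PIPELINE]

outputs = handle_upload({"object_key": "images/hero.png"})
```

Because each invocation is independent and billed per use, a burst of uploads simply fans out into more invocations with nothing for the team to scale or maintain.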

The portal is closed ~~ roll hearthstone back to town