Not long ago, as an architect, I completed the construction of a business middle platform for a well-known FMCG enterprise. After the system went live, it weathered the traffic peak of the Double 11 event and ran stably throughout. Having some free time recently, I am sorting out the architectural ideas and sharing them in this blog. Instead of going into every technical detail, this is a brief review of the technologies, frameworks, and tools used, for future reference.

Business architecture

In terms of business architecture, the system serves as the business middle platform and is mainly responsible for customer asset management, covering customer cards, vouchers, and other virtual assets. It provides services by exposing standard RESTful interfaces. Service callers include the apps and mini programs of the enterprise's own channels as well as partner channels, including CMB and Ali. The system itself also calls the interfaces of other business systems in the company through the service gateway, for example synchronizing member information through the customer center interface.

According to current statistics, this business serves about 7 million calls per day, exceeding 10 million during promotional activities. Most transactions take place during work hours, lunch breaks, and the mid-afternoon tea break.

As customer business details are involved, the business architecture is not described further here.

Technical architecture

A SpringBoot-based microservices architecture is used in this case. Combined with the enterprise's own infrastructure, the services are deployed as K8S containers, and the Kong API Gateway manages the exposed API interfaces of each business in a unified manner.

Kong API Gateway

With the popularity of microservice architecture in enterprises, the original large, all-in-one systems are split into middle platforms of smaller granularity, and most functions are provided as services in the form of RESTful APIs, which enables IT systems to respond more quickly to the challenges brought by business changes. However, as the number of services grows, managing them effectively becomes a challenge.

For small to medium sized projects, we typically adopt the Spring Cloud stack and choose Spring Cloud Gateway as the service gateway. For large enterprises, however, service governance, gateway performance, and other extension capabilities need to be considered globally.

In this case, the enterprise uses Kong as the API gateway. APIs that the middle platform needs to open for external use are registered through the gateway console, where certificates are added and auth keys are generated for the consuming parties (see the sketch after the feature list below).

Kong has the following features that well meet the needs of large organizations for a service gateway:

  • Open source (the enterprise version of Kong, with vendor support, is used in this case)
  • Sub-millisecond response latency, thanks to ultra-high performance based on Nginx and OpenResty
  • 25K TPS on a single node
  • Authentication, authorization, rate limiting, data transformation (in this case the member ID is added to the request header), logging, and statistical analysis
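
As a reference for how such registration might look, here is a minimal declarative configuration sketch in Kong's DB-less format; the service name, upstream URL, route path, and consumer are illustrative assumptions, not the actual configuration used in this case:

    _format_version: "2.1"
    services:
      - name: asset-service                # illustrative service name
        url: https://asset-service.internal:9002
        routes:
          - name: asset-route
            paths:
              - /api/assets
        plugins:
          - name: key-auth                 # require an auth key on this service
    consumers:
      - username: partner-cmb              # illustrative consumer
        keyauth_credentials:
          - key: REPLACE_WITH_GENERATED_KEY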

Application architecture

The whole system uses Java for the back end and Vue for the front end. The application part is divided into four service components, all deployed as containers and exposed through Ingress Controller load balancing:

  • Asset service: provides service interfaces related to customer assets
  • Asset consumer service: an MQ listener service that processes asset-related requests asynchronously
  • Console service: asset management and O&M service interfaces for the console front end
  • Console front end: the console front-end application, developed with Vue

SpringBoot

Except for the console front end, the other three components are developed with SpringBoot 2.3.4, the current mainstream Java microservices framework (for stability, the latest 2.4 version is not used).

In this case, an application framework was developed to unify the data representation within the system and to standardize data conversion, validation, message binding, error handling, and other common functions. The architect is responsible for the application framework. A simple, efficient, and unified application framework improves development efficiency, produces consistent code, and helps ensure delivery quality.
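
As one example of what such unification can look like, here is a minimal sketch of a unified response envelope; the class and field names are illustrative assumptions, not the framework's actual API:

    // Illustrative sketch of a unified response envelope; names are assumptions.
    public class ApiResponse<T> {
        private final String code;    // unified business result code, e.g. "0000" for success
        private final String message; // human-readable description
        private final T data;         // actual payload

        private ApiResponse(String code, String message, T data) {
            this.code = code;
            this.message = message;
            this.data = data;
        }

        public static <T> ApiResponse<T> ok(T data) {
            return new ApiResponse<>("0000", "success", data);
        }

        public static <T> ApiResponse<T> error(String code, String message) {
            return new ApiResponse<>(code, message, null);
        }

        public String getCode() { return code; }
        public String getMessage() { return message; }
        public T getData() { return data; }
    }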

The application framework itself is beyond the scope of this article, but the following techniques and third-party packages are used to build most of our SpringBoot applications.

Customized MyBatis

The data layer uses MyBatis. In large applications, MyBatis helps programmers better control and tune data-layer interaction. You can normally configure MyBatis in application.yml, but when you need MyBatis to support more custom features, such as multi-database support, you can do so by defining the SqlSessionFactory bean yourself:

    @Bean
    public SqlSessionFactory sqlSessionFactory(DataSource dataSource) throws Exception {
        SqlSessionFactoryBean sfb = new SqlSessionFactoryBean();
        sfb.setDataSource(dataSource);
        sfb.setVfs(SpringBootVFS.class);

        // Configure the PageHelper pagination plugin with the dialect of the current database
        Properties props = new Properties();
        props.setProperty("dialect", dataConfiguration.getDialect());
        props.setProperty("reasonable", String.valueOf(dataConfiguration.isPageReasonable()));
        PageHelper pagePlugin = new PageHelper();
        pagePlugin.setProperties(props);
        Interceptor[] plugins = {pagePlugin};
        sfb.setPlugins(plugins);

        // Load dialect-specific mapper XML files, which enables multi-database support
        ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
        sfb.setMapperLocations(resolver.getResources("classpath*:mappers/" + dataConfiguration.getDialect() + "/*.xml"));
        sfb.setTypeAliasesPackage("com.xxx.bl.core.data.model");

        SqlSessionFactory factory = sfb.getObject();
        factory.getConfiguration().setMapUnderscoreToCamelCase(true);
        // factory.getConfiguration().addInterceptor(new CoreResultSetHandler());
        factory.getConfiguration().setCallSettersOnNulls(dataConfiguration.isCallSettersOnNulls());
        return factory;
    }

Use the Logback logging component

With the Logback logging framework, you can specify different log levels and different appenders for different Spring profiles (i.e., different environments) in the Logback configuration file. In addition, the spring-cloud-starter-sleuth dependency is introduced so that a traceId is carried through all logs on a request chain; seeing the same traceId across systems makes collaborative troubleshooting of production problems much easier. Asynchronous logging also helps reduce I/O blocking.

   <springProfile name="stg">
        <root level="error">
            <appender-ref ref="STDOUT"/>
            <appender-ref ref="SAVE-ERROR-TO-FILE-STG"/>
        </root>
        <logger name="org.xxx" level="error" additivity="false">
            <appender-ref ref="STDOUT"/>
            <appender-ref ref="ASYNC-SAVE-TO-FILE-STG"/>
        </logger>        
    </springProfile>
    <springProfile name="prod">
        <root level="error">
            <appender-ref ref="STDOUT"/>
            <appender-ref ref="SAVE-ERROR-TO-FILE-PROD"/>
        </root>
        <logger name="org.xxx" level="error" additivity="false">
            <appender-ref ref="ASYNC-SAVE-TO-FILE-PROD"/>
        </logger>
    </springProfile>
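The ASYNC-* appenders referenced above are Logback AsyncAppender wrappers around file appenders. A minimal sketch is shown below, assuming a time-based rolling file appender; the file paths and log pattern are placeholders, and %X{traceId} assumes Sleuth has placed the trace ID in the MDC under that key:

    <appender name="SAVE-TO-FILE-PROD" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>/var/log/app/app.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>/var/log/app/app.%d{yyyy-MM-dd}.log</fileNamePattern>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
        <encoder>
            <!-- %X{traceId} pulls the Sleuth trace ID from the MDC -->
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level [%X{traceId}] %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <!-- AsyncAppender queues events and writes them on a background thread, reducing I/O blocking -->
    <appender name="ASYNC-SAVE-TO-FILE-PROD" class="ch.qos.logback.classic.AsyncAppender">
        <queueSize>512</queueSize>
        <discardingThreshold>0</discardingThreshold>
        <appender-ref ref="SAVE-TO-FILE-PROD"/>
    </appender>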

SSL encryption and password security

Full-link transmission encryption has become an essential measure in enterprise security. SSL encryption for a SpringBoot application can be implemented by placing a CA-issued (or self-signed) JKS certificate on the classpath and configuring it in the application configuration file:

    server:
      ssl:
        enabled: true
        key-store: classpath:xxx.net.jks
        key-store-type: JKS
        key-store-password: RUIEIoUD
        key-password: RUIEIoUD
        require-ssl: true

Passwords stored in plain text in configuration files are also insecure. You can encrypt the passwords used in configuration files with jasypt, or use a key-vault scheme directly, such as Azure Key Vault on Microsoft cloud or CyberArk Conjur in an on-premises IDC, as in this case.
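
For the jasypt route, a minimal sketch with the jasypt-spring-boot starter looks like the following; the encrypted value and the environment variable name are placeholders:

    spring:
      datasource:
        # value produced by the jasypt encryption CLI; placeholder shown here
        password: ENC(G6N718UuyPE5bHyWKyuLQSm02auQPUtm)
    jasypt:
      encryptor:
        # master key injected from the environment rather than stored in the file
        password: ${JASYPT_ENCRYPTOR_PASSWORD}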

We do not use Spring WebFlux for reactive support, because it increases development complexity, and although WebFlux improves on the blocking model of the web container, it does not fundamentally solve the blocking problem of high-concurrency requests.

In this case, a three-node RabbitMQ mirrored cluster is set up as the message middleware, and the application framework builds on it to support switching services between synchronous and asynchronous modes. Externally provided services are registered in the database and loaded into the Redis cache when the application starts. When a request comes in, the API code determines the response mode of the request: synchronous or asynchronous. Synchronous requests are processed directly, while asynchronous requests are sent to RabbitMQ and consumed asynchronously by the encapsulated consumer components, thereby shaving traffic peaks.

Developers only need to focus on the business logic of the service; the application framework uniformly handles the synchronous/asynchronous switching, exception handling when message sending fails, and maintenance of the dead-letter queue. A sketch of what the dispatch logic might look like follows.
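
The following is a minimal, hypothetical sketch of such dispatch logic; the class, Redis key prefix, exchange name, and handler lookup are illustrative assumptions, not the framework's actual code:

    import org.springframework.amqp.rabbit.core.RabbitTemplate;
    import org.springframework.data.redis.core.StringRedisTemplate;
    import org.springframework.stereotype.Service;

    // Hypothetical sketch: route a request synchronously or asynchronously
    // based on the response mode registered for the service.
    @Service
    public class ServiceDispatcher {

        private final StringRedisTemplate redis;
        private final RabbitTemplate rabbit;

        public ServiceDispatcher(StringRedisTemplate redis, RabbitTemplate rabbit) {
            this.redis = redis;
            this.rabbit = rabbit;
        }

        public String dispatch(String serviceCode, String requestBody) {
            // Response mode is registered in the database and cached in Redis at startup.
            String mode = redis.opsForValue().get("svc:mode:" + serviceCode);
            if ("ASYNC".equals(mode)) {
                // Hand the request to RabbitMQ; a consumer component processes it later.
                rabbit.convertAndSend("asset.exchange", serviceCode, requestBody);
                return "ACCEPTED";
            }
            // Synchronous path: process directly (handler lookup is illustrative).
            return handleSynchronously(serviceCode, requestBody);
        }

        private String handleSynchronously(String serviceCode, String requestBody) {
            // ... business logic dispatch goes here ...
            return "OK";
        }
    }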

Dockerfile

The four components in this case need to be containerized, so Dockerfiles are created for the SpringBoot applications and the Vue application.

A typical Dockerfile for a SpringBoot application is shown below. In general, large organizations build private image repositories; pulling images from a private repository is faster and saves CI/CD time.

FROM openjdk:11-jre
#FROM cargo.xxx.net/library/openjdk:11-jre
ARG JAR_FILE=console-service/build/libs/*.jar
COPY ${JAR_FILE} app.jar
EXPOSE 9002
EXPOSE 9003
ENTRYPOINT [ "java"."-jar"."/app.jar" ]

The Dockerfile for the Vue application is as follows; an SSL certificate is also added for transmission encryption:

FROM cargo.xxx.net/library/nginx:stable-alpine
COPY /dist /usr/share/nginx/html/console
COPY nginx.conf /etc/nginx/nginx.conf
ARG KEY_FILE=stg.xxx.net.key
ARG PEM_FILE=stg.xxx.net.pem
COPY ${KEY_FILE} /etc/ssl/certs/cert.key
COPY ${PEM_FILE} /etc/ssl/certs/cert.pem
EXPOSE 80
CMD [ "nginx"."-c"."/etc/nginx/nginx.conf"."-g"."daemon off;" ]

Here are some considerations when writing a Dockerfile:

  • Base image: use official images where possible
  • Select an appropriate version: if the chosen base image is too large, it consumes more resources after startup and affects system performance; if it is too small, critical functions may be missing
  • Make use of caching: put the parts of a Dockerfile that rarely change near the top, so that later, frequently changing layers do not invalidate the cache (see the sketch below)
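
For the caching point, a cache-friendly ordering might look like the following sketch; paths and file names are illustrative:

FROM openjdk:11-jre
WORKDIR /app
# Certificates and static configuration change rarely, so these layers stay cached
COPY certs/ /etc/ssl/certs/
# The application jar changes on every build, so it is copied last;
# all earlier layers can then be reused from the build cache
ARG JAR_FILE=build/libs/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT [ "java", "-jar", "app.jar" ]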

Database architecture

With hundreds of millions of account records and tens of billions of transaction records in the system, a scheme of separate databases and tables (sharding) is needed. In this case, a MyCat+MySQL database architecture is adopted. MyCat proxies the Master and Slave nodes and allows flexible Master/Slave switchover. A Slave can serve as a hot backup of the Master or as a read library to separate reads and writes. In addition to serving as a quasi-real-time backup, the standby database can also serve as an operations and maintenance database or provide data extraction for the big data platform.

At the same time, a 1 master / 2 slave / 1 standby design is adopted across equipment rooms:

  • Semi-synchronous replication from Master to Slave ensures data consistency in the Slave databases
  • If the Master fails, switch over via MyCat, and a Slave becomes the new Master
  • After the original Master recovers, set it as a Slave; once data synchronization is complete, switch it back to being the official Master

High availability of MyCat

MyCat runs in K8S containers, and a K8S Service provides load balancing and high availability for the MyCat cluster. If a MyCat container node fails, the application automatically connects to another MyCat node.

The bulk of database operations are reads, which generally account for more than 70% of all operations, so read/write separation is necessary; without it, the slave libraries are largely wasted. MyCat can be easily configured for read/write separation: read operations go to the slave libraries, improving their resource utilization, while write operations go to the master library, reducing the pressure on it.
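
A minimal sketch of read/write separation in MyCat's schema.xml is shown below, assuming MyCat 1.6; host names, URLs, and credentials are placeholders:

    <dataHost name="dh-main" maxCon="1000" minCon="10" balance="1"
              writeType="0" dbType="mysql" dbDriver="native" switchType="2">
        <!-- balance="1": reads are load-balanced across the read hosts -->
        <heartbeat>select user()</heartbeat>
        <writeHost host="hostM1" url="mysql-master:3306" user="app" password="placeholder">
            <readHost host="hostS1" url="mysql-slave1:3306" user="app" password="placeholder"/>
            <readHost host="hostS2" url="mysql-slave2:3306" user="app" password="placeholder"/>
        </writeHost>
    </dataHost>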

Separate databases and tables

  • Vertical sharding: separate data into different databases and servers by function. For example, data from different business domains such as accounts, assets, and transactions is stored in different libraries to spread the load, reduce mutual influence and coupling, and allow independent modules to be released independently
  • Horizontal sharding: when vertical sharding is not enough, the model is split horizontally, distributing data of the same entity across different libraries by range, keeping the size and load of each single library under control and increasing the number of connections, so as to achieve horizontal scaling

Hot and cold data scheme

Hot data cache

  • For frequently used hot data, such as the information of customers who use the App frequently, add database query cache appropriately to improve database query performance
  • At the application layer, an in-memory cache such as Redis is used to reduce request response time, improve system fluency, and improve customer experience
  • Read/write separation lets the slave libraries serve data queries, improving the utilization of their hardware resources, reducing the read pressure on the master library, and increasing its write performance, which improves overall efficiency

Cold data Archiving

  • Archive cold data that is rarely or never used, such as historical transactions and historical cards, to improve database performance
  • The less frequently used historical transaction query function can be served by the standby database
  • It is recommended to shard transaction data by date: daily transactions can be split into one or more shards, while historical transactions, such as those older than one year, can be regularly migrated and archived to improve database performance

DevOps containerized deployment with K8S

DevOps pipeline

In this case, application code is pulled from the GitHub repository by a Jenkins-based CI/CD platform, built with Gradle (NPM for the front end), packaged into images via the Dockerfiles, and deployed to the K8S container platform.

In the process of continuous integration, security checks, compliance checks, and unit tests (JUnit for the SpringBoot applications and Jest for the Vue front-end application) are added to ensure the quality of each release.

ConfigMap

ConfigMap is used to separate an application's configuration from the application itself. This not only makes the application image reusable, but also enables more flexible behavior through different configurations. In this case, when a SpringBoot application is deployed in K8S, the application.yml file is mounted as a ConfigMap. Note that SpringBoot reads the configuration file on the classpath first, so you need to exclude the configuration file when packaging the SpringBoot jar and specify the mounted configuration file through container startup parameters:

--spring.profiles.active=prod
--spring.config.location=/config/application.yml
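A minimal sketch of the corresponding Deployment fragment is shown below; the names and image are illustrative assumptions:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: asset-service
    spec:
      replicas: 20
      selector:
        matchLabels:
          app: asset-service
      template:
        metadata:
          labels:
            app: asset-service
        spec:
          containers:
            - name: asset-service
              image: cargo.xxx.net/library/asset-service:1.0.0   # illustrative image
              args:
                - --spring.profiles.active=prod
                - --spring.config.location=/config/application.yml
              volumeMounts:
                - name: app-config
                  mountPath: /config            # application.yml appears here
          volumes:
            - name: app-config
              configMap:
                name: asset-service-config      # ConfigMap holding application.yml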

K8S container deployment

Services are deployed on the K8S platform, where the initial resources and the number of nodes can be specified for each service. For example, the initial configuration for a SpringBoot application is 2 cores and 4G of memory, with 20 nodes.

We can adjust the number of pods on demand through rolling updates, without causing service unavailability.

Alternatively, we can use elastic scaling, triggering container scaling when a key metric, such as container CPU usage, crosses a threshold. In this case, the elastic scaling mechanism provides more pods to the business service components during work hours and the midday business peak; in the evening, pods are reclaimed from the business components and given to the components that handle batch processing and asynchronous consumption.
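
A minimal HorizontalPodAutoscaler sketch for CPU-based scaling follows; the target Deployment name and thresholds are illustrative assumptions:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: asset-service-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: asset-service          # illustrative Deployment name
      minReplicas: 20
      maxReplicas: 40
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # scale out when average CPU exceeds 70%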

Operation, maintenance and monitoring

ELK

ELK is a solution rather than a single piece of software; the three letters stand for three products. E stands for Elasticsearch, which stores and indexes logs; L stands for Logstash, which collects, filters, and formats logs; K stands for Kibana, which handles log display, statistics, and visualization.

Dynatrace

Dynatrace is probably the best application performance management (APM) tool available today. It can monitor infrastructure such as servers and K8S containers, automatically discover and monitor the dynamic microservices running within containers, understand how they perform and communicate with each other, and immediately detect underperforming microservices. In our case, we added the monitoring data we needed to focus on by customizing the dashboard.

Dynatrace also automatically identifies services and provides more refined detection data, making it a great help for development or operations personnel to locate problems.

Some thoughts

  • Code intrusiveness of the sharding scheme: although MyCat+MySQL implements separate databases and tables at the physical level, it is intrusive for development, requiring special table design around the shard key and extra attention to using the shard key in queries to keep them efficient. For other concerns such as transaction processing, because of the sharding we no longer rely on database transactions, but instead handle them through eventual consistency of data, error compensation, and similar means
  • Future database selection: MyCat+MySQL adds complexity to database operation and maintenance. In the future, for applications with large data scale, if hardware resources allow, NewSQL solutions such as TiDB can be considered as a replacement
  • JVM optimization: after the application went live, there was an occasional long GC pause under high concurrency, which was resolved by analyzing dump files and optimizing memory usage. In addition, for applications with large memory variations, you can also consider using JDK 13 and enabling ZGC
  • Cache optimization: in this case, the service configuration information is cached in Redis, and Redis had to be read on every service response, which put a lot of pressure on it. By introducing Guava cache to establish a local cache copy, forming a multi-level cache with a reasonable expiration time, the pressure on Redis was significantly reduced (see the sketch after this list)
  • Low code through an application framework: the investment in an application framework is well worth it; by concentrating solutions to common problems in the framework, some characteristics of a low-code platform can be realized, and developers can focus more on implementing business logic
  • Development management: by making sure every developer fully understands the application framework and by developing common patterns for solving similar problems, development efficiency can be significantly improved and low-quality code reduced
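
For the cache optimization point, a minimal sketch of the local cache layer in front of Redis might look like this; the class and key names are illustrative assumptions:

    import java.util.concurrent.TimeUnit;

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import org.springframework.data.redis.core.StringRedisTemplate;

    // Illustrative sketch: a Guava local cache in front of Redis, forming a multi-level cache.
    public class ServiceConfigCache {

        private final LoadingCache<String, String> localCache;

        public ServiceConfigCache(StringRedisTemplate redis) {
            this.localCache = CacheBuilder.newBuilder()
                    .maximumSize(10_000)
                    // A short expiration bounds staleness relative to Redis.
                    .expireAfterWrite(5, TimeUnit.MINUTES)
                    .build(new CacheLoader<String, String>() {
                        @Override
                        public String load(String key) {
                            // Fall through to Redis only on local cache misses.
                            String value = redis.opsForValue().get("svc:config:" + key);
                            // CacheLoader must not return null; use an empty string for missing keys.
                            return value != null ? value : "";
                        }
                    });
        }

        public String get(String serviceCode) {
            return localCache.getUnchecked(serviceCode);
        }
    }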

That is all I have recorded for now. As practice deepens, I believe there will be more to add, and everyone is welcome to share their experience.