In my last article, I looked in detail at the issues you need to consider when designing a million-level concurrency architecture, along with their solutions. In this article, we focus on how to improve the performance of that architecture and reduce the average response time once a sufficiently large user base is in place.

How to reduce RT (response time)

Continuing from the figure above: a request cannot be returned until the application inside the Tomcat container finishes executing. So what does the request do during execution?

  • Querying the database
  • Accessing disk data
  • Performing in-memory operations
  • Invoking remote services

Each of these operations consumes time, and the client's request cannot return until all of them complete, so the way to reduce RT is to optimize how the business logic handles them.

Optimizing database bottlenecks

When 18,000 requests per second arrive at and are accepted by the server, the business logic executes and queries the database.

Each request involves at least one database query, and often 3 to 5. Assuming 3 queries per request, the database sees 54,000 requests per second. If a single database server supports 10,000 requests per second (many factors affect this number, such as the volume of data in the tables, the performance of the database server itself, and the complexity of the query statements), then 6 database servers are needed to carry the load.
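As a quick sanity check on that sizing, here is the arithmetic as a runnable sketch; the 3-queries-per-request and 10,000-requests-per-server figures are the assumptions from the text:

    public class DbCapacityEstimate {
        public static void main(String[] args) {
            int appRequestsPerSecond = 18_000;   // incoming requests from the previous section
            int queriesPerRequest = 3;           // assumed lower bound of the 3~5 range
            int perServerCapacity = 10_000;      // assumed throughput of one database server

            int dbRequestsPerSecond = appRequestsPerSecond * queriesPerRequest;                    // 54,000
            int serversNeeded = (dbRequestsPerSecond + perServerCapacity - 1) / perServerCapacity; // ceil -> 6
            System.out.println(dbRequestsPerSecond + " db requests/s -> " + serversNeeded + " servers");
        }
    }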

In addition, there are other optimizations involved at the database level.

  • MySQL ERROR 1040: Too many connections

    show variables like '%max_connections%';

    If the number of concurrent connection requests is large, it is advisable to raise this value to allow more parallel connections, provided the machine can support it: MySQL allocates a connection buffer for every connection, so the more connections there are, the more memory is consumed. Adjust the value carefully rather than blindly raising it.

  • When the volume of data in a table is too large, tens of millions or even hundreds of millions of rows, SQL optimization alone is of little use, because queries over that much data inevitably involve heavy operations at the storage level.

    • Caching can absorb the read requests of high concurrency. Generally, database reads and writes follow the 80/20 rule: of 54,000 requests per second, about 43,200 are reads, and roughly 90% of those can be served from cache (see the cache-aside sketch later in this section).

    • Sub-database and sub-table (sharding): reduce the amount of data per table; the less data a single table holds, the better its query performance.

    • Read/write separation: keep transactional writes from dragging down query performance (see the routing sketch after this list).

      • The write operation itself consumes resources

        Database writes are I/O writes that usually involve uniqueness checks, index building, and index sorting, all of which consume significant resources. The response time of a write is often several times, or even tens of times, that of a read.

      • Lock contention

        Write operations require locks, both table-level and row-level. These locks are exclusive: while one session holds an exclusive lock, other sessions cannot read the data, which greatly hurts read performance.

        Therefore, MySQL is often deployed with read/write separation: the master handles writes plus the few reads that need high timeliness, while the slaves take on the bulk of the reads. This greatly improves the overall performance of the database.

  • Use different storage systems for different types of data:

    • MongoDB: NoSQL document store
    • Redis: NoSQL key-value store
    • HBase: NoSQL column store, similar in nature to a key-value database
    • Cassandra: a distributed database from Apache that is highly scalable and well suited to managing large amounts of structured data
    • TiDB: an open-source distributed relational database independently designed and developed by PingCAP; it is an HTAP (Hybrid Transactional and Analytical Processing) converged distributed database product
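Back to read/write separation: below is a minimal application-level routing sketch, assuming one primary and one replica DataSource (the class and field names are illustrative):

    import javax.sql.DataSource;
    import java.sql.Connection;
    import java.sql.SQLException;

    /** Routes reads to a replica and writes to the primary (illustrative sketch). */
    public class ReadWriteRouter {
        private final DataSource primary;  // writes, plus timeliness-sensitive reads
        private final DataSource replica;  // the bulk of the reads

        public ReadWriteRouter(DataSource primary, DataSource replica) {
            this.primary = primary;
            this.replica = replica;
        }

        public Connection getConnection(boolean readOnly) throws SQLException {
            // Replication lag means replicas may serve slightly stale data,
            // so reads that must be fresh should still go to the primary.
            return readOnly ? replica.getConnection() : primary.getConnection();
        }
    }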

Why does putting MySQL data into a Redis cache improve performance?

  1. Redis stores data in key-value form, with O(1) constant-time lookups, whereas the underlying implementation of the MySQL engine is a B+ tree, with O(log n) logarithmic-time lookups. Redis lookups are therefore faster than MySQL's.
  2. MySQL stores data in tables; finding data requires a full table scan or an index lookup, which means disk seeks. Redis has no such trouble: its data lives in memory and is fetched directly from its in-memory location.
  3. Redis is single-threaded with multiplexed I/O. Single-threading avoids the overhead of thread context switching, and multiplexed I/O avoids the cost of waiting on I/O. On multi-core processors, data can be partitioned so that each processor handles a different portion, further improving efficiency.
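To make the caching point concrete, here is a minimal cache-aside sketch, assuming the Jedis client; `loadFromDb` is a hypothetical stand-in for the MySQL query:

    import redis.clients.jedis.Jedis;

    public class UserCache {
        private static final int TTL_SECONDS = 300; // expire entries to bound staleness
        private final Jedis jedis = new Jedis("localhost", 6379);

        public String getUser(String userId) {
            String key = "user:" + userId;
            String cached = jedis.get(key);            // 1. try the cache first
            if (cached != null) {
                return cached;                         // cache hit: no database access at all
            }
            String fromDb = loadFromDb(userId);        // 2. cache miss: query MySQL
            if (fromDb != null) {
                jedis.setex(key, TTL_SECONDS, fromDb); // 3. populate the cache with a TTL
            }
            return fromDb;
        }

        private String loadFromDb(String userId) {
            return "..."; // stand-in for the real MySQL query
        }
    }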
  • Pooling: reduce the performance cost of frequently creating database connections (see the connection-pool sketch below).

    Without pooling, every database operation is preceded by establishing a connection, followed by the operation itself, and ended by releasing the connection. This process incurs network round-trip latency plus the cost of repeatedly creating and destroying connection objects, which hurts performance badly when request volume is high.
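A minimal connection-pool sketch using HikariCP; the JDBC URL, credentials, and pool size are placeholder values:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;
    import java.sql.Connection;

    public class PooledDb {
        public static void main(String[] args) throws Exception {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:mysql://localhost:3306/app"); // placeholder URL
            config.setUsername("app");                            // placeholder credentials
            config.setPassword("secret");
            config.setMaximumPoolSize(20); // keep bounded: each MySQL connection costs memory

            try (HikariDataSource pool = new HikariDataSource(config);
                 Connection conn = pool.getConnection()) {
                // Connections are borrowed from and returned to the pool,
                // not created and destroyed once per request.
            }
        }
    }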

Disk data access optimization

Disk operations boil down to reads and writes.

For example, a transaction system typically involves parsing and writing reconciliation files. For disk access, optimization comes down to the following:

  • Make full use of the disk page cache: buffered (cached) I/O takes advantage of the system cache and reduces the number of actual I/O operations.

  • Prefer sequential, append-style writes over random writes to reduce seek overhead and speed up I/O (see the sketch after this list).

  • Replace HDDs with SSDs; solid-state drives have far higher I/O efficiency than mechanical drives.

  • When the same disk region is read and written frequently, use mmap (memory mapping) instead of read/write to reduce the number of memory copies.

  • In scenarios that require synchronous writes, try to batch write requests rather than flushing each one to disk individually, i.e., use fsync() on batches instead of O_SYNC on every write.
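A minimal sketch of the append-plus-batched-flush idea using Java NIO; the file name and batch size are illustrative:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class AppendLog {
        public static void main(String[] args) throws Exception {
            try (FileChannel ch = FileChannel.open(Path.of("recon.log"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND)) {       // sequential, append-only writes
                for (int i = 0; i < 1_000; i++) {
                    ch.write(ByteBuffer.wrap(("record-" + i + "\n")
                            .getBytes(StandardCharsets.UTF_8)));
                    if (i % 100 == 99) {
                        ch.force(false);                // one flush per 100 records, not per record
                    }
                }
            }
        }
    }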

Use memory wisely

Make full use of in-memory caching: keep frequently accessed data and objects in memory to avoid repeated loading and the performance hit of going back to the database.
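A minimal local-cache sketch using the Caffeine library; the size, TTL, and `loadFromDb` loader are all illustrative:

    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;
    import java.time.Duration;

    public class LocalCache {
        // A bounded, expiring cache: hot data stays in memory without growing unbounded.
        private static final LoadingCache<String, String> CACHE = Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(Duration.ofMinutes(5))
                .build(LocalCache::loadFromDb); // loader runs only on a cache miss

        public static void main(String[] args) {
            System.out.println(CACHE.get("user:42")); // first call loads; later calls hit memory
        }

        private static String loadFromDb(String key) {
            return "value-for-" + key; // stand-in for the real database query
        }
    }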

Invoking a remote service

Remote service invocation affects I/O performance in several ways:

  • Waiting for a remote call to return its result
    • Use asynchronous communication (see the sketch after this list)
  • Network communication time
    • Optimize the network communication itself
    • Increase network bandwidth
  • Stability of the remote service communication
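A minimal sketch of asynchronous remote invocation with CompletableFuture, so a caller thread is not blocked on each downstream service in turn; `callUserService` and `callOrderService` are hypothetical stand-ins for RPCs:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncCalls {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(8);

            // Fire both remote calls concurrently instead of serially.
            CompletableFuture<String> user =
                    CompletableFuture.supplyAsync(AsyncCalls::callUserService, pool);
            CompletableFuture<String> order =
                    CompletableFuture.supplyAsync(AsyncCalls::callOrderService, pool);

            // Total wait is roughly max(call times), not their sum.
            String merged = user.thenCombine(order, (u, o) -> u + " / " + o).join();
            System.out.println(merged);
            pool.shutdown();
        }

        private static String callUserService()  { return "user-data";  } // RPC stand-in
        private static String callOrderService() { return "order-data"; } // RPC stand-in
    }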

Asynchronous architecture

In microservices, complex logic takes a long time to process. Under high concurrency, service threads get exhausted and no more can be created to handle requests. Beyond continuous tuning within the program (database tuning, algorithm tuning, caching, and so on), we can adjust the architecture: return a result to the client first, so the user can continue with other operations on the client, and let the server's complex processing module finish the work asynchronously. This mode suits cases where the client is not sensitive to the processing result and does not need it in real time, such as group emails or bulk message sending.

Asynchronous design options: multi-threading, or MQ (see the sketch below).
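A minimal sketch of the accept-first, process-later pattern; an in-process BlockingQueue stands in here for a real message queue such as Kafka or RabbitMQ:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AsyncPipeline {
        private static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>(10_000);

        // Request thread: enqueue and return to the client immediately.
        public static String handleRequest(String task) {
            boolean accepted = QUEUE.offer(task); // non-blocking; reject when the queue is full
            return accepted ? "accepted" : "busy, try again later";
        }

        public static void main(String[] args) throws InterruptedException {
            // Worker thread: drains the queue and does the slow work off the request path.
            Thread worker = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        String task = QUEUE.take();
                        System.out.println("processed " + task); // slow business logic goes here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();

            System.out.println(handleRequest("send-group-email")); // returns "accepted" right away
            Thread.sleep(200); // give the demo worker a moment before the JVM exits
        }
    }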

Split application services

In addition to the measures above, the business system itself needs to be split into microservices, for the following reasons:

  • As the business develops, the complexity of the application itself grows, and entropy increases with it.
  • As the business system gains more and more features, more and more people join the development iterations; many people maintaining one very large project is prone to problems.
  • A monolithic application is hard to scale horizontally, and since server resources are limited, concentrating all requests on a single server node drives resource consumption too high and makes the system unstable.
  • Testing and deployment costs keep rising.
  • …

Ultimately, the point is that a monolithic application hits performance bottlenecks that are hard to break through. In other words, if we want to support 18,000 QPS, a single service node certainly cannot do it. The benefit of splitting services is that we can use many more machines to form a large-scale distributed computing network, completing a whole piece of business logic through network communication.

How to split services

How to split services sounds simple; many students will say, just split service by service.

In practice, however, boundary problems appear. For example, some data models could live in either module A or module B: how do you divide them? And at what granularity should services be split?

In general, service splitting is done along business lines, with DDD guiding the demarcation of microservice boundaries. **Domain-driven design is a methodology that defines domain models in order to determine business and application boundaries and keep the business model consistent with the code model.** Both DDD and microservices follow the basic principle of software design: high cohesion and low coupling, high cohesion within a service and low coupling between services. In effect, a domain service corresponds to a set of functions that must share some common traits. Take an order service: creating orders, modifying orders, and querying the order list form a clear domain boundary; the clearer the boundary, the more cohesive the functions and the lower the coupling between services.

Service splitting also needs to be grounded in the current state of the technical team and the company.

A start-up team should not over-pursue microservices. Otherwise business logic becomes too scattered and the technical architecture too heavy while the team's infrastructure is still immature, leading to long delivery times that seriously affect the company's development. Several factors should be weighed when splitting services:

  • The nature of the market the company's business is in: for a market-sensitive project, the early priority is to ship something first, then iterate and optimize.
  • The maturity of the development team, and whether its technology can carry the result.
  • Whether basic capabilities are adequate, such as DevOps, operations, and test automation; whether the team can support the operational complexity of running a large number of service instances, and whether services can be monitored well.
  • The execution efficiency of the test team: if it cannot use automated testing, automated regression, stress testing, and similar means to improve efficiency, the test workload will inevitably balloon and delay release cycles.

Transforming an old system involves more risks and problems, so before starting, several steps need to be considered: preparation before the split, design of the split plan, and implementation of the plan.

  • Before splitting, first sort out the current architecture, the dependencies between modules, and the interfaces.

    With dependencies and interfaces sorted out, you can think about how to split: where to make the first cut so that one complex monolithic system quickly becomes two smaller ones, with minimal impact on the system's existing business. Try hard to avoid building a distributed monolith, a so-called distributed system made of tightly coupled services that must be deployed together. A forced split without clear analysis may accidentally cut an artery, immediately causing a class-A production failure and endless trouble afterwards.

  • The split points differ at different stages, and each stage should keep a tight focus.

    The split itself falls into three stages: separating the core business from the non-core parts, adjusting and redesigning the core business, and splitting the core business apart.

    • Stage one: slim down the core business by cutting away the non-core parts, shrinking the size of the system that has to be dealt with.

    • Stage two: redesign the core business part as microservices.

    • Stage three: carry out the split of the redesigned core business part.

    There are also three dimensions along which to split: code, deployment, and data.

In addition, each stage should focus on one or two specific goals; chasing too many goals at once makes it hard to do any one thing well. For example, one system's microservice split set the following goals:

  1. Performance (throughput and latency): at least double core transaction throughput (TPS: 1,000 -> 10,000) and halve latency (e.g., 250ms -> 125ms at the high percentile and 70ms -> 35ms on average).
  2. Stability (availability, fault recovery time): availability >= 99.99%; class-A fault recovery time <= 15 minutes, at most once per quarter.
  3. Quality: write complete product requirement documents, design documents, and deployment/operations documents; reach over 90% unit-test coverage of core transaction code and 100% coverage of automated test cases and scenarios; establish a sustainable performance-testing benchmark environment and a long-term, continuous performance optimization mechanism.
  4. Scalability: split code, deployment, runtime, and data reasonably, so that after the core system is rebuilt, each business and transaction module, together with its corresponding data store, can be scaled out at any time by adding machine resources.
  5. Maintainability: establish comprehensive monitoring metrics, especially real-time full-link performance data covering all key business flows and states; shorten the response time from monitoring alarm to handling; work with the operations team on capacity planning and management, so that when a problem occurs the system can be pulled up or rolled back to an available version within minutes (startup time <= 1 minute).
  6. Ease of use: the APIs produced by the rebuild are reasonable and simple, largely meeting the needs of users at all levels, with customer satisfaction rising continuously.
  7. Business support: for new business requirements and features, development efficiency doubles and development resources and cycle time are halved, with quality assured.

Of course, don’t expect to accomplish all of your goals at once. Choose one or two high-priority goals for each phase.

Problems with a microservices architecture

A microservices architecture is first of all a distributed architecture. We expose business service capabilities, and then have to consider all the non-functional capabilities around them. The scattered services themselves need to be managed, transparently to their callers, which creates the functional requirement of service registration and discovery.

Similarly, each service may be deployed as multiple instances on multiple machines, so we need routing and addressing capabilities for load balancing to improve the system's scalability. With so many service interfaces exposed externally, we also need a mechanism for unified access control, pushing non-business policies such as permissions down to this access layer: the service gateway. Meanwhile, as the business develops and runs particular campaigns, such as flash-sale promotions, traffic can surge more than tenfold, so we must consider system capacity, strong versus weak dependencies between services, and measures such as service degradation, circuit breaking, and system overload protection.

All this microservice complexity scatters application and business configuration everywhere, so the need for a distributed configuration center also emerges. Finally, once the system is deployed in a distributed way and every call crosses process boundaries, we need online link tracing and performance monitoring to understand the system's internal state and metrics at all times, so that we can analyze and intervene whenever necessary.

Overall Architecture

Based on the above analysis from the micro to the macro level, we can sketch an overall architecture diagram.

  • Access layer: the gateway between external requests and the internal system; all requests must pass through the API gateway.

  • Application layer, also called the aggregation layer: provides aggregated interfaces for related business, assembled by invoking mid-platform services.

  • Mid-platform services, i.e., the business service layer: provide business-oriented interfaces along business dimensions. What the mid-platform contributes to the whole architecture is, in essence, reusable capability. Take a comment system: both the cloud classroom and the Gper community need one, so for the comment system to be reusable, its design cannot be coupled to the customized demands of either the cloud classroom or the Gper community. As its designer, you therefore have to think hard about how to provide one capability that can be reused across different scenarios.

    You will find that once a service is made mid-platform in this way, it becomes a BaaS-style service.

    Service providers give customers (developers) integrated cloud backend services, such as file storage, data storage, push services, and identity verification, helping developers build applications quickly.

What is high concurrency

To wrap up: what exactly is high concurrency?

There is no precise definition of high concurrency; it mainly refers to scenarios with large bursts of traffic.

If, in an interview or in your actual job, your manager or interviewer asks you to design a system that can handle tens of millions of requests, you should follow the method described in this article:

  • Establish quantifiable metrics: QPS, DAU, total users, TPS, peak access volume
  • Design the overall architecture based on these numbers
  • Then implement and execute it

Macro metrics in high concurrency

A high-concurrency system needs to meet at least three macro-level goals, not just pursue high performance:

  • High performance: reflects the system's parallel processing capability. With limited hardware investment, better performance means lower cost. Performance also shapes user experience: a response time of 100 milliseconds versus 1 second feels completely different to the user.
  • High availability: the time the system can serve normally. One system runs year-round without incident; another suffers frequent outages and downtime; users will always choose the former. Moreover, if a system is only 90% available, it drags the business down significantly.
  • High scalability: the system's capacity to expand. Can it scale out in a short time during traffic peaks and absorb peak events, such as Singles' Day or a celebrity divorce, more smoothly?

Micro metrics

Performance indicators

Performance indicators can measure existing performance problems and serve as a basis for performance optimization evaluation. Generally, the interface response time over time is used as an indicator.

1. Average response time: most commonly used, but its defect is obvious: it is insensitive to slow requests. For example, out of 10,000 requests, if 9,900 take 1ms and 100 take 100ms, the average response time is 1.99ms. The average rose by only 0.99ms, yet 1% of the requests became 100 times slower.

2. TP90 and TP99 percentiles: sort response times from smallest to largest; TP90 is the response time at the 90th percentile. The higher the percentile, the more sensitive the metric is to slow requests (see the sketch below).
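A minimal sketch computing the average versus a high percentile over the example data above (TP99.9 is used here because the slow 1% sits exactly at the TP99 boundary):

    import java.util.Arrays;

    public class LatencyStats {
        public static void main(String[] args) {
            // 9,900 requests at 1ms plus 100 requests at 100ms, as in the example.
            double[] millis = new double[10_000];
            Arrays.fill(millis, 0, 9_900, 1.0);
            Arrays.fill(millis, 9_900, 10_000, 100.0);

            double avg = Arrays.stream(millis).average().orElse(0);
            Arrays.sort(millis);
            // Nearest-rank percentile: the value below which 99.9% of samples fall.
            double tp999 = millis[(int) Math.ceil(0.999 * millis.length) - 1];

            System.out.printf("avg=%.2fms tp99.9=%.0fms%n", avg, tp999); // avg=1.99ms tp99.9=100ms
        }
    }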

Availability metrics

High availability means the system has a strong ability to run without failure. Availability = uptime / total time.

For high-concurrency systems, the basic requirement is three or four nines (99.9% or 99.99%; four nines allows roughly 53 minutes of downtime per year). The reason is simple: two nines means the system is down 1% of the time. For big companies that generate over $100 billion of GMV or revenue per year, 1% is a billion-dollar business impact.

Scalability metrics

In the face of sudden traffic, you cannot rework the architecture on the spot; the fastest response is to add machines and grow the system's processing capability linearly.

For a business cluster or a base component, scalability = performance growth ratio / machine growth ratio. Ideally, adding N times the resources yields N times the performance; for example, if doubling the machines raises throughput by 80%, scalability is 0.8. Typically, scalability should be kept above 70%.

But from the overall architecture perspective of a high-concurrency system, the goal of scaling is not just stateless services: when traffic grows tenfold, the business services may scale out tenfold quickly, but the database can become the new bottleneck.

Stateful storage services like MySQL are usually the technical hard part of scaling; if the architecture is not planned in advance (vertical and horizontal splitting), scaling will involve migrating large amounts of data.

Therefore, high scalability has to take into account service clusters, databases, middleware such as caches and message queues, load balancers, bandwidth, and dependent third parties. Once concurrency reaches a certain level, any one of these can become the bottleneck of expansion.

Practices

Universal design methods

Scale-up

The goal is to improve the processing capability of a single machine. Approaches include:

1. Improve single-machine hardware: add memory, CPU cores, or storage capacity, or upgrade the disks to SSDs; in short, throw better hardware at the problem.

2. Improve single-machine software: use caches to reduce the number of I/O operations, and use concurrency or asynchrony to increase throughput.

Scale-out

Because single-machine performance always has a limit, horizontal scaling is eventually introduced to further improve concurrent processing capability through cluster deployment. It involves two directions:

1. Layered architecture: this is the prerequisite for horizontal scaling. High-concurrency systems tend to have complex business; layering simplifies complex problems and makes horizontal scaling easier.

2. Horizontal scaling of each layer: stateless scale-out plus stateful shard routing. Service clusters can be made stateless, while databases and caches are stateful, so a partition key must be designed to shard the storage (see the sketch below). Read performance can also be improved through primary/replica synchronization and read/write separation.
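A minimal sketch of partition-key-based shard routing; the shard count of 4 and the user-id key are illustrative:

    public class ShardRouter {
        private static final int SHARD_COUNT = 4; // illustrative; often a power of two

        /** Maps a partition key (here, a user id) to a stable shard index. */
        public static int shardFor(long userId) {
            // floorMod keeps the result non-negative; the same key always lands
            // on the same shard, which is what makes routing possible at all.
            return (int) Math.floorMod(userId, (long) SHARD_COUNT);
        }

        public static void main(String[] args) {
            long userId = 10_086L;
            System.out.println("user " + userId + " -> shard " + shardFor(userId));
            // Note: changing SHARD_COUNT moves existing data between shards,
            // which is why the partition key and shard count need planning up front.
        }
    }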

High performance practices

1. Cluster deployment, with load balancing to reduce the pressure on any single node.

2. Multi-level caching: CDN for static data, local caches, distributed caches, and so on, along with handling hot keys, cache penetration, cache concurrency, and data consistency in caching scenarios.

3. Sharding (sub-database, sub-table) and index optimization, with the help of search engines for complex query problems.

4. Consider using NoSQL databases, such as HBase and TiDB, but only if the team is familiar with these components and has strong operations capability.

5. Asynchronization: handle secondary processes asynchronously through multi-threading, MQ, or even delayed tasks.

6. Rate limiting: consider first whether the business allows it (flash-sale scenarios usually do), including limits at the front end, the Nginx access layer, and the server side.

7. Peak shaving and valley filling: absorb traffic spikes through MQ.

8. Concurrent processing: parallelize serial logic with multi-threading.

9. Precomputation: in a red-envelope-grabbing scenario, for example, the amounts can be computed in advance and cached, then used directly when the envelopes are handed out.

10. Cache warming: preload data into the local or distributed cache through asynchronous tasks.

11. Reduce the number of I/O operations: batch reads and writes for databases and caches, batch interfaces for RPC, or eliminate RPC calls altogether through data redundancy.

12. Reduce the packet size of each I/O: use lightweight communication protocols and appropriate data structures, remove redundant fields from interfaces, shorten cache keys, compress cache values.

13. Optimize program logic: move high-probability early-exit checks to the front of the execution path, optimize the computation inside for loops, or adopt more efficient algorithms.

14. Use pooling, with pool sizes tuned: HTTP request pools, thread pools (set core parameters according to whether the workload is CPU-intensive or I/O-intensive; see the sketch after this list), database and Redis connection pools, and so on.

15. JVM optimization: young and old generation sizes, GC algorithm selection, and so on, to minimize GC frequency and pause time.

16. Lock selection: use optimistic locking in read-heavy, write-light scenarios, or consider segmented locks to reduce lock contention.
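A minimal thread-pool sketch for item 14; the sizing rules in the comments are common heuristics, and the concrete numbers are illustrative:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class Pools {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();

            // CPU-intensive work: roughly one thread per core.
            ExecutorService cpuPool = new ThreadPoolExecutor(
                    cores, cores,
                    60, TimeUnit.SECONDS,
                    new ArrayBlockingQueue<>(1_000),            // bounded queue: don't buffer unboundedly
                    new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when saturated

            // I/O-intensive work: threads spend much of their time blocked, so more
            // threads help (a common heuristic is 2x cores or more, tuned by measurement).
            ExecutorService ioPool = new ThreadPoolExecutor(
                    cores * 2, cores * 4,
                    60, TimeUnit.SECONDS,
                    new ArrayBlockingQueue<>(5_000),
                    new ThreadPoolExecutor.AbortPolicy());      // reject outright when full

            cpuPool.shutdown();
            ioPool.shutdown();
        }
    }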

High availability practices

1. Failover between peer nodes: both Nginx and service-governance frameworks support switching to another node after one fails.

2. Failover of non-peer nodes: via heartbeat detection and master/slave switchover (e.g., Redis sentinel or cluster mode, MySQL master/slave switching).

3. Interface-level timeout settings, retry strategies, and idempotent design (see the sketch after this list).

4. Degradation: protect core services, sacrifice non-core ones, and circuit-break when necessary; if a core link fails, have a backup link ready.

5. Rate limiting: reject requests that exceed the system's processing capability outright, or return error codes.

6. Message reliability in MQ scenarios: a retry mechanism on the producer side, persistence on the broker side, an ack mechanism on the consumer side, and so on.

7. Grayscale releases: deploy to a small slice of traffic by machine, observe system logs and business metrics, and roll out fully once everything runs smoothly.

8. Monitoring and alerting: a comprehensive monitoring system covering the basics (CPU, memory, disk, network) as well as web servers, the JVM, databases, all kinds of middleware, and business metrics.

9. Disaster recovery drills: like today's chaos engineering, apply destructive measures to the system and observe whether local faults cause availability problems.

High availability solutions revolve around redundancy, trade-offs, and system operations. They also require a supporting on-call mechanism and incident-handling process so that online problems can be dealt with promptly.
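A minimal sketch of item 3's timeout-plus-retry pattern; `callRemote` is a hypothetical stand-in for an RPC, and the idempotency key is what makes the retries safe to apply on the server side:

    import java.time.Duration;
    import java.util.UUID;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class RetryingCaller {
        private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

        public static String callWithRetry(String payload, int maxAttempts, Duration timeout)
                throws Exception {
            // One idempotency key across all attempts: the server can deduplicate, so a
            // retry after an ambiguous timeout cannot apply the operation twice.
            String idempotencyKey = UUID.randomUUID().toString();
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<String> f = POOL.submit(() -> callRemote(idempotencyKey, payload));
                try {
                    return f.get(timeout.toMillis(), TimeUnit.MILLISECONDS); // per-attempt timeout
                } catch (TimeoutException | ExecutionException e) {
                    f.cancel(true);
                    last = e;
                    Thread.sleep(100L * attempt); // simple backoff between attempts
                }
            }
            throw last;
        }

        private static String callRemote(String idempotencyKey, String payload) {
            return "ok:" + idempotencyKey; // stand-in for the real remote call
        }

        public static void main(String[] args) throws Exception {
            System.out.println(callWithRetry("pay-order-1001", 3, Duration.ofMillis(500)));
            POOL.shutdown();
        }
    }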

Highly scalable practices

1. A reasonable layered architecture: for example, the common internet layering described above; within microservices, finer-grained layering is also possible (separating the data access layer from the business logic layer), though the performance cost of the extra network hop needs to be evaluated.

2. Split the storage layer: split vertically by business dimension, and further split horizontally by data-characteristic dimension (sub-database, sub-table).

3. Split the business layer: most commonly by business dimension (e.g., goods services, order services), but also by core versus non-core interfaces, or by request type (To C versus To B, app versus H5).