1. What is high concurrency

High Concurrency is one of the most important things to consider when designing an Internet distributed system architecture. It usually refers to the idea of designing a system that can handle many requests in parallel.

Some commonly used metrics related to high concurrency include response time, throughput, queries per second (QPS), and the number of concurrent users.

Response time: the time the system takes to respond to a request. For example, if the system takes 200ms to process an HTTP request, the response time is 200ms.

Throughput: The number of requests processed per unit of time.

QPS: the number of requests handled per second. In the Internet domain, the distinction between this metric and throughput is not very sharp.

Number of concurrent users: the number of users using the system's functions at the same time. For example, in an instant messaging system, the number of simultaneously online users reflects, to some extent, the system's number of concurrent users.
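
These metrics are related. One useful rule of thumb is Little's Law from queueing theory: the average number of requests in flight equals throughput multiplied by average response time. A minimal Python sketch, with made-up numbers:

    # Little's Law: in-flight requests = throughput * average response time.
    # The 1000 QPS and 200ms figures below are illustrative, not from any
    # real system.
    qps = 1000                 # throughput: requests handled per second
    response_time_s = 0.2      # average response time: 200ms

    in_flight = qps * response_time_s
    print(in_flight)           # 200.0 concurrent requests on average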

2. How to improve the concurrency of the system

There are two ways to improve the concurrency of an Internet distributed architecture: vertical scaling (Scale Up) and horizontal scaling (Scale Out).

Vertical scaling (Scale Up): improve the processing capacity of a single machine. There are two ways to scale vertically:

(1) Enhance the hardware of the single machine: for example, add CPU cores (e.g. 32 cores), upgrade to a faster network card (e.g. 10 Gigabit), upgrade to a faster disk (e.g. SSD), expand disk capacity (e.g. 2TB), or expand memory (e.g. 128GB);

(2) Improve the single-machine software architecture: for example, use caching to reduce the number of I/O operations, use asynchronous processing to increase single-service throughput, and use lock-free data structures to reduce response time (the caching idea is sketched below);
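
As a minimal sketch of the caching idea (the dictionary below is a hypothetical stand-in for a real database):

    import functools

    # Hypothetical backing store standing in for a real database.
    FAKE_DB = {1: "alice", 2: "bob"}

    @functools.lru_cache(maxsize=10_000)
    def load_user(uid):
        # The first call for a given uid performs the (simulated) I/O;
        # repeated calls are answered from the in-process cache.
        print(f"db read for uid={uid}")
        return FAKE_DB.get(uid)

    load_user(1)  # prints "db read for uid=1"
    load_user(1)  # served from the cache, no db read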

In the early, fast-growth stage of an Internet business, if budget is not a problem, "enhancing single-machine hardware" is strongly recommended as the way to improve concurrency: at this stage the company's strategy is usually to grow the business as fast as possible, and upgrading hardware is often the quickest method.

Whether you improve the hardware or the architecture, there is a fatal drawback: single-machine performance always has a limit. So the ultimate solution for high concurrency in Internet distributed architecture design is horizontal scaling.

Horizontal scaling (Scale Out): increase system performance linearly by adding more servers. Horizontal scalability is a requirement placed on the system architecture design. How to design for horizontal scaling at each layer of the architecture, and the common practices Internet companies use at each layer, are the focus of this article.

3. Common layered architecture of the Internet

A common Internet distributed architecture is layered as follows:

(1) Client layer: the typical caller is a browser or a mobile app;

(2) Reverse proxy layer: the entry point of the system, providing reverse proxying;

(3) Site application layer: implements the core application logic and returns HTML or JSON;

(4) Service layer: this layer exists if the system has been split into services;

(5) Data layer (cache): caching accelerates access to storage;

(6) Data layer (database): the database durably stores the data.

How is horizontal scaling implemented at each layer of the system?

4. The practice of horizontal scaling at each layer

Horizontal scaling of the reverse proxy layer

Horizontal scaling of the reverse proxy layer is implemented through DNS round-robin: the DNS server is configured with multiple IP addresses for one domain name, and each DNS resolution request returns one of these IP addresses in round-robin fashion.
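
A minimal client-side sketch of what DNS round-robin gives you, assuming a domain (the placeholder example.com below) that resolves to several IP addresses:

    import itertools
    import socket

    # Resolve all A records configured for the domain; with DNS round-robin
    # several addresses come back. "example.com" is a placeholder.
    infos = socket.getaddrinfo("example.com", 80, proto=socket.IPPROTO_TCP)
    ips = sorted({info[4][0] for info in infos})

    # Rotate over the returned addresses, as a round-robin client would.
    for ip in itertools.islice(itertools.cycle(ips), 4):
        print("next request goes to", ip)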

When Nginx becomes the bottleneck, you can scale the reverse proxy layer, in theory without limit, simply by adding servers, deploying new Nginx instances, and adding their external IP addresses to the DNS record.

Horizontal scaling of the site layer

Horizontal scaling of the site layer is achieved through Nginx: multiple web backends can be configured in nginx.conf.
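
A minimal sketch of what that configuration might look like inside the http block of nginx.conf (the backend name and addresses are made up; Nginx round-robins across the servers of an upstream by default):

    upstream web_backend {
        server 192.168.0.1:8080;   # hypothetical web backend 1
        server 192.168.0.2:8080;   # hypothetical web backend 2
    }

    server {
        listen 80;
        location / {
            # Forward incoming requests to the pool of web backends.
            proxy_pass http://web_backend;
        }
    }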

When the web backend becomes the bottleneck, you can scale the site layer, in theory without limit, by adding servers, deploying new web services, and registering the new backends in the Nginx configuration.

Horizontal scaling of the service layer

Horizontal scaling of the service layer is achieved through “service connection pooling”.

When the site layer calls a downstream rpc-server through an rpc-client, the connection pool in the rpc-client establishes multiple connections to the downstream service. When the service becomes the bottleneck, you can scale the service layer, in theory without limit, by adding servers, deploying new service instances, and establishing new downstream connections in the rpc-client. To scale the service layer gracefully, you may also need a registry center that supports automatic service discovery.
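
A toy Python sketch of the connection-pooling idea (not a real RPC library; the endpoint addresses are hypothetical):

    import itertools

    class RpcClient:
        """Toy connection pool: a few 'connections' per downstream endpoint,
        with calls rotated across all of them."""

        def __init__(self, endpoints, conns_per_endpoint=2):
            # Real code would open sockets here; we just record the slots.
            pool = [(ep, i) for ep in endpoints
                    for i in range(conns_per_endpoint)]
            self._next_conn = itertools.cycle(pool)

        def call(self, method, *args):
            endpoint, conn_id = next(self._next_conn)
            return f"{method}{args} -> {endpoint} (conn {conn_id})"

    # Scaling out the service layer = handing the client more endpoints.
    client = RpcClient(["10.0.0.1:9090", "10.0.0.2:9090"])
    print(client.call("get_user", 42))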

Horizontal scaling of the data layer

When the amount of data is large, the data layer (cache, database) needs horizontal scaling of the data itself: the data originally stored on one server (cache or database) is split horizontally across different servers, so as to expand system performance.

There are several common ways to split the data layer horizontally. Taking the database as an example:

Splitting horizontally by range

Each database instance stores one range of the data, for example:

user0 library: stores data for uids in the range 1 to 10 million;

user1 library: stores data for uids in the range 10 million to 20 million.
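
A minimal sketch of range-based routing matching the layout above (the function and routing table are illustrative):

    TEN_MILLION = 10_000_000

    # (exclusive upper bound, database name), mirroring the example above.
    RANGES = [
        (1 * TEN_MILLION, "user0"),
        (2 * TEN_MILLION, "user1"),
    ]

    def route_by_range(uid):
        # Route to the first instance whose upper bound exceeds the uid.
        for upper, db in RANGES:
            if uid < upper:
                return db
        raise ValueError(f"uid {uid} is outside all configured ranges")

    print(route_by_range(5_000_000))   # user0
    print(route_by_range(15_000_000))  # user1

Expanding is just appending another entry to the table, which is why adding a new range is easy.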

The benefits of this scheme are:

(1) The rule is simple: the service routes to the corresponding storage instance just by checking which range the uid falls into;

(2) The data volume is well balanced;

(3) It is easy to expand: an instance for uids in the range 20 million to 30 million can be added at any time;

The drawback is:

(1) The request load may be unbalanced. Newly registered users are generally more active than old users, so the instance holding the higher uid range comes under more pressure;

Splitting horizontally by hash

Each database stores the portion of the data selected by hashing a certain key value, for example:

user0 library: stores data for even uids;

user1 library: stores data for odd uids.
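
A minimal sketch of hash-based routing matching the layout above (here the "hash" is simply the uid modulo the shard count):

    N_SHARDS = 2  # user0 and user1, as in the example above

    def route_by_hash(uid):
        # With two shards, even uids land on user0 and odd uids on user1.
        return f"user{uid % N_SHARDS}"

    print(route_by_hash(42))  # user0
    print(route_by_hash(7))   # user1

Note that changing N_SHARDS remaps most uids, which is exactly the migration problem listed as the drawback below.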

The benefits of this scheme are:

(1) The rule is simple: the service routes to the corresponding storage instance just by hashing the uid;

(2) The data volume is well balanced;

(3) The request load is evenly distributed;

The drawback is:

(1) It is not easy to expand: adding a data instance changes the hash function, which may require data migration;

Note that expanding system performance through horizontal splitting is fundamentally different from expanding database performance through master/slave replication with read/write separation.

Expanding database performance through horizontal splitting:

(1) Each server stores 1/N of the total data, so single-server performance also improves;

(2) The data on the N servers does not intersect, and the union of the data on all servers is the complete data set;

(3) With the data split across N servers, read performance theoretically scales by a factor of N, and so does write performance (in practice more than N times, because each machine now holds only 1/N of the original data);

Expanding database performance through master/slave replication with read/write separation:

(1) Each server stores the full data set;

(2) The data on the N servers is identical and complete;

(3) Read performance theoretically scales by a factor of N, but write performance is unchanged.
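
A minimal sketch of read/write-separated routing under this scheme (the instance names are hypothetical, and the naive SELECT check is only for illustration):

    import itertools

    MASTER = "db-master"                       # takes all writes
    REPLICAS = itertools.cycle(["db-replica-1", "db-replica-2"])

    def route(sql):
        # Reads fan out across replicas, so read capacity grows with N;
        # every write still hits the single master, so writes do not scale.
        is_read = sql.lstrip().lower().startswith("select")
        return next(REPLICAS) if is_read else MASTER

    print(route("SELECT name FROM user WHERE uid = 42"))       # a replica
    print(route("UPDATE user SET name = 'x' WHERE uid = 42"))  # db-master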

Horizontal splitting of the cache layer is similar to that of the database layer, mostly done by range or by hash, so it is not discussed further here.

5. Summary

High Concurrency is one of the most important things to consider when designing an Internet distributed system architecture. It usually refers to the idea of designing a system that can handle many requests in parallel.

There are two ways to improve system concurrency: vertical scaling (Scale Up) and horizontal scaling (Scale Out). The former improves concurrency by upgrading single-machine hardware or improving the single-machine architecture, but single-machine performance is always limited; the ultimate solution for high concurrency in Internet distributed architecture design is the latter, horizontal scaling.

In the layered Internet architecture, the practice of horizontal scaling differs from layer to layer:

(1) The reverse proxy layer scales horizontally through DNS round-robin;

(2) The site layer scales horizontally through Nginx;

(3) The service layer scales horizontally through service connection pooling;

(4) The database scales horizontally by splitting data by range or by hash.

Once horizontal scaling is implemented at every layer, system performance can be improved simply by adding servers, making the theoretical performance limit unbounded.