Talking about architectural design

Accidents are the result of gradual accumulation, and nothing is as simple as it seems. As a system runs and its user base grows, failures will occur sooner or later if high availability is not considered. High availability should be designed for in advance, and it is a vast body of knowledge.

What would I consider when designing a high-availability system? During the architectural design process:

  • Consider the pitfalls of each candidate solution and, for the worst case, prepare emergency plans for failure

  • The system needs monitoring, so that faults are detected as soon as they occur

  • Automated recovery plans are required, as are automated early-warning and pre-processing plans

  • At the code level, you need to consider processing speed, code performance, and error handling

  • Also consider minimizing the impact of failures: service degradation, rate limiting, circuit breakers, etc.
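As a concrete illustration of one failure-minimization technique from the list above, here is a minimal circuit-breaker sketch. The thresholds, class name, and failure-counting policy are illustrative assumptions, not a production design:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    then allow a trial call (half-open) after a cooldown period."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Fail fast instead of hammering a broken backend
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

Real implementations (e.g., in service meshes or resilience libraries) add half-open probe limits and per-endpoint state, but the open/closed state machine is the core idea.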

This article focuses on how to ensure high availability of stateless services at the architectural level.


Stateless service: a service that stores no data (other than caches) at any time. Its instances can be destroyed and created at will; no user data is lost, and requests can be switched to any replica without affecting users.

High availability for stateless services aims to ensure that no data is lost and the service does not fail under any circumstances, and that when some instances do fail, the impact is minimal and recovery is fast.

It can be considered from these aspects

  • Redundant deployment: Deploy at least one more node to avoid single points of failure

  • Vertical scaling: increase single-machine performance

  • Horizontal scaling: let the system expand capacity quickly when traffic surges

Redundant deployment

In a single-point architecture, as traffic and data grow, the load on the single node becomes too heavy, causing the service to break down and become unavailable. For stateless services, you can deploy the service on multiple nodes to spread the load.

To decide how incoming requests are scheduled across those nodes, you can use load balancing to ensure server resources are fully utilized.

  • Stateless service: a service that does not need to store data and does not lose data even after a node is restarted

  • Load balancing: An algorithm that distributes a large number of requests to different nodes

Load balancing for stateless services

You can choose among these classic load-balancing algorithms:

  • Random algorithm: given the list of backend servers, pick one at random for each request; as request volume grows, the distribution tends toward balance

  • Round-robin algorithm: send requests to the backend servers in turn

The problem with the first two algorithms is that when backend servers differ in load or configuration, they cannot send more requests to lightly loaded servers and fewer to heavily loaded ones. Hence the weighted variants:

  • Weighted round-robin algorithm: assign each backend server a weight based on its capacity and load, so stronger servers receive more traffic, reducing the risk of overload

  • Weighted random algorithm: like weighted round-robin, except servers are chosen at random in proportion to weight. It shares the random algorithm's weakness: with a large request volume the distribution tends toward balance, but with a small volume the same machine may be hit repeatedly among servers of equal weight

  • [Weighted] least-connections algorithm: the most adaptive of these; it picks the server with the fewest active connections, so faster servers naturally receive more requests

The algorithms above suit stateless applications. If communication state must be preserved, use:

  • Source address hashing algorithm: hash the client's source address so that the same client is always sent to the same machine, avoiding repeated connection establishment
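The algorithms above can be sketched in a few lines each. This is a simplified illustration, not a production balancer; the server addresses and weights are made-up examples:

```python
import hashlib
import itertools
import random

# addr -> weight (illustrative values)
servers = {"10.0.0.1": 5, "10.0.0.2": 3, "10.0.0.3": 1}

# Round-robin: cycle through the server list in turn.
rr = itertools.cycle(sorted(servers))
def round_robin():
    return next(rr)

# Weighted random: pick in proportion to weight; balances out over
# many requests, but small samples may hit the same server repeatedly.
def weighted_random():
    return random.choices(list(servers), weights=list(servers.values()))[0]

# Weighted least connections: track live connections and pick the
# server with the fewest relative to its weight, so faster (higher
# weight) servers naturally absorb more traffic.
active = {s: 0 for s in servers}
def least_connections():
    return min(servers, key=lambda s: active[s] / servers[s])

# Source address hashing: the same client IP always maps to the same
# server, preserving per-client state without re-establishing sessions.
def source_hash(client_ip):
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return sorted(servers)[digest % len(servers)]
```

Note that the simple modulo in `source_hash` remaps many clients when the server list changes; real balancers often use consistent hashing for that reason.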

How to select a load balancing algorithm?

First, discard the random algorithm; the simplest configuration is plain round-robin. It suits scenarios where server configurations are identical, for example when VMs are used and their configurations can be adjusted dynamically. At the same time, make sure the VMs are dedicated, with no other applications deployed on them.

However, servers often host multiple applications, so the choice usually comes down to weighted round-robin versus least connections.

  • Weighted round-robin applies to short-connection scenarios, such as HTTP services. In Kubernetes, because each Pod is independent, the default Service policy is unweighted round-robin

  • Least connections applies to long-connection scenarios, such as FTP

If the system architecture must handle clients without cookie support, the source address hashing algorithm can map a source IP to the same real server every time; in Kubernetes this is the session-affinity mode, which forwards each request from a client to the same Pod.

Advice:

  • If containers are scheduled directly by Kubernetes, use cookies for session persistence and the default round-robin algorithm. Specific scheduling will be covered in future Kubernetes articles

  • For long-connection applications (FTP, sockets, or download connections), choose weighted least connections

  • For short-connection applications (static websites, microservice components, etc.), choose weighted round-robin and use cookies to maintain sessions, minimizing server-side session design. Server-side sessions not only increase code complexity but also add load on the server, which works against distributed applications

Identifying high-concurrency applications

The primary metric is QPS, the number of requests processed per second. Take 100,000 (10W) PV per day as an example:

Formula: (100000 × 80%) / (86400 × 20%) ≈ 4.63 QPS (peak QPS)

How it works: 80% of daily visits are concentrated in 20% of the time, which is called the peak period.

For example, a system I built manages up to 50,000 (5W) machines, each generating one PV per minute, spread relatively evenly over time. That is:

((60 × 24) × 50000) / 86400 ≈ 833 QPS
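Both estimates above can be checked with a few lines of arithmetic:

```python
# Peak-QPS estimate using the 80/20 rule from the text:
# 80% of 100,000 daily PV arrive within 20% of the day's 86,400 s.
daily_pv = 100_000
peak_qps = (daily_pv * 0.8) / (86_400 * 0.2)
print(round(peak_qps, 2))  # -> 4.63

# Evenly spread load: 50,000 machines, one request per minute each.
machines = 50_000
even_qps = (60 * 24 * machines) / 86_400
print(round(even_qps))  # -> 833
```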

Generally, hundreds of QPS can be called high concurrency. Figures found online suggest that a system with over 100 million PV per day typically runs around 1500 QPS, with peaks near 5000 QPS.

Besides QPS, service response time and the number of concurrent users are also useful references.

When server load is high, symptoms include slow processing, dropped network connections, failed requests, and raised exceptions; each problem needs specific analysis.

You can track server performance through monitoring, and dynamically adjust and retry to keep the service available and reduce maintenance costs. When a single server is under heavy pressure, vertical scaling is usually the first option to consider.
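The retry idea mentioned above is commonly implemented as retry with exponential backoff, so transient failures recover without overwhelming a struggling server. A minimal sketch, with made-up attempt counts and delays:

```python
import time

def call_with_retry(fn, attempts=3, base_delay=0.1):
    """Retry a flaky service call, doubling the wait between tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Back off exponentially: 0.1s, 0.2s, 0.4s, ...
            time.sleep(base_delay * (2 ** attempt))
```

Production versions usually add jitter to the delay so that many clients retrying at once do not synchronize into a thundering herd.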

Vertical scaling

Vertical scaling means increasing the processing capacity of a single server, mainly in three ways:

  • Server upgrade: Focuses on the CPU, memory, swap, disk capacity, or NIC

  • Hardware performance: use SSDs and tune system parameters

  • Architecture adjustment: use asynchrony, caching, and lock-free structures at the software level

Boosting single-machine performance is the fastest and easiest route, but a single machine has hard performance limits, and a failure on a single machine is fatal to the application. We must keep the application available at all times, that is, aim for the proverbial "five nines" (99.999%) of availability.

Automatic horizontal scaling

Knowing the limitations of a single machine, consider scaling horizontally

Horizontal scaling means adding new nodes to share the load as pressure grows. But deploying more nodes by hand is not enough: as the business keeps growing, the service's pressure limit will eventually be breached, and in a traffic-surge scenario a manual response would be caught off guard. So an automatic scaling mechanism is needed.

  • For private cloud deployments, you can implement a scheduler that detects system status and scales through the IaaS layer

  • You can also use the elastic scaling service provided by a cloud provider

  • For containers, configure automatic scaling and scheduling policies, relying on elastic scaling at the IaaS layer or a sufficient pool of nodes, to prevent single-node failures
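The core of such an autoscaler is a scale rule. The sketch below uses target tracking, the same shape as the formula Kubernetes' Horizontal Pod Autoscaler uses (replicas scale in proportion to observed vs. target utilization); the thresholds and bounds are illustrative:

```python
import math

def desired_replicas(current, cpu_utilization, target=0.6, min_n=2, max_n=20):
    """Target-tracking scale rule: if observed utilization is above
    the target, grow the replica count proportionally; if below,
    shrink it, clamped to [min_n, max_n] to avoid runaway scaling."""
    wanted = math.ceil(current * cpu_utilization / target)
    return max(min_n, min(max_n, wanted))
```

A control loop would poll monitoring for `cpu_utilization`, call this function, and ask the IaaS or container layer to add or remove nodes; real systems also add cooldown windows so the fleet does not flap.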


IaaS (Infrastructure as a Service): a service model that provides managed hardware resources such as servers, storage, and networks.

Note: Elastic scaling applies to stateless service scenarios

Also note that when stateless machines alone cannot absorb the request traffic (horizontal scaling is usually considered once traffic reaches thousands of QPS), pressure also falls on the database. It is therefore recommended not to deploy stateful services on horizontally scaled servers.

Spreading the pressure of stateful services will be covered in a later article.

CDN and OSS

For a website, the user-facing pages are a special kind of service containing many static resources, such as images, videos, and pages (HTML/CSS/JS). These resources are downloaded when users request them, and download speed determines loading speed.

At this level, consider using a CDN (content delivery network; see [XXX] for details) to cache static front-end data on edge servers.


Edge server (edge node): the server that interacts with the user, i.e., the server node closest to the user; that proximity reduces network transmission time.

If you serve the site through a CDN, you can bind the HTTPS certificate on the CDN, configure back-to-origin timeouts and 301/302 handling in the origin-pull policy, and enable intelligent page compression and custom error pages.

OSS (Object Storage Service) is a storage scheme that stores data as objects and can, in theory, hold an unlimited number of files.

Consider combining OSS object storage with the CDN: store media resources in object storage, or compress and archive cold data to OSS.

Most mainstream video websites use OSS, and Weibo data from n years ago is presumably archived into object storage.

Conclusion

This article introduced the common high-availability architecture designs for stateless services:

  • Redundant deployment

  • The six load-balancing algorithms and how to choose among them

  • Advantages and disadvantages of vertical scaling

  • Horizontal scaling and automatic horizontal scaling

  • Which services can use CDN and OSS

Note that stateless applications should store neither sessions nor data.

This article introduced six load-balancing algorithms but not their concrete implementations, which are left for you to study. These schemes involve real difficulty in practice; keeping a service available against every possible cause of failure is a broad and profound subject, and a programmer's job is more than writing code.

This is just part of the high-availability picture for stateless services. What else do you know about stateless services and availability design at the code level?

And in more demanding cases, with no additional server resources available, how can you improve code performance on the servers you have?

Follow me and know more about the technology you don’t know