High concurrency

High concurrency is a capacity concept: the maximum number of tasks a service can accept at the same time. Following the 80/20 rule, the architecture usually needs headroom for roughly 20% burst traffic on top of the baseline, but not every system needs the same margin; the actual figure should be derived from real traffic samples.

  • Timeouts, retries, circuit breaking, rate limiting, request hedging (double send), load balancing, automatic elastic scaling
  • Trading space for time (multi-level cache architecture; consider the hit ratio of each layer)
  • Trading time for space (compression, to solve bandwidth problems)
  • Asynchronous processing
  • Service splitting + service isolation
  • Parallelization
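
Rate limiting, one of the techniques listed above, can be sketched as a token bucket. This is a hypothetical minimal version, not tied to any particular framework; production systems usually rely on a gateway or a library instead:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: refills `rate` tokens per second
    up to `capacity`; a request is admitted only if a token is available."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)   # steady 5 req/s, bursts up to 10
admitted = sum(bucket.allow() for _ in range(100))
print(admitted)  # a sudden burst of 100 requests: only ~10 get through
```

The capacity parameter is exactly the "20% burst" headroom discussed earlier: it bounds how far above the steady rate a spike is allowed to go.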

High performance

High performance is a speed concept: the number of tasks that can be handled per unit of time.

Metrics

Response Time (RT)

  • Average response time (AVG)
Our most commonly used metric, the average response time, reflects the average processing capacity of a service interface.

Average server-side request wait time = total time spent on all requests / total number of requests
Average user request wait time = average server-side request wait time × number of concurrent requests

The average does not look slow unless something goes seriously wrong during the period. Because a highly concurrent application serves a very large number of requests, the impact of long-tail requests is quickly averaged out: many users experience slow requests, yet this is not reflected in the average. The percentile is another metric commonly used to address this problem.
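
A toy illustration (hypothetical numbers) of how the average hides long-tail requests:

```python
# 99 fast requests and 1 very slow one: the average stays low
# even though one user waited two full seconds.
latencies_ms = [10] * 99 + [2000]

avg = sum(latencies_ms) / len(latencies_ms)
print(avg)                 # 29.9 ms -- looks healthy
print(max(latencies_ms))   # 2000 ms -- the long tail the average hides
```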
  • Percentile
N% of requests return within X time. For example, TP90 = 50ms means that 90% of requests return within 50ms. We usually track TP50, TP90, TP95, TP99, TP99.9, and so on. The higher the percentile you target, the higher the demand on the stability of the system's response capability. The goal is to eliminate the long-tail requests that seriously affect the system.
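
A minimal sketch of computing TP values from a latency sample, using the nearest-rank method on hypothetical data:

```python
def tp(latencies, p):
    """Nearest-rank percentile: the value below which p% of samples fall."""
    s = sorted(latencies)
    k = max(0, int(round(p / 100 * len(s))) - 1)
    return s[k]

latencies_ms = [10] * 99 + [2000]   # same sample as above: one long-tail request
print(tp(latencies_ms, 50))    # 10  (TP50)
print(tp(latencies_ms, 99))    # 10  (TP99: the tail is still invisible)
print(tp(latencies_ms, 99.9))  # 2000 (TP99.9 finally exposes the outlier)
```

This is why the higher percentiles are demanded of latency-sensitive systems: only they surface the worst-case requests.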

Concurrency

Concurrency is how many user requests the system can handle at the same time. Designing for response time is generally a cure-all: as response time decreases, the number of requests that can be processed at the same time inevitably increases. Note that even in a seckill (flash-sale) system, after many layers of filtering only a small number of concurrent requests eventually reach a node.

Throughput

Throughput is the number of requests processed per unit of time, and is a widely used performance metric.

  • From a business perspective: throughput can be measured in requests per second, pages per second, etc.
  • From a network perspective: throughput can be measured in bytes per second.
  • From an application perspective: throughput reflects the pressure on the server, that is, the load capacity of the system.

A system's throughput is closely tied to the CPU, bandwidth, IO, and memory consumed in processing a request. Throughput is more concerned with data volume (overall system throughput), while QPS is more concerned with the number of queries per second.

Instant-open rate

The threshold for an "instant open" can be set according to the business; for example, the instant-open rate may be defined as the percentage of page loads whose data arrives in under 1 second.

  • Glossary
QPS (Queries Per Second), TPS (Transactions Per Second), HPS (HTTP requests Per Second), PV (Page Views), DAU (Daily Active Users).

QPS = number of concurrent requests / average user request wait time
QPS = 1 / average server-side request processing time
Number of concurrent requests = QPS × average user request wait time

If service A calls service B and service C at the same time, QPS = 2 while TPS = 1. For a single-interface request on a single machine, QPS = TPS.

Throughput = (number of requests) / (total time), where total time = test end time − test start time, and the test end time is the maximum elapsed time across all requests.

Throughput is improved by executing in parallel and using computing resources rationally. Our usual optimizations focus on response speed, because once response speed improves, overall throughput naturally rises as well. But highly concurrent Internet applications require both: they are high-throughput, high-concurrency scenarios in which users tolerate little latency, and we must find a balance within limited hardware resources.
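
The formulas above can be checked with a small calculation. The numbers here are hypothetical: 100 requests completed in 6.007 seconds at a concurrency of 10.

```python
total_requests = 100
total_time_s = 6.007
concurrency = 10

# Average server-side wait = total time / total requests
avg_server_wait = total_time_s / total_requests    # 0.06007 s
# Average user wait = server wait x concurrency
avg_user_wait = avg_server_wait * concurrency      # 0.6007 s

qps_a = concurrency / avg_user_wait    # QPS from concurrency / user wait
qps_b = 1 / avg_server_wait            # QPS from 1 / server processing time
throughput = total_requests / total_time_s

print(round(qps_a, 2), round(qps_b, 2), round(throughput, 2))
# all three formulas agree: 16.65 16.65 16.65
```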
  • Conversion example

```shell
ab -n 100 -c 10 http://localhost:11000/test/a   # -n total requests, -c concurrency

Document Path:          /test/a
Document Length:        27 bytes
Concurrency Level:      10
Time taken for tests:   6.007 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      16600 bytes
HTML transferred:       2700 bytes
Requests per second:    16.65 [#/sec] (mean)
Time per request:       600.7 [ms] (mean)
Time per request:       60.07 [ms] (mean, across all concurrent requests)
Transfer rate:          2.70 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd]  median   max
Connect:        0    0    0.1      0       0
Processing:   503  517   41.8    508     918
Waiting:      502  514   35.2    507     849
Total:        503  517   41.8    508     918

Percentage of the requests served within a certain time (ms)
  50%    508
  66%    512
  75%    516
  80%    523
  90%    533
  95%    538
  98%    546
  99%    918
 100%    918 (longest request)
```

Average server-side request wait time = total time / total requests = 6.007 / 100 = 0.06007s
Average user request wait time = 0.06007s × 10 = 0.6007s
QPS = concurrency / average user wait time = 10 / 0.6007s ≈ 16.65 req/s
QPS = 1 / average server processing time = 1 / 0.06007s ≈ 16.65 req/s
Concurrency = QPS × average user wait time = 16.65 × 0.6007 ≈ 10
Throughput = total requests / total time = 100 / 6.007s ≈ 16.65 req/s

High availability

Metrics

Providing service 7×24 hours, without interruption or abnormality.

Description                         Number of 9s   Availability   Annual downtime
Basically available                 2 nines        99%            87.6 hours
Highly available                    3 nines        99.9%          8.8 hours
Available with automatic recovery   4 nines        99.99%         53 minutes
Extremely highly available          5 nines        99.999%        5 minutes
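
The downtime figures follow directly from the availability percentage; a quick sanity check:

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes

def annual_downtime_minutes(availability: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

for nines, a in [(2, 0.99), (3, 0.999), (4, 0.9999), (5, 0.99999)]:
    m = annual_downtime_minutes(a)
    print(f"{nines} nines: {m / 60:.1f} hours ({m:.0f} minutes)")
# 2 nines -> 87.6 hours, 3 nines -> 8.8 hours,
# 4 nines -> ~53 minutes, 5 nines -> ~5 minutes
```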

Optimization techniques

  • Redundancy: machines (hybrid cloud), services (stateless); solve single points of failure and split-brain problems
  • Failover; cold and hot standby
  • Rate limiting, degradation, circuit breaking
  • Elastic, flexible design: degrade gracefully (e.g., if personalized recommendations are unavailable, fall back to static content; tolerate errors from the next layer down)
  • Isolation (e.g., separate servers for seckill traffic)
  • Contingency plans and drills
  • Automatic monitoring and alerting: business monitoring, hardware and service monitoring (SQL, call counts, latency, error rate), per-client monitoring, event-tracking (buried-point) monitoring
  • Same-city active-active (preferred where possible, since intra-city latency is small), remote multi-active, two sites and three data centers
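
Circuit breaking, listed above, can be sketched as a small state machine. This is a hypothetical minimal version; production systems usually use a library such as Resilience4j or Sentinel:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; after `reset_s` seconds
    it half-opens and lets one trial call through."""
    def __init__(self, threshold: int = 3, reset_s: float = 30.0):
        self.threshold = threshold
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at = None   # None means the breaker is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_s:
            return True   # half-open: permit one trial request
        return False      # open: fail fast instead of piling onto a sick service

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

cb = CircuitBreaker(threshold=3, reset_s=30)
for _ in range(3):
    cb.record(success=False)   # three consecutive failures trip the breaker
print(cb.allow())  # False: calls are now rejected immediately
```

Failing fast like this protects both the caller (bounded latency) and the struggling downstream service (reduced load while it recovers).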

High security

Front-end code security:

Code obfuscation, adding unrelated decoy code, anti-debugging

Backend code, server security:

Server-level protection, risk control, blacklists and whitelists, anti-crawler measures, WAF firewall (anti-DDoS), encryption, HTTPS/HTTP2/HTTP3, anti-abuse protection (rate and weight limits), CAPTCHA, SQL injection protection, social-engineering defenses, address hiding

Other:

Office security: personnel, computers, email, chat and messaging, development tools, middleware, weak passwords

Commonly used optimization methods and ideas

A request passes through three segments. The first is the user and browser side, responsible for sending requests and for computing, rendering, and displaying the response data to the user. The second is the network, responsible for transmitting request and response data. The third is the website server side, responsible for processing requests (executing programs, accessing databases, files, etc.) and returning results.

First segment: JS and CSS compression and merging (Varnish), removing useless comments (also a security consideration); images: JPEG for lossy, WebP for lossless, progressive/streaming loading; static resources on a CDN; browser caching; appropriately sized images; app caching.

Second segment: bandwidth (roughly estimated as response size × PV, spread over the peak period) and connectivity.

Third segment: code best practices; business-process optimization; asynchronous processing and MQ; parallelization, divide and conquer; lock-free programming, coroutines; static file serving; application-layer caches and distributed caches (multi-level cache); database optimization (commercial databases, splitting databases and tables, hot/cold data separation, read/write splitting, NoSQL for massive data, NewSQL); hardware upgrades (CPU, memory, bandwidth, SSDs and disk arrays, number of servers, data centers, hardware load balancers); high-performance RPC and serialization frameworks; NIO, connection pooling; high-performance Nginx, HTTP2, HTTP3, gzip/brotli and parameter tuning.
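
The multi-level cache idea from the third segment can be sketched as a lookup chain. The layers here are hypothetical stand-ins; real deployments typically chain an in-process cache, Redis, and the database:

```python
class MultiLevelCache:
    """Look up L1 (local) first, then L2 (remote cache), then the source of
    truth; populate the faster layers on the way back."""
    def __init__(self, loader):
        self.l1 = {}          # stands in for an in-process cache
        self.l2 = {}          # stands in for a remote cache such as Redis
        self.loader = loader  # stands in for a database query

    def get(self, key):
        if key in self.l1:
            return self.l1[key]
        if key in self.l2:
            self.l1[key] = self.l2[key]   # promote to the faster layer
            return self.l2[key]
        value = self.loader(key)          # slow path: hit the database
        self.l2[key] = value
        self.l1[key] = value
        return value

db_hits = []
cache = MultiLevelCache(loader=lambda k: db_hits.append(k) or f"row:{k}")
cache.get("user:1")
cache.get("user:1")
print(len(db_hits))  # 1 -- the second read never reached the database
```

This is also why the hit ratio of each layer matters: every L1 or L2 hit is a request the slower layers never see.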

Summary of 10 years of development and management experience:

Design for failure: an architect needs to think like a pessimist, considering failure scenarios and external attacks from the very start of system design. Failure handling should be part of the design itself, with a Plan A and a Plan B. The easiest way in the world to solve a computer problem is to arrange for it to "happen" not to need solving: many problems can be avoided by optimizing business processes and designing appropriately. Other recurring ideas: redundancy; indirection (most problems can be solved by adding an intermediate layer); divide and conquer (split a large task into many small tasks); trading space for time and time for space; concurrency; design patterns; pooling; pessimistic thinking.
