
  • Carrying capacity is the reason distributed systems exist

When an Internet service gains popularity, the most obvious technical problem is that the servers become very busy. When 10 million people visit your site every day, no server hardware can host the load on a single machine. So when server-side programmers tackle this problem, they must consider how to use multiple servers to provide the same Internet service. This is the origin of the so-called "distributed system".

However, the problems caused by large numbers of users accessing the same Internet service are not simple. On the surface, meeting many users' requests boils down to so-called performance requirements: web pages respond slowly, actions in an online game lag, and so on. These requirements for "service speed" actually break down into the following parts: high throughput, high concurrency, low latency, and load balancing.

High throughput means that the system can serve a large number of users at the same time; what we measure is the number of users the entire system can serve simultaneously. Such throughput is clearly impossible on a single server, so multiple servers must cooperate to achieve it. In that cooperation, how to use the servers effectively, so that no part of them becomes a bottleneck that limits the processing capacity of the whole system, is a problem a distributed architecture must carefully weigh.

High concurrency is an extension of the high-throughput requirement. When we host a large number of users, we want each server to do as much useful work as possible, without unnecessary overhead or waiting. However, software systems do not simply handle "as many" tasks at once "as possible": many programs incur extra cost just deciding which task to handle next. This, too, is a problem that distributed systems must solve.

Low latency is not a problem for sparsely used services. Returning results quickly even while a large number of users are accessing the system, however, is much harder. Besides requests piling up in queues, overly long queues can exhaust memory and saturate bandwidth, and if a retry policy kicks in after a queuing failure, overall latency gets even worse. A distributed system therefore sorts and distributes requests so that more servers can handle them as quickly as possible. But because a distributed system inevitably forwards a user's request through several layers, latency can grow with every distribution and transfer step, so besides distributing requests, a distributed system must also try to reduce the number of distribution levels so that requests are processed as soon as possible.

Since Internet users come from all over the world, they may arrive over different networks and lines with different physical latencies, and from different time zones. To cope effectively with this complexity of user sources, servers must be deployed in multiple locations to provide the service, and simultaneous requests must be efficiently shared among multiple servers. This so-called load balancing is an inherent task of distributed systems.

Because distributed systems are almost the only basic way to solve the carrying-capacity problem of Internet services, mastering distributed-system technology is extremely important for a server programmer. However, the problems of distributed systems cannot be solved simply by learning a few frameworks and libraries: once a program goes from running on one computer to running in coordination on countless computers at the same time, development, operation, and maintenance become very different.

Basic means by which distributed systems improve carrying capacity

Hierarchical model (routing, proxy)

The simplest way to have multiple servers cooperate on a computing task is to let each server handle every kind of request, and then send each request to any server at random. The earliest Internet applications used DNS round-robin this way: when a user entered a domain name to access a site, the name was resolved to one of several IP addresses, and the request was sent to the corresponding server, so that multiple servers (multiple IP addresses) could handle a large number of user requests together.

However, simply forwarding requests at random does not solve everything. For example, many Internet services require users to log in. After logging in to a particular server, a user initiates many subsequent requests; if those requests are forwarded randomly to different servers, the user's login state is lost and some of them fail. Relying on a single layer of forwarding is therefore not enough, so we add another batch of servers that forward requests, based on the user's cookie or login credentials, to the specific business-processing servers behind them.
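A minimal sketch of that kind of "sticky" forwarding, with a hypothetical backend list: the access layer hashes the user's login ID (or cookie value) so that every request from the same user lands on the same logic server. This is only an illustration of the idea, not any particular product's implementation.

#include <stdio.h>

#define BACKEND_COUNT 4

/* Hypothetical backend addresses; in practice these come from configuration. */
static const char *backends[BACKEND_COUNT] = {
    "10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000", "10.0.0.4:8000"
};

/* Simple string hash (djb2) over the user's login ID or cookie value. */
static unsigned long hash_str(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* The same user ID always maps to the same logic server, so login state is preserved. */
static const char *pick_backend(const char *user_id) {
    return backends[hash_str(user_id) % BACKEND_COUNT];
}

int main(void) {
    printf("user 'alice' -> %s\n", pick_backend("alice"));
    printf("user 'bob'   -> %s\n", pick_backend("bob"));
    return 0;
}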

Besides login, we also find that a lot of data has to go through a database, and that data often has to be kept in one central database, otherwise queries would miss data stored on other servers. So we also tend to split the database out onto a batch of dedicated servers.

At this point, a typical three-tier structure emerges: access, logic, and storage. However, this three-tier structure is not a panacea. For example, when users need to interact with each other online (online games are typical), online-status data split across different logic servers is invisible to the other servers. We therefore need a dedicated interaction-like system: when a user logs in, a record is written to it noting which server the user is on, and all interactive operations go through this interaction server first so that messages can be forwarded correctly to the server hosting the target user.

For example, in an online forum (BBS) system, we cannot write all articles into a single database, because the sheer volume of read requests would drag it down. We often write to different databases by forum section, or write to several databases at once, so that article data is spread across different servers to handle the large number of operations. When a user reads an article, however, a dedicated program is needed to find which server that particular article lives on. So we set up a dedicated proxy layer: all article requests go to it first, and it looks up the corresponding database according to our predefined storage plan.

Based on the above examples, distributed systems typically have three tiers, but are often designed with more tiers according to business requirements. To forward requests to the right process, we design many processes and servers dedicated to request forwarding. These processes are often named Proxy or Router, and a multi-tier structure usually contains several kinds of them. These proxy processes often connect to one another over TCP. But while TCP is simple, it is not easy to recover from failures, and TCP network programming is somewhat complicated. As a result, people devised a better mechanism for inter-process communication: message queues.

Although a powerful distributed system can be built out of various Proxy and Router processes, managing it is very complex. So, on top of the layered model, people came up with more methods to make programs in this model simpler and more efficient.

Concurrency model (multi-threaded, asynchronous)

When we write server-side programs, we know that most of them will handle multiple requests arriving at the same time. So we cannot simply compute one output from one input as in a HelloWorld program: we receive many inputs at once and must return many outputs. In this processing we often run into "wait" or "block" situations, such as waiting for a database result, or waiting for the response to a request sent to another process. If we processed requests strictly one after another, this idle waiting time would be wasted, increasing response latency for users and drastically reducing overall system throughput.

There are two typical industry solutions for handling many requests at the same time: multi-threading and asynchrony. In early systems, multi-threading (or multi-processing) was the most common technique. It is relatively easy to code, because the code in each thread executes in sequence; but since multiple threads run at the same time, you cannot guarantee the ordering of code across threads. This is a serious problem for logic that operates on the same data. Take the simplest case, displaying the view count of a news item: if two ++ operations run at the same time, the result may increase by only 1 instead of 2. So with multi-threading we often have to add many data locks, and those locks in turn can cause deadlocks.
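A minimal illustration of that counter race and the lock that fixes it (compile with -pthread); if the mutex calls are removed, the final count will usually fall short of 200000 because increments get lost.

#include <pthread.h>
#include <stdio.h>

/* Shared view counter; without the mutex, two concurrent ++ operations
 * can interleave and one increment is lost. */
static long views = 0;
static pthread_mutex_t views_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&views_lock);   /* serialize the read-modify-write */
        views++;
        pthread_mutex_unlock(&views_lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("views = %ld\n", views);        /* always 200000 with the lock in place */
    return 0;
}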

The asynchronous callback model therefore became more popular than multi-threading. Besides avoiding deadlocks, asynchrony also avoids the unnecessary overhead of repeated thread switching: each thread needs its own stack space, and when many threads run in parallel, stack data may be copied back and forth, consuming extra CPU; and because each thread occupies stack space, memory consumption is also huge when there are a large number of threads. The asynchronous callback model solves these problems well, but it is closer to "manual" parallel processing and requires developers to work out the interleaving themselves.

Asynchronous callbacks are based on non-blocking I/O operations (network and file). With them, a read/write call does not "get stuck" inside the function; it returns immediately with "data or no data". Linux's epoll uses a kernel mechanism that lets us quickly "find" the connections and files that have data ready to read or write. Because every operation is non-blocking, a single process can handle a large number of concurrent requests. And because there is only one process, all data is processed in a fixed order; the interleaved execution of two functions that can occur with multi-threading is impossible, so no "locks" are needed. From this perspective, asynchronous non-blocking techniques greatly simplify development. With only one thread and no thread-switching overhead, asynchronous non-blocking I/O is the preferred choice of many systems with high throughput and concurrency requirements.

int epoll_create(int size);   // create an epoll handle; size hints to the kernel how many fds will be watched
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);   // add, modify, or remove a watched fd
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);   // wait for ready events
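Below is a minimal sketch (error handling omitted) of how these three calls are usually combined into a single-threaded event loop. It assumes listen_fd is an already created, non-blocking listening socket, and simply echoes data back as a placeholder for real business logic.

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Single-threaded, non-blocking event loop built on epoll. */
void event_loop(int listen_fd) {
    int epfd = epoll_create(1024);                       /* size is only a hint on modern kernels */
    struct epoll_event ev, events[64];

    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);      /* watch for new connections */

    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);        /* block until something is ready */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                int conn = accept(listen_fd, NULL, NULL);    /* new client connection */
                ev.events = EPOLLIN;
                ev.data.fd = conn;
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);
            } else {
                char buf[4096];
                ssize_t len = read(fd, buf, sizeof(buf));    /* non-blocking read */
                if (len <= 0) { close(fd); continue; }       /* peer closed or error */
                write(fd, buf, (size_t)len);                 /* echo back as a stand-in for real work */
            }
        }
    }
}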

Buffering (caching)

In Internet services, most user interactions must return a result immediately, so there is a latency requirement; for services like online games, latency has to be brought down to tens of milliseconds. To reduce latency, buffering (caching) is one of the most widely used techniques in Internet services.

In early WEB systems, if every HTTP request read from and wrote to the database (MySQL) directly, the database would quickly stop responding because its connections were exhausted: a typical database supports only a few hundred connections, while the concurrent requests of a WEB application easily reach several thousand. This is also the most direct reason many badly designed websites freeze under load. To minimize connections to and queries against the database, many buffering systems were designed to store the result of a database query in a faster facility and, if nothing related has changed, read it from there directly.

The most typical WEB caching system is Memcache. Because of PHP's request-handling model, it keeps no state between requests; early PHP did not even offer a way to keep data in "heap" memory across requests, so persistent state had to be stored in another process. Memcache is a simple and reliable open-source tool for storing such temporary state. Many PHP applications now use the following logic: read data from the database, then write it to Memcache; when the next request arrives, try to read from Memcache first. This greatly reduces database access.
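A sketch of that read-then-fill (cache-aside) pattern; the helpers cache_get, cache_set, and db_query are hypothetical stand-ins for a Memcache client and a database driver, so this shows the flow rather than any particular library's API.

#include <stddef.h>

/* Hypothetical helpers standing in for a Memcache client and a DB driver. */
char *cache_get(const char *key);                       /* returns NULL on a cache miss */
int   cache_set(const char *key, const char *val, int ttl_seconds);
char *db_query(const char *key);                        /* the slow path */

/* Cache-aside read: try the cache first, fall back to the database,
 * then populate the cache so the next request is cheap. */
char *load_article(const char *article_id) {
    char *val = cache_get(article_id);
    if (val != NULL)
        return val;                                     /* cache hit: no DB access at all */

    val = db_query(article_id);                         /* cache miss: hit the DB once */
    if (val != NULL)
        cache_set(article_id, val, 60);                 /* a short TTL limits staleness */
    return val;
}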

However, Memcache itself is a standalone server process with no built-in clustering capability; Memcache processes cannot be directly organized into a unified cluster. If one Memcache is not enough, we have to decide in our own code which data goes to which Memcache process. For a truly large distributed site, managing such a buffering system is tedious work.

So people started to think about designing more efficient buffering systems. In terms of performance, every Memcache request travels over the network to pull data from another machine's memory, which is somewhat wasteful because the requester's own memory can also hold data. Hence many caching algorithms and techniques make use of the requester's local memory; the simplest is an LRU cache, which keeps data in heap memory in a hash-table-like structure.
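A deliberately tiny sketch of such an in-process LRU cache, using a fixed-size table and linear scans to keep the code short; a production local cache would pair a hash table with a doubly linked list so that lookup and eviction are both O(1).

#include <stdio.h>
#include <string.h>

#define CAP 128

/* Each slot remembers when it was last touched; the stalest slot is evicted. */
struct entry { char key[64]; char val[256]; unsigned long last_used; int used; };
static struct entry table[CAP];
static unsigned long clock_tick = 0;

const char *lru_get(const char *key) {
    for (int i = 0; i < CAP; i++)
        if (table[i].used && strcmp(table[i].key, key) == 0) {
            table[i].last_used = ++clock_tick;          /* touch on every access */
            return table[i].val;
        }
    return NULL;                                        /* miss: caller falls back to Memcache or the DB */
}

void lru_put(const char *key, const char *val) {
    int victim = 0;
    for (int i = 0; i < CAP; i++)                       /* update in place if the key already exists */
        if (table[i].used && strcmp(table[i].key, key) == 0) { victim = i; goto store; }
    for (int i = 0; i < CAP; i++) {                     /* otherwise take a free slot, or evict the stalest */
        if (!table[i].used) { victim = i; break; }
        if (table[i].last_used < table[victim].last_used) victim = i;
    }
store:
    snprintf(table[victim].key, sizeof table[victim].key, "%s", key);
    snprintf(table[victim].val, sizeof table[victim].val, "%s", val);
    table[victim].last_used = ++clock_tick;
    table[victim].used = 1;
}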

Memcache's lack of clustering is also a pain point for users. So many people began designing ways to spread the data cache across different machines. The simplest idea is read/write separation: every cache write goes to multiple buffer processes, while a read can go to any one of them at random. This works well when the business data has a significant read/write imbalance, as sketched below.
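A sketch of that read/write separation, with hypothetical node_set/node_get calls standing in for per-node cache clients; the point is only the fan-out on writes and the random pick on reads.

#include <stdlib.h>

#define NODES 3

/* Hypothetical per-node cache client calls. */
int   node_set(int node, const char *key, const char *val);
char *node_get(int node, const char *key);

/* Read/write separation: every write goes to all cache nodes,
 * so a read can be served by any one of them. */
void cache_write(const char *key, const char *val) {
    for (int i = 0; i < NODES; i++)
        node_set(i, key, val);             /* fan out the (rarer) writes */
}

char *cache_read(const char *key) {
    return node_get(rand() % NODES, key);  /* spread the (frequent) reads */
}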

However, not every business can solve the problem with read/write separation alone; interactive Internet services such as communities and games read and write at similar frequencies and still demand low latency. The idea, then, is to combine local memory with the memory cache of remote processes, so that data is cached at two levels, and a given piece of data is not replicated in all cache processes at once but distributed among them according to some rule. The most popular algorithm for this distribution is so-called "consistent hashing". Its advantage is that when a process fails, the rest of the cluster does not have to relocate all of its cached data. To see why, imagine distributing cached data by simply taking the data's ID modulo the number of processes: once the number of processes changes, the storage location of almost every piece of data changes, which is bad for fault tolerance.
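A minimal sketch of a consistent-hashing ring, with one point per node to keep it short; real implementations place many virtual points per node on the ring so that load stays even when a node joins or leaves.

#include <stdio.h>

#define NODE_COUNT 4

/* One point on the ring per node. */
struct node { const char *name; unsigned long point; };
static struct node ring[NODE_COUNT];

static unsigned long hash_str(const char *s) {
    unsigned long h = 5381;                              /* djb2 string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static void build_ring(const char *names[], int n) {
    for (int i = 0; i < n; i++) {
        ring[i].name = names[i];
        ring[i].point = hash_str(names[i]);
    }
    for (int i = 1; i < n; i++)                          /* sort the ring by point */
        for (int j = i; j > 0 && ring[j].point < ring[j - 1].point; j--) {
            struct node t = ring[j]; ring[j] = ring[j - 1]; ring[j - 1] = t;
        }
}

/* Walk clockwise: a key belongs to the first node at or after its hash. */
static const char *locate(const char *key, int n) {
    unsigned long h = hash_str(key);
    for (int i = 0; i < n; i++)
        if (ring[i].point >= h)
            return ring[i].name;
    return ring[0].name;                                 /* wrap around the ring */
}

int main(void) {
    const char *names[NODE_COUNT] = { "cache-a", "cache-b", "cache-c", "cache-d" };
    build_ring(names, NODE_COUNT);
    printf("key 'user:1001' -> %s\n", locate("user:1001", NODE_COUNT));
    return 0;
}

When a node is removed, only the keys that mapped to it move to the next node on the ring; the other nodes' keys stay where they were, which is exactly the fault-tolerance property described above.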

Oracle has a product called Coherence that is designed for caching systems. It is a commercial product that supports cooperation between local memory caches and remote process caches. Its cluster of processes is fully self-managing, and it supports user-defined computation (so-called processors) running in the process where the cached data lives, making it not just a cache but a distributed computing system.

Storage Technology (NoSQL)

I believe CAP theory is familiar to everyone by now. But in the early days of Internet development, when everyone was still using MySQL, many teams racked their brains over how to make the database store more data and carry more connections. In many businesses the primary data store was even plain files, with the database relegated to a secondary facility.

When NoSQL emerged, however, people suddenly realized that the data formats of many Internet businesses are so simple that they often do not need the complex tables of a relational database at all: lookups are usually made only through a primary key, and more complex full-text search is something the database cannot do by itself anyway. NoSQL is therefore now the preferred storage facility for a large number of highly concurrent Internet services. Among the earliest NoSQL databases was MongoDB, and the most popular today seems to be Redis. Some teams even regard Redis as part of their buffering system, which in effect acknowledges Redis's performance advantage.

Besides being faster and carrying more load, NoSQL stores data in a way that can only be retrieved and written through a single primary key. The benefit of this constraint for distribution is that we can use the primary key to determine which process (server) stores the data, so the data of one database can conveniently be spread over different servers. In the inevitable trend toward distributed systems, the data storage layer has finally found its way to distribute as well.

Next in this series: The essence of distributed systems: high throughput, high availability, scalability (Part 2)

This article is from the official WeChat account HANu.


This article is published in the Tencent Cloud Technology Community with the author's authorization.

The original link: cloud.tencent.com/community/a…