As a Web system grows from 100,000 daily visits to 10 million, or even past 100 million, the pressure on it keeps mounting. Along the way we run into a great many problems. To withstand this performance pressure, we need to build multiple levels of caching into the Web system's architecture. At different stages of pressure we hit different problems, and we build different services and architectures to solve them.

Web Load Balancing

Load balancing, simply put, means distributing incoming work across our cluster of servers, and it plays an important role in protecting the back-end Web servers.

There are many strategies for load balancing, but let’s start with the simple ones.

1. HTTP redirection

When a user sends a request, the Web server returns a new URL by setting the Location header in the HTTP response, and the browser then requests that new URL, effectively redirecting the page. Load balancing is achieved through this redirection. For example, when we download the PHP source package and click the download link, a nearby download address is returned, which solves the problem of download speed across different countries and regions. The HTTP status code for such a redirect is 302.

If you were to do this in PHP, a minimal sketch (the back-end server list below is a placeholder) might look like this:
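
    <?php
    // A minimal sketch of 302-redirect load balancing; the back-end list is a placeholder.
    $servers = ['web1.example.com', 'web2.example.com', 'web3.example.com'];
    $target  = $servers[array_rand($servers)];   // naive policy: pick a back-end at random

    // Send the browser to the chosen machine with a 302 response.
    header('Location: http://' . $target . $_SERVER['REQUEST_URI'], true, 302);
    exit;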

This kind of redirection is very easy to implement, and you can customize all sorts of policies. However, it does not perform well at large scale. The user experience also suffers: each actual request costs an extra redirected round trip, which increases network latency.

2. Reverse proxy load balancing

The core job of a reverse proxy service is to forward HTTP requests. It acts as the intermediary between the browser and the back-end Web servers. Because it works at the HTTP layer (the application layer, layer 7 of the OSI seven-layer model), it is also known as "layer-7 load balancing." There are many kinds of reverse proxy software; one of the most common is Nginx.

Nginx is a very flexible reverse proxy that lets you customize forwarding policies and assign traffic weights to servers. A common problem with reverse proxies concerns session data stored on individual Web servers: because the load-balancing policy distributes requests more or less randomly, requests from the same logged-in user are not guaranteed to land on the same Web machine, so the session may not be found.

There are two main solutions:

1. Configure the reverse proxy's forwarding rules so that requests from the same user always go to the same machine (for example, by analyzing cookies); see the Nginx sketch after this list. Complex forwarding rules consume more CPU and increase the burden on the proxy server.

2. Store session information in an independent service instead, such as Redis or Memcached.
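
A minimal Nginx sketch of weighted forwarding plus session affinity; it uses ip_hash, which pins clients by IP rather than by cookie analysis, and the addresses and weights below are placeholders:

    # A hedged nginx.conf sketch; addresses and weights are placeholders.
    upstream backend {
        ip_hash;                        # affinity by client IP (a simpler stand-in for cookie analysis)
        server 10.0.0.1:8080 weight=3;  # this machine receives roughly 3x the traffic
        server 10.0.0.2:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;  # forward HTTP requests to the pool above
        }
    }

For solution 2, sessions can be kept off the Web machines entirely; for instance, the phpredis extension supports session.save_handler = redis, so PHP sessions live in Redis rather than on any one server.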

You can also enable caching on the reverse proxy, but use it with caution, since it increases the proxy's burden. This load-balancing strategy is simple to implement and deploy, and it performs well. However, it has a "single point of failure" problem: if the proxy fails, it causes a lot of trouble. Moreover, as the number of Web servers keeps growing, the reverse proxy itself may become the system's bottleneck.

3. IP load balancing

IP load balancing works at the network layer (rewriting IP addresses) and the transport layer (rewriting ports), and performs much better than working at the application layer (layer 7). It modifies the IP address and port information of data packets to achieve load balancing, which is why it is also known as "layer-4 load balancing." The common implementation is Linux Virtual Server (LVS), built on IPVS (IP Virtual Server).

When an IP packet arrives from a client, the load-balancing server rewrites the packet's destination IP address or port and delivers it, otherwise intact, into the internal network, where it is forwarded to the actual Web server. After the real server finishes processing, its reply travels back through the load-balancing server, which rewrites the source address back to its own virtual IP before returning the packet to the client.
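
A hedged ipvsadm sketch of exactly this NAT setup; the virtual IP and real-server addresses are placeholders:

    # Configure an LVS-NAT virtual service; VIP and real-server IPs are placeholders.
    ipvsadm -A -t 203.0.113.10:80 -s rr              # virtual service on the VIP, round-robin scheduling
    ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.1:80 -m  # real server; -m selects NAT (masquerading) mode
    ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.2:80 -m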

The method above is called LVS-NAT. In addition, there are LVS-DR (direct routing) and LVS-TUN (IP tunneling). All three are IP load balancing, but there are certain differences between them.

IP load balancing performs much better than Nginx's reverse proxy: it only processes packets up to the transport layer and forwards them straight to the real server, without inspecting their contents any further. However, it is more complicated to configure and set up.

4. DNS load balancing

The Domain Name System (DNS) resolves domain names. A domain name is effectively an alias for a server; the actual mapping is to an IP address. Since one domain name can be configured to correspond to multiple IP addresses, DNS can also serve as a load-balancing service.
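
As a hedged sketch, round-robin DNS can be as simple as publishing several A records for the same name in a BIND zone file (the addresses below are documentation placeholders):

    ; One name resolving to several servers (round-robin DNS).
    www    IN    A    192.0.2.10
    www    IN    A    192.0.2.11
    www    IN    A    192.0.2.12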

This load-balancing strategy is simple to configure and performs excellently. However, forwarding rules cannot be freely defined, changing a mapped IP address or removing a failed machine is troublesome, and DNS changes take time to propagate.

5. DNS/GSLB load balancing

The Content Delivery Network (CDN) we use all the time is actually a further step built on mapping one domain name to multiple IP addresses. Through GSLB (Global Server Load Balancing), the domain name is mapped to IP addresses according to specified rules; generally, the IP address geographically nearest to the user is returned, to reduce the cost of hops between routing nodes during network transmission.

The actual resolution process is that the local DNS (LDNS) first obtains from a root name server the name server for the top-level domain (for example, .com), then obtains the authoritative DNS of the specified domain name, and finally gets the IP address of the actual server.

In a Web system, a CDN is generally used to speed up the loading of large static resources (HTML/JS/CSS/images, etc.), so that content heavily dependent on network download sits as close to users as possible, improving the user experience.

For example, when I request an image on imgcache.gtimg.cn (Tencent's self-built CDN does not use the qq.com domain name, in order to prevent HTTP requests from carrying extra cookie information), the IP address I get is 183.60.217.90.

This approach, like the previous DNS load balancing, not only provides excellent performance, but also supports multiple policies. However, set-up and maintenance costs are very high. First-tier Internet companies establish their own CDN services, while small and medium-sized companies generally use the CDN provided by a third party.

Building and Optimizing Caching Mechanisms in the Web System

Now that we have covered the external network environment of the Web system, let’s focus on the performance of our Web system itself. Our Web sites will encounter many challenges as traffic increases, and solving these problems is not just about adding more machines, but building and using appropriate caching mechanisms.

In the beginning, our Web system architecture might be this simple: a single machine doing everything.

Let’s start with the basic data store.

I. Using MySQL's internal caches

Let's look at MySQL's caching mechanisms from the inside out. The following focuses on the most common storage engine, InnoDB.

1. Build appropriate indexes

The simplest measure is to build indexes. When tables are large, indexes allow fast data retrieval, but they come at a cost. First, they occupy a certain amount of disk space; composite indexes are the most extreme case and should be used with caution, since they can grow even larger than the source data. Second, insert/update/delete operations become slower after an index is built, because the index itself must also be updated. Of course, in practice our systems are dominated by SELECT queries, so indexes still deliver a significant performance boost.
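
As a hedged illustration against a hypothetical user table, index creation and verification might look like this:

    -- Hypothetical table and columns, for illustration only.
    ALTER TABLE user ADD INDEX idx_name (name);            -- speeds up lookups by name
    ALTER TABLE user ADD INDEX idx_name_city (name, city); -- composite: larger on disk, use with caution
    EXPLAIN SELECT id FROM user WHERE name = 'foo';        -- verify the index is actually used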

2. Database connection thread pool cache

If every database operation request had to create and destroy its own connection, the overhead for the database would be huge. To reduce it, thread_cache_size can be configured in MySQL to indicate how many threads are kept for reuse. When there are not enough threads, new ones are created; when there are too many idle threads, they are destroyed.

There is actually a more radical approach: pconnect (persistent database connections), where a connection, once created, is kept alive long-term. However, with high traffic and many machines, this usage tends to cause "database connection exhaustion," because connections keep being established and are never reclaimed, until the database's max_connections (maximum number of connections) is reached. Therefore, long connections usually call for a "connection pool" service between the CGI processes and MySQL, to keep CGI machines from "blindly" creating connections.

There are many ways to implement this; in PHP, I recommend Swoole, a network communication extension for PHP.
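
A hedged sketch of a coroutine connection pool using Swoole's built-in PDOPool (available since Swoole 4.4.13); the host, credentials, and pool size below are placeholders:

    <?php
    use Swoole\Database\PDOConfig;
    use Swoole\Database\PDOPool;

    Swoole\Coroutine\run(function () {
        $pool = new PDOPool(
            (new PDOConfig())->withHost('127.0.0.1')->withPort(3306)
                ->withDbName('app')->withUsername('web')->withPassword('secret'),
            16  // at most 16 connections, kept alive and reused across coroutines
        );
        $pdo = $pool->get();                          // borrow a connection from the pool
        $row = $pdo->query('SELECT NOW()')->fetch();
        $pool->put($pdo);                             // return it instead of destroying it
    });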

3. InnoDB cache settings (innodb_buffer_pool_size)

innodb_buffer_pool_size sets the size of an in-memory cache that holds InnoDB indexes and data. If the machine is dedicated to MySQL, 80% of its physical memory is the commonly recommended value. It reduces disk I/O when fetching table data; generally, the larger the value, the higher the cache hit ratio.
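
A hedged my.cnf sketch combining this with the thread cache above, assuming a machine dedicated to MySQL with 32 GB of RAM:

    [mysqld]
    thread_cache_size       = 64    # threads kept alive for reuse rather than create/destroy
    innodb_buffer_pool_size = 26G   # roughly 80% of physical memory for data and indexes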

4. Sharding databases, splitting tables, and partitioning.

A MySQL table can generally sustain data on the order of millions of rows; when the volume grows beyond that, performance drops noticeably. So when we predict the data volume will exceed this level, sharding the database, splitting tables, or partitioning is recommended. The best practice is to design the service for sharded database and table storage from the very beginning, fundamentally eliminating the mid- and late-stage risk. This does sacrifice some convenience, such as queries that span shards, and it increases maintenance complexity. But once the data reaches tens of millions of rows or more, it is all worth it.

II. Building the MySQL database service

A single MySQL machine is in fact a high-risk single point: if it dies, our Web service is unavailable. Moreover, as Web system traffic keeps increasing, one day we find that one MySQL server can no longer support us and we need more MySQL machines. Introducing multiple MySQL machines brings many new problems.

1. Set up the primary and secondary MySQL databases for backup

This approach exists purely to solve the "single point of failure" problem: when the master library fails, we switch to the slave library. However, it is a bit wasteful, because the slave library sits effectively idle.

2. Read/write separation: write to the primary database, read from the secondary.

The two databases perform read/write separation: the master library handles write operations, while the slave library handles reads. If the master fails, read operations are unaffected; meanwhile, all reads and writes can be temporarily switched to the slave library (watch the traffic, though, since a heavy load could bring the slave library down as well).

3. Master-master mutual backup.

The two MySQL servers are each other's slave as well as master. This scheme both spreads the traffic pressure and solves the "single point of failure" problem: when either one fails, there is another set of services available.

However, this scheme only works with two machines. If the business is still growing fast, you can split it by business line and set up multiple master-master pairs.

III. Data synchronization between MySQL machines

Every time we solve one problem, new problems are born on top of the old solution. With multiple MySQL servers, data between the libraries is very likely to lag during peak business hours. Network and machine load also affect synchronization delay. In one special scenario with daily visits approaching 100 million, we have seen the slave database take days to catch up with the master. In that state, the slave library is basically useless.

So solving the synchronization problem is the next point we need to focus on.

1. MySQL's built-in multithreaded synchronization

Starting with MySQL 5.6, multithreaded synchronization between master and slave libraries is supported. However, the limitation is quite obvious: it can only parallelize per database (schema). MySQL synchronizes data via the binlog, and the operations the master writes to the binlog are sequential; in particular, SQL operations that modify table structure affect subsequent SQL statements. Therefore, synchronization within a single database must go through a single process.

2. Implementing binlog parsing yourself, with multithreaded writes.

Parsing the binlog and synchronizing data at the granularity of individual tables makes synchronization more efficient. However, if there are structural relationships or data dependencies between tables, write-ordering issues arise. This approach is therefore suited to stable, relatively independent data tables.

Most of China's first-tier Internet companies speed up data synchronization this way. A more radical approach is to parse the binlog and write directly, ignoring table boundaries. However, this is complex to implement and even narrower in scope: it can only be used for databases with special, scenario-specific characteristics (no table structure changes, no data dependencies between tables, and so on).

IV. Building a cache between the Web server and the database

In fact, solving the high-traffic problem is not only a database-level concern. By the 80/20 rule, 80% of requests focus on only 20% of the data, the hot data. So we should build a caching mechanism between the Web server and the database. The cache can live on disk or in memory; either way, it blocks most hot-data queries before they reach the database.

1. Make the page static

When a user visits a page, most of its content may stay unchanged for a long time. A news article, for example, rarely changes once published. So the static HTML page generated by CGI is cached on the Web server's local disk. Except for the first visit, which goes through the dynamic CGI database query, the local disk file is returned to the user directly.
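
A hedged PHP sketch of this mechanism; the cache path, the one-hour TTL, and the render_news() helper (a hypothetical stand-in for the CGI that queries the database) are assumptions:

    <?php
    $id        = (int)($_GET['id'] ?? 0);
    $cacheFile = "/data/static/news_{$id}.html";

    if (is_file($cacheFile) && time() - filemtime($cacheFile) < 3600) {
        readfile($cacheFile);               // cache hit: serve the static copy from local disk
        exit;
    }

    $html = render_news($id);               // hypothetical: query MySQL and build the HTML
    file_put_contents($cacheFile, $html);   // save the static copy for subsequent visitors
    echo $html;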

This seemed perfect when the Web system was small. But once it grows large, say to 100 Web servers, every cached file has 100 copies across these disks, which wastes resources and is hard to maintain. At this point some people think: why not centralize the storage on one server? The caching approach below does exactly that.

2. Single memory cache

From the page staticization example, we can see that a "cache" built in the Web machine's own memory is hard to maintain and causes more problems (although PHP's APC extension does allow key/value manipulation of the Web server's local memory). Therefore, the in-memory caching service we build must be a separate service as well.

For the in-memory cache, the main choices are Redis and Memcached. In performance there is not much difference between them; in feature richness, Redis is superior.
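
As a hedged sketch of how a Web machine might use such a service, here is the classic cache-aside pattern using the phpredis extension; the key format and five-minute TTL are assumptions:

    <?php
    function getUser(Redis $redis, PDO $db, int $id): ?array {
        $key = "user:{$id}";
        if (($cached = $redis->get($key)) !== false) {
            return json_decode($cached, true);            // hot data served from memory
        }
        $stmt = $db->prepare('SELECT * FROM user WHERE id = ?');
        $stmt->execute([$id]);
        $user = $stmt->fetch(PDO::FETCH_ASSOC) ?: null;
        if ($user !== null) {
            $redis->setex($key, 300, json_encode($user)); // populate the cache on a miss
        }
        return $user;
    }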

3. Memory cache cluster

Once we build a single in-memory cache, we face a single point of failure again, so we have to turn it into a cluster. A simple way is to add a slave as a backup machine. But what if requests are really heavy and we find the cache hit ratio low, requiring more machine memory? Then we would rather configure it as a proper cluster, for example something like Redis Cluster.

In a Redis Cluster, the Redis nodes form master-slave groups with each other, and every node can accept requests, which makes cluster expansion convenient. A client can send a request to any node; if that node is "responsible" for the content, it returns it directly. Otherwise, it looks up which node actually owns the data, tells the client that node's address, and the client requests again.

This is transparent to the client that uses the caching service.

There is some risk when switching memory cache services. In the process of switching from cluster A to cluster B, cluster B must be "warmed up" in advance (the hot data in cluster B's memory should match cluster A's as closely as possible; otherwise, at the moment of the switch, a large volume of requests will miss in cluster B's cache, and the traffic will hit the back-end database directly, quite possibly bringing the database down).

4. Reduce database “writes”

All of the mechanisms above reduce database "reads," but writes are also a big source of pressure. Write operations cannot simply be dropped, but the number of requests can be reduced by merging them. For that, we need a change-synchronization mechanism between the in-memory cache cluster and the database cluster.

First, the change takes effect in the cache, so that external queries still look normal; the SQL changes are then stored in a queue. When the queue fills up, or at fixed intervals, they are merged into a single request that updates the database.
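
A hedged write-behind sketch of such a mechanism; the queue name, batch size, and table layout are assumptions:

    <?php
    function queueWrite(Redis $redis, int $id, int $score): void {
        $redis->hSet("user:{$id}", 'score', $score);  // the change takes effect in the cache first
        $redis->rPush('pending_writes', json_encode(['id' => $id, 'score' => $score]));
    }

    // A separate worker runs this when the queue fills up or on a timer.
    function flushWrites(Redis $redis, PDO $db): void {
        $merged = [];
        while (($item = $redis->lPop('pending_writes')) !== false) {
            $w = json_decode($item, true);
            $merged[$w['id']] = $w['score'];          // later writes to the same row win
        }
        $stmt = $db->prepare('UPDATE user SET score = ? WHERE id = ?');
        foreach ($merged as $uid => $score) {
            $stmt->execute([$score, $uid]);           // one merged pass instead of many small writes
        }
    }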

Besides improving write performance through architectural changes, MySQL itself can adjust its disk-write policy through the innodb_flush_log_at_trx_commit parameter. And if machine cost permits, the problem can be attacked at the hardware level with traditional RAID (Redundant Array of Independent Disks) storage or newer solid-state drives (SSDs).
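
As a hedged example of that parameter's durability/throughput trade-off:

    [mysqld]
    innodb_flush_log_at_trx_commit = 2   # write the log at each commit, fsync it once per second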

5. NoSQL storage

Whether for reads or writes, when traffic grows further the database eventually reaches its limits. Continuing to add machines costs a lot and does not truly solve the problem. At this point, for part of the core data, consider a NoSQL database. Redis, besides being an in-memory cache, can also serve as storage, since it can persist its data to disk.

In this way, we separate some frequently read and written data from the database into our new Redis storage cluster, further reducing pressure on the original MySQL database. And because Redis is a memory-level cache, read and write performance improve greatly.

China's first-tier Internet companies adopt many solutions similar to the above in their architectures. However, the cache service they use is not necessarily Redis: they have richer alternatives, and some even develop their own NoSQL services according to their business characteristics.

6. Empty node query problem

Once we have built all the services above, we may think the Web system is strong. But as we keep saying, new problems will come. Empty-node queries are requests for data that does not exist in the database at all. For example, if I request the profile of a non-existent person, the system searches level by level through every cache layer, finally reaches the database itself, concludes "not found," and returns that to the front end. Because every cache layer misses, such a request is very resource-intensive, and a flood of empty-node queries can hammer the system's services.

In my previous work, I suffered from this firsthand. So, to keep the Web system stable, a well-designed empty-node filtering mechanism is necessary.

Our approach was to design a simple record map: a marker for every existing record is kept in the in-memory cache, so that empty-node queries are blocked at the cache layer.
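
A hedged sketch of such a record map, kept as a Redis set of existing IDs; the set name is an assumption, the set must be maintained elsewhere (sAdd on insert, sRem on delete), and getUser() is the cache-aside helper sketched earlier:

    <?php
    function getUserSafe(Redis $redis, PDO $db, int $id): ?array {
        if (!$redis->sIsMember('user_ids', (string)$id)) {
            return null;                      // empty node: blocked before any deeper lookup
        }
        return getUser($redis, $db, $id);     // fall through to the cache-aside lookup above
    }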

Remote deployment (geographically distributed)

With that architecture in place, is our system powerful enough? Of course not; there is no limit to optimization. Although the Web system looks more powerful on the surface, the experience it gives users is not necessarily the best, because a user in northeast China visiting a Web site hosted in Shenzhen will still feel the network distance as slowness. At this point, we need remote deployment to bring the Web system closer to its users.

I. Core concentration and node dispersion

Anyone who has played large-scale online games knows that they have many zones, usually divided by region, for example a Guangdong zone and a Beijing zone. If a player in Guangdong goes to play in the Beijing zone, he will feel noticeably more lag than in the Guangdong zone. In fact, the zone names already indicate where their servers are located, so when a Guangdong player connects to a Beijing server, of course the network is slower.

When a system or service grows large enough, you have to start considering remote deployment: bring the service as close to its users as possible. We mentioned earlier that static Web resources can be stored on a CDN and distributed "all over the country" via DNS/GSLB. But a CDN only solves the static-resource problem; it does not solve the problem that the huge back-end services are still concentrated in one fixed city.

This is where remote deployment begins. It generally follows this rule: core concentrated, nodes dispersed.

1. Core concentration: in practice there are always some data and services that cannot be deployed in multiple copies, or whose multi-copy deployment cost is enormous. For these, a single set is still maintained, deployed in a regional center, communicating with each node over internal dedicated lines.

2. Node dispersion: some services are deployed in multiple copies, distributed across nodes in different cities, so that users can access the nearest node.

For example, we chose Shanghai as the core node, with Beijing, Shenzhen, Wuhan, and Shanghai as the dispersed nodes (Shanghai itself also serves as a dispersed node), and organized the service architecture accordingly.

It should be added that the Shanghai dispersed node and the core node share the same machine room, while each of the other dispersed nodes has its own machine room.

Many large online games in China roughly follow the structure above. They put the user's core account data, which is small, in the core node, while most online-game data, such as equipment, tasks, and other data and services, live in the regional nodes. Of course, there is also a caching mechanism between the core node and the regional nodes.

II. Node disaster recovery and overload protection

Node disaster recovery means establishing a mechanism that keeps services available when a node fails. The most common method is undoubtedly switching to a nearby city node: if the system's Tianjin node fails, we switch its network traffic to the nearby Beijing node. For load-balancing reasons, traffic may need to be switched to several nearby geographical nodes simultaneously. The core node, on the other hand, needs its own disaster recovery and backup; once a core node fails, the nationwide service is affected.

Overload protection means that when a node reaches its maximum capacity, it cannot accept more requests, and the system must have a protection mechanism for this. If a fully loaded service keeps receiving new requests, the result may be a crash that takes down the whole node's service. To keep the service normal for at least the majority of users, overload protection is necessary.

To implement overload protection, there are generally two directions:

1. Reject new requests. When the load is full, the system stops accepting new connection requests; the login queues in online games are an example.

2. Divert traffic to other nodes. This is more complex to implement and itself involves load balancing.

Summary

As access scale grows, a Web system gradually expands from a single server into a large "behemoth" of a cluster. Scaling the Web system up is really a process of solving problems: at each stage different problems are solved, and new problems are born on top of old solutions.

There are no limits to system optimization, and software and system architectures have been evolving rapidly. New solutions solve old problems, but also bring new challenges.