How well you understand the network request process matters for two reasons. First, it demonstrates your expertise. Second, a deep understanding lets you design a more suitable and reliable architecture for large sites. DNS is the starting point of all this, so this article uses a common architecture diagram to walk through the process.

Deployment architecture

For large web services, a typical deployment architecture is shown below. Let me walk through it first.

So why is it structured this way? First, the client obtains the server's IP address (actually the IP address of the LB) through DNS. DNS-level load balancing can already happen at this layer, and requests for static site resources go to the CDN instead. How DNS and the CDN connect is explained in detail later. When the request reaches the LB layer (the application-layer protocol is HTTP), load balancing is performed again (possibly with LVS or Nginx). From here there are two paths: one leads to the proxy cluster, the other goes directly to the application cluster. Why?

LB to the proxy cluster

After load balancing at the top-level LB, the request reaches the proxy machines. Instead of going straight into the application cluster, we add a proxy layer mainly so that we can perform various advanced operations in the proxy cluster.

For example: request log collection, custom caching, custom load balancing, and custom routing rules (across data centers, routing groups).
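The proxy layer's extra duties can be pictured as a small request handler. This is a minimal sketch, not the API of any real proxy: `ProxyNode`, the data-center routing key and the `fetch` callback are hypothetical, and a real proxy would also expire cache entries and key them by host as well as path.

```python
import time
import zlib
from dataclasses import dataclass, field


@dataclass
class ProxyNode:
    """Hypothetical proxy-layer node: logs, caches and routes requests."""
    routes: dict            # routing key (caller's data center) -> upstream addresses
    default_group: str      # routing key used when no rule matches
    cache: dict = field(default_factory=dict)
    access_log: list = field(default_factory=list)

    def pick_upstream(self, path, datacenter):
        # Custom routing rule: prefer upstreams in the caller's data center,
        # otherwise fall back to the default group.
        upstreams = self.routes.get(datacenter, self.routes[self.default_group])
        # Custom load balancing: a stable hash keeps one path on one upstream.
        return upstreams[zlib.crc32(path.encode()) % len(upstreams)]

    def handle(self, path, datacenter, fetch):
        # Request log collection.
        self.access_log.append((time.time(), datacenter, path))
        # Custom cache (keyed by path only in this sketch).
        if path not in self.cache:
            self.cache[path] = fetch(self.pick_upstream(path, datacenter), path)
        return self.cache[path]
```

Each concern lives in the proxy, so the application cluster behind it stays unaware of logging, caching and cross-data-center routing.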

LB to application cluster

Given all the benefits of the proxy layer, why is there a path that bypasses it? This is mainly for heavy-traffic services. The extra work done at the proxy layer lengthens the response time and adds a hop: one more round trip before the request reaches the next cluster.

Therefore, for heavy-traffic services, the external LB sends traffic directly to the application cluster, which keeps the proxy layer from being overloaded and makes responses faster.
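The two paths can be sketched as a routing decision at the external LB. This is a minimal sketch, assuming the LB decides by host name and uses round-robin within each pool; `ExternalLB`, the pool names and the host list are hypothetical and do not reflect LVS or Nginx configuration.

```python
import itertools


class ExternalLB:
    """Hypothetical external LB: heavy-traffic hosts go straight to their
    application cluster; all other hosts go through the proxy cluster."""

    def __init__(self, proxy_pool, app_pools, heavy_hosts):
        self.heavy_hosts = set(heavy_hosts)
        # One round-robin iterator per pool spreads requests evenly.
        self.proxy_rr = itertools.cycle(proxy_pool)
        self.app_rr = {host: itertools.cycle(pool) for host, pool in app_pools.items()}

    def route(self, host):
        if host in self.heavy_hosts:
            # Bypass the proxy layer: one hop fewer for heavy-traffic services.
            return next(self.app_rr[host])
        return next(self.proxy_rr)
```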

After this routing, requests eventually reach the application cluster, where an Nginx is deployed on each machine to forward requests to the corresponding service by domain name. Of course, it may not be Nginx at all; in a microservice setup, for example, it may be a sidecar proxy. The point here is only to make the flow clear, so we treat all of these as Nginx.

Services call the DB and cache through domain names, for load-balancing purposes. When a request is made, the domain name is resolved through the intranet DNS service, which returns the IP address of the intranet LB. Intranet load balancing then happens there: based on the port used with the domain name, the LB decides whether to return an IP for writes or for reads. The usual rule is single-point write and multi-point read, to ensure data consistency.
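The single-point-write, multi-point-read rule at the intranet LB can be sketched as follows. This assumes a hypothetical port convention (3306 for writes, 3307 for reads) and hypothetical addresses; `IntranetDBRouter` is illustrative, not a real product's interface.

```python
import itertools

# Hypothetical convention: the port used with the DB domain name tells the
# intranet LB whether the caller intends to write or to read.
PORT_TO_OPERATION = {3306: "write", 3307: "read"}


class IntranetDBRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        # Reads rotate across replicas; fall back to the primary if there are none.
        self.replicas = itertools.cycle(replicas or [primary])

    def resolve(self, port):
        operation = PORT_TO_OPERATION.get(port)
        if operation == "write":
            return self.primary          # single-point write
        if operation == "read":
            return next(self.replicas)   # multi-point read
        raise ValueError(f"no read/write rule for port {port}")
```

Sending every write to one primary avoids conflicting concurrent writes, while reads scale out across replicas.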

That is the general process. Next, let's look at how DNS and CDN work in detail.

How does DNS implement IP lookup

To explain the CDN clearly, let's first go through the DNS resolution process. There are plenty of articles on this online, but I want to explain how DNS works as I understand it.

There are four key roles in the DNS process, explained below.

DNS Resolver – a recursive resolver that receives domain name resolution requests from clients and issues the DNS queries on their behalf. The client does not have to perform the intermediate queries itself; it simply waits for the resolver to return the final IP address for the domain name.

Root Server – the first stop when translating a name into an IP address. The root server does not store specific domain-to-IP mappings; it acts like an index server and tells the resolver which TLD server to query next.

TLD Server – the top-level domain server, the second step of the lookup. It tells the DNS resolver the address of the domain's authoritative name server.

Authoritative Server – the authoritative name server holds the records for complete host names within a domain, for example www.example.com, and stores the IP address that each specific host name maps to.

Here’s what each of the 10 steps is doing.

  1. A user enters example.com in the browser, and a DNS query is generated and sent to the DNS resolver.
  2. The resolver queries a root server.
  3. The root server returns the address of the TLD server, and the query is redirected to the top-level domain server, in this case the .com server.
  4. The recursive resolver sends a request to the .com server.
  5. On receiving the request, the TLD server returns the address of the authoritative server for example.com.
  6. The recursive resolver sends another query to the authoritative server, which looks up its own mapping table to get the IP address.
  7. The authoritative server returns the IP address to the DNS resolver.
  8. The DNS resolver returns the IP address to the browser, which uses it to establish a connection and send requests.
  9. Using this IP address, the client sends an HTTP request.
  10. The server processes the request and returns the data to the browser.
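The resolver's side of steps 2 to 8 can be simulated with in-memory tables standing in for the servers. This is a toy sketch: the tables, the server names and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and a real resolver exchanges DNS messages over the network instead of reading dictionaries.

```python
# Hypothetical in-memory stand-ins for the three kinds of DNS servers.
ROOT = {"com.": "tld-com"}                                      # TLD -> TLD server
TLD_SERVERS = {"tld-com": {"example.com.": "ns1.example.com"}}  # zone -> authoritative server
AUTHORITATIVE = {
    "ns1.example.com": {"example.com.": "192.0.2.10", "www.example.com.": "192.0.2.11"},
}


def resolve(name, trace):
    """Recursive resolver: walks root -> TLD -> authoritative on the client's
    behalf and returns only the final IP; the client never sees the referrals."""
    fqdn = name.rstrip(".") + "."
    labels = fqdn.split(".")          # "www.example.com." -> ["www", "example", "com", ""]
    tld = labels[-2] + "."            # "com."
    zone = ".".join(labels[-3:])      # "example.com."
    tld_server = ROOT[tld]            # steps 2-3: root refers us to the TLD server
    trace.append(("root", tld_server))
    auth_server = TLD_SERVERS[tld_server][zone]  # steps 4-5: TLD refers us to the authoritative server
    trace.append(("tld", auth_server))
    ip = AUTHORITATIVE[auth_server][fqdn]        # steps 6-7: authoritative server answers
    trace.append(("authoritative", ip))
    return ip                         # step 8: the resolver hands the IP to the browser
```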

Note that caching exists at nearly every step above. For example:

  • The browser caches DNS results (in Chrome, see chrome://net-internals/#dns)
  • The operating system's DNS module caches results
  • Each subsequent level also has its own cache

So most of the time, resolution does not have to go through all eight steps in sequence. This works the same way as layered caching in our own application services: on a cache hit, the cached result is returned; on a miss, the full process runs.
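The layered caching above can be sketched as a chain of caches, each answering from its own entries while they are fresh and otherwise asking the next layer. This is a minimal sketch that assumes a fixed TTL per layer (real DNS caches honour the TTL carried in each record); `TTLCache` and the layer names are hypothetical.

```python
import time


class TTLCache:
    """One DNS cache layer (browser, OS, resolver...)."""

    def __init__(self, name, upstream, ttl_seconds, clock=time.monotonic):
        self.name = name
        self.upstream = upstream      # callable: domain -> ip (the next layer)
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}             # domain -> (ip, expires_at)

    def lookup(self, domain):
        now = self.clock()
        hit = self.entries.get(domain)
        if hit and hit[1] > now:
            return hit[0]             # fresh hit: the rest of the chain is skipped
        ip = self.upstream(domain)    # miss or expired: ask the next layer
        self.entries[domain] = (ip, now + self.ttl)
        return ip
```

Chaining a browser layer in front of an OS layer in front of the full resolution process reproduces the behaviour described: most lookups stop at the first layer that still holds a fresh entry.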

DNS record types

DNS supports a variety of record types; here are three common ones.

A record – maps a domain name to the IP address of a host. Using A records for load balancing (several IPs for one name) requires support from your DNS provider.

CNAME record – sets an alias for a host name. Its value cannot be an IP address, only another host name. CDNs rely mainly on this record. A CNAME should not coexist with other records on the same name; if an A record is also present, the A record is used and the CNAME does not take effect.

NS record – delegates a subdomain to another authoritative server. This record applies only to subdomains, and its value is the host name of the authoritative server that will answer for that subdomain. Note that it takes precedence over an A record for the same subdomain: the resolver follows the delegation and asks the delegated server instead.
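How a resolver follows a CNAME until it reaches an A record can be sketched over a toy zone table. The zone contents, including the `cdn.example.net` alias target, are hypothetical, and the sketch handles only A and CNAME records; it returns a list because one name may carry several A records.

```python
# Hypothetical zone data: name -> list of (record_type, value).
ZONE = {
    "www.example.com.": [("CNAME", "www.example.com.cdn.example.net.")],
    "www.example.com.cdn.example.net.": [("A", "198.51.100.7")],
    "api.example.com.": [("A", "203.0.113.5")],
}


def resolve_a(name, zone, max_hops=8):
    """Follow CNAME aliases until A records are found."""
    for _ in range(max_hops):
        records = zone.get(name, [])
        a_records = [value for rtype, value in records if rtype == "A"]
        if a_records:
            # An A record answers directly; a CNAME at the same name is ignored.
            return a_records
        cname = next((value for rtype, value in records if rtype == "CNAME"), None)
        if cname is None:
            raise LookupError(f"no A or CNAME record for {name}")
        name = cname  # the alias is just another domain name: resolve it next
    raise LookupError("CNAME chain too long (possible loop)")
```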

With the DNS steps covered, let's move on to the CDN.

CDN access acceleration

What is a CDN? CDN stands for content delivery network. See the figure.

Without a CDN, users must go to our data center for data wherever they are (the plain DNS process). With a CDN, users fetch data from the nearest cache data center based on their geographic location, instead of going to the origin (application server) every time. To understand this, we need to see how the CDN fits into the full DNS process.

Now we need to answer two questions.

  1. What benefits does a CDN bring?
  2. How does a request get resolved to the CDN?

Benefits of CDN

Before looking at how it works, it helps to know what it does and what it offers. A CDN has the following benefits.

Improve page loading speed

This is one of the most obvious advantages. As the figure above shows, users access the nearest machine, which is usually the fastest. The faster a site loads, the better the user experience, and the more users will like the site. How nearest-node access is achieved is explained in the section on how CDNs work below.

Add redundancy to the content

A CDN is a typical distributed architecture. Adding data redundancy means, on the one hand, that multiple servers can serve the same data under heavy traffic and, on the other hand, that traffic fails over automatically when some machines go down.

Save bandwidth

If you've ever bought cloud services, you know that every extra bit of bandwidth costs more. With a CDN, traffic is offloaded, so the origin's bandwidth requirement naturally drops. Of course, while origin bandwidth costs fall, you also have to pay for the CDN.
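As a rough worked example: if a fraction h of requested bytes is served from CDN caches (the byte hit ratio), only the misses are fetched back from the origin, so the origin's bandwidth is approximately

$$B_{\text{origin}} \approx (1 - h)\,B_{\text{total}}$$

With hypothetical numbers, a peak of B_total = 10 Gbps and h = 0.9 leave the origin serving about

$$B_{\text{origin}} \approx (1 - 0.9) \times 10\ \text{Gbps} = 1\ \text{Gbps}$$

while the remaining 9 Gbps is delivered (and billed) by the CDN.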

Ensuring service security

A CDN helps protect against DDoS attacks, which flood your bandwidth with massive traffic to take the service down. Because the CDN spreads this traffic across its nodes, the pressure on the origin is naturally lower. This is also a consideration for high-concurrency systems.

Today, CDNs can cache not only static HTML, CSS, JS and video; some can also cache dynamic API responses, which gives us more options when building high-concurrency services.

How CDN works

The DNS section showed how clients obtain IP addresses. So how does a CDN fit into this process?

A CDN acts as a cache layer between the application server and the user. So if DNS returns the IP address of a CDN machine to the client instead of the application's IP, the client will naturally go to the CDN machine.

To achieve this, we configure a CNAME for the domain name (mind the precedence between CNAME and A records described earlier). How is this CNAME then resolved to a CDN machine? The process is the same as ordinary DNS resolution: when the resolver finds that a domain name has a CNAME, it goes on to resolve the CNAME target (which is simply another domain name). This target is resolved by a global load-balancing DNS, which returns the IP of a CDN machine based on the visitor's geographic location. So the client actually gets the IP address of the nearest CDN machine.
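The "nearest CDN machine" choice made by the global load-balancing DNS can be sketched as a distance minimisation. This assumes the visitor's coordinates are already known (real systems infer location from the querying resolver's IP and also weigh node load and health); the node names, coordinates and addresses are hypothetical.

```python
import math

# Hypothetical CDN edge nodes: name -> (latitude, longitude, IP).
EDGE_NODES = {
    "edge-beijing": (39.9, 116.4, "198.51.100.10"),
    "edge-shanghai": (31.2, 121.5, "198.51.100.20"),
    "edge-frankfurt": (50.1, 8.7, "198.51.100.30"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def gslb_answer(client_lat, client_lon, nodes=EDGE_NODES):
    """Answer the A query for the CNAME target with the nearest edge node's IP."""
    nearest = min(nodes, key=lambda n: haversine_km(client_lat, client_lon, *nodes[n][:2]))
    return nodes[nearest][2]
```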

What if the user reaches the CDN, but the CDN does not have the requested content? In that case, the CDN machine uses its own dedicated DNS resolution service to look up the origin's IP from the domain name, requests the data from the origin, and caches it locally for later use. It then returns the result to complete the request.

Note that CDNs are also layered. The nodes closest to users are called edge nodes, and the CDN's central server cluster is called the level-2 (secondary) cache. Behind them sits the origin, where the application is deployed. In general, an edge node goes to the secondary cache when it does not have the data, and the secondary cache goes to the origin when it does not have the data (known as back-to-origin).
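The edge, level-2 and origin hierarchy can be sketched as caches that each fall back to their parent on a miss and keep a copy of what they fetch. This is a minimal sketch with no expiry or eviction; `CacheTier`, the tier names and the `origin` callback are hypothetical.

```python
class CacheTier:
    """One CDN layer. On a miss it asks its parent tier (or the origin)
    and keeps a copy so later requests are served locally."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent   # another CacheTier, or a callable standing in for the origin
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.store:
            self.hits += 1
            return self.store[path]
        self.misses += 1
        if isinstance(self.parent, CacheTier):
            body = self.parent.get(path)   # edge miss: ask the level-2 cache
        else:
            body = self.parent(path)       # level-2 miss: back-to-origin fetch
        self.store[path] = body
        return body
```

With two edge nodes sharing one level-2 cache, the first request for a file reaches the origin once; a second edge node's miss is then absorbed by the level-2 cache instead of going back to the origin.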

Summary

This article mainly covers the DNS resolution flow and leaves out the underlying protocols and transport details. CDN is also a performance technique we use often, and it will come up again in later articles on flash-sale (seckill) systems. In particular, the CDN's distributed design, and the reasoning behind it, is a valuable reference when designing application architecture.

My official account is dayuTalk