You probably know what the DNS protocol is, but do you know the full DNS query process? Is it based on TCP or UDP, and which parts of DNS does each protocol handle?

Introduction

This article walks through the DNS protocol step by step: what it is, what its complete query process looks like, and whether it is built on UDP or TCP.

  • What is the DNS protocol?
  • Domain structure
  • Domain name cache optimization
  • DNS query modes
  • Complete DNS query process
  • Why are DNS queries based on UDP instead of TCP?

What is the DNS protocol?

The Domain Name System (DNS), like HTTP, FTP, and SMTP, is an application-layer protocol. It resolves the host name (domain name) provided by a user into an IP address.

Simply put, DNS is like an automatic telephone directory. We could reach the other party directly at 47.105.127.0, but a number like that is hard to record and remember. DNS lets us reach the same party at www.pzijun.cn instead.

[Figure: the domain name www.baidu.com is resolved into an IP address, 1.1.1.1] Think about it 🤔

  • How does DNS resolve a domain name into an IP address?

Domain structure

The core of DNS is a distributed service organized as a three-tier tree, which basically mirrors the structure of a domain name:

  • Root DNS server: manages the top-level DNS servers and returns the IP addresses of top-level DNS servers such as com, net, and cn
  • Top-level DNS server: manages the domain names under its own top-level domain. For example, the com top-level DNS server can return the IP address of the apple.com DNS server
  • Authoritative DNS server: manages the IP addresses of the hosts in its own domain. For example, the apple.com authoritative DNS server can return the IP address of www.apple.com

With this system 👆, any domain name can be resolved by querying the tree from top to bottom. For example, to access “www.pzijun.cn”, you need to perform the following three queries:

  • Query the root DNS server, which tells you the address of the “cn” top-level DNS server;
  • Query the “cn” top-level DNS server, which tells you the address of the “pzijun.cn” DNS server;
  • Finally, query the “pzijun.cn” DNS server, which gives you the address of “www.pzijun.cn”.
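The three queries can be sketched with a toy model of the tree. This is only an illustration: the server names are made up, the dictionaries stand in for real servers, and the final address simply reuses the 47.105.127.0 from the telephone-directory example.

```python
# Toy model of the three-tier DNS tree. The server names are hypothetical
# placeholders and the dictionaries stand in for real DNS servers.
ROOT = {"cn": "cn-tld-server"}  # the root knows the top-level servers
TLD_SERVERS = {"cn-tld-server": {"pzijun.cn": "pzijun-auth-server"}}
AUTH_SERVERS = {"pzijun-auth-server": {"www.pzijun.cn": "47.105.127.0"}}


def resolve(hostname):
    """Walk root -> top-level -> authoritative, recording each query."""
    labels = hostname.split(".")
    tld = labels[-1]               # e.g. "cn"
    zone = ".".join(labels[-2:])   # e.g. "pzijun.cn"
    steps = []

    tld_server = ROOT[tld]                       # query 1: ask the root
    steps.append(("root", tld_server))

    auth_server = TLD_SERVERS[tld_server][zone]  # query 2: ask the TLD server
    steps.append((tld_server, auth_server))

    ip = AUTH_SERVERS[auth_server][hostname]     # query 3: ask the zone's server
    steps.append((auth_server, ip))
    return ip, steps


ip, steps = resolve("www.pzijun.cn")
for asked, answer in steps:
    print(f"asked {asked}: got {answer}")
```

Each step only learns where to ask next, which is why the answer always takes three hops when nothing is cached.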

But if all of the world's domain name resolution were squeezed through this system, it would be congested, or at the very least resolution would slow down considerably. DNS therefore adopts a very effective remedy: caching.

Domain name cache optimization

Domain names are cached in two places:

  • Non-authoritative DNS server (local DNS server) cache: every major service provider or company runs its own DNS server, generally deployed close to its users. It queries the core DNS system on the user's behalf and can cache earlier query results
  • Local computer DNS record cache
    • Browser cache: after obtaining the actual IP address for a website's domain name, the browser caches it. When the same domain name is queried again, the browser uses the cached result, which saves a network request. Each browser has a fixed DNS cache time; Chrome's expiration time, for example, is 1 minute, during which no DNS request is made again for that name
    • Operating system cache: the operating system has a special “host mapping” file, usually an editable text file: /etc/hosts on Linux and C:\WINDOWS\system32\drivers\etc\hosts on Windows. If the operating system cannot find a DNS record in its cache, it looks in this file
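The expiry behaviour of such a cache can be sketched as follows. This is a minimal illustration, not how any particular browser or operating system implements it; the class name `DnsCache` and the 60-second default (echoing the Chrome figure above) are assumptions.

```python
import time


class DnsCache:
    """Tiny TTL cache: maps a domain name to (ip, expiry time)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self.entries = {}

    def put(self, name, ip):
        """Store the IP and remember when the entry stops being valid."""
        self.entries[name] = (ip, self.clock() + self.ttl)

    def get(self, name):
        """Return the cached IP, or None if it is missing or has expired."""
        entry = self.entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[name]  # expired: the caller must query again
            return None
        return ip
```

A `get` that returns None is the signal to fall through to the next layer (the operating system, then the local DNS server).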

DNS Query Mode

DNS queries can be made in two ways: recursive and iterative.

In general, the DNS server configured on a DNS client is a recursive server, which is responsible for handling the client's DNS query in full until the final result is returned.

The iterative query mode is used between DNS servers (for example, between the local DNS server and the root DNS server) to avoid putting excessive pressure on the root DNS server.

Recursive query

Iterative query

Complete DNS query process

Combining the query modes between DNS servers with the domain name caches described above, the full query process is 👇:

  1. The browser first searches its own DNS cache, which maintains a table of domain names and IP addresses
  2. If there is no match 😢, the search continues in the operating system's DNS cache
  3. If there is still no match 🤦, the operating system sends the domain name to the local DNS server. The local DNS server searches its DNS cache and returns the result if it finds one. (Note: the query between the host and the local DNS server is recursive.)
  4. If the local DNS server's cache has no match either 🤦, the local DNS server queries the upper-level DNS servers as follows (note: queries between the local DNS server and other DNS servers are iterative, to prevent excessive pressure on the root DNS server):
    • First, the local DNS server sends a request to the root DNS server. The root DNS server sits at the highest level. It does not return the IP address of the domain name directly, but returns the IP address of the top-level DNS server
    • After the local DNS server obtains the IP address of the top-level DNS server, it sends that server a request to obtain the IP address of the domain's name server
    • The local DNS server then sends a request to that name server and finally obtains the IP address corresponding to the domain name
  5. The local DNS server returns the IP address to the operating system and caches the IP address 📝
  6. The operating system returns the IP address to the browser and caches the IP address itself 📝
  7. At this point, the browser has the IP address corresponding to the domain name and caches it 📝
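The layered lookup above can be sketched in a few lines. This is a simplified model: the caches are plain dictionaries with no expiry, and `iterative_resolve` is a hypothetical callable standing in for the root, top-level, and authoritative queries.

```python
def lookup(name, browser_cache, os_cache, local_dns_cache, iterative_resolve):
    """Check each cache layer in order; fall back to an iterative query.

    Returns (ip, source), where source names the layer that answered.
    """
    if name in browser_cache:
        return browser_cache[name], "browser"

    if name in os_cache:
        ip, source = os_cache[name], "os"
    elif name in local_dns_cache:
        ip, source = local_dns_cache[name], "local-dns-cache"
    else:
        # No cache had the name: walk root -> top-level -> authoritative.
        ip, source = iterative_resolve(name), "iterative-query"
        local_dns_cache[name] = ip  # the local DNS server caches the result

    os_cache[name] = ip             # the operating system caches the result
    browser_cache[name] = ip        # the browser caches the result
    return ip, source
```

On the way back, every layer that missed stores the answer, so a repeated lookup for the same name is answered by the browser cache without any network traffic.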

These remote queries are carried over UDP and normally use port 53.

Why are DNS queries based on UDP instead of TCP?
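To make the UDP exchange concrete, here is a rough sketch that builds a DNS query message by hand and sends it as a single datagram to port 53. The function names, the fixed query ID, and the timeout are assumptions for illustration; `server` is whatever DNS server address the caller supplies.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a DNS query message asking for the A record of `hostname`."""
    # Header: ID, flags (only RD, "recursion desired", is set), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT and ARCOUNT, all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN, the Internet class).
    return header + qname + struct.pack("!HH", 1, 1)


def send_query(hostname, server, timeout=2.0):
    """Send the query as one UDP datagram to port 53; return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        reply, _ = sock.recvfrom(512)  # classic UDP DNS replies fit in 512 bytes
        return reply


query = build_query("www.pzijun.cn")
print(len(query), "bytes")  # 12-byte header + 15-byte name + 4 bytes = 31 bytes
```

The whole question fits in 31 bytes, which is why a single datagram in each direction is usually enough.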

One measure of how fast network communication feels is response time: the time from when a user issues a request (types in a web address and hits Enter) to when the user sees the complete page. That is:

Response time = DNS domain name resolution time + TCP connection establishment time + HTTP transaction time

Establishing a TCP connection takes a three-way handshake, and that cost, like the HTTP transaction itself, cannot be avoided. The DNS resolution time should therefore be kept as small as possible.

TCP

What if DNS queries used TCP connections as well? As a reliable transport protocol, TCP incurs the following additional overhead for every connection:

  • Establishing a TCP connection requires three network transmissions (the three-way handshake);
  • Establishing a TCP connection requires transmitting ~130 bytes of data;
  • Tearing down a TCP connection requires four network transmissions;
  • Tearing down a TCP connection requires transmitting ~160 bytes of data.

Ignoring the time spent on network communication, and considering only the data transmitted to establish the TCP connection plus the protocol headers of the query and response, we can do a simple calculation:

Using TCP (330 bytes in total):

  • Three-way handshake: 14×3 (Ethernet) + 20×3 (IP) + 44 + 44 + 32 (TCP) bytes
  • Query protocol headers: 14 (Ethernet) + 20 (IP) + 20 (TCP) bytes
  • Response protocol headers: 14 (Ethernet) + 20 (IP) + 20 (TCP) bytes

Note that this calculation assumes the DNS resolver needs to talk to only one name server or authoritative server to get a DNS response. In a real scenario the resolver may have to contact several name servers in turn, which multiplies the extra cost of TCP.

UDP

Using UDP (84 bytes in total):

  • Query protocol headers: 14 (Ethernet) + 20 (IP) + 8 (UDP) bytes
  • Response protocol headers: 14 (Ethernet) + 20 (IP) + 8 (UDP) bytes

If the request body and response body of a DNS query are 15 and 70 bytes respectively, then TCP adds ~250 bytes, or ~145% overhead, compared with UDP.

So when the request and response bodies are small, using TCP not only transmits more data and consumes more resources. The time cost of the extra round trips is also many times higher and cannot be neglected when DNS queries are this small. The reliability that a TCP connection brings does not play much of a role in the DNS scenario.
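The arithmetic behind these figures can be checked with a short script. It uses only the header sizes quoted above; the variable names are arbitrary, and the percentage is taken relative to the complete UDP exchange (headers plus the 15-byte and 70-byte bodies), which is the reading that reproduces the ~145% figure.

```python
ETHERNET, IP, TCP, UDP = 14, 20, 20, 8  # header sizes in bytes

# TCP: three handshake segments (TCP headers of 44, 44 and 32 bytes, as in
# the list above) plus the headers of one query and one response.
handshake = 3 * ETHERNET + 3 * IP + 44 + 44 + 32
tcp_total = handshake + 2 * (ETHERNET + IP + TCP)

# UDP: only the headers of one query and one response.
udp_total = 2 * (ETHERNET + IP + UDP)

payload = 15 + 70                        # request body + response body
extra = tcp_total - udp_total            # bytes that TCP adds
overhead = extra / (udp_total + payload)

print(tcp_total, udp_total, extra, f"{overhead:.1%}")
# prints: 330 84 246 145.6%
```

The 246 extra bytes are what the text rounds to ~250, and they are larger than the entire 169-byte UDP exchange.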

Weaknesses of UDP transport

For historical reasons, the minimum MTU of physical links on the Internet is 576 bytes. DNS messages carried over UDP are limited to 512 bytes so that the whole packet stays within 576 bytes.

As a result, once a DNS response exceeds 512 bytes, UDP-based DNS simply truncates it to 512 bytes, and the user receives an incomplete DNS response.

To overcome this, the DNS specification states explicitly that “when a DNS query is truncated, TCP should be used for retry”. The transaction may take longer, but a complete answer is better than an incomplete one.

At the same time, when the message is large enough, the share of extra overhead caused by the TCP three-way handshake becomes smaller and smaller, approaching zero relative to the overall message size.

The EDNS mechanism, specified in RFC 6891, allows up to 4096 bytes of data to be transmitted over UDP. However, because of MTU limits, this feature is unreliable: in production, once the data in a packet exceeds the MTU of the transmission link, the packet may be fragmented and the fragments dropped.
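The retry rule can be sketched as follows. Truncation is signalled by the TC bit in the flags field of the DNS header, and DNS messages sent over TCP are prefixed with a two-byte length. The helpers `send_udp` and `send_tcp` are hypothetical callables supplied by the caller; `send_tcp` is assumed to return the reply with its own length prefix already removed.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit flags field of the header


def is_truncated(message):
    """Return True if the DNS message header has the TC bit set."""
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message):
    """Prefix the message with its 2-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message


def query_with_fallback(query, send_udp, send_tcp):
    """Try UDP first; if the reply is truncated, retry the query over TCP."""
    reply = send_udp(query)
    if is_truncated(reply):
        reply = send_tcp(frame_for_tcp(query))
    return reply
```

The common case stays a single UDP round trip, and only oversized answers pay for the TCP handshake.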
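As a rough illustration of how a client announces the larger size, EDNS adds an OPT pseudo-record to the additional section of the query and reuses the record's CLASS field to carry the UDP payload size. The function name and the 4096 default are assumptions for this sketch; no EDNS options are attached.

```python
import struct


def build_opt_record(udp_payload_size=4096):
    """Build an EDNS OPT pseudo-record to append to a DNS query."""
    return struct.pack(
        "!BHHIH",
        0,                 # NAME: the root domain (a single zero byte)
        41,                # TYPE: OPT
        udp_payload_size,  # CLASS field, reused as the UDP payload size
        0,                 # TTL field: extended RCODE, version and flags, all 0
        0,                 # RDLENGTH: no options attached
    )


opt = build_opt_record()
print(len(opt), "bytes")  # 1 + 2 + 2 + 4 + 2 = 11 bytes
```

A query that carries this record must also set ARCOUNT to 1 in its header, since the OPT record counts as one additional record.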

Conclusion

It is often said that DNS uses the UDP protocol to obtain the IP address for a domain name. That is true, but it is only part of the picture. To be precise, DNS queries were designed to communicate mainly over UDP, and TCP was brought into the query path as DNS evolved:

  1. When DNS was first designed, TCP was used for zone transfers and UDP for queries, so DNS occupied port 53 on both UDP and TCP
  2. When DNS responses began to exceed the 512-byte limit, the DNS protocol made it explicit for the first time that “if a DNS query is truncated, TCP should be used to retry”
  3. The EDNS mechanism introduced later allows up to 4096 bytes of data over UDP, but MTU limits cause fragmentation and loss, which makes this feature unreliable
  4. In recent years, DNS has been redefined to support both UDP and TCP, and TCP is no longer just an option for retries

