1 Basic Knowledge of DNS

The Internet is built on TCP/IP. To make hosts on the network easier to manage, the Internet namespace is divided into domains, and each domain can be further divided into subdomains. For example, .com, .org, and .edu are top-level domains, while google.com is a subdomain of .com.

Every host on the network belongs to a domain and has its own name, called the hostname. For example, example.com is a host in the .com domain whose name is example.com (for the difference between a hostname and a domain name, see here).

Domain names and hostnames exist so that people can remember them easily, but machines ultimately communicate using IP addresses, so a service is needed to translate hostnames (domain names) into IP addresses. The Domain Name System (DNS) does just that, and the server that provides the service is called a domain name server.

For example, when you use a browser to access example.com, the browser first queries the DNS server for the IP address of example.com, then establishes a TCP connection to that IP address, and finally sends an HTTP request.

A domain name can correspond to one IP address or to multiple IP addresses. In the latter case, the DNS service selects one of the addresses to return. To achieve high availability, many network services map to multiple addresses. As we will see later, baidu.com corresponds to multiple IP addresses.

DNS access can be unstable in some scenarios, for example when the DNS server settings are incorrect, when packets are lost on the network, or when the host's DNS configuration is wrong. Let's look at a few cases.

2 Prepare the Test Environment

So that you can follow along hands-on, this article builds a container environment.

Pull the Docker image:

$ sudo docker pull alpine:3.8

Run the container. Note that the --privileged parameter [2] is required; otherwise the tc commands later in this article cannot be run:

$ sudo docker run -d --privileged --name ctn-1 alpine:3.8 sleep 3600d
$ sudo docker ps
CONTAINER ID    IMAGE         COMMAND          CREATED          STATUS           PORTS    NAMES
233bc36bde4b    alpine:3.8    "sleep 3600d"    1 minutes ago    Up 14 minutes             ctn-1

Enter the container:

$ sudo docker exec -it ctn-1 sh

/ # ifconfig eth0
Link encap:Ethernet  HWaddr 02:42:AC:11:00:09
inet addr:172.17.0.9  Bcast:0.0.0.0  Mask:255.255.0.0

3 DNS Configuration

3.1 Checking DNS Configuration

The DNS configuration on Linux is in /etc/resolv.conf. Let's look at the container's configuration first:

/ # cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.1.11
nameserver 192.168.1.12

If you run cat /etc/resolv.conf on the host, you will see the same result.

3.2 Modifying DNS Configuration

You can change the nameserver entries in /etc/resolv.conf to configure the DNS server you want. For example, an intranet environment may use its own DNS server, because that server resolves intranet domain names and also resolves public domain names faster than the network provider's public DNS server.
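Before changing anything, it can help to see which servers the file actually configures. The following is a minimal sketch, not part of the original walkthrough; the function name list_nameservers is made up for illustration, and it assumes the usual "nameserver ADDRESS" line format.

```shell
# List the DNS servers that a resolv.conf-style file actually configures.
# Lines commented out with "#" or ";" do not match and are ignored.
# Usage: list_nameservers [FILE]   (FILE defaults to /etc/resolv.conf)
list_nameservers() {
    awk '$1 == "nameserver" && $2 != "" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running list_nameservers with no argument inside the container should print the two addresses shown in section 3.1, one per line.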

4 DNS Troubleshooting

This section simulates several scenarios that make DNS queries slow or cause them to fail. If you see similar symptoms in a real environment, you can troubleshoot along these lines.

4.1 DNS Lookup Fails Because No DNS Server Is Configured on the Machine

Symptom: The network is reachable (for example, pinging an IP address works), but DNS queries always fail.
Possible cause: No DNS server is configured on the machine.
Solution: Modify /etc/resolv.conf to configure an appropriate DNS server for the host.

Sometimes a newly started machine (physical machine, VM, or container) has no DNS configured, so domain names cannot be resolved. Let's reproduce this. In a normal container, use nslookup to view the IP addresses for a domain name:

/ # nslookup example.com
Name:      example.com
Address 1: 93.184.216.34
Address 2: 2606:2800:220:1:248:1893:25c8:1946

As you can see, we have obtained an IPv4 address and an IPv6 address for the domain name.

Comment out the nameserver lines in /etc/resolv.conf with # to simulate the scenario where no DNS server is configured.
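One way to do this without an editor is sketched below. The helper name comment_out_nameservers and the backup path are illustrative assumptions. The function only prints the modified content, so the caller decides where to write it.

```shell
# Print a resolv.conf-style file with every active nameserver line
# commented out. The input file itself is not modified.
# Usage: comment_out_nameservers FILE
comment_out_nameservers() {
    sed 's/^[[:space:]]*nameserver/#&/' "$1"
}

# Example (hypothetical backup path), run inside the test container:
#   cp /etc/resolv.conf /tmp/resolv.conf.bak
#   comment_out_nameservers /tmp/resolv.conf.bak > /etc/resolv.conf
```

Writing through a redirect keeps the original file in place, which may matter in containers where /etc/resolv.conf is mounted from the host. Copying the backup over the file restores the original configuration.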

Test again:

/ # nslookup example.com

nslookup: can't resolve 'example.com': Try again

If you see this error, check whether a DNS server is configured in /etc/resolv.conf.

4.2 DNS Service Is Too Slow

Symptom: DNS queries are too slow.
Possible cause: An unsuitable DNS server is configured.
Solution: Modify /etc/resolv.conf to configure an appropriate DNS server.

Many companies maintain their own DNS servers, which resolve intranet domain names and also resolve public domain names faster.

dig is another, more powerful DNS query tool. Install it:

/ # apk update && apk add bind-tools

/ # dig example.com
...
example.com.        15814   IN  A   93.184.216.34

;; Query time: 0 msec
;; SERVER: 192.168.1.11#53(192.168.1.11)

You can see it’s very fast, within 1ms.

Next, let's test the latency when Google's public DNS server 8.8.8.8 [1] is used.

Edit /etc/resolv.conf, comment out the other nameserver lines, and add nameserver 8.8.8.8.

Test again:

/ # dig example.com
...
example.com.        15814   IN  A   93.184.216.34

;; Query time: 150 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)

The query time rose to 150 ms, more than 150 times slower than before.

Therefore, if DNS queries are very slow, check whether a suitable DNS server is configured.
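The comparison above can also be done without editing /etc/resolv.conf, because dig accepts the server to query as an @SERVER argument. The sketch below assumes dig is installed; the function names query_time_ms and compare_servers are made up for illustration.

```shell
# Extract the "Query time" value (in milliseconds) from dig output on stdin.
query_time_ms() {
    awk '/^;; Query time:/ { print $4 }'
}

# Print the query time of one name against each given DNS server.
# Usage: compare_servers NAME SERVER...
compare_servers() {
    name=$1
    shift
    for server in "$@"; do
        # +noall +stats prints only the statistics section of the reply
        ms=$(dig +noall +stats "@$server" "$name" | query_time_ms)
        if [ -n "$ms" ]; then
            echo "$server: $ms ms"
        else
            echo "$server: no response"
        fi
    done
}

# Example: compare_servers example.com 192.168.1.11 8.8.8.8
```

A single measurement can be skewed by caching on the server side, so it is worth repeating the comparison a few times before drawing conclusions.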

4.3 Hardcoded /etc/hosts Entries Cause DNS Queries to Be Skipped

Symptom: Access to a domain name is slow, a domain name always resolves to the same IP address (even though it has multiple IP addresses), or a domain name cannot be accessed from one specific machine.
Possible cause: /etc/hosts contains a hardcoded mapping between the domain name and an IP address.
Solution: Modify /etc/hosts.

As mentioned above, most public domain names correspond to multiple IP addresses.

/ # ping baidu.com
PING baidu.com (220.181.57.216): 56 data bytes
64 bytes from 220.181.57.216: seq=0 ttl=45 time=26.895 ms
64 bytes from 220.181.57.216: seq=1 ttl=45 time=26.971 ms
^C

/ # ping baidu.com
PING baidu.com (123.125.115.110): 56 data bytes
64 bytes from 123.125.115.110: seq=0 ttl=43 time=27.587 ms
64 bytes from 123.125.115.110: seq=1 ttl=43 time=27.757 ms
^C

The two ping runs (each of which looks up the IP address of baidu.com internally) resolved to different IP addresses. nslookup shows that both are IP addresses of baidu.com:

/ # nslookup baidu.com
Name: baidu.com
Address: 220.181.57.216
Name: baidu.com
Address: 123.125.115.110

In /etc/hosts, you can hardcode the IP address of a domain name directly. This causes the machine to skip the DNS query and use the hardcoded address as the IP address of the domain name. Let's verify that.

Modify /etc/hosts, add 123.125.115.110 baidu.com, and ping again:

/ # ping baidu.com
PING baidu.com (123.125.115.110): 56 data bytes
64 bytes from 123.125.115.110: seq=0 ttl=43 time=27.761 ms
^C
--- baidu.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 27.861/27.861/27.861 ms

/ # ping baidu.com
PING baidu.com (123.125.115.110): 56 data bytes
64 bytes from 123.125.115.110: seq=0 ttl=43 time=27.614 ms
^C

No matter how many times you run it, the IP address of baidu.com no longer changes. This IP address is not necessarily the best one, and it may even become unavailable, in which case access to baidu.com fails. Therefore, hardcoding entries in /etc/hosts should be avoided in practice whenever possible.
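When a domain name behaves this way on one machine only, a quick check is to look for it in /etc/hosts. The following sketch uses a hypothetical helper named hosts_override. It matches names exactly and case-sensitively, which is a simplification.

```shell
# Print every IP address that a hosts-style file hardcodes for a name.
# Prints nothing if the name is not present.
# Usage: hosts_override NAME [FILE]   (FILE defaults to /etc/hosts)
hosts_override() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip comments before looking at fields
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "${2:-/etc/hosts}"
}

# Example: hosts_override baidu.com
```

With the entry added in this section, the example call should print 123.125.115.110. Empty output means the name is not hardcoded and lookups go to DNS.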

4.4 DNS Queries Are Unstable

Symptom: DNS queries are unstable, sometimes fast and sometimes slow.
Possible cause: There are tc or iptables rules on the machine that delay or drop packets sent to the DNS server.
Solution: Modify or delete the tc/iptables rules.

Let's use tc to simulate network latency:

/ # apk add iproute2

/ # tc -p qdisc ls dev eth0

There are no rules by default.

Next, add a 600 ms delay to every packet:

/ # tc qdisc add dev eth0 root netem delay 600ms
/ # tc -p qdisc ls dev eth0
qdisc netem 8001: root refcnt 2 limit 1000 delay 600.0ms

Testing:

/ # dig example.com
...
example.com.        15814   IN  A   93.184.216.34

;; Query time: 600 msec
;; SERVER: 192.168.1.11#53(192.168.1.11)

As you can see, the DNS query time becomes 600 ms.

Here we tested a fixed latency, which is easy to spot. We can also test delays with random jitter, correlation, or a chosen distribution [2]:

/ # tc qdisc change dev eth0 root netem delay 600ms 10ms 25%
/ # tc qdisc change dev eth0 root netem delay 600ms 20ms distribution normal

Rules like these make DNS query times vary more randomly.
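Random variation is easier to see across several queries than in a single one. The sketch below summarizes a list of query times. The function name summarize_ms is made up for illustration, and the loop in the comment assumes dig is installed.

```shell
# Summarize query times given as one number per line on stdin.
# Prints the sample count, minimum, maximum, and average in milliseconds.
summarize_ms() {
    awk 'NR == 1 { min = max = $1 + 0 }
         {
             v = $1 + 0
             if (v < min) min = v
             if (v > max) max = v
             sum += v
         }
         END { if (NR) printf "n=%d min=%d max=%d avg=%.1f\n", NR, min, max, sum / NR }'
}

# Example: run five queries and summarize their query times.
#   for i in 1 2 3 4 5; do
#       dig +noall +stats example.com | awk '/Query time/ { print $4 }'
#   done | summarize_ms
```

A wide gap between min and max on an otherwise healthy network is a hint that something on the path, such as a netem rule, is adding jitter.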

Delete the TC rule:

/ # tc qdisc del dev eth0 root

iptables rules can cause similar problems.

Many software systems, such as OpenStack and Kubernetes (K8s), add tc or iptables rules to the host when they run. So when you run into this kind of random delay, first check whether there are tc or iptables rules on the machine.
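As a rough first check for the tc side, the output of tc qdisc show can be scanned for netem entries. This is only a sketch; the function name find_netem is an assumption, and other qdisc types can also shape traffic.

```shell
# Read `tc qdisc show` output on stdin and print any netem qdiscs.
# Exit status is 0 if at least one netem qdisc was found, 1 otherwise.
find_netem() {
    awk '$1 == "qdisc" && $2 == "netem" { print; found = 1 }
         END { exit !found }'
}

# Example:
#   tc qdisc show | find_netem || echo "no netem rules found"
```

For iptables, running iptables -S prints the rules currently loaded in the filter table so they can be reviewed by hand.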

4.5 DNS Reverse Query Is Unstable

The following problem occurred in production: when pinging an intranet domain name from a machine, each ping line appeared to be delayed by 5 to 30 seconds. However, after stopping ping with Ctrl-C, the printed statistics showed no packet loss and very low latency (milliseconds), which was strange. We investigated as follows:

First, we queried the domain name with dig. It was very fast (milliseconds), which indicated that forward DNS queries were fine. dig also showed the IP address for the domain name, and pinging that IP address directly showed no delay. Next, while still pinging the domain name, we captured packets with tcpdump (tcpdump -i eth0 host <ip> and icmp) and found that the ping replies arrived immediately, which matched the very low latency in the ping statistics. Based on this information, the stall was happening on the local machine, most likely in a time-consuming operation inside the ping program itself. We continued:

We pinged the domain name again and, in the meantime, traced the ping process with ltrace -p <pid>. It was stuck in a function called gethostbyaddr().

According to the documentation, this function looks up the hostname for an IP address (a reverse query), which requires talking to the DNS server. This largely confirmed that the problem was in the DNS server's reverse queries. To verify, we used several other command-line tools. The following three commands all look up the hostname for an IP address:

nslookup <ip>
host <ip>
dig -x <ip>

Sure enough, all three commands hung. After modifying /etc/resolv.conf to use a different DNS server, the problem disappeared. The next step is to investigate the faulty DNS server itself.
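For context, a reverse query is an ordinary DNS query for a PTR record under the special in-addr.arpa domain, with the octets of the IPv4 address reversed. The sketch below builds that query name; the function name reverse_name is made up for illustration, and it handles IPv4 only.

```shell
# Build the PTR query name for an IPv4 address, for example
# 93.184.216.34 -> 34.216.184.93.in-addr.arpa
# Prints nothing if the argument does not have four dot-separated parts.
reverse_name() {
    printf '%s\n' "$1" |
        awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Example: query the PTR record explicitly (what dig -x does for you).
#   dig PTR "$(reverse_name 93.184.216.34)"
```

If forward queries are fast but this PTR query hangs or times out, the symptom matches the case described in this section.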