In the previous two articles, I introduced the data link layer and the IP protocol at the network layer, respectively. The focus of this tutorial series is TCP/IP, so don’t worry: after this brief look at technologies related to the IP protocol, the next article will be a formal, detailed introduction to the transport layer and TCP. This article introduces DNS, ARP, and NAT. Although these protocols are not directly related to TCP, understanding their principles helps you consolidate the basics and better understand how networks work.

The DNS

An IP address identifies each party in a communication, but it is a long string of numbers that is hard to remember. People want each host to have its own name, one that is unique and easy to remember. Thus, the concept of the “domain name” was born. A domain name is a hierarchical name that combines a host name with organization names. For example, in the domain name neu.edu.cn, neu is the host name, and edu and cn are organization names at different levels.

A domain name and an IP address can each uniquely identify a host. The DNS protocol converts a meaningful domain name into the hard-to-remember IP address behind it.

Domain names are layered, and each layer has its own DNS server to handle DNS resolution requests. The advantage is that the server at each layer does not need to hold information about the whole name space; it only needs to know the DNS information for its own layer. Take resolving the domain name www.ietf.org as an example:

The root server does not actually know the IP address of www.ietf.org, but it does know how to reach the ietf.org domain server, so the query request is passed on to the ietf.org domain server. DNS requests are sent layer by layer until the corresponding IP address is found.
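The layer-by-layer lookup can be sketched with a toy in-memory hierarchy. Everything here is made up for illustration: the server names (`root`, `org-server`, `ietf-server`), the intermediate server for the org layer, and the address 192.0.2.10, which comes from a range reserved for documentation. A real resolver talks to real name servers over the network.

```python
# Toy model of layered DNS resolution. All server names and the IP address
# below are hypothetical (192.0.2.0/24 is reserved for documentation).
ZONES = {
    "root": {"org": ("referral", "org-server")},
    "org-server": {"ietf.org": ("referral", "ietf-server")},
    "ietf-server": {"www.ietf.org": ("answer", "192.0.2.10")},
}


def resolve(name, server="root"):
    """Follow referrals from the root until a server returns an address."""
    path = [server]
    while True:
        zone = ZONES[server]
        # Find the entry in this server's zone that the name falls under.
        for suffix, (kind, value) in zone.items():
            if name == suffix or name.endswith("." + suffix):
                break
        else:
            raise LookupError(f"{server} knows nothing about {name}")
        if kind == "answer":
            return value, path
        server = value  # referral: ask the next server down
        path.append(server)


address, path = resolve("www.ietf.org")
print(address, path)
```

Each server only holds the entries for its own layer, which is the point of the hierarchy: the root never needs to store the address of www.ietf.org itself.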

ARP protocol

The Address Resolution Protocol (ARP) is used to look up the MAC address of the next network device that should receive a packet, based on the destination IP address. If the destination host is on the same data link, ARP obtains the MAC address of the destination host directly; otherwise, it obtains the MAC address of the next-hop router.
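The choice between "resolve the destination" and "resolve the router" can be sketched as below. This is a simplification that treats "same data link" as "same IP subnet"; the function name `arp_target` and the example addresses are hypothetical.

```python
import ipaddress


def arp_target(src_ip, prefix_len, dst_ip, gateway_ip):
    """Return the IP address whose MAC address ARP should resolve."""
    network = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in network:
        return dst_ip      # same data link: resolve the destination itself
    return gateway_ip      # different link: resolve the next-hop router


# Destination on the same /24: ARP asks for the destination host.
print(arp_target("192.168.1.10", 24, "192.168.1.20", "192.168.1.1"))
# Destination elsewhere: ARP asks for the router instead.
print(arp_target("192.168.1.10", 24, "203.0.113.5", "192.168.1.1"))
```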

The ARP protocol works in two steps: an ARP request and an ARP response. First, the source host broadcasts an ARP request: “I want to talk to the host at IP address XXX. Who knows its MAC address?”

All hosts on the data link receive this message and compare the requested IP address with their own. If the IP address in the ARP request packet matches its own IP address, the host sends an ARP response packet: “I am the host with IP address XXX, and my MAC address is XXXX.”
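The request and response exchange can be modeled in a few lines. This is only a simulation of the logic, not real packet handling; the `Host` class, the `arp_broadcast` helper, and the addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    ip: str
    mac: str

    def on_arp_request(self, target_ip):
        """Reply with our MAC only if the request asks for our IP."""
        return self.mac if target_ip == self.ip else None


def arp_broadcast(hosts, target_ip):
    """Deliver the request to every host; return the first reply, if any."""
    for host in hosts:
        reply = host.on_arp_request(target_ip)
        if reply is not None:
            return reply
    return None


link = [
    Host("192.168.1.10", "02:00:00:00:00:10"),
    Host("192.168.1.20", "02:00:00:00:00:20"),
]
print(arp_broadcast(link, "192.168.1.20"))
```

Every host sees the request, but only the one whose IP address matches answers; a request for an address nobody owns simply gets no reply.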

The following figure shows the working mechanism of ARP.

In practice, it is inefficient to run ARP every time data is sent to the target host, so the obtained MAC address is usually cached for a period of time. Once the source host has sent a packet to a destination address, it is very likely to send more, so this cache is hit often.

A cache entry becomes invalid when the next ARP request is sent or after a certain period of time. This ensures that packets are still sent to the right destination even if the mapping between MAC addresses and IP addresses changes.
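A time-based cache of this kind might look like the following sketch. The `ArpCache` class is hypothetical, the 60-second lifetime is an arbitrary assumption, and only the timeout rule is modeled; real operating systems use their own timers and policies.

```python
import time


class ArpCache:
    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the expiry logic can be tested
        self.entries = {}    # ip -> (mac, time the entry was stored)

    def put(self, ip, mac):
        self.entries[ip] = (mac, self.clock())

    def get(self, ip):
        """Return the cached MAC, or None if it is missing or expired."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self.entries[ip]  # stale: force a fresh ARP request
            return None
        return mac


cache = ArpCache()
cache.put("192.168.1.20", "02:00:00:00:00:20")
print(cache.get("192.168.1.20"))
```

A `None` result means the caller has to broadcast a new ARP request and store the answer again with `put`.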

Again, MAC and IP addresses may seem to function similarly (both are used to uniquely distinguish hosts), but neither can be dropped. If there were only IP addresses, you could skip ARP and simply broadcast on the data link. However, this works only when the two communicating parties are on the same data link. If the two parties are on different data links, the datagram cannot get past the intermediate router.

If the whole world used only MAC addresses, every device would have to learn addresses the way a switch does through self-learning. As you can imagine, this process would generate huge amounts of unnecessary traffic.

Because both MAC and IP addresses are indispensable, protocols like ARP were invented to link the two.

NAT and NAPT technologies

Network Address Translation (NAT) is a technology used to translate private IP addresses on a LAN into global IP addresses.

If you check your device’s IP address while connected to a wireless router, you might find a LAN IP address like 192.168.1.1. Many different LANs reuse the same private addresses, so how can hosts on different network segments communicate with each other when each of them uses an address like 192.168.1.1?

The following diagram illustrates how NAT works:

The host at IP address 10.0.0.10 on the LAN sends data to the global IP address 163.221.120.9. The NAT router changes the source IP address of the packet to its own global IP address, 202.244.174.37. Similarly, when data comes back, the router translates the destination address 202.244.174.37 into the intranet address 10.0.0.10.
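The address rewriting in this example can be sketched as follows. The `Packet` and `NatRouter` names are hypothetical, and the sketch tracks only a single internal host, which is the simplest possible case.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class NatRouter:
    def __init__(self, global_ip):
        self.global_ip = global_ip
        self.private_ip = None  # the one internal host currently mapped

    def outbound(self, packet):
        """Remember the sender and rewrite the source to the global IP."""
        self.private_ip = packet.src
        return replace(packet, src=self.global_ip)

    def inbound(self, packet):
        """Rewrite the destination back to the remembered private IP."""
        if packet.dst != self.global_ip or self.private_ip is None:
            raise LookupError("no translation entry")
        return replace(packet, dst=self.private_ip)


router = NatRouter("202.244.174.37")
sent = router.outbound(Packet(src="10.0.0.10", dst="163.221.120.9"))
reply = router.inbound(Packet(src="163.221.120.9", dst="202.244.174.37"))
print(sent, reply)
```

The remote host only ever sees 202.244.174.37; the private address 10.0.0.10 never leaves the LAN.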

The router has only one external global IP address. What if multiple internal hosts need to communicate with the outside at the same time? This is where NAPT comes in. It is similar in principle to NAT, but it also translates TCP and UDP port numbers.

When NAPT is used, different intranet IP addresses are translated into the same public IP address, that is, the global IP address that the router presents externally, but different port numbers are assigned to tell them apart.

Both NAT and NAPT require a router. The router maintains an address translation table that is generated automatically. Take TCP as an example. An entry is created in this table when the SYN packet of the first handshake of a TCP connection is sent. When the connection is closed, a FIN packet is sent, and the entry is deleted once that FIN has been acknowledged.
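Here is a minimal sketch of such a translation table, assuming entries are created on the first SYN and dropped when the connection closes. The `NaptTable` class, its method names, and the starting port 40000 are all hypothetical; real routers choose ports and expire entries according to their own rules.

```python
import itertools


class NaptTable:
    def __init__(self, global_ip, first_port=40000):
        self.global_ip = global_ip
        self._ports = itertools.count(first_port)  # next free public port
        self.out = {}   # (private ip, private port) -> public port
        self.back = {}  # public port -> (private ip, private port)

    def on_syn(self, private_ip, private_port):
        """Create an entry when the first SYN leaves the LAN."""
        key = (private_ip, private_port)
        if key not in self.out:
            public_port = next(self._ports)
            self.out[key] = public_port
            self.back[public_port] = key
        return self.global_ip, self.out[key]

    def translate_inbound(self, public_port):
        """Map a public port back to the internal host and port."""
        return self.back[public_port]

    def on_close(self, private_ip, private_port):
        """Drop the entry once the connection has been torn down."""
        public_port = self.out.pop((private_ip, private_port))
        del self.back[public_port]


table = NaptTable("202.244.174.37")
print(table.on_syn("10.0.0.10", 1025))
print(table.on_syn("10.0.0.11", 1025))
```

Two hosts using the same private port end up behind the same public IP address with different public ports, which is how the router knows where to deliver each reply.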

If you are not familiar with TCP and the three-way handshake, don’t worry: both are covered in detail in the next article.