In this article, we walk through what happens between the moment an HTTP request is sent and the moment the response is received. Because this is an overview of the whole process, it follows the main path only. Some details are not covered, and readers can fill in those gaps on their own.

Hierarchical network model

A famous aphorism of Butler Lampson goes: "All problems in computer science can be solved by another level of indirection" (source: en.wikipedia.org/wiki/Indire…).

This idea is applied widely. From computer architecture to computer networks, and from operating systems to build tools, we see layers of design and indirection added to solve problems.

A computer network is also described by an abstract, layered model. There are two important network architectures:

  • OSI reference model
  • TCP/IP reference model

OSI reference model

This model is widely used in teaching and discussion. It is divided into seven layers: application, presentation, session, transport, network, data link, and physical.

Image credit: Computational Mind Encyclopedia

TCP/IP reference model

This is the model that has actually been in wide use for many years. It is described in two ways:

  • Five layers: application, transport, network, data link, and physical (combines the strengths of OSI and is used only to explain the principles of network design)
  • Four layers: application, transport, internet, and network interface (what is actually deployed is still the four-layer TCP/IP architecture)

Image: zhuanlan.zhihu.com/p/31327310

Hierarchical correspondence

HTTP

HTTP sits at the application layer, the top layer of the network model, and TCP sits below it at the transport layer. For ease of transmission, the transport layer splits the data received from the application layer (the HTTP request message) into segments, marks each segment with a sequence number and port number, and forwards it to the network layer.

Request Sending and Receiving Process

Sending and receiving are reciprocal processes

Image source: Graphic HTTP

The sender encapsulates layer by layer, and the receiver decapsulates layer by layer.

Image source: Graphic HTTP

As the figure shows, on the sending side each layer that the data passes through adds its own header. On the receiving side, each layer that the data passes through strips the corresponding header.
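The layer-by-layer wrapping and unwrapping can be sketched with a toy model. The bracketed header strings below are placeholders chosen for illustration, not real header formats, and the function names are hypothetical.

```python
# Toy model: each layer prepends its own header when sending and
# strips it when receiving. Header strings are placeholders only.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, link (top to bottom)


def encapsulate(payload: bytes) -> bytes:
    """Sender side: pass data down the stack, adding one header per layer."""
    for layer in LAYERS:
        payload = f"[{layer}]".encode() + payload
    return payload


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: pass data up the stack, removing one header per layer."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


http_request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
frame = encapsulate(http_request)
assert decapsulate(frame) == http_request
```

The outermost header is the one added last (the link layer), which is why the receiver removes headers in the reverse order.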

HTTP Request Process

Before sending an HTTP request, we specify where to send it with a URL. But how does it work after that? Several protocols are involved:

  • DNS: an application-layer protocol that resolves a domain name to the IP address of the target host
  • TCP: a transport-layer protocol that provides a reliable byte stream service. It uses a three-way handshake to open a connection and a four-way handshake to close it
  • IP: the Internet protocol used to deliver data packets to the other party. Delivery requires both IP addresses and MAC addresses
  • ARP: a network-layer protocol for address resolution. Communication on a link is based on MAC addresses, and ARP looks up the MAC address that corresponds to the IP address of the communication peer

The HTTP message

HTTP messages have a fixed format, whether they are requests or responses.

Request message example

Response message example

For details about HTTP messages, see:

  • developer.mozilla.org/zh-CN/docs/…
  • zh.wikipedia.iwiki.eu.org/wiki/HTTP head…
  • hit-alibaba.github.io/interview/b…
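The fixed format mentioned above (a start line, header lines, a blank line, then an optional body) can be sketched as follows. This is a minimal illustration, not a full HTTP implementation; the helper names and the sample response bytes are made up for the example.

```python
def build_request(method, path, headers, body=b""):
    """Serialize a request: request line, header lines, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body


def parse_response(raw):
    """Split a response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


request = build_request("GET", "/info/rfc2616", {"Host": "www.rfc-editor.org"})

# A hand-written sample response, for illustration only.
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n"
          b"hello")
version, code, reason, headers, body = parse_response(sample)
```

The blank line (`\r\n\r\n`) is what separates the headers from the body in both directions.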

Application Layer (DNS resolution)

Remembering the phone numbers of everyone you know is hard, and it is harder still if people change numbers from time to time. A smarter approach is the address book on a mobile phone: we only need to know a person's name to call them, because each name is mapped to a phone number and the phone dials it automatically. The address book is a mapping table that greatly reduces the burden on memory.

In fact, this simple and primitive approach was used in the early days of the Internet. At that time there was a file called hosts.txt that listed the names of all computers and their IP addresses. The approach is simple, but on a network of millions of hosts the file becomes extremely large, and centralized management brings load, latency, and naming conflicts.

To solve the problems of centralized management, the Domain Name System (DNS) was developed in 1983. It is essentially a hierarchical, domain-based naming scheme implemented as a distributed database system.

What is the DNS

The Domain Name System (DNS) is both a distributed database implemented by a hierarchy of DNS servers and an application-layer protocol that lets hosts query that database. It maps host names to IP addresses.

DNS hierarchy

To deal with scalability, DNS uses a large number of DNS servers, and there is a clear hierarchy. Some DNS server layers are shown as follows:

DNS Resolution Process

The DNS protocol runs on top of UDP and uses port 53. DNS messages are carried in UDP datagrams in a simple query-and-response format. Each query message contains a 16-bit identifier that is copied into the response, so that the client can match each answer to the corresponding query without confusing the results of multiple queries.
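To make the 16-bit identifier concrete, here is a minimal sketch that builds a DNS query message by hand, following the standard message layout (a 12-byte header followed by the question). The function names are hypothetical, and the sketch only constructs bytes; it does not send anything.

```python
import struct


def build_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """Build a DNS query message (qtype 1 = A record, class 1 = IN)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR = 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def response_matches(query: bytes, response: bytes) -> bool:
    """A response belongs to a query when their 16-bit identifiers are equal."""
    return query[:2] == response[:2]


query = build_query("juejin.cn", query_id=5896)
```

Sending `query` in a UDP datagram to port 53 of a DNS server would produce a response whose first two bytes carry the same identifier.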

In fact, a domain name resolution can involve both recursive and iterative mechanisms.

Recursive query

In short, the client delegates the query to the local DNS server and cares only about the result. For example, the boss messages me on WeChat to ask where a good nightclub is. I have never been to one and do not know, so I ask Er Gou, who goes often. Er Gou asks his friend Da Fei, Da Fei tells Er Gou, Er Gou tells me, and finally I reply to the boss as if I were an expert. From the boss's point of view, this is a recursive query.

Iterative query

In short, the DNS client performs every step of the query itself. For example, the boss asks: Xiao Mao, can you recommend a pedicure shop nearby? I decline with a straight face: I do not know, but you could ask Mr. Sun in the public relations department. The boss goes to Mr. Sun, who points him to Xiao Zhang on the PR staff, and Xiao Zhang gives him the answer. Here the boss finds the answer step by step, which is an iterative query.

Image source: Computer Networking: A Top-Down Approach
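The difference between the two mechanisms can be sketched with a toy server hierarchy. The server names and the referral chain are made up for illustration; only the baidu.com address is taken from the dig output in this article.

```python
# Toy name-server hierarchy: each server either holds the answer
# or refers the asker to another server. Server names are hypothetical.
SERVERS = {
    "root": {"refer": "com-tld"},
    "com-tld": {"refer": "baidu-ns"},
    "baidu-ns": {"answers": {"baidu.com": "220.181.38.148"}},
}


def iterative_resolve(name, start="root"):
    """The asker follows every referral itself, one server at a time."""
    server, path = start, []
    while server is not None:
        path.append(server)
        reply = SERVERS[server]
        if name in reply.get("answers", {}):
            return reply["answers"][name], path
        server = reply.get("refer")
    raise LookupError(f"{name} not found")


def recursive_resolve(name, local_dns=iterative_resolve):
    """The client asks once; the local DNS server does the work
    and hands back only the final result."""
    answer, _path = local_dns(name)
    return answer


address, path = iterative_resolve("baidu.com")
```

In this sketch the local DNS server itself resolves iteratively on the client's behalf, which is how one resolution can involve both mechanisms.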

DNS cache

To reduce unnecessary requests, speed up responses, and cut network latency, caching is used widely in the DNS system. A DNS server caches the mapping between a host name and its IP address, so the next query for the same name can be answered directly from the cache.
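A cache of this kind can be sketched as a small table with expiry. The class below is a hypothetical illustration; expiring entries by a time-to-live value is an assumption here, based on the Time_to_live field that DNS resource records carry.

```python
import time


class DnsCache:
    """Cache host name -> IP address mappings until their TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: force a fresh lookup
            return None
        return address


cache = DnsCache()
cache.put("baidu.com", "39.156.69.79", ttl=600)
```

A `get` that returns `None` means the caller has to perform a real DNS query and then `put` the result back.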

DNS Resource Records

Whether it is a single-host domain or a top-level domain, every domain has a set of resource records associated with it, and together these records make up the DNS database. The basic function of DNS is to map domain names to resource records. A resource record is a five-tuple. It is usually encoded in binary for efficiency; when presented as text, each record occupies a single line in the following form:

Domain_name  Time_to_live  Class  Type  Value

Of the five fields, the domain name, time to live, and class are easy to understand, so the focus here is on the type. DNS has many record types. Common ones are:

Type    Meaning                      Value
A       IPv4 address of the host     32-bit integer
AAAA    IPv6 address of the host     128-bit integer
CNAME   Canonical name               Domain name
NS      Name server                  Name of a server for this domain
TXT     Text                         Descriptive ASCII text
MX      Mail exchange                Priority, domain willing to accept email

Note: Each Internet host must have at least one IP address to communicate with other machines. Some hosts have two or more network interfaces and may therefore have two or more A or AAAA resource records, so a query for a single domain name may return multiple addresses.

Example of DNS resource records:

google.com.  1165   IN  TXT   "v=spf1 include:_spf.google.com ip4:216.73.93.70/31 ip4:216.73.93.72/31 ~all"
google.com.  53965  IN  SOA   ns1.google.com. dns-admin.google.com. 2014112500 7200 1800 1209600 300
google.com.  231    IN  A     173.194.115.73
google.com.  231    IN  A     173.194.115.64
google.com.  231    IN  A     173.194.115.65
google.com.  231    IN  A     173.194.115.66
google.com.  231    IN  A     173.194.115.67
google.com.  231    IN  A     173.194.115.68
google.com.  231    IN  A     173.194.115.69
google.com.  231    IN  A     173.194.115.70
google.com.  231    IN  A     173.194.115.71
google.com.  231    IN  A     173.194.115.72
google.com.  128    IN  AAAA  2607:f8b0:400:809::1001
google.com.  40766  IN  NS    ns3.google.com.
google.com.  40766  IN  NS    ns4.google.com.
google.com.  40766  IN  NS    ns1.google.com.
google.com.  40766  IN  NS    ns2.google.com.
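A text-form record maps directly onto the five-tuple described above. The sketch below splits one line into its five fields; the type and function names are hypothetical, and it assumes the whitespace-separated text form only, not the binary wire encoding.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    domain_name: str
    time_to_live: int
    record_class: str
    record_type: str
    value: str


def parse_record(line: str) -> ResourceRecord:
    """Split one text-form record into five fields.
    The value is everything after the type and may itself contain spaces."""
    name, ttl, record_class, record_type, value = line.split(None, 4)
    return ResourceRecord(name, int(ttl), record_class, record_type, value)


record = parse_record("google.com. 231 IN A 173.194.115.73")
```

Limiting the split to four cuts keeps multi-part values, such as the fields of an SOA record, together in the last element.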

NSLOOKUP

nslookup is a DNS lookup tool that can query the DNS records held by any specified DNS server. Command syntax:

nslookup [-option1] [-option2] host-to-find dns-server

Common commands are as follows:

Query the IP address corresponding to a domain name

$ nslookup baidu.com
Non-authoritative answer:
Name:	baidu.com
Address: 39.156.69.79

# Compare with the IP address that ping resolves
$ ping baidu.com
PING baidu.com (220.181.38.148): 56 data bytes
64 bytes from 220.181.38.148: icmp_seq=0 ttl=50 time=9.037 ms
64 bytes from 220.181.38.148: icmp_seq=1 ttl=50 time=8.393 ms

View the DNS host name of a domain name

Use the -type=NS option to look up the name servers of a domain:

$ nslookup -type=NS qq.com
Non-authoritative answer:
qq.com	nameserver = ns1.qq.com.
qq.com	nameserver = ns3.qq.com.
qq.com	nameserver = ns4.qq.com.

Authoritative answers can be found from:

Note: A non-authoritative answer indicates that the response comes from a server's cache, not from an authoritative DNS server.

Resolve domain names by specifying a DNS server

Resolve baidu.com through ns2.qq.com, one of the name servers of qq.com:

$ nslookup baidu.com ns2.qq.com
Server:		ns2.qq.com
Address:	123.151.66.78#53

** server can't find baidu.com: REFUSED

It seems that the QQ name servers serve only their own domains and hold no records for others, so the query is refused. We can instead use a public DNS service, for example resolving qq.com through dns.alidns.com:

$ nslookup qq.com dns.alidns.com
Server:		dns.alidns.com
Address:	223.5.5.5#53

Non-authoritative answer:
Name:	qq.com
Address: 58.250.137.36

This time the corresponding IP address is resolved successfully.

Viewing the resolution output with dig

With the dig command, you can see the details of the whole query, for example dig juejin.cn:

$ dig juejin.cn

# Tool information section
; <<>> DiG 9.10.6 <<>> juejin.cn
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5896
;; flags: qr rd ra; QUERY: 1, ANSWER: 17, AUTHORITY: 0, ADDITIONAL: 0

# Question section
;; QUESTION SECTION:
;juejin.cn.			IN	A

# Answer section (reply from the DNS server)
;; ANSWER SECTION:
juejin.cn.			IN	CNAME	juejin.cn.w.cdngslb.com.
juejin.cn.w.cdngslb.com. 1	IN	A	218.61.192.113
juejin.cn.w.cdngslb.com. 1	IN	A	218.61.192.109
juejin.cn.w.cdngslb.com. 1	IN	A	60.19.67.238
juejin.cn.w.cdngslb.com. 1	IN	A	116.95.26.243
juejin.cn.w.cdngslb.com. 1	IN	A	116.95.26.242
juejin.cn.w.cdngslb.com. 1	IN	A	60.19.67.240
juejin.cn.w.cdngslb.com. 1	IN	A	218.61.192.114
juejin.cn.w.cdngslb.com. 1	IN	A	1.28.145.231
juejin.cn.w.cdngslb.com. 1	IN	A	60.19.67.248
juejin.cn.w.cdngslb.com. 1	IN	A	124.132.135.242
juejin.cn.w.cdngslb.com. 1	IN	A	60.19.67.239

# Transmission statistics section
;; Query time: 49 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Jun 26 12:42:34 CST 2021
;; MSG SIZE  rcvd: 320

DNS packet capture tracing

Next, we capture packets to trace the DNS resolution process.

Recursive query

Take juejin.cn as an example: run dig juejin.cn and capture the packets with Wireshark.

The local host (103) sends a DNS query message to the DNS server 114.114.114.114.

The DNS server 114.114.114.114 responds.

Iterative query

Take baidu.com as an example: run dig +trace baidu.com to perform an iterative query. The process is:

  1. The 13 root servers worldwide are obtained from the non-authoritative DNS server 114.114.114.114
  2. The .com top-level domain name servers are obtained from a root server
  3. The name servers of baidu.com are obtained from one of the .com top-level domain name servers
  4. The IP addresses of baidu.com are obtained from Baidu's own name server

Console output:

$ dig +trace baidu.com

; <<>> DiG 9.10.6 <<>> +trace baidu.com
;; global options: +cmd
.			954	IN	NS	b.root-servers.net.
.			954	IN	NS	i.root-servers.net.
.			954	IN	NS	l.root-servers.net.
.			954	IN	NS	g.root-servers.net.
.			954	IN	NS	c.root-servers.net.
.			954	IN	NS	e.root-servers.net.
.			954	IN	NS	m.root-servers.net.
.			954	IN	NS	j.root-servers.net.
.			954	IN	NS	a.root-servers.net.
.			954	IN	NS	k.root-servers.net.
.			954	IN	NS	f.root-servers.net.
.			954	IN	NS	h.root-servers.net.
.			954	IN	NS	d.root-servers.net.
;; Received 239 bytes from 114.114.114.114#53(114.114.114.114) in 11 ms

com.			172800	IN	NS	a.gtld-servers.net.
com.			172800	IN	NS	b.gtld-servers.net.
com.			172800	IN	NS	c.gtld-servers.net.
com.			172800	IN	NS	d.gtld-servers.net.
com.			172800	IN	NS	e.gtld-servers.net.
com.			172800	IN	NS	f.gtld-servers.net.
com.			172800	IN	NS	g.gtld-servers.net.
com.			172800	IN	NS	h.gtld-servers.net.
com.			172800	IN	NS	i.gtld-servers.net.
com.			172800	IN	NS	j.gtld-servers.net.
com.			172800	IN	NS	k.gtld-servers.net.
com.			172800	IN	NS	l.gtld-servers.net.
com.			172800	IN	NS	m.gtld-servers.net.
com.			86400	IN	DS	30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com.			86400	IN	RRSIG	DS 8 1 86400 20210708200000 20210625190000 14631 . E4vNdg++JOGz+5Q0BcqMUAr4nJBE9dZ2j0S/4khgXUqCsJ5Wdorhccyn zjdcbRmkkCxasBFWgDqcKT00K18E9ErXgcgHVZkcy0eFbSHOFWLwbWU1 xEFBv8kjz+NxLd3bugv8zzEcDY5/4BE0TM/gPsIXz6FSjZSkZJJfrlMJ l1QQvur7cREIGYqMhFDs3IlEFXtrD35UVWgiVqwFKSsXxMnhqJauf/iF OjnUq8EEqtJxpuMtjLXWEkPzZPEfjCo7tLAzKfjp4DkbxK17B0e64foz u6oRbFL4yaeDkL+RRdbuKAIhq9AzwkR165xXtp8EdUTo8Vi2Br4uaAc5 thhuGA==
;; Received 1169 bytes from 198.97.190.53#53(h.root-servers.net) in 123 ms

baidu.com.		172800	IN	NS	ns2.baidu.com.
baidu.com.		172800	IN	NS	ns4.baidu.com.
baidu.com.		172800	IN	NS	ns1.baidu.com.
baidu.com.		172800	IN	NS	ns7.baidu.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20210703042346 20210626031346 54714 com. fpRZKBFjStpVmT0pXcXOuti8qzhE0DfHJhriwBJn6Uj/rYfpI788/Jj6 K55Qs394YruLilWiMbHWRu7ubsSTwgqBD76CVuyqsq/jVUZePqBeU/5r nN68gFAjpfBZnG5UkxH9CzTHmercrzWD4rAp+5ZcpxkjjcT6x1VxbnYO brhgCDWKEipgGtByalV7NfRjyTuJrsrO8j1loeBOJOXnSQ==
HPVUSBDNI26UDNIV6R0SV14GC3KGR4JP.com. 86400 IN NSEC3 1 1 0 - HPVV8SARM2LDLRBTVC5EP1CUB1EF7LOP NS DS RRSIG
HPVUSBDNI26UDNIV6R0SV14GC3KGR4JP.com. 86400 IN RRSIG NSEC3 8 2 86400 20210703045003 20210626034003 54714 com. hG2hp+pbEak9kYn4UqCWs6f1fssX2v7DCKYKXQvzdA8Ruye7RJIoJWzT ae3gfdzoKt4uYvoRy2Mho2r9SvCerJuCbun9YQuY51SY9KfJVEfAL/7X 1BZ+WcL++zvwFUJZ/p/BCUrT6qvRFfgQvKAocwndfDvmSeM7O866u9Mv sCUP1JSYddWQBmZq50NZsRPKnWFaN0uYv/G+dOUVmE4rqw==
;; Received 757 bytes from 192.54.112.30#53(h.gtld-servers.net) in 192 ms

# IP addresses of baidu.com
baidu.com.		600	IN	A	220.181.38.148
baidu.com.		600	IN	A	39.156.69.79
baidu.com.		86400	IN	NS	ns2.baidu.com.
baidu.com.		86400	IN	NS	dns.baidu.com.
baidu.com.		86400	IN	NS	ns3.baidu.com.
baidu.com.		86400	IN	NS	ns7.baidu.com.
baidu.com.		86400	IN	NS	ns4.baidu.com.
;; Received 240 bytes from 220.181.33.31#53(ns2.baidu.com) in 9 ms

Wireshark packet capture: the packet details are consistent with the iterative process above. The full capture is shown here.

DNS hijacking

DNS hijacking definition

DNS hijacking means tampering with the correct mapping between a domain name and an IP address so that the domain name resolves to an incorrect IP address. It can therefore be regarded as a DNS redirection attack. Typical uses include domain name fraud (resolving a name to the IP address of a phishing site), showing extra advertisements when a site is visited, and stealing user data and personal information.

DNS hijack classification

Depending on where in the resolution path it occurs, DNS hijacking can generally be divided into the following categories:

  • Local DNS hijacking: the hijacking occurs on the client side
    • A virus or Trojan program tampers with the DNS configuration, the DNS server address, the DNS cache, and so on
    • The DNS configuration or resolution results of routers and network proxy devices are tampered with
  • DNS resolution path hijacking
    • DNS request forwarding: the request is redirected to another DNS server
    • DNS request replication: the request is copied, and a hijacked result is returned before the normal reply arrives
    • DNS request interception: a device answers the DNS request in place of the DNS server
  • Tampering with an authoritative DNS server
    • The domain name management account is compromised and the NS delegation records of the domain name are tampered with
    • An administrator account is compromised and the NS delegation records of the domain name are tampered with

Defending against DNS hijacking

A common countermeasure in the industry is HTTPDNS, which a growing number of vendors adopt to mitigate the effects of DNS hijacking to some extent.

Domain name anti-hijacking:

Domain name resolution requests are sent directly to the HTTPDNS server over HTTP (or HTTPS), bypassing the carrier's Local DNS and avoiding domain name hijacking.

Accurate scheduling:

Because carrier policies vary, the result returned by the Local DNS may not be the nearest or best node. HTTPDNS can obtain the client IP address directly and return the most accurate resolution result based on it, so that the client reaches the nearest service node.

A diagram of how HTTPDNS works

Image source: Aliyun HTTPDNS
Image source: NetEase Cloud Shield, "DNS hijacking principle? How do I handle DNS hijacking?"
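The idea of resolving a name over HTTP(S) instead of through the Local DNS can be sketched as below. The endpoint URL, the `host` query parameter, and the JSON response shape with an `ips` field are all hypothetical, chosen only for illustration; a real HTTPDNS service defines its own API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical HTTPDNS endpoint and response shape, for illustration only.
HTTPDNS_ENDPOINT = "https://httpdns.example.com/resolve"


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urlopen(url, timeout=5) as response:
        return response.read()


def resolve_via_httpdns(host: str, fetch=http_get) -> list:
    """Ask the HTTPDNS server for the addresses of `host` over HTTPS,
    without going through the carrier's Local DNS for that name."""
    query = urlencode({"host": host})
    payload = json.loads(fetch(HTTPDNS_ENDPOINT + "?" + query))
    return payload.get("ips", [])
```

The `fetch` parameter makes the transport replaceable, so the function can be exercised with a canned response instead of a live server.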

Transport Layer (TCP)

As mentioned above, for ease of transmission the transport layer splits the HTTP request message received from the application layer into segments, marks each segment with a sequence number and port number, and forwards it to the network layer. The IP address of the target host has already been obtained through the DNS resolution described above, so a connection can now be established. The process of establishing the connection is as follows:

Image source: Graphic HTTP

TCP is a connection-oriented protocol

Why does TCP need three confirmations to establish a connection and four to tear it down? Because TCP provides connection-oriented, reliable transport over an unreliable network.

Image source: Illustration of TCP/IP
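The sequence and acknowledgment numbers exchanged during the three-way handshake and the four-way close can be sketched as a small simulation. This models only the numbering rule (SYN and FIN each consume one sequence number); the `Segment` type and function names are hypothetical, and no real sockets are involved.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str
    seq: int
    ack: int = 0


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments that open a connection."""
    syn = Segment("SYN", seq=client_isn)
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=syn.seq + 1)
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


def four_way_close(client_seq: int, server_seq: int):
    """Return the four segments that close a connection, assuming the client
    closes first and the server sends no further data in between."""
    fin1 = Segment("FIN+ACK", seq=client_seq, ack=server_seq)
    ack1 = Segment("ACK", seq=server_seq, ack=fin1.seq + 1)
    fin2 = Segment("FIN+ACK", seq=server_seq, ack=fin1.seq + 1)
    ack2 = Segment("ACK", seq=fin1.seq + 1, ack=fin2.seq + 1)
    return [fin1, ack1, fin2, ack2]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

Each side confirms the other's initial sequence number, which is why one round trip is not enough to establish a connection in both directions.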

TCP header format

Image source: Illustration of TCP/IP
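The fixed 20-byte part of the TCP header can be unpacked field by field. The sketch below follows the standard header layout and ignores options; the function name and the hand-built sample segment are made up for illustration.

```python
import struct

# Source port, destination port, sequence number, acknowledgment number,
# data offset + flags, window, checksum, urgent pointer (20 bytes, no options).
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(data: bytes) -> dict:
    """Decode the fixed part of a TCP header into named fields."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = TCP_HEADER.unpack(data[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,  # data offset counts 32-bit words
        "flags": [name for name, bit in FLAG_BITS.items() if offset_flags & bit],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


# A hand-built SYN header: port 50000 -> 80, data offset 5, SYN flag set.
syn = TCP_HEADER.pack(50000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
fields = parse_tcp_header(syn)
```

The port numbers and sequence number are the values the transport layer marks on each segment before handing it to the network layer.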

Network layer (IP, ARP)

As mentioned earlier, DNS resolution yields the IP address of the target host, and the TCP connection is then established to that address. As the TCP figure above shows, the network layer is concerned with IP addresses: it adds the MAC address of the communication destination and forwards the packet to the link layer. IP addresses are not used at the data link layer; over Ethernet, only MAC addresses are used to transmit packets. The destination IP address alone is therefore not enough to deliver a packet to the target. The MAC address of the next hop is also needed, and the next hop is determined from the information stored in the routing control table.

ARP

How to find the MAC address

Once the IP address is known, IP datagrams can be sent toward the target address. For actual communication at the underlying data link layer, however, the MAC address corresponding to each IP address must be known. ARP is the protocol that solves this address problem: given a destination IP address, it locates the MAC address of the next network device that should receive the packet.

ARP working mechanism

A host sends an ARP request containing the target IP address in order to discover the corresponding MAC address. The host that owns the target IP address fills its MAC address into an ARP response and returns it to the requester, which makes IP communication within the link possible. Each host keeps an ARP cache table that records mappings between IP addresses and MAC addresses.
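The request, response, and cache steps can be sketched with a toy link. Every IP and MAC address below is made up for illustration, and `_arp_request` is a hypothetical stand-in for sending a real ARP request on the link.

```python
# Toy link: IP address -> MAC address of each host on the same segment.
# All addresses are made up for illustration.
HOSTS_ON_LINK = {
    "192.168.1.1": "aa:bb:cc:00:00:01",
    "192.168.1.20": "aa:bb:cc:00:00:14",
}


class ArpTable:
    def __init__(self, link):
        self.link = link
        self.cache = {}     # IP -> MAC, filled in from ARP responses
        self.requests = 0   # number of ARP requests actually sent

    def _arp_request(self, ip):
        """Stand-in for asking 'who has <ip>?' and waiting for the owner's reply."""
        self.requests += 1
        return self.link.get(ip)

    def resolve(self, ip):
        """Return the MAC address for `ip`, asking the link only on a cache miss."""
        if ip not in self.cache:
            mac = self._arp_request(ip)
            if mac is None:
                raise LookupError(f"no host on this link answers for {ip}")
            self.cache[ip] = mac
        return self.cache[ip]


table = ArpTable(HOSTS_ON_LINK)
gateway_mac = table.resolve("192.168.1.1")
```

A second `resolve` for the same address is answered from the cache table, so no further request goes out on the link.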

Data link layer

Once both the MAC address and the IP address are known, the host can communicate with the target host.

Data link layer encapsulation

Application-layer data is wrapped with a TCP header at the transport layer and passed down. An IP header is added at the network layer, and an Ethernet header containing information such as the MAC addresses is added on entering the data link layer. A packet that has passed through the data link layer looks like this:

Image source: Illustration of TCP/IP

HTTP request packet capture

Next, we use Wireshark to capture and analyze an HTTP request, taking a visit to https://www.rfc-editor.org/info/rfc2616 as the example.

DNS query

A DNS request is sent to the 114.114.114.114 domain name service to query the IP address of www.rfc-editor.org. The response packet shows that the IP address is 4.31.198.49.

TCP three-way handshake

The IP address obtained in the previous step is 4.31.198.49. The TCP three-way handshake is then performed to establish the connection.

HTTP GET request

After the TCP connection is established successfully, the HTTP request is sent.
