Imagine that a user enters a web address to make an HTTP request. A DNS server resolves the domain name to an IP address, and the browser connects to the server at that address. How does the server know whether the user reached it through domain name A or domain name B? This article aims to give you a deeper understanding of HTTP through examples drawn from communication protocols, including reverse proxies, HTTP messages, the TCP three-way handshake, DNS resolution, DNS pollution and hijacking, HTTPS, and more. Note that the fragments of knowledge collected here span many directions and fields, so you can dig deeper into whichever ones interest you.

From practical application to theoretical support, we go from “deep” to “shallow”, looking for the essence through the phenomenon.

Reverse proxy resolves multiple domain names

A reverse proxy sits as an intermediate layer in front of intranet sites. It keeps content servers from being exposed directly to the Internet, provides security protection, and can implement load balancing and rate limiting.

Let’s look at the code for multi-domain configuration on Nginx:

    server {
        listen 80;
        server_name A.com;
        location / {
            proxy_pass http://localhost:11111;
        }
    }

    server {
        listen 80;
        server_name B.com;
        location / {
            proxy_pass http://localhost:22222;
        }
    }

As the configuration above shows, domain names A and B map to ports 11111 and 22222 on the host, respectively. In other words, the Nginx reverse proxy can read the domain name from the HTTP request. DNS resolution only translates the domain name into an IP address, so the domain name must also travel inside the request itself. So let’s take a look at what an HTTP message looks like.

The HTTP message

If you open Chrome to monitor HTTP requests, you can see something like the following:

The Host header is mandatory in every HTTP/1.1 request, and scripts running in the browser cannot override it. It carries the domain name the user typed, which is how the Nginx reverse proxy decides which domain a request belongs to.
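To make the role of the Host header concrete, here is a minimal Python sketch of what a reverse proxy does conceptually: it reads Host from the raw request and looks up a backend. The `BACKENDS` table mirrors the Nginx configuration shown earlier; the names `parse_host` and `pick_backend` are illustrative assumptions and have nothing to do with how Nginx is actually implemented.

```python
from typing import Optional

# Hypothetical routing table mirroring the Nginx example
# (A.com -> port 11111, B.com -> port 22222).
BACKENDS = {
    "a.com": "http://localhost:11111",
    "b.com": "http://localhost:22222",
}


def parse_host(raw_request: bytes) -> Optional[str]:
    """Return the lower-cased Host header value without a port, or None."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            host = value.strip().lower()
            # Simplification: drops ":port" but does not handle IPv6 literals.
            return host.rsplit(":", 1)[0] if ":" in host else host
    return None


def pick_backend(raw_request: bytes) -> Optional[str]:
    """Choose a backend URL from the Host header, like server_name matching."""
    host = parse_host(raw_request)
    return BACKENDS.get(host) if host else None


request_a = b"GET / HTTP/1.1\r\nHost: A.com\r\nAccept: */*\r\n\r\n"
request_b = b"GET / HTTP/1.1\r\nHost: B.com\r\nAccept: */*\r\n\r\n"

print(pick_backend(request_a))  # http://localhost:11111
print(pick_backend(request_b))  # http://localhost:22222
```

Both requests would arrive at the same IP address and port; only the Host line differs, and that single line is enough to tell the two domains apart.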

All attributes of HTTP Request and Response are shown below:

TCP three-way handshake

HTTP (application layer) runs on top of TCP (transport layer) rather than independently of it, so HTTP communication also begins with a TCP three-way handshake. Here’s an example:

A: Requesting a call. (SYN)
B: Received; requesting a call too. (SYN+ACK)
A: Received. (ACK)

SYN stands for synchronize, and ACK stands for acknowledge. The three-way handshake is designed like a gentleman’s agreement. Three steps are used instead of two so that a delayed, duplicate connection request left over from an earlier attempt is not mistaken for a new connection (think of a late reply in a chat that answers a question nobody is asking anymore). The three-way handshake resembles the communication model used in project management:

Think about it. In real communication, “pass the message,” “acknowledge it,” and “give feedback” map onto the three steps of the handshake.
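The SYN, SYN+ACK, ACK exchange can also be sketched with sequence numbers. The following Python snippet is a simplified simulation and not a real TCP stack: the `Segment` class, the `three_way_handshake` function, and the initial sequence numbers 100 and 300 are all made up for illustration. What it does reflect is that each side acknowledges the other side's initial sequence number plus one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    sender: str
    flags: str
    seq: int
    ack: Optional[int]  # None when the ACK flag is not set


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Simulate the three segments exchanged when A connects to B."""
    # 1. A -> B: SYN carrying A's initial sequence number (ISN).
    syn = Segment("A", "SYN", seq=client_isn, ack=None)
    # 2. B -> A: SYN+ACK. A SYN consumes one sequence number,
    #    so B acknowledges client_isn + 1 and sends its own ISN.
    syn_ack = Segment("B", "SYN+ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. A -> B: ACK acknowledging B's SYN in the same way.
    ack = Segment("A", "ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for seg in three_way_handshake(client_isn=100, server_isn=300):
    print(seg.sender, seg.flags, "seq =", seg.seq, "ack =", seg.ack)
```

In practice the operating system performs this exchange whenever a program opens a TCP connection, so application code never sees the individual segments.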

DNS

A DNS server is essentially a cache of key-value pairs (domain name → IP address). If the record for the requested host name is not cached locally, the DNS server resolves it upstream, starting from the root servers if necessary.
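The key-value view of DNS can be sketched in a few lines of Python. This is a toy model under stated assumptions: `CachingResolver`, `fake_upstream`, and `FAKE_ZONE` are hypothetical names, the upstream is a plain dictionary standing in for the real resolution hierarchy, and a fixed TTL replaces the per-record TTL that real DNS answers carry. The addresses come from the range reserved for documentation.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingResolver:
    """Answer from a local cache; ask upstream only on a miss or expiry."""

    def __init__(self, upstream: Callable[[str], str], ttl_seconds: float = 60.0):
        self._upstream = upstream
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry)

    def resolve(self, name: str) -> str:
        key = name.lower().rstrip(".")  # domain names are case-insensitive
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # cache hit
        address = self._upstream(key)  # cache miss: forward the query
        self._cache[key] = (address, time.monotonic() + self._ttl)
        return address


# Stand-in for the upstream servers (illustrative data only).
FAKE_ZONE = {"a.com": "192.0.2.10", "b.com": "192.0.2.20"}
upstream_calls: List[str] = []


def fake_upstream(name: str) -> str:
    upstream_calls.append(name)
    return FAKE_ZONE[name]


resolver = CachingResolver(fake_upstream)
print(resolver.resolve("A.com"))  # 192.0.2.10 (fetched from upstream)
print(resolver.resolve("a.com"))  # 192.0.2.10 (served from the cache)
print(upstream_calls)             # ['a.com']
```

The second lookup never reaches the upstream, which is exactly why a poisoned or hijacked cache entry keeps affecting users until it expires.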

DNS hijacking

An attacker takes control of a DNS server and modifies its resolution results.

DNS pollution

An attacker impersonates a DNS server and returns forged results to the user when the queried domain matches a blacklist. The attack exploits the fact that DNS queries usually travel over UDP, which is connectionless and offers no reliability guarantees.
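To see why a plain UDP query is easy to answer falsely, it helps to look at what the query actually contains. The Python sketch below builds a standard DNS question for an A record by hand; `build_dns_query` is an illustrative helper, and the transaction ID 0x1234 is an arbitrary choice. Nothing is sent over the network.

```python
import struct


def build_dns_query(name: str, transaction_id: int) -> bytes:
    """Build a minimal DNS query message for an A record."""
    # Header: ID, flags (0x0100 = standard query, recursion desired),
    # then QDCOUNT=1 and ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", transaction_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("a.com", transaction_id=0x1234)
print(len(query))   # 23
print(query.hex())
```

The whole message is 23 bytes of plain data with a 16-bit ID and no signature. Without an additional mechanism such as DNSSEC, a client has little more than that ID, the question, and the source address and port to decide whether a reply is genuine, so a forged reply that arrives first can be accepted.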

DNS resolution, CDN acceleration, VPN gateways: these communication-layer topics involve a lot of knowledge, so I will not elaborate further. Suffice it to say that most mature hacking techniques build on Internet protocols.

HTTP and HTTPS

The S in HTTPS stands for secure, and HTTPS can be read as HTTP + SSL/TLS. The default HTTP port is 80, and the default HTTPS port is 443. In addition, HTTPS encrypts the transmitted data. More concretely, before an HTTPS site can be accessed, the client verifies the server’s certificate and the public key it contains, which is what secures the data in transit.
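The certificate check can be observed from Python's standard `ssl` module. The sketch below is a minimal illustration: `fetch_peer_certificate` is a hypothetical helper name, and it is only defined here, not called, so the snippet runs without network access.

```python
import socket
import ssl

# The default client context verifies the server certificate against the
# system's trusted CAs and checks that it matches the requested host name.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True


def fetch_peer_certificate(host: str, port: int = 443) -> dict:
    """Connect over TLS and return the server's verified certificate.

    Raises ssl.SSLCertVerificationError if verification fails.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        # server_hostname is used for SNI and for host name verification.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Calling the helper with a real host name would perform the TCP handshake first and the TLS handshake second; the HTTP request itself is only sent after both have succeeded.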

The communication protocol layer is a fairly scattered subject, and I have not tied the sections together with strict causal reasoning, so the knowledge points here are somewhat fragmented. The communication layer itself has little to do with programming, but this knowledge, which programmers easily overlook, often acts as the link between different bodies of knowledge in programming. I hope you found this article helpful.