As users, we only need to type a URL into the browser or click a link, and the content of the web page appears. Behind the scenes, though, a fairly complex sequence of events takes place. Put simply, there are four main steps:

1. The browser finds the corresponding IP address (the remote server) based on the domain name.
2. The browser establishes a connection with the remote server (a TCP connection, via the three-way handshake).
3. The browser and the remote server send and receive data.
4. The browser disconnects from the remote server.

Let’s explain it in more detail.

1. Resolve domain names to IP addresses

An IP address is the address the network assigns to each computer. You can think of it as a house number: to find a computer, you first need to know its IP address. Because IP addresses are hard to remember (among other reasons), domain names exist. In the simplest case, one domain name corresponds to one IP address, although in practice a domain name can map to several. The first thing the browser does with a domain name is resolve it to an IP address, and then it contacts the machine at that address.

DNS resolution is usually used to find IP addresses for domain names. To make the lookup process more efficient, browsers and operating systems cache the results of each resolution.

That is, the next time a domain name needs to be resolved, the cache is checked first. If the cache has no matching entry, the query goes out to the DNS hierarchy: the root DNS servers, then the top-level domain servers, and so on down to the server that holds the record.

In short, whichever path is taken, the goal of this step is the same: to resolve the domain name into an IP address.

2. Establish a connection

Now that you have the IP address of the other party, it is time to establish a connection. HTTP runs on top of TCP, and TCP requires a three-way handshake to establish a connection. Here is a simulation of the three-way handshake:

Browser A: Hello, this is Browser A. I want to play with you.
Remote server B: Hello, Browser A. Go ahead.
Browser A: Ok, I'm on my way.

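This is how the three-way handshake works. In TCP terms, the three messages are SYN, SYN-ACK, and ACK, and the exchange confirms that each side can both send to and receive from the other. More generally, TCP expects an acknowledgement for what it sends: if a message is not acknowledged in time, TCP resends it, and gives up only after repeated failures.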
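In application code the handshake itself is not visible, because the operating system performs it when a program asks for a connection. The following minimal sketch uses Python's standard `socket` module on the loopback address, with a local listening socket standing in for the remote server; the variable names are only illustrative.

```python
import socket

# Listening socket standing in for the remote server (loopback, OS-chosen port).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_address = server.getsockname()

# create_connection() returns only after the kernel has completed the
# SYN, SYN-ACK, ACK exchange with the listening socket.
client = socket.create_connection(server_address, timeout=5)

# accept() hands the server the already-established connection.
connection, client_address = server.accept()

# Only now can application data flow over the connection.
client.sendall(b"hello")
received = connection.recv(1024)

connection.close()
client.close()
server.close()
```

Note that no line in the program sends a SYN or an ACK explicitly: connecting and accepting are the only handshake-related calls the application makes.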

TCP belongs to the transport layer. The transport layer also has another protocol, UDP. The biggest difference between the two is that UDP does not need to establish a connection: it can start communicating without a three-way handshake, and it does not check that a packet gets a valid response; it just sends it. UDP is usually used in real-time scenarios, such as live streaming.
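For contrast, here is a rough sketch of the same kind of exchange over UDP, again on the loopback address with Python's standard `socket` module. The payload `b"frame-1"` is a made-up stand-in for a chunk of real-time data.

```python
import socket

# Receiver bound to a loopback port chosen by the OS.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5)
receiver_address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No connect() and no handshake: the datagram is simply sent.
sender.sendto(b"frame-1", receiver_address)

# UDP sends no acknowledgement back; on a real network this datagram
# could be lost and the sender would never be told.
payload, sender_address = receiver.recvfrom(1024)

sender.close()
receiver.close()
```

On loopback the datagram arrives reliably, but nothing in UDP guarantees that. An application that needs delivery guarantees has to build them itself or use TCP.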

3. Server response and return

Once the connection is established, data can be sent to each other. The browser sends a request to the server, which receives the request and responds to the browser with the result.

The HTTP server software on the server side is typically Apache or Nginx. Apache or Nginx hands the request over to an application written in a specific programming language (Java, Python, PHP, etc.) for processing.

The server then returns the program's result to the client browser in HTTP format, and the browser renders the page from the returned data.
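The request and response cycle can be tried out locally with Python's standard library. In this sketch, `http.server` stands in for Apache or Nginx, `http.client` stands in for the browser, and `HelloHandler` and `fetch` are hypothetical names playing the role of the application code and the page load. It is a toy setup, not how a production server is deployed.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical application code that the HTTP server hands requests to."""

    def do_GET(self):
        body = f"<h1>You asked for {self.path}</h1>".encode("utf-8")
        self.send_response(200)  # status line
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # response body

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(path):
    """Start a throwaway local server, request path, return the response."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)  # the browser's request
        response = conn.getresponse()  # the server's reply
        status = response.status
        content_type = response.getheader("Content-Type")
        body = response.read().decode("utf-8")
        conn.close()
        return status, content_type, body
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch("/index.html")` returns the status code, the `Content-Type` header, and the HTML body, which are the pieces a browser uses to decide how to render the page.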

4. Disconnect

After the data exchange is complete, the connection should be closed: the task is done, and closing frees system resources. Closing a TCP connection actually takes four steps, often called the four-way handshake or the four waves.

Browser A: Hello, it's getting late. I'd like to go back.
Remote server B: Oh, let me see what time it is.
Remote server B: Oh, it's really late. You go ahead. Bye.
Browser A: Ok, bye.

At this point, you might ask: why can't disconnecting take three steps, just like establishing a connection? In other words, why doesn't the server simply disconnect as soon as the browser says it wants to?

When the server receives the disconnection message, it may still have tasks to finish or data to send. So it first acknowledges the browser's request, then makes sure all remaining data has been processed and sent, and only then tells the browser that it is ready to disconnect. The acknowledgement and the server's own closing message are therefore two separate steps, which is where the fourth step comes from.
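The gap between the browser's "I am done" and the server's "I am done too" can be observed with a half-close. In this sketch, `shutdown(socket.SHUT_WR)` makes the client side send its FIN while keeping the receiving direction open; the function names and the `log` list are hypothetical, and the whole exchange runs on the loopback address.

```python
import socket
import threading


def server_side(listener, log):
    conn, _ = listener.accept()
    with conn:
        request = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the client's FIN has arrived
                break
            request += chunk
        log.append("server saw client FIN")
        # The server still has data to deliver, so it does not close yet.
        conn.sendall(b"reply to " + request)
        log.append("server sent remaining data")
    # Leaving the with-block closes the socket, which sends the server's FIN.


def run_demo():
    log = []
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    worker = threading.Thread(target=server_side, args=(listener, log))
    worker.start()

    client = socket.create_connection(listener.getsockname(), timeout=5)
    client.sendall(b"ping")
    client.shutdown(socket.SHUT_WR)  # client FIN: nothing more to send

    reply = b""
    while True:
        chunk = client.recv(1024)  # the client can still receive
        if not chunk:  # empty read: the server's FIN has arrived
            break
        reply += chunk
    client.close()
    worker.join()
    listener.close()
    return reply, log
```

The server's reply is written only after it has seen the client's FIN, which shows why the two directions of a TCP connection have to be closed separately.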

Okay, that is roughly what happens when a browser opens a web page, although the real details are considerably more complicated.

Take DNS resolution, for example: how is the cache checked? If nothing is cached, how is the root name server queried? And with so many domain names in the world, how do the root name servers cope with such a high volume of concurrent requests?

This article is based on HTTP. For services with high security requirements, such as payments, we usually use HTTPS. So what exactly is the difference between HTTP and HTTPS, and how does HTTPS encryption work?

I will cover these questions separately and in detail in later articles, so feel free to follow along. Writing original articles takes effort; if this one helped or inspired you, I would appreciate a thumbs up. If you have any questions, please leave a comment and I will be glad to discuss them.