Have you ever been asked a question about something and remembered that you’ve often seen this kind of question, but the exact details are unclear?

Well, that happens to me, so I am tidying it all up here. If I really can’t remember it later, it will at least be easy to look up.

When you type a web address into your browser’s address bar, for example www.zhihu.com/, you see the home page of Zhihu. This is what happens behind the scenes.

First, the browser parses the input URL

The browser starts by parsing the URL we typed in. So what makes up the URL https://www.zhihu.com/?
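To make the parsing step concrete, here is a minimal sketch using Python's standard `urllib.parse` module. It is only an illustration of how a URL breaks into parts, not how any particular browser does it; the helper name `describe_url` is my own.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the pieces discussed in this article."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # the protocol, e.g. http or https
        "host": parts.hostname,   # the domain name, e.g. www.zhihu.com
        "port": parts.port,       # None when the URL does not give one explicitly
        "path": parts.path,       # the resource path on the server
    }


info = describe_url("https://www.zhihu.com/")
print(info)
# {'scheme': 'https', 'host': 'www.zhihu.com', 'port': None, 'path': '/'}
```

The port is `None` because the URL does not spell it out; in that case the client falls back to the scheme's default port (80 for http, 443 for https).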

1. http

HTTP stands for Hypertext Transfer Protocol. I have read that the common translation of this name is not strictly accurate, but we won’t worry about that here and will simply call it the “hypertext transfer protocol”. This protocol is used for communication between clients and servers. For example, your browser is the client and Zhihu’s server is the server.

But HTTP itself has many disadvantages:

  • Communication is in plaintext (unencrypted), so the content can be eavesdropped on
  • The identity of the other party is not verified, so you may be talking to an impostor
  • The integrity of the message cannot be verified, so it may have been tampered with

This is true not only of HTTP but of any unencrypted protocol. HTTPS was created to address these problems: HTTPS = HTTP + encryption + authentication + integrity protection. I won’t go into detail here; I recommend the book Diagrams of HTTP, which I have drawn on for this article.
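As a small illustration of the "authentication" part, the sketch below inspects the default TLS settings that Python's standard `ssl` module gives a client. It makes no network connection and says nothing about how a browser is configured internally; the helper name `describe_context` is my own.

```python
import ssl


def describe_context(ctx: ssl.SSLContext) -> dict:
    """Report whether a TLS context authenticates the server."""
    return {
        # The server must present a certificate that chains to a trusted CA.
        "verifies_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must also match the host name we asked for.
        "checks_hostname": ctx.check_hostname,
    }


default_settings = describe_context(ssl.create_default_context())
print(default_settings)
```

Encryption and integrity protection are then negotiated during the TLS handshake itself, which this sketch does not show.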

2. www

WWW stands for World Wide Web. Dr Tim Berners-Lee of CERN, the European Organisation for Nuclear Research, came up with an idea that would allow far-flung researchers to share knowledge. The basic concept was to use hypertext, formed by linking multiple documents to one another, to build a WWW (World Wide Web) in which documents could reference each other. WWW was originally the name of the client application used to browse hypertext, what we would now call a Web browser. Today it refers to the whole collection of these technologies, also known as the Web for short. In our URL, www together with zhihu.com makes up the full domain name www.zhihu.com.

3. Resolve the domain name

Behind every domain name there is a corresponding IP address. Before the request can actually be sent, the browser needs DNS resolution to find out which IP address is behind www.zhihu.com.
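The lookup can be sketched with `socket.getaddrinfo` from Python's standard library, which asks the operating system's resolver to do the work. This is just an illustration: a real browser also consults its own caches, and the helper name `resolve` and the default port 443 are my assumptions.

```python
import socket


def resolve(host: str, port: int = 443) -> list:
    """Return the distinct IP addresses the system resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4, (ip, port, flow, scope) for IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# A numeric address needs no DNS query, so this line works offline.
print(resolve("127.0.0.1"))
```

With network access, `resolve("www.zhihu.com")` would return whatever addresses that name currently maps to; the result depends on your location and resolver, so no output is shown here.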

Second, the browser initiates a TCP connection to the server and performs the “three-way handshake”

To exchange HTTP requests and responses, the browser and the server first need a TCP connection. HTTP itself has no notion of a connection, only of requests and responses. Each request and response is a message that needs a transmission channel, and that channel is provided by TCP. TCP sits at the transport layer and provides a reliable byte stream service. To make large transfers manageable, TCP splits the data into segments and confirms that each one actually reaches the other side. So how does it ensure that the data is delivered successfully? The “three-way handshake” is one of the means of ensuring reliable communication. The order is as follows:

  • First handshake: The sender sends a packet with the SYN flag to the other end.
  • Second handshake: After receiving a packet, the receiving end sends back a packet with the SYN/ACK flag, indicating that the packet has been received.
  • Third handshake: After receiving the response from the receiving end, the sender sends a packet with the ACK flag to the receiving end.
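The three steps above can be sketched as a small simulation. This is not a real TCP implementation; it only models the flags and the sequence and acknowledgement numbers. The numbers are a detail not covered in the text above: in TCP a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one. The names `Segment` and `three_way_handshake` are my own.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    sender: str
    flags: str
    seq: int
    ack: Optional[int]  # None when the ACK flag is not set


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged to open a connection."""
    # 1. The client sends SYN carrying its initial sequence number (ISN).
    syn = Segment("client", "SYN", client_isn, None)
    # 2. The server replies SYN/ACK: its own ISN, acknowledging client ISN + 1.
    syn_ack = Segment("server", "SYN/ACK", server_isn, syn.seq + 1)
    # 3. The client replies ACK, acknowledging server ISN + 1.
    ack = Segment("client", "ACK", syn.seq + 1, syn_ack.seq + 1)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(segment)
```

In a real connection the initial sequence numbers are chosen by each side's TCP stack; 100 and 300 are arbitrary values picked for readability.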

A picture borrowed from the book helps explain this and makes it easier to remember.

We can also use Wireshark, a packet capture tool, to check whether the TCP “three-way handshake” really takes place before the HTTP request.



Taking this HTTP request as an example, you can see three TCP packets before the HTTP request. Within a TCP connection, the HTTP request is sent only after the handshake is complete.

Third, once the handshake is complete, the browser and the server exchange the HTTP request and response

The browser sends an HTTP request message to the server. The server processes the request and returns the response data to the browser.
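To see the exchange end to end, here is a minimal sketch that runs entirely on the local machine: a tiny one-shot server and a client talking over a real TCP socket on 127.0.0.1. The operating system performs the three-way handshake inside `create_connection`, and only then is the HTTP request sent. The function names, the "hello" body, and the choice of headers are my own assumptions; a real browser sends many more headers.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read one request, send one fixed response."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # read until the end of the request headers
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        body = b"hello"
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(head.encode("ascii") + body)


def fetch(host: str, port: int, path: str = "/") -> bytes:
    """Open a TCP connection, send a GET request, return the raw response."""
    # create_connection returns only after the TCP handshake has completed.
    with socket.create_connection((host, port), timeout=5) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        worker.join(timeout=5)
        listener.close()


response = demo()
print(response.decode("ascii"))
```

Both the request and the response here are plain readable text, which is exactly why unencrypted HTTP can be eavesdropped on.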

Fourth, the browser decodes the server’s response

The browser decodes the server’s response and, if the response is cacheable, stores it in its cache. A web page usually references images, CSS, JS and other resources, so the browser sends further requests to fetch the resources embedded in the HTML.
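How the browser discovers those extra resources can be sketched with Python's standard `html.parser`. This is a simplified illustration, not a browser's actual parser: it only looks at `img`, `script` and stylesheet `link` tags, and the class name, function name and sample page are made up for the example.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect the URLs of resources a page asks the browser to fetch."""

    def __init__(self) -> None:
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.resources.append(attributes["href"])


def find_resources(html: str) -> list:
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources


# A made-up page used only for this example.
SAMPLE = """<html><head>
<link rel="stylesheet" href="/static/main.css">
<script src="/static/app.js"></script>
</head><body><img src="/static/logo.png"></body></html>"""

resources = find_resources(SAMPLE)
print(resources)
# ['/static/main.css', '/static/app.js', '/static/logo.png']
```

Each URL in the list would trigger another HTTP request, going through the same steps described above.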

Fifth, page rendering

With all the resources in hand, the browser can render the page, and the finished web page is finally presented to us.