
Preface

What happens behind the scenes between typing a URL into the browser and seeing the page? What is the process? This is a very popular interview question, and interviewers like me ask it often. First comes the overall flow, and then each step is broken down below.

Generally speaking, it can be divided into six processes:

  • The browser finds its IP address by domain name (DNS resolution)
  • The browser establishes a connection with the server (TCP three-way handshake)
  • The browser sends an HTTP request to the server
  • The server receives the request and returns an HTTP response
  • The browser parses and renders the page
  • The connection is closed (TCP four-way handshake, the “four waves”)

(HTML, CSS, JS, and JSON are all delivered in the fourth step. They are transmitted over HTTP, which in turn runs on TCP/IP.)

What is a URL

A Uniform Resource Locator (URL) is used to locate resources on the Internet. For example: www.w3school.com.cn/html/index..

A URL has the general form scheme://host.domain:port/path/filename, and each part means the following:

  • scheme: defines the type of Internet service. Common protocols include HTTP, HTTPS, FTP, and File. The most common is HTTP, and HTTPS is used for encrypted network transmission.
  • host.domain: defines the Internet domain name, for example w3school.com.cn.
  • port: defines the port number on the host (the default port for HTTP is 80).
  • path: defines the path on the server (if it is omitted, the document must be in the root directory of the website).
  • filename: defines the name of the document or resource.
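Python's standard `urllib.parse.urlsplit` can split a URL into the parts listed above. In the sketch below, `describe_url`, the `DEFAULT_PORTS` table, and the example URL are all hypothetical. The sketch only illustrates the decomposition; a browser's own URL parser follows the much more detailed WHATWG URL Standard.

```python
from posixpath import basename
from typing import Dict, Union
from urllib.parse import urlsplit

# Assumed defaults for this sketch: the well-known port of each scheme.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def describe_url(url: str) -> Dict[str, Union[str, int, None]]:
    """Split a URL into scheme, host, port, path, and filename."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # An omitted port falls back to the scheme's default (80 for HTTP).
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",
        # The last path segment; empty when the path ends with "/".
        "filename": basename(parts.path),
    }


if __name__ == "__main__":
    print(describe_url("http://www.example.com/html/index.html"))
```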

Domain name resolution (DNS)

After you enter a web address, the browser must first resolve the domain name. It does not find the server by domain name; it needs the server's IP address. Besides an IP address, a computer can also be given a host name and a domain name, such as www.hackr.jp. Why not just use the IP address in the first place and skip the resolution step? To answer that, first look at what an IP address is.

1. The IP address

An IP address (Internet Protocol address) is the unified address format defined by the IP protocol. It assigns a logical address to every network and every host on the Internet, which hides the differences between their physical addresses. An IPv4 address is a 32-bit binary number, usually written as four decimal numbers separated by dots. For example, 127.0.0.1 is the local (loopback) address. A domain name is like a mask worn over an IP address: it makes a server's address easy to remember and share. Users usually reach other computers by host name or domain name rather than by IP address, because a name made of letters and numbers is easier to remember than a string of pure numbers. Computers, however, have a hard time with names and are much better at processing numbers. DNS was created to bridge this gap.
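To make the "32-bit binary number" concrete, the sketch below uses Python's standard `ipaddress` module to turn a dotted-decimal IPv4 address into its 32-bit binary form. The helper name `ipv4_to_bits` is hypothetical. IPv6 addresses are 128 bits and are not covered here.

```python
import ipaddress


def ipv4_to_bits(address: str) -> str:
    """Return the 32-bit binary form of a dotted-decimal IPv4 address."""
    as_int = int(ipaddress.IPv4Address(address))  # the address as one integer
    return format(as_int, "032b")                 # zero-padded to 32 bits


if __name__ == "__main__":
    bits = ipv4_to_bits("127.0.0.1")
    print(bits)         # 01111111000000000000000000000001
    print(int(bits, 2)) # 2130706433
    print(ipaddress.IPv4Address("127.0.0.1").is_loopback)  # True
```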

2. What is domain name resolution

DNS provides two lookups: it finds the IP address for a domain name, and it can also reverse-look-up a domain name from an IP address. DNS is not a single web server but a distributed system of name servers. A domain name resolution is essentially a record on those DNS servers that maps a name to an address.

Example: baidu.com → 220.114.23.56 (the server's public IP address). The port number (80 for HTTP) is not stored in the DNS record; it comes from the URL or from the scheme's default.

3. How the browser finds the IP address for a domain name

  • Browser cache: the browser caches DNS records for a certain period of time.
  • Operating system cache: if the record is not in the browser cache, the browser looks in the operating system's cache.
  • Router cache: routers also keep DNS caches.
  • ISP DNS server: the ISP runs dedicated DNS servers that answer DNS queries.
  • Root server: if the ISP's DNS server does not have the record either, it starts from a root DNS server (it needs to know the root server's IP address in advance). It then queries the top-level-domain server and finally the domain's authoritative server, one after another, until it obtains the IP address.
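The lookup order above works like a chain of caches. The sketch below models it under simplifying assumptions: each layer is a plain dict, and one dict (`dns_hierarchy`) stands in for the whole root, TLD, and authoritative chain. The function name and the example address are hypothetical (192.0.2.10 is a documentation-only address). Real caches also expire entries after the record's TTL, and this model ignores that.

```python
from typing import Dict, List, Optional, Tuple

# Each cache layer is a (name, {domain: ip}) pair, searched in order.
Layer = Tuple[str, Dict[str, str]]


def resolve(domain: str, layers: List[Layer],
            dns_hierarchy: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (ip, where_found) by checking each cache before the DNS hierarchy."""
    for name, cache in layers:
        if domain in cache:
            return cache[domain], name
    # Miss in every cache: this dict stands in for the resolver asking the
    # root, TLD, and authoritative servers in turn.
    ip = dns_hierarchy.get(domain)
    if ip is None:
        return None, "not found"
    # Remember the answer in every layer so later lookups stop earlier.
    for _, cache in layers:
        cache[domain] = ip
    return ip, "root/TLD/authoritative"


if __name__ == "__main__":
    layers = [("browser", {}), ("os", {}), ("router", {}), ("isp", {})]
    hierarchy = {"example.com": "192.0.2.10"}
    print(resolve("example.com", layers, hierarchy))  # first lookup goes all the way up
    print(resolve("example.com", layers, hierarchy))  # second lookup hits the browser cache
```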

4. Summary

The browser sends the domain name to the DNS server. The DNS server looks up the IP address for that domain name and returns it to the browser. The browser then uses this IP address to reach the corresponding server and send its request. The next step is the HTTP exchange with the server, which has three parts: the TCP three-way handshake, the HTTP request and response, and closing the TCP connection.

TCP three-way handshake

Before sending data, the client starts a TCP three-way handshake. The handshake synchronizes the client's and server's sequence numbers and acknowledgment numbers and exchanges TCP window size information.

1. The TCP three-way handshake process is as follows:

  • The client sends a segment with SYN=1 and Seq=x to the server's port. (First handshake, initiated by the browser: it tells the server "I am going to send you a request.")

  • The server replies with SYN=1, ACK=1, Seq=y, and acknowledgment number x+1 to confirm. (Second handshake, initiated by the server: it tells the browser "I am ready to receive; go ahead.")

  • The client sends back a segment with ACK=1, Seq=x+1, and acknowledgment number y+1, which ends the handshake. (Third handshake, sent by the browser: it tells the server "I am about to send; get ready to receive.")
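The arithmetic in the three steps can be checked with a small model. A SYN uses up one sequence number, so each side acknowledges the other's initial sequence number plus one. In the sketch below, the `Segment` class and its field names are hypothetical. Only the flags and numbers follow TCP; real segments carry many more fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    sender: str
    syn: bool
    ack_flag: bool
    seq: int
    ack: Optional[int]  # acknowledgment number; only meaningful when ack_flag is set


def three_way_handshake(x: int, y: int) -> List[Segment]:
    """Model the three segments; x and y are the client's and server's initial sequence numbers."""
    syn = Segment("client", syn=True, ack_flag=False, seq=x, ack=None)
    # The SYN consumes one sequence number, so the server acknowledges x + 1.
    syn_ack = Segment("server", syn=True, ack_flag=True, seq=y, ack=syn.seq + 1)
    # Likewise, the client acknowledges y + 1 and continues from x + 1.
    final_ack = Segment("client", syn=False, ack_flag=True, seq=x + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, final_ack]


if __name__ == "__main__":
    for seg in three_way_handshake(x=100, y=300):
        print(seg)
```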

2. Why three handshakes are needed

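In his textbook Computer Networks, Xie Xiren says the purpose of the “three-way handshake” is “to prevent an invalid connection request segment from suddenly reaching the server and causing errors”. Suppose an old SYN from the client is delayed in the network and arrives after the client has already given up on it. With a two-way handshake, the server would open a connection that nobody uses and waste resources on it. With the third step, the server waits for the client's acknowledgment, and the client does not confirm a connection it never asked for.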
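The reasoning above can be shown with a toy model. The function name and the dict format for SYNs are hypothetical. `"stale": True` marks a delayed SYN from an abandoned attempt. Under these assumptions, it counts the connections the server ends up holding open with two steps versus three steps.

```python
from typing import List


def server_open_connections(syns: List[dict], handshake_steps: int) -> List[int]:
    """Return the ids of SYNs for which the server ends up holding an open connection.

    Each SYN is a dict like {"id": 1, "stale": False}; "stale" marks a segment
    delayed in the network from an attempt the client already abandoned.
    """
    opened = []
    for syn in syns:
        if handshake_steps == 2:
            # Two-way: the server commits as soon as it answers the SYN.
            opened.append(syn["id"])
        elif not syn["stale"]:
            # Three-way: the server waits for the client's ACK. The client only
            # acknowledges connections it actually requested (it answers a stale
            # SYN-ACK with RST), so stale SYNs never become open connections.
            opened.append(syn["id"])
    return opened


if __name__ == "__main__":
    arrivals = [{"id": 1, "stale": True}, {"id": 2, "stale": False}]
    print(server_open_connections(arrivals, handshake_steps=2))  # [1, 2]
    print(server_open_connections(arrivals, handshake_steps=3))  # [2]
```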

Sending the HTTP request

After the TCP three-way handshake is complete, the browser sends the HTTP request message. The request message consists of a request line, request headers, and a request body.

1. The request line contains the request method, the URL, and the protocol version.

  • Common request methods include GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS, and TRACE (HTTP also defines CONNECT).
  • The URL is the requested address. In full it is composed of <protocol>://<host>:<port>/<path>?<parameters>, although the request line usually carries only the path and query.
  • The protocol version indicates the HTTP version number.

POST /chapter17/user.html HTTP/1.1

In the line above, POST is the request method, /chapter17/user.html is the URL, and HTTP/1.1 is the protocol and its version. HTTP/1.1 is a widely used version.
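A request line has three space-separated fields, so it can be split with little code. The sketch below parses a line such as `POST /chapter17/user.html HTTP/1.1`. The names `RequestLine` and `parse_request_line` are hypothetical. The sketch assumes a well-formed line and rejects anything that does not end in an `HTTP/` version.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    target: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split 'METHOD target HTTP/x.y' into its three fields."""
    method, target, version = line.rstrip("\r\n").split(" ")
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP version: {version!r}")
    return RequestLine(method, target, version)


if __name__ == "__main__":
    print(parse_request_line("POST /chapter17/user.html HTTP/1.1\r\n"))
```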

2. The request headers carry additional information about the request as key-value pairs, one pair per line. Each key is separated from its value by a colon (:).

Request headers tell the server about the client making the request. They carry a lot of useful information about the client environment and the request body. For example, Host gives the host name, which lets one server host several virtual hosts. Connection relates to keep-alive, which HTTP/1.1 added so that one connection can carry more than one request. User-Agent identifies the client that sent the request, which servers use for compatibility handling and customization.

3. The request body carries the request data, for example several request parameters. An empty line (a carriage return and a line feed) separates it from the headers. Not every request has a body.

name=tom&password=1234&realName=tomson

The body above carries three request parameters: name, password, and realName.
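The three parts of a request message can be assembled by hand. The sketch below builds the request line, the headers, the empty line, and the form-encoded body from the example above. `build_request` and the host `www.example.com` are hypothetical, and the header set is a minimal assumption. Real clients send more headers, and in practice they use an HTTP library rather than building bytes manually.

```python
from urllib.parse import urlencode


def build_request(method: str, path: str, host: str, params: dict) -> bytes:
    """Assemble request line, headers, an empty line, then the form-encoded body."""
    body = urlencode(params).encode("ascii")
    headers = {
        "Host": host,
        "Connection": "keep-alive",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),
    }
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF, and an empty line separates headers from body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    msg = build_request("POST", "/chapter17/user.html", "www.example.com",
                        {"name": "tom", "password": "1234", "realName": "tomson"})
    print(msg.decode("ascii"))
```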

The server processes the request and returns an HTTP response

1. The server

A server is a high-performance computer on the network. It listens for service requests from other computers (clients) on the network and provides the matching services, such as web pages, file downloads, mail, and video. Clients, by contrast, are used to browse the web, watch videos, listen to music, and so on. Each server runs an application that handles these requests: the web server. Common web server products include Apache, Nginx, IIS, and Lighttpd.

The web server acts as a dispatcher. Based on its configuration files, it hands each incoming request to the program on the server that handles that kind of request (CGI scripts, JSP scripts, servlets, ASP scripts, server-side JavaScript, or some other server-side technology). It then returns the result produced by that backend program as the response.
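The dispatching role can be pictured as a routing table that maps path prefixes to handler programs, which is roughly what a web server's configuration expresses. In the sketch below, the prefixes and the handlers `serve_static` and `run_application` are hypothetical. Real servers such as Nginx or Apache use their own configuration syntax and far richer matching rules.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


# Hypothetical handlers standing in for static-file serving and an application backend.
def serve_static(path: str) -> str:
    return f"static file for {path}"


def run_application(path: str) -> str:
    return f"application response for {path}"


def not_found(path: str) -> str:
    return f"404 for {path}"


# Stand-in for the web server's configuration: path prefix -> handler.
ROUTES: Dict[str, Handler] = {
    "/static/": serve_static,
    "/api/": run_application,
}


def dispatch(path: str) -> str:
    """Hand the request to the handler whose prefix matches (longest prefix wins)."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return not_found(path)
    return ROUTES[max(matches, key=len)](path)


if __name__ == "__main__":
    for p in ("/static/logo.png", "/api/users", "/other"):
        print(dispatch(p))
```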

2. MVC backend processing stage

There are many frameworks for backend development, but most are built on the MVC design pattern. MVC divides an application into three core parts: model, view, and controller. Each part handles its own task, which separates input, processing, and output.

1. View

The view is the interface presented to the user; it is the shell of the program.

2. Model

The model is mainly responsible for data interaction. Of the three parts of MVC, the model has the most processing tasks. A model can provide data for multiple views.

3. Controller

The controller takes the user's input from the view layer, selects the matching data from the model layer, and operates on it to produce the final result. It acts as a manager: it receives requests from the view, decides which model component to call to process each request, and then decides which view should display the data the model returns. The three layers work closely together but stay independent of each other. Ideally, a change inside one layer does not affect the others, and each layer exposes an interface that the layer above can call.

What happens at this stage? In short, the request sent by the browser first reaches the controller, which performs logic processing and request dispatching and then calls the model. The model fetches data from stores such as Redis and MySQL, and once the data is available the page is rendered. The result goes back to the client as an HTTP response message. Finally, the browser's rendering engine renders the web page for the user.
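The division of labor described above can be sketched in a few classes. Everything in the sketch is hypothetical: `UserModel`, `user_view`, `not_found_view`, and `UserController` are made-up names, and a dict stands in for Redis or MySQL. The point is only to show which layer knows what. The controller never touches storage directly, and the view never decides which data to fetch.

```python
from typing import Dict, Optional, Tuple


class UserModel:
    """Model: owns data access. A dict stands in for Redis or MySQL."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {1: "tom", 2: "jerry"}

    def find_name(self, user_id: int) -> Optional[str]:
        return self._rows.get(user_id)


def user_view(name: str) -> str:
    """View: turns data into HTML and knows nothing about storage."""
    return f"<h1>Hello, {name}</h1>"


def not_found_view() -> str:
    return "<h1>User not found</h1>"


class UserController:
    """Controller: reads the request, calls the model, and picks a view."""

    def __init__(self, model: UserModel) -> None:
        self.model = model

    def handle(self, params: Dict[str, str]) -> Tuple[int, str]:
        user_id = int(params.get("id", "0"))
        name = self.model.find_name(user_id)
        if name is None:
            return 404, not_found_view()
        return 200, user_view(name)


if __name__ == "__main__":
    controller = UserController(UserModel())
    print(controller.handle({"id": "1"}))
```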

3. The HTTP response message

A response message consists of a response line (status line), response headers, and a response body.

(1) The response line contains the protocol version, the status code, and the status description.

Status codes fall into five classes:

  • 1xx: Informational. The request has been received and processing continues.
  • 2xx: Success. The request was successfully received, understood, and accepted.
  • 3xx: Redirection. Further action must be taken to complete the request.
  • 4xx: Client error. The request contains a syntax error or cannot be fulfilled.
  • 5xx: Server error. The server failed to fulfill an apparently valid request.
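Because the class of a status code is its first digit, it can be computed with integer division. The sketch below pairs that with the reason phrases in Python's standard `http.HTTPStatus` enum. The names `CLASSES` and `describe_status` are hypothetical. Codes that fall in a valid class but are unknown to the standard library get the placeholder phrase "Unknown".

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return e.g. '404 Not Found (client error)', using the first digit for the class."""
    category = CLASSES.get(code // 100)
    if category is None:
        raise ValueError(f"not a valid HTTP status code: {code}")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # valid class, but not a code the standard library knows
    return f"{code} {phrase} ({category})"


if __name__ == "__main__":
    for c in (200, 301, 404, 503):
        print(describe_status(c))
```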

(2) The response headers carry additional information about the response as name/value pairs.

(3) The response body carries the data returned by the server. An empty line (a carriage return and a line feed) separates it from the headers. Not every response message contains a body.
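Seen from the browser's side, the three parts can be split apart again. The sketch below parses a raw HTTP/1.x response into its version, status code, reason phrase, headers, and body. `parse_response` is a hypothetical helper, and the sample response is made up. The sketch ignores chunked transfer encoding and repeated header names, both of which real clients must handle.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP/1.x response into version, status, reason, headers, and body."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body


if __name__ == "__main__":
    raw = (b"HTTP/1.1 200 OK\r\n"
           b"Content-Type: text/html\r\n"
           b"Content-Length: 9\r\n"
           b"\r\n"
           b"<p>hi</p>")
    print(parse_response(raw))
```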

The browser parses and renders the page

Now that the browser has received the HTML in the response, let's look at how the browser renders it.

Browsers parse and render a page in five steps:

  • Parse the HTML into a DOM tree
  • Parse the CSS into a CSS rule tree
  • Combine the DOM tree and the CSS rule tree into a render tree
  • Compute the layout information of each node from the render tree
  • Paint the page based on the computed information

1. Build the DOM tree from the HTML

  • The tags in the HTML are parsed into a DOM tree that mirrors the document's structure. DOM construction is depth-first: all children of the current node are built before its next sibling is built.
  • If the parser meets a script tag while reading the HTML document and building the DOM tree, DOM construction is paused until the script has been downloaded and executed (unless the script is marked async or defer).
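The depth-first order can be seen in a small tree builder on top of Python's standard `html.parser.HTMLParser`. A stack holds the currently open elements. Each start tag pushes a node, so everything up to its end tag becomes its subtree before any sibling is created. `Node`, `DomBuilder`, and `outline` are hypothetical names. This is a simplified sketch, not the HTML Standard's tree-construction algorithm, and it does not model script execution.

```python
from html.parser import HTMLParser
from typing import List, Optional


class Node:
    def __init__(self, tag: str, parent: Optional["Node"] = None, text: str = "") -> None:
        self.tag = tag
        self.parent = parent
        self.text = text
        self.children: List["Node"] = []


class DomBuilder(HTMLParser):
    """Build a simplified DOM tree, using a stack of currently open elements."""

    VOID = {"br", "img", "meta", "link", "input", "hr"}  # elements with no end tag

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self.stack: List[Node] = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.stack[-1])
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            # Descend: everything up to the matching end tag becomes this node's
            # subtree, so its children are finished before its next sibling starts.
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            parent = self.stack[-1]
            parent.children.append(Node("#text", parent, data.strip()))


def outline(node: Node, depth: int = 0) -> List[str]:
    """List the tree in depth-first (pre-order) order, indented by depth."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines += outline(child, depth + 1)
    return lines


if __name__ == "__main__":
    builder = DomBuilder()
    builder.feed("<html><body><p>Hi</p><div><span>x</span></div></body></html>")
    print("\n".join(outline(builder.root)))
```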

2. Generate a CSS rule tree by parsing the CSS

  • JavaScript execution is blocked while the CSS is being parsed, until the CSS rule tree is ready.
  • The browser does not render anything until the CSS rule tree has been generated.

3. Combine the DOM tree and the CSS rule tree into a render tree

  • Once both the DOM tree and the CSS rule tree are ready, the browser starts building the render tree.
  • Simplifying CSS speeds up building the CSS rule tree, so pages render faster.
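The combining step can be sketched as a recursive walk over the DOM. Each node gets its computed style, and nodes with `display: none` are dropped together with their subtrees, because they produce no render object. The sketch makes several simplifying assumptions. The DOM is given as `(tag, children)` tuples. The "CSS rule tree" is reduced to a dict keyed by tag name. Only `color` is inherited. The name `build_render_tree` is hypothetical, and real selector matching and cascading are far richer.

```python
from typing import Any, Dict, List, Optional, Tuple

# Simplified DOM node: (tag, [children]).
DomNode = Tuple[str, List[Any]]
# Simplified "CSS rule tree": tag name -> declarations.
Rules = Dict[str, Dict[str, str]]
INHERITED = {"color"}  # properties a child inherits from its parent in this sketch


def build_render_tree(node: DomNode, rules: Rules,
                      parent_style: Optional[Dict[str, str]] = None) -> Optional[dict]:
    """Pair each DOM node with its computed style; skip display:none subtrees."""
    tag, children = node
    style = {k: v for k, v in (parent_style or {}).items() if k in INHERITED}
    style.update(rules.get(tag, {}))
    if style.get("display") == "none":
        return None  # no render object for this node or any of its descendants
    rendered = [build_render_tree(child, rules, style) for child in children]
    return {"tag": tag, "style": style,
            "children": [r for r in rendered if r is not None]}


if __name__ == "__main__":
    dom = ("html", [("head", [("title", [])]), ("body", [("p", []), ("span", [])])])
    rules = {"head": {"display": "none"}, "body": {"color": "blue"}, "span": {"color": "red"}}
    print(build_render_tree(dom, rules))
```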

4. Compute the position and size of each node from the render tree (layout)

  • Layout: computes the position and size of each render object from the information in the render tree.
  • Reflow: if something changes after layout is complete in a way that affects the layout, the affected part must be laid out again.
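A toy version of layout and reflow: block boxes are stacked vertically, so each box's y position is the sum of the heights above it. Changing one box's height therefore moves every box after it. The sketch assumes plain full-width blocks with no margins, floats, or inline content. `layout` and `reflow` are hypothetical names, and real engines also track which parts of the tree are dirty instead of recomputing everything.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def layout(heights: List[int], width: int) -> List[Box]:
    """Stack block boxes vertically and return the geometry of each."""
    boxes, y = [], 0
    for h in heights:
        boxes.append((0, y, width, h))
        y += h
    return boxes


def reflow(heights: List[int], width: int, index: int, new_height: int) -> List[int]:
    """Change one box's height, re-run layout, and return the indices whose geometry changed."""
    before = layout(heights, width)
    updated = heights.copy()
    updated[index] = new_height
    after = layout(updated, width)
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


if __name__ == "__main__":
    print(layout([50, 100, 30], 800))
    print(reflow([50, 100, 30], 800, index=1, new_height=120))  # boxes 1 and 2 change
```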

5. Draw the page based on the calculated information

  • Paint: the browser traverses the render tree and calls each renderer's "paint" method to display its contents on the screen.
  • Repaint: changing properties such as an element's background color or text color, which do not affect the layout around or inside the element, only causes the browser to repaint.
  • Reflow: if an element's size changes, the layout must be recalculated and the affected area repainted.
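The difference between repaint and reflow can be summarized as a lookup: changes to geometry-related properties force layout and then painting, and purely visual changes only force painting. The property sets below are an illustrative subset chosen for this sketch. Exactly which properties trigger layout varies between browsers. `cost_of_change` is a hypothetical name.

```python
from typing import Set

# Illustrative subset only; which properties trigger layout varies by browser.
LAYOUT_PROPERTIES = {"width", "height", "margin", "padding", "border-width",
                     "font-size", "top", "left", "display"}
PAINT_ONLY_PROPERTIES = {"color", "background-color", "visibility", "outline-color"}


def cost_of_change(changed: Set[str]) -> str:
    """Classify a style change: a reflow always implies a repaint as well."""
    if changed & LAYOUT_PROPERTIES:
        return "reflow + repaint"
    if changed & PAINT_ONLY_PROPERTIES:
        return "repaint"
    return "unknown"  # not covered by this simplified table


if __name__ == "__main__":
    print(cost_of_change({"background-color"}))  # repaint
    print(cost_of_change({"color", "width"}))    # reflow + repaint
```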

Disconnect

When the data transfer is complete, the TCP connection must be closed, which takes a four-way handshake (the “four waves”).

  • The initiator sends the passive party a segment with FIN set (together with ACK and Seq), meaning it has no more data to send, and enters the FIN_WAIT_1 state. (First wave, initiated by the browser and sent to the server: "I have finished sending my request; get ready to close.")
  • The passive party replies with an ACK segment to agree to the close and enters CLOSE_WAIT. When this ACK arrives, the initiator enters the FIN_WAIT_2 state. (Second wave, from the server: "I have received your request and I am getting ready to close; you should too.")
  • Once the passive party has finished sending, it sends its own FIN segment (with ACK and Seq) to the initiator and enters the LAST_ACK state. (Third wave, initiated by the server: "I have sent my response; get ready to close.")
  • The initiator replies with an ACK segment and enters the TIME_WAIT state. The passive party closes the connection as soon as this ACK arrives. The initiator waits for a fixed period (twice the maximum segment lifetime, 2MSL). If no retransmitted FIN arrives in that time, it closes the connection normally. (Fourth wave, initiated by the browser: "I have received your response and I am closing; you can close too.")
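The four waves can be replayed as a small state machine. Each side moves between the TCP states named above in response to events. The sketch assumes the browser closes first, as in the text. It leaves out simultaneous close, retransmissions, and real timers: the 2MSL wait is just an event. `TRANSITIONS`, `step`, and `simulate_close` are hypothetical names.

```python
from typing import List, Optional, Tuple

# (state, event) -> (new state, segment sent or None)
TRANSITIONS = {
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "recv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv_FIN"): ("TIME_WAIT", "ACK"),
    ("TIME_WAIT", "timeout_2MSL"): ("CLOSED", None),
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),
}


def step(state: str, event: str) -> Tuple[str, Optional[str]]:
    return TRANSITIONS[(state, event)]


def simulate_close() -> Tuple[List[Tuple[str, Optional[str], str, str]], str, str]:
    """Replay the four waves; log entries are (sender, segment, initiator state, passive state)."""
    a, b = "ESTABLISHED", "ESTABLISHED"  # a = initiator (browser), b = passive (server)
    log = []
    a, seg = step(a, "app_close"); log.append(("initiator", seg, a, b))  # 1st wave: FIN
    b, seg = step(b, "recv_FIN"); log.append(("passive", seg, a, b))    # 2nd wave: ACK
    a, _ = step(a, "recv_ACK")                                          # initiator -> FIN_WAIT_2
    b, seg = step(b, "app_close"); log.append(("passive", seg, a, b))   # 3rd wave: FIN
    a, seg = step(a, "recv_FIN"); log.append(("initiator", seg, a, b))  # 4th wave: ACK
    b, _ = step(b, "recv_ACK")       # passive closes on receiving the final ACK
    a, _ = step(a, "timeout_2MSL")   # initiator closes after waiting in TIME_WAIT
    return log, a, b


if __name__ == "__main__":
    log, a, b = simulate_close()
    for entry in log:
        print(entry)
    print(a, b)
```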