In the protocol stack, HTTP belongs to the application layer (the top layer), and its role is to define how the client and the server exchange information.

If we want to visit a website, we usually type in its address, which is the URL. A URL generally consists of the following parts:

  • Protocol header: indicates the protocol type.
  • Host name: identifies the host (a domain name is translated by DNS into an IP address).
  • Path: the address of the accessed file on the host.
  • Port number: the port of the service (used to distinguish different applications on the host; it is usually left out of the URL when it is the protocol's default port).
  • Query string: begins with ?; the content after the ? is the query (generally written as key-value pairs).
  • Locator: marked by #; combined with the id attribute in HTML, it jumps directly to the element with the specified id.
  • Safe encoding: characters that are unsafe or have a special meaning in a URL, such as / inside a query value, are translated into % followed by their US-ASCII hexadecimal code.

Example:

https://developer.mozilla.org/zhCN/docs/Learn/Common_questions/What_is_a_domain_name

https: the protocol header, indicating that the protocol is HTTPS, whose default port is 443. developer.mozilla.org: the host name. /zhCN/…: the path.

Another example:

https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=http&fenlei=256&rsv_pq=eacf4b2100113fe9&rsv_t=3faa%2FIrY9yB7qijAalZK2yeqz5H4gzb%2FBuSUWbO73VOTHdoHAjm98HdDZYo&rqlang=cn&rsv_dl=tb&rsv_enter=1&rsv_sug3=7&rsv_sug1=4&rsv_sug7=101&rsv_sug2=0&rsv_btype=i&inputT=2640&rsv_sug4=3741&rsv_sug=2

This is the URL of a Baidu search for "http". The key-value pairs after the ? are the query content, and you can see that Baidu adds many parameters of its own.

If you search for /, it is translated to %2F.

If you want to navigate to an element on the page, you can add a # and the element’s ID at the end of the URL.
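To make the parts of a URL concrete, here is a minimal sketch using Python's standard urllib.parse module. The URL is a made-up example (www.example.com, port 8443, and the query values are assumptions for illustration, not taken from a real site).

```python
from urllib.parse import urlsplit, parse_qs, quote

# Hypothetical URL, invented only to show every part at once.
url = "https://www.example.com:8443/docs/page.html?wd=http&ie=utf-8#section-2"

parts = urlsplit(url)
print(parts.scheme)    # protocol header: https
print(parts.hostname)  # host name: www.example.com
print(parts.port)      # port number: 8443
print(parts.path)      # path: /docs/page.html
print(parts.query)     # query string: wd=http&ie=utf-8
print(parts.fragment)  # locator: section-2

# The query string is a set of key-value pairs.
query = parse_qs(parts.query)

# Safe encoding: "/" and the space are replaced by % plus their hex code.
encoded = quote("a/b c", safe="")
print(encoded)         # a%2Fb%20c
```

Note that parse_qs returns a list for each key, because the same key may appear more than once in a query string.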

When we enter the URL and press Enter, the browser sends a request to the server. Depending on the content of the request, the server replies differently, and this reply is called the response.

Of course, before a request can be sent, a connection needs to be established between the browser and the server, and that connection relies on TCP. TCP establishes a connection with the server through a three-way handshake:

  • The browser sends a packet (SYN).
  • The server sends a reply packet (SYN-ACK), which confirms that the browser can send and the server can receive.
  • The browser sends a packet again (ACK), which confirms that the server can send and the browser can receive.

After the three-way handshake, the TCP connection is established and the browser can send its request.
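The handshake itself is done by the operating system, so it is not visible from application code. As a toy model only (no real packets are sent, and the function name and dictionary fields are invented for this sketch), the three steps and their sequence numbers could be written like this:

```python
import random

def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three segments of a simulated TCP handshake as dicts.

    client_isn / server_isn are the initial sequence numbers; they are
    chosen at random when not given, as real TCP stacks pick their own.
    """
    x = random.randrange(2**32) if client_isn is None else client_isn
    y = random.randrange(2**32) if server_isn is None else server_isn

    # Step 1: browser -> server, "I want to connect, my sequence number is x".
    syn = {"from": "browser", "flags": "SYN", "seq": x}
    # Step 2: server -> browser, acknowledges x and announces its own number y.
    syn_ack = {"from": "server", "flags": "SYN-ACK", "seq": y,
               "ack": (x + 1) % 2**32}
    # Step 3: browser -> server, acknowledges y; the connection is established.
    ack = {"from": "browser", "flags": "ACK", "seq": (x + 1) % 2**32,
           "ack": (y + 1) % 2**32}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

Each acknowledgement number is the other side's sequence number plus one, which is how each side proves it really received the previous packet.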

A request message contains:

  • The request line: request method, URL, and protocol version.
  • The header.
  • The body (a response usually carries content such as an HTML file, while a request usually doesn’t have much content).

The header contains:

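  • The host address (Host);
  • The connection status, referring to the TCP connection (Connection);
  • The user agent, typically the browser currently in use (User-Agent);
  • The acceptable file formats (Accept);
  • The URL of the page the request came from (Referer);
  • The accepted encodings (Accept-Encoding);
  • The accepted languages (Accept-Language);
  • The accepted character sets (Accept-Charset);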

Of course, in addition to the above, the header generally also contains:

  • The cookie (Cookie)
  • The creation time (Date), and so forth

Before HTTP/2, headers were sent as plain text, so they were generally human-readable.
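Because an HTTP/1.1 request is plain text, it can be assembled by hand. The following is a minimal sketch; the helper name build_request, the host www.example.com, and the user agent string are assumptions made up for the example.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a plain-text HTTP/1.1 request message as bytes."""
    # Request line: method, URL (here just the path), protocol version.
    lines = [f"{method} {path} HTTP/1.1"]
    # Header: one "Name: value" line per field.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line separates the header from the (possibly empty) body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

raw_request = build_request("GET", "/index.html", {
    "Host": "www.example.com",           # hypothetical host
    "Connection": "keep-alive",
    "User-Agent": "ExampleBrowser/1.0",  # hypothetical browser name
    "Accept": "text/html",
    "Accept-Encoding": "gzip",
    "Accept-Language": "en",
})
print(raw_request.decode("ascii"))
```

Lines end with a carriage return and line feed, and the header always ends with one empty line, even when there is no body.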

Now let’s look at some of the important parts of a request.

Request methods

There are four common request methods:

  • GET
  • POST
  • PUT
  • DELETE

GET asks the server to return something without changing it; POST submits data to the server, which typically creates or updates something; PUT uploads something, creating it or replacing what is already at that URL; DELETE deletes something that already exists on the server.
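The difference between the four methods can be sketched with a toy in-memory server. This is only an illustration under simple assumptions: the function handle and the dictionary store are invented here, and a real server decides for itself what POST does.

```python
def handle(store, method, path, body=None):
    """Apply one request to `store` (a dict mapping path -> content).

    Returns a (status code, result) pair.
    """
    if method == "GET":
        # Read only: the store is never changed.
        return (200, store[path]) if path in store else (404, None)
    if method == "PUT":
        # Create the resource at exactly this path, or replace it.
        created = path not in store
        store[path] = body
        return (201 if created else 200), None
    if method == "POST":
        # Assumption of this sketch: POST adds a new item under `path`,
        # and the server picks the name of the new item.
        base = path.rstrip("/")
        n = 1
        while f"{base}/{n}" in store:
            n += 1
        store[f"{base}/{n}"] = body
        return 201, f"{base}/{n}"
    if method == "DELETE":
        if path in store:
            del store[path]
            return 204, None
        return 404, None
    return 405, None  # method not allowed

store = {}
print(handle(store, "PUT", "/notes/a", "hello"))  # (201, None)
print(handle(store, "GET", "/notes/a"))           # (200, 'hello')
print(handle(store, "POST", "/notes", "world"))   # (201, '/notes/1')
print(handle(store, "DELETE", "/notes/a"))        # (204, None)
print(handle(store, "GET", "/notes/a"))           # (404, None)
```

Repeating the same PUT leaves the store in the same state, while repeating the same POST creates another item each time.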

Cookies are also worth talking about, but they are related to connections, which I’ll talk about later, so I’ll leave them for now. Let’s look at the response next.

A response consists of the following parts:

  • The status line: protocol version, status code, and reason phrase
  • The header
  • The body

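The header generally includes:

  • The cache control (Cache-Control)
  • The file type (Content-Type)
  • Information about the server (Server)
  • The creation date (Date)
  • The connection status (Connection)
  • The length of the content (Content-Length)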
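As a rough sketch of how these three parts fit together, the function below splits a raw HTTP/1.1 response into status line, header, and body. The function name parse_response and the sample response are made up for illustration; real clients handle many more cases (chunked bodies, repeated headers, and so on).

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into its parts.

    Returns (version, status code, reason, headers dict, body bytes).
    """
    # The first empty line separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, reason phrase.
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

# A hand-written sample response, not captured from a real server.
raw_response = (b"HTTP/1.1 200 OK\r\n"
                b"Cache-Control: max-age=3600\r\n"
                b"Content-Type: text/html\r\n"
                b"Content-Length: 13\r\n"
                b"\r\n"
                b"<p>hello</p>\n")

version, status, reason, headers, body = parse_response(raw_response)
print(version, status, reason)
print(headers["content-type"], headers["content-length"])
```

Header names are lowercased in this sketch because HTTP header names are case-insensitive.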

The status code is a very important part of the response: it represents the server’s result for the request. 404, for example, is a status code.

Status codes fall into the following ranges:

  • 100-199: informational; the request has been received and processing continues.
  • 200-299: the processing was successful.
  • 300-399: redirection (the server redirects the URL to a new address).
  • 400-499: client error (the browser sent an incorrect request).
  • 500-599: server error.
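Since the class of a status code is just its first digit, the ranges above can be sketched in a few lines. The helper status_class is a name invented for this example; HTTPStatus comes from Python's standard http module and supplies the standard reason phrases.

```python
from http import HTTPStatus

def status_class(code):
    """Map a status code to the name of its range."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 keeps only the first digit.
    return classes.get(code // 100, "unknown")

for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```

A client that meets a code it does not know can still act sensibly by looking at the range alone.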

Common status codes and reason phrases include:

  • 200 OK: everything is normal.
  • 301 Moved Permanently: the content to which the URL points has been permanently moved.
  • 302 Found (formerly "Moved Temporarily"): the content to which the URL points has been temporarily moved.

For more information, see the following table (no need to memorize it, just look a code up when you encounter it):

Now that we know the basics of request and response, it’s time to take a closer look at connections. HTTP runs over TCP, which has the advantage of reliable delivery (lost packets are retransmitted), and TCP also provides flow control to ensure that the sender does not send data too fast for the receiver to handle.

TCP is great, but setting up a connection requires a three-way handshake, and closing one is also a multi-step process. HTTP, meanwhile, is a stateless protocol: the server does not remember anything from one request to the next.

Imagine a web page where each request opens a TCP connection and the connection is closed when the request ends. It is inefficient for every request to wait for the previous one to finish before it can proceed. And a web page needs HTML files, CSS files, JS files, images, and videos; if each of them sets up its own connection, loading is too slow.

Therefore, several ways of handling connections have been devised:

Parallel connection

This method opens multiple connections to the server, so multiple messages can be sent at the same time. However, the capacity of a server is limited, and the server is easily overwhelmed by too many connections.

Persistent connections

This approach keeps the connection open after a response has been sent (either side can ask to close it with the Connection field in the header), which avoids the delay of disconnecting and reconnecting. However, a persistent connection still takes up server resources and, because it stays open, is more exposed to abuse.
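A persistent connection can be observed with Python's standard library alone. The sketch below starts a small local server on 127.0.0.1 (the handler class EchoPortHandler and the two paths are made up for the example) that replies with the client's TCP port. If two requests travel over the same connection, both replies show the same port.

```python
import http.client
import http.server
import threading

class EchoPortHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 makes http.server keep the connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the TCP port the client is connected from.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch_twice_on_one_connection():
    """Send two GET requests over a single TCP connection."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1",
                                          server.server_address[1], timeout=5)
        ports = []
        for path in ("/index.html", "/style.css"):
            conn.request("GET", path)
            # The response must be read fully before the next request.
            ports.append(conn.getresponse().read().decode("ascii"))
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()

ports = fetch_twice_on_one_connection()
print(ports[0] == ports[1])  # same client port, so the connection was reused
```

The Content-Length header matters here: it tells the client where one response ends, so the same connection can carry the next one.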

Pipeline connection

This type of connection sends the next request without waiting for the reply to the previous one, but the server still has to send its replies back in the same order as the requests, so one slow reply holds up all the replies behind it.

The newer HTTP/2 protocol uses a two-way (multiplexed) connection, which establishes a single TCP connection but allows that connection to carry arbitrarily interleaved two-way messages (requests and responses). In this way, both the speed and the capacity of the server are greatly improved.
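The idea of interleaving messages on one connection can be shown with a toy model. This is not the real HTTP/2 wire format (real frames are binary and carry more fields); the functions to_frames, interleave, and reassemble are invented for this sketch. Each message is cut into small frames tagged with a stream number, the frames are mixed on the "wire", and the receiver sorts them back by stream number.

```python
from itertools import zip_longest

def to_frames(stream_id, payload, size=4):
    """Cut one message into (stream_id, chunk) frames of at most `size` bytes."""
    return [(stream_id, payload[i:i + size])
            for i in range(0, len(payload), size)]

def interleave(*frame_lists):
    """Mix the frames of several streams round-robin, as on a shared connection."""
    mixed = []
    for group in zip_longest(*frame_lists):
        mixed.extend(frame for frame in group if frame is not None)
    return mixed

def reassemble(frames):
    """Rebuild every message from the mixed frames, using the stream number."""
    streams = {}
    for stream_id, chunk in frames:
        streams[stream_id] = streams.get(stream_id, b"") + chunk
    return streams

# Two requests share one connection; 1 and 3 are the stream numbers.
wire = interleave(to_frames(1, b"GET /index.html"),
                  to_frames(3, b"GET /style.css"))
print(wire[:4])
print(reassemble(wire))
```

Because every frame names its stream, a slow message no longer blocks the others on the same connection.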

Additionally, HTTP/2 allows the server to proactively push messages to the client (previously every exchange had to be initiated by the client), which again saves a lot of effort (the server proactively determines what the client will need).

Now we know about URLs, requests, responses, and connections. When we enter a URL, the browser establishes a connection with the server over TCP, sends a request, and the server returns its response.

However, the request does not necessarily go straight from the browser to the server. Along the way there can be many proxy servers, which are not necessarily physical devices but can also be software. The functions of these proxy servers include:

  • Access control
  • Authentication
  • Compressing the response body
  • Load balancing
  • SSL encryption
  • Caching
  • and so on.

Let’s talk about caching here. Caches are divided into public caches and private caches. As the name implies, a public cache can be shared by everyone, and a private cache can only be used by one user.

Why cache? Fetching a resource from the server every time takes too long, so it is better to keep a copy of the resource and fetch it from the cache.

Caching raises the question of when to use the cached copy and when to fetch again. Why fetch again at all? Because the resource can be updated, and then the cached content becomes an old version.

In general, you can determine whether the cache is out of date based on the update date. For example, the browser sends an If-Modified-Since header, followed by the modification time of the cached file, which asks whether the file has been modified since it was cached. The server checks the actual situation: if the file has not changed, it returns 304 (Not Modified) with an empty body; if it has changed, it returns 200 together with the new file.
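The server side of this check can be sketched as follows. The function conditional_get and its parameters are assumptions for illustration; the date helpers come from Python's standard email.utils module, which reads and writes the date format used in HTTP headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(request_headers, last_modified, content):
    """Answer a GET, honouring If-Modified-Since.

    last_modified must be a timezone-aware UTC datetime.
    Returns (status code, response headers, body).
    """
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        # Not changed since the browser cached it: empty body.
        return 304, {}, b""
    # Changed (or no cached copy): send the file and its modification time.
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    return 200, headers, content

modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
page = b"<html>v1</html>"

# First visit: nothing cached yet, so the full file comes back.
status, headers, body = conditional_get({}, modified, page)
print(status, headers["Last-Modified"])

# Second visit: the browser sends back the date it remembered.
status2, _, body2 = conditional_get(
    {"If-Modified-Since": headers["Last-Modified"]}, modified, page)
print(status2, body2)
```

The second reply carries no body at all, which is where the saving comes from.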

In addition, the cache control usually has a max-age attribute, which marks the maximum number of seconds that the cached copy may be used. Once that number of seconds is exceeded, the cached copy is considered out of date and the resource is fetched again.
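The freshness check behind max-age is a single comparison. In this sketch the function is_fresh is a made-up name, and treating a missing max-age as "not fresh" is a simplifying assumption; real caches follow more rules than this.

```python
import re

def is_fresh(cache_control, age_seconds):
    """Return True while a cached copy is younger than max-age.

    cache_control is the value of the Cache-Control header;
    age_seconds is how long ago the copy was stored.
    """
    match = re.search(r"max-age=(\d+)", cache_control)
    if match is None:
        # Assumption of this sketch: without max-age, ask the server again.
        return False
    return age_seconds < int(match.group(1))

print(is_fresh("public, max-age=3600", 120))   # True: stored 2 minutes ago
print(is_fresh("public, max-age=3600", 7200))  # False: stored 2 hours ago
```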

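Another way is to use an ETag, an identifier for one version of a file that is usually generated with a hash algorithm. The server attaches the ETag to the header every time it sends a response, and the browser sends it back (in an If-None-Match header) when it asks for the file again. If the file has been updated, the value of the ETag has changed and the server returns the new file.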
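A minimal sketch of the ETag check, assuming the ETag is simply a shortened SHA-256 hash of the file content. The names make_etag and get_with_etag are invented here; servers are free to build ETags in other ways.

```python
import hashlib

def make_etag(content):
    """Derive an ETag from the content; it changes whenever the content does."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'

def get_with_etag(request_headers, content):
    """Answer a GET, honouring If-None-Match.

    Returns (status code, response headers, body).
    """
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # The browser already holds this exact version.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content

# First visit: the file and its ETag are returned.
status, headers, body = get_with_etag({}, b"<html>v1</html>")

# Same file again: the ETag still matches, so nothing is re-sent.
same = get_with_etag({"If-None-Match": headers["ETag"]}, b"<html>v1</html>")

# The file was updated: the ETag differs, so the new file is returned.
changed = get_with_etag({"If-None-Match": headers["ETag"]}, b"<html>v2</html>")

print(status, same[0], changed[0])  # 200 304 200
```

Unlike a date, the ETag also catches a file that changed twice within the same second.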

That’s it for HTTP, and finally we’ll talk about security.

Many websites cannot be accessed freely: the user has to enter a password, which means the site has to verify the user.

Cookies, also known as the HTTP state management mechanism, are a common way to recognize a user. The browser attaches a cookie containing a unique identifier (GUID) to the request header, and the server verifies it.

In theory you can put anything in a cookie. The server sets one with Set-Cookie in the response header. If a cookie is intended to be persistent (the server uses it for long-term authentication), it must be given an expiration time. Cookies are kept separately for each site.
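Both directions of the cookie exchange can be sketched with Python's standard http.cookies module. The cookie names session_id and theme, and the one-hour lifetime, are assumptions chosen for the example.

```python
import uuid
from http.cookies import SimpleCookie

# Server side: create a unique identifier and hand it to the browser.
session_id = uuid.uuid4().hex
outgoing = SimpleCookie()
outgoing["session_id"] = session_id
outgoing["session_id"]["max-age"] = 3600   # expires after one hour
outgoing["session_id"]["httponly"] = True  # hidden from page scripts
set_cookie_line = outgoing.output(header="Set-Cookie:")
print(set_cookie_line)

# Browser side: on later requests the cookie travels in the Cookie header.
cookie_header = f"session_id={session_id}; theme=dark"

# Server side again: read the identifier back out of the request header.
incoming = SimpleCookie()
incoming.load(cookie_header)
print(incoming["session_id"].value == session_id)
```

Only the name and value come back from the browser; attributes such as the expiration time stay on the browser's side.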

But a cookie by itself cannot prove who the user is. If someone gets hold of your cookie, they can use it to log in to private websites as you, which is risky.

So the common method is explicit authentication, which is the password we enter when we log in to a website.

When a user visits a website that requires authentication, the user is redirected to a login page. After entering a correct account and password, the user is redirected back (and a cookie is set so that the user does not need to log in every time).

Another way is OpenID, which hands the responsibility for verification over to a third party: a WeChat or QQ login, for example, can also be used to log in to other websites.

The advantage is that we don’t have to register and remember so many passwords.

It should be noted that authentication is only a way for the server to confirm that our information is correct. The server itself also needs to be verified: there are phishing sites, for example, and entering one by mistake can be harmful.

This is where the HTTPS protocol comes in. It inserts an SSL/TLS encryption layer between the application layer and the transport layer to ensure the security of the website.

To use HTTPS, a website needs to install a security certificate (issued by a certificate authority), which acts as proof of the website’s identity.
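On the client side, checking the certificate is the default behaviour of a properly configured TLS library. As a small sketch using Python's standard ssl module, the default context already requires a valid certificate and checks that it matches the host name:

```python
import ssl

# The default client context trusts the certificate authorities
# installed on this machine.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted authority.
print(context.verify_mode == ssl.CERT_REQUIRED)
# The certificate must also be issued for the host name we asked for.
print(context.check_hostname)
```

A connection made with this context to a site whose certificate is invalid, or was issued for a different host, fails before any HTTP data is exchanged.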

Finally, a brief note on the differences between HTTP/2 and previous versions:

  1. Messages changed from plain text to binary;
  2. Header compression (using a new compression algorithm, HPACK);
  3. Bidirectional connection (requests and responses interleaved on a single TCP connection);
  4. The server is allowed to push messages.

Reference: HTTP Succinctly