I. TCP/IP

1. The TCP/IP protocol family

When a computer communicates with a network device, both sides must follow the same rules: how to find the communication target, which language to use, how to end the communication, and so on. These rules are called protocols.

Some people think TCP/IP refers only to the TCP and IP protocols, but that view is incomplete. In some contexts TCP/IP does mean just these two protocols; more often, however, it refers to the whole family of protocols that communicate over IP, which is why it is also called the TCP/IP protocol family. HTTP, the subject of this article, is one member of this family.

2. TCP/IP layers

TCP/IP uses a four-layer model consisting of, from top to bottom, the application layer, the transport layer, the network layer, and the data link layer.

Layer | Function | Protocols
Application layer | Provides application services to users: for example, HTTP for web browsing, FTP for file transfer, POP3 for receiving mail, and SMTP for sending mail. | HTTP, FTP, POP3, SMTP, SSH, etc.
Transport layer | Provides data transmission services to the application layer. | TCP, UDP
Network layer | Handles packets moving across the network and selects their routes. | IP, etc.
Data link layer | Transfers data over the hardware medium. |

3. How data flows through the layers

Let’s take web browsing as an example:

At the application layer, the browser issues an HTTP request and generates the HTTP request data.

At the transport layer, TCP establishes a connection, sends the data, and closes the connection. TCP ensures that the HTTP data arrives reliably at the receiver. To do this, it splits the HTTP data into segments and adds a TCP header to each one.

At the network layer, IP treats the data passed down from TCP as its own payload, prepends an IP header, and forwards the result to the data link layer. The IP header carries the sender's and receiver's IP addresses; MAC addresses are not part of the IP header but are added by the data link layer in its frame header.

The receiver gets the data at the data link layer and passes it up layer by layer, with each layer removing its own header, until it reaches the application layer. At that point the receiver has the HTTP data sent by the sender.
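To make the layering concrete, here is a toy Python sketch of encapsulation and decapsulation. The bracketed headers, the 16-byte segment size (`MSS`), and the addresses are made up for illustration. Real TCP, IP, and Ethernet headers are binary structures with many more fields.

```python
# Toy model of encapsulation. The bracketed "headers" are hypothetical,
# human-readable stand-ins, not the real binary TCP/IP header formats.
MSS = 16  # hypothetical maximum segment size (payload bytes per segment)


def http_layer(path: str, host: str) -> bytes:
    """Application layer: build the HTTP request data."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode()


def tcp_layer(data: bytes) -> list[bytes]:
    """Transport layer: split the data and prefix each piece with a TCP header."""
    return [f"[TCP seq={off}]".encode() + data[off:off + MSS]
            for off in range(0, len(data), MSS)]


def ip_layer(segment: bytes, src: str, dst: str) -> bytes:
    """Network layer: prefix the segment with an IP header (IP addresses)."""
    return f"[IP {src}->{dst}]".encode() + segment


def link_layer(packet: bytes, next_hop_mac: str) -> bytes:
    """Data link layer: prefix the packet with a frame header (MAC address)."""
    return f"[ETH dst={next_hop_mac}]".encode() + packet


def strip_header(data: bytes) -> tuple[bytes, bytes]:
    """Split off the leading bracketed header and return (header, payload)."""
    header, _, payload = data.partition(b"]")
    return header + b"]", payload


def receive(frames: list[bytes]) -> bytes:
    """Receiver: each layer removes its own header, then TCP reassembles."""
    pieces = []
    for frame in frames:
        _, packet = strip_header(frame)            # data link layer
        _, segment = strip_header(packet)          # network layer
        tcp_header, chunk = strip_header(segment)  # transport layer
        seq = int(tcp_header[len(b"[TCP seq="):-1])
        pieces.append((seq, chunk))
    pieces.sort()  # put segments back in order by sequence number
    return b"".join(chunk for _, chunk in pieces)


request = http_layer("/", "www.baidu.com")
frames = [link_layer(ip_layer(seg, "192.0.2.1", "198.51.100.7"), "aa:bb:cc:dd:ee:ff")
          for seg in tcp_layer(request)]
assert receive(frames[::-1]) == request  # arrives out of order, still reassembled
```

Each layer only reads and writes its own header and treats everything after it as opaque payload. This is what lets the layers be swapped independently.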

4. The DNS protocol

DNS (Domain Name System) resolves domain names into IP addresses. Like HTTP, it is an application-layer protocol.

Users find it easier to reach a computer through a meaningful string of characters than through a numeric address. For example, they use www.baidu.com to visit Baidu. DNS resolves these human-friendly domain names into the IP addresses that network transmission requires.
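In Python, the standard-library `socket.getaddrinfo` asks the operating system's resolver to turn a name into addresses. The resolver normally consults local files such as the hosts file and then DNS servers. Below is a minimal sketch; the helper name `resolve` is our own.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the IP addresses the system resolver finds for a hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("www.baidu.com")` needs network access, and the addresses returned vary by location and over time. A numeric address such as `"127.0.0.1"` resolves to itself without any DNS lookup.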

II. HTTP messages

The data exchanged between an HTTP client and an HTTP server is organized into messages. There are two kinds: request messages, sent from the client to the server, and response messages, sent from the server back to the client.

Both kinds of message consist of a first line, a header, and a body, with a blank line separating the header from the body.

The request message

POST /I HTTP/1.1
Host: count.typora.io
Accept: */*
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive
Cookie: __cfduid=d345ac732c82bada5e1f1b3ec7f42498e1550563241
Accept-Language: zh-cn
Content-Length: 206
Accept-Encoding: br, gzip, deflate
User-Agent: Typora/1355 CFNetwork/975.0.3 Darwin/18.2.0 (x86_64)

app_key=3162bc659f38963b8f15099e19551

The response message

HTTP/1.1 200 OK
Server: nginx
Date: Mon, 11 Mar 2019 03:28:05 GMT
Content-Type: application/json; charset=utf-8
Vary: Accept-Encoding
Access-Control-Allow-Origin: *
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
Content-Encoding: gzip
Transfer-Encoding: chunked
Connection: keep-alive

{"result":"Success"}

1. The first line

1.1 First line of a request message

"POST /I HTTP/1.1" is the first line of the request message above. POST is the request method, /I is the URI of the requested resource, and HTTP/1.1 is the HTTP version.
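To show how the first line, header, and body fit together, here is a minimal parser sketch. It assumes a well-formed message with CRLF line endings. It ignores details such as repeated header fields and chunked bodies. `parse_request` and the short sample body are hypothetical.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into method, URI, version, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    method, uri, version = lines[0].split(" ", 2)  # the first (request) line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, uri, version, headers, body


raw = (b"POST /I HTTP/1.1\r\n"
       b"Host: count.typora.io\r\n"
       b"Content-Type: application/x-www-form-urlencoded\r\n"
       b"Content-Length: 11\r\n"
       b"\r\n"
       b"app_key=abc")
method, uri, version, headers, body = parse_request(raw)
assert (method, uri, version) == ("POST", "/I", "HTTP/1.1")
```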

Request method: tells the server the intent of the request, that is, what the client expects the server to do.

The commonly used methods are:

Method | Description | HTTP version
GET | Retrieves the resource identified by the URI | 1.0, 1.1
POST | Transfers an entity body | 1.0, 1.1
PUT | Transfers a file | 1.0, 1.1
HEAD | Same as GET, but the response has no message body | 1.0, 1.1
DELETE | Deletes the resource | 1.0, 1.1
OPTIONS | Asks which methods are supported | 1.1

Besides these, there are other methods such as TRACE, CONNECT, LINK, and UNLINK.

In practice, GET and POST are by far the most commonly used methods. GET requests the resource identified by the URI; the server parses the request and returns the resource in its response. POST transfers an entity body. GET can technically carry an entity body too, but POST is the method normally used for that.
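The difference is visible in Python's standard `urllib.request.Request` class. Without `data`, it reports the GET method and the parameters travel in the URI. With `data`, it reports POST and the fields become the body. The helper names and URLs below are illustrative, and nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(url: str, params: dict) -> Request:
    # GET: parameters travel in the URI's query string; there is no body.
    return Request(url + "?" + urlencode(params))


def build_post(url: str, fields: dict) -> Request:
    # POST: the form fields are sent as the entity body.
    return Request(url, data=urlencode(fields).encode())


get_req = build_get("http://www.example.com/search", {"q": "http"})
post_req = build_post("http://www.example.com/login", {"user": "alice"})
```

Here `get_req.get_method()` is `"GET"` with selector `/search?q=http`. `post_req.get_method()` is `"POST"` with body `b"user=alice"`.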

PUT is used to transfer files. However, the HTTP/1.1 PUT method has no authentication mechanism of its own, so ordinary websites seldom use it.

DELETE has the same problem as PUT and is also rarely used. However, servers that follow a RESTful design do use PUT and DELETE.

HEAD is the same as GET except that the response carries no message body. It is used to check whether a URI is valid and when the resource was last updated. For example, suppose the client needs a large file of hundreds of megabytes. Downloading it from the network every time would be expensive. Instead, the client can cache the file after the first GET. Afterwards it sends only a HEAD request to check whether the file has changed, and it fetches the file again with GET only when it has.
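The HEAD pattern can be tried locally with Python's built-in `http.server` acting as the server (Python 3.7+). In this sketch the file name `big.bin` and its size are arbitrary stand-ins for a large file. The client receives the headers, such as size and modification time, but no body.

```python
import functools
import http.client
import http.server
import os
import tempfile
import threading


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):  # keep the demo output quiet
        pass


def head_request(size: int = 100_000):
    """Serve a file on localhost, then fetch only its metadata with HEAD."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "big.bin"), "wb") as f:
            f.write(b"x" * size)  # stand-in for a large file
        handler = functools.partial(QuietHandler, directory=root)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
            conn.request("HEAD", "/big.bin")
            resp = conn.getresponse()
            body = resp.read()  # HEAD responses carry no body
            conn.close()
            return (resp.status, resp.getheader("Content-Length"),
                    resp.getheader("Last-Modified"), body)
        finally:
            server.shutdown()
            server.server_close()
```

A client that compares the returned `Last-Modified` (or size) with its cached copy can decide whether a full GET is needed.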

URI: identifies the requested resource.

HTTP version: the versions in use at the time of writing are HTTP/1.0, HTTP/1.1, and HTTP/2, with HTTP/1.1 the most widely used.

1.2 First line of a response message

"HTTP/1.1 200 OK" is the first line of the response message. HTTP/1.1 is the HTTP version, just as in the first line of the request message. 200 is the status code describing the result, and OK is the reason phrase.

Status code

The status code describes the result of the request the client sent to the server.

Status code | Meaning
1xx | Informational: the request has been received and is being processed
2xx | Success
3xx | Redirection
4xx | Client error
5xx | Server error
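Python's standard `http.HTTPStatus` enum knows the registered status codes and their reason phrases. The class names in `CLASSES` simply mirror the table above. Here is a small sketch.

```python
from http import HTTPStatus

# Status classes are determined by the first digit of the code.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}


def describe(code: int) -> str:
    """Return 'code reason-phrase (class)'; raises ValueError for unknown codes."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"
```

For example, `describe(304)` gives `"304 Not Modified (Redirection)"`, the code that appears again in the caching section.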

2. Message header

Host: count.typora.io
Accept: */*
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive
Cookie: __cfduid=d345ac732c82bada5e1f1b3ec7f42498e1550563241
Accept-Language: zh-cn
Content-Length: 206
Accept-Encoding: br, gzip, deflate
User-Agent: Typora/1355 CFNetwork/975.0.3 Darwin/18.2.0 (x86_64)

The header fields of a message describe various conditions and attributes of the request or response, and they carry additional important information. By purpose, they fall into four types: general header fields, request header fields, response header fields, and entity header fields.

  • General header fields

    Headers used in both request and response messages.

  • Request header fields

    Headers used in request messages.

  • Response header fields

    Headers used in response messages.

  • Entity header fields

    Headers that describe the entity part of request and response messages.

HTTP/1.1 defines 47 header fields. We won't cover them all, since only a few are commonly used. The following table maps HTTP features to these commonly used header fields. Each feature and its headers are covered in more detail later, so for now this is just an overview.

HTTP feature | Header | Header type | Description
Statelessness | Set-Cookie | Response header | Cookie information sent by the server to the client, identifying the client as recorded by the server
 | Cookie | Request header | Cookie information the client sends with its requests to identify itself
Persistent connections | Connection | General header | Manages the connection state
Caching | Cache-Control | General header | Controls how caching works
 | Last-Modified | Entity header | Time the resource was last modified (server time)
 | If-Modified-Since | Request header | A timestamp; tells the server to process the request only if the resource was updated after this time
 | ETag | Response header | Unique identifier of the current resource; it changes whenever the resource changes
 | If-None-Match | Request header | Used with ETag; tells the server to process the request only if this value differs from the resource's current ETag

3. The entity

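There is not much to say about the entity: it is the body of the message and carries the actual data being transferred, such as the form data in the request above or the JSON in the response.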

III. HTTP features

1. Statelessness

HTTP feature | Header | Header type | Description
Statelessness | Set-Cookie | Response header | Cookie information sent by the server to the client, identifying the client as recorded by the server
 | Cookie | Request header | Cookie information the client sends with its requests to identify itself

HTTP is a stateless protocol: it does not keep track of the communication state between client and server. Because the server does not have to store that state, its CPU and memory load is greatly reduced. This is the advantage of statelessness, but it also causes problems. For example, a shopping website must verify our identity by having us log in before we can submit an order. Since HTTP records no login state, we would have to log in again for every order. To solve problems like this without giving up statelessness, HTTP introduced cookies. Cookies maintain state by adding cookie information to request and response messages. In a response, the server sends cookie information to the client in the Set-Cookie header. On later requests, the client sends it back in the Cookie header.
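Python's standard `http.cookies.SimpleCookie` can produce and parse these headers. The cookie name `session` and its value are hypothetical. A real client would also honor attributes such as Path, Domain, and expiry when deciding which cookies to send.

```python
from http.cookies import SimpleCookie


def server_set_cookie(session_id: str) -> str:
    """Server side: build the Set-Cookie header for a (hypothetical) session id."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def client_cookie_header(set_cookie_value: str) -> str:
    """Client side: remember the cookie and build the Cookie header to send back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)  # attributes such as Path are parsed, not sent back
    return "Cookie: " + "; ".join(f"{name}={m.value}" for name, m in jar.items())
```

The server answers the login with `Set-Cookie: session=abc123; Path=/`. Every later request then carries `Cookie: session=abc123`, so the server can recognize the client without HTTP itself keeping state.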

2. Persistent connection

HTTP feature | Header | Header type | Description
Persistent connections | Connection | General header | Manages the connection state

As explained in the TCP/IP section, HTTP relies on TCP for data transfer. TCP is connection-oriented: once a connection is established, it stays open until one side explicitly closes it. In early HTTP, however, every HTTP exchange opened and then closed its own TCP connection. This wastes TCP's ability to keep a connection open and adds connection-setup overhead to every request. HTTP/1.1, and some HTTP/1.0 implementations, therefore added persistent connections, also known as HTTP keep-alive or HTTP connection reuse. With persistent connections, multiple HTTP requests and responses travel over the same TCP connection instead of each opening a new one. HTTP manages persistent connections through the Connection header.

Connection value | Description
Connection: keep-alive | Keep the connection open (persistent). HTTP/1.1 connections are persistent by default.
Connection: close | Close the connection after the current request/response.
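The effect of persistent connections can be observed locally. This sketch runs Python's `http.server` with `protocol_version = "HTTP/1.1"` and counts how many TCP connections it accepts. `CountingHandler`, `tcp_connections_used`, and the file `index.txt` are our own illustrative names.

```python
import functools
import http.client
import http.server
import os
import tempfile
import threading


class CountingHandler(http.server.SimpleHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1: connections persist by default
    connections = []               # one entry per TCP connection accepted

    def setup(self):
        super().setup()  # runs once per TCP connection, not once per request
        CountingHandler.connections.append(self.client_address)

    def log_message(self, format, *args):
        pass


def tcp_connections_used(requests: int, reuse: bool) -> int:
    """Send `requests` GETs and report how many TCP connections the server saw."""
    CountingHandler.connections.clear()
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "index.txt"), "w") as f:
            f.write("hello")
        handler = functools.partial(CountingHandler, directory=root)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        port = server.server_address[1]
        try:
            conn = http.client.HTTPConnection("127.0.0.1", port)
            for _ in range(requests):
                if not reuse:  # early-HTTP style: a fresh connection per request
                    conn.close()
                    conn = http.client.HTTPConnection("127.0.0.1", port)
                conn.request("GET", "/index.txt")
                conn.getresponse().read()  # drain the body so the socket can be reused
            conn.close()
        finally:
            server.shutdown()
            server.server_close()
    return len(CountingHandler.connections)
```

With `reuse=True`, three requests should share a single TCP connection. With `reuse=False`, each request opens its own connection, as in early HTTP.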

3. Caching mechanism

HTTP feature | Header | Header type | Description
Caching | Cache-Control | General header | Controls how caching works
 | Last-Modified | Entity header | Time the resource was last modified (server time)
 | If-Modified-Since | Request header | A timestamp; tells the server to process the request only if the resource was updated after this time
 | ETag | Response header | Unique identifier of the current resource; it changes whenever the resource changes
 | If-None-Match | Request header | Used with ETag; tells the server to process the request only if this value differs from the resource's current ETag

After a request to a URL completes, HTTP allows the client to store the response locally. The next time the client requests that URL, it can take the resource directly from local storage. This is HTTP caching. Used well, caching both saves network resources and speeds up responses.

With the introduction of caching, HTTP requests become slightly more complex.

  1. On the first visit to a URL, the server returns the requested resource. It also tells the client to cache the resource and when that cache expires.
  2. On a later visit, the client looks up the cached entry by URI and checks whether it is still valid, that is, whether the current time is earlier than the cache's expiration time.
  3. If the cache is valid, the client does not contact the server at all and takes the resource directly from the cache. This is called a cache hit.
  4. If the cache has expired (the current time is later than the expiration time), the client asks the server to verify whether the cached copy is still valid.
    • If it is still valid, the server returns only headers indicating this; the client updates the cache's expiration time and uses the cached resource.
    • If it is not, the server returns the new resource and the client updates its cache.

Now that we have some understanding of the HTTP caching process, we can move on to the implementation of caching.

In step 1, the server tells the client when the cache expires. How is this determined? The server adds Cache-Control: max-age to the response. The max-age value is a relative time: how long, in seconds, the cache stays valid. For example, Cache-Control: max-age=60 means the cache is valid for 60 seconds. Note that Cache-Control is defined only in HTTP/1.1. In HTTP/1.0, the expiration time is set with the Expires header. Expires has a weakness: its value is an absolute time on the server's clock, so the cache lifetime comes out wrong if the client's clock differs from the server's. If both Cache-Control: max-age and Expires are set, HTTP/1.1 gives priority to Cache-Control: max-age, while HTTP/1.0 ignores the Cache-Control header.
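The freshness check from steps 2 to 4 can be sketched as follows. The sketch assumes response headers are stored in a dict with their canonical capitalization. It follows the precedence described above, checking max-age before Expires. It also deliberately compares Expires against the client clock to show where clock skew causes trouble. Real caches, as specified by the HTTP caching standard, also take the Date and Age headers into account.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def max_age(cache_control: str):
    """Return the max-age value (seconds) from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(headers: dict, received_at: datetime, now: datetime) -> bool:
    """Decide whether a cached response may be used without asking the server."""
    age_limit = max_age(headers.get("Cache-Control", ""))
    if age_limit is not None:
        # HTTP/1.1: relative lifetime, measured from when the response arrived.
        return now - received_at < timedelta(seconds=age_limit)
    if "Expires" in headers:
        # HTTP/1.0: an absolute server time compared with the client's clock,
        # which goes wrong if the two clocks disagree.
        return now < parsedate_to_datetime(headers["Expires"])
    return False  # no freshness information: treat as stale and revalidate
```

If both headers are present, `max-age` decides, matching the HTTP/1.1 rule above.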

Steps 2 and 3 are straightforward and need no further explanation.

In step 4, when the cached copy has expired, the client revalidates it with the server. How does it do this? In step 1, besides Cache-Control, the server also returns a Last-Modified header. Its value is the time the resource was last changed, on the server's clock. To revalidate, the client adds an If-Modified-Since header to its request, set to that Last-Modified value. The server compares this time with the resource's current modification time. If the resource has not changed, the server returns status code 304 (Not Modified) with no entity body. If it has changed, the server returns status code 200 (OK) along with the resource. Either way, the client updates its cache: a 304 only refreshes the expiration time, while a 200 refreshes both the expiration time and the resource.
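Revalidation with If-Modified-Since can be tried against Python's `http.server`. In Python 3.7 and later, it answers a matching If-Modified-Since with 304. The file name `logo.png` and its contents are placeholders.

```python
import functools
import http.client
import http.server
import os
import tempfile
import threading


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):  # keep the demo output quiet
        pass


def fetch(port: int, path: str, headers=None):
    """GET a path and return (status, Last-Modified header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", path, headers=headers or {})
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    return resp.status, resp.getheader("Last-Modified"), body


def revalidate_demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "logo.png"), "wb") as f:
            f.write(b"\x89PNG fake image data")
        handler = functools.partial(QuietHandler, directory=root)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        port = server.server_address[1]
        try:
            first = fetch(port, "/logo.png")  # step 1: full 200 response
            # Step 4: revalidate using the Last-Modified value from step 1.
            second = fetch(port, "/logo.png", {"If-Modified-Since": first[1]})
            return first, second
        finally:
            server.shutdown()
            server.server_close()
```

The first request returns 200 with the body and a Last-Modified header. The second returns 304 with an empty body, because the file has not changed since that time.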

Besides Last-Modified/If-Modified-Since, the cache can also be managed with ETag/If-None-Match. An ETag is a unique identifier for a resource, and it changes whenever the resource changes. The client sends its cached ETag in If-None-Match, and the server checks whether the resource's current ETag differs from it. The rest of the logic is essentially the same as with Last-Modified/If-Modified-Since, so I won't repeat it.
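Python's `http.server` does not generate ETags, so here is a hypothetical server-side sketch instead. Deriving the ETag from a hash of the content is one common choice, not a requirement of HTTP. Real If-None-Match headers may also list several tags or use `*`, which this sketch ignores.

```python
import hashlib


def make_etag(content: bytes) -> str:
    # Assumed scheme: a quoted, truncated SHA-256 of the content,
    # so the tag changes whenever the resource changes.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def handle_get(content: bytes, request_headers: dict):
    """Hypothetical server logic: return (status, headers, body)."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""     # client's cached copy is current
    return 200, {"ETag": etag}, content    # send the (new) resource
```

A client that cached version `b"v1"` gets 304 while the content is unchanged. It gets 200 with a new ETag as soon as the content becomes `b"v2"`.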

Caching greatly improves the efficiency of HTTP access, but it can also serve stale content. For this reason, HTTP provides stricter cache control directives: Cache-Control: no-cache and Cache-Control: no-store.

Cache-Control: no-cache means the client must not use the local cache directly; it must first revalidate the cached copy with the server. Despite the name, it does not mean "do not cache".

Cache-Control: no-store means the client must not cache the resource at all.
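A sketch of how a client cache might interpret these two directives. `cache_policy` is a hypothetical helper, and it ignores the many other Cache-Control directives.

```python
def cache_policy(cache_control: str) -> str:
    """Return what a client cache may do with a response, given its Cache-Control."""
    directives = {d.strip().split("=", 1)[0].lower()
                  for d in cache_control.split(",") if d.strip()}
    if "no-store" in directives:
        return "do not store"
    if "no-cache" in directives:
        return "store, but revalidate with the server before every use"
    return "store and use while fresh"
```

`no-store` is checked first because it is the stricter of the two: a response that may not be stored at all cannot be revalidated later either.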

[Figure: flowchart summarizing the HTTP caching process described above]

IV. HTTPS

Before getting into HTTPS, let’s take a look at some of HTTP’s drawbacks:

  1. Communication is in plaintext and can be eavesdropped on.
  2. The identity of the other party is not verified, so it may be an impostor.
  3. Message integrity cannot be verified, so messages may be tampered with.

HTTPS was introduced to solve these problems. HTTPS is not a new application-layer protocol; instead, an SSL/TLS layer is inserted between HTTP and TCP. Normally HTTP talks directly to TCP. With HTTPS, HTTP talks to SSL/TLS, which in turn talks to TCP. SSL/TLS provides encryption and authentication. Illustrated HTTP sums this up as HTTPS = HTTP + encryption + authentication + integrity protection.

1. Encryption

There are two common encryption methods: symmetric encryption and asymmetric encryption.

Symmetric encryption uses the same secret key for both encryption and decryption. Its advantage is speed. Its disadvantage is that the key must somehow be delivered to the other party, and nothing guarantees it won't be intercepted in transit.
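Here is a toy illustration of the symmetric idea: one key both encrypts and decrypts. The cipher (XOR with a SHA-256 counter-mode keystream) is made up for teaching. It has no nonce, so a key must never be reused and it must not protect real data. TLS uses vetted ciphers such as AES-GCM or ChaCha20-Poly1305.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XOR with the keystream: the same key does both."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = secrets.token_bytes(32)  # both sides must somehow hold this key
ciphertext = xor_cipher(shared_key, b"card number: 1234")
assert xor_cipher(shared_key, ciphertext) == b"card number: 1234"
```

The weak point is exactly the one described above: `shared_key` has to reach the other side first.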

Asymmetric encryption uses a pair of keys: a public key and a private key. As the names suggest, the public key can be shared freely, while the private key is kept only by its owner. Ciphertext encrypted with the public key can be decrypted only with the corresponding private key. Likewise, ciphertext encrypted with the private key can be decrypted only with the corresponding public key. The disadvantage of this approach is that it is slow.
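Textbook RSA with tiny numbers shows how the public and private keys relate. This is the classic classroom example (p = 61, q = 53) and is completely insecure. Real keys are thousands of bits long and use padding schemes such as OAEP or PSS. `pow(e, -1, phi)` needs Python 3.8+.

```python
# Textbook RSA with tiny primes: illustrates the key relationship only.
p, q = 61, 53
n = p * q                # 3233, part of both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent: the public key is (n, e)
d = pow(e, -1, phi)      # private exponent: 2753, kept secret


def apply_key(m: int, exponent: int) -> int:
    """Raise m to the given key exponent modulo n."""
    return pow(m, exponent, n)


message = 65
ciphertext = apply_key(message, e)           # encrypt with the public key
assert apply_key(ciphertext, d) == message   # only the private key decrypts it

signature = apply_key(message, d)            # "encrypt" with the private key
assert apply_key(signature, e) == message    # anyone with the public key can check it
```

Here the ciphertext is 2790. The second pair of lines is the private-key direction, which is the basis of the digital signatures used by certificates.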

SSL/TLS combines the two methods. The client generates a symmetric key, encrypts it with the server's public key, and sends it to the server. The server decrypts it with its private key. From then on, both sides communicate using symmetric encryption. This way, the key is exchanged securely with asymmetric encryption while the bulk of the traffic gets the speed of symmetric encryption.
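Here is a toy sketch of this hybrid scheme. The server's key pair is textbook RSA built from two well-known Mersenne primes, so it offers no security at all. The symmetric cipher is a teaching-only XOR with a SHA-256 keystream. Real TLS derives its session keys through a more involved handshake, and modern versions typically use Diffie-Hellman key exchange instead of encrypting a key with RSA.

```python
import hashlib
import secrets

# Toy server key pair: textbook RSA over two Mersenne primes (no real security).
P, Q = 2**127 - 1, 2**521 - 1
N, E = P * Q, 65537                # public key, sent to the client
D = pow(E, -1, (P - 1) * (Q - 1))  # private key, never leaves the server


def keystream(key: bytes, length: int) -> bytes:
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: the same key encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Client: pick a random session key and encrypt it with the server's public key.
session_key = secrets.token_bytes(16)
wrapped = pow(int.from_bytes(session_key, "big"), E, N)

# Server: recover the session key with its private key.
unwrapped = pow(wrapped, D, N).to_bytes(16, "big")

# Both sides now switch to fast symmetric encryption with the shared key.
encrypted_request = xor_cipher(session_key, b"GET /account HTTP/1.1")
assert xor_cipher(unwrapped, encrypted_request) == b"GET /account HTTP/1.1"
```

Only the short session key goes through the slow asymmetric step; everything afterwards uses the fast symmetric cipher.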

2. Certificates

The scheme described above still has a problem: the client cannot tell whether the public key it received is genuine. For example, the public key could be tampered with on its way from the server to the client. To solve this, a certificate authority (CA) and the public key certificates it issues are used. A CA is a third party trusted by both client and server, and the certificates it issues are very hard to tamper with. Here is the certificate workflow in brief. First, the server operator applies to a CA for a public key certificate. After approving the request, the CA digitally signs the server's public key and binds it into a certificate. The server sends this certificate to the client. The client then uses the CA's public key to verify the digital signature. If verification succeeds, the certificate, and therefore the server's public key, is trusted.
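The signing and verification steps can be sketched with toy textbook RSA, using two well-known Mersenne primes and therefore no real security. A hypothetical CA signs a hash of the certificate contents with its private key, and the client checks the signature with the CA's public key. Real certificates use the X.509 format and padded signature schemes. The field names and the server key below are invented.

```python
import hashlib
import json

# Toy CA key pair (textbook RSA over two Mersenne primes; no real security).
P, Q = 2**127 - 1, 2**521 - 1
CA_N, CA_E = P * Q, 65537                # CA public key, built into clients
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))  # CA private key, never leaves the CA


def digest(cert_body: dict) -> int:
    """Hash a canonical serialization of the certificate contents."""
    data = json.dumps(cert_body, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def ca_issue(cert_body: dict) -> dict:
    """CA: sign the server's identity and public key with the CA private key."""
    return {"body": cert_body, "signature": pow(digest(cert_body), CA_D, CA_N)}


def client_verify(cert: dict, ca_public=(CA_N, CA_E)) -> bool:
    """Client: check the signature using the CA public key it already trusts."""
    n, e = ca_public
    return pow(cert["signature"], e, n) == digest(cert["body"])


cert = ca_issue({"subject": "www.example.com", "public_key": [3233, 17]})
assert client_verify(cert)
```

If an attacker swaps in their own public key, the hash of the certificate body no longer matches the CA's signature, and `client_verify` returns `False`.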

Where does the client get the CA's public key? Clients (browsers or operating systems) come with the public keys of commonly trusted CAs built in. Besides the public key, a certificate also contains the server's identity information, so the certificate can also be used to confirm who the other party is.

3. How the symmetric key is generated

This process is actually quite complicated, so I'll describe a simplified version.