The old article

This article was first published on 2019.01.05 at www.jianshu.com/p/97519d170…

Preface

As far as I know, many developers, even ones with years of experience, have a one-sided or even mistaken understanding of HTTP and often confuse it with TCP, so I think these two concepts need clarifying. In the previous two articles we covered the transport layer protocols TCP and UDP, as well as socket programming. Starting with this article we move up to the application layer. I will publish a series of articles on it, covering the application layer protocols HTTP and HTTPS, along with the usage and source code of HTTP-based networking libraries.

1 HTTP

1.1 HTTP overview

HTTP stands for HyperText Transfer Protocol. Its usual Chinese name, 超文本传输协议, uses the same word (传输) as the Chinese name of the transport layer. That confused me at first, and it is one reason so many developers mix up TCP and HTTP: if HTTP is an application layer protocol, why does its name sound like a transport protocol? The name really is a little misleading. HTTP is an application layer protocol built on top of TCP. HTTP packages the data in its own agreed format, and TCP then carries it, so "transfer" is the more accurate reading of the name.

So what does HTTP actually do? As mentioned earlier, HTTP is carried entirely over TCP. TCP is a byte-stream-oriented transport protocol, so the receiver simply gets a stream of bytes. If the sender and receiver have not agreed on how to interpret those bytes, the receiver cannot make sense of the data even though it arrives intact. HTTP is exactly such an agreement between the two parties, and both the client and the server must follow it strictly. I hope this plain-language explanation makes the purpose of HTTP, and how it differs from TCP, clear. HTTP is also not the only application layer protocol. There are many others, such as FTP and SMTP.

1.2 HTTP message

HTTP messages come in two kinds: request messages and response messages.

The request message

A request message consists of three parts: the header, a blank line, and the body. The header can be further divided into the request line and the request header fields. I drew a sketch of the HTTP request message, shown in Figure 1-1.

I will explain some of the terminology in detail later in this article. For now, you only need to understand the structure of an HTTP message.

  • Request line: contains three parts: the request method, the URI path, and the HTTP version
  • Request header fields: there can be zero or more header fields
  • Blank line: a line break (CR+LF) that separates the message header from the message body
  • Body: the data to be sent

The first line of the message is the request line. The request header fields lie between the request line and the blank line, and the request body follows the blank line. Figure 1-2, which is taken from this article, shows the structure of an actual request message.
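To make this structure concrete, here is a minimal Java sketch that assembles a request message as plain text. Plain text of this kind is what actually travels over the TCP connection. The host `example.com`, the paths and the helper name `build` are illustrative assumptions. The sketch always uses HTTP/1.1 and adds Content-Length only when there is a body.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RawHttpRequest {
    private static final String CRLF = "\r\n";

    // Request line + header fields + blank line + body, exactly as the bytes are sent over TCP.
    static String build(String method, String path, Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(path).append(" HTTP/1.1").append(CRLF); // request line
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append(CRLF);    // one header field per line
        }
        if (!body.isEmpty()) {
            // Content-Length counts bytes, not characters
            sb.append("Content-Length: ").append(body.getBytes(StandardCharsets.UTF_8).length).append(CRLF);
        }
        sb.append(CRLF); // the blank line separating header and body
        sb.append(body); // the body (may be empty)
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "example.com");
        headers.put("Accept", "text/html");
        System.out.print(build("GET", "/index.html", headers, ""));
    }
}
```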

The response message

A response message is also divided into three parts: the header, a blank line, and the body. Its header can likewise be subdivided into the status line and the response header fields. The structure is basically the same as that of the request message, so I won't draw another figure for it (drawing is really laborious).

  • Status line: contains the HTTP version, the status code, and a description
  • Response header fields: there can be zero or more header fields
  • Blank line: a line break (CR+LF) that separates the message header from the message body
  • Body: the data returned to the client

The first line of the message is the status line. The response header fields lie between the status line and the blank line, and the response body follows the blank line.
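The same layout can be read back from a response. The sketch below uses a hypothetical class, `RawHttpResponse`. It splits a response at the first blank line and breaks the status line into version, status code and description. It assumes the whole response is already in memory and ignores real-world details such as chunked transfer encoding.

```java
import java.util.Arrays;
import java.util.List;

public class RawHttpResponse {
    final String version;           // e.g. "HTTP/1.1"
    final int statusCode;           // e.g. 200
    final String reason;            // e.g. "OK"
    final List<String> headerLines; // raw "Name: value" lines
    final String body;

    RawHttpResponse(String raw) {
        int split = raw.indexOf("\r\n\r\n");         // the blank line (CRLF CRLF) ends the header
        if (split < 0) throw new IllegalArgumentException("missing blank line");
        String[] lines = raw.substring(0, split).split("\r\n");
        String[] status = lines[0].split(" ", 3);     // status line: version, code, description
        version = status[0];
        statusCode = Integer.parseInt(status[1]);
        reason = status.length > 2 ? status[2] : "";
        headerLines = Arrays.asList(lines).subList(1, lines.length);
        body = raw.substring(split + 4);              // everything after the blank line
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhello";
        RawHttpResponse r = new RawHttpResponse(raw);
        System.out.println(r.statusCode + " " + r.reason + " -> " + r.body);
    }
}
```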

1.3 HTTP Header Fields

The header can be divided into the request/status line and the header fields. The request/status line holds items such as the request method (or status code) and the HTTP version, and it occupies a single line. There can be zero or more header fields. Each header field is a name/value pair on its own line, with the name and value separated by a colon (:). Together, the header fields carry basic information about the request or response. Here are some common header fields:

  • Accept: the MIME types the client accepts
  • Accept-Charset: the character sets the client accepts
  • Accept-Encoding: the content encodings the client can decode, such as gzip
  • Accept-Language: the languages the client prefers, used when the server can provide more than one language version
  • Authorization: credentials sent by the client, usually in response to a WWW-Authenticate header from the server
  • Connection: lets the client and server specify options for the connection, such as keep-alive or close
  • Content-Length: the length of the message body in bytes
  • Cookie: sends the cookies previously set by the server; one of the most important request headers
  • From: the email address of the user making the request, used by special client programs

There are many more HTTP header fields, but I won't list them all here. Interested readers can look them up. Header fields can also be grouped into request header fields, response header fields, general header fields and entity header fields.
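Each header field is a `name: value` line, so parsing the header section is straightforward. This is a minimal sketch, and the class `HeaderFields` is hypothetical. It splits each line at the first colon only, because values such as `example.com:8080` may contain colons. It also stores names case-insensitively, because HTTP field names are not case-sensitive.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class HeaderFields {
    // Parse "Name: value" lines; CASE_INSENSITIVE_ORDER makes "content-length" find "Content-Length".
    static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : lines) {
            int colon = line.indexOf(':');      // split at the FIRST colon only
            if (colon <= 0) continue;           // skip malformed lines in this sketch
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            // Repeated fields are joined with ", " (Set-Cookie is a known exception, not handled here)
            fields.merge(name, value, (a, b) -> a + ", " + b);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> f = parse(Arrays.asList(
                "Accept: text/html", "Accept-Encoding: gzip", "Host: example.com:8080"));
        System.out.println(f.get("accept-encoding") + " / " + f.get("HOST"));
    }
}
```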

1.4 HTTP Status Code

A status code is the server's description of the result of the client's request. By examining it, the client learns the outcome of the request.

Status code classification:

  • 1xx: Informational; the request has been received and is being processed
  • 2xx: Success; the request succeeded. The most common is 200 OK
  • 3xx: Redirection; further action is needed, for example because the URI of the requested resource has changed
  • 4xx: Client error; the error lies on the client side
  • 5xx: Server error; the error occurred on the server

Status codes are easy to understand, so I won't list them one by one. Interested readers can look them up.
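The class is determined by the first digit, so a client can handle an unfamiliar code by falling back to its class. Here is a small sketch using a hypothetical helper, `StatusClass.classify`:

```java
public class StatusClass {
    // The first digit of a status code decides its class (1xx..5xx).
    static String classify(int code) {
        if (code < 100 || code > 599) {
            throw new IllegalArgumentException("not an HTTP status code: " + code);
        }
        switch (code / 100) {
            case 1:  return "Informational";
            case 2:  return "Success";
            case 3:  return "Redirection";
            case 4:  return "Client Error";
            default: return "Server Error"; // 5xx
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {101, 200, 304, 404, 503}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```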

1.5 HTTP cache

Requesting the same resource over and over is wasteful. Each request costs a round trip to the server, and in early HTTP each request even set up and tore down its own connection. HTTP therefore provides caching, which cuts unnecessary transfers and improves efficiency to a certain extent. HTTP caching falls roughly into two types: the forced cache and the comparison cache.

Forced cache

Expires: On the client's first request, the server returns an Expires header containing an expiration date, and the client saves that date. The next time the client needs the resource, it checks whether the current date is past the expiration date. If it is not, the client uses the cached copy. If it is, the client requests the resource from the server again. The Expires header looks like this:

Expires: Fri, 16 Mar 2018 12:31:00 GMT
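Here is a minimal sketch of the Expires check on the client. It assumes the header value uses the standard RFC 1123 date format, which `java.time.format.DateTimeFormatter.RFC_1123_DATE_TIME` parses. The class and method names are hypothetical. Note that `now` comes from the client's own clock, which is the weakness discussed next.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class ExpiresCheck {
    // Forced cache with Expires: the cached copy may be used only while "now" is before the Expires date.
    static boolean isFresh(String expiresValue, ZonedDateTime now) {
        ZonedDateTime expires = ZonedDateTime.parse(expiresValue, DateTimeFormatter.RFC_1123_DATE_TIME);
        return now.isBefore(expires); // compares instants, so time zones do not matter
    }

    public static void main(String[] args) {
        String expires = "Fri, 16 Mar 2018 12:31:00 GMT";
        // ZonedDateTime.now() reads the CLIENT clock: if it is wrong, this decision is wrong too
        System.out.println(isFresh(expires, ZonedDateTime.now()) ? "use cache" : "request again");
    }
}
```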

Cache-Control: Expires has a reliability problem. The client compares the server's date with its own clock, so if the client's and server's clocks disagree, caching goes wrong. HTTP/1.1 therefore introduced Cache-Control to replace Expires. Cache-Control can take the following values:

  • private: only the client may cache the response; proxy servers may not
  • public: both the client and proxy servers may cache the response
  • max-age=xxx: the cached content becomes invalid after xxx seconds
  • no-cache: the response may be cached, but it must be validated with the server (comparison cache) before each use
  • no-store: nothing is cached

A Cache-Control header looks like this:

Cache-Control: public, max-age=31536000
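This sketch shows how a client might read this header and apply max-age. The class `CacheControl` and its methods are hypothetical. `ageSeconds` is assumed to be how long the response has been in the cache. Both `no-cache` and `no-store` are treated as "cannot be used without asking the server".

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class CacheControl {
    // Parse e.g. "public, max-age=31536000"; directives without a value map to null.
    static Map<String, String> parse(String headerValue) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String part : headerValue.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) continue;
            int eq = token.indexOf('=');
            if (eq < 0) {
                directives.put(token.toLowerCase(Locale.ROOT), null);
            } else {
                String name = token.substring(0, eq).trim().toLowerCase(Locale.ROOT);
                String value = token.substring(eq + 1).trim().replace("\"", ""); // drop optional quotes
                directives.put(name, value);
            }
        }
        return directives;
    }

    // Forced-cache decision: usable without contacting the server only while age < max-age.
    static boolean isFresh(Map<String, String> directives, long ageSeconds) {
        if (directives.containsKey("no-store") || directives.containsKey("no-cache")) return false;
        String maxAge = directives.get("max-age");
        return maxAge != null && ageSeconds < Long.parseLong(maxAge);
    }

    public static void main(String[] args) {
        Map<String, String> d = parse("public, max-age=31536000");
        System.out.println(d + " fresh after one day: " + isFresh(d, 86_400));
    }
}
```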

When both headers are present, an HTTP/1.1 cache gives Cache-Control's max-age precedence over Expires. An HTTP/1.0 cache predates Cache-Control and only understands Expires.
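Here is a sketch of that precedence rule, using a hypothetical helper, `FreshnessLifetime.lifetimeSeconds`. When only Expires is present, the sketch measures it against the response's Date header rather than the client clock, which sidesteps the clock-mismatch problem. Treating a response with neither header as immediately stale is a simplifying assumption.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class FreshnessLifetime {
    private static ZonedDateTime httpDate(String value) {
        return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    // How long (in seconds) a response may be served from the forced cache.
    // maxAge: Cache-Control max-age, or null if absent.
    // expires, date: the Expires and Date response headers, or null if absent.
    static long lifetimeSeconds(Long maxAge, String expires, String date) {
        if (maxAge != null) {
            return maxAge; // HTTP/1.1: max-age takes precedence over Expires
        }
        if (expires != null && date != null) {
            // Both timestamps come from the server's clock, so client clock skew does not matter
            long seconds = Duration.between(httpDate(date), httpDate(expires)).getSeconds();
            return Math.max(0, seconds);
        }
        return 0; // no explicit lifetime: treated as already stale in this sketch
    }

    public static void main(String[] args) {
        String date = "Sun, 06 Nov 1994 08:49:37 GMT";
        String expires = "Sun, 06 Nov 1994 09:49:37 GMT";
        System.out.println(lifetimeSeconds(null, expires, date)); // 3600 (Expires only)
        System.out.println(lifetimeSeconds(60L, expires, date));  // 60 (max-age wins)
    }
}
```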

Comparison cache

Unlike the forced cache, the comparison cache lets the server decide whether the cached copy can be used, which means the client still has to send a request to the server. You might wonder: "If a request is made anyway, what is the point of caching?" The point is this: if the server decides the cache is still valid, it returns only a response header and no response body. Carrying 10 bricks to the second floor does not take the same time as carrying 100. The comparison cache uses two pairs of header fields.

Last-Modified/If-Modified-Since:

  • Last-Modified: on the client's first request, the server returns a Last-Modified header giving the time the resource was last modified. The client saves this date.
  • If-Modified-Since: on the next request for the resource, the client sends the saved date in an If-Modified-Since header. The server then checks whether the cache is still valid. If the resource has not changed, the server returns 304 to tell the client to read the data from its local cache. Otherwise it returns 200 with the new response body and an updated Last-Modified, and the client saves the new Last-Modified and data.
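On the server side, the If-Modified-Since check boils down to a date comparison. Here is a minimal Java sketch. The helper `IfModifiedSince.respond` is hypothetical, and it returns only the status code.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class IfModifiedSince {
    // Server side of Last-Modified / If-Modified-Since.
    // lastModified: when the resource last changed; ifModifiedSince: the header value, or null if absent.
    static int respond(ZonedDateTime lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200; // no cached copy on the client: send the full body plus Last-Modified
        }
        ZonedDateTime since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
        // HTTP dates only carry whole seconds, so sub-second changes are invisible here
        ZonedDateTime modified = lastModified.withNano(0);
        return modified.isAfter(since) ? 200 : 304; // 304 Not Modified: "read your local cache"
    }

    public static void main(String[] args) {
        ZonedDateTime lastModified = ZonedDateTime.parse("1994-11-06T08:49:37Z");
        System.out.println(respond(lastModified, null));
        System.out.println(respond(lastModified, "Sun, 06 Nov 1994 08:49:37 GMT"));
    }
}
```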

ETag/If-None-Match:

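An ETag is an identifier for a resource, and it changes whenever the resource is modified. The ETag/If-None-Match flow is basically the same as Last-Modified/If-Modified-Since. The client saves the ETag and sends it back in If-None-Match. The server returns 304 if the ETag still matches, and otherwise returns 200 with the new content and a new ETag. The one difference is that a 304 response carries the ETag in its header, whereas a 304 in the Last-Modified flow does not carry Last-Modified.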
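Here is a matching sketch for ETag. How an ETag is computed is up to the server. Hashing the content with SHA-256 is only an assumption for illustration, as are the class and method names. Weak tags (`W/"..."`) are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ETagCheck {
    // Assumed ETag scheme: quoted hex SHA-256 of the content, so any content change yields a new tag.
    static String etagOf(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : digest) {
                tag.append(String.format("%02x", b));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Server side of ETag / If-None-Match: 304 if a tag the client sent (or "*") matches.
    static int respond(String currentEtag, String ifNoneMatch) {
        if (ifNoneMatch == null) {
            return 200;
        }
        for (String tag : ifNoneMatch.split(",")) { // the header may list several tags
            String t = tag.trim();
            if (t.equals("*") || t.equals(currentEtag)) {
                return 304; // unchanged: the 304 response would carry the ETag again
            }
        }
        return 200; // changed: send the new body together with the new ETag
    }

    public static void main(String[] args) {
        String v1 = etagOf("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(v1 + " -> " + respond(v1, v1));
    }
}
```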

Last-Modified/If-Modified-Since and ETag/If-None-Match do the same job, so why is ETag needed? There are roughly three reasons:

  • Some files are rewritten periodically, so their modification time changes while their content stays the same. With Last-Modified, the server would wrongly conclude that the content has changed and resend it.
  • Last-Modified/If-Modified-Since has a granularity of one second, so it cannot detect several changes made within the same second.
  • Some servers cannot determine exactly when a file was last modified.

In general, the forced cache (Expires and Cache-Control) has the highest priority, followed by ETag/If-None-Match and then Last-Modified/If-Modified-Since. Figure 1-5 illustrates how HTTP caching works by combining the comparison cache and the forced cache.

(Drawing is really killing me, so this figure is taken from this article.)
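Putting the pieces together, the client-side decision order described above can be sketched like this. The `Entry` type and the returned strings are hypothetical. Real clients may send both validators at once, while this sketch sends only the ETag when it has one.

```java
public class CacheDecision {
    // Hypothetical cached entry: what the client remembered from the last response.
    static final class Entry {
        final long maxAgeSeconds;  // from Cache-Control: max-age
        final long ageSeconds;     // how long the entry has been cached
        final String etag;         // ETag header, or null
        final String lastModified; // Last-Modified header, or null

        Entry(long maxAgeSeconds, long ageSeconds, String etag, String lastModified) {
            this.maxAgeSeconds = maxAgeSeconds;
            this.ageSeconds = ageSeconds;
            this.etag = etag;
            this.lastModified = lastModified;
        }
    }

    // Decide how to fetch the resource: forced cache first, then ETag, then Last-Modified.
    static String decide(Entry cached) {
        if (cached == null) {
            return "GET (no cache)";
        }
        if (cached.ageSeconds < cached.maxAgeSeconds) {
            return "use cache directly";                                // forced cache: no request at all
        }
        if (cached.etag != null) {
            return "GET with If-None-Match: " + cached.etag;            // comparison cache, ETag first
        }
        if (cached.lastModified != null) {
            return "GET with If-Modified-Since: " + cached.lastModified;
        }
        return "GET (cache unusable)";
    }

    public static void main(String[] args) {
        System.out.println(decide(new Entry(60, 10, "\"v1\"", null)));
        System.out.println(decide(new Entry(60, 120, "\"v1\"", null)));
    }
}
```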

2 HTTPS

HTTPS stands for HyperText Transfer Protocol over Secure Socket Layer. HTTPS makes HTTP transmission more secure. Plain HTTP carries the following security risks:

  • Eavesdropping: HTTP transmits plaintext, so if an HTTP message contains a bank card number and password, criminals can easily steal them
  • Impersonation: while a client is communicating with a server, a criminal in the middle can pretend to be the client to the server, or the server to the client
  • Tampering: if the traffic passes through an attacker's hands, the attacker can, for example, replace the bank card number that should receive a payment with his own

These are the three main threats to HTTP, and HTTPS can essentially prevent all three. How does HTTPS secure the transmission? Networks are complex, so it is impossible to prevent data from being intercepted by an intermediary. The data can be encrypted, however, so that an intermediary who intercepts it cannot understand it.

2.1 Symmetric Encryption

Before describing symmetric encryption, let me first explain the difference between an encryption algorithm and a key, since some readers easily confuse the two. The encryption algorithm is like the mechanism inside a lock, and the key is like the key that opens it. In code, this looks like:

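public String encryption(String content0, int key) {
    // Toy algorithm (the "lock"): shift every character by key (the "key"); not secure.
    return content0.chars().map(c -> c + key).collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append).toString();
}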

In encryption(String content0, int key), the method body is the encryption algorithm and the parameter key is the secret key. The same algorithm with a different key produces different ciphertext.

Symmetric encryption:

In a single-key cryptosystem, the same key is used both to encrypt and to decrypt information. This encryption method is called symmetric encryption, also known as single-key encryption.

Put simply, the client and the server share an algorithm and a key. Before transmitting, the sender encrypts the data with the shared key and sends it. After receiving the data, the receiver decrypts it with the same shared key.
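For a concrete picture of symmetric encryption, here is a minimal Java sketch using AES in GCM mode from the standard `javax.crypto` API, in which one `SecretKey` both encrypts and decrypts. The choice of AES-GCM, the 128-bit key size and the class name are assumptions for illustration. How the key reaches both sides is exactly the problem discussed next.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class SymmetricDemo {
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // 96-bit nonce; it is not secret and travels with the ciphertext, but must never repeat for a key
    static byte[] newIv() {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    // The SAME key is used here...
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(plaintext);
    }

    // ...and here: that is what makes it symmetric.
    static byte[] decrypt(SecretKey key, byte[] iv, byte[] ciphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(ciphertext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey shared = newKey(); // both client and server must hold this key
        byte[] iv = newIv();
        byte[] ciphertext = encrypt(shared, iv, "hello server".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(shared, iv, ciphertext), StandardCharsets.UTF_8));
    }
}
```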

Advantages: simple, direct and efficient. Disadvantages: the key cannot safely be transmitted over the network, because anyone who intercepts it can crack the encrypted data. Each client therefore needs its own unique shared key, and the server has to store the shared keys of all its clients, which increases the load on the server.

2.2 Asymmetric encryption

Let’s take a look at the official explanation for asymmetric encryption:

Asymmetric encryption algorithms require two keys: a public key and a private key. The public key and the private key form a pair. Data encrypted with the public key can only be decrypted with the corresponding private key, and data encrypted with the private key can only be decrypted with the corresponding public key. Because encryption and decryption use two different keys, the algorithm is called asymmetric encryption. Asymmetric encryption exchanges confidential information as follows. Party A generates a key pair and publishes one of the keys to others as the public key. Party B, having obtained the public key, encrypts the confidential information with it and sends the result to Party A. Party A then decrypts the information with the other key, the private key, which it keeps to itself.

Advantages: more flexible, because keys come in public/private pairs, and the public key can be transmitted safely over the network. Disadvantages: the algorithms are complex, so encryption and decryption are slow.
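Here is a minimal sketch of asymmetric encryption with RSA from the standard Java API: whatever is encrypted with the public key can only be decrypted with the matching private key. The OAEP padding, the 2048-bit key size and the class name are assumptions. RSA with a 2048-bit key can only encrypt a small payload of a couple of hundred bytes, which reflects the cost mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

public class AsymmetricDemo {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Party A generates a key pair and publishes only the public key.
    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Party B encrypts with A's public key...
    static byte[] encrypt(PublicKey publicKey, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA);
        cipher.init(Cipher.ENCRYPT_MODE, publicKey);
        return cipher.doFinal(data);
    }

    // ...and only A's private key can decrypt it.
    static byte[] decrypt(PrivateKey privateKey, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA);
        cipher.init(Cipher.DECRYPT_MODE, privateKey);
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair a = newKeyPair();
        byte[] secret = encrypt(a.getPublic(), "meet at noon".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(a.getPrivate(), secret), StandardCharsets.UTF_8));
    }
}
```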

Practical application

Symmetric and asymmetric encryption each have their own advantages and disadvantages. Can we combine the strengths of both to achieve secure transmission? Yes. Here is how symmetric and asymmetric encryption are used together:

  • The client sends a request to the server, and the server returns its public key
  • The client generates a symmetric key, encrypts it with the server's public key and sends it to the server
  • The server receives the encrypted symmetric key and decrypts it with its private key

After these three steps, the server and the client share a symmetric key. They use it to encrypt and decrypt all subsequent transmissions, so the data cannot be stolen. This approach avoids both drawbacks: symmetric keys cannot safely be sent over the network, and asymmetric encryption is slow. In practice, data is almost always encrypted this way.
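The three steps above can be sketched in Java as follows. The client wraps a freshly generated AES key with the server's RSA public key, and the server unwraps it with its private key. From then on, both sides use fast symmetric AES. The algorithms, key sizes and class/method names are assumptions for illustration. Real HTTPS negotiates these details during its handshake.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class HybridDemo {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";

    // Step 2 (client): generate a symmetric key...
    static SecretKey newSessionKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // ...and encrypt ("wrap") it with the server's public key.
    static byte[] wrapKey(PublicKey serverPublic, SecretKey sessionKey) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.WRAP_MODE, serverPublic);
        return rsa.wrap(sessionKey);
    }

    // Step 3 (server): recover the symmetric key with the private key.
    static SecretKey unwrapKey(PrivateKey serverPrivate, byte[] wrapped) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.UNWRAP_MODE, serverPrivate);
        return (SecretKey) rsa.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    // Afterwards: symmetric AES-GCM; the random 12-byte IV is prepended to the ciphertext.
    static byte[] seal(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = aes.doFinal(plaintext);
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    static byte[] open(SecretKey key, byte[] sealed) throws GeneralSecurityException {
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, sealed, 0, 12));
        return aes.doFinal(sealed, 12, sealed.length - 12);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Step 1: the server owns an RSA key pair and sends the public key to the client.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair server = generator.generateKeyPair();

        SecretKey clientKey = newSessionKey();                         // step 2 (client)
        byte[] wrapped = wrapKey(server.getPublic(), clientKey);       // sent over the network
        SecretKey serverKey = unwrapKey(server.getPrivate(), wrapped); // step 3 (server)

        byte[] sealed = seal(clientKey, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(open(serverKey, sealed), StandardCharsets.UTF_8));
    }
}
```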

Digital signature

A digital signature can guarantee data integrity during network transmission. The process is as follows, assuming the client and server have already agreed on a symmetric key:

  • Before sending, the server hashes the data to get a hash value, then encrypts the data together with the hash value using the key
  • After receiving the data, the client decrypts it, hashes the data itself, and recovers the hash value sent by the server. It then compares the two hash values. If they differ, the data was tampered with during transmission
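Note that the standard public-key form of a digital signature works slightly differently. The sender encrypts (signs) the hash with its private key, and anyone holding the matching public key can check it. The Java sketch below uses that swapped-in standard form through `java.security.Signature` with `SHA256withRSA`, which hashes and signs in one step. The class name and the sample messages are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class SignatureDemo {
    // Sender: hash the data with SHA-256 and sign the hash with the private key (one call does both).
    static byte[] sign(PrivateKey privateKey, byte[] data) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(privateKey);
        signer.update(data);
        return signer.sign();
    }

    // Receiver: hash the received data again and check it against the signature with the public key.
    static boolean verify(PublicKey publicKey, byte[] data, byte[] signature) throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(data);
        return verifier.verify(signature); // false means the data (or signature) was altered
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair server = generator.generateKeyPair();
        byte[] data = "pay 100 to account A".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(server.getPrivate(), data);
        byte[] tampered = "pay 100 to account B".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(server.getPublic(), data, signature));     // true
        System.out.println(verify(server.getPublic(), tampered, signature)); // false
    }
}
```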

2.3 CA certification

CA stands for Certificate Authority. A CA issues digital certificates to organizations to confirm their identity, and it can also revoke them. HTTPS requires an SSL certificate, which is a digital certificate issued by a CA. The certificate contains the public key, information about the applicant and the issuer, and a signature.

Specific certificate verification process:

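  • The client sends a request to the server
  • The server sends its certificate to the client. The certificate contains the server's public key, the applicant and issuer information, and a digital signature. The CA created that signature by hashing this content and encrypting the hash with the CA's private key
  • The client identifies the issuing CA. It then uses the CA's public key, which comes pre-installed in the operating system or browser, to decrypt the signature and recover the hash value the CA computed
  • The client hashes the certificate content itself. If the two hash values match, the certificate really was issued by that CA and has not been forged or altered, so the public key inside it can be trusted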

In addition, certificates from a CA usually cost money.
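The CA check can be imitated with a toy certificate. In this sketch, `ToyCertificate` is a hypothetical and drastically simplified stand-in for a real X.509 certificate. The CA signs the applicant, the issuer and the server's public key with its private key, and the client verifies that signature with the CA's public key. If an attacker swaps in a different public key, or issues a certificate from a different key pair, verification fails. The domain name and CA name are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

public class ToyCertificateDemo {
    // Hypothetical, drastically simplified certificate (real ones are X.509 structures).
    static final class ToyCertificate {
        final String subject;      // applicant, e.g. the server's domain name
        final String issuer;       // the CA
        final PublicKey serverKey; // the server public key being vouched for
        final byte[] caSignature;  // CA's signature over the three fields above

        ToyCertificate(String subject, String issuer, PublicKey serverKey, byte[] caSignature) {
            this.subject = subject;
            this.issuer = issuer;
            this.serverKey = serverKey;
            this.caSignature = caSignature;
        }
    }

    // The bytes the CA signs: applicant, issuer and public key.
    static byte[] content(String subject, String issuer, PublicKey key) {
        String text = subject + "|" + issuer + "|" + Base64.getEncoder().encodeToString(key.getEncoded());
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // CA side: hash the content and sign it with the CA's private key.
    static ToyCertificate issue(PrivateKey caPrivate, String issuer, String subject, PublicKey serverKey)
            throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(caPrivate);
        signer.update(content(subject, issuer, serverKey));
        return new ToyCertificate(subject, issuer, serverKey, signer.sign());
    }

    // Client side: using the CA's pre-installed public key, check the signature matches the content.
    static boolean verify(PublicKey caPublic, ToyCertificate cert) throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(caPublic);
        verifier.update(content(cert.subject, cert.issuer, cert.serverKey));
        return verifier.verify(cert.caSignature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair ca = generator.generateKeyPair();
        KeyPair server = generator.generateKeyPair();
        ToyCertificate cert = issue(ca.getPrivate(), "Toy CA", "www.example.com", server.getPublic());
        System.out.println(verify(ca.getPublic(), cert));
    }
}
```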

2.4 SSL

Now for the main event. SSL stands for Secure Sockets Layer, and it is the heart of HTTPS: in other words, HTTPS is HTTP + SSL. SSL makes HTTP transmission secure. It is built on the symmetric encryption, asymmetric encryption and CA certification described in the previous sections. (SSL has since been superseded by its successor, TLS, but the name SSL is still widely used.) Let me first show the entire SSL process in a figure, Figure 2-1:

Let me describe the process in words:

  • The client sends a request to the server, listing the encryption algorithms it supports
  • The server chooses an encryption algorithm and returns the chosen algorithm, its public key and its certificate to the client
  • After receiving the certificate, the client verifies it with the CA, as described in 2.3. If verification succeeds, the client generates a symmetric key, encrypts it with the server's public key and sends it to the server. Otherwise, the client treats the server as illegitimate
  • The server decrypts the symmetric key with its private key and saves it. From then on, the client and server encrypt their data with this key
  • Before sending data, the server hashes the data to obtain a hash value, encrypts both the data and the hash value, and sends them to the client
  • After receiving the data, the client decrypts it, hashes the data to obtain its own hash value, and compares it with the decrypted hash value sent by the server. If the two differ, the data is considered to have been tampered with

This is the SSL encryption process. SSL actually sits between the application layer and the transport layer. It therefore does not serve only HTTP, and it can also provide secure transmission for other application layer protocols. That's about it.
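You can observe the first step of this process from Java. The default `SSLContext` reports the protocol versions and cipher suites that a Java client would offer. This sketch only lists them and performs no handshake. The output depends on the JDK version and its security settings. On recent JDKs the protocols are typically TLS versions such as TLSv1.2 and TLSv1.3. The class name is hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

public class TlsOffer {
    // What this JVM would offer in step 1: enabled protocol versions and cipher suites.
    static SSLParameters defaults() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SSLParameters p = defaults();
        System.out.println("Protocols: " + String.join(", ", p.getProtocols()));
        for (String suite : p.getCipherSuites()) {
            System.out.println(suite);
        }
    }
}
```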

Conclusion

This article covered HTTP and HTTPS. HTTP itself is relatively simple and consists mostly of conceptual knowledge. Remember that HTTPS adds SSL between HTTP and TCP, and that SSL is what makes the data transfer secure. Remember also that SSL can be used by other application layer protocols. This concludes our discussion of HTTP.