Node series: HTTP and HTTPS in detail, with a summary of related HTTP interview questions

HTTP Protocol Overview

HTTP is short for Hypertext Transfer Protocol. It is used to transfer hypertext from a World Wide Web server to the local browser.

HTTP is a TCP/IP-based communication protocol for transferring data (HTML files, image files, query results, etc.).

HTTP is an application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continually improved and extended since. HTTP/1.0 was published in 1996 and HTTP/1.1 in 1997; HTTP/2 followed in 2015.

HTTP works on a client-server architecture. As the HTTP client, the browser sends requests to the HTTP server, that is, the Web server, via URLs. The Web server sends a response back to the client based on the request it received.

Five features of HTTP

Client/server model: HTTP supports the client/server model.
Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. The commonly used request methods are GET, HEAD and POST, and each method specifies a different type of interaction between the client and the server. Because the protocol is simple, HTTP server programs are small and communication is fast.
Flexible: HTTP allows any type of data object to be transferred. The type being transferred is indicated by Content-Type.
Connectionless: connectionless means that each connection handles only one request. The server closes the connection after it has processed the client's request and received the client's acknowledgement, which saves transmission time. This was done early on to use fewer resources and respond faster. Later, Connection: keep-alive was introduced to implement persistent (long) connections.
Stateless: HTTP is a stateless protocol, meaning that the protocol has no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need earlier information.

URLs in HTTP

HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

A URL is an Internet address used to identify a resource on the Internet. The following URL is used as an example to describe the components of a common URL:

www.xxx.com:8080/news/1.html…

As you can see from the above URL, a complete URL consists of the following parts:

Protocol part: the protocol part of the URL is “http:”, which indicates that the web page uses HTTP. Many protocols can be used on the Internet, such as HTTP, FTP, and so on. The “//” after “http:” is the delimiter
Domain name: the domain name of the URL is www.aspxfans.com. In a URL, an IP address can also be used in place of a domain name
Port: the domain name is followed by the port, separated from it by a colon (:). The port is not a required part of a URL, and the default port is used if it is omitted
Virtual directory part: the virtual directory part begins with the first slash after the domain name and ends with the last slash. The virtual directory is also not a required part of a URL. The virtual directory in this case is “/news/”
File name: the file name part runs from the last “/” after the domain name to the “?”. If there is no “?”, it runs from the last “/” to the “#”; if there is neither “?” nor “#”, it runs from the last “/” to the end of the URL. In this case, the file name is index.asp. The file name is also not a required part of a URL, and if omitted, the default file name is used
Anchor part: everything from the “#” to the end is the anchor part. The anchor part in this case is “name”. The anchor part is also not a required part of a URL
Parameter part: the part between the “?” and the “#” is the parameter part, also known as the search part or query part. The parameter part in this example is “boardID=5&ID=24618&page=1”. Multiple parameters are allowed, separated by ampersands (&)
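
The breakdown above can be tried out with Python's standard urllib.parse module. This is only a sketch: the full URL below is an assumption pieced together from the components named in the list (the example itself is shown only in truncated form), and describe_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed full form of the example URL, reconstructed from the components
# described in the list above.
EXAMPLE_URL = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"


def describe_url(url):
    """Split a URL into the parts discussed above."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the virtual directory, the rest is the file.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,
        "virtual_directory": directory + "/",
        "file_name": filename,
        "parameters": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


if __name__ == "__main__":
    for name, value in describe_url(EXAMPLE_URL).items():
        print(f"{name}: {value}")
```

Note that the anchor is handled entirely by the client; browsers do not send it to the server.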

The difference between URLs and URIs

A Uniform Resource Identifier (URI) uniquely identifies a resource.

Each resource available on the Web, such as an HTML document, image, video clip, or program, is identified by a URI. A URI generally consists of three parts: (1) the naming mechanism for accessing the resource, (2) the name of the host that stores the resource, and (3) the name of the resource itself, represented by a path.

A URL is a Uniform Resource Locator. A URL is a specific kind of URI that both identifies a resource and specifies how to locate it.

A URL is a string of characters used to describe information resources on the Internet. It is used in various WWW client and server programs, notably the famous Mosaic browser. URLs provide a unified format for describing various information resources, including files, server addresses and directories. A URL consists of three parts: (1) the protocol (or service mode), (2) the IP address (sometimes including the port number) of the host where the resource resides, and (3) the specific address of the resource on the host, such as the directory and file name.

A URN, or Uniform Resource Name, identifies a resource by name, for example mailto:[email protected].

URI is an abstract, high-level concept that defines a uniform way of identifying resources, while URLs and URNs are concrete ways of identifying resources. URLs and URNs are both URIs. Broadly speaking, every URL is a URI, but not every URI is a URL, because URIs also include another subclass, the Uniform Resource Name (URN), which names a resource but does not specify how to locate it. The mailto, news, and ISBN URIs are commonly given as examples of URNs.

In Java, an instance of the URI class can represent either an absolute or a relative reference, as long as it follows the syntax rules for URIs. The URL class, on the other hand, not only conforms to the syntax but also contains the information needed to locate the resource, so it cannot be relative. In the Java class library, the URI class does not contain any methods for accessing resources; its only function is parsing. In contrast, the URL class can open a stream to the resource.

The HTTP request

As can be seen from the figure above, an HTTP request consists of three parts: request line, message header and request body.

HTTP request line

The request line consists of the request method, the URL field and the HTTP version. In other words, the request line defines the request method, the address being requested, and the HTTP protocol version. For example:

GET /example.html HTTP/1.1 (CRLF)

HTTP request methods include:

GET: Requests the resource identified by the Request-URI
POST: Submits new data to be attached to the resource identified by the Request-URI
HEAD: Requests only the response message headers for the resource identified by the Request-URI
PUT: Asks the server to store or modify the resource identified by the Request-URI
DELETE: Asks the server to delete the resource identified by the Request-URI
TRACE: Asks the server to send back the request it received, mainly for testing or diagnosis
CONNECT: Reserved for proxies that can switch to acting as a tunnel (for example, for HTTPS traffic through a proxy)
OPTIONS: Queries the capabilities of the server, or the options and requirements associated with a resource

The HTTP request header

The message header consists of a series of key-value pairs that allow the client to send additional information to the server, or information about the client itself, including:

Header	Description	Example
Accept	Content types that the client can accept	Accept: text/plain, text/html
Accept-Charset	Character sets acceptable to the browser	Accept-Charset: iso-8859-5,utf-8
Accept-Encoding	Content compression encodings that the browser supports	Accept-Encoding: compress, gzip
Accept-Language	Languages acceptable to the browser	Accept-Language: en,zh
Accept-Ranges	Indicates that one or more sub-ranges of an entity can be requested (in practice this header is sent by servers in responses)	Accept-Ranges: bytes
Authorization	Credentials for HTTP authorization	Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Cache-Control	Caching directives that requests and responses follow	Cache-Control: no-cache
Connection	Indicates whether a persistent connection is required (HTTP/1.1 uses persistent connections by default)	Connection: close
Cookie	When an HTTP request is sent, all cookies stored for the requested domain are sent to the Web server	Cookie: $Version=1; Skin=new;
Content-Length	Length of the request body in bytes	Content-Length: 348
Content-Type	MIME type of the request body	Content-Type: application/x-www-form-urlencoded
Date	Date and time the request was sent	Date: Tue, 15 Nov 2010 08:12:31 GMT
Expect	Specific server behavior that the client requires	Expect: 100-continue
From	Email address of the user who made the request	From: [email protected]
Host	Domain name and port number of the requested server	Host: www.zcmhi.com
If-Match	The request is processed only if the entity matches the given ETag	If-Match: "737060cd8c284d8af7ad3082f209582d"
If-Modified-Since	If the requested resource has been modified after the specified time, the request succeeds; if it has not been modified, a 304 code is returned	If-Modified-Since: Sat, 29 Oct 2010 19:43:31 GMT
If-None-Match	If the content has not changed, a 304 code is returned. The value is an ETag previously sent by the server, which is compared with the current ETag to determine whether the content has changed	If-None-Match: "737060cd8c284d8af7ad3082f209582d"
If-Range	If the entity has not changed, the server sends only the part the client is missing; otherwise it sends the whole entity. The value is also an ETag	If-Range: "737060cd8c284d8af7ad3082f209582d"
If-Unmodified-Since	The request succeeds only if the entity has not been modified after the specified time	If-Unmodified-Since: Sat, 29 Oct 2010 19:43:31 GMT
Max-Forwards	Limits the number of times a message can be forwarded through proxies and gateways	Max-Forwards: 10
Pragma	Carries implementation-specific directives	Pragma: no-cache
Proxy-Authorization	Credentials for authenticating to a proxy	Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Range	Requests only part of the entity, specifying the byte range	Range: bytes=500-999
Referer	Address of the previous web page, from which the current request originated	Referer: www.zcmhi.com/archives/71…
TE	Transfer encodings the client is willing to accept; also tells the server that trailer fields are accepted	TE: trailers,deflate;q=0.5
Upgrade	Asks the server to switch to another protocol (if supported)	Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11
User-Agent	Information about the client software that sent the request	User-Agent: Mozilla/5.0 (Linux; X11)
Via	Lists the intermediate gateways or proxies and the protocols they used	Via: 1.0 fred, 1.1 nowhere.com (Apache/1.1)
Warning	Warning information about the message entity	Warning: 199 Miscellaneous warning

HTTP request body

A GET request normally carries no request body; the body is used when the request is sent with a method such as POST.
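
To make the three parts of a request concrete, here is a minimal sketch that splits a raw request into its request line, headers and body. The request text and the parse_request helper are made up for illustration; a real server would also need to handle malformed input, repeated headers and binary bodies.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, target, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")   # a blank line ends the headers
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)   # the request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body


# Hypothetical POST request used only for illustration.
RAW_REQUEST = (
    "POST /login HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 23\r\n"
    "\r\n"
    "name=alice&password=123"
)

if __name__ == "__main__":
    print(parse_request(RAW_REQUEST))
```

The Content-Length value tells the server how many bytes of body to read after the blank line.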

The HTTP response

Similar to HTTP requests, here is the first diagram:

The HTTP response also consists of three parts, including the status line, the message header, and the response body.

HTTP response status line

The status line also consists of three parts: the HTTP protocol version, the status code, and the text description of the status code. For example:

HTTP/1.1 200 OK (CRLF)

HTTP response status code

The status code consists of three digits. The first digit defines the category of the response and has five possible values:

1xx: Informational - the request has been received and processing continues
2xx: Success - the request has been successfully received, understood, and accepted
3xx: Redirection - further action must be taken to complete the request
4xx: Client error - the request has a syntax error or cannot be fulfilled
5xx: Server error - the server failed to fulfill a valid request
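
Because the category is determined by the first digit alone, it can be computed with integer division. A small sketch (STATUS_CLASSES and status_class are illustrative names, not part of any library):

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Return the category of a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]   # the first digit selects the category


if __name__ == "__main__":
    for code in (200, 304, 404, 503):
        print(code, status_class(code))
```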

Common status codes, with their status descriptions and explanations:

200: OK - the client request succeeded
400: Bad Request - the client request has a syntax error and cannot be understood by the server
401: Unauthorized - the request is not authorized; this status code must be used together with the WWW-Authenticate header field
403: Forbidden - the server received the request but refuses to serve it
404: Not Found - the requested resource does not exist, e.g. an incorrect URL was entered
500: Internal Server Error - an unexpected error occurred on the server
503: Service Unavailable - the server cannot process the client's request at the moment, but may recover after a period of time

HTTP response status code description

Status code	Reason phrase	Description
100	Continue	Continue. The client should continue with its request
101	Switching Protocols	Switching protocols. The server switches protocols at the client's request. It can only switch to a more advanced protocol, for example a newer version of HTTP
200	OK	The request succeeded. Typically used for GET and POST requests
201	Created	Created. The request succeeded and a new resource was created
202	Accepted	Accepted. The request has been accepted, but processing is not complete
203	Non-Authoritative Information	Non-authoritative information. The request succeeded, but the returned meta information comes from a copy rather than from the origin server
204	No Content	No content. The server processed the request successfully but returned no content. This lets the browser keep displaying the current document without updating the page
205	Reset Content	Reset content. The server processed the request successfully, and the user agent (for example, a browser) should reset the document view. This code can be used to clear a form in the browser
206	Partial Content	Partial content. The server successfully processed a partial (range) GET request
300	Multiple Choices	Multiple choices. The requested resource is available at multiple locations, and a list of resource characteristics and addresses can be returned for the user agent (for example, a browser) to choose from
301	Moved Permanently	Moved permanently. The requested resource has been permanently moved to a new URI. The response includes the new URI, and the browser automatically redirects to it. Future requests should use the new URI
302	Found	Temporarily moved. Similar to 301, but the resource is moved only temporarily. The client should continue to use the original URI
303	See Other	See another address. Similar to 301. The client should retrieve the other URI with a GET request
304	Not Modified	Not modified. The requested resource has not been modified, so the server returns no resource body with this status code. Clients typically cache resources they have accessed and send a header indicating that they only want the resource if it has been modified after a specified date
305	Use Proxy	Use a proxy. The requested resource must be accessed through a proxy
306	Unused	A status code that is no longer used
307	Temporary Redirect	Temporary redirect. Similar to 302, but the request method must not change when the request is repeated at the new URI
400	Bad Request	The client request has a syntax error and the server cannot understand it
401	Unauthorized	The request requires user authentication
402	Payment Required	Reserved for future use
403	Forbidden	The server understood the request but refuses to fulfill it
404	Not Found	The server could not find the resource (web page) requested by the client. A web designer can set up a custom page for this code, such as “the resource you requested could not be found”
405	Method Not Allowed	The method in the client request is not allowed
406	Not Acceptable	The server cannot produce a response that matches the content characteristics the client asked for
407	Proxy Authentication Required	The request requires authentication with the proxy. Similar to 401, but the requester must authorize itself with the proxy
408	Request Time-out	The server timed out after waiting too long for the client to send its request
409	Conflict	A conflict occurred when the server processed the request. The server may return this code when handling a PUT request from the client
410	Gone	The resource requested by the client no longer exists. 410 differs from 404 in that the resource used to exist and has now been permanently removed. If the resource has moved, the site designer can specify its new location with a 301 code instead
411	Length Required	The server refuses to process the request because the client did not send a Content-Length header
412	Precondition Failed	A precondition given in the client's request failed
413	Request Entity Too Large	The request was rejected because the request entity is too large for the server to process. The server may close the connection to prevent the client from continuing the request. If the server is only temporarily unable to process it, the response includes a Retry-After header
414	Request-URI Too Large	The request URI (usually a URL) is too long for the server to process
415	Unsupported Media Type	The server cannot process the media format attached to the request
416	Requested range not satisfiable	The range requested by the client is invalid
417	Expectation Failed	The server cannot satisfy the Expect request header
500	Internal Server Error	The server encountered an internal error and could not complete the request
501	Not Implemented	The server does not support the requested functionality and cannot complete the request
502	Bad Gateway	A server acting as a gateway or proxy received an invalid response from the upstream server
503	Service Unavailable	The server is temporarily unable to process client requests due to overload or system maintenance. The length of the delay can be given in the server's Retry-After header
504	Gateway Time-out	The server acting as a gateway or proxy did not receive a response from the upstream server in time
505	HTTP Version not supported	The server does not support the HTTP version used in the request and cannot complete the processing

HTTP response packet

HTTP and HTTPS

Shortcomings of HTTP

Communication uses clear text (no encryption), so the content can be eavesdropped on
The identity of the communicating party is not verified, so impersonation is possible
The integrity of a message cannot be proven, so it may have been tampered with

Introducing HTTPS

HTTP has no encryption mechanism, but it can be combined with Secure Sockets Layer (SSL) or Transport Layer Security (TLS) to encrypt HTTP traffic. This is communication encryption, that is, the entire communication line is encrypted.

HTTP + Encryption + Authentication + Integrity Protection = HTTP Secure (HTTPS)

HTTPS uses a hybrid encryption mechanism that combines shared key (symmetric) encryption and public key (asymmetric) encryption. If keys can be exchanged securely, one might consider communicating with public key encryption alone. However, public key encryption is slower than shared key encryption.

Therefore, the two methods are combined to take advantage of the strengths of each: public key encryption is used in the key exchange stage, and shared key encryption is used afterwards, once the connection is established and messages are being exchanged.
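
The hybrid idea can be sketched in a few lines. This is a toy and must not be used for real security: it assumes textbook RSA with tiny primes as the public key step and a SHA-256-based XOR stream as the shared key step, both chosen only so the example runs with the standard library. All names (browser_send, server_receive, and so on) are hypothetical.

```python
import hashlib
import secrets

# Textbook RSA with tiny primes: for illustration only, never for real use.
P, Q, E = 61, 53, 17
N = P * Q                            # public modulus, part of the public key
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def keystream(key, length):
    """Derive `length` pseudo-random bytes from an integer key with SHA-256."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return out[:length]


def symmetric_crypt(key, data):
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def browser_send(message):
    """Browser side: pick a shared key and wrap it with the server's public key."""
    shared_key = secrets.randbelow(N - 2) + 2     # random key in [2, N-1]
    wrapped_key = pow(shared_key, E, N)           # slow public key step, done once
    ciphertext = symmetric_crypt(shared_key, message)   # fast shared key step
    return wrapped_key, ciphertext


def server_receive(wrapped_key, ciphertext):
    """Server side: unwrap the shared key with the private key, then decrypt."""
    shared_key = pow(wrapped_key, D, N)
    return symmetric_crypt(shared_key, ciphertext)


if __name__ == "__main__":
    wrapped, secret = browser_send(b"hello over https")
    print(server_receive(wrapped, secret))
```

Only the short shared key goes through the expensive public key operation; the message itself, however long, is handled by the cheaper symmetric step.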

The HTTPS handshake process is described as follows:

1. The browser sends the set of encryption rules it supports to the site.

The server learns which encryption rules the browser supports

2. The site selects an encryption algorithm and a HASH algorithm from that set and sends its identity back to the browser in the form of a certificate. The certificate contains information such as the website address, the public key used for encryption, and the certificate authority.
```
The browser gets the server's public key
```
3. After obtaining the site's certificate, the browser does the following:

(a) Verifies the validity of the certificate (whether the issuing authority is legitimate, whether the website address in the certificate matches the address being accessed, etc.). If the certificate is trusted, a small lock is displayed in the browser's address bar; otherwise the browser warns that the certificate is not trusted.

(b) If the certificate is trusted, or if the user accepts an untrusted certificate, the browser generates a random password (the key for subsequent communication) and encrypts it with the public key provided in the certificate (public key encryption).

(c) Uses the agreed HASH to compute a digest of the handshake message, encrypts the message with the generated random password, and finally sends all of the information generated so far to the website.
```
The browser verifies the certificate -> encrypts the random password with the server's public key -> sends the communication key to the server
```
4. After the website receives the data from the browser, it does the following:

(a) Uses its own private key to decrypt the information and recover the password, uses the password to decrypt the handshake message sent by the browser, and verifies that the HASH matches the one sent by the browser.

(b) Encrypts a handshake message with the password and sends it to the browser.
```
The server decrypts the random password with its own private key -> decrypts the handshake message with the password (shared key communication) -> verifies that the HASH matches the browser's (verifying the browser)
```
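
In application code the handshake is normally delegated to a TLS library instead of being written by hand. As a rough illustration of the certificate checks in the browser's verification step, the sketch below inspects the defaults of Python's standard ssl module; describe_verification is a hypothetical helper, and nothing is sent over the network here.

```python
import ssl

# A client-side TLS context with Python's default settings.
context = ssl.create_default_context()


def describe_verification(ctx):
    """Report which certificate checks the context will enforce."""
    return {
        # The server must present a certificate that chains to a trusted CA.
        "certificate_required": ctx.verify_mode == ssl.CERT_REQUIRED,
        # The address in the certificate must match the host being accessed.
        "hostname_checked": ctx.check_hostname,
    }


if __name__ == "__main__":
    print(describe_verification(context))
```

Passing a connected socket to context.wrap_socket(sock, server_hostname=...) would then run the actual handshake and raise an error if either check fails.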

Shortcomings of HTTPS

Encryption and decryption add processing overhead, which can slow down access
Obtaining a certificate may require paying a certificate authority
Every request made by the page must use HTTPS

Features and differences of HTTP/1.0, HTTP/1.1, and HTTP/2

Whenever an interview turns to HTTP, this is usually the first thing the interviewer asks about.

HTTP/1.0 features

Stateless: the server does not track the state of requests
Connectionless: the browser establishes a new TCP connection for each request

Stateless

To work around statelessness, the cookie/session mechanism can be used for identity authentication and state tracking
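
A minimal sketch of the cookie/session idea, using Python's standard http.cookies module: the server hands out a session id in a cookie and later looks the user up from the Cookie header the browser sends back. The in-memory SESSIONS store and the login and current_user helpers are assumptions made for illustration; real applications use random session ids, expiry, and persistent storage.

```python
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> user data.
SESSIONS = {}


def login(user, session_id):
    """Server side: remember the user and return a Set-Cookie header value."""
    SESSIONS[session_id] = {"user": user}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True   # hide the cookie from page scripts
    return cookie["session_id"].OutputString()


def current_user(cookie_header):
    """Server side: recover the user from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "session_id" not in cookie:
        return None
    session = SESSIONS.get(cookie["session_id"].value)
    return session["user"] if session else None


if __name__ == "__main__":
    print(login("alice", "abc123"))
    print(current_user("session_id=abc123; theme=dark"))
```

The protocol itself stays stateless: the state lives on the server, and each request simply carries the key needed to find it.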

Connectionless

Being connectionless causes two performance problems

Connections cannot be reused

Each request needs its own TCP connection to be set up and torn down (the three-way handshake and the four-way teardown), which makes network utilization very low

Head-of-line blocking

HTTP/1.0 states that the next request cannot be sent until the response to the previous request arrives. If the previous request is blocked, the requests behind it are blocked as well. This is called head-of-line blocking

HTTP/1.1 features

To address the performance shortcomings of HTTP/1.0, HTTP/1.1 introduced the following:

Persistent connections: the Connection header field is added, and setting it to keep-alive keeps the connection open
Pipelining: building on persistent connections, pipelining lets the client send subsequent requests without waiting for the first response, but responses are still returned in the order of the requests. That is, multiple requests can be sent at once, but the responses are processed sequentially.
Cache handling: the Cache-Control header field is added
Resumable transfer (range requests)

Persistent connections

HTTP/1.1 keeps connections open by default. After data is transferred, the TCP connection stays open and further data is transferred over the same channel

Pipelining

With a persistent connection only:

The TCP connection is not closed, and the same channel is reused

request1 -> response1 -> request2 -> response2 -> request3 -> response3

Pipelined requests and responses:

request1 -> request2 -> request3 -> response1 -> response2 -> response3

Even if the server has response 2 ready first, the responses are still returned in request order, starting with response 1

Although pipelining allows multiple requests to be sent at once, the responses are still returned sequentially, so it does not solve head-of-line blocking.
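
The effect of in-order responses can be shown with a small calculation. This is a simplified model, not a network simulation: it assumes all requests arrive at time 0 and are processed in parallel, and the function names are made up for the example.

```python
def pipelined_delivery(processing_times):
    """Delivery time of each response when responses must keep request order.

    Response i cannot be sent before responses 0..i-1, so a slow early
    request delays every response behind it.
    """
    delivered = []
    latest = 0.0
    for ready_at in processing_times:
        latest = max(latest, ready_at)   # wait for every earlier response
        delivered.append(latest)
    return delivered


def unordered_delivery(processing_times):
    """Delivery times if each response could be sent as soon as it is ready."""
    return list(processing_times)


if __name__ == "__main__":
    times = [3.0, 1.0, 1.0]              # the first request is the slow one
    print(pipelined_delivery(times))
    print(unordered_delivery(times))
```

With a slow first request, the two fast responses are held back until the slow one has been sent, which is exactly the head-of-line blocking that pipelining leaves unsolved.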

Cache handling

When a browser needs a resource, it first checks whether it has a cached copy. If it does, the browser uses the cached copy directly and does not send a request. If it does not, the browser sends a request

This behavior is controlled through the Cache-Control header field

Resumable transfer

When uploading or downloading a resource, the resource can be divided into multiple parts that are uploaded or downloaded separately. If a network fault occurs, the transfer can continue from the point already reached instead of starting over, which improves efficiency

This is implemented with two header fields: Range, which the client sends in the request, and Content-Range, which the server returns in the response
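
As a sketch of how the two headers fit together, the helpers below generate the Range values a client could send for each chunk and parse the Content-Range value a server sends back. The helper names and the chunk sizes are arbitrary choices for the example, and only the simple single-range "bytes first-last/total" form is handled.

```python
import re


def range_headers(total_size, chunk_size, start=0):
    """Yield a Range header value for each remaining chunk of a resource."""
    for first in range(start, total_size, chunk_size):
        last = min(first + chunk_size, total_size) - 1   # byte ranges are inclusive
        yield f"bytes={first}-{last}"


def parse_content_range(value):
    """Parse 'bytes first-last/total' into a (first, last, total) tuple."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value)
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    return tuple(int(group) for group in match.groups())


if __name__ == "__main__":
    # Resume a 2500-byte download after the first 1000 bytes have arrived.
    print(list(range_headers(2500, 1000, start=1000)))
    print(parse_content_range("bytes 1000-1999/2500"))
```

A successful partial response uses status code 206 (Partial Content).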

HTTP/2 features

Binary framing
Multiplexing: requests and responses are sent concurrently over a shared TCP connection
Header compression
Server push: the server can push additional resources to the client without an explicit request from the client

Binary framing

All transmitted information is divided into smaller messages and frames, which are encoded in binary format

Multiplexing

Building on binary framing, all requests to the same domain name go through a single TCP connection. HTTP messages are broken up into independent frames that can be sent out of order, and the receiving end reassembles the messages based on the identifiers in the frame headers
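
The framing and reassembly idea can be modelled in a few lines. This is a deliberately simplified sketch: a frame here is just a (stream id, payload) tuple, whereas real HTTP/2 frames have a binary header with length, type and flags, and the function names are invented for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def to_frames(stream_id, message, frame_size):
    """Split one message into (stream_id, payload) frames."""
    return [
        (stream_id, message[i:i + frame_size])
        for i in range(0, len(message), frame_size)
    ]


def interleave(*frame_lists):
    """Mix frames from several streams onto one connection, round-robin."""
    mixed = []
    for group in zip_longest(*frame_lists):
        mixed.extend(frame for frame in group if frame is not None)
    return mixed


def reassemble(frames):
    """Rebuild each message from its frames using the stream identifier."""
    messages = defaultdict(bytes)
    for stream_id, payload in frames:
        messages[stream_id] += payload
    return dict(messages)


if __name__ == "__main__":
    wire = interleave(
        to_frames(1, b"GET /index.html", 4),
        to_frames(3, b"GET /style.css", 4),
    )
    print(wire)
    print(reassemble(wire))
```

Frames from different streams may be mixed freely on the connection, but frames belonging to the same stream keep their order, which is what makes reassembly by stream id possible.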

The differences

The main difference between HTTP/1.0 and HTTP/1.1 is the move from short-lived connections to persistent connections
The main difference between HTTP/2 and HTTP/1.x is multiplexing

Interview questions

Question 1: What happens after you enter a URL in the browser?

1. The client connects to the Web server

An HTTP client, typically a browser, establishes a TCP socket connection to the HTTP port of the Web server (80 by default, or 443 for HTTPS), for example that of www.baidu.com.

2. Send an HTTP request

Through the TCP socket, the client sends a text request packet to the Web server. A request packet consists of the request line, the request header, the blank line, and the request data.

3. The server accepts the request and returns an HTTP response

The Web server parses the request and locates the requested resource. The server writes the resource copy to the TCP socket, which is read by the client. A response consists of a status line, a response header, a blank line, and response data.

4. Release the TCP connection

If the Connection mode is close, the server actively closes the TCP connection, and the client passively closes it, releasing the TCP connection. If the Connection mode is keep-alive, the connection is kept open for a period of time, during which further requests can be received.

5. The client browser parses the HTML content

The client browser first parses the status line to see the status code, which indicates whether the request succeeded. It then parses each response header; the response headers indicate, among other things, the length in bytes of the HTML document that follows and its character set. The client browser reads the HTML in the response data, formats it according to HTML syntax, and displays it in the browser window.

For example, when you enter a URL in the browser address bar and press Enter, the following happens:

1. The browser asks the DNS server to resolve the domain name in the URL to an IP address.

2. Once the IP address is resolved, the browser establishes a TCP connection with the server using that IP address and the default port (80 for HTTP, 443 for HTTPS).

3. The browser sends an HTTP request to read the file (the file named after the domain name in the URL). The request may be sent to the server as data carried with the third packet of the TCP three-way handshake.

4. The server responds to the browser's request and sends the corresponding HTML text to the browser.

5. The TCP connection is released.

6. The browser displays the HTML text.
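
The connect, request, response, and close steps can be tried end to end on one machine. The sketch below is an assumption-laden stand-in for the real flow: it starts a tiny local server from Python's standard library on 127.0.0.1 instead of contacting a real website, so the DNS step is skipped, and HelloHandler, fetch and demo are names invented for the example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy Web server: answers every GET with a small HTML document."""

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(host, port, path="/"):
    """Client side: connect, send a request, read the response, close."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    # Establish the TCP connection (the three-way handshake happens here).
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))   # send the HTTP request
        chunks = []
        while True:                             # read until the server closes
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    raw = b"".join(chunks)
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    return status_line, body


def demo():
    """Run the local server on a free port and fetch one page from it."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A browser would go on to parse the returned HTML and render it; here the body is simply returned as bytes.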

Question 2: since browser rendering came up, let's talk about the principle and process by which browsers render web pages

The principle

To understand how browsers render a page, you mainly need to understand the critical rendering path

The critical rendering path is the entire process in which the browser receives the requested HTML, CSS, JavaScript and other resources, then parses them, builds trees, computes layout, paints, and finally presents the interface to the user

Take a look at the WebKit flow:

To summarize the process:

The browser parses the retrieved HTML document into a DOM tree
The CSS is processed to build the CSS object model, CSSOM
The DOM and CSSOM are combined into a render tree, which represents the set of objects to be rendered
The browser computes the position and size of every element in the render tree; this step is called layout. The browser uses a flow-based approach, which lets it lay out all elements in a single pass
The nodes of the render tree are drawn onto the screen, a step called painting
The content is displayed on the web page

In practice, you will be asked much more than the summary above covers, because every interviewer asks different questions, so it is best to prepare thoroughly for the interview.

Here are a few more questions. If you are interested, you can work out the answers yourself:

What is the difference between HTTP and HTTPS? (Covered in this article)
Why is HTTPS secure? (Covered in this article)
Do you understand how symmetric and asymmetric encryption algorithms work? (Look this up yourself)
Describe the HTTPS handshake process.
Man-in-the-middle attacks on HTTPS

HTTP is a very large area of knowledge. If you want to learn more, I recommend the book Illustrated HTTP. Keep at it, young friend!