HTTP Protocol Overview

HTTP is short for Hypertext Transfer Protocol. It is used to transfer hypertext from a World Wide Web server to the local browser.

HTTP is a TCP/IP-based communication protocol for transferring data (HTML files, image files, query results, etc.).

HTTP is an object-oriented protocol belonging to the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been steadily improved and extended through years of use and development. HTTP/1.0 was followed by the standardized HTTP/1.1 and later by HTTP/2.0; along the way, a proposal called HTTP-NG (Next Generation of HTTP) was also put forward.

HTTP works on a client-server architecture. Acting as the HTTP client, the browser sends requests to the HTTP server (the Web server) by URL. The Web server then sends a response back to the client based on the request it received.

Five features of HTTP

  1. Client/server mode is supported.

  2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. The commonly used request methods are GET, HEAD and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, HTTP server programs are small and communication is fast.

  3. Flexibility: HTTP allows the transfer of any type of data object. The type being transferred is indicated by the Content-Type header.

  4. Connectionless: connectionless means that each connection handles only one request. The server closes the connection after it has processed the client's request and the client has acknowledged the response. This saves transmission time. The early motivation was to use fewer resources and respond faster. Later, Connection: keep-alive was introduced to implement long-lived (persistent) connections.

  5. Stateless: HTTP is a stateless protocol. Stateless means that the protocol keeps no memory of previous transactions. If earlier information is needed for later processing, it must be retransmitted, which can increase the amount of data transferred per connection. On the other hand, the server responds faster when it doesn’t need the previous information.

URLs in HTTP

HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

A URL is an Internet address used to identify a resource. The following URL is used as an example to describe the components of a common URL:

www.xxx.com:8080/news/1.html…

As you can see from the above URL, a complete URL consists of the following parts:

  1. Protocol part: The protocol part of the URL is “http:”, which indicates that the web page uses HTTP. Many protocols can be used on the Internet, such as HTTP, FTP, and so on. In this example, HTTP is used. The “//” after “http:” is the delimiter.
  2. Domain name: The domain name of the URL is www.aspxfans.com. In a URL, an IP address can also be used in place of a domain name.
  3. Port: The domain name is followed by the port. The domain name and port are separated by a colon (:). The port is not a required part of a URL, and the default port is used if it is omitted.
  4. Virtual directory part: The virtual directory part begins with the first slash after the domain name and ends with the last slash. The virtual directory is also not a required part of a URL. The virtual directory in this case is “/news/”.
  5. File name: The file name runs from the last slash after the domain name up to the “?”. If there is no “?”, it runs from the last slash up to the “#”. If there is neither “?” nor “#”, it runs from the last slash to the end of the URL. In this case, the file name is index.asp. The file name is also not a required part of a URL, and if omitted, the default file name is used.
  6. Anchor part: From the “#” to the end is the anchor part. The anchor part in this case is “name”. The anchor part is also not a required part of a URL.
  7. Parameter part: The part between the “?” and the “#” is the parameter part, also known as the search part or query part. The parameter part in this example is “boardID=5&ID=24618&page=1”. A URL can carry multiple parameters, separated by ampersands (&).
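The component breakdown above can be checked with a short Python sketch using the standard `urllib.parse` module. The full URL below is an assumption: it is assembled from the components named in the list (domain, port, virtual directory, file name, parameters, anchor), not copied from the example line.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical full URL, assembled from the components named in the list above.
url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"

parts = urlsplit(url)

# Split the path at its last slash into virtual directory and file name.
directory, _, filename = parts.path.rpartition("/")

components = {
    "protocol": parts.scheme,
    "domain": parts.hostname,
    "port": parts.port,
    "virtual_directory": directory + "/",
    "file_name": filename,
    "parameters": parse_qs(parts.query),
    "anchor": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

Note that `parse_qs` returns a list of values per parameter name, because the same name may appear more than once in a query string.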

The difference between URLs and URIs

A Uniform Resource Identifier (URI) uniquely identifies a resource.

Each resource available on the Web, such as an HTML document, image, video clip, or program, is identified by a URI. A URI generally consists of three parts: (1) the naming mechanism for accessing the resource, (2) the name of the host storing the resource, and (3) the name of the resource itself, represented by its path.

A URL is a Uniform Resource Locator. A URL is a specific kind of URI that both identifies a resource and specifies how to locate it.

A URL is a string of characters used to describe an information resource on the Internet. It is used in various WWW client and server programs, notably the famous Mosaic browser. URLs describe all kinds of information resources, including files, server addresses and directories, in one uniform format. A URL consists of three parts: (1) the protocol (or service mode), (2) the IP address of the host where the resource resides (sometimes including a port number), and (3) the specific address of the resource on the host, such as a directory and file name.

A URN, or Uniform Resource Name, identifies a resource by name, for example mailto:[email protected].

URIs are an abstract, high-level concept that defines uniform resource identification, while URLs and URNs are concrete ways of identifying resources. URLs and URNs are both URIs. Broadly speaking, every URL is a URI, but not every URI is a URL. This is because URIs also include another subclass, the Uniform Resource Name (URN), which names a resource but does not specify how to locate it. The mailto, news, and ISBN URIs are examples of URNs.

In Java, an instance of the URI class can represent either an absolute or a relative identifier, as long as it follows the URI syntax rules. The URL class, on the other hand, not only conforms to the syntax but also contains the information needed to locate the resource, so it cannot be relative. In the Java class library, the URI class has no methods for accessing resources; its only function is parsing. In contrast, the URL class can open a stream to the resource.

The HTTP request

As can be seen from the figure above, an HTTP request consists of three parts: request line, message header and request body.

HTTP request line

The request line consists of the request method, the URL field and the HTTP version. In other words, the request line defines the method, the address and the HTTP protocol version of the request. For example:

GET /example.html HTTP/1.1 (CRLF)
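As a minimal sketch of the three-part structure just described, the following Python function splits a request line into its method, URL and version. The function name `parse_request_line` and the path `/example.html` are illustrative assumptions, and real servers apply stricter validation than this.

```python
def parse_request_line(line: str) -> dict:
    """Split an HTTP request line into method, URL, and version."""
    # The three fields are separated by single spaces and the line ends with CRLF.
    method, url, version = line.rstrip("\r\n").split(" ")
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP version: {version!r}")
    return {"method": method, "url": url, "version": version}


request = parse_request_line("GET /example.html HTTP/1.1\r\n")
print(request)
```

A line that does not contain exactly three space-separated fields fails at the unpacking step with a `ValueError`, which is a reasonable outcome for a malformed request line.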

HTTP protocol methods include:

  • GET: Requests the resource identified by the Request-URI
  • POST: Appends new data to the resource identified by the Request-URI
  • HEAD: Requests only the response message headers for the resource identified by the Request-URI
  • PUT: Asks the server to store or modify the resource identified by the Request-URI
  • DELETE: Asks the server to delete the resource identified by the Request-URI
  • TRACE: Asks the server to send back the request it received, mainly for testing or diagnosis
  • CONNECT: Reserved for use with proxies that can switch to acting as a tunnel
  • OPTIONS: Queries the capabilities of the server, or the options and requirements associated with a resource

The HTTP request header

The message header consists of a series of key-value pairs that allow the client to send additional information to the server, or information about the client itself, including:

| Header | Description | Example |
| --- | --- | --- |
| Accept | Content types the client can accept | Accept: text/plain, text/html |
| Accept-Charset | Character sets acceptable to the browser | Accept-Charset: iso-8859-5,utf-8 |
| Accept-Encoding | Content compression encodings the browser supports | Accept-Encoding: compress, gzip |
| Accept-Language | Languages acceptable to the browser | Accept-Language: en,zh |
| Accept-Ranges | Range units in which one or more parts of an entity can be requested | Accept-Ranges: bytes |
| Authorization | Credentials for HTTP authentication | Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ== |
| Cache-Control | Caching mechanism that the request and response should follow | Cache-Control: no-cache |
| Connection | Whether a persistent connection is required (HTTP/1.1 uses persistent connections by default) | Connection: close |
| Cookie | When a request is sent, all cookie values stored for the request's domain are sent to the Web server | Cookie: $Version=1; Skin=new; |
| Content-Length | Length of the request content | Content-Length: 348 |
| Content-Type | MIME type of the entity in the request | Content-Type: application/x-www-form-urlencoded |
| Date | Date and time the request was sent | Date: Tue, 15 Nov 2010 08:12:31 GMT |
| Expect | Specific server behavior that the client requires | Expect: 100-continue |
| From | Email address of the user who made the request | From: [email protected] |
| Host | Domain name and port number of the requested server | Host: www.zcmhi.com |
| If-Match | The request takes effect only if the entity matches the given ETag | If-Match: "737060cd8c284d8af7ad3082f209582d" |
| If-Modified-Since | If the requested resource was modified after the given time, the request succeeds; otherwise a 304 code is returned | If-Modified-Since: Sat, 29 Oct 2010 19:43:31 GMT |
| If-None-Match | If the content has not changed, a 304 code is returned. The parameter is the ETag previously sent by the server, which is compared with the server's current ETag to determine whether the content changed | If-None-Match: "737060cd8c284d8af7ad3082f209582d" |
| If-Range | If the entity has not changed, the server sends only the part the client is missing; otherwise it sends the whole entity. The parameter is also an ETag | If-Range: "737060cd8c284d8af7ad3082f209582d" |
| If-Unmodified-Since | The request succeeds only if the entity has not been modified after the given time | If-Unmodified-Since: Sat, 29 Oct 2010 19:43:31 GMT |
| Max-Forwards | Limits the number of times the message can be forwarded through proxies and gateways | Max-Forwards: 10 |
| Pragma | Carries implementation-specific directives | Pragma: no-cache |
| Proxy-Authorization | Credentials for authenticating with a proxy | Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ== |
| Range | Requests only part of an entity, specifying the range | Range: bytes=500-999 |
| Referer | Address of the previous page, from which the current request originated | Referer: www.zcmhi.com/archives/71… |
| TE | Transfer encodings the client is willing to accept; "trailers" tells the server that trailer fields are accepted | TE: trailers,deflate;q=0.5 |
| Upgrade | Asks the server to switch to another transport protocol (if supported) | Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11 |
| User-Agent | Information about the client that sent the request | User-Agent: Mozilla/5.0 (Linux; X11) |
| Via | Addresses and communication protocols of the intermediate gateways or proxies | Via: 1.0 fred, 1.1 nowhere.com (Apache/1.1) |
| Warning | Warning information about the message entity | Warning: 199 Miscellaneous warning |

HTTP Request body

A GET request normally has no request body; the request body is used when sending a request such as POST.
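To show how the request line, headers and body fit together, here is a small Python sketch that serializes a request into the bytes sent over the wire. The helper `build_request`, the host `www.example.com`, the `/login` path and the form fields are all illustrative assumptions; the layout (request line, header lines, a blank line, then an optional body) follows the structure described above.

```python
def build_request(method, path, headers, body=b""):
    """Serialize a request: request line, headers, blank line, optional body."""
    headers = dict(headers)
    if body:
        # The body length is announced so the server knows where the body ends.
        headers["Content-Length"] = str(len(body))
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"   # blank line ends the header section
    return head.encode("ascii") + body


# A GET request: nothing follows the blank line.
get_msg = build_request("GET", "/example.html", {"Host": "www.example.com"})

# A POST request: the form data travels in the request body.
post_msg = build_request(
    "POST",
    "/login",
    {"Host": "www.example.com", "Content-Type": "application/x-www-form-urlencoded"},
    body=b"user=alice&page=1",
)

print(get_msg.decode("ascii"))
print(post_msg.decode("ascii"))
```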

The HTTP response

Similar to HTTP requests, here is the first diagram:

The HTTP response also consists of three parts, including the status line, the message header, and the response body.

HTTP response status line

The status line also consists of three parts: the HTTP protocol version, the status code, and the text description of the status code. For example:

HTTP/1.1 200 OK (CRLF)

HTTP response status code

The status code consists of three digits. The first digit defines the category of the response and has five possible values:

  • 1xx: Informational - the request has been received and processing continues
  • 2xx: Success - the request was successfully received, understood, and accepted
  • 3xx: Redirection - further action must be taken to complete the request
  • 4xx: Client error - the request contains a syntax error or cannot be fulfilled
  • 5xx: Server error - the server failed to fulfill a valid request
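The rule that the first digit determines the category can be expressed in a few lines of Python. This is only a sketch; the names `STATUS_CLASSES` and `status_class` are made up for illustration.

```python
# Category names keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the response category given by the first digit of the code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (100, 200, 301, 404, 503):
    print(code, status_class(code))
```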

Common status codes, with their status text and meaning:

  • 200: OK - The client request succeeded
  • 400: Bad Request - The client request has syntax errors and cannot be understood by the server
  • 401: Unauthorized - The request is not authorized; this status code must be used together with the WWW-Authenticate header field
  • 403: Forbidden - The server received the request but refuses to provide the service
  • 404: Not Found - The requested resource does not exist, e.g. an incorrect URL was entered
  • 500: Internal Server Error - An unexpected error occurred on the server
  • 503: Service Unavailable - The server cannot process client requests at the moment, but may recover after a period of time

HTTP response status code description

| Status code | Name | Description |
| --- | --- | --- |
| 100 | Continue | Continue. The client should continue with its request |
| 101 | Switching Protocols | Switch protocols. The server switches protocols at the client's request. It can only switch to a more advanced protocol, for example a newer version of HTTP |
| 200 | OK | The request succeeded. Typically used for GET and POST requests |
| 201 | Created | Created. The request succeeded and a new resource was created |
| 202 | Accepted | Accepted. The request has been accepted, but processing is not complete |
| 203 | Non-Authoritative Information | Non-authoritative information. The request succeeded, but the returned meta-information comes from a copy rather than from the origin server |
| 204 | No Content | No content. The server processed the request successfully but returned no content. This lets the browser keep displaying the current document without updating the page |
| 205 | Reset Content | Reset content. The server processed the request successfully, and the user agent (for example, a browser) should reset the document view. This code can be used to clear the browser's form fields |
| 206 | Partial Content | Partial content. The server successfully processed a partial GET request |
| 300 | Multiple Choices | Multiple choices. The requested resource is available at multiple locations, and a list of resource characteristics and addresses can be returned for the user agent (for example, a browser) to choose from |
| 301 | Moved Permanently | Permanently moved. The requested resource has been permanently moved to a new URI, the response includes the new URI, and the browser automatically redirects to it. Future requests should use the new URI |
| 302 | Found | Temporarily moved. Similar to 301, but the resource is moved only temporarily. The client should continue to use the original URI |
| 303 | See Other | See another address. The response to the request can be found at another URI, which should be retrieved with GET |
| 304 | Not Modified | Not modified. The requested resource has not been modified, and the server returns no resource body with this status code. Clients usually cache resources they have accessed and send a header indicating that they only want the resource if it has been modified after a given date |
| 305 | Use Proxy | Use a proxy. The requested resource must be accessed through a proxy |
| 306 | Unused | A status code that is no longer used |
| 307 | Temporary Redirect | Temporary redirect. Similar to 302, but the client must repeat the request with the same method |
| 400 | Bad Request | The client request has a syntax error and the server cannot understand it |
| 401 | Unauthorized | The request requires user authentication |
| 402 | Payment Required | Reserved for future use |
| 403 | Forbidden | The server understands the client's request but refuses to execute it |
| 404 | Not Found | The server could not find the resource (web page) the client requested. A web designer can use this code to serve a custom page saying “the resource you requested could not be found” |
| 405 | Method Not Allowed | The method in the client request is not allowed |
| 406 | Not Acceptable | The server cannot produce a response matching the content characteristics the client asked for |
| 407 | Proxy Authentication Required | The request requires authentication with the proxy. Similar to 401, but the requester must authorize itself with the proxy |
| 408 | Request Time-out | The server waited too long for the client to send the request and timed out |
| 409 | Conflict | The server may return this code when completing a PUT request from the client. A conflict occurred while the server processed the request |
| 410 | Gone | The resource requested by the client no longer exists. Unlike 404, 410 means the resource existed before and has been permanently removed; the site designer can use a 301 code to point to a new location for the resource |
| 411 | Length Required | The server will not process the request because the client did not send Content-Length |
| 412 | Precondition Failed | A precondition in the client's request was not met |
| 413 | Request Entity Too Large | The request was rejected because the request entity is too large for the server to process. To stop the client from continuing to send, the server may close the connection. If the server is only temporarily unable to process it, the response includes a Retry-After header |
| 414 | Request-URI Too Large | The request URI (usually a URL) is too long for the server to process |
| 415 | Unsupported Media Type | The server cannot process the media format attached to the request |
| 416 | Requested range not satisfiable | The range requested by the client is invalid |
| 417 | Expectation Failed | The server cannot satisfy the Expect request header |
| 500 | Internal Server Error | The server had an internal error and could not complete the request |
| 501 | Not Implemented | The server does not support the requested functionality and cannot complete the request |
| 502 | Bad Gateway | A server acting as a gateway or proxy received an invalid response from the remote server |
| 503 | Service Unavailable | The server is temporarily unable to process client requests due to overload or system maintenance. The length of the delay can be given in the server's Retry-After header |
| 504 | Gateway Time-out | A server acting as a gateway or proxy did not get a response from the remote server in time |
| 505 | HTTP Version not supported | The server does not support the HTTP version of the request and cannot complete the processing |
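Most of these codes and their standard names do not have to be memorized or typed by hand. As one example, Python's standard library exposes them through `http.HTTPStatus`. The sketch below looks up a few codes and builds the corresponding status line; the helper name `status_line` is an illustrative assumption.

```python
from http import HTTPStatus


def status_line(code: int) -> str:
    """Build an HTTP/1.1 status line from a numeric status code."""
    status = HTTPStatus(code)          # raises ValueError for unknown codes
    return f"HTTP/1.1 {status.value} {status.phrase}\r\n"


for code in (200, 304, 404, 503):
    print(status_line(code), end="")
```

The exact reason phrases in a given library can differ slightly from older tables, since later revisions of the HTTP specification renamed some of them.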

HTTP response packet

HTTP and HTTPS

Shortcomings of HTTP

  • Communication is in clear text (unencrypted), so the content can be eavesdropped on
  • The identity of the communicating party is not verified, so impersonation is possible
  • The integrity of a message cannot be verified, so it may have been tampered with

Introducing HTTPS

HTTP has no encryption mechanism of its own, but it can be combined with Secure Sockets Layer (SSL) or Transport Layer Security (TLS) to encrypt HTTP traffic. This is encryption of the communication channel: everything sent over the connection is encrypted.

HTTP + Encryption + Authentication + Integrity Protection = HTTP Secure (HTTPS)

HTTPS uses a hybrid encryption mechanism that combines shared-key (symmetric) encryption and public-key (asymmetric) encryption. Since public-key encryption allows keys to be exchanged securely, one might consider using it alone for all communication. However, public-key encryption is slower than shared-key encryption.

HTTPS therefore combines the two to exploit the strengths of each: public-key encryption is used during the key exchange stage, and shared-key encryption is used for the messages exchanged once communication is established.

The HTTPS handshake process is described as follows:

  1. The browser sends its own set of encryption rules to the site.

    The server receives the encryption rules supported by the browser
  2. The site selects a set of encryption and HASH algorithms and sends its identity back to the browser in the form of a certificate. The certificate contains information such as the website address, encrypted public key, and certificate authority.

    The browser gets the server's public key
  3. After obtaining a web certificate, the browser does the following:

    (a). Verify the validity of the certificate (whether the issuing authority is legitimate, whether the website address in the certificate matches the address being visited, and so on). If the certificate is trusted, a small lock is displayed in the browser's address bar; otherwise, a warning is shown that the certificate is not trusted.

    (b). If the certificate is trusted, or if the user accepts an untrusted certificate, the browser generates a random password (the key for subsequent communication) and encrypts it with the public key provided in the certificate (public-key encryption).

    (c). Use the agreed HASH algorithm to compute a hash of the handshake message, encrypt the message with the generated random password, and finally send all of the information generated so far to the website.

    Browser verifies the certificate -> encrypts the random password with the server's public key -> sends the communication key to the server
  4. After the web site receives data from the browser, it does the following:

    (a). Use its own private key to decrypt the information and recover the password. Use the password to decrypt the handshake message sent by the browser, and verify that its HASH matches the one sent by the browser.

    (b). Encrypt a handshake message with the password and send it to the browser.

    The server decrypts the random password with its own private key -> decrypts the handshake message with the password (shared-key communication) -> verifies that the HASH matches the browser's (verifies the browser)
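The certificate checks in step 3 are normally done by a TLS library rather than by application code. As a rough illustration (not the browser's actual implementation), Python's standard `ssl` module creates a client context whose defaults correspond to those checks: the server must present a certificate that chains to a trusted authority, and the name in the certificate must match the host being visited. No network connection is made in this sketch.

```python
import ssl

# Default client-side settings, comparable to what a browser-like client uses.
context = ssl.create_default_context()

# The server certificate must validate against the trusted certificate authorities.
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)

# The address in the certificate must match the address being visited.
print("hostname checked:", context.check_hostname)
```

Passing this context to `context.wrap_socket(sock, server_hostname=...)` would then run the handshake, with key exchange and the switch to shared-key encryption handled inside the library.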

Shortcomings of HTTPS

  • Encryption and decryption add processing overhead, which slows access
  • Encryption requires a certificate, which site owners pay a certificate authority to issue
  • Every request on the page has to use HTTPS

Features and differences of HTTP/1.0, HTTP/1.1, and HTTP/2.0

Whenever an interview touches on HTTP, this is usually where the interviewer starts.

HTTP/1.0 features

  • Stateless: the server does not track the state of requests
  • Connectionless: the browser establishes a new TCP connection for each request

Stateless

To work around statelessness, the cookie/session mechanism can be used for identity authentication and state tracking.
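A minimal sketch of the cookie/session idea, using Python's standard `http.cookies` module: the server hands out a session identifier in a Set-Cookie header, the client sends it back in a Cookie header on later requests, and the server uses it to look up state it stored earlier. The session id `abc123`, the user name and the in-memory `sessions` dictionary are hypothetical values for illustration only.

```python
from http.cookies import SimpleCookie

# Server side: issue a session identifier in a Set-Cookie response header.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"        # hypothetical session id
response_cookie["session_id"]["path"] = "/"
set_cookie_header = response_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Client side: the stored value is sent back in the Cookie request header.
cookie_header = "session_id=abc123"

# Server side again: parse the Cookie header and look up the stored state.
sessions = {"abc123": {"user": "alice"}}         # hypothetical in-memory session store
request_cookie = SimpleCookie()
request_cookie.load(cookie_header)
session = sessions.get(request_cookie["session_id"].value)
print(session)
```

The protocol itself stays stateless: every request still carries everything the server needs, and the state lives in the server's session store.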

Connectionless

Being connectionless causes two performance problems:

  1. Connections cannot be reused

    Every request requires a new TCP connection to be set up and torn down (the three-way handshake and the four-way close), which makes network utilization very low.

  2. Head-of-line blocking

    HTTP/1.0 requires that the next request not be sent until the response to the previous request has arrived. If the previous request is blocked, all subsequent requests are blocked too. This is called head-of-line blocking.

HTTP/1.1 features

To address the performance shortcomings of HTTP/1.0, HTTP/1.1 introduced the following:

  • Persistent connections: the Connection header field was added; setting it to keep-alive keeps the connection open
  • Pipelining: building on persistent connections, pipelining lets the client send subsequent requests without waiting for the first response, but responses are still returned in request order. That is, multiple requests can be sent at once, but the responses are processed sequentially.
  • Cache handling: the Cache-Control header field was added
  • Resumable transfers (breakpoint resume)

Persistent connections

HTTP/1.1 keeps connections open by default. After data is transferred, the TCP connection stays open and further data is sent over the same channel.

Pipelining

Request and response over a persistent connection (TCP stays connected and the same channel is used):

request1 -> response1 -> request2 -> response2 -> request3 -> response3

Pipelined request and response:

request1 -> request2 -> request3 -> response1 -> response2 -> response3

Even if the server has response 2 ready first, the responses are still returned in request order, starting with response 1.

Although pipelining allows multiple requests to be sent at once, the responses are still returned sequentially, so it does not solve head-of-line blocking.
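The blocking effect can be made concrete with a tiny simulation. Assuming the server has each response ready at some point in time, in-order delivery means a response cannot leave before every earlier response has left, so its delivery time is the running maximum of the ready times. The numbers and the function name `pipelined_delivery_times` are invented for illustration.

```python
from itertools import accumulate


def pipelined_delivery_times(ready_times):
    """Return when each response can be delivered if responses must leave
    in request order: response i waits for every earlier response."""
    return list(accumulate(ready_times, max))


# Hypothetical times (ms) at which the server has each response ready.
ready = [300, 50, 80]
delivered = pipelined_delivery_times(ready)
print(delivered)
```

Here responses 2 and 3 are ready after 50 ms and 80 ms, but both are held back until the slow first response is sent at 300 ms.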

Cache handling

When a browser requests a resource, it first checks whether it has a cached copy. If it does, the browser uses the cached copy directly and does not send another request. If it does not, the browser sends a request.

This behavior is controlled through the Cache-Control header field.
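As a simplified sketch of how a client might apply this field, the Python code below parses a Cache-Control value into its directives and decides whether a cached copy may be used without sending a new request. Real browser caching has many more rules (revalidation, heuristics, the Expires header and so on); the function names here are illustrative assumptions, and only `no-store`, `no-cache` and `max-age` are considered.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into {directive: argument or True}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        directives[name.lower()] = arg.strip('"') if sep else True
    return directives


def may_use_cache(directives: dict, age_seconds: int) -> bool:
    """Simplified freshness check: the cached copy is usable without a new
    request if caching is allowed and the copy is younger than max-age."""
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    return isinstance(max_age, str) and max_age.isdigit() and age_seconds < int(max_age)


directives = parse_cache_control("public, max-age=3600")
print(directives)
print(may_use_cache(directives, age_seconds=120))
```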

Resumable transfers (breakpoint resume)

When uploading or downloading a resource, the resource can be divided into multiple parts that are transferred separately. If a network fault occurs, the transfer can continue from the point it had already reached instead of starting over, which improves efficiency.

This is implemented with two header fields: Range, sent by the client in the request, and Content-Range, returned by the server in the response.
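The interplay of the two headers can be sketched as a server-side function that receives the Range value and answers with status 206 and a Content-Range header. This is a simplified illustration that handles only a single `bytes=start-end` range (an open end such as `bytes=2000-` is allowed); the function name `serve_range` and the sample data are assumptions.

```python
def serve_range(data: bytes, range_header: str):
    """Handle a single 'bytes=start-end' range; returns (status, headers, body).
    Suffix ranges ('bytes=-500') and multiple ranges are not handled."""
    unit, _, spec = range_header.partition("=")
    start_text, _, end_text = spec.partition("-")
    if unit != "bytes" or not start_text.isdigit():
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    start = int(start_text)
    # An omitted end means "until the end of the resource".
    end = int(end_text) if end_text.isdigit() else len(data) - 1
    end = min(end, len(data) - 1)
    if start > end:
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    body = data[start:end + 1]          # the range is inclusive on both ends
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(data)}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


resource = bytes(range(256)) * 8        # 2048 bytes of sample data
status, headers, body = serve_range(resource, "bytes=500-999")
print(status, headers["Content-Range"], len(body))
```

A client resuming an interrupted download would send `Range: bytes=N-`, where N is the number of bytes it already has, and append the returned body to its partial file.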

HTTP/2.0 features

  • Binary framing
  • Multiplexing: requests and responses are sent concurrently over a shared TCP connection
  • Header compression
  • Server push: the server can push additional resources to the client without an explicit request from the client

Binary framing

All transmitted information is divided into smaller messages and frames, which are encoded in binary format.

Multiplexing

Multiplexing builds on binary framing: all requests to the same domain name go through a single TCP connection. HTTP messages are broken up into independent frames that can be sent interleaved, and the receiving end reassembles the messages using the stream identifier in each frame header.
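The idea of splitting messages into frames, interleaving them on one connection and reassembling them by identifier can be illustrated with a toy model. This is not the real HTTP/2 wire format (real frames carry a binary header with length, type, flags and stream identifier); frames here are plain `(stream_id, payload)` tuples, and the messages and function names are invented for the sketch.

```python
from itertools import zip_longest


def split_into_frames(stream_id: int, message: bytes, size: int):
    """Cut one message into (stream_id, payload) frames of at most `size` bytes."""
    return [(stream_id, message[i:i + size]) for i in range(0, len(message), size)]


def interleave(*frame_lists):
    """Send frames from several streams over one connection, round-robin."""
    rounds = zip_longest(*frame_lists)
    return [frame for group in rounds for frame in group if frame is not None]


def reassemble(frames):
    """Receiver side: rebuild each message from its frames using the stream id."""
    messages = {}
    for stream_id, payload in frames:
        messages[stream_id] = messages.get(stream_id, b"") + payload
    return messages


# Hypothetical messages travelling on streams 1 and 3 at the same time.
wire = interleave(
    split_into_frames(1, b"<html>...</html>", 4),
    split_into_frames(3, b"body { margin: 0 }", 4),
)
print(wire[:4])
print(reassemble(wire))
```

Frames of different streams are mixed on the wire, but the frames of any one stream stay in order, which is what lets the receiver rebuild each message by simple concatenation.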

Differences

  1. The main difference between HTTP/1.0 and HTTP/1.1 is the move from connectionless to persistent connections
  2. The main difference between HTTP/2.0 and 1.x is multiplexing

Interview questions

Question 1: What happens after a URL is entered in the browser?

  1. The client connects to the Web server

    An HTTP client, typically a browser, establishes a TCP socket connection to the HTTP port of the Web server (80 by default), for example at www.baidu.com.

  2. Send an HTTP request

    Through the TCP socket, the client sends a text request message to the Web server. A request message consists of the request line, the request headers, a blank line, and the request data.

  3. The server accepts the request and returns an HTTP response

    The Web server parses the request and locates the requested resource. The server writes a copy of the resource to the TCP socket, where it is read by the client. A response consists of a status line, response headers, a blank line, and response data.

  4. Release the TCP connection

    If the connection mode is close, the server actively closes the TCP connection and the client passively closes it, releasing the TCP connection. If the connection mode is keep-alive, the connection is kept open for a period of time, during which further requests can be received.

  5. The client browser parses the HTML content

    The client browser first parses the status line to check the status code, which indicates whether the request succeeded. It then parses each response header; the response headers state that an HTML document of a certain number of bytes follows, along with the document’s character set. The client browser reads the HTML in the response data, formats it according to the HTML syntax, and displays it in the browser window.

For example, when a URL is entered in the browser address bar and Enter is pressed, the following process occurs:

  1. The browser asks the DNS server to resolve the IP address corresponding to the domain name in the URL.

  2. Once the IP address is resolved, the browser establishes a TCP connection with the server using that IP address and the default port 80.

  3. The browser sends an HTTP request to read the file (the file named after the domain name in the URL). The request can be sent to the server along with the third packet of the TCP three-way handshake.

  4. The server responds to the browser's request and sends the corresponding HTML text to the browser.

  5. The TCP connection is released.

  6. The browser displays the HTML text.

Question 2: Since browser rendering came up, describe the principle and process by which a browser renders a web page

Principle

To understand how a browser renders a page, it is enough to understand the critical rendering path.

The critical rendering path is the entire process by which the browser receives the requested HTML, CSS, JavaScript and other resources, then parses them, builds the trees, computes the layout, paints, and finally presents the interface to the user.

Take a look at the WebKit flow:

To summarize the process:

  1. The browser parses the retrieved HTML document into a DOM tree
  2. The CSS markup is processed to form the CSS object model, CSSOM
  3. The DOM and CSSOM are combined into a render tree, which represents the series of objects to be rendered
  4. The position and size of each element in the render tree are computed, a step called layout. The browser uses a flow-based approach, so a single pass is usually enough to lay out all elements
  5. The nodes of the render tree are drawn onto the screen, a step called painting
  6. The content is displayed on the web page

The points summarized above come up often. Every interviewer asks different questions, so it is best to prepare thoroughly for the interview.

Here are a few more questions; if you are interested, you can work out the answers yourself:

  • What is the difference between HTTP and HTTPS? (Covered in this article)
  • Why is HTTPS secure? (Covered in this article)
  • Do you understand how symmetric and asymmetric encryption algorithms perform encryption? (Look it up yourself)
  • Describe the HTTPS handshake process.
  • How does a man-in-the-middle attack on HTTPS work?

HTTP is a very large body of knowledge. If you want to learn more, the book Illustrated HTTP is recommended. Keep at it, young one!