Talk about your understanding of HTTP

HTTP is the Hypertext Transfer Protocol.

  • First, it is a protocol: a communication specification between computers, expressed in a form that computers can understand.
  • Secondly, it is a transfer protocol between two parties. It fixes the conventions for exchanging data between two ends, abstracted as client and server, regardless of how the data is carried at the lower layers.
  • Finally, what it transfers is hypertext. Hypertext here covers the images, audio, video, and HTML pages we encounter every day. HTTP defines how hypertext is transmitted and thereby enables the efficient transfer of hypermedia.

Talk about request messages

HTTP/1.x messages are written as ASCII text. Each message consists of a header section, a blank line, and a message body.

Packet header:

Request header section = request line (request method, URL, HTTP version) + request header fields

Response header section = status line (HTTP version, status code) + response header fields
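The layout above (start line, header fields, blank line, body) can be sketched in a few lines of Python. This is only an illustration: the helper names build_request and parse_request, the host example.com and the /login path are made up for the example, and real parsers handle many more edge cases.

```python
def build_request(method, target, host, body=b""):
    """Serialize an HTTP/1.1 request: request line, header fields, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # The empty line (CRLF CRLF) marks the end of the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_request(raw):
    """Split a raw request back into (method, target, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = build_request("POST", "/login", "example.com", b"user=alice")
print(raw.decode("ascii"))
print(parse_request(raw))
```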

Common header fields:

Cache-Control: controls caching behavior, including that of cache servers;

Connection: lists the header fields that proxies must not forward, and manages persistent connections;

Date: indicates the date and time at which the HTTP message was created.

Transfer-Encoding: chunked: indicates the encoding used to transmit the message body (here, chunked).

Request header fields:

Accept: tells the server which media types the user agent can handle, such as text/html, image/jpeg, etc.

Accept-Charset: tells the server which character sets the user agent can handle;

Accept-Encoding: tells the server which content encodings the user agent can handle;

Host: tells the server the host name and port number of the requested resource.

User-Agent: sends information about the browser to the server.

Response header fields:

Accept-Ranges: tells the client whether the server can handle range requests; the value is bytes or none.

Entity header fields:

Allow: lists the HTTP methods the resource supports. If the request method is not among them, the server returns 405.

Content-Encoding: tells the client which encoding the server applied to the content.

Content-Location: the URI corresponding to the returned message body.

Content-Type: the media type of the object in the entity body.

Expires: tells the client when a resource expires. If a cache server has received a response with Expires, it answers requests from its cache until that time.

Cookie: sends the stored cookie to the server. Set-Cookie: sent by the server to set a cookie and its attributes.

Content-Length: indicates the total length of the message entity. It is not available when the body is transmitted in chunks.

Request method

GET: retrieves a resource from the server. It can be used much like POST, except that a GET request conventionally has no request body.

POST: the entity submitted by a POST is subordinate to the request URI, which identifies the resource that will process the entity. (With PUT, by contrast, the URI identifies the resource itself, and successive PUTs represent different versions of that same resource.)

PUT: stores the request entity under the request URI. If the resource already exists on the server, the entity sent by PUT is a new version and overwrites the old one. The client can tell from the response code whether the PUT created or updated the resource (201 for created, 200 for updated).

DELETE: deletes the resource identified by the request URI. It is an idempotent method.

Idempotent: performing the same operation many times gives the same result as performing it once.

GET is definitely idempotent.

The semantics of POST are to submit data, typically to create a resource, so it is not idempotent: the same POST sent twice may create two resources. If we need the server to treat two identical requests sent in a row as two separate operations, we need to send a POST request.

PUT stores the client's entity under the request URI and overwrites the resource if it existed before. It is used to update data. If the same data is sent several times, the result is the same.
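The difference between PUT and POST can be illustrated with a toy in-memory server. This is a sketch under simplifying assumptions: the Store class, its methods, and the /users URIs are hypothetical and only model the semantics described above.

```python
import itertools


class Store:
    """Toy resource store that models PUT and POST semantics."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def put(self, uri, entity):
        # PUT stores the entity at the given URI, overwriting any old version.
        created = uri not in self.resources
        self.resources[uri] = entity
        return 201 if created else 200

    def post(self, collection_uri, entity):
        # POST lets the server pick a new URI under the collection each time.
        uri = f"{collection_uri}/{next(self._ids)}"
        self.resources[uri] = entity
        return 201, uri


store = Store()
first_put = store.put("/users/alice", {"age": 30})
second_put = store.put("/users/alice", {"age": 30})
count_after_put = len(store.resources)    # repeating the PUT adds nothing

store.post("/users", {"name": "bob"})
store.post("/users", {"name": "bob"})
count_after_post = len(store.resources)   # each POST created a new resource
print(first_put, second_put, count_after_put, count_after_post)
```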

GET vs. POST

① In terms of function, GET is generally used to retrieve resources from the server, and its semantics are to get data; POST is generally used to update resources on the server, and its semantics are to submit data.

② In terms of idempotency, GET is idempotent: it does not change resources on the server, so repeating it has no further effect. POST is not idempotent, because each request may change the resource, and not necessarily in the same way.

③ In terms of request parameters, the parameters of a GET request are appended to the URL, that is, the request data is carried in the request line of the HTTP message; a POST request places the submitted data in the body of the HTTP request message. (Supplementary: letters and digits in GET parameters are sent as is. Other characters are encoded in the application/x-www-form-urlencoded format: spaces become +, and Chinese and other non-ASCII characters are percent-encoded.)

④ In terms of security, POST exposes less than GET, because GET parameters are appended to the URL in plaintext while POST data is wrapped in the request body. Without HTTPS, however, neither is encrypted.

⑤ In terms of request length, a GET request is limited by the browser's and server's limits on URL length, so the amount of data it can send is relatively small, while the protocol places no limit on the size of a POST request.

⑥ In terms of sending mode, a GET request is generally sent in one go, although if the URI is very long TCP may split it across two segments. For a POST request, some clients send the request headers first, wait for the server to respond with 100, and only then send the request body. This depends on the client: Firefox, for example, sends smaller POST requests in one go.

⑦ In particular, GET requests are actively cached by the browser, so going back/forward in the browser is harmless; going back/forward to a POST request resubmits it.
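Point ③ can be tried out with Python's standard urllib.parse. The sketch below encodes the same parameters once for a GET request line and once for a POST body; the /search path and the parameter names are invented for the example.

```python
from urllib.parse import urlencode, parse_qs

params = {"q": "hello world", "name": "中文"}

# Spaces become "+", non-ASCII characters are percent-encoded as UTF-8 bytes.
encoded = urlencode(params)

# GET: the encoded parameters travel in the URL of the request line.
get_request_line = f"GET /search?{encoded} HTTP/1.1"

# POST: the same string travels in the body, described by two headers.
post_body = encoded.encode("ascii")
post_headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(post_body)),
}

print(get_request_line)
print(post_headers, post_body)
print(parse_qs(encoded))  # decoding restores the original values
```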

Status code

After the client sends a request and receives the response message, it uses the status code to learn whether the request was processed correctly and to decide what to do next.

  • 1XX: The request has been received but requires further processing to complete
  • 200: The server processed the request normally
  • 201: A new resource was successfully created on the server
  • 206: Returned when only part of the resource is sent in response to a request with a Range header
  • 301: The resource has been permanently redirected. (Bookmarks will be updated)
  • 302: The resource is temporarily redirected. (Bookmarks will not be updated)
  • 303: Redirect to another resource. Used in response to POST/PUT requests to indicate that the client should use the GET method to access the resource
  • 304: When the client holds a cached copy that may have expired, it asks the server whether the copy can still be used, sending a condition such as If-Modified-Since along with the request. 304 tells the client that the resource has not been modified and the cached copy can be reused (even if it had expired). The server uses the condition for cache control (deciding whether the cached copy must be invalidated, etc.).
  • 400: The client's request message contains a syntax error
  • 401: The request must be authenticated through HTTP authentication
  • 403: The server denies the client access to the requested resource, for example because of insufficient permissions
  • 404: The requested resource was not found on the server
  • 405: The server does not support the method in the request line
  • 500: An error occurred on the server while handling the request
  • 502 Bad Gateway: The proxy server failed to connect to the origin server or to obtain a valid response from it
  • 503: The server is temporarily unable to handle the current request
  • 504 Gateway Timeout: The proxy server timed out waiting for the origin server
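The first digit of the status code is what tells the client which kind of outcome it is dealing with. A small sketch using the standard http.HTTPStatus enum; the describe helper and the category names are illustrative, not part of any API.

```python
from http import HTTPStatus

# Category by first digit of the status code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return (category, reason phrase) for a status code."""
    category = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a registered code; the first digit still gives the category.
        phrase = "Unknown"
    return category, phrase


for code in (200, 206, 304, 405, 502):
    print(code, describe(code))
```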

A blank line

Message body

Request body:

If the method is GET, there is usually no request body.

If the method is POST, the body can carry form data as key-value pairs (key=value pairs joined with &), or raw data submitted in JSON format; XML can also be sent.

Response body:

HTML source files, transferred data

The HTTP protocol itself does not require a message body to be split when the data is too long. However, servers generally have a threshold: if the data exceeds it, the body is split and sent in pieces.

HTTP1.0/1.1/2.0

HTTP/1.0:

① In addition to the basic request line, HTTP/1.0 has request headers and response headers in key-value form. Through negotiation in these headers, many different types of data can be transferred (desired file type, character encoding, language, compression format, and so on).

② HTTP/1.0 implements many features through request headers and response headers:

Status code: The server tells the browser about its request processing

Cache mechanism: Caches downloaded data

User Agent: The user agent field in the request header collects basic information about the client.

HTTP/1.1:

Improvements over HTTP/1.0:

① HTTP/1.0 must establish a new TCP connection for each request. HTTP/1.1 implements long (persistent) connections, which reduce the burden on the server by cutting the number of TCP connections that are set up and torn down. Long connections are enabled by default in HTTP/1.1. You can add Connection: close to the request header to disable them.

② HTTP/1.0 assumes that each domain name has its own IP address. As technology developed, it became necessary to run multiple virtual hosts on one machine, that is, to let multiple domain names share one IP address. HTTP/1.1 adds the Host field to the request header to identify the domain name being requested.

③ The Content-Length response header in HTTP/1.0 tells the browser the size of the data to receive. But as many pages came to be generated dynamically, the server could no longer know the total size in advance, and the browser could not tell when it had received all the data. So HTTP/1.1 introduced chunked transfer encoding: the server cuts the data into chunks and sends each chunk prefixed with its own length, so the browser can receive data of unknown total size.

④ Client Cookie mechanism

⑤ Security mechanisms

HTTP/2.0:

① In HTTP/1.1, every message carries many header fields, most of them repeated from request to request, which wastes a lot of bandwidth. So HTTP/2.0 compresses headers with the HPACK algorithm: client and server both maintain dictionaries (tables) of header fields and use index numbers to represent fields that repeat.

② HTTP/1.1 uses chunked encoding. HTTP/2.0 instead splits the headers and the message body into multiple binary frames for transmission, which are compact and fast to parse.

③ Because HTTP/2.0 transmits binary frames, it defines the concept of a stream. A stream is a bidirectional sequence of binary frames, and every frame carries the ID of the stream it belongs to. This lets HTTP/2.0 multiplex one TCP connection: when frames arrive at the peer, the peer groups them by stream ID and assembles them into request and response messages. This avoids HTTP/1.1 head-of-line blocking (a TCP connection handles only one request at a time, and other requests are blocked behind it). In HTTP/2.0 the server can also create streams and push data to the client on its own initiative, for example pushing JS and CSS to the browser in advance to speed up page loading.

④ Because HTTP/2.0 multiplexes one TCP connection, frames of several unfinished streams are in flight on it at the same time. This can trigger TCP flow control on the connection, block streams, and reduce concurrency. So HTTP/2.0 provides application-level flow control through WINDOW_UPDATE frames. Sender and receiver each maintain their own flow-control windows; when a window is used up, the sender must wait until the peer announces more capacity. Only DATA frames are subject to flow control.

⑤ In HTTP/1.1, TCP slow start and the competition between TCP connections for a fixed bandwidth lower bandwidth utilization. Therefore HTTP/2.0 uses a single long TCP connection per domain name, which means that downloading the resources of a page goes through TCP slow start only once and avoids several TCP connections of the same page competing with each other.

⑥ HTTP/2.0 is used under the https scheme in practice, that is, it runs on TLS to ensure secure transmission. (The specification also defines a cleartext mode, but browsers only support HTTP/2.0 over TLS.)
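Point ③, regrouping interleaved frames by stream ID, can be pictured with a toy model. The tuples below are a simplified stand-in for real binary frames, and the reassemble function and the payloads are invented for the example.

```python
from collections import defaultdict

# Frames of two streams arriving interleaved on one connection:
# (stream_id, frame_type, payload). Hypothetical, simplified data.
frames = [
    (1, "HEADERS", b":status 200"),
    (3, "HEADERS", b":status 200"),
    (1, "DATA", b"<html>"),
    (3, "DATA", b"body{"),
    (1, "DATA", b"</html>"),
    (3, "DATA", b"}"),
]


def reassemble(frames):
    """Group frames by stream ID and rebuild one message per stream."""
    streams = defaultdict(lambda: {"headers": b"", "body": b""})
    for stream_id, frame_type, payload in frames:
        key = "headers" if frame_type == "HEADERS" else "body"
        # Within one stream, frames are appended in arrival order.
        streams[stream_id][key] += payload
    return dict(streams)


messages = reassemble(frames)
print(messages)
```

Frames of different streams may interleave freely, but the order inside each stream is preserved, which is what makes reassembly possible.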

Connection related

Short connection:

The client establishes a connection with the server before sending a request and closes the connection immediately after receiving the response. Application scenario: websites, where the number of clients is so large that keeping a long connection open for each of them would occupy too many resources.

HTTP long connection:

When a client sends a request to the server, it establishes a TCP connection that stays open and is reused for subsequent requests. Setting Connection: keep-alive enables the long connection; in HTTP/1.1 it is enabled by default. A timeout should be set for HTTP long connections so that too many idle connections do not consume server resources. This can be done with Keep-Alive: timeout=XXX, and a connection can be closed with Connection: close.

Application scenario: Frequent point-to-point communication with a small number of connections, such as database connections.

Long TCP connection

TCP keepalive checks that a connection is still alive by means of heartbeat packets. The client periodically sends heartbeat packets and the server responds to them. If the client receives no response from the server within a certain period of time, the TCP connection is closed.

The keepalive mechanism of TCP verifies through heartbeat detection that a long TCP connection is still alive, so that dead connections are released instead of occupying memory.

The keep-alive mechanism of HTTP is a different thing: it manages the reuse of a TCP connection for several requests, so that the underlying TCP connection lives longer. (The HTTP Connection header is set to keep-alive to enable the long connection.)

Browsers typically maintain up to six long TCP connections per domain name.

The default TCP keepalive idle time is commonly 2 hours.
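At the socket level, TCP keepalive is switched on per connection. A minimal sketch with Python's socket module follows; the timing values (60 s idle, 10 s interval, 5 probes) are arbitrary examples, and the three tuning options are platform-specific, so they are only set where the constants exist.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the kernel to send keepalive probes on this connection when it is idle.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Optional tuning; these constants are not available on every platform.
tuning = {
    "TCP_KEEPIDLE": 60,   # idle seconds before the first probe
    "TCP_KEEPINTVL": 10,  # seconds between probes
    "TCP_KEEPCNT": 5,     # unanswered probes before the connection is dropped
}
for name, value in tuning.items():
    if hasattr(socket, name):
        sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)

keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("keepalive enabled:", keepalive_enabled)
sock.close()
```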

Other

Talk about forward and redirect

Forward and redirect are both ways of passing a request on.

Forward is direct forwarding: a server passes the request directly on to another server. The X-Forwarded-For HTTP header records the client IP address as the request passes through proxy servers.

Redirect is indirect forwarding: the server sends a response back to the browser asking it to send a new request to another address. After a redirect, the address shown in the browser changes, and data is not shared between the two requests.

Redirection process: when the browser receives a redirection response code, it reads the value of the Location response header, obtains the new URI, and then requests that page.
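The redirection process can be sketched as a small decision function. This is an illustration only: next_request is a made-up helper, and the handling shown (resolving Location against the current URL, switching to GET on 303) covers only the cases discussed in these notes.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_request(method, url, status, headers):
    """Return (method, url) of the follow-up request, or None if no redirect."""
    if status not in REDIRECT_CODES or "Location" not in headers:
        return None
    # Location may be relative, so resolve it against the current URL.
    new_url = urljoin(url, headers["Location"])
    # 303 tells the client to fetch the new resource with GET.
    if status == 303 and method != "HEAD":
        method = "GET"
    return method, new_url


print(next_request("POST", "http://example.com/form", 303, {"Location": "/done"}))
print(next_request("GET", "http://example.com/a/b", 301, {"Location": "c"}))
print(next_request("GET", "http://example.com/", 200, {}))
```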

Talk about Web caching

A Web cache, also known as a proxy server, stores the server's responses so that it can answer later requests from clients. If the browser is configured to use the cache, it sends its HTTP request to the Web cache first. If the Web cache has a copy of the object, it returns an HTTP response directly. If it has no copy, the Web cache sends an HTTP request to the server, receives the response, stores a copy, and then sends the response to the client. This can greatly reduce the response time of client requests. CDN is one such Web caching technology. (The cache is keyed by URL, and the keys are stored in dictionary structures such as red-black trees or hash tables.)

If the cached copy has expired, the cache asks the server whether it is still valid. If the resource has not changed, the server returns response code 304, indicating that the cached copy can continue to be used despite having expired.
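The revalidation step can be sketched from the server's side. This is a simplified model under stated assumptions: respond is a hypothetical handler, only If-Modified-Since is considered, and timestamps are whole seconds in UTC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, last_modified, body):
    """Answer 304 if the client's copy is still current, else 200 with the body."""
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        # Not modified since the client's copy: no body is sent.
        return 304, {}, b""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    return 200, headers, body


modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
status1, headers1, body1 = respond({}, modified, b"hello")

# The cache revalidates by echoing Last-Modified back as If-Modified-Since.
conditional = {"If-Modified-Since": headers1["Last-Modified"]}
status2, _, body2 = respond(conditional, modified, b"hello")

# After the resource changes, the same condition yields a full response.
changed = datetime(2024, 6, 1, tzinfo=timezone.utc)
status3, _, body3 = respond(conditional, changed, b"hello v2")
print(status1, status2, status3)
```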

Let’s talk about cookies and sessions

HTTP is a stateless protocol. In order to save data for multiple requests, cookies and sessions are used to realize stateful HTTP requests.

The first time a client requests the server, if the server needs to record the user status, the server stores the user information in the session and issues a cookie to the client when it responds. The client saves the cookie in the browser and carries the cookie when it requests the website again. The server checks the consistency between the user status in the cookie and the session.

Cookies are generated by the server and stored in the browser. After the server generates a cookie, it informs the client through the Set-Cookie header of the response. Once the client has the cookie, it automatically includes it in the Cookie header of subsequent requests.

Cookies are not cross-domain. The browser stores each cookie under the domain that set it, so a site on a different domain name cannot access it.

A cookie can contain: a name (string), a value, an expiration time (int), a version number, and so on.

If cookies are disabled, sessions are still available: the client can give the server its session ID through GET or POST request parameters.
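The cookie and session interplay can be sketched with the standard http.cookies module. The in-memory sessions dict, the cookie name sessionid, and the login and current_user helpers are assumptions made for this example, not a real framework API.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}  # server-side session store: session ID -> user state


def login(username):
    """Create a session and return the value for the Set-Cookie header."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["path"] = "/"
    cookie["sessionid"]["httponly"] = True
    return cookie["sessionid"].OutputString()


def current_user(cookie_header):
    """Look up the session referenced by the request's Cookie header."""
    cookie = SimpleCookie(cookie_header)
    morsel = cookie.get("sessionid")
    if morsel is None:
        return None
    session = sessions.get(morsel.value)
    return session["user"] if session else None


set_cookie = login("alice")                # server -> browser
cookie_header = set_cookie.split(";")[0]   # browser -> server on later requests
print(set_cookie)
print(current_user(cookie_header))
```

Only the session ID travels in the cookie; the user state itself stays on the server.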

HTTP transfers large files

① Compression: the Content-Encoding header specifies the encoding (compression) format. In practice, multimedia data such as images, audio and video is already highly compressed, so this method is only effective for text files.

② Chunked transfer: the file is broken into a number of small chunks, and the peer reassembles them after receiving them. In this way, neither the browser nor the server needs to hold the entire file in memory; only a small chunk is sent or received at a time, without consuming too much memory or bandwidth. Each chunk consists of a length line, which gives the chunk length in hexadecimal, and a data block, both ending in CRLF (\r\n).

③ Range request: allows the client to use the Range field in the request header to indicate that only a part of the file should be fetched.

When several ranges are requested at once, the response has the MIME type multipart/byteranges, and the boundary=XXX parameter in its Content-Type header gives the boundary mark between segments.
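The chunk format from point ② (hexadecimal length line, data block, each ending in CRLF, with a zero-length chunk at the end) can be written out directly. A minimal sketch: the function names and the chunk size of 8 are chosen for the example, and trailers are not handled.

```python
def encode_chunked(data, chunk_size=8):
    """Encode bytes using HTTP/1.1 chunked transfer encoding."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        # Length line in hexadecimal, then the data block, each ending in CRLF.
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # a zero-length chunk ends the body
    return bytes(out)


def decode_chunked(raw):
    """Reassemble the body from a chunked byte string (no trailers)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size_field = raw[pos:line_end].split(b";")[0]  # drop chunk extensions
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += raw[pos:pos + size]
        pos += size + 2  # skip the data block and its trailing CRLF


wire = encode_chunked(b"Hello, chunked world!")
print(wire)
print(decode_chunked(wire))
```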

Messages, frames, and streams

The frame format:

Stream ID: implements multiplexing. Multiple streams are sent at the same time, their frames are transmitted interleaved, and the receiver reassembles them according to stream ID. Frames within the same stream must stay in order, however, so there is no concurrency inside a single stream.

Length: by default, frames with a payload of up to 2^14 bytes (16 KB) are supported. Larger frames, up to 2^24 - 1 bytes, may be sent only after the receiver has announced the size it can handle.

Type: includes the HTTP message body frame (DATA), HTTP header frame (HEADERS), stream priority frame (PRIORITY), stream termination frame (RST_STREAM), heartbeat detection frame (PING), flow control frame (WINDOW_UPDATE), continuation frame for large HTTP headers (CONTINUATION), and frames that notify the receiver of some information (such as SETTINGS and GOAWAY).

Stream priority: priority frames indicate the priority of a stream. The frame has a stream dependency field, and a weight field that represents the priority of the stream.
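The fields above sit in a fixed 9-byte frame header: a 24-bit length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream ID. The sketch below packs and parses that header with struct; pack_frame and parse_frame_header are illustrative helper names, not a library API.

```python
import struct

FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def pack_frame(frame_type, flags, stream_id, payload):
    """Build a frame: 9-byte header followed by the payload."""
    length = len(payload)
    # The 24-bit length is packed as one byte plus one 16-bit integer.
    header = struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                         frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def parse_frame_header(data):
    """Decode the first 9 bytes of a frame."""
    high, low, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    return {
        "length": (high << 16) | low,
        "type": FRAME_TYPES.get(frame_type, "UNKNOWN"),
        "flags": flags,
        "stream_id": stream_id & 0x7FFFFFFF,  # mask out the reserved bit
    }


frame = pack_frame(0x0, 0x1, 5, b"hello")
print(parse_frame_header(frame))
```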

Message: when the sender can determine the full length of the message, it uses the Content-Length header to state that length explicitly. If not, it uses the Transfer-Encoding header to select chunked transmission. A chunk = chunk-size (in hexadecimal) + chunk-data (binary).

Multithreaded resumable downloads, random-access video on demand

HTTP Range: allows the server to send only part of the response body to the client, based on the client's request. Afterwards the client combines the fragments into the complete, larger body. (The Range request header gives the byte range wanted, the server responds with 206, and the Content-Range header shows the position of the current fragment within the complete body.)
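The Range exchange can be sketched from the server's side. The serve_range helper below is a simplified, hypothetical handler: it supports a single byte range only and ignores a multi-range request by sending the whole body.

```python
def serve_range(range_header, content):
    """Return (status, headers, body) for a request with a single byte range."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return 200, {}, content  # unsupported form: send the whole body
    start_s, _, end_s = spec.strip().partition("-")
    total = len(content)
    if start_s == "":
        # Suffix form "bytes=-N": the last N bytes.
        start, end = max(total - int(end_s), 0), total - 1
    else:
        start = int(start_s)
        end = min(int(end_s), total - 1) if end_s else total - 1
    if start > end or start >= total:
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    headers = {"Content-Range": f"bytes {start}-{end}/{total}"}
    return 206, headers, content[start:end + 1]


content = b"0123456789"
# A resumed or multithreaded download fetches fragments and joins them.
part1 = serve_range("bytes=0-3", content)
part2 = serve_range("bytes=4-", content)
print(part1, part2)
print(part1[2] + part2[2] == content)
```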

Cross domain

Cross-domain problems are a consequence of the same-origin policy: the browser blocks a page from reading response data from an origin whose protocol, domain name, or port differs from its own. Without the same-origin policy, a page received by a user visiting site A could automatically access site B, that is, send requests the user never chose to make.

CORS solution: ① When a client sends a simple request (GET, HEAD, or POST), the server is told the origin of the requesting site, and the response sent by the server must carry Access-Control-Allow-Origin: XXX. ② For other requests, a preflight is made first: the preflight request carries Access-Control-Request-XXX headers and the preflight response carries Access-Control-Allow-XXX headers.
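The server side of both cases can be sketched as one function that decides which CORS headers to attach. The allow-list, the origin https://app.example.com, and the cors_headers helper are assumptions for the example. Echoing back the requested headers is a deliberately permissive shortcut.

```python
ALLOWED_ORIGINS = {"https://app.example.com"}  # hypothetical allow-list
ALLOWED_METHODS = {"GET", "POST", "PUT"}


def cors_headers(method, request_headers):
    """Return the CORS response headers for a request, or {} if not allowed."""
    origin = request_headers.get("Origin")
    if origin not in ALLOWED_ORIGINS:
        return {}  # without the header, the browser blocks the response
    headers = {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    # Preflight: an OPTIONS request announcing the real method and headers.
    if method == "OPTIONS" and "Access-Control-Request-Method" in request_headers:
        headers["Access-Control-Allow-Methods"] = ", ".join(sorted(ALLOWED_METHODS))
        requested = request_headers.get("Access-Control-Request-Headers")
        if requested:
            headers["Access-Control-Allow-Headers"] = requested  # permissive echo
    return headers


simple = cors_headers("GET", {"Origin": "https://app.example.com"})
preflight = cors_headers("OPTIONS", {
    "Origin": "https://app.example.com",
    "Access-Control-Request-Method": "PUT",
    "Access-Control-Request-Headers": "Content-Type",
})
blocked = cors_headers("GET", {"Origin": "https://evil.example"})
print(simple, preflight, blocked)
```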

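CSRF attack: ① Cross-site request forgery. A user's form submission is not restricted by the same-origin policy, so a malicious site can submit a form to the target site and the browser will attach the user's cookie to it. ② The attacker forges the form on a fake site; the server can recognize submissions from the fake site by verifying a token that it issued itself.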
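One common way to implement the server token is to derive it from the session with a keyed hash, so that a fake site, which cannot read the page, cannot produce it. A minimal sketch, assuming a per-server secret key and a known session ID; issue_token and is_valid are illustrative names.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # known only to the server


def issue_token(session_id):
    """Token embedded in the form that the real site serves."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid(session_id, submitted_token):
    """Check the token submitted with the form against the expected one."""
    expected = issue_token(session_id)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, submitted_token or "")


token = issue_token("session-abc")
print(is_valid("session-abc", token))     # genuine form submission
print(is_valid("session-abc", "forged"))  # forged form without the token
```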