An overview of HTTP

HTTP follows a client/server (C/S) architectural model: it is a stateless request/response protocol that exchanges information over a reliable connection.

HTTP is a protocol for retrieving web resources such as HTML. It is the basis for data exchange on the Web and is a client-server protocol, meaning that requests are usually initiated by a recipient such as a browser.

Clients and servers interact by exchanging individual messages (as opposed to a stream of data). Messages sent by a client such as a browser are called requests, and the messages a server sends in reply are called responses.

1. Basic properties

  • HTTP is connectionless: Connectionless means that each connection handles only one request. The server disconnects from the client after it has processed the request and received the client's acknowledgement. In this way, transmission time can be saved. A connection is controlled by the transport layer, which is fundamentally outside the scope of HTTP. HTTP does not need its underlying transport layer protocol to be connection-oriented, only that it is reliable, that is, that it does not lose messages (or at least reports an error when it does).
  • HTTP is extensible: The advent of HTTP headers in HTTP/1.0 made the protocol very easy to extend. As long as the server and client agree on the semantics of a new header, new functionality can be added easily.
  • HTTP is stateless: Stateless means that the protocol has no memory of earlier transactions. If earlier information is needed for later processing, it must be retransmitted, which can increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need the earlier information. Using cookies, you can create stateful sessions.

2. HTTP message structure

1. Request message

An HTTP request message consists of four parts: the request line, the request headers, a blank line, and the request body.

2. Response message

An HTTP response message also consists of four parts: the status line, the response headers, a blank line, and the response body.
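
The four-part layout can be sketched in a few lines of Python. This is a minimal illustration rather than a full parser: the sample messages, the /login path, and the helper name split_http_message are assumptions made up for the example.

```python
# Sketch: split a raw HTTP/1.x message into start line, headers, and body.
# The sample messages and the helper name are illustrative assumptions.

def split_http_message(raw: bytes):
    """Return (start_line, headers, body) for a raw HTTP/1.x message."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]                          # request line or status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


request = (
    b"POST /login HTTP/1.1\r\n"                    # request line
    b"Host: www.example.com\r\n"                   # request headers
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"                                        # blank line
    b"user=demo"                                   # request body
)
response = (
    b"HTTP/1.1 200 OK\r\n"                         # status line
    b"Content-Type: text/plain\r\n"                # response headers
    b"Content-Length: 2\r\n"
    b"\r\n"                                        # blank line
    b"ok"                                          # response body
)

method, target, version = split_http_message(request)[0].split(" ")
_, status_code, reason = split_http_message(response)[0].split(" ", 2)
```

The request line splits into method, target, and protocol version, while the status line splits into version, status code, and reason phrase.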

HTTP request method

According to the HTTP standard, HTTP requests can use multiple request methods.

HTTP1.0 defines three request methods: GET, POST, and HEAD.

HTTP1.1 adds five new request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT.

Method    Description
GET       Requests the resource identified by the Request-URI.
HEAD      Similar to GET, except that the response contains no body; it is used to retrieve the headers only.
POST      Submits data to the specified resource for processing (for example, submitting a form or uploading a file); the data is carried in the request body. A POST request may create new resources and/or modify existing ones.
PUT       Replaces the content of the specified resource with the data sent by the client.
DELETE    Asks the server to remove the resource identified by the Request-URI.
CONNECT   Reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel, typically so that SSL/TLS-encrypted data can be carried through the proxy.
OPTIONS   Asks the server which HTTP request methods the resource supports. Using * in place of the resource name sends the OPTIONS request to the web server as a whole, which can be used to test whether the server is working.
TRACE     Echoes back the request as received by the server; used for testing or diagnosis.

Among them, the most common methods are GET and POST. RESTful interfaces commonly use POST, DELETE, PUT, and GET (corresponding to creating, deleting, updating, and reading, respectively).
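
As a rough sketch of how a RESTful client might map operations to methods (POST to create, GET to read, PUT to update, DELETE to delete), the snippet below builds requests with Python's urllib.request without sending them. The endpoint URL and the build_request helper are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://www.example.com/api/users"   # hypothetical endpoint

CRUD_METHODS = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}

def build_request(action: str, user_id=None, payload=None) -> urllib.request.Request:
    """Map a CRUD action to an HTTP method; the request is built but not sent."""
    method = CRUD_METHODS[action]
    url = BASE_URL if user_id is None else f"{BASE_URL}/{user_id}"
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)

req = build_request("update", user_id=7, payload={"name": "demo"})
print(req.get_method(), req.full_url)
```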

HTTP header

HTTP headers are mainly divided into general headers, request headers, response headers, and entity headers:

  • General header fields: can appear in both request and response messages and provide the most basic information about the message.
  • Request header fields: used when the client sends a request message to the server. They supplement the request with additional information, client information, and the priority of the response content.
  • Response header fields: used when the server returns a response message to the client. They supplement the response with additional information and may also ask the client to supply additional information.
  • Entity header fields: used for the entity part of request and response messages. They supplement entity-related information such as the time the resource content was last updated.
  • Other message fields: not defined in the HTTP standard, but widely used in HTTP messages.

1. General header fields

Header field        Description
Cache-Control       Controls caching behavior.
Connection          Manages persistent connections; set its value to keep-alive to keep the connection open.
Date                The date and time at which the HTTP message was created.
Pragma              A legacy field from before HTTP/1.1, defined only for backward compatibility with HTTP/1.0. Although it is a general header, it is normally used in client requests, for example Pragma: no-cache, which indicates that the client does not accept cached data for the request.
Trailer             Lists the header fields that appear in the trailer at the end of the message.
Transfer-Encoding   Specifies the transfer coding used to transmit the message body, for example Transfer-Encoding: chunked.
Upgrade             Used to check whether a higher version of HTTP or another protocol can be used.
Via                 Traces the path a message takes between client and server and helps avoid request loops, so it must be added whenever a message passes through a proxy.
Warning             An HTTP/1.1 field, evolved from the HTTP/1.0 Retry-After header, that informs users of cache-related warnings.

2. Request header fields

Header field          Description
Accept                The media types that the client can handle.
Accept-Charset        The character sets supported by the client, for example Accept-Charset: GB2312, ISO-8859-1.
Accept-Encoding       The content encodings supported by the client, for example Accept-Encoding: gzip.
Accept-Language       The languages supported by the client, for example Accept-Language: zh-cn, en.
Authorization         The client's authentication information. When the client accesses something that requires authentication, the server returns 401. The client then puts its authentication information in the Authorization field and sends the request again; if authentication succeeds, 200 is returned.
Host                  The name of the host where the requested resource lives, that is, the domain name part of the URL, for example m.baidu.com.
If-Match              The server processes the request only if the If-Match value matches the ETag of the requested resource. (The ETag, or entity tag, is associated with the resource and changes whenever the resource changes.)
If-Modified-Since     Used to check whether the copy of the resource held by the client is still current.
If-None-Match         The server processes the request only if the If-None-Match value does not match the ETag of the requested resource.
If-Range              If the If-Range value (an ETag or a time) matches the ETag or time of the requested resource, the server processes the request and returns the range of data given in the Range field; if not, it returns the whole resource. If-Range can be seen as an improved If-Match: when If-Range does not match, data is still returned, whereas when If-Match does not match, the request is not processed and the data has to be requested again.
If-Unmodified-Since   The opposite of If-Modified-Since: the request is processed only if the requested resource has not changed since the specified time; otherwise 412 is returned.
Max-Forwards          The maximum number of servers through which the request may be forwarded. Each time the request is forwarded, Max-Forwards decreases by 1. When it reaches 0, the server stops forwarding and responds directly. This field can be used to locate communication problems.
Proxy-Authorization   When the client receives an authentication challenge from a proxy server, it puts its authentication information in Proxy-Authorization to complete authentication. It is similar to Authorization, except that Authorization is used between the client and the server.
Range                 Requests part of a resource. For example, Range: bytes=500-1000 requests bytes 500 through 1000 of the resource. If the server can handle the range, it returns 206 with the partial data; if it cannot, it returns 200 with the complete data.
Referer               Tells the server which page the request originated from.
User-Agent            Sends information such as the name of the browser or agent that initiated the request to the server.
Cookie                Cookies sent with the request, which enable HTTP state management.
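
The Range behavior described in the table can be sketched as a small server-side helper. This is a simplified illustration, not how any particular server implements it: the function name serve_range is made up, only a single bytes=start-end range is handled, and anything else falls back to 200 with the full body, as the table describes (real servers may instead answer an unsatisfiable range with an error status).

```python
import re

def serve_range(content: bytes, range_header):
    """Return (status, body) for a single 'bytes=start-end' Range header.

    Simplified: suffix ranges and multi-range requests are not supported;
    anything this sketch cannot handle falls back to 200 with the full body.
    """
    if range_header:
        match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
        if match:
            start = int(match.group(1))
            end = int(match.group(2)) if match.group(2) else len(content) - 1
            end = min(end, len(content) - 1)
            if start <= end:
                return 206, content[start:end + 1]   # byte positions are inclusive
    return 200, content

data = bytes(range(256)) * 8                          # 2048 bytes of sample content
status, body = serve_range(data, "bytes=500-1000")
```

Because both ends of the range are inclusive, bytes=500-1000 yields 501 bytes.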

3. Response header fields

Header field         Description
Accept-Ranges        Whether the server accepts byte-range requests.
Age                  Tells the client how long ago, in seconds, the origin server (not the cache server) created the response.
ETag                 An identifier for an entity (resource) that can be used to request a specific version of it.
Location             The new location of the requested resource.
Proxy-Authenticate   Sends the client the authentication information required by the proxy server.
Retry-After          Tells the client how long to wait before trying again; typically used with 503 and 3XX redirect responses.
Server               Tells the client which HTTP server application is in use.
Vary                 Cache management information for proxy servers.
WWW-Authenticate     Tells the client which authentication scheme, such as Basic or Digest, applies to the resource being accessed. A 401 response must include a WWW-Authenticate field.
Set-Cookie           Passes cookie information from the server to the client.

4. Entity header fields

Header field       Description
Allow              Tells the client which request methods the server supports for the resource.
Content-Encoding   Tells the client which content encoding the server applied to the resource.
Content-Language   Tells the client the natural language used by the resource.
Content-Length     Tells the client the length of the entity body, in bytes.
Content-Location   Tells the client where the resource is located.
Content-Type       Tells the client the media type of the resource; its values use the same format as Accept in the request header fields.
Expires            Tells the client when the resource expires; can be used for caching.
Last-Modified      Tells the client when the resource was last modified.

5. Other packet fields

X-Frame-Options: a response header field that controls whether web content may be displayed inside a frame on another website, mainly to prevent clickjacking attacks.

X-XSS-Protection: a response header field that serves as a countermeasure against cross-site scripting attacks; it enables or disables the browser's XSS protection mechanism.

DNT (Do Not Track): a request not to have personal information collected, that is, a way of refusing to be tracked for targeted advertising.

HTTP status return code

The status code indicates the result of the client's request, reflects whether the server handled it normally, and reports errors.

Status code   Category                                      Description
1XX           Informational (informational status code)     The request is being processed
2XX           Success (success status code)                 The request was processed successfully
3XX           Redirection (redirection status code)         Further action (a redirect) is needed to complete the request
4XX           Client Error (client error status code)       The server cannot process the request
5XX           Server Error (server error status code)       The server failed to process the request
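
Since the category is determined by the first digit, classification takes one integer division. The sketch below uses Python's standard http.HTTPStatus for the reason phrases; the STATUS_CLASSES table and the classify helper are names made up for the example.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def classify(code: int) -> str:
    """Return the category name for a status code based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")

for code in (101, 200, 304, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", classify(code))
```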


1. Informational responses

Status code   Reason phrase         Meaning
100           Continue              The client should continue with its request
101           Switching Protocols   The server switches protocols at the client's request; it may only switch to a more advanced protocol


2. Successful responses

Status code   Reason phrase   Meaning
200           OK              The request succeeded; typically returned for GET and POST requests
201           Created         The request succeeded and a new resource was created
202           Accepted        The request has been accepted, but processing is not complete


3. Redirection

Status code   Reason phrase        Meaning
300           Multiple Choices     The requested resource is available at multiple locations
301           Moved Permanently    The resource has moved permanently
302           Found                The resource has moved temporarily; the client should keep using the original URI for future requests
303           See Other            Similar to 302: the resource is at another address, which should be retrieved with a GET request
304           Not Modified         The resource has not been modified; the server returns no resource body with this status code
307           Temporary Redirect   A temporary redirect; the request method should not be changed


4. Client errors

Status code   Reason phrase        Meaning
400           Bad Request          The client request has a syntax error that the server cannot understand
401           Unauthorized         The request requires user authentication
402           Payment Required     Reserved for future use
403           Forbidden            The server understands the request but refuses to carry it out
404           Not Found            The server cannot find the resource (web page) that the client requested
405           Method Not Allowed   The method used in the client request is not allowed


5. Server errors

Status code   Reason phrase                Meaning
500           Internal Server Error        The server had an internal error and could not complete the request
501           Not Implemented              The server does not support the requested functionality and could not complete the request
502           Bad Gateway                  The server received an invalid response from the remote server
503           Service Unavailable          The server is temporarily unable to process client requests
504           Gateway Timeout              The server did not receive a response from the remote server in time
505           HTTP Version Not Supported   The server does not support the HTTP version of the request and could not complete it


5. HTTP content types

The Content-Type of a web page defines the media type of the file and the character encoding of the page, and it determines the format and encoding in which the browser reads the file.

Common media types:

  • Text files: text/html, text/plain, text/css, application/xml
  • Image files: image/jpeg, image/gif, image/png
  • Video files: video/mpeg
  • Binary files used by applications: application/octet-stream, application/zip

Common content encodings:

  • gzip: the encoding format produced by the file compression program gzip.
  • compress: the encoding format produced by the Unix compress program.
  • deflate: an encoding format that combines the zlib format with the deflate compression algorithm.
  • identity: the default encoding, with no compression applied.
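
A receiver first undoes the content encoding and then interprets the bytes according to the media type and charset. The sketch below shows this with Python's standard gzip and email.message modules; the helper names and the sample page are assumptions for illustration, and only gzip and identity are handled.

```python
import gzip
from email.message import Message

def parse_content_type(value: str):
    """Split a Content-Type value into (media_type, charset)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()

def decode_body(body: bytes, content_encoding: str = "identity") -> bytes:
    """Undo the Content-Encoding applied to a body (gzip or identity only)."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    if content_encoding == "identity":
        return body
    raise ValueError(f"unsupported content encoding: {content_encoding}")

html = "<p>hello</p>".encode("utf-8")
compressed = gzip.compress(html)            # what a server might send with Content-Encoding: gzip
media_type, charset = parse_content_type("text/html; charset=UTF-8")
text = decode_body(compressed, "gzip").decode(charset)
```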

6. HTTP cookies

Cookies are mainly used for the following three purposes:

  • Session state management (such as user login status, shopping carts, game scores, or other information that needs to be recorded)
  • Personalization (such as user-defined settings, themes, etc.)
  • Browser behavior tracking (e.g. tracking and analyzing user behavior)

Cookies were once used for general client-side data storage. As there was no other suitable storage method at the time, cookies were the only option. However, as modern browsers began to support a variety of storage methods, cookies gradually fell out of use for this purpose. Once the server has set a cookie, every browser request carries the cookie data, which incurs additional performance overhead (especially in a mobile environment). New browser APIs already allow developers to store data locally, for example with the Web Storage API (local storage and session storage) or IndexedDB.


Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly

(1) A session cookie is deleted automatically when the browser is closed; that is, it is valid only for the duration of the session.

(2) Persistent cookies, unlike session cookies, do not expire when the browser is closed. A persistent cookie specifies an expiration time (Expires) or a validity period (Max-Age).

(3) Secure and HttpOnly

  • Cookies marked Secure should only be sent to the server in requests encrypted over HTTPS.

  • To mitigate cross-site scripting (XSS) attacks, cookies flagged HttpOnly cannot be accessed through the JavaScript document.cookie API; they are only sent to the server.

The Domain and Path attributes define the scope of a cookie, that is, which URLs the cookie should be sent to.
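
As a small sketch of these attributes in practice, Python's standard http.cookies module can build a Set-Cookie value on the server side and parse a Cookie header on the way back. The cookie values mirror the Set-Cookie example above; the extra theme cookie and the Path value are made up for illustration.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header value with scope and security flags.
jar = SimpleCookie()
jar["id"] = "a3fWa"
jar["id"]["expires"] = "Wed, 21 Oct 2015 07:28:00 GMT"   # persistent cookie
jar["id"]["path"] = "/"                                   # scope: every path on the host
jar["id"]["secure"] = True                                # only sent over HTTPS
jar["id"]["httponly"] = True                              # hidden from document.cookie
set_cookie_value = jar["id"].OutputString()

# Client to server: parse the Cookie header carried by a later request.
received = SimpleCookie()
received.load("id=a3fWa; theme=dark")                     # 'theme' is a hypothetical cookie
session_id = received["id"].value
```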

HTTP/1.x connection management

Connection management is a key topic of HTTP: opening and maintaining connections greatly affects the performance of websites and Web applications. There are several models in HTTP/1.x: short connections, long connections, and HTTP pipelining.

Short connection: The earliest HTTP model, and the default model in HTTP/1.0, is the short connection. Each HTTP request is completed over its own connection, which means that a TCP handshake precedes every HTTP request, and these handshakes happen one after another.

Long connection: A long (persistent) connection is kept open for a period of time and reused to send a series of requests. This saves the time needed to set up new TCP connections and makes better use of TCP's performance. The connection does not stay open forever: it is closed after being idle for some time (the server can specify a minimum time to keep the connection open using the Keep-Alive header).

HTTP pipelining: HTTP pipelining is not enabled by default in modern browsers. In HTTP/2, pipelining has been superseded by better mechanisms such as multiplexing.
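
The difference between short and long connections can be observed with a throwaway local server. In this sketch, which uses only Python's standard library, the handler replies with the client's TCP port, so a reused connection shows the same port on every request. The handler, the fetch_ports helper, and the use of the loopback address are all assumptions made for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"          # allows persistent connections

    def do_GET(self):
        # Report the client's TCP port so connection reuse is observable.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo quiet
        pass

def fetch_ports(server_port: int, reuse: bool, count: int = 3):
    """Issue `count` GETs and return the client port the server saw for each."""
    ports, conn = [], None
    for _ in range(count):
        if conn is None or not reuse:
            conn = http.client.HTTPConnection("127.0.0.1", server_port, timeout=5)
        conn.request("GET", "/")
        ports.append(conn.getresponse().read().decode("ascii"))
        if not reuse:
            conn.close()                   # short connection: one request each
    if reuse:
        conn.close()
    return ports

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
long_ports = fetch_ports(port, reuse=True)     # one TCP connection for all requests
short_ports = fetch_ports(port, reuse=False)   # a new TCP connection per request
server.shutdown()
server.server_close()
```

With reuse enabled, all three requests report the same client port because they share one TCP connection; without it, each request opens a new connection and will typically come from a different port.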


Domain sharding

Unless you have an urgent and pressing need, don't use this outdated technique; upgrade to HTTP/2 instead. In HTTP/2, domain sharding is unnecessary: an HTTP/2 connection handles parallel unprioritized requests very well. Domain sharding can even hurt performance. Most HTTP/2 implementations also try to merge sharded domains using a technique called connection coalescing.

If the server wants a website or application to respond more quickly, it can force the client to open more connections.

Browsers limit the number of concurrent connections to each domain to guard against DoS/DDoS attacks.

For example, instead of fetching all resources from the same domain, say www.example.com, we can split it into several domains such as www1.example.com and www2.example.com. All of these domain names point to the same server, and the browser opens multiple connections for each domain name.

This technique is called domain sharding.
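
One possible way to assign resources to shards is to hash the resource path, as in the sketch below. The shard_url helper and the http scheme are assumptions for illustration; the host names are the ones from the example above.

```python
import zlib

SHARDS = ["www1.example.com", "www2.example.com"]   # hosts from the example above

def shard_url(path: str, shards=SHARDS) -> str:
    """Pick a shard deterministically so a given path always maps to one host."""
    index = zlib.crc32(path.encode("utf-8")) % len(shards)
    return f"http://{shards[index]}{path}"

urls = [shard_url(p) for p in ("/img/logo.png", "/css/site.css", "/js/app.js")]
```

Hashing the path rather than choosing a shard at random keeps each resource on a stable URL, so a copy the browser has already cached can be reused on later page loads.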

Domain name convergence is the opposite approach: static resources are placed under a single domain name to reduce the cost of DNS resolution.


HTTP caching

Reusing acquired resources can effectively improve the performance of websites and applications. Web caching reduces latency and network congestion, which in turn reduces the time it takes to display a resource. With HTTP caching, Web sites become more responsive.

1. Various types of caches

Caching is a technique for saving a copy of a resource and using it directly on the next request. When the Web cache discovers that the requested resource has been stored, it intercepts the request and returns a copy of the resource rather than going to the source server to download it again.

  • There are many types of caches, which can be roughly grouped into two categories: private caches and shared caches. Responses stored in a shared cache can be used by multiple users, while a private cache can only be used by a single user.

This section mainly covers the browser cache. In addition, proxy caches, gateway caches, CDNs, reverse proxy caches, and load balancers are deployed on servers to provide better stability, performance, and scalability for sites and web applications.

Common HTTP caches store only GET responses, not other types of responses.

2. Cache rules

To make this easier to understand, think of the browser as having a cache database that stores cache information (in reality, static resources are cached in memory and on disk). When the browser requests data for the first time, the cache database has no matching entry, so the request goes to the server, which returns the data together with its caching rules. The browser then stores both the caching rules and the data in the cache database.

Caching rules fall into two main categories: the strong cache and the negotiated cache.

2.1 Strong cache

If the browser determines that the local cache is not expired, it can use it directly without sending HTTP requests (200 from memory/disk cache).

HTTP 1.0

The response header field used by the server is Expires, and its value is an absolute time in the future (a timestamp). If the time of the browser's request is later than the time set by Expires, the cache is invalid and the request must be sent to the server again; otherwise the data is read directly from the cache database.


HTTP 1.1

Cache-Control is the most important rule, and it is private by default.

Directive   Meaning
private     Private cache
public      Shared cache
max-age     The cached content expires after xxx seconds (max-age=xxx)
no-cache    A comparison (negotiated) cache check is required to validate the cached data
no-store    Nothing is cached; neither the strong cache nor the negotiated cache is triggered

Note: In HTTP 1.0, the absolute time in the Expires field comes from the server. Because the request takes time, there is a discrepancy between the time the browser sends the request and the time the server receives it, which can lead to incorrect cache hits. In HTTP 1.1, the xxx in the Cache-Control value max-age=xxx is a relative time in seconds, so the browser starts counting down once it receives the resource, which avoids this error. To stay compatible with older versions of the HTTP protocol, both response headers are normally used together in development. When both are present, the HTTP 1.1 rule takes priority over the HTTP 1.0 rule.
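
The strong-cache decision can be sketched as a freshness check that prefers Cache-Control over Expires. This is a simplified illustration under stated assumptions: the is_fresh helper and its arguments are hypothetical, headers are passed as a plain dict, and only the directives listed above are considered.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def is_fresh(headers: dict, stored_at: datetime, now: datetime) -> bool:
    """Decide whether a stored response may be used without contacting the server."""
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value
    if "no-store" in directives or "no-cache" in directives:
        return False                       # must not be reused without validation
    if "max-age" in directives:            # HTTP 1.1 rule takes priority
        return now - stored_at < timedelta(seconds=int(directives["max-age"]))
    if "Expires" in headers:               # HTTP 1.0 fallback: absolute time
        return now < parsedate_to_datetime(headers["Expires"])
    return False

stored = datetime(2015, 10, 21, 7, 0, 0, tzinfo=timezone.utc)
both = {"Cache-Control": "max-age=60", "Expires": "Wed, 21 Oct 2015 07:28:00 GMT"}
fresh_after_30s = is_fresh(both, stored, stored + timedelta(seconds=30))
fresh_after_90s = is_fresh(both, stored, stored + timedelta(seconds=90))
```

In the second call the response is treated as stale even though Expires lies further in the future, because max-age is checked first.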


2.2 Negotiated Cache

The first time the browser requests data, the server returns a cache identifier along with the data, and the client stores both in the cache database. When it requests the data again, the client sends the stored cache identifier to the server. The server checks the identifier and, if the check succeeds, returns status code 304 to tell the client that its stored copy can be used.

HTTP 1.0

  • If-Modified-Since/Last-Modified: These two come as a pair and belong to the negotiated cache. The browser sends If-Modified-Since in the request header, and the server sends Last-Modified in the response header. If If-Modified-Since matches Last-Modified, the resource on the server has not changed, so the server returns only the headers, not the resource entity, notifying the browser that its local cache can be used. Last-Modified, as the name suggests, is the time the file was last modified, and it is only accurate to within 1s.

HTTP 1.1

  • If-None-Match/ETag: These two come as a pair and belong to the negotiated cache. The browser sends If-None-Match in the request header, and the server sends ETag in the response header. Likewise, if If-None-Match and ETag match after a request is made, the content has not changed, and the browser is told to use its local cache. Unlike Last-Modified, ETag is more accurate. It is something like a fingerprint, generated from file attributes (FileETag INode MTime Size). Whenever the file changes, the fingerprint changes, and it is not limited to 1s accuracy.



To make the caching policy more robust and flexible, the HTTP 1.0 and HTTP 1.1 caching strategies are used together, and the strong cache and the negotiated cache are also used together. For the strong cache, the server tells the browser a cache lifetime; within that lifetime, later requests use the cache directly, and once it has passed, the negotiated cache policy applies. For the negotiated cache, the ETag and Last-Modified values in the cache information are sent to the server through the If-None-Match and If-Modified-Since request headers; the server validates them and sets a new strong-cache lifetime. If status code 304 is returned, the browser uses the cache directly. If negotiation fails, the server resets the negotiated-cache identifiers.
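
The server side of the negotiation can be sketched as follows. Two simplifications are assumed here and are not taken from the text above: the fingerprint is a content hash rather than one derived from file attributes, and If-Modified-Since is compared by string equality rather than by date. The make_etag and respond helpers are hypothetical names.

```python
import hashlib

def make_etag(content: bytes) -> str:
    """Hypothetical fingerprint: a quoted, truncated hash of the content."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'

def respond(content: bytes, last_modified: str, request_headers: dict):
    """Return (status, headers, body); a 304 response carries headers only."""
    etag = make_etag(content)
    headers = {"ETag": etag, "Last-Modified": last_modified}
    if "If-None-Match" in request_headers:            # the ETag check comes first
        unchanged = request_headers["If-None-Match"] == etag
    elif "If-Modified-Since" in request_headers:
        unchanged = request_headers["If-Modified-Since"] == last_modified
    else:
        unchanged = False                             # first request: nothing to compare
    if unchanged:
        return 304, headers, b""
    return 200, headers, content

page = b"<h1>v1</h1>"
modified = "Wed, 21 Oct 2015 07:28:00 GMT"
first_status, first_headers, first_body = respond(page, modified, {})
second_status, _, second_body = respond(page, modified, {"If-None-Match": first_headers["ETag"]})
```

The first request returns 200 with the full body and the identifiers; the second, which echoes the ETag back, returns 304 with an empty body.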

Response with the Vary header

The Vary HTTP response header determines how the headers of subsequent requests are matched to decide whether a cached file can be used or a new resource must be requested.

When the cache server receives a request, it can only use a cached response if the headers of the current request and of the original (cached) request match for every header named in the Vary field of the cached response.

Using the Vary header helps when serving content dynamically. For example, with the Vary: User-Agent header, the cache server uses the UA to decide whether a cached page can be used. If you serve different content to mobile and desktop users, this avoids showing the wrong layout on the wrong kind of device. It also helps Google and other search engines discover the mobile version of a page and tells them that no cloaking is intended.
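
One way a cache might apply this rule is to fold the headers named in Vary into its lookup key, as in the sketch below. The cache_key helper, the sample URL, and the User-Agent strings are assumptions made up for the example.

```python
def cache_key(method: str, url: str, request_headers: dict, vary: str) -> tuple:
    """Build a lookup key from the URL plus the request headers named in Vary."""
    wanted = [name.strip().lower() for name in vary.split(",") if name.strip()]
    lowered = {k.lower(): v for k, v in request_headers.items()}
    selected = tuple((name, lowered.get(name, "")) for name in sorted(wanted))
    return (method, url, selected)

url = "http://www.example.com/"
mobile = cache_key("GET", url, {"User-Agent": "ExampleMobile/1.0"}, "User-Agent")
desktop = cache_key("GET", url, {"User-Agent": "ExampleDesktop/1.0"}, "User-Agent")
```

With Vary: User-Agent the two requests produce different keys, so the mobile and desktop pages are stored separately; a header that is not named in Vary has no effect on the key.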

