Definition: HyperText Transfer Protocol (HTTP). Hypertext is text marked up with links to other documents.

HTTP defines how a Web client requests a Web page from a Web server and how the server delivers the page to the client. It uses a request/response model. The client sends a request message to the server containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line (protocol version, a status code indicating success or error, and a reason phrase), followed by response headers, which may include server information, and the response data.
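As a rough illustration of the message layout described above, the sketch below builds a minimal HTTP/1.1 request and splits a response into its status line, headers, and body. The helper names `build_request` and `parse_response` and the host `example.com` are assumptions made for this example only.

```python
def build_request(method: str, path: str, host: str, body: str = "") -> str:
    """Assemble a request: request line, headers, blank line, optional body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body.encode())}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw: str):
    """Split a response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


request = build_request("GET", "/index.html", "example.com")
response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hello</h1>"
version, code, reason, headers, body = parse_response(response)
print(repr(request))
print(version, code, reason, headers, body)
```

Real clients and servers handle many more cases (chunked bodies, repeated headers, binary data); this only shows where each part of the message sits.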

HTTP is a stateless protocol. The protocol itself does not keep any communication state between requests and responses, so neither side can tell the other's current identity or state from the protocol alone. This is one of the main reasons cookies exist. To manage state on the client, the browser automatically stores cookies according to the Set-Cookie header field in the server's response, and then sends those cookies with every subsequent HTTP request to that server. The server uses them to identify the client and its state.
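The cookie round trip can be sketched with Python's standard `http.cookies` module. The cookie name `session_id` and its value are hypothetical; a real browser also applies domain, path, and expiry rules that are left out here.

```python
from http.cookies import SimpleCookie

# Value of a Set-Cookie response header (hypothetical session id).
set_cookie_value = "session_id=abc123; Path=/; HttpOnly"

# The browser stores the cookie together with its attributes...
jar = SimpleCookie()
jar.load(set_cookie_value)

# ...and sends only the name=value pairs back in the Cookie request header.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print("Cookie:", cookie_header)
```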

What happens between entering a URL and the page being displayed?

  • DNS resolution: resolves the domain name into an IP address;
  • TCP connection: the TCP three-way handshake;
  • The browser sends an HTTP request;
  • The server processes the request and returns an HTTP response message;
  • The browser parses the response and renders the page;
  • Disconnect: the TCP four-way wave.
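The steps above can be sketched with raw sockets. This is a simplified illustration, assuming a plain `http://` URL (no TLS) and skipping rendering entirely; the function names `target_from_url` and `fetch` are made up for this example.

```python
import socket
from urllib.parse import urlsplit


def target_from_url(url: str):
    """Extract the host, port and path the browser needs from a URL."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port, parts.path or "/"


def fetch(url: str) -> bytes:
    """Walk through the steps for a plain http:// URL (no TLS, no rendering)."""
    host, port, path = target_from_url(url)
    # 1. DNS resolution: domain name -> IP address.
    ip = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
    # 2. TCP connection: connecting performs the three-way handshake.
    with socket.create_connection((ip, port), timeout=5) as conn:
        # 3. Send the HTTP request.
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        # 4. Read the HTTP response until the server closes its side.
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    # 5. Leaving the with-block closes the socket (the four-way wave).
    #    Parsing and rendering the page is left to the browser.
    return b"".join(chunks)
```

Calling `fetch("http://example.com/")` on a machine with network access should return the raw response bytes, status line and headers included.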

URI

The HTTP protocol uses URIs to locate resources on the Internet:

  • URI (Uniform Resource Identifier)
  • URL (Uniform Resource Locator)
  • URN (Uniform Resource Name)
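To make the difference concrete, the sketch below splits one URL and one URN into their components with `urllib.parse.urlsplit`. Both identifiers are illustrative examples, not taken from the text.

```python
from urllib.parse import urlsplit

# A URL says where the resource is and how to reach it.
url = urlsplit("https://example.com:8443/docs/index.html?lang=en#intro")
print(url.scheme, url.hostname, url.port, url.path, url.query, url.fragment)

# A URN names the resource without saying where it lives.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme, repr(urn.netloc), urn.path)
```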

Common status codes:

1xx: The request has been received and is being processed (informational status code);

2xx: The request was processed successfully (success status code);

3xx: Additional action is required to complete the request (redirection status code);

4xx: The request is invalid, so the server cannot process it (client error status code);

5xx: The server failed to process a valid request (server error status code).

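304: Not Modified. The browser's cached copy of the resource is still valid, so the server returns no body and the browser reuses its cache;

400: Bad Request. Typically returned when parameter validation fails, for example when a parameter is missing or has the wrong type;

401: Unauthorized. The request lacks valid authentication credentials;

403: Forbidden. Access to the resource is prohibited (for example, if your IP is blacklisted);

502: Bad Gateway. A gateway or proxy received an invalid response from the back-end service, usually because that service is down or overloaded.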
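The class of a status code follows from its first digit, and Python's standard `http.HTTPStatus` enum carries the reason phrases. A small sketch (the helper `status_class` is an assumption for this example):

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map a status code to its class by the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (304, 400, 401, 403, 502):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```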

Request methods:

  • GET: generally used to obtain resources from the server. Browsers and Web servers limit the length of a URL, so GET requests have a size limit; the maximum length varies by browser and Web server;
  • POST: generally used to transmit an entity body;
  • PUT: used to transfer files, that is, to create or replace the resource at the target URL;
  • DELETE: used to delete files, that is, the resource at the target URL;
  • HEAD: used to obtain the message headers without the message body;
  • OPTIONS: used to ask which methods the resource at a URI supports.
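The difference between GET, HEAD, and OPTIONS can be observed against a throwaway server on the loopback interface. This is a minimal sketch built on the standard `http.server` and `http.client` modules; the handler, the `call` helper, and the response body are all made up for the demonstration.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello"


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):  # headers and body
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):  # same headers, no body
        self._send_headers()

    def do_OPTIONS(self):  # advertise the supported methods
        self.send_response(204)
        self.send_header("Allow", "GET, HEAD, OPTIONS")
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass


def call(method: str, port: int):
    """Send one request and return (status, Allow header, body)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Allow"), resp.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
get_result = call("GET", port)
head_result = call("HEAD", port)
options_result = call("OPTIONS", port)
server.shutdown()
server.server_close()
print(get_result, head_result, options_result)
```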

TCP three-way handshake

  • First handshake: the client sends a packet with the SYN flag to the server;
  • Second handshake: the server sends a packet with the SYN/ACK flags to the client;
  • Third handshake: the client sends a packet with the ACK flag to the server.
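The sequence and acknowledgment numbers exchanged in the three segments can be modeled without any networking. In this sketch the initial sequence numbers are arbitrary inputs and the function name is hypothetical; a real TCP stack chooses its own initial sequence numbers.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    # 1. The client proposes its initial sequence number.
    syn = ("client", "SYN", client_isn, None)
    # 2. The server acknowledges it (a SYN consumes one sequence number)
    #    and proposes its own initial sequence number.
    syn_ack = ("server", "SYN/ACK", server_isn, client_isn + 1)
    # 3. The client acknowledges the server's SYN.
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=100, server_isn=300)
for sender, flags, seq, ack in segments:
    print(f"{sender:6} {flags:7} seq={seq} ack={ack}")
```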

TCP four-way wave

  • The initiator sends a segment (FIN, ACK, Seq) to the passive party, indicating that it has no more data to send, and enters the FIN_WAIT_1 state;
  • The passive party sends a segment (ACK, Seq) acknowledging the close request. On receiving it, the initiator enters the FIN_WAIT_2 state;
  • The passive party sends a segment (FIN, ACK, Seq) to the initiator to request closing the connection, and enters the LAST_ACK state;
  • The initiator sends a segment (ACK, Seq) to the passive party and enters the TIME_WAIT state. The passive party closes the connection when it receives this segment. If the initiator receives nothing further during the waiting period, it closes the connection as well.
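The state changes on both sides can be replayed as a small table-driven simulation. This is only a model of a normal close; the passive party's CLOSE_WAIT state comes from the standard TCP state diagram rather than from the text above, and all names are illustrative.

```python
# Ordered events of a normal close: (party, event, state before, state after).
# CLOSE_WAIT is taken from the standard TCP state diagram.
CLOSE_SEQUENCE = [
    ("initiator", "send FIN", "ESTABLISHED", "FIN_WAIT_1"),
    ("passive", "recv FIN, send ACK", "ESTABLISHED", "CLOSE_WAIT"),
    ("initiator", "recv ACK", "FIN_WAIT_1", "FIN_WAIT_2"),
    ("passive", "send FIN", "CLOSE_WAIT", "LAST_ACK"),
    ("initiator", "recv FIN, send ACK", "FIN_WAIT_2", "TIME_WAIT"),
    ("passive", "recv ACK", "LAST_ACK", "CLOSED"),
    ("initiator", "TIME_WAIT timer expires", "TIME_WAIT", "CLOSED"),
]


def simulate_close():
    """Replay the sequence, checking each party is in the expected state."""
    state = {"initiator": "ESTABLISHED", "passive": "ESTABLISHED"}
    trace = []
    for party, event, before, after in CLOSE_SEQUENCE:
        if state[party] != before:
            raise ValueError(f"{party} is in {state[party]}, expected {before}")
        state[party] = after
        trace.append((party, event, after))
    return state, trace


final_state, trace = simulate_close()
for party, event, new_state in trace:
    print(f"{party:9} {event:24} -> {new_state}")
```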

HTTP cache

Strong cache (also called mandatory cache):

  1. If the requested data is already in the cache database and has not expired, the client takes it directly from the cache database. Otherwise, the client obtains it from the server.
  2. The server's response headers control this cache: Expires and Cache-Control;
  3. Expires is the expiration time of the data, as returned by the server. If a later request is made before that time, the cached data is used directly. Pitfalls: the server and client clocks may disagree, which leads to wrong cache hits. Expires is also a product of HTTP/1.0, so Cache-Control is now mostly used instead;
  4. Cache-Control has many directives, each with a different meaning:
  • private: only the client can cache;
  • public: both the client and proxy servers can cache;
  • max-age=t: the cached content expires after t seconds;
  • no-cache: the cached data must be validated with the server (negotiation cache) before it is used;
  • no-store: nothing is cached.
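How a client might interpret these directives can be sketched as follows. This is a simplified model that only looks at `max-age`, `no-cache`, and `no-store`; the function names are assumptions, and real caches apply more rules (for example, falling back to Expires).

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else True
    return directives


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """True if a stored response may be reused without contacting the server."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d:
        return False
    max_age = d.get("max-age")
    if not isinstance(max_age, str) or not max_age.isdigit():
        return False  # no usable max-age: fall back to asking the server
    return age_seconds < int(max_age)


print(parse_cache_control("public, max-age=3600"))
print(is_fresh("max-age=60", age_seconds=30))
print(is_fresh("no-cache, max-age=60", age_seconds=30))
```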

Negotiation cache:

  1. The client first gets an identifier for the cached data from the cache database, then asks the server whether that identifier is still valid. If it is, the server returns 304 and the client takes the requested data directly from the cache.
  2. Header fields used by the negotiation cache: Last-Modified and ETag;
  3. Last-Modified: when the server responds to a request, it tells the browser the resource's last modification time. The browser sends that time back in the If-Modified-Since request header on the next request, and the server compares it with the resource's current last modification time. If they match, the server returns 304 with response headers only, and the browser reads the content from its cache. Defects: 1. The time has one-second resolution, so if the resource changes again within the same second, the cache is still hit. 2. The last modification time may change even though the content has not, causing a needless cache miss;
  4. ETag: when the server responds to a request, this field gives the browser a unique identifier for the current resource, generated on the server from the file content. The identifier changes only when the file content changes. Defects: computing the ETag consumes server resources, which is a reason it is sometimes not used.
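The server side of ETag validation might look like the sketch below. The browser returns the stored ETag in the If-None-Match request header; hashing the content with SHA-256 is just one possible way to derive the identifier, and the function names are hypothetical.

```python
import hashlib
from typing import Optional


def make_etag(content: bytes) -> str:
    """Derive an ETag from the content (quoted, as the header syntax requires)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a possibly conditional GET."""
    etag = make_etag(content)
    headers = {"ETag": etag}
    if if_none_match == etag:
        return 304, headers, b""  # the client's copy is still valid
    return 200, headers, content


# First request: full response, the browser stores the body and the ETag.
status, headers, body = respond(b"version 1")
# Second request, content unchanged: 304 with an empty body.
revalidated = respond(b"version 1", if_none_match=headers["ETag"])
# Content changed on the server: full response with a new ETag.
changed = respond(b"version 2", if_none_match=headers["ETag"])
print(status, revalidated[0], changed[0])
```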

If both caching mechanisms are present, the strong cache takes precedence over the negotiation cache. If the strong cache hits, the data in the cache database is used directly and the negotiation cache is not consulted.

HTTP and HTTPS

  1. HTTPS requires a certificate issued by a CA (certificate authority). Many certificates must be paid for, although some CAs issue free ones;
  2. HTTP runs directly on top of TCP, and everything it transmits is clear text. HTTPS runs on top of SSL/TLS, which in turn runs on top of TCP, and everything it transmits is encrypted;
  3. HTTP and HTTPS establish connections differently and use different default ports: 80 for the former, 443 for the latter;
  4. HTTP is simple and stateless. HTTPS is built from HTTP plus SSL/TLS and provides encrypted transmission and identity authentication. It effectively prevents hijacking by network operators and is more secure than HTTP.
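On the client side, the certificate check is what separates an HTTPS connection from a merely encrypted one. As a small illustration, Python's standard `ssl` module enables both certificate and host name verification in its default client context; the `DEFAULT_PORTS` mapping is just a restatement of the ports listed above.

```python
import ssl

DEFAULT_PORTS = {"http": 80, "https": 443}

# Client-side TLS settings as Python's standard library configures them.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted CA...
verifies_certificate = context.verify_mode == ssl.CERT_REQUIRED
# ...and the certificate must match the host name being contacted.
checks_hostname = context.check_hostname

print(verifies_certificate, checks_hostname)
```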

Web Security defense:

  1. XSS: cross-site scripting exploits the fact that HTML can execute scripts such as <script>alert('a')</script>; the attacker tries to inject a script into the page. There are two kinds of XSS attack:
  • The script is injected into the page by modifying the URL in the browser. Chrome has provided built-in defense against this kind of attack.
  • The script is injected into the database through an input box. This must be defended against manually; whitelist filtering with the ‘XSS’ library is recommended.
  2. CSRF: cross-site request forgery exploits the Web's implicit authentication mechanism. Implicit authentication guarantees that a request comes from the user's browser, but not that the user approved it. CSRF is usually handled on the server, most commonly by requiring the request to carry verification information such as a verification code or a token.
  3. Clickjacking: a visual deception attack. The attacker embeds the target site in their own page through an iframe, makes the iframe transparent, and then lures the user into interacting with the page. Defense: this is handled on the back end with an HTTP response header, X-Frame-Options, which has three possible values:
  • deny: the page may not be displayed in a frame, not even in a page of the same domain;
  • sameorigin: the page may be displayed in a frame on a page of the same domain;
  • allow-from uri: the page may be displayed in a frame on a page from the specified origin;
  4. Man-in-the-middle attack: the attacker establishes a connection with both the server and the client and makes each believe the connection is secure, while in fact controlling the entire communication. The attacker can both read and modify the traffic between the two sides. At its core, the man-in-the-middle problem is one of authentication and trust between client and server. Defense: symmetric, asymmetric, and hybrid encryption alone do not prevent man-in-the-middle attacks, because the attacker can intercept the first key exchange without the client or server noticing. HTTPS is the ultimate defense: its certificate mechanism solves the trust problem between client and server and thereby effectively prevents the attack.
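Some of these defenses can be sketched with the Python standard library. Note that this example uses output escaping (`html.escape`) for XSS rather than the whitelist filtering library recommended above, and the function names and the choice of `SAMEORIGIN` are assumptions for illustration, not a complete security setup.

```python
import hmac
import html
import secrets


def render_comment(user_input: str) -> str:
    """Escape user input before placing it in HTML so it cannot run as script."""
    return f"<p>{html.escape(user_input)}</p>"


def new_csrf_token() -> str:
    """Random token the server stores in the session and embeds in its forms."""
    return secrets.token_urlsafe(32)


def csrf_ok(session_token: str, submitted_token: str) -> bool:
    """Accept the request only if the submitted token matches the session's."""
    return hmac.compare_digest(session_token, submitted_token)


# Response header that forbids framing by other origins (clickjacking defense).
SECURITY_HEADERS = {"X-Frame-Options": "SAMEORIGIN"}

print(render_comment("<script>alert('a')</script>"))
token = new_csrf_token()
print(csrf_ok(token, token), csrf_ok(token, "forged"))
```

A forged cross-site request cannot read the token from the victim's session, so it fails the `csrf_ok` check even though the browser attaches the user's cookies.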