Introduction

  • Hypertext Transfer Protocol (HTTP) is a simple request-response protocol that usually runs on top of TCP.
  • HTTP is an application layer protocol.
  • HTTP is a stateless protocol, meaning that the server does not retain any state from transactions with clients.
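Because HTTP/1.x is a text-based request-response protocol, a request is just lines of text written to a TCP connection, and because the server keeps no state, each request must carry everything the server needs. The sketch below builds a request and parses a response status line by hand. `buildRequest` and `parseStatusLine` are hypothetical helpers, and `example.com` and `/index.html` are placeholders.

```javascript
// Hypothetical helper: serialize an HTTP/1.1 request (request line, headers, blank line).
function buildRequest(method, path, headers) {
  const lines = [`${method} ${path} HTTP/1.1`];
  for (const [name, value] of Object.entries(headers)) {
    lines.push(`${name}: ${value}`);
  }
  // An empty line (CRLF CRLF) ends the header section; a GET has no body.
  return lines.join("\r\n") + "\r\n\r\n";
}

// Hypothetical helper: parse a response status line such as "HTTP/1.1 200 OK".
function parseStatusLine(line) {
  const match = /^HTTP\/(\d\.\d) (\d{3}) ?(.*)$/.exec(line);
  if (!match) throw new Error("Malformed status line: " + line);
  return { version: match[1], status: Number(match[2]), reason: match[3] };
}

// The server keeps no state, so every request repeats Host and other headers.
const rawRequest = buildRequest("GET", "/index.html", {
  Host: "example.com",
  Accept: "text/html",
});
console.log(JSON.stringify(rawRequest));
console.log(parseStatusLine("HTTP/1.1 200 OK"));
```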

Preface

  • The Internet protocol stack has five layers:
    1. Application layer: the main protocols include HTTP, FTP, Telnet, SMTP, and POP3.
    2. Transport layer: TCP and the User Datagram Protocol (UDP).
    3. Network layer: sends datagrams independently from the source host to the destination host, mainly handling routing, congestion control, and internetworking.
    4. Data link layer: encapsulates IP datagrams into frames suitable for transmission on the physical network and transmits them, or decapsulates frames received from the physical network and passes the IP datagrams up to the network layer.
    5. Physical layer: transmits bit streams between nodes, that is, handles the physical transmission. Protocols at this layer depend on both the link and the transmission medium.
  • The OSI model consists of seven layers:
    1. Application layer: network services used directly by applications, such as WWW servers, FTP servers, and other application software.
    2. Presentation layer: data syntax conversion for transmission, such as encoding, compression, and encryption.
    3. Session layer: establishes and manages the session between the two ends.
    4. Transport layer: error checking and recovery to ensure transmission quality; this is where TCP works (unit: segment).
    5. Network layer: provides the addressing scheme; this is where IP works (unit: packet).
    6. Data link layer: packages the raw bits of the physical layer into data frames (unit: frame).
    7. Physical layer: physical devices such as network cables, network adapters, and interfaces (unit: bit).

History

  1. HTTP/0.9 (1991)
  • The server closes the TCP connection as soon as it has sent the response.
  • HTTP/0.9 is a very simple protocol for exchanging information and can only transfer text.
  2. HTTP/1.0 (1996)
  • The main drawback of HTTP/1.0 is that only one request can be sent per TCP connection. Once the data is sent, the connection is closed, and requesting another resource requires a new connection. Creating a TCP connection is expensive, so HTTP/1.0 performs poorly, and the more external resources a web page loads, the worse the problem becomes.
  • Workaround: the Connection: keep-alive header asks both sides to reuse the TCP connection until the client or server explicitly closes it. However, this field was not standard and may behave inconsistently across implementations, so it was not a fundamental solution.
  3. HTTP/1.1 (1997)
  • It refines HTTP further by introducing pipelining, which lets a client send multiple requests over one connection without waiting for each response (the server must still return the responses in order).
  • Persistent connections are on by default, so Connection: keep-alive is no longer needed; either side sends Connection: close to close the connection. Currently, most browsers allow up to six persistent connections to the same domain name.
  4. HTTP/2 (2015)
  • It grew out of SPDY, a protocol developed by Google. It is called HTTP/2 rather than HTTP/2.0 because the standards committee does not plan to release minor versions.
  • Binary framing: HTTP/1.1 headers are always text (ASCII), while the body can be either text or binary. HTTP/2 is a fully binary protocol: headers and bodies are both binary and are carried in units called frames (HEADERS frames and DATA frames).
  • Multiplexing: HTTP/2 reuses one TCP connection for many requests. For example, within one TCP connection the server receives both request A and request B. It starts responding to A, sends the part of A's response that is ready, then responds to B, and afterwards sends the rest of A's response. This two-way, interleaved communication is called multiplexing.
  • Header compression: headers are compressed with HPACK, and the client and server each maintain a header index table so that repeated fields are sent as short index numbers instead of full text.
  • Data streams: every frame carries a stream identifier saying which stream (request/response pair) it belongs to. Either side can cancel a single stream while keeping the TCP connection open.
  • Server push: the server can proactively push static resources to the client before the client requests them.
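Binary framing and multiplexing both rest on a fixed 9-byte header in front of every HTTP/2 frame: a 24-bit payload length, an 8-bit type, 8 bits of flags, one reserved bit, and a 31-bit stream identifier. The stream identifier is what lets frames from different requests interleave, and a RST_STREAM frame on one stream is how a single request is cancelled. A minimal Node.js sketch of encoding and decoding that header follows. `encodeFrameHeader` and `decodeFrameHeader` are hypothetical names; a real implementation also needs HPACK, flow control, and the connection preface.

```javascript
// Frame type codes defined by the HTTP/2 specification.
const FRAME_TYPE = { DATA: 0x0, HEADERS: 0x1, RST_STREAM: 0x3 };

// Encode the fixed 9-byte header that precedes every frame's payload.
function encodeFrameHeader(length, type, flags, streamId) {
  const header = Buffer.alloc(9);
  header.writeUIntBE(length, 0, 3);               // 24-bit payload length
  header.writeUInt8(type, 3);                     // frame type
  header.writeUInt8(flags, 4);                    // flags, e.g. END_STREAM = 0x1
  header.writeUInt32BE(streamId & 0x7fffffff, 5); // reserved bit 0 + 31-bit stream id
  return header;
}

function decodeFrameHeader(header) {
  return {
    length: header.readUIntBE(0, 3),
    type: header.readUInt8(3),
    flags: header.readUInt8(4),
    streamId: header.readUInt32BE(5) & 0x7fffffff,
  };
}

// Multiplexing: DATA frames of streams 1 and 3 interleave on one connection.
const frames = [
  encodeFrameHeader(100, FRAME_TYPE.DATA, 0x0, 1),
  encodeFrameHeader(50, FRAME_TYPE.DATA, 0x1, 3), // END_STREAM: stream 3 is complete
  encodeFrameHeader(80, FRAME_TYPE.DATA, 0x1, 1), // END_STREAM: stream 1 is complete
];
const interleaved = frames.map((frame) => decodeFrameHeader(frame).streamId);
console.log(interleaved.join(", ")); // 1, 3, 1
```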

Related issues

Front-end optimization of HTTP

  • Reduce the static resource file size
  • Compress responses (Content-Encoding: gzip)
  • Use preloading: understand the difference between the preload and prefetch hints
  • Use defer and async on scripts wisely
  • Use caching
  • Use a CDN
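Content-Encoding: gzip pays off because text resources such as HTML, CSS, and JavaScript contain many repeated strings. A small Node.js sketch using the built-in `zlib` module shows the round trip a server and browser perform. The sample payload is made up, and a real server should only compress when the request's Accept-Encoding header lists gzip.

```javascript
const zlib = require("zlib");

// Made-up payload with the kind of repetition typical of HTML markup.
const htmlBody = '<li class="item">hello</li>\n'.repeat(200);

// Server side: compress before sending with `Content-Encoding: gzip`.
const gzipped = zlib.gzipSync(Buffer.from(htmlBody, "utf8"));

// Browser side: decompress transparently after receiving the response.
const unzipped = zlib.gunzipSync(gzipped).toString("utf8");

console.log(`original ${Buffer.byteLength(htmlBody)} bytes -> gzip ${gzipped.length} bytes`);
```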

Why does TCP need three handshakes, not two?

  • First handshake: the client sends a SYN packet and the server receives it. The server can conclude that the client's sending capability and its own receiving capability are normal.
  • Second handshake: the server sends a SYN+ACK packet and the client receives it. The client can conclude that both its own and the server's sending and receiving capabilities are normal. However, the server cannot yet confirm that the client's receiving capability is normal.
  • Third handshake: the client sends an ACK packet and the server receives it. The server can now conclude that the client's receiving and sending capabilities are normal, as are its own.
  • Therefore, three handshakes are required for both parties to confirm that sending and receiving work in both directions.
  • What goes wrong with only two handshakes:

For example, the client sends a connection request that is not lost but is delayed at some network node for a long time. The client times out and retransmits the request; this copy reaches the server and is confirmed, a connection is established, data is transferred, and the connection is released. Later, the delayed first request finally reaches the server. The server mistakes it for a new connection request from the client and sends back a confirmation agreeing to connect. Without a three-way handshake, the new connection would be established as soon as the server sends that confirmation. The client, which never requested this connection, ignores the confirmation and sends no data, while the server keeps waiting for data that never comes, wasting resources.

  • After the first handshake, the server places the connection in the half-open (SYN) queue; connections that have completed the three-way handshake move to the fully established (accept) queue.
  • The third handshake can carry data; the first and second handshakes cannot.
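The three handshakes can be traced with sequence numbers: each side proves it received the other's SYN by acknowledging that side's initial sequence number plus one. The sketch below is a simplified simulation, not a TCP implementation; `threeWayHandshake` is a hypothetical helper and the ISNs passed to it are arbitrary.

```javascript
const SEQ_SPACE = 2 ** 32; // TCP sequence numbers wrap around modulo 2^32

// Hypothetical simulation of the segments exchanged during connection setup.
function threeWayHandshake(clientIsn, serverIsn) {
  // 1st: client -> server, SYN. Client state: SYN_SENT.
  const syn = { from: "client", flags: "SYN", seq: clientIsn };
  // 2nd: server -> client, SYN+ACK. Server state: SYN_RCVD (half-open queue).
  const synAck = { from: "server", flags: "SYN+ACK", seq: serverIsn, ack: (syn.seq + 1) % SEQ_SPACE };
  // The client checks that the server acknowledged its own SYN.
  if (synAck.ack !== (clientIsn + 1) % SEQ_SPACE) throw new Error("client: bad SYN+ACK");
  // 3rd: client -> server, ACK. Client state: ESTABLISHED.
  const ack = {
    from: "client",
    flags: "ACK",
    seq: (clientIsn + 1) % SEQ_SPACE,
    ack: (synAck.seq + 1) % SEQ_SPACE,
  };
  // Only now does the server know the client received its SYN. Server state: ESTABLISHED.
  if (ack.ack !== (serverIsn + 1) % SEQ_SPACE) throw new Error("server: bad ACK");
  return [syn, synAck, ack];
}

for (const segment of threeWayHandshake(1000, 5000)) console.log(segment);
```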

ISN (Initial Sequence Number)

  • When one end sends its SYN to establish a connection, it selects an initial sequence number for the connection. The ISN changes over time, so each connection gets a different ISN. RFC 793 describes the ISN as a 32-bit counter incremented by 1 roughly every 4 microseconds. The purpose is to prevent a segment delayed in the network from being delivered later and misinterpreted by one of the parties as part of a newer connection.
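A sketch of ISN selection in Node.js: `isnClock` is the 32-bit counter that ticks every 4 microseconds, and, following the later RFC 6528 recommendation, a keyed hash of the connection's address/port 4-tuple is added so attackers cannot predict ISNs. The function names, the secret, and the choice of HMAC-SHA256 as the keyed hash are assumptions for illustration; operating systems implement this inside the kernel.

```javascript
const { createHmac } = require("crypto");

const ISN_SPACE = 2 ** 32;

// M: 32-bit counter incremented once every 4 microseconds.
function isnClock(elapsedMicroseconds) {
  return Math.floor(elapsedMicroseconds / 4) % ISN_SPACE;
}

// F: keyed hash of the 4-tuple (assumed HMAC-SHA256), giving each connection an unpredictable offset.
function isnOffset(secret, fourTuple) {
  return createHmac("sha256", secret).update(fourTuple).digest().readUInt32BE(0);
}

// ISN = (M + F) mod 2^32
function initialSequenceNumber(elapsedMicroseconds, secret, fourTuple) {
  return (isnClock(elapsedMicroseconds) + isnOffset(secret, fourTuple)) % ISN_SPACE;
}

const secretKey = "hypothetical-boot-secret";
const tuple = "192.0.2.1:51000->198.51.100.2:443";
console.log(initialSequenceNumber(0, secretKey, tuple));
console.log(initialSequenceNumber(4000, secretKey, tuple)); // 1000 ticks later
```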

What is a SYN attack?

  • The server allocates resources during the second handshake, whereas the client allocates resources only when the third handshake completes, so the server is vulnerable to SYN flooding. In a SYN attack, the client forges a large number of nonexistent IP addresses and sends SYN packets to the server. The server replies with SYN+ACK packets and waits for confirmations that never arrive. Because the source addresses do not exist, the server keeps retransmitting until it times out. These forged SYN packets occupy the half-open queue for a long time, so normal SYN requests are discarded because the queue is full, causing network congestion or even system breakdown. A SYN attack is a typical DoS/DDoS attack.
  • SYN attacks are easy to detect. A large number of half-open connections on the server, especially from random source IP addresses, is a strong sign of a SYN attack. On Linux/Unix, you can use the netstat command to detect it:
  • netstat -nt | grep SYN_RECV

Common SYN attack defense methods are as follows:

  • Shorten the SYN timeout
  • Increase the maximum number of half-open connections (the SYN queue size)
  • Protect the server with a filtering gateway
  • Use SYN cookies
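The attack works because half-open entries pile up faster than they time out. The hypothetical `SynQueue` class below models a server's half-open queue with a fixed capacity and timeout; the capacity, timeout, and addresses are made up, and real kernels add SYN+ACK retransmission and SYN cookies on top of this.

```javascript
// Hypothetical model of a server's half-open (SYN) queue.
class SynQueue {
  constructor(capacity, timeoutMs) {
    this.capacity = capacity;
    this.timeoutMs = timeoutMs;
    this.pending = new Map(); // client address -> time its SYN arrived
  }

  // Evict half-open entries whose handshake did not complete in time.
  expire(nowMs) {
    for (const [address, arrivedAt] of this.pending) {
      if (nowMs - arrivedAt >= this.timeoutMs) this.pending.delete(address);
    }
  }

  // First handshake: queue the SYN and reply SYN+ACK, or drop it if the queue is full.
  onSyn(address, nowMs) {
    this.expire(nowMs);
    if (this.pending.size >= this.capacity) return false;
    this.pending.set(address, nowMs);
    return true;
  }

  // Third handshake: the connection leaves the half-open queue.
  onAck(address) {
    return this.pending.delete(address);
  }
}

// Forged SYNs from made-up addresses never send the final ACK.
const queue = new SynQueue(128, 30000);
for (let i = 0; i < 128; i++) queue.onSyn(`10.0.0.${i}`, 0);
console.log(queue.onSyn("203.0.113.7", 1000));  // false: a genuine client is dropped
console.log(queue.onSyn("203.0.113.7", 30000)); // true: the forged entries timed out
```

Shortening `timeoutMs` frees the queue sooner and enlarging `capacity` raises the bar for the attacker, which correspond to the first two defenses in the list.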

Why do you need four waves?

  • When establishing a connection, the server receives the client's SYN and can reply with SYN+ACK in a single packet: the ACK acknowledges the request and the SYN synchronizes the server's sequence number. When closing, however, a server that receives a FIN may still have data to send, so it cannot close its socket immediately. It can only reply with an ACK first, telling the client "I received your FIN," and it sends its own FIN only after all of its remaining data has been sent. Because the ACK and the FIN cannot be sent together, four waves are needed.
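The ACK and FIN stay separate because the server may still send data in between, while the connection is half-closed. A hypothetical trace generator makes that gap visible; the TCP states are noted in the comments, and the chunk names are placeholders.

```javascript
// Hypothetical trace of a close initiated by the client.
function fourWayClose(pendingServerChunks) {
  const trace = [];
  // 1st wave: the client has nothing more to send. Client: FIN_WAIT_1.
  trace.push("client -> server: FIN");
  // 2nd wave: the server acknowledges at once. Server: CLOSE_WAIT; client: FIN_WAIT_2.
  trace.push("server -> client: ACK");
  // Half-closed: the server may keep sending its remaining data.
  for (const chunk of pendingServerChunks) {
    trace.push(`server -> client: DATA ${chunk}`);
  }
  // 3rd wave: the server has finished sending. Server: LAST_ACK.
  trace.push("server -> client: FIN");
  // 4th wave: the client acknowledges. Client: TIME_WAIT, then CLOSED; server: CLOSED.
  trace.push("client -> server: ACK");
  return trace;
}

console.log(fourWayClose(["part-1", "part-2"]).join("\n"));
```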

CSRF

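  • In a CSRF (cross-site request forgery) attack, a hacker exploits the user's logged-in state: a third-party site makes the user's browser send requests to the target site, and because the browser attaches the user's cookies automatically, the target site performs the harmful action as if the user had requested it.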

HTTPS process

  • HTTPS is HTTP running over SSL/TLS.
  • HTTPS combines symmetric and asymmetric encryption to achieve both security and efficiency.
  • Data is transmitted with symmetric encryption, which requires a key generated by the client. To transmit this key to the server securely, it is encrypted with asymmetric encryption.
  • In short, the data is encrypted symmetrically, and the symmetric key itself is exchanged using asymmetric encryption.
  • Why use symmetric encryption for the data?
    • Asymmetric encryption is based on hard mathematical problems involving large numbers, such as factoring products of large primes or elliptic curves, so it is computationally expensive and slow.
    • Besides being slow, it needs longer keys: a symmetric key is much shorter than an asymmetric key of the same strength.
    • Symmetric keys are typically 128 or 256 bits, whereas RSA keys are typically 2048 bits; elliptic-curve keys are shorter than RSA keys.
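The hybrid scheme can be sketched with Node.js's built-in `crypto` module: an RSA key pair stands in for the server's certificate key, the client wraps a random AES-256 session key with RSA-OAEP, and both sides then use AES-256-GCM for the data. This mirrors the RSA key exchange of older TLS versions; TLS 1.3 instead agrees on the session key with an (EC)DHE exchange and uses the certificate key only for signatures. `sealRecord` and `openRecord` are hypothetical names.

```javascript
const crypto = require("crypto");

// Server: long-term asymmetric key pair (the public key would be in its certificate).
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", { modulusLength: 2048 });
const oaep = { padding: crypto.constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" };

// Client: random 256-bit symmetric session key, wrapped with the server's public key.
const sessionKey = crypto.randomBytes(32);
const wrappedKey = crypto.publicEncrypt({ key: publicKey, ...oaep }, sessionKey);

// Server: unwrap the session key with its private key (slow, but done only once).
const serverSessionKey = crypto.privateDecrypt({ key: privateKey, ...oaep }, wrappedKey);

// Both sides: fast symmetric encryption of the actual data with AES-256-GCM.
function sealRecord(key, plaintext) {
  const iv = crypto.randomBytes(12); // unique nonce per message
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function openRecord(key, { iv, data, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // decryption throws if the data was tampered with
  return Buffer.concat([decipher.update(data), decipher.final()]).toString("utf8");
}

const record = sealRecord(sessionKey, "GET /account HTTP/1.1");
console.log(openRecord(serverSessionKey, record)); // GET /account HTTP/1.1
```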

Differences between HTTP and HTTPS

  • Data transmitted over HTTP is unencrypted, that is, plain text, so transmitting private information over HTTP is very insecure. To encrypt such data in transit, Netscape designed the Secure Sockets Layer (SSL) protocol to encrypt HTTP traffic, giving birth to HTTPS. Put simply, HTTPS is a protocol built from SSL and HTTP that encrypts transmission and authenticates identities, making it more secure than HTTP. The main differences between HTTPS and HTTP are:
    1. In short: HTTPS = SSL + HTTP.
    2. HTTPS requires a certificate issued by a CA. Many certificates cost money, although free CAs such as Let's Encrypt are now widely used.
    3. HTTP transmits hypertext in plain text; HTTPS transmits it over a secure SSL-encrypted connection.
    4. HTTP and HTTPS use different connections and different default ports: 80 for HTTP and 443 for HTTPS. (Only the defaults differ; the port can be changed.)
    5. HTTP connections are simple and stateless; HTTPS adds encryption and identity authentication on top of HTTP.
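The difference in default ports shows up in URL handling: when no port is written, the scheme decides it. A small sketch using the standard WHATWG `URL` class (a global in Node.js and browsers); `effectivePort` is a hypothetical helper.

```javascript
// Default ports for each scheme; an explicit port in the URL overrides them.
const DEFAULT_PORTS = { "http:": 80, "https:": 443 };

function effectivePort(urlString) {
  const url = new URL(urlString);
  // URL.port is "" when the port is omitted or equals the scheme's default.
  return url.port ? Number(url.port) : DEFAULT_PORTS[url.protocol];
}

console.log(effectivePort("http://example.com/"));       // 80
console.log(effectivePort("https://example.com/"));      // 443
console.log(effectivePort("https://example.com:8443/")); // 8443
```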

HTTP headers

  1. General headers:
  • Cache-Control: controls caching ✨
  • Connection: connection management; lists hop-by-hop headers ✨
  • Upgrade: upgrades to another protocol
  • Via: information about proxy servers along the path
  • Warning: error and warning notifications
  • Transfer-Encoding: transfer encoding of the message body ✨
  • Trailer: lists the headers that appear at the end of the message body
  • Pragma: message directives (HTTP/1.0 legacy)
  • Date: date and time the message was created
  2. Request headers:
  • Accept: media types the client or user agent can handle ✨
  • Accept-Encoding: content encodings the client can handle, in order of preference
  • Accept-Language: natural languages the client prefers
  • Accept-Charset: character sets the client prefers
  • If-Match: compares entity tags (ETag) ✨
  • If-None-Match: compares entity tags (ETag); the opposite of If-Match ✨
  • If-Modified-Since: compares the resource's update time (Last-Modified) ✨
  • If-Unmodified-Since: compares the resource's update time (Last-Modified); the opposite of If-Modified-Since ✨
  • If-Range: sends a byte-range request only if the resource has not been updated
  • Range: requests a byte range of the entity ✨
  • Authorization: web authentication credentials ✨
  • Proxy-Authorization: credentials required by a proxy server
  • Host: the server hosting the requested resource ✨
  • From: the user's email address
  • User-Agent: information about the client program ✨
  • Max-Forwards: maximum number of hops
  • TE: preferred transfer encodings
  • Referer: URL of the page from which the request originated
  • Expect: expects specific behavior from the server
  3. Response headers:
  • Accept-Ranges: whether the server accepts byte-range requests
  • Age: how long ago the response was created, in seconds
  • Location: the URI the client is redirected to ✨
  • Vary: request headers that caches (including proxies) must take into account
  • ETag: a string that uniquely identifies a version of the resource ✨
  • WWW-Authenticate: the server requests authentication information from the client
  • Proxy-Authenticate: a proxy server requests authentication information from the client
  • Server: information about the server software ✨
  • Retry-After: used with status code 503 to tell the client when to retry
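The Range, Accept-Ranges, and If-Range headers drive partial downloads and resumable transfers. Below is a sketch of how a server might interpret a single-range Range header and build the matching Content-Range value for a 206 Partial Content response. Multi-range requests are not handled, `parseRange` and `contentRange` are hypothetical names, and a `null` result is where a server would answer 416 Range Not Satisfiable.

```javascript
// Parse a single-range header such as "bytes=0-499", "bytes=500-" or "bytes=-200"
// for a resource of `size` bytes. Returns { start, end } (inclusive) or null.
function parseRange(header, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match || (match[1] === "" && match[2] === "")) return null;
  let start;
  let end;
  if (match[1] === "") {
    // Suffix range: the last N bytes.
    const suffixLength = Number(match[2]);
    if (suffixLength === 0) return null;
    start = Math.max(size - suffixLength, 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    // An open-ended or oversized end is clamped to the last byte.
    end = match[2] === "" ? size - 1 : Math.min(Number(match[2]), size - 1);
  }
  if (start >= size || start > end) return null; // would be 416 Range Not Satisfiable
  return { start, end };
}

// Value of the Content-Range header in a 206 Partial Content response.
function contentRange({ start, end }, size) {
  return `bytes ${start}-${end}/${size}`;
}

const range = parseRange("bytes=0-499", 1000);
console.log(range, contentRange(range, 1000)); // { start: 0, end: 499 } bytes 0-499/1000
```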

From entering a URL to displaying the page

  1. Enter the URL in the browser's address bar. The browser parses it and checks whether it is valid.
  2. The browser checks its own cache for the page. If a usable copy is cached, the page content is displayed directly; if not, go to step 3.
  3. Before sending an HTTP request, the browser resolves the domain name (DNS resolution) to obtain the corresponding IP address, checking caches in this order:
    • Browser cache: the browser keeps DNS records for some time, so it is the first place a DNS request is resolved;
    • Operating system cache: if the browser cache has no record, the browser asks the operating system, which keeps a cache of recent DNS queries;
    • Router cache: if the previous two steps fail, the router's cache is searched;
    • ISP cache: if all of the above fail, the ISP's DNS server is queried.
  4. The browser opens a TCP connection to the server with the three-way handshake (for HTTPS, a TLS handshake follows).
  5. After the handshake succeeds, the browser sends an HTTP request to the server.
  6. The server processes the request and returns data to the browser.
  7. The browser receives the HTTP response.
  8. The browser decodes the response and, if it is cacheable, stores it in the cache.
  9. The browser requests the resources embedded in the HTML (CSS, JavaScript, images, audio, and so on).
  10. The page may send further asynchronous (Ajax) requests.
  11. The page is fully rendered.
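The DNS cache lookup order in the steps above can be modeled as a chain of caches checked in turn, with a full lookup only when every tier misses. This is a simplified sketch: `makeTiers`, `resolveHost`, the TTL, and the stubbed lookup returning the documentation address 192.0.2.10 are all hypothetical, and real resolvers honor the TTL carried in each DNS record.

```javascript
// Hypothetical cache tiers, checked in the order described above.
function makeTiers() {
  return ["browser", "operating system", "router", "ISP"].map((name) => ({ name, cache: new Map() }));
}

// Return the first unexpired cached answer; otherwise do a full lookup and cache it.
function resolveHost(hostname, tiers, nowMs, lookup, ttlMs) {
  for (const tier of tiers) {
    const entry = tier.cache.get(hostname);
    if (entry && entry.expiresAt > nowMs) return { ip: entry.ip, source: tier.name };
  }
  const ip = lookup(hostname);
  for (const tier of tiers) tier.cache.set(hostname, { ip, expiresAt: nowMs + ttlMs });
  return { ip, source: "full lookup" };
}

const tiers = makeTiers();
const stubLookup = () => "192.0.2.10"; // stands in for a recursive DNS query
console.log(resolveHost("example.com", tiers, 0, stubLookup, 60000));    // source: "full lookup"
console.log(resolveHost("example.com", tiers, 1000, stubLookup, 60000)); // source: "browser"
```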

Browser cache locations, in order of priority:

  • Service Worker
    • Service Worker is a relatively new web technology. Drawing on the Event Page mechanism of Chrome Packaged Apps and the lessons of the failed HTML5 AppCache standard, the Chromium team proposed a new W3C specification to improve the offline caching capability of web apps and narrow the gap between web apps and native apps. A Service Worker is a background thread that serves the front-end page and is built on Web Workers. It has its own JavaScript environment and quietly performs background tasks on behalf of the page, such as intercepting and handling network requests, push messages, silent updates, and event synchronization.
    • Feature detection uses 'serviceWorker' in navigator. A service worker cannot access the DOM.
    • Service workers require HTTPS.
    • Advantages and application scenarios
      1. Offline cache: resources in H5 applications that never or rarely change can be stored on the client for a long time, improving loading speed, reducing traffic, and easing server load. For example, medium and large H5 games, web news clients whose framework is separate from their data, and web mail clients.
      2. Message push: re-engage dormant users and push instant messages, announcements, notifications, and update prompts. For example, web news clients, web instant messaging tools, H5 games, and other operations-driven products.
      3. Event synchronization: ensures that tasks started on the web page complete even after the user closes the page. For example, web mail clients and web instant messaging tools.
      4. Periodic synchronization: periodically triggers a sync event in the Service Worker script to refresh the cache in advance. For example, web news clients.
  • Memory Cache
    • Memory cache: stored in memory, fast to read but short-lived; it is released when the tab is closed.
    • It ignores the value of the HTTP Cache-Control header.
  • Disk Cache
    • A cache stored on disk is slower to read than the memory cache, but it has more capacity and keeps entries longer; almost anything can be stored on disk.
    • It has the broadest coverage. It uses HTTP header fields to decide which resources to cache, which can be used without a request, and which have expired and must be requested again. Historically, even across sites, a resource cached on disk under the same URL was not requested again; current Chrome and Firefox partition the HTTP cache by top-level site, so this cross-site reuse no longer happens.
  • Push Cache
    • Push Cache belongs to HTTP/2 server push and is used only when the three caches above all miss. It lives only for the session, is released when the session ends, has a short cache time (about 5 minutes in Chrome), and does not strictly follow the caching directives in HTTP headers.
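The priority order amounts to checking each location in turn and going to the network only when all four miss. A toy model with hypothetical `Map` stores and a hypothetical `findCached` helper; it deliberately ignores freshness rules such as Cache-Control.

```javascript
// Hypothetical model: each cache location is a named Map from URL to body.
function findCached(url, locations) {
  for (const location of locations) {
    if (location.store.has(url)) return { from: location.name, body: location.store.get(url) };
  }
  return { from: "network", body: null }; // every cache missed: send a real request
}

const locations = [
  { name: "Service Worker", store: new Map() },
  { name: "Memory Cache", store: new Map([["/app.js", "app.js from memory"]]) },
  { name: "Disk Cache", store: new Map([["/app.js", "app.js from disk"], ["/logo.png", "logo from disk"]]) },
  { name: "Push Cache", store: new Map() },
];

console.log(findCached("/app.js", locations).from);   // Memory Cache
console.log(findCached("/logo.png", locations).from); // Disk Cache
console.log(findCached("/api/data", locations).from); // network
```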

Browser Cache policy

  • Browser caching uses two policies, strong caching and negotiated caching, both controlled by HTTP headers.
  • How a browser request works: each time the browser initiates a request, it first looks in the browser cache for the result and cache identifier of that request. Each time it receives a response, it stores the result and its cache identifier in the browser cache.
  • Strong cache:
    1. No request is sent to the server; the resource is read directly from the cache. In the Network panel of Chrome DevTools, such a request shows status code 200, and the Size column shows "(disk cache)" or "(memory cache)". Strong caching is controlled by two HTTP headers: Expires and Cache-Control.
    2. Expires comes from HTTP/1.0 and Cache-Control from HTTP/1.1. If both are present, Cache-Control takes precedence over Expires. Expires is still useful where HTTP/1.1 is not supported, so today it mainly exists for compatibility. A strong cache decides whether to use the cached copy based only on whether a certain time or period has elapsed, not on whether the file on the server has changed, so the browser may load content that is no longer the latest on the server. To find out whether the server's content has changed, a negotiated cache strategy is needed.
  • Negotiated cache:
    1. After the strong cache expires, the browser sends a request to the server carrying the cache identifier, and the server decides from that identifier whether the cache can be used. If the negotiated cache is valid, the server returns 304 Not Modified; if not, it returns 200 with the new content. Negotiated caching is controlled by two HTTP headers: Last-Modified and ETag.
    2. Last-Modified / If-Modified-Since has two drawbacks. First, if the file is opened locally, its Last-Modified changes even though the content is unchanged, so the server misses the cache and resends the same resource. Second, Last-Modified has a resolution of one second, so if the file is modified within a time too short to detect, the server still treats the cache as a hit and does not return the updated resource.
    3. ETag / If-None-Match: ETag is a unique identifier for the current version of the resource, generated by the server and returned in its response. The ETag is regenerated whenever the resource changes.
    4. Comparing the two:
      1. ETag is more accurate than Last-Modified.
      2. Last-Modified is measured in seconds. If a file changes several times within one second, its Last-Modified value does not change, but the ETag changes every time, ensuring accuracy. Also, behind a load balancer, each server may generate a different Last-Modified for the same file.
      3. In performance, ETag is worse than Last-Modified, because Last-Modified only records a time, whereas ETag requires the server to compute a hash.
      4. When both are present, the server checks the ETag first.
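The strong-cache rule (Cache-Control beats Expires; no lifetime means revalidate) can be written as a freshness check. A minimal sketch assuming lowercase header names; `parseCacheControl` and `isFresh` are hypothetical helpers, and the Age header and browsers' heuristic freshness are deliberately ignored.

```javascript
// Parse "max-age=60, no-cache" into { "max-age": "60", "no-cache": true }.
function parseCacheControl(value) {
  const directives = {};
  for (const part of (value || "").split(",")) {
    const [name, arg] = part.trim().toLowerCase().split("=");
    if (name) directives[name] = arg === undefined ? true : arg;
  }
  return directives;
}

// Strong-cache check: can the stored response be used without contacting the server?
function isFresh(headers, storedAtMs, nowMs) {
  const cc = parseCacheControl(headers["cache-control"]);
  if (cc["no-cache"] || cc["no-store"]) return false; // always revalidate / never reuse
  if (cc["max-age"] !== undefined) {
    // Cache-Control (HTTP/1.1) takes precedence over Expires.
    return (nowMs - storedAtMs) / 1000 < Number(cc["max-age"]);
  }
  if (headers["expires"] !== undefined) {
    // Expires (HTTP/1.0) is an absolute date compared with the local clock.
    return nowMs < Date.parse(headers["expires"]);
  }
  return false; // no explicit lifetime: this sketch falls back to the negotiated cache
}

console.log(isFresh({ "cache-control": "max-age=3600" }, Date.now(), Date.now())); // true
```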

Browser caching mechanism

  • Strong caching takes precedence over negotiated caching. If the strong cache (Expires and Cache-Control) is valid, the cached copy is used directly. If not, negotiated caching (Last-Modified/If-Modified-Since and ETag/If-None-Match) takes over, and the server decides whether the cache can be used. If the negotiated cache is invalid, the server returns 200 with the resource and a new cache identifier, which the browser stores in its cache; if it is valid, the server returns 304 and the browser keeps using the cached copy.
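On the server side, the negotiated cache comes down to comparing the request's validators with the current resource and answering 304 or 200, checking If-None-Match before If-Modified-Since. A Node.js sketch under simplifying assumptions: the ETag is a SHA-1 hash of the body (one possible choice), header names are lowercase, weak `W/` tags are not handled, and `makeValidators`/`conditionalStatus` are hypothetical names.

```javascript
const { createHash } = require("crypto");

// Hypothetical validators for a resource: a hash-based ETag and a Last-Modified date.
function makeValidators(body, lastModifiedMs) {
  return {
    etag: `"${createHash("sha1").update(body).digest("hex")}"`,
    lastModified: new Date(lastModifiedMs).toUTCString(), // one-second resolution
  };
}

// Server side of the negotiated cache: 304 if the client's copy is still valid, else 200.
function conditionalStatus(requestHeaders, validators) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  if (ifNoneMatch !== undefined) {
    // ETag wins: If-Modified-Since is ignored when If-None-Match is present.
    const tags = ifNoneMatch.split(",").map((tag) => tag.trim());
    return tags.includes("*") || tags.includes(validators.etag) ? 304 : 200;
  }
  const ifModifiedSince = requestHeaders["if-modified-since"];
  if (ifModifiedSince !== undefined) {
    const unchanged = Date.parse(validators.lastModified) <= Date.parse(ifModifiedSince);
    return unchanged ? 304 : 200;
  }
  return 200; // unconditional request: send the full resource
}

const v1 = makeValidators("body v1", Date.UTC(2024, 0, 1));
console.log(conditionalStatus({ "if-none-match": v1.etag }, v1));           // 304
console.log(conditionalStatus({ "if-modified-since": v1.lastModified }, v1)); // 304
```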

Browser cache application

  • Frequently changing resources: Cache-Control: no-cache
  • Resources that rarely change: Cache-Control: max-age=31536000 (one year). Files such as jquery-3.3.1.min.js and lodash.min.js are typically served this way.
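A server or build tool can apply these two policies by file name. The sketch below is one hypothetical rule set: `cacheControlFor` is not a real library function, and it assumes the URL changes whenever the content does, which is why a version number or content hash in the file name triggers the long max-age.

```javascript
// Hypothetical policy mapping a request path to a Cache-Control value.
function cacheControlFor(path) {
  // A version number or content hash in the file name means the URL changes
  // whenever the content does, so the file can be cached for a year.
  const versioned = /-\d+\.\d+\.\d+(\.min)?\.(js|css)$/.test(path);
  const hashed = /\.[0-9a-f]{8,}\.(js|css)$/.test(path);
  if (versioned || hashed) return "max-age=31536000";
  // Everything else (HTML entry pages, data) may change at the same URL:
  // the browser may store it but must revalidate before each use.
  return "no-cache";
}

console.log(cacheControlFor("/js/jquery-3.3.1.min.js")); // max-age=31536000
console.log(cacheControlFor("/index.html"));             // no-cache
```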

Browser caching and user behavior

  • User actions affect which cache policies the browser applies. There are three main cases:
    1. Opening a page by entering its address in the address bar: the browser checks the disk cache for a match and uses it if found; otherwise it sends a network request.
    2. Normal refresh (F5): because the tab has not been closed, the memory cache is available and is used first (if there is a match), followed by the disk cache.
    3. Forced refresh (Ctrl + F5): the browser bypasses the cache, sending requests with Cache-Control: no-cache (plus Pragma: no-cache for compatibility) in the headers, and the server returns 200 with the latest content.

HTTP status codes

  1. 2xx Success
  • 200 (OK): the request succeeded, and the requested response headers or body are returned with this response.
  • 201 (Created): the request has been fulfilled and a new resource has been created; its URI is returned in the Location header.
  • 202 (Accepted): the server has accepted the request but has not yet processed it.
  2. 3xx Redirection
  • 301 (Moved Permanently): the requested page has been permanently moved to a new location. When the server returns this response to a GET or HEAD request, the requester is automatically forwarded to the new location.
  • 302 (Found): the server is currently responding with a page from a different location, but the requester should continue to use the original location for future requests.
  • 303 (See Other): the server returns this code when the requester should use a separate GET request to a different location to retrieve the response.
  • 304 (Not Modified): the requested page has not been modified since the last request. The server returns no page content with this response.
  • 305 (Use Proxy): the requested page can only be accessed through a proxy, and the response indicates that the requester should use one.
  • 307 (Temporary Redirect): the server is currently responding with a page from a different location, but the requester should continue to use the original location for future requests.
    • 302 comes from HTTP/1.0. HTTP/1.1 added 303 and 307 to make the meaning of 302 precise. 303 explicitly tells the client to fetch the resource with GET, turning a POST into a GET for the redirect. 307 requires the client to keep the original method, so a POST is not changed to a GET.
  3. 4xx Client error
  • 401 (Unauthorized): the request requires user authentication. If the request already included Authorization credentials, the 401 response means the server rejected those credentials.
  • 403 (Forbidden): the server understood the request but refuses to fulfill it. Unlike 401, authenticating will not help, and the request should not be repeated.
  • 404 (Not Found): the requested resource was not found on the server.
  4. 5xx Server error
  • 500 (Internal Server Error): the server encountered an unexpected condition that prevented it from fulfilling the request, typically because the server code failed.
  • 501 (Not Implemented): the server does not support the functionality required by the request, for example when it does not recognize the request method and cannot support it for any resource.
  • 502 (Bad Gateway): a server acting as a gateway or proxy received an invalid response from the upstream server while trying to fulfill the request.
  • 503 (Service Unavailable): the server cannot currently handle the request because it is temporarily overloaded or down for maintenance. The condition is temporary and will recover over time.
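The 302/303/307 distinction is about which method the client uses for the follow-up request. The sketch below follows the rules in the WHATWG Fetch standard, where browsers turn POST into GET on 301 and 302 for historical compatibility; `redirectMethod` and `statusClass` are hypothetical helpers.

```javascript
// Method used for the follow-up request after a redirect (WHATWG Fetch rules).
function redirectMethod(status, method) {
  if (status === 303 && method !== "GET" && method !== "HEAD") {
    return "GET"; // 303 See Other: fetch the result with GET
  }
  if ((status === 301 || status === 302) && method === "POST") {
    return "GET"; // historical browser behavior, kept for compatibility
  }
  return method; // 307 (and 308) always preserve the method and body
}

// Class of a status code from its first digit.
function statusClass(status) {
  const classes = ["Informational", "Success", "Redirection", "Client error", "Server error"];
  return classes[Math.floor(status / 100) - 1];
}

console.log(redirectMethod(303, "POST"), redirectMethod(307, "POST")); // GET POST
console.log(statusClass(304)); // Redirection
```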

What is a CDN

  • CDN stands for Content Delivery Network. A CDN is an intelligent virtual network built on top of the existing network. Using edge servers deployed in many locations, together with the central platform's load balancing, content distribution, and scheduling modules, it lets users fetch content from a nearby node, reducing network congestion and improving response speed and cache hit ratio. Its key technologies are content storage and content distribution.
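"Getting content nearby" can be sketched as two steps: the scheduler sends the user to the edge node with the lowest latency, and that edge answers from its cache or fetches from the origin on a miss. Real CDNs schedule with DNS or anycast and weigh load as well as distance; the node names, latencies, and the `pickEdge`/`serveFromEdge` helpers here are hypothetical.

```javascript
// Hypothetical edge nodes with measured round-trip times and their own caches.
const edges = [
  { name: "edge-beijing", rttMs: 45, cache: new Map() },
  { name: "edge-shanghai", rttMs: 12, cache: new Map() },
  { name: "edge-guangzhou", rttMs: 30, cache: new Map() },
];

// Scheduling: choose the node with the lowest latency for this user.
function pickEdge(candidates) {
  return candidates.reduce((best, edge) => (edge.rttMs < best.rttMs ? edge : best));
}

// Serve from the edge cache on a hit; on a miss, fetch from the origin and keep a copy.
function serveFromEdge(edge, path, fetchFromOrigin) {
  if (edge.cache.has(path)) return { body: edge.cache.get(path), hit: true };
  const body = fetchFromOrigin(path);
  edge.cache.set(path, body);
  return { body, hit: false };
}

const nearest = pickEdge(edges);
const origin = (path) => `origin content for ${path}`; // stands in for the origin server
console.log(nearest.name, serveFromEdge(nearest, "/logo.png", origin).hit); // edge-shanghai false
console.log(serveFromEdge(nearest, "/logo.png", origin).hit);               // true
```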

Ajax

  • Ajax stands for Asynchronous JavaScript And XML. A basic request with XMLHttpRequest:

```javascript
var request = new XMLHttpRequest(); // Create an XMLHttpRequest object

request.onreadystatechange = function () {
  // Called every time readyState changes
  if (request.readyState === 4) {
    // The request has completed; check the response status
    if (request.status === 200) {
      // Success: read the response text via responseText
      console.log(request.responseText);
    } else {
      // Failure: the status code indicates the cause
      console.error("Request failed with status " + request.status);
    }
  } else {
    // The HTTP request is still in progress...
  }
};

// Send the request:
request.open("GET", "/api/categories");
request.send();
```