🌈 I decided to build a big-picture understanding of the HTTP protocol, covering TCP, HTTPS, cross-domain requests, Socket, WebSocket, and Session & Cookies, and to deal with them all in one long article. Feel free to pick out the parts that interest you from the outline below before reading:

  1. TCP transmission process
  2. HTTP 1.0 | 1.1 | 2.0 protocols
  3. HTTPS | overview of asymmetric encryption algorithms | TLS validation process
  4. The QUIC protocol
  5. The Socket API | concurrent servers
  6. The WebSocket protocol
  7. Cross-domain requests and CORS
  8. Cookies | Session

This article is very long and the side directory does not show on mobile, so I recommend opening it on a desktop browser.

Most of the content is a review, and a small part of the code is implemented with Spring Boot. The wider the scope, the more details may be missed and the shallower each topic may be. I compiled the knowledge points above from resources on the web, so my understanding of some concepts is bound to be biased in places; additions and corrections are welcome.

Note: Socket is not a protocol, but upper-layer network protocols are inseparable from it, so it is also introduced in this article.

RTT : Root of all evil

Why is RTT the “root of all evil”? Because the evolution of network protocols is largely the history of our constant struggle against RTT.

Round-Trip Time (RTT): the delay between the moment the sender sends data and the moment it receives a response, i.e. the full trip from sender to receiver and back. For example, suppose we visit a server in Shanghai from Beijing and the physical distance between the two ends is 1,000 km. If the information traveled at the speed of light, 300,000 km/s, the RTT from Beijing to Shanghai would be (1,000 × 2 / 300,000) s ≈ 6.7 ms.

Of course, information does not travel exactly at the speed of light, and it is forwarded through a topological network rather than a straight line between two points. So the real RTT from Beijing to Shanghai is much higher than this simple estimate: in practice it is around 40 ms.

It is easy to infer that the closer the two hosts are geographically/topologically, the lower the estimated delay should be. One effective way to reduce RTT delay, given the limitations of objective physics (such as the speed of light), is to bring the two hosts “closer together.” CDN is born through such inspiration. It distributes content to network nodes close to users to reduce user access delay, which is simple and effective.

Start with the process of network transmission

On the sender's side, HTTP messages from the application layer are “boxed up” layer by layer and sent out through the link layer; the receiver then “unboxes” them step by step. This is the classic five-layer network model, where we do not consider the physical layer.

Before HTTP/3, HTTP ran on top of TCP. Let's review the TCP protocol here in some detail, rather than just sketching the three-way handshake (the four-way teardown is not covered in this article). I chose to go over these TCP details so that we can later explore how TCP has influenced the evolution of HTTP, and they will also help when we come to sockets.

Reviewing TCP Connections

Three-way handshake

TCP uses complex mechanisms and timers to ensure its reliability. This is a brief description of the TCP connection section. The process of establishing TCP connection between client and server is shown in the figure below:

Note:

  • The uppercase ACK is a flag bit in the segment; its meaning is different from that of the lowercase ack, the acknowledgement number.
  • Establishing a TCP connection costs 1.5 RTT. Some argue that data can be carried on the third handshake, so the handshake can also be counted as 1 RTT; this article uses 1.5 RTT throughout.
  • The initial SEQ number sent by each party is a random number, so x and y are used to stand for them.

Why three handshakes?

The three-way handshake ensures that the Client and Server can confirm each other. Suppose a long-delayed packet has only just arrived at the Server (while the Client, tired of waiting for a reply, may have already given up on the connection). If the Server entered the ready state without verification, it would sit there waiting on an empty connection and waste its network resources.

Therefore, both parties should use the SEQ serial number to confirm and return the ACK to indicate that a valid bidirectional connection has been established.

MSS

Maximum Segment Size (MSS): the maximum segment length set by TCP. At the start of the connection (in the SYN segment), each side sends its MSS to the peer, indicating the maximum size of the data portion of a TCP segment it expects to receive (note that this covers only the data portion, not the TCP header).

Counting the 20-byte IP header and the 20-byte TCP header, sending a TCP packet costs 40 bytes of overhead. If the TCP data portion is n bytes, the transmission efficiency is n / (n + 40). In other words, the more data a TCP segment carries, the higher the network utilization, so we always try to make the MSS as large as possible.

Of course, there are limits to the MSS setting. The data link layer specifies a maximum frame size of 1518 bytes, containing a 14-byte frame header, a 4-byte frame checksum, and a data portion of at most 1500 bytes (the MTU). If the total length of an IP packet exceeds 1500 bytes, it takes two frames at the data link layer to transport it.

To save overhead, we always want an Ethernet frame to carry exactly one IP datagram, or at least to carry as much content as possible per frame, so the MSS of a TCP packet is capped at (1500 − 40) = 1460 bytes.
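
As a quick check of the arithmetic above, here is a minimal sketch in Java (the 40-byte header overhead and the 1500-byte Ethernet MTU are the values assumed in this article):

public class MssMath {
    static final int IP_HEADER = 20;   // bytes, assuming no options
    static final int TCP_HEADER = 20;  // bytes, assuming no options
    static final int MTU = 1500;       // Ethernet payload limit

    // Efficiency of a segment carrying n bytes of application data.
    static double efficiency(int n) {
        return (double) n / (n + IP_HEADER + TCP_HEADER);
    }

    public static void main(String[] args) {
        int mss = MTU - IP_HEADER - TCP_HEADER;                                    // 1460
        System.out.printf("MSS = %d bytes%n", mss);
        System.out.printf("Efficiency at MSS:   %.1f%%%n", efficiency(mss) * 100); // ~97.3%
        System.out.printf("Efficiency at 100 B: %.1f%%%n", efficiency(100) * 100); // ~71.4%
    }
}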

MTU

Why do we not want the IP layer to fragment a complete TCP packet? Because the IP layer has no loss-retransmission mechanism: if even a small part of the fragmented packet is lost in transit, the TCP layer above has to retransmit the entire segment.

Maximum Transmission Unit (MTU): we just mentioned that packets get fragmented when an IP packet exceeds 1500 bytes. Those 1500 bytes are the MTU set at the Ethernet data link layer; in fact MSS = MTU − IP header − TCP header.

TCP lets both parties negotiate the MSS when the connection is established. The purpose is to keep TCP packets under the MTU limit in advance, so that they do not get fragmented again when handed from the IP layer down to the data link layer.

🌎 This post provides more details about MSS and MTU.

Sliding Window mechanism (flow control)

TCP is full-duplex: both sides of the communication have independent receive windows and send windows. A sliding window is described by three pointers: head, tail, and current position. These pointers move as packets are received or sent. Also note that the sliding window works in units of bytes. Take the following diagram as an example:

Byte sequences can be divided into four parts:

  1. Already processed and acknowledged.
  2. Inside the sliding window: processed but not yet acknowledged.
  3. Inside the sliding window: waiting to be processed.
  4. Not yet inside the sliding window.

Processing here refers to receiving, or sending. At the same time, the Sender’s send window and the Receiver’s receive window are not always the same size.

In the figure, the start byte of the Sender’s sending window is 32, the window Win size is 5 (specified by the Receiver of the other party), and the sending window ends at sequence number 36. For the Sender:

  1. Byte 31 and everything before it have been sent and acknowledged by the Receiver's ACK.
  2. Bytes 32–34 have been loaded into TCP packets and sent, but no acknowledgement has been received.
  3. The remaining bytes 35 and 36 have not been sent yet.
  4. Because of the window size imposed by the other party, bytes 37 and beyond cannot be sent for now.

For the Receiver, it has just received bytes 32–34, and after a certain amount of time (see the delayed acknowledgement mechanism) it sends ack = 35 to the Sender and readjusts its window.

What’s the Sender’s send window doing

Let’s set the start sequence of the Sender’s send window to be N. Each time the Sender receives an ACK message from the Receiver:

  • Take the ack field and assign it to N again, which effectively slides the front of the Sender's send window forward.
  • Compute the end of the window as N + Win, which effectively extends the back edge of the window.

Let next be the sequence number at which the Sender will load the next TCP packet. Each time bytes are loaded, next is updated: it moves forward by as many bytes as are sent, but never beyond N + Win.

Bytes that have been loaded into TCP packets and sent are in the “inside the sliding window, but not yet acknowledged” state, so the Sender can resend them if necessary. After receiving a new acknowledgement ack from the Receiver, the Sender's send window “slides” past these bytes and takes in the new bytes waiting to be sent.
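
Here is a minimal, illustrative sketch of the Sender-side bookkeeping described above (the field names base, next, and win are my own; a real TCP stack is of course far more involved):

/** A toy model of the Sender's send window, not a real TCP implementation. */
public class SendWindow {
    private long base;  // N: first byte sent but not yet acknowledged
    private long next;  // sequence number of the next byte to load into a segment
    private int win;    // window size advertised by the Receiver

    public SendWindow(long initialSeq, int win) {
        this.base = initialSeq;
        this.next = initialSeq;
        this.win = win;
    }

    /** How many bytes may still be loaded and sent right now. */
    public int usableBytes() {
        return (int) (base + win - next);
    }

    /** Load up to n bytes into a segment; returns how many bytes were actually taken. */
    public int load(int n) {
        int taken = Math.min(n, usableBytes());
        next += taken;            // these bytes become "sent but unacknowledged"
        return taken;
    }

    /** Handle an acknowledgement: ack is the next byte the Receiver expects. */
    public void onAck(long ack, int advertisedWin) {
        if (ack > base) {
            base = ack;           // slide the front of the window forward
        }
        win = advertisedWin;      // the back edge becomes base + win
    }
}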

What is the Receiver window doing

Let M be the start sequence number of the Receiver's receive window. Each time the Receiver acknowledges data from the Sender, it sets ack to the number following the last in-order byte received and returns this acknowledgement to the Sender. At the same time it assigns the value of ack to M, which effectively slides the Receiver's receive window forward.

The Receiver will also calculate the value of Win according to its actual status, load the message and send it to the Sender, and inform the Sender not to send too much content at one time.

For example, after the Receiver receives sequence no. 31, the ACK of the TCP packet sent to the Sender next time is 32 and Win is 5 (the value of Win partly depends on the remaining buffer capacity n of the Receiver). The Sender, after receiving this message, will know that the sequence 31 and the preceding bytes have been received, and will only send sequence 32 next time and up to number 36.

The value of Win depends on the size of the congestion window in addition to the remaining buffer capacity n at the Receiver, which we will refer to later.

To illustrate what “the Receiver sets ack from the last in-order byte” means, here is another example. The figure below shows the byte sequences received by the Receiver. Besides the fully in-order bytes 0–100, assume it has also received bytes 121–140 and 146–165, which are discontinuous. Presumably the Sender sent the bytes in the order the window slid, so the gaps in the middle mean that the corresponding TCP packets were lost or have not yet arrived because of network problems.

In this case, the Receiver's next acknowledgement to the Sender will carry ack = 101 — “bytes 0–100 are all fine, I still need 101–120” — rather than the number after the last byte received. In other words, the head of the Receiver's window stays at position 101.

❗ If the packets carrying bytes 101–120 fail to arrive for a long time, the Receiver's receive window stalls there and does not slide forward.

Details of the data transfer process

  1. At the transport layer, the two parties exchange complete TCP packets (segments). The Sender takes a run of consecutive bytes out of the sliding window at once (no more than the Win advertised by the Receiver, and at most the MSS), encapsulates it into a TCP packet, and sends it. The Sender keeps a table recording the starting sequence number of each TCP packet sent and the length of its data portion, so that it can resend the data for the Receiver if needed, until the Receiver's ack shows the packet arrived intact. (See timeout retransmission.)
  2. The Receiver's receive window takes in consecutive bytes from the Sender, but does not necessarily respond immediately. (See delayed acknowledgement.)
  3. In a complex network it is inevitable that some data arrives late. If the Receiver receives a duplicate byte sequence, it discards it.
  4. TCP itself does not guarantee that the Sender's packets arrive in time order, so the Receiver uses its receive window as a buffer and waits until the missing bytes of the stream have arrived before delivering the data to the upper layer.
  5. The Receiver's ack number means that everything up to ack − 1 has arrived in order. Even if the Receiver has already received some later, discontinuous byte ranges, it still sends the Sender the number following the last in-order byte. (See fast retransmission.)

Timeout retransmission

TCP stipulates that the Sender is responsible for every TCP packet it sends. So besides recording the contents of each packet, the Sender also sets a retransmission timer for it, with one RTO as the time unit: if the corresponding ack is not received within that time, the Sender assumes the packet has been lost and actively resends it at that moment. The packet is not necessarily lost — it may simply reach the Receiver much later. If the Receiver has already received the retransmitted copy first, the late duplicate is discarded.

Retransmission Timeout (RTO): how long the Sender waits for the corresponding ack after sending a packet before retransmitting it; this value largely determines the efficiency of the timeout retransmission mechanism. If the RTO is set too small, unnecessary retransmissions occur; if it is set too large, the Receiver's missing bytes are not replenished promptly. The RTO is usually set somewhat larger than the average RTT obtained from samples.
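
As a rough illustration of how an RTO can be derived from RTT samples, here is a sketch of the classic smoothed-RTT estimator (the constants follow the commonly cited RFC 6298 formula; this is only illustrative, not the author's implementation):

/** Illustrative RTO estimator based on the RFC 6298 formula. */
public class RtoEstimator {
    private static final double ALPHA = 1.0 / 8; // weight of a new sample in srtt
    private static final double BETA  = 1.0 / 4; // weight of a new sample in rttvar

    private double srtt = -1;    // smoothed RTT, in milliseconds
    private double rttvar;       // RTT variance estimate
    private double rto = 1000;   // initial RTO of 1 second

    /** Feed one RTT sample (ms) taken from a segment that was NOT retransmitted. */
    public void onRttSample(double rtt) {
        if (srtt < 0) {          // first sample
            srtt = rtt;
            rttvar = rtt / 2;
        } else {
            rttvar = (1 - BETA) * rttvar + BETA * Math.abs(srtt - rtt);
            srtt = (1 - ALPHA) * srtt + ALPHA * rtt;
        }
        rto = Math.max(srtt + 4 * rttvar, 200);  // clamp to a minimum value
    }

    public double currentRto() {
        return rto;
    }
}

Only samples from segments that were not retransmitted are fed in, which sidesteps the retransmission ambiguity problem described next.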

TCP retransmission ambiguity problem

The TCP timeout retransmission mechanism actually has a weakness. If the SEQ of a TCP packet is x, then the seq is x whether the Sender sends it for the first time or retransmits it. Suppose an acknowledgement with ack = x + 1 has now been received: is it acknowledging the normally sent packet, or the retransmitted one?

The Sender’s RTT sampling of the packet becomes blurred: it cannot tell which packet the ACK is an acknowledgement of, because the SEQ of the two packets is the same.

The figure above shows two cases 1 and 2. Receiver is affected by the delayed acknowledgement mechanism, so there is a time difference between receiving and sending acknowledgements (indicated by the orange line).

  • If the Sender assumes case 1 but the actual case is 2, the RTT estimate will be too large.
  • If the Sender assumes case 2 but the actual case is 1, the RTT estimate will be too small.

Both of the above results will affect the Sender’s judgment on the actual RTO. Because the RTO is calculated by sampling the RTT time of the most recently sent packet, retransmission ambiguity can cause the Sender to set the RTO too high or too low.

Fast retransmission

The fast retransmission mechanism improves the Sender's responsiveness, so that it does not have to wait a full RTO to decide that a TCP packet was lost. As mentioned in “Details of the data transfer process,” the Receiver only ever acknowledges with the number following the last in-order byte. Assume the Sender has already sent:

  1. 📦 Packet 1 (bytes 101–120)
  2. 📦 Packet 2 (bytes 121–140)
  3. 📦 Packet 3 (bytes 141–160)

If the Receiver's subsequent acknowledgements carry ack = 101 three times in a row, the Sender concludes that packet 1 is probably lost. In that case it does not wait for packet 1's retransmission timer to expire, but immediately sends packet 1 again.

Note: Fast retransmission is a form of congestion control.
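
A minimal sketch of the duplicate-ack counting that can trigger fast retransmit (the threshold of three identical acks follows the simplified description above; the method names are my own):

/** Toy duplicate-ack detector for fast retransmit; not a real TCP stack. */
public class FastRetransmitDetector {
    private long lastAck = -1;
    private int count = 0;

    /**
     * Called for every acknowledgement received.
     * Returns the sequence number to retransmit immediately, or -1 if none.
     */
    public long onAck(long ack) {
        if (ack == lastAck) {
            count++;                 // another acknowledgement carrying the same number
        } else if (ack > lastAck) {
            lastAck = ack;           // new data acknowledged, start counting afresh
            count = 1;
            return -1;
        }
        if (count == 3) {
            // The same ack seen three times: the segment starting at `ack` is likely lost.
            return ack;
        }
        return -1;
    }
}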

Delayed acknowledgement

The Receiver does not need to send an acknowledgement for every byte sequence it receives. TCP provides two strategies here:

  • Cumulative acknowledgement: the Receiver only collects data during a short time window and sends a single acknowledgement with the latest ack at the end of it. TCP stipulates that the Receiver must not delay the acknowledgement too long, lest the Sender trigger timeout retransmission; this time window is at most 500 ms.
  • Piggybacked acknowledgement: when the Receiver has data of its own to send to the Sender, it carries the latest ack number along with it.

The benefit of delayed acknowledgement is a significant reduction in network traffic. An ack number sent by the Receiver means it has confirmed that every byte before that number arrived perfectly fine.

Congestion control

Flow control is a matter between the Sender and the Receiver and is implemented with the sliding window. Congestion control considers the state of the network as a whole. In the early days, TCP's judgment of congestion was simple: “packet loss means congestion; if there is congestion, reduce the amount of data sent.”

Congestion control is implemented with a congestion window, adjusted according to the level of congestion. If the window advertised by the Receiver is rwnd and the congestion window computed from the current overall network state is cwnd, then the Sender's actual Win = min{rwnd, cwnd}.

Congestion control algorithms include slow start, congestion avoidance, fast retransmission (just described), fast recovery:

  1. Slow start: at first the Sender's congestion window is very small, but cwnd grows exponentially for a period of time.
  2. Congestion avoidance: once cwnd reaches the slow-start threshold ssthresh, cwnd instead grows linearly and steadily.
  3. Fast recovery: if packet loss occurs during step 1 or 2, cwnd is immediately halved. If the halved value is below ssthresh, slow start is performed again; otherwise congestion avoidance continues.

Note: In the early TCP Tahoe version, once a packet was lost, cwnd was immediately reset to its initial minimum rather than halved. That algorithm was later abandoned because it was too inefficient.
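
Below is a highly simplified sketch of the cwnd bookkeeping just described, including the Win = min{rwnd, cwnd} rule (a toy model of the rules above, not TCP Reno in full detail):

/** Toy congestion-window model following the simplified rules described above. */
public class CongestionWindow {
    private double cwnd = 1;       // congestion window, in units of MSS
    private double ssthresh = 64;  // slow-start threshold, in units of MSS

    /** Effective send window, given the Receiver's advertised window rwnd. */
    public double effectiveWindow(double rwnd) {
        return Math.min(rwnd, cwnd);
    }

    /** Called once per RTT in which all outstanding data was acknowledged. */
    public void onRttAcked() {
        if (cwnd < ssthresh) {
            cwnd *= 2;             // slow start: exponential growth
        } else {
            cwnd += 1;             // congestion avoidance: linear growth
        }
    }

    /** Called when packet loss is detected (e.g. by fast retransmit). */
    public void onLoss() {
        cwnd = cwnd / 2;           // halve the congestion window
        if (cwnd < ssthresh) {
            cwnd = 1;              // halved value below the threshold: back to slow start
        }
        // otherwise continue in congestion avoidance with the halved window
    }
}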

TCP Transmission process

The following is a coherent demonstration of the TCP send and receive process, ignoring some details and assuming that each TCP packet sent by the server contains a fixed 100 B data portion.

Why did HTTP choose TCP?

Let’s get down to business. HTTP is an application-layer protocol that doesn’t really care how the underlying message is sent.

At the transport layer, HTTP has two options: UDP or TCP. Why did early HTTP choose the latter? Let's start with two analogies for TCP and UDP.

  • TCP: like calling your soulmate on the phone. You dial first, they answer, and a two-way connection is established; while the call lasts you can talk about anything. Of course, a phone call takes a bit of waiting, because you have to wait for the other side to pick up.
  • UDP: like texting your soulmate. There is no waiting: you just send the message. Of course, they may have seen it — or they may have accidentally missed it.

If it is just a simple greeting like “Have you eaten yet?”, a text message is fine. If you really miss your soulmate, give them a call. 😏

In the beginning, HTTP chose TCP for a clear reason: it wanted the transport layer to guarantee the quality of communication. Otherwise, the browser would suffer from garbled text, missing images, and ineffective CSS styles, which would be a huge blow to the user experience. But with the advent of HTTP/2, people seem to have discovered the performance problems of TCP in certain network situations……

Is TCP really a panacea?

TCP and its close companion IP are commonly referred to together as TCP/IP. They were conceived in 1973 by two young scientists, Vinton Cerf and Robert Kahn, as a protocol suite for “heterogeneous” networking environments, letting computers with different hardware and operating systems work together; around the 1980s TCP/IP grew into the foundation of the Internet.

Until now, TCP has been synonymous with “reliable” because it uses quite sophisticated mechanisms to ensure that data is transmitted reliably.

TCP/IP has also had its ups and downs. On the IP side, the roughly 4.3 billion IPv4 addresses had been fully allocated by 2019, and next-generation Internet construction officially (and of necessity) moved on to IPv6.

TCP, meanwhile, has a big problem: when TCP was designed, computers were mostly connected by wired networks and network instability was relatively rare, so the designers of TCP assumed that packet loss means network congestion, and once packet loss occurs, TCP congestion control actively reduces throughput.

Nowadays, wireless networks have made instability the norm in some scenarios. In subways and stations, for example, the packet loss rate is high, and under such poor network conditions it is hard for a client to quickly complete the three-way handshake and establish a TCP connection with the server.

TCP's congestion control and its handling of packet loss lead to the discussion later in “Does HTTP/2 really solve head-of-line blocking?”

A classic HTTP connection

HTTP is a protocol built on top of TCP/IP. The browser opens a connection, sends a request, and then waits for the server to return the appropriate resource. It is a stateless protocol (which is why we later need cookies and sessions to maintain session state).

By default, the third handshake segment carries no data. Throughout this article, Client refers to the browser and Server refers to the server side.

Before HTTP can be used, the TCP connection is established through the three-way handshake. With HTTP/1.0, the connection between client and server is closed after one request/response, unless it is actively kept open for a while with Connection: keep-alive. In HTTP/1.1, a connection is kept open by default for a period of time, until one party actively sends Connection: close.

Of course, a lot of work happens before that. For example, the browser first checks whether the URL is valid (see URL encoding rules); it obtains the gateway's MAC address via ARP; the domain name is resolved through DNS roughly in this order: local hosts file -> the local DNS server provided by the ISP -> root DNS server -> top-level-domain DNS server -> authoritative DNS server... After this series of steps the browser knows the domain's actual IP address, and finally establishes the TCP connection and sets up the Socket communication...

Including the time consumed by TCP connections, an HTTP response requires 2.5 RTT delays.

HTTP/1.1 Packet structure

HTTP messages come in two kinds: request messages (sent by the browser) and response messages (sent by the server), and their structures differ. Generally speaking, an HTTP message can be divided into headers and a body.

Headers come in four types: general headers, response headers, request headers, and entity headers. An HTTP message contains multiple headers. General headers and entity headers apply to both request and response messages.

General headers

The generic header contains basic information about the connection (such as the connection time, whether to keep a long connection, etc.), as well as cache-related information. This section only describes the common values of common headers. We can also customize some headers, such as a token as a user credential, but this requires both the front and back ends to understand its meaning and handle it correctly.

Here are four common headers:

  1. Cache-Control:

    • no-cache: the repeated request is forced to be revalidated with the server; if the resource has not changed, the cached content is returned, otherwise new content is returned.

    • no-store: do not cache at all.

  2. Date:

    • Tue, 23 Jun 2020 07:59:00 GMT: the request/response time, generated automatically by the browser or server in the format defined by RFC 822. The caching mechanism uses the request's Date to decide whether the cached content can be served directly.
  3. Connection:

    • close: close the connection immediately after this request completes.
    • keep-alive: keep the connection open for a while after this request completes; often used together with the Keep-Alive header.
  4. Keep-Alive (when the Connection header is keep-alive):

    • n: the number of seconds to keep the connection open.

As an aside, let me mention the caching mechanism here. The process goes roughly like this: if the same request is already stored in the cache and has not expired, the cached contents are used directly. Cache-Control offers far more than just no-cache and no-store; HTTP provides quite a few directives for flexible control. Thanks to the caching mechanism, the browser can quickly load pages of sites we have visited before.
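
For instance, a Spring Boot handler can set Cache-Control on its response explicitly (a small sketch; the /logo path and the ten-minute max-age are just example values):

import java.util.concurrent.TimeUnit;
import org.springframework.http.CacheControl;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CacheDemoController {

    // Allow the browser to reuse this response for 10 minutes before revalidating.
    @GetMapping("/logo")
    public ResponseEntity<String> logo() {
        return ResponseEntity.ok()
                .cacheControl(CacheControl.maxAge(10, TimeUnit.MINUTES))
                .body("...image bytes in a real application...");
    }
}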

Open Chrome, refresh the same page, and observe in the Network panel: most of the files come from the local cache.

Learn more about the browser’s caching mechanism at 🌎.

Entity headers

HTTP is the Hyper Text Transfer Protocol: any type of file can be transferred over it. The file is carried as the entity of the HTTP request/response for transmission.

To make sure both parties understand what type of file the other is sending, we use entity headers to describe what is being transferred. Only the common entity headers are listed below.

If we open a page or file from the server and it shows garbled characters, it is probably a problem caused by the inconsistency of the character set adopted by both parties.

  • Allow: when a request uses a method the URL does not support, the server returns a 405 error and sets this header to tell the client which request methods are allowed.
  • Content-Length: the size of the entity content, in bytes.
  • Content-Encoding: the compression encoding applied by the server, such as gzip or deflate. Large files are usually compressed before being sent.
  • Content-Type: the media type of the entity. A charset may be appended after a semicolon to specify the character-set encoding.
  • Content-Language: the natural language of the entity content, such as zh-CN for Chinese.
  • Expires: the expiration time of the resource. The browser combines this with Etag to check whether cached files have expired.
  • Etag: a signature of the file, used to determine whether the file has been modified. If the file has changed, the browser requests an updated copy of the resource from the server.

Caching can also be controlled with Cache-Control instead of the Expires/Etag headers; when both are present, Cache-Control takes precedence.
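
As a sketch of how a Spring Boot handler can take part in Etag-based revalidation (the /report path and the version string "v42" are made up; in practice the tag would be derived from the resource's actual content or version):

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.WebRequest;

@RestController
public class EtagDemoController {

    @GetMapping("/report")
    public String report(WebRequest request) {
        String etag = "\"v42\"";
        if (request.checkNotModified(etag)) {
            // Spring answers with 304 Not Modified and an empty body for us.
            return null;
        }
        return "full report content";   // 200 OK, with the ETag header set
    }
}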

Spring Boot’s handling of the Allow header

The Spring Boot framework makes it easy to restrict the methods of a mapping: just set the method value in @RequestMapping. Since method is a variable of type RequestMethod[], you can use {} to allow several request methods at once.

@RequestMapping(value = "/hi", method = {RequestMethod.GET, RequestMethod.HEAD})
public @ResponseBody String get() {
	// TODO ...
	return "hi";
}

Commonly used Content-Type values

  • text/plain: plain text.
  • text/html: an HTML document.
  • image/gif: GIF image format.
  • image/jpeg: JPG image format.
  • image/png: PNG image format.
  • application/x-www-form-urlencoded: form uploads containing only K-V key-value pairs.
  • multipart/form-data: form uploads that include binary files (images, videos, archives).
  • application/xml: data exchange in XML format; I have rarely used this approach.
  • application/json: data exchange in JSON format — the common way pages and servers communicate in today's front-end/back-end separated projects.

For the application/json data type, Spring Boot can easily extract the JSON content submitted by the front end with @RequestBody, and can convert an object to JSON and return it to the front end with @ResponseBody.
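
A minimal sketch (the Greeting class and the /greet path are made up for illustration):

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class GreetingController {

    // A simple payload class; Spring (via Jackson) maps the incoming JSON fields onto it.
    public static class Greeting {
        public String name;
        public String message;
    }

    // @RestController implies @ResponseBody, so the returned object is serialized to JSON.
    @PostMapping("/greet")
    public Greeting greet(@RequestBody Greeting incoming) {
        Greeting reply = new Greeting();
        reply.name = "server";
        reply.message = "hello, " + incoming.name;
        return reply;
    }
}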

By the way, I prepared a small hands-on exercise at the end of the article: receiving files with Spring Boot and forwarding them to an Nginx proxy server. If you are interested, see the end of the article; otherwise feel free to skip that section.

HTTP request Packet

An HTTP request packet consists of the following parts: request line, general header, entity header, request header, and request body.

The request line

In the browser we usually only need to type www.baidu.com to reach Baidu's page. In fact, a complete request line includes the request method, the target server URL, and the HTTP version number.

CONNECT www.baidu.com:443 HTTP/1.1

8 request methods

The request methods we are most familiar with are GET and POST. In fact, there are eight request methods.

As of HTTP/1.0, three request methods are supported:

  • GET: fetch the data at the specified URL.
  • POST: submit data, such as a form, to the specified URL.
  • HEAD: fetch the data at the specified URL, but only the HTTP headers are needed.

HTTP/1.1 provides five additional request methods:

  • OPTIONS: returns the HTTP methods supported by the server; appears in preflight requests.
  • PUT: upload the latest content to the specified resource location.
  • DELETE: ask the server to delete the content at the specified resource location.
  • TRACE: echo back the request received by the server, for testing and diagnosis.
  • CONNECT: convert the request connection into a transparent TCP/IP tunnel.

Note: The OPTIONS method is used in the preflight of non-simple cross-domain requests and is related to CORS.

The difference between GET and POST methods

  1. GET is harmless when the browser navigates back, but POST will submit the request again.
  2. GET requests can only be URL-encoded, while POST supports multiple formats.
  3. Because URLs have a length limit, the parameters of a GET request are also limited in length.
  4. GET parameters are exposed directly in the URL, so they should not be used to transfer sensitive data.
  5. In general, GET parameters need no request body, while POST parameters are placed in the request body.

How does the browser handle GET and POST methods

Since POST carries more data, in general, a browser might do two steps on a POST request.

For GET requests, the browser sends the HTTP Header along with the parameters, and the server returns a 200 (OK).

For POST requests, many browsers first send the header, the server responds with 100 (Continue), the browser then sends the request body, and the server returns 200 (OK). Whether this happens depends on the browser's strategy for avoiding wasted network resources, especially when submitting large amounts of data.

It makes no difference to the server whether the browser sends a POST request at once or in batches. The log records only one POST.

The boundary between GET and POST

GET and POST are semantically different. GET requests can also carry request bodies; POST requests can also take a small number of parameters in the request URL. But the server generally doesn’t process this extra content.

  • For the behavior of retrieving resources, we use GET requests.
  • For the behavior of submitting a form, or resource, a POST request is generally taken.

See this article: What’s the difference between Get and Post? (What the interviewer wants to hear)

URL Encoding rules

While most current browsers encode URLs using UTF-8, older versions of IE encode URLs according to the system's default settings. By the way, what we call Unicode is a character set that only assigns each symbol a binary code point; it does not specify how that code should be stored. UTF-8 and UTF-16 are storage implementations of Unicode.

In my browser, I typed:

localhost:8080/test/hi?name=你好

But as the packet capture tool shows, the URL is actually encoded like this:

localhost:8080/test/hi?name=%E4%BD%A0%E5%A5%BD

In fact, E4 BD A0 is the hexadecimal UTF-8 encoding of the Chinese character 你: one Chinese character takes 3 bytes to store. (In the Unicode character set a common Chinese character has a 2-byte code point, but UTF-8 stores it in 3 bytes.)
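
You can reproduce this encoding directly in Java with the standard library (the Charset overload used here requires Java 10 or later):

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodeDemo {
    public static void main(String[] args) {
        // Percent-encode the query value the same way the browser does.
        String encoded = URLEncoder.encode("你好", StandardCharsets.UTF_8);
        System.out.println(encoded);   // prints %E4%BD%A0%E5%A5%BD
    }
}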

In addition, the URL specifies that only these characters are allowed:

ASCII letters, digits, and ~ ! @ # $ & * ( ) = : / , ; ? + ' (note: that is a single quote; a double quote " is encoded as %22)

For a detailed interpretation of URL encoding can refer to: URL encoding rules

The request header

Request headers indicate the identity of the requester, where the request is being sent, the client's ability to process the response, and so on. The server can use them to provide more personalized service: for example, it can infer the visitor's country/region from the request's Accept-Language and serve the page content in the corresponding language (internationalization).

Earlier we introduced the Expires and Etag headers; the browser caching mechanism lets the browser avoid fetching resources from the server as much as possible, saving time on network connections. HTTP/1.1 added many If-* headers to make browser caching policies more flexible.

The main request headers are described below.

  • Accept: tells the server what content types the client expects to receive; corresponds to Content-Type. */* means any type is acceptable.
  • Accept-Charset: tells the server which character-set encodings the client accepts.
  • Accept-Encoding: tells the server which compression methods the client supports; corresponds to Content-Encoding.
  • Accept-Language: tells the server which natural languages the client prefers, such as Chinese, Japanese, or English.
  • If-Match: the request is processed only if the resource's Etag still matches the given value (i.e. has not changed).
  • If-None-Match: a new copy of the resource is requested only if its Etag has changed.
  • If-Modified-Since: a new copy of the resource is requested only if it has changed after the specified Greenwich Mean Time.
  • If-Unmodified-Since: the request is processed only if the resource has not been modified after the specified Greenwich Mean Time.
  • Host: the domain name (or IP address) and port of the web server the client is requesting.
  • Referer: the URL from which the client initiated the request; can be used for cross-domain checks.
  • User-Agent: the browser model the client is using.

HTTP response Packet

The HTTP response packet consists of the following parts: status line, general header, entity header, response header, and response entity.

The status line

Here is the status line for an HTTP response packet:

HTTP/1.1 200 OK

The status line contains the HTTP version, the status code, and a brief text description. 200 is the one we most want to see; when something unexpected happens, the server sets a different status code to tell the client what went wrong.

In general, we can classify status codes into five categories:

  • 1xx: the request has been received and processing continues.
  • 2xx: the request was received, understood, and handled successfully.
  • 3xx: the requested resource has been redirected.
  • 4xx: an error occurred on the client side.
  • 5xx: an error occurred on the server side.

Common status codes are listed below:

  • 200 OK: the client request succeeded.
  • 304 Not Modified: the requested resource has not changed on the server; the client can keep using its cached copy.
  • 400 Bad Request: the client request is invalid, usually because URL parameters were not given as required.
  • 403 Forbidden: the server understands the request but refuses to serve it, for example because of insufficient permissions.
  • 404 Not Found: the resource does not exist.
  • 415 Unsupported Media Type: the server cannot handle the Content-Type of the entity sent by the requester.
  • 500 Internal Server Error: an internal error occurred on the server. For safety reasons, developers should actively hide the error details from the client.
  • 502 Bad Gateway: the proxy server did not receive a correct response from the remote (upstream) server; common with proxy servers such as Nginx. In this case check the state of the upstream server.
  • 503 Service Unavailable: the server is currently unable to handle client requests, usually because there are more requests than it can process.
  • 504 Gateway Timeout: the proxy server did not receive the upstream server's response in time; common with proxy servers such as Nginx.

Response header (basic)

  • Refresh: specifies a delay and a new URL (separated by a semicolon) to instruct the browser to redirect, e.g. 4; https://www.baidu.com.
  • Server: the name of the server software handling this request, e.g. Server: nginx/1.6.3.
  • Set-Cookie: sets an HTTP Cookie; each K-V pair takes its own response header.
  • Accept-Ranges: whether the server accepts requests for part of a file/entity; bytes means accepted, none means not accepted.
  • Vary: tells cache servers under which conditions the returned entity may be used to answer subsequent requests of the same kind. [1]

[1]: Suppose there is a web server S1 (which can gzip entities) and a cache server S2. A user has two browsers, B1 and B2; only B1 can decode gzip-compressed entities.

Web server S1 must consider that a gzip-compressed entity returned by cache server S2 cannot be processed when the user is on browser B2, so S1 puts two items in its response header: Content-Encoding: gzip and Vary: Accept-Encoding, which tell the cache server S2 to serve the cached content only to subsequent requests that also allow gzip decoding.

So when the user visits the resource with browser B2, the request header will not contain Accept-Encoding: gzip, because B2 itself does not support gzip. The request fails S2's cache check, and browser B2 can only fetch the uncompressed entity content directly from the web server.
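
In Spring Boot, a handler can declare this behaviour by setting the Vary header on its response (a small sketch; the /big-page path is just an example):

import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class VaryDemoController {

    // Tell caches that this response depends on the request's Accept-Encoding header.
    @GetMapping("/big-page")
    public ResponseEntity<String> bigPage() {
        return ResponseEntity.ok()
                .header(HttpHeaders.VARY, HttpHeaders.ACCEPT_ENCODING)
                .body("...a large HTML page...");
    }
}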

Response header (cross-domain)

The following response headers are specifically designed to handle cross-domain cases. Cross-domain checking is a browser behavior, and the browser checks the server’s response header to see if the cross-domain request is valid.

I have collated cross-domain articles through online resources before, and decided to incorporate that article here. You may choose to return to these headers after browsing the cross-domain section.

  • Access-Control-Allow-Origin: the cross-domain sites (origins) that are allowed. In some cases [1] it cannot be set to *.
  • Access-Control-Allow-Methods: the request methods allowed, drawn from the eight methods described for HTTP/1.0/1.1.
  • Access-Control-Allow-Headers: the request headers that may be carried.
  • Access-Control-Allow-Credentials: whether the browser is allowed to read the response to a credentialed cross-domain request.
  • Access-Control-Max-Age: for non-simple cross-domain requests that are allowed, the preflight request does not need to be sent again within the specified time.
  • Access-Control-Expose-Headers: when Ajax uses the XMLHttpRequest object, it is only allowed to read the basic response headers [2], so any additional headers to expose must be listed here.

[1]: “In some cases” refers to requests that need to carry cookies for authentication. For such requests the CORS mechanism does not allow the server to set the allowed origin to * (which would mean every cross-domain source is allowed), in order to avoid CSRF cross-domain attacks.

[2]: The “basic headers” in the table are Cache-Control, Content-Language, Content-Type, and Expires.
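
In Spring Boot these headers can be produced declaratively, for example with a handler-level @CrossOrigin annotation (a sketch; the origin, path, and max-age shown are just example values):

import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CorsDemoController {

    // Spring answers the OPTIONS preflight and sets the Access-Control-* headers for us.
    @CrossOrigin(origins = "https://example.com", allowCredentials = "true", maxAge = 600)
    @GetMapping("/api/data")
    public String data() {
        return "{\"ok\":true}";
    }
}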

The evolution of HTTP

The earliest version, HTTP/0.9, was introduced in 1991. At the time clients could only send GET requests, and servers could only return hypertext content built from <HTML> tags. After responding, the server closed the TCP connection immediately.

  • HTTP/0.9 (1991): only simple GET requests to the server.
  • HTTP/1.0 (1996): added status codes and headers.
  • HTTP/1.1 (1997): persistent connections, the pipelining mechanism, chunked transfer of files.
  • HTTP/2 (2015): multiplexing, server push, header compression, a binary protocol, and more.
  • HTTP/3 (2018): based on UDP, fast handshakes, upgraded multiplexing, improved congestion control, and more.

HTTP/1.0

True “hypertext” — that is, support for files of any MIME type — arrived in 1996 with HTTP/1.0. In addition, the new request methods such as POST greatly enriched the ways the client and server could interact; in particular, the client could now also submit data through forms.

Starting with HTTP/1.0, request and response messages also carry header information describing the request/response. For example, the client indicates its identity with User-Agent and the response types it is willing to accept; the response message carries a status code to indicate the outcome, along with the server type, the Content-Type of the returned entity, and so on.

Limitations

The limitation of HTTP/1.0 is that one TCP connection is still limited to a single request/response. If the client wants to request other resources, it must create a new connection. TCP connections are not cheap: besides the 1.5 RTT needed to create one, congestion control's slow start mechanism makes the transmission rate slow at the beginning.

Closing the network connection after a single “question and answer” came to be seen as a waste of a hard-won TCP connection. It was therefore suggested at the time to use Connection: keep-alive to keep the connection alive for a short period, until either the browser or the server actively closed it.

HTTP/1.1

Less than six months after HTTP/1.0 was released, version 1.1 followed, and for quite some time it was the most common version. Besides adding five new request methods such as PUT and OPTIONS, HTTP/1.1 brought the following major updates:

Persistent connection

It is important to emphasize that persistent connections are not permanent connections. HTML5 defined a separate technology for maintaining long-lived connections, the WebSocket protocol; it currently bootstraps its connection over HTTP, and we will cover it in a later chapter.

In HTTP/1.1 the TCP connection is not closed immediately by default, and a persistent connection can carry multiple requests and responses without declaring Connection: keep-alive, until, once the messages have been delivered, one party actively closes it by sending Connection: close. Browsers allow at most six concurrent persistent connections to the same domain. A common analogy is a supermarket that opens six checkout counters: customers can queue at any of them.

Pipelining mechanism

A client can send multiple requests within the same TCP connection. The client sends the requests in order and the responses come back in the same order: Request1, Response1, Request2, Response2, ...

Content-Length

Since one TCP connection now carries multiple requests and responses, each time the server sends a response it adds a Content-Length header, so the client knows where this response ends and that the data that follows belongs to the next response.

Chunked

Using the Content-Length field means the server cannot know the length of a response until it has finished processing and producing the response data. For time-consuming operations, the server would have to complete all the work before sending anything, which is not very efficient.

Therefore HTTP/1.1 also allows chunked transfer encoding. If an HTTP message carries Transfer-Encoding: chunked, its body is delivered as pieces of a complete entity. Each non-empty chunk is preceded by a hexadecimal number indicating the chunk's length, and finally a chunk of size 0 means “this batch of goods has all been delivered.”
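
In Spring Boot, returning a streaming body is one way to let the container send a response piece by piece instead of buffering it to compute a Content-Length first (a sketch; the /live-report path and the data being streamed are made up):

import java.nio.charset.StandardCharsets;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

@RestController
public class ChunkedDemoController {

    // The body is written bit by bit; because the total length is not known up front,
    // the servlet container typically falls back to Transfer-Encoding: chunked.
    @GetMapping("/live-report")
    public StreamingResponseBody liveReport() {
        return outputStream -> {
            for (int i = 1; i <= 5; i++) {
                String part = "part " + i + " of a long-running report\n";
                outputStream.write(part.getBytes(StandardCharsets.UTF_8));
                outputStream.flush();   // push this piece to the client now
            }
        };
    }
}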

Head-of-line blocking

Although HTTP/1.1 supports pipelined connections, note that they are serial: all requests are processed in a queue under FIFO rules. If the request at the head of the queue is processed very slowly, all subsequent requests pile up in the queue behind it. This phenomenon is called head-of-line blocking.

For quite some time, head-of-line blocking was the key problem in optimizing the HTTP protocol, and it was not substantially addressed until HTTP/2.

Repeated header information

Since HTTP/1.0, almost every request has been followed by a large number of ASCII-encoded headers, often including a lot of Cookie content. If most of a page's resources come from the same server, the many repeated headers across many requests put an extra burden on an already crowded network.

The birth of HTTP/2

Moving from HTTP/1.1 to HTTP/2.0 is another big step forward. HTTP/2 has the following updates:

Binary protocol

In HTTP/2, headers and bodies are wrapped in binary-based frame units, respectively.

Note: The TLS layer is related to the HTTPS protocol, which we’ll cover later.

Header compression

In the HTTP/1.1 section we talked about the problem of repeated headers. We had to send them because the HTTP protocol itself has no memory — which is also why we use cookies and sessions to give HTTP connections a memory.

How does HTTP/2 improve this? First, the large amount of header information is compressed (HTTP/2's HPACK scheme uses Huffman coding and header tables rather than general-purpose gzip/compress). In addition, both the browser and the server maintain a header table that indexes header sets already seen, so much of the header information can be conveyed simply by telling the other side an index number.

Server push

HTTP/2 allows the server to push additional resources to the client in response to the corresponding request. This action is called push.

Here’s an analogy: A customer buys a computer at a store. Normally, businesses just need to ship the computers on a step-by-step basis. However, considering that the user may need other services in the future, the merchant sent the contact address of the repair site and the parts store to the user. In this way, users do not need to contact the store when there is a need. This feature is good from a starting point, because server push improves TCP connection utilization and makes it easier for clients.

❓ But this description is somewhat ambiguous compared to the other features – what kind of data counts as extra data, and how can the server be sure that the client will need it?

This feature is somewhat controversial. It breaks the traditional “one ask, one answer” model of HTTP by letting the server push extra data the client has not clearly asked for, and there is no guarantee the practice will not be abused (like the annoying advertising of real life). The server may also ignore the client's cache and insist on pushing the same resources repeatedly, wasting bandwidth.

Multiplexing

First, you need to understand the four layers in the HTTP/2 protocol:

  1. Frame: the smallest unit of data, whether it carries HTTP headers or entity data.
  2. Message: a complete request/response composed of multiple frames.
  3. Stream: request and response frames are grouped by a common ID, and each group is logically called a stream. For example, if a client requests three files a.html, b.css, and c.js from the same server at the same time, each file's request and response travel in their own stream, and the streams do not interfere with each other.
  4. Connection: a TCP connection. Most HTTP exchanges are short and bursty. As a rule of thumb, the more data we send over a TCP connection at once, the more efficient the connection becomes; otherwise it is rather like using an ox cleaver to kill a chicken.

So what is multiplexing? The common explanation: multiple request/response pairs to the same target server become separate streams, all merged onto a single TCP connection. Because every frame carries its stream identifier, both sides can send frame data out of order at the HTTP level, and each side then reassembles the requests/responses of the same stream using those identifiers.

In HTTP/1.1, if one request's response is time-consuming, all subsequent requests are blocked. HTTP/2 alleviates this by making requests/responses parallel at the HTTP level, which greatly improves transmission efficiency. How much faster? See the following example.

HTTP/1.x vs HTTP/2

You can experience the difference between HTTP/1.1 and HTTP/2 on Akamai's demo site, which asks the server for more than 350 scattered small images to compose an “Earth,” once over HTTP/1.1 and once over HTTP/2:

Open Chrome's developer tools and look at the requests' Waterfall:

All six connections that HTTP/1.1 allows to stay open suffered head-of-line blocking, because the page requests too many resources at once. Clicking on one of the requests, as shown in the figure below, we find that each request spent most of its time Stalled and only a small fraction actually establishing a connection and downloading the resource.

Under HTTP/2, thanks to the multiplexing mechanism, almost all the images load at the same time. This assumes, of course, that the TCP connection to the server is stable overall.

🌎 If you want to learn more about Chrome browser Waterfall content, see the following article:

  • Front-end performance of Chrome Waterfall
  • Analyze Chrome’s waterfall flow

🌎 For more details on HTTP/2, see the following article:

  • HTTP/2 is very fast

Does HTTP/2 really solve head-of-line blocking?

Not completely. From the point of view of the TCP send window, no matter which stream the frames belong to, they still have to be ordered in the send window before being sent. If packet loss occurs, the whole send window waits for the lost packets to be retransmitted and recovered.

In addition, the TCP congestion control described earlier — slow start, fast recovery, and so on — makes TCP perform poorly in weak network environments, which directly hurts the efficiency of upper-layer HTTP communication even under HTTP/2.

The TCP stack is usually implemented by the operating system; in other words it is baked into the Linux and Windows kernels and the operating systems of mobile devices. Modifying TCP is a huge project that would require huge numbers of users to upgrade, so drastic changes to TCP are impractical. Because of this, Google, the IETF, and other teams and organizations later pinned their hopes for the next generation of HTTP on UDP, and QUIC was born.

HTTPS

With HTTP the transmission stack is HTTP -> TCP; with HTTPS it is HTTP -> TLS -> TCP.

HTTPS uses the asymmetric key exchange protocol to generate symmetric encryption keys that are known only to both communication parties over insecure data channels.

HTTPS interaction can be divided into two phases:

  1. Based on asymmetric encryption, the client verifies the other party's identity, and the two sides exchange random numbers.
  2. After authentication succeeds, both parties derive symmetric keys from the exchanged random numbers using the algorithm agreed in advance, and all subsequent communication uses symmetric encryption.

The reason for this design is that asymmetric encryption is very secure but computationally expensive, and its data-exchange efficiency is far lower than that of symmetric encryption. Therefore the two parties switch to symmetric encryption after the TLS handshake.
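
As a quick way to observe the result of this handshake from Java, here is a small sketch that opens a TLS connection and prints the negotiated protocol and cipher suite (the host name is just an example):

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsHandshakeDemo {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("www.baidu.com", 443)) {
            socket.startHandshake();   // performs the TLS handshake described above
            System.out.println("Protocol:     " + socket.getSession().getProtocol());
            System.out.println("Cipher suite: " + socket.getSession().getCipherSuite());
        }
    }
}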

On the Internet, secure network communication requires two things:

  1. The identity of the other party must be trusted in the first place, otherwise the encrypted communication itself is meaningless.
  2. Ensure that the key data used to generate the key is not leaked.

In order to successfully introduce HTTPS to the workflow, it’s worth talking a little bit about the basics of encryption, especially asymmetric encryption.

Asymmetric encryption

How do you prove that “you” are “you”? By proving to everyone that only you can do a certain thing — that is what makes “you” you.

HTTPS requires asymmetric encryption to achieve security. Asymmetric encryption dates back to the invention of locks:

The lock was outside the door and anyone could see it, but the key was only in the owner’s hand.

In a computer there is such a pair of keys. One of them is public, and we call it the public key; the other is held by only one person, and we call it the private key. They have the following properties:

  • If the message is encrypted by the public key, only the corresponding private key can unlock it. – (1)
  • If a message is encrypted by a private key, only the corresponding public key can unlock it. – (2)

Of course, public and private keys differ from real locks — no lock has two keyholes. Note that “asymmetric” here means “the private key cannot be derived from the public key” (which I will explain later), not “what the private key encrypts the public key cannot decrypt.”

In the HTTPS protocol, RSA and ECC are the commonly used asymmetric encryption algorithms.

So let’s say we have two characters, A and B. A has A pair of public and private keys, called public key A and private key A. B has a pair of public and private keys, called public Key B and private Key B. Based on these two features, we can have two uses:

Public key encryption, private key decryption

If user A wants to send A private message to user B, user A uses public key B to encrypt the private message using feature (1). In this case, since only B holds private key B, the message is meaningful only to B.

Private key signature, public key verification

If A wants to sign a contract with B, B requires proof that A agreed to the terms above and signed them, plus a guarantee that A cannot tamper with the contract afterwards (non-repudiation); for that, A uses property (2). Because only A holds private key A, that private key represents A's identity.

If a message can be decrypted with public key A, we can infer from property (2) that it must have been encrypted by A, because only A holds private key A.

We can also see that when using property (2), there is no guarantee of the confidentiality of the message itself, because anyone can get public key A to decrypt it. It is used to prove that the message was actually sent by A, not to guarantee the confidentiality of the message. At this point, the encryption/decryption is semantically called signing/checking.

If the confidentiality of message needs to be guaranteed, A will naturally think of using property (1) and using the public key of the other party for encryption and transmission, so property (1) and property (2) are not contradictory.

In the actual signing/verification flow — in other words, before A and B conclude a contract — besides proving that A agreed and signed, we also need to ensure that A cannot later tamper with (or deny) the message sent at the time; that is, we want the contract to be non-repudiable.

This process is described in pseudocode below. Assume that A and B have agreed in advance on a certain digest algorithm Hash(·), that the contract content is contract, and that the result of the digest algorithm is called digest:

# A computes the digest of the contract
Hash(contract) -> digest
# A signs the digest with private key A
Sign(private_key_A, digest) -> signature
# A encrypts the contract and the signature with B's public key
Encrypt(public_key_B, (contract, signature)) -> encrypted_data
# B decrypts with private key B; being able to do so also proves B's own identity
Decrypt(private_key_B, encrypted_data) -> (contract, signature)
# B verifies the signature with A's public key; if it passes, the digest was signed by A himself
Verify(public_key_A, signature) -> digest
# B computes the digest of the contract itself with the Hash algorithm negotiated with A and compares
Hash(contract) -> digest'
if (digest' == digest) -> the contract has not been tampered with

This algorithm guarantees the tamper-resistance (non-repudiation) of the file.
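To make this concrete, here is a minimal Java sketch of the sign/verify half of the flow above, using the standard java.security API (RSA with SHA-256 is just one possible choice; the step of encrypting the contract with B's public key is omitted for brevity):

```java
import java.nio.charset.StandardCharsets;
import java.security.*;

public class SignDemo {
    public static void main(String[] args) throws Exception {
        byte[] contract = "A agrees to the above terms.".getBytes(StandardCharsets.UTF_8);

        // A's key pair: the private key signs, the public key verifies.
        KeyPair a = KeyPairGenerator.getInstance("RSA").generateKeyPair();

        // Sign: internally this hashes the contract (SHA-256) and encrypts the digest
        // with A's private key, i.e. the Hash + Sign steps above.
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(a.getPrivate());
        signer.update(contract);
        byte[] signature = signer.sign();

        // Verify: B recomputes the digest and checks it against the one recovered
        // from the signature with A's public key.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(a.getPublic());
        verifier.update(contract);
        System.out.println(verifier.verify(signature));   // true if untampered
    }
}
```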

Which comes first, the public key or the private key?

Here’s an important property about hash functions:

  • Hiding: the Hash function is one-way. Let H(·) be the Hash function and H(x) = y. Given x, we can confirm H(x) = y with a single computation; but going the other way, knowing only y, there is no good way to find an x' such that H(x') = y.

So based on that, we pick x as the private key and y as the public key. The public key y needs to be made public, and the Hiding property of the hash function guarantees that any attempt to derive the private key x from the public key y is not viable, short of extremely inefficient brute-force enumeration.
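A toy Java sketch of that asymmetry: computing the hash forward is a single call, while inverting it, even over the deliberately tiny search space assumed here, already degenerates into enumeration; for a real key space the enumeration is hopeless:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class HidingDemo {
    public static void main(String[] args) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");

        // Forward direction: one cheap computation, H(x) -> y.
        byte[] y = sha256.digest("secret-42".getBytes(StandardCharsets.UTF_8));
        System.out.println("y = " + Base64.getEncoder().encodeToString(y));

        // Reverse direction: knowing only y, all we can do is guess candidate x' values
        // and compare H(x') with y, i.e. brute-force enumeration.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] guess = sha256.digest(("secret-" + i).getBytes(StandardCharsets.UTF_8));
            if (MessageDigest.isEqual(guess, y)) {
                System.out.println("found x' = secret-" + i);
                break;
            }
        }
    }
}
```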

Therefore, the user first chooses the private key x (be it a date of birth or a string randomly generated by a program), then derives the public key y from it, and y is what goes into the SSL certificate. The private key x stays in the user's own hands and is used only for signing or for decrypting ciphertext; it is never shown anywhere else, let alone written into the SSL certificate.

If the private key is disclosed (or even suspected to be), the whole key pair should be invalidated. For the same security reason, the SSL certificate that holds the public key carries an expiry date, forcing users to rotate their public/private key pairs periodically.

CA Certification authority – Protects against MITM attacks

During HTTPS encryption, there is a role called the Certificate Authority (CA) whose specialty is vouching for public keys. Isn't asymmetric encryption by itself enough? Recall the point made just above: the identity of the other party must be trusted, otherwise the encrypted communication itself is meaningless.

Let's set the CA aside for a moment and walk through asymmetric encrypted communication in an ideal world. Note that the following is not the actual HTTPS interaction; it only illustrates why CA authorities need to exist.

  1. The Client first asks the Server for its public key.
  2. The Server provides its public key to the Client.
  3. The Client encrypts its message with the provided public key and sends it to the Server.
  4. The Server receives the encrypted message and decrypts it using its private key.

If nothing goes wrong in the first two steps and the Client is guaranteed to receive the public key of the genuine Server, then this exchange works fine.

But there is a fatal flaw: if, in the second step, a middleman intercepts the Server's public key and substitutes his own, the "encrypted" message sent by the Client is useless to the Server but perfectly readable to the middleman, who can decrypt it with his own private key and recover the Client's private information. This type of attack is called a man-in-the-middle (MITM) attack.

Therefore, in the whole HTTPS transmission process, we also need an authoritative CA to act as a notary and vouch for the one-to-one correspondence between a public key and a server.

The CA itself also needs a certificate to prove its own authority, and that certificate contains the CA's public key. The certificates of the world's leading CAs are built into every major browser.

SSL certificate

The Server submits its public key, organization information, personal information and domain name information to the issuer. The issuer will verify the authenticity of the information through online contact or offline investigation.

After the review passes, the issuer issues an authentication document to the Server: this is what we call the certificate. It contains the certificate's validity period and serial number, the issuer's information (its Distinguished Name), the domain name (DNS Name) the certificate covers, the Server's public key, the Server's other organization information, the certificate digest, and a signature made with the issuer's private key.

Some certificates have no separate issuer; such a certificate is a self-signed certificate. Only root CAs trusted by the browser can effectively issue self-signed certificates; any other self-signed certificate is considered illegitimate and will not be recognized by the browser.

HTTPS encryption is handled by TLS, which we will cover shortly. The predecessor of TLS was the SSL standard, so these certificates are conventionally called SSL certificates. Some people also name certificates after the chosen algorithm: RSA certificate, ECC certificate, DSA certificate, and so on. Unless otherwise specified, "certificate" below means an SSL certificate in the broad sense.

The certificate chain

In the process of issuing a certificate, the issuer may not be a CA directly trusted by the browser; it may be some intermediary B. How does intermediary B prove its own authority? It may in turn claim that it is certified by intermediary A……

Simply put, issuance may go through many levels, so the verification process has to authenticate every organization along the chain; this chain is the certificate chain. The issuer at the root must be a trusted CA authority for the entire certificate chain to be legitimate.

The server needs to provide a full certificate chain:

  1. Based on the issuer DN recorded in the current certificate, request the next-level issuer's certificate and add it to the certificate chain.
  2. If that certificate again contains an issuer DN, repeat step 1; otherwise it is a self-signed (root) certificate, and the recursion ends once it is returned.

The browser first obtains the certificate chain from the Server and starts the loop with the "lowest level" certificate:

  1. It tries to use the public key of the next-higher-level certificate to verify the signature on the current certificate. If the check passes, it moves on to verify the higher-level certificate; otherwise the chain is invalid and verification stops.

    If there is no higher-level certificate, the current one should be a self-signed root CA certificate. If it belongs to a root CA recognized by the browser, its public key is accepted (the root certificates of legitimate CAs are built into the browser and their validity is trusted unconditionally). If the root CA is not recognized, or the check fails, the chain is invalid and verification stops.

  2. If the chain still has certificates waiting to be checked, the loop continues.

To make a long story short, the server keeps getting certificates and returning them, and the browser verifies signature layer by layer when it gets them.
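As a rough illustration of that layer-by-layer check, here is a Java sketch built on java.security.cert; it assumes the chain is ordered from the server (leaf) certificate up to the root, and a real browser additionally checks validity periods, revocation, host names and so on:

```java
import java.security.cert.X509Certificate;
import java.util.List;
import java.util.Set;

public class ChainCheck {
    // trustedRoots stands in for the root certificates built into the browser.
    static boolean verifyChain(List<X509Certificate> chain, Set<X509Certificate> trustedRoots) {
        try {
            for (int i = 0; i < chain.size(); i++) {
                X509Certificate current = chain.get(i);
                if (i + 1 < chain.size()) {
                    // Verify the current certificate with the public key of the next level up.
                    current.verify(chain.get(i + 1).getPublicKey());
                } else {
                    // No higher level: this must be a self-signed root the browser already trusts.
                    if (!trustedRoots.contains(current)) return false;
                    current.verify(current.getPublicKey());
                }
            }
            return true;
        } catch (Exception e) {
            return false;   // any failed signature check invalidates the whole chain
        }
    }
}
```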

🌎 The “signer” for this information comes from this link.

SSL and TLS

HTTP itself is concerned only with message transmission; encryption, authentication, and random key generation are handled by the TLS layer underneath it.

The HTTPS protocol was originally encrypted using the Secure Socket Layer (SSL) built by Netscape. Sitting between HTTP and TCP, SSL encrypts all HTTP packets that pass through it.

The Internet Engineering Task Force (IETF) believed that SSL could be useful far beyond HTTP, so it added an application protocol field on top of SSL, hoping SSL could serve more scenarios. SSL subsequently evolved into the current Transport Layer Security (TLS) protocol.

TLS can be understood as an updated version of the SSL protocol, and some people refer to them collectively as TLS/SSL.

TLSv1.0 is based on SSLv3.1, and the latest version is TLSv1.3. If you capture packets with Wireshark, you can tell that a packet is an encrypted HTTP packet from the port number and the protocol type TLSv1.x.

In TLSv1.2, the RSA key negotiation mechanism was used early on, and it later moved to the ECDH key negotiation mechanism based on the ECC elliptic-curve algorithm, because RSA key negotiation was shown to carry certain risks. I will cover all of these later in the article.

Of course, TLSv1.2 actually supports mixed policies, such as using ECDH for key negotiation but RSA for signing (mainly for computational efficiency); this depends on how the server is configured and is reflected in its certificate. We will not discuss that complicated case here, and only look at pure RSA key negotiation and ECDH key negotiation.

The two sides also need to negotiate this during the handshake: if the client does not support ECC certificates, they still fall back to the RSA key negotiation mode.

Although TLSv1.3 has been created, the reality seems to be that TLSv1.2 is still the dominant version.
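If you want to see for yourself which version a given site negotiates, a small Java probe like the following (the host name example.com is just a placeholder) prints the protocol and cipher suite chosen during the TLS handshake:

```java
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsProbe {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("example.com", 443)) {
            socket.startHandshake();                                     // perform only the TLS handshake
            System.out.println(socket.getSession().getProtocol());      // e.g. TLSv1.3 or TLSv1.2
            System.out.println(socket.getSession().getCipherSuite());   // negotiated cipher suite
        }
    }
}
```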

Aside: distinguishing between SSL and SSH

SSH stands for Secure Shell. What is a shell? In Linux, the shell is the user-facing command interface.

SSH, i.e. the secure transfer of shell commands, was designed to provide a more secure alternative to Telnet.

SSH provides two authentication modes:

  1. Password authentication, that is, the common setting of user and password, similar to QQ number and QQ password.
  2. Public and private key authentication.

If you manually log in to a remote server for maintenance, you will usually use the first method: set a user and password, and adjust the user's permissions according to the responsibilities of the maintenance staff. This is how we usually log in with tools like Xshell.

When configuring a Hadoop cluster, the author used the second method to let the nodes authenticate each other (using software to distribute keys in batches), so that each node stays in contact without manual password entry.

SSH is designed specifically for the Shell and is supported only by SSH clients and SSH servers. SSL is a widely accepted Web security protocol that was later upgraded to the more general TLS protocol.

At least CentOS distributions (the ones I use frequently) have OpenSSH installed by default and the default port number is 22. SSH also provides a file transfer service that is nearly the same as FTP but more secure, namely SFTP.

🌎 For additional details on SSH, check out this article. Let’s continue to focus on the TLS authentication part of the HTTPS protocol.

TLSv1.2 Unidirectional authentication

I have purposely emphasized one-way authentication here: the client checks the identity of the server, but the server does not restrict the identity of the visiting client.

If bidirectional authentication is enabled on the server, the client also sends its own certificate to the server for verification during the handshake. Bidirectional authentication is usually required by financial institutions with high confidentiality requirements, which serve only authenticated clients. Only one-way verification is covered here.

Overall, HTTPS = TCP + TLS. Up to and including TLSv1.2, the TLS handshake takes four flights of messages, i.e. 2 RTT (whether one-way or two-way authentication is used). So the round-trip delay of one HTTPS exchange = 1.5 RTT (TCP handshake) + 2 RTT (TLS handshake) + 1 RTT (one HTTP request/response) = 4.5 RTT.

HTTPS is therefore a heavy consumer of latency. We first introduce the one-way verification process based on RSA, and then extend it to the ECDH-based process.

  1. The Client sends a Client Hello message to the server, costing 0.5 RTT. It includes:

    • The protocol versions, encryption algorithms, and compression algorithms supported by the client, plus the random number "client_random" generated by the client.
  2. The Server then replies with a Server Hello message, costing 0.5 RTT. It includes:

    • The chosen protocol version, encryption method, session ID, and the random number "server_random" generated by the server;
    • Certificate, that is, the server's certificate chain, which contains the domain names it covers, the issuer, and the validity period;
    • Server Key Exchange: an extension field that depends on the key exchange algorithm chosen by the two sides. In ECDH key exchange, the server puts its own parameter Q2 here. (The "Q2" parameter is described later.)
    • Certificate Request: asks for the client certificate (only if bidirectional authentication is used);
    • Server Hello Done, notifying the client that all relevant information has been sent.
  3. If the client (browser) verifies that the server's CA-issued certificate is trustworthy, it sends a Change Cipher Spec message announcing that subsequent communication will switch to the negotiated key, costing 0.5 RTT. It includes:

    • Client Key Exchange: in RSA-based key exchange, this field carries a random string encrypted with the server's public key, the Pre Master Secret. In ECDH key exchange, the client instead puts its computed parameter Q1 here. (The "Q1" parameter is described later.)
    • Finished, which contains encrypted handshake information.
  4. The server sends a Change Cipher Spec message to the client, informing it that data segments will from now on be encrypted with the negotiated key, costing 0.5 RTT. The message includes:

    • Finished: once the Finished message is verified, the TLS handshake is complete.

Three important random numbers based on RSA key exchange

Three random numbers are mentioned in the above process:

  1. Client_random: random number generated by the Client in a Client Hello message.
  2. Server_random: Random number generated by the Server that appears in the Server Hello message.
  3. Pre Master Secret: Random number sent by the client during the third handshake.

The following is the key negotiation process based on RSA:

During the first two handshakes, the two parties exchanged their generated random numbers with each other via Hello messages. In the third handshake, the client generates a second random number, called Pre Master Secret, encrypted with the server’s public key and sent.

Why do we need the Pre Master Secret? Up to and including TLSv1.2, the handshake itself is not encrypted, so a middleman can capture both Client Random and Server Random; a key derived from them alone would leak. The client therefore also produces a Pre Master Secret and encrypts it with the server's public key: on the one hand this contributes a secret that is transmitted securely, and on the other hand it checks whether the other side really holds the corresponding private key.

In this way, both parties end up holding the same Client Random, Server Random and Pre Master Secret. Using an algorithm agreed in advance, the two sides derive the Session Key from these three parameters; since they hold the same three random numbers and the same algorithm, they are guaranteed to end up with the same Session Key. The Session Key is then used as a symmetric key to encrypt the communication.

The symmetric key is always derived from the random numbers of both parties (Client Random and Server Random) because the SSL protocol does not assume that every host can generate truly random numbers. If the key were derived only from the pseudo-random numbers of one side, it would be easier to crack.
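For intuition only, here is a deliberately simplified Java sketch of "derive one session key from the three values". The real TLSv1.2 PRF is more involved (an HMAC-based expansion into a whole key block), so treat this as an illustration of the idea rather than the actual algorithm:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SessionKeySketch {
    // Both sides call this with the same three inputs and therefore obtain the same key.
    static byte[] deriveSessionKey(byte[] preMasterSecret,
                                   byte[] clientRandom,
                                   byte[] serverRandom) throws Exception {
        Mac hmac = Mac.getInstance("HmacSHA256");
        // The secret part is the Pre Master Secret; the two public randoms are mixed in.
        hmac.init(new SecretKeySpec(preMasterSecret, "HmacSHA256"));
        hmac.update(clientRandom);
        hmac.update(serverRandom);
        return hmac.doFinal();   // 32 bytes, used here as the symmetric session key
    }
}
```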

Now, whether the two parties can establish a secure connection essentially depends on whether the Pre Master Secret can be transmitted securely. Consider the worst case: someone eventually manages to derive the server's private key from its certificate public key. That session is no longer secure, and neither are any of the previous sessions, because the Pre Master Secret of every earlier connection between the server and its clients was always transmitted under the (now compromised) server public key.

For the middleman: “You can fend me off many times, but I only need to succeed once to get all your data.”

Therefore, RSA-based key exchange has no forward secrecy. TLSv1.2 introduced another key exchange mode based on ECC, namely ECDH, and in TLSv1.3 the RSA-based key exchange mode was abandoned entirely.

Here are some useful references.

  • Trust begins with a handshake — TLS 1.2 connection procedure
  • The SSL and TLS – TLS 1.2

Talk about the principle of ECDH

ECDH is based on the ECC algorithm and the Diffie-Hellman (DH) key exchange protocol. DH lets the two parties arrive at a shared key over an insecure channel without either revealing its private value, and relies on the discrete logarithm problem (DLP) for its security.

The ECC algorithm is responsible for constructing a DLP instance out of elliptic curves (which are not the ellipse equations learned in high school), just as the RSA algorithm constructs a large-number factorization problem. ECDH is an implementation of DH.

The author briefly describes the ECDH process, drawing on an expert's explanation and the Wikipedia entry, and simplifying away the original modular arithmetic. Given the formula:

Q = k * P, where P is known in advance (or may be made public).

  1. Given k, it is easy to compute Q. (Property 1)
  2. Given Q and P, it is hard to recover k. (Property 2; this is essentially a DLP instance.)

With P known in advance, the two parties use G = a * P * b as the key for their interaction, where a and b are random numbers generated by the two communicating parties respectively; each plays the role of k in the formula.

How can two parties exchange random numbers in a secure way? First, based on property 1:

  • Suppose the client generates a random number a; it sends Q1 = a * P to the server.
  • Suppose the server generates a random number b; it sends Q2 = b * P to the client.

After exchanging Q1 and Q2, both sides can compute the same G by multiplying the value they received by their own locally generated random number, thanks to commutativity:

Client: Q2 * a = (b * P) * a = a * P * b = G
Server: Q1 * b = (a * P) * b = a * P * b = G

By property 2, even if a middleman captures both Q1 and Q2, it is still hard for him to recover the internal numbers a and b, so G remains unknown to him.

During the negotiation, the client and the server never learn each other's random numbers a and b; each side only needs to combine the received Q2 (or Q1) with its own random number a (or b) to obtain the same key. This is exactly the DH property described above: neither side knows the other's secret, yet both derive the same key.
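The same exchange can be sketched with Java's standard javax.crypto API; the 256-bit curve below is just an example choice, and in a real TLS handshake the public values ride inside the Server/Client Key Exchange messages instead of local variables:

```java
import java.security.*;
import javax.crypto.KeyAgreement;

public class EcdhDemo {
    public static void main(String[] args) throws Exception {
        // Each side generates its own EC key pair: the private value plays the role
        // of a/b, the public value plays the role of Q1/Q2 (a*P / b*P).
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(256);                       // e.g. a 256-bit curve
        KeyPair client = kpg.generateKeyPair();    // (a, Q1)
        KeyPair server = kpg.generateKeyPair();    // (b, Q2)

        // Client side: combine its own private value a with the peer's public Q2.
        KeyAgreement ka1 = KeyAgreement.getInstance("ECDH");
        ka1.init(client.getPrivate());
        ka1.doPhase(server.getPublic(), true);
        byte[] g1 = ka1.generateSecret();

        // Server side: combine its own private value b with the peer's public Q1.
        KeyAgreement ka2 = KeyAgreement.getInstance("ECDH");
        ka2.init(server.getPrivate());
        ka2.doPhase(client.getPublic(), true);
        byte[] g2 = ka2.generateSecret();

        // Both sides arrive at the same shared secret G without ever sending a or b.
        System.out.println(MessageDigest.isEqual(g1, g2));   // true
    }
}
```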

ECDH key exchange parameters

Unlike RSA key exchange, ECDH involves the two random numbers (Client Random and Server Random) plus two ECDH parameters, Q1 and Q2. The difference is that the Client no longer sends a Pre Master Secret to the Server under public-key encryption; instead, the Pre Master Secret is produced through ECDH.

In other words, the server's private key no longer takes part in encrypting any handshake secret (it is still used for signing), and the Pre Master Secret transfer is replaced by the exchange of ECDH parameters.

The Server Hello flight sent by the Server now carries an additional Server Key Exchange field containing the parameter Q2 used for ECDH key exchange. The server also signs this parameter, proving that it really was sent by the server. Likewise, in the third flight, the Client Key Exchange field sent by the client no longer carries the Pre Master Secret but the parameter Q1.

ECDH and ECDHE

Compared with ECDHE, ECDH lacks the final E, which stands for "ephemeral" (temporary). If the server regenerates its parameter Q2 for every connection, the exchange is ECDHE; if the server embeds the parameter in its ECC certificate and reuses the same parameter for every connection request (mainly to reduce server-side computation), the exchange is ECDH.

However, during an ECDH-based TLS handshake the client's Q1 is always freshly generated, so the third random number, the Pre Master Secret, is still different for every ECDH(E) session.

Even if a middleman somehow cracks Q1 and Q2 for one session, he has to crack the ECDH parameters all over again to read any other session. This is unlike the earlier RSA key exchange, where every Pre Master Secret relied on the server's public key and would all be exposed at once if the server's private key were compromised.

Determine a secure connection in the browser

The site will be deemed unsafe under three conditions:

  1. The website itself does not support HTTPS. Users cannot access such sites using the HTTPS protocol.
  2. The site itself supports HTTPS, but some internal resources (for example links referenced via src or a href) are still requested over plain HTTP.
  3. The security certificate used by the website for HTTPS connections is not issued by the CA authority recognized by the browser.

How do we tell intuitively whether a site is safe? The browser labels insecure websites with a "Not Secure" warning. In the latter two cases you will still be warned even if you manually type https:// in the address bar to access the site.

When auditing a site, you can press F12 to open the developer tools and check whether any resources are still loaded over plain HTTP; if so, the site is definitely not secure.

Intuitive comparison between HTTP and HTTPS

We use Wireshark to trace HTTP and HTTPS packets. Following the TCP stream of a packet, client data is shown in red and server data in blue:

The image on the left is a plain HTTP packet: by analyzing its TCP stream we can see the familiar header fields and the image's binary data, completely unprocessed.

On the right is an HTTP packet encrypted by the TLS layer: even if we intercept it, we get no useful information out of it.

At the end of the article, we briefly demonstrate how to build HTTPS services based on Spring.

TLSv1.3 new features

TLSv1.3 was released in 2018. Compared with TLSv1.2, it is more efficient and more secure overall: it follows the principle that simpler is better, streamlines the handshake, drops algorithms from earlier versions with known security weaknesses (such as CBC-mode ciphers and RC4), and completes key negotiation in one RTT based on ECDH.

1-RTT establishment process

Once you’ve seen ECDH key exchange in the previous section, it’s easy to understand the 1-RTT connection setup process in TLSv1.3.

  1. The Client sends a Client Hello message, which costs 0.5 RTT. The message includes:

    • Key_share: list of DH key exchange parameters (the previously mentioned client-generated ECDH parameter Q1 and the exposed parameter P).
    • Protocol version and encryption suite supported by the client.
  2. The Server replies with a Server Hello message, costing 0.5 RTT. It includes:

    • Key_share: ECDH temporary public key generated using the client-provided parameter (P) (equivalent to Q2 mentioned above)
    • Finish: indicates that the server is ready to establish a connection.
    • The selected encryption suite, including the signature algorithm;
    • Server certificate chain;
    • A signature over the handshake messages made with the server's private key, which says in effect "not only is my certificate legitimate, I am also the one personally sending you this message", reassuring the client against man-in-the-middle attacks.

    In addition, the server calculates the shared key of the session based on the Key_share sent by the client. (equal to G)

  3. After receiving the Server Hello message, the client verifies the signature with the certificate's public key. Once verification passes, it takes the server's ECDH temporary public key (Q2) from the key_share sent by the other side, and likewise derives the shared key (G) needed for the session.

  4. Both parties use the generated shared key to encrypt and transmit messages to ensure message security. When the client sends an HTTP request for the first time, Finished is sent to indicate that the TLSv1.3 handshake is established. (Application data has been sent at this point, so RTT delay is not counted)

TLSv1.3 integrates the key negotiation process in one RTT based on ECDH. Each party only needs to send a Hello message once, and a secure communication is established: the delay is reduced by half compared to TLSv1.2’s 2 RTT.

0-RTT connection resumption based on Pre-Shared Key

Pre-Shared Key (PSK for short): as the name implies, key material used by the two parties in a previous connection is reused in the next connection, skipping the authentication done in the 1-RTT handshake above. The author's knowledge and energy are limited here, so only the macroscopic process is described, without going into the details.

During the first connection, the server can at any point send the client a PSK identity derived from the exported shared key.

When the client establishes a connection to the server again, it carries this PSK identity and negotiates with the server to use the PSK. In addition, the Client can use the early_data extension in the Client Hello message to carry application data such as an HTTP request. Because useful data is sent in the very first flight, this is called 0-RTT.

The following two scenarios degrade a TLS connection from 0-RTT to 1-RTT:

  1. The PSK identifier used by the client has expired.
  2. The server does not intend to process the early_data (application data sent in advance) carried by the client.

The 0-RTT connection trades safety for extreme performance: the 0-RTT handshake has no forward secrecy, its messages can be replayed, and so on. Therefore, by default, servers do not actively adopt the 0-RTT connection policy.

QUIC => HTTP/3

QUIC (pronounced Quick, full name: Quick UDP Internet Connections), originally developed by the Google team, runs on UDP as a whole and provides a layer of HTTP/2.0 interfaces on top. After years of verification, this protocol was adopted as the standard protocol by the IETF. In collaboration with many organizations and individuals, the IETF has made so many refinements and improvements to the original Google QUIC that it can be called an entirely new protocol. Google’s version of QUIC is also known as gQUIC to show the distinction.

In 2018, the IETF officially announced the renaming of its HTTP over QUIC protocol to HTTP/3.

Google was instrumental in the creation of HTTP/3. The Google team has announced that it will gradually incorporate the IETF specification into its own version of the protocol to implement the same specification.

The QUIC layer replaces TCP with UDP, and the upper layer only needs a layer of HTTP/2 API for interacting with remote servers. This is because the QUIC protocol already includes multiplexing and connection management, and the HTTP/2 API only needs to parse the HTTP/2 protocol.

QUIC is designed to be transmitted at the top of UDP datagrams for “pluggable” deployment. As a result, browsers will be able to update more efficient protocols on the fly and get into applications without having to wait for (often costly) updates at the operating system level.

When QUIC was tested in the lab, the TLSv1.3 protocol was not yet available, so QUIC’s security was implemented by itself. The authors of QUIC protocol have made it clear that the future QUIC protocol will be secured by TLSv1.3 protocol.

The following are the major features of the QUIC protocol:

  • 0-RTT Establishes a connection
  • Multiplexing
  • Forward error correction
  • Connection migration

In addition, QUIC protocol has made many improvements, such as the optimization of TCP congestion control, the use of strictly increasing packet sequence numbers to eliminate the previously mentioned retransmission ambiguity, and a new Offset variable to ensure the internal data order, and so on.

Quickly establish connections in 0 to 1 RTT

We have intentionally mentioned the delay RTT of each protocol in our previous introduction to the protocols. If the time required for DNS resolution is taken into account, one more RTT delay is required:

  1. TCP = 1.5 RTT (sometimes counted as 1 RTT)
  2. HTTP = 1.5 RTT + 1 RTT = 2.5 RTT
  3. HTTPS = HTTP + TLSv1.2 = 2.5 RTT + 2 RTT = 4.5 RTT
  4. TLSv1.3 (0-RTT resumption) + TCP = 0 + 1.5 RTT = 1.5 RTT

In QUIC, only one RTT is needed to connect to a new server, because the QUIC protocol exchanges keys through the Diffie-Hellman algorithm, similar to the key_share exchange of TLSv1.3. The idea of the DH algorithm has been introduced above and is not repeated here. In addition, because it is based on UDP, QUIC inherently saves the 1.5 RTT cost of the TCP handshake.

On the first connection, the Server sends the client a Server Reject message (similar to Server Hello in TLS), whose internal Server Config records information about the server. On subsequent connections, the client computes the key material with the server directly from this server config. So, in theory, later connections between the client and the server incur no handshake RTT at all: the client can send data immediately, and the function is equivalent to TLS + TCP.

Connection migration

A TCP connection is uniquely identified by a quadruple: source IP address, destination IP address, source port, and destination port.

When any element of the quadruple changes, it is treated as a brand-new connection. The destination IP address and destination port, for example www.baidu.com:443, usually stay fixed.

The source IP address and source port of the client, however, often change with the network environment. For example, when our phones switch between Wi-Fi and 4G, the source IP address is bound to change, because the gateway switches from the router to the base station.

Or, behind a shared NAT egress, the source port may need to be rebound due to port contention. Whichever element of the quadruple changes, the TCP connection has to be rebuilt.

QUIC lifts the notion of a connection to a higher level of abstraction: a QUIC connection no longer relies on the quadruple, but is described by a 64-bit connection ID. In this way, even if the client's IP address or port changes, the connection is not interrupted as long as the ID stays the same.

At the same time, the 64-bit ID number ensures that the likelihood of collisions is minimal. The process of connection migration is transparent to the upper-level business logic.
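A hypothetical sketch of the idea (not QUIC's actual data structures): the server indexes connection state by a random 64-bit ID rather than by the address quadruple, so a packet is matched to its connection no matter which source address or port it arrives from:

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical connection table keyed by a 64-bit connection ID.
class QuicLikeConnectionTable {
    private final Map<Long, Object> connections = new ConcurrentHashMap<>();
    private final SecureRandom rng = new SecureRandom();

    long register(Object connectionState) {
        long id;
        // 64 random bits: the chance of a collision is negligible, but retry just in case.
        do { id = rng.nextLong(); } while (connections.putIfAbsent(id, connectionState) != null);
        return id;
    }

    Object lookup(long connectionId) {
        // Works regardless of which IP address or port the packet came from.
        return connections.get(connectionId);
    }
}
```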

Head-of-line blocking, again

The improved multiplexing in QUIC deals with this problem more thoroughly. In HTTP/2.0, no matter how many streams are created inside a Connection, they share the same TCP send/receive window, so the relationship can be described simply as:

Stream1 + Stream2 + … + StreamN = TCP Window Size

In QUIC, each Stream has its own independent send/receive window, and the size of a Connection's window is the sum of its Streams' windows:

QUIC Window Size = Stream1 + Stream2 + … + StreamN

At first glance the two equations look like the same relation written from opposite directions, but the meaning is quite different: in HTTP/2.0, if one Stream is blocked, the remaining N-1 streams are affected; in QUIC, when one Stream blocks, the other N-1 streams are unaffected, because each Stream has its own independent window.

In addition, within a QUIC Stream, the sliding window advances based on the maximum byte offset received, regardless of whether some earlier bytes are missing. This differs from TCP, whose receive window does not move forward until the preceding byte sequence is fully in order, which is one reason TCP is prone to blocking.

Forward error correction and retransmission

The QUIC protocol treats lost packets as follows: if a packet can be recovered locally, do not retransmit it; if it can be retransmitted quickly, do not wait for a retransmission timeout. QUIC uses forward error correction (FEC) codes to reduce the impact of packet loss.

Every N packets are grouped into a batch, and for each batch of data sent, an additional FEC packet is attached. If one packet in the group is lost, it can be recovered from the surviving packets of the group together with the FEC packet.
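The simplest construction of this kind uses a single XOR parity packet per group, which can repair exactly one lost packet; the Java sketch below shows that toy version (assuming equal-length packets), not QUIC's actual encoding:

```java
public class XorFec {
    // Build the FEC packet for a group: the byte-wise XOR of all N data packets.
    static byte[] buildFecPacket(byte[][] group) {
        byte[] fec = new byte[group[0].length];
        for (byte[] packet : group)
            for (int i = 0; i < fec.length; i++) fec[i] ^= packet[i];
        return fec;
    }

    // Recover a single lost packet: XOR the FEC packet with every packet that survived.
    static byte[] recoverLost(byte[][] survivors, byte[] fec) {
        byte[] lost = fec.clone();
        for (byte[] packet : survivors)
            for (int i = 0; i < lost.length; i++) lost[i] ^= packet[i];
        return lost;
    }
}
```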

Important packets, such as those carrying the initial key negotiation, are critical to a QUIC connection, because if the connection cannot be established the entire data flow is blocked. For such packets, QUIC retransmits repeatedly even before confirming their loss; multiple identical packets travel through the network at the same time, and as long as one of them reaches the other side, the connection is established.

QUIC’s challenges

Starting over is costly. While QUIC is a brave step toward embracing UDP, and has proven to be a more efficient way to connect, it has its share of problems, starting with the UDP protocol itself and compatibility with current network devices:

  1. Most network devices currently have fixed TCP and UDP policies, and currently only a few ports (such as DNS service port 53) may allow UDP data to pass through. The firewall will not easily allow UDP data through port 443 (the protocol port used by QUIC).
  2. Too many UDP packets may be mistaken for a Denial of Service (DOS) attack.
  3. In the classic HTTP technology stack there are mature debugging tools at every layer, such as ping and traceroute at the IP layer, telnet at the transport layer, and curl at the application layer. QUIC is a huge change, and until tools exist to debug and support it, its adoption will be hindered.
  4. QUIC retains the server push feature of HTTP/2, with a slight compromise: the client must first agree to push before the server can act accordingly. Even so, there doesn’t seem to be a clear answer to the question of how to make good use of push.

Therefore, objectively speaking, the popularization of QUIC has a long way to go.

Socket

Socket is not a programming language, nor is it a protocol. It is an API for processes to communicate with each other. It can be understood as the encapsulation of TCP/IP protocol, so that the program developers can use the functions provided by the Socket to easily implement TCP/UDP programming, without worrying about the content of the protocol itself.

The word Socket literally means a "socket", something you plug into. A long time ago, for two people in different places to get in touch, they had to rely on the telephone; the telephone provides logical functions such as dialing, answering, and hanging up, and that telephone is the equivalent of a Socket. Moreover, to reach the other party, you must know their area code and phone number.

All the implementation of end-to-end communication between hosts, or all upper-layer network protocols that rely on TCP/UDP communication, is silently supported by sockets. Some chatroom software applications are also based on sockets.

For hosts, what do they use as “phone numbers” to communicate with each other? The answer is IP address + Port Port number. Before we can formally introduce sockets, we need to have a basic understanding of ports.

port

Ports are a basic and important concept in Socket programming, and even if you are not a back-end developer, you should have a basic understanding of ports. When we talk about ports, we specifically refer to logical ports of the TCP/IP protocol. There are three types of logical ports:

  • Ports 0-1023 are well-known ports. For example, port 22 is used for the SSH remote login protocol and the SFTP service, and ports 20 and 21 are used for FTP transfers. These are reserved for common services and are therefore not available to other processes.
  • Ports 1024-49151 are registered ports, which user programs are allowed to use. For example, the default port of the Tomcat server is 8080, and the default port of the MySQL server is 3306.
  • Ports 49152-65535 are dynamic ports, randomly assigned for network communication when needed.

A host can provide multiple services by exposing its IP address (or domain name), and external services are differentiated by port numbers. For generic services, we tend to make conventions within a fixed port range.

For example, if you want to access a page over HTTPS, the request goes to port 443 rather than port 80. As server developers we should also respect this convention and not casually make a web server listen on, say, port 81, in case users cannot reach the service normally.

Remember: when we access someone else's port, we also need to open a dynamic port of our own for the connection. It is simple: if you want to enter someone else's house, you have to walk out of your own door first.
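A quick Java illustration of both sides of this: the server binds a fixed port so clients know where to find it, while the client's own local port is an ephemeral one picked by the operating system (port 8080 here is just an example):

```java
import java.net.ServerSocket;
import java.net.Socket;

public class PortDemo {
    public static void main(String[] args) throws Exception {
        // The server explicitly binds a fixed port so that clients know where to find it.
        try (ServerSocket server = new ServerSocket(8080)) {
            // The client only names the destination; its own local port is assigned dynamically.
            try (Socket client = new Socket("localhost", 8080)) {
                System.out.println("client local port  = " + client.getLocalPort());  // ephemeral port
                System.out.println("client remote port = " + client.getPort());       // 8080
            }
        }
    }
}
```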

Ports, connections, processes and applications

Understanding the following questions will help us understand how sockets work later.

Q1: How many connections can be established on a port?

From practical experience, we know that a port can establish many connections. For example, thousands of clients can make requests to the Web service on the server’s port 8080, 443, or 80 at the same time. In HTTP/1.1, a client can even make six connections at the same time. Of course, the client needs to have six ports enabled. On the server side, it will consider how to process these requests concurrently.

It is incorrect to assume that a host can only establish a maximum of 65535 connections at a time. As long as the client IP or port that initiated the connection is different, it is a completely new connection.

Q2: How many processes can listen on one port? (Stampede effect)

Generally, a port is listened on by a single process; this matches the way we are used to ports being occupied.

However, a process can bind a port and then fork child processes, so that the parent and its children all accept on the same port. This induces a stampede effect: when a connection arrives, all the children are woken up even though only one will get it (multithreading can cause the same problem).

The author mentioned in an earlier introduction to Nginx that worker processes obtain new connections through competition. This, too, causes the stampede effect: at one instant the system spends a lot of resources waking everyone up, yet only one Worker wins the competition in the end.

Nginx provides an accept_mutex lock to ensure that only one Worker listens on the port at a time, so that when a new external connection arrives, only one Worker handles it. It is a configurable option, enabled by default; we can turn the lock off manually.

The processes that are started first declare that they occupy the ports in advance. The processes that are started later cannot occupy the ports used by previous processes.

Q3: How many ports can a process listen on?

Typically, a process listens on one port: For example, Tomcat allocates thread pools for multiple servlets deployed under it, but provides only one port 8080 externally.

There are also ways to enable a process to listen to multiple ports: for example, create multiple sockets with different source port numbers within a process by creating multiple threads.

A process can indeed listen on more than one port, but it cannot listen on a port that is already in use. In addition, only processes that need network communication will use the port, which means processes can choose not to listen on the port.

Q4: How many processes can an application have?

An application can have multiple processes. In the case of IntelliJ IDEA, every time we run a Java main program, IDEA will actively open one or more processes for us to keep the program running. Therefore, it is possible for an application to have multiple ports open, or not to have ports open.

Socket triples and quintuples

TCP identifies a unique connection with a quadruple: [source address, source port, destination address, destination port]. A Socket adds the protocol on top of this; typically the protocol is one of two options, TCP or UDP.

Such a quintuple is called an association. There is also the notion of a triple: [local address, local port, protocol]. Since this triple is half of the quintuple, it is also called a half-association. Within a network, a triple uniquely identifies a specific process running on a specific host.

We usually talk about generating a Socket, is actually equivalent to generating a description of the quintuple.

Both parties need to create a Socket to read and write messages from each other. The sockets of the two parties must use the same protocol. One party cannot use TCP while the other uses UDP.

Windows users can run the netstat -an command to view the status of all running ports on the local computer.

Why do you need HTTP when you have Socket programming?

Both sockets and HTTP can be used to communicate over a network. So why do we need the upper-layer HTTP protocol?

Sockets are generally used for instant messaging or message push: software like QQ or WeChat cannot wait for the user to initiate a request before refreshing messages. In addition, a connection established over a Socket is a long-lived connection. For mobile phones, network conditions are relatively complex and retransmissions are frequent; a long-lived connection occupies memory on the one hand and accelerates battery drain on the other. Checking the battery usage of a phone, the author found that instant-messaging software such as QQ and WeChat consumed the most power.

HTTP is used where real-time requirements are low, such as news sites that tolerate minutes of delay and follow a "fetch it, read it, leave" pattern. HTTP itself was designed around short connections, so short that the time spent actually transmitting data may be far less than that spent on the three-way handshake and four-way teardown.

Therefore, from the perspective of services, sockets are generally used for more peer-to-peer full-duplex communication. For example, in chat room applications, the identities of two communication parties are “peer”. In HTTP, client and server prefer a “question and answer” relationship.

We didn’t even notice the Socket when we talked about HTTP. Where does the Socket run?

The running level of a Socket

Sockets sit between the application layer and the transport layer. A Socket's job can be likened to one of the 23 classic design patterns: the facade pattern.

For application programmers, they no longer need to know all the details of TCP/UDP. Sockets provide programmers with some off-the-shelf logic: establish sockets, listen for ports, accept connections, send messages, disconnect resources…… To implement these functions, the programmer simply provides parameters based on the network protocol of his choice — the Socket does the rest.

Logically, sockets visualize complex TCP/UDP connections as functions, and programmers can easily design communication applications based on network protocols in the upper layer.

Socket Workflow (TCP)

The following introduces, from the Socket level, how a client and server establish a connection and communicate, together with the function prototypes involved. This section only covers the TCP-related part.

Socket’s design concept comes from Unix: everything is a file, including a network connection. In Unix/Linux, when we use socket() to open a network connection, the function returns a File Descriptor.

Our read() and write() of this “network file” are abstracted messages received and sent. In the Unix/Linux environment, Socket network programming can be regarded as I/O operations on files. (Sockets aren’t files themselves, of course, but Unix provides a unified way of looking at things.)

In Windows, sockets are treated as mere network connections, not files. So message receiving and sending are replaced with recv() and send(), but functionally the same.

Creating a triple

Both the server and the client need to use the socket() method to create the socket. The function prototype is as follows:

int socket(int af, int type, int protocol)
Copy the code

The af parameter is called Address Family. Our common address families are IPv4 and IPv6. AF_INET indicates IPv4, and AF_INET6 indicates IPv6.

Parameter type indicates the data transmission mode adopted by this Socket, among which the most representative ones are SOCK_STREAM (stream-oriented mode, based on TCP) and SOCK_DGRAM (datagram oriented mode, based on UDP). These two methods inherit the characteristics of TCP and UDP respectively.

Parameter protocol indicates the network protocol used by the Socket. Usually the type parameter already implies the protocol; for example, SOCK_STREAM corresponds to TCP, and an incompatible combination will simply make socket creation fail. Therefore, more often than not, we pass protocol = 0, which lets the system pick the default protocol for the chosen type.

The socket() function returns an int, which is the socket descriptor I mentioned earlier, hereinafter referred to as Socketfd, used to describe a socket connection.

Two kinds of sockets

For the server, the socket created through the socket() function is used to listen to and accept client requests, also known as listening socket. It will persist for the lifetime of the server process, and only one will exist. You need to proactively call bind() to bind a fixed port number so that the outside world can access it.

One other thing to note: after bind(), the server has declared to the system that it occupies the port. A separate server process can no longer bind its socket to the occupied port. (This echoes question Q2 above.)

int bind(int sock, struct sockaddr *addr, socklen_t addrlen)
Copy the code

Here sockaddr is a compound structure describing the address and port number for the chosen address family af, and socklen_t is the length of addr, usually computed with sizeof.

For clients, the socket created through the socket() function is used to transmit messages to each other, also known as the connection socket. Instead of using bind() to pre-bind its own port number, it uses connect() to connect to the listening socket on the server, and the client’s system randomly selects an idle, random port to assign to it.

int connect(int sock, struct sockaddr *serv_addr, socklen_t addrlen); 
Copy the code

Sock is the description of the Socket file generated by the client, and serv_addr is the address family, address number, and port information on the server.

Why distinguish these two kinds of sockets? A listening socket's only job is to listen: until a request arrives it knows nothing about the peer's identity and has no substantive connection to any client. A connection socket, on the other hand, corresponds to a unique end-to-end connection that we can describe with the quintuple.

The server process listens for the request and accepts it

After the server has called socket() and bind() to create a listening socket, it calls listen() to enter the listening state.

int listen(int sock, int backlog)
Copy the code

Under highly concurrent access, when the server cannot immediately call accept() to take an incoming request, it must temporarily park the pending requests in the socket's backlog queue.

The backlog parameter is used to set the queue size, and the setting of this value depends on traffic and server performance. You can also set SOMAXCONN to let the system determine the size of the buffer.

After the server has called listen() to enter the listening state, we can call accept() to receive external connection requests:

int accept(int sock, struct sockaddr *addr, socklen_t *addrlen); 
Copy the code

The sock parameter describes the listening socket descriptor of the server, and the addr structure describes the protocol, IP address, port number, and other information of the client. The server rarely sets restrictions on client addresses and port numbers. The latter two are usually NULL. The accept() function returns a new connection socket specifically for message transmission with the client.

The simple idea is that the server uses a listening socket as the front of the service entry. Each time a new connection is connected, a new connection socket is created specifically to interact with the client message. Many sockets can flow through a port.

These sockets “reuse” the port previously bound by the server process using the bind() function. The server process uses the socket quintuple to distinguish between different clients (because the client IP address and port must be different).

Wait, don't connection sockets and listening sockets work a bit like Nginx's master/worker routine? Readers may be frowning, remembering the stampede effect mentioned earlier in the ports section. In the early days, the accept() function did cause this problem, but the Linux 2.6 kernel was optimized to eliminate the stampede caused by accept(). A post on Zhihu details Linux's solution to the accept() stampede effect.

📣 If there are no more connections in the wait queue to be processed, accept() blocks the thread executing it.

Server and client exchange data

Linux does not sharply distinguish sockets from regular files, so we can "read from and write to a socket" directly with read() and write(), which (formally) amounts to receiving and sending data.

The prototype for write() and read() is:

ssize_t write(int fd, const void *buf, size_t nbytes);  // Write a message to socketFD, which represents send.
ssize_t read(int fd, void *buf, size_t nbytes);  // Read a message from socketfd, which represents receive.
Copy the code

Fd is the Socketfd used to connect the local end to the peer end, buf is the address of the data buffer for sending and receiving messages, and nbytes is the number of bytes of messages. The return value is the number of bytes sent/received successfully, otherwise -1 is returned.

Each time a Socket is created, the host allocates two buffers in memory for it, one for receiving and one for sending messages. For write() in particular, the function returns successfully as soon as the data has been written to the send buffer, because the underlying TCP protocol is responsible for delivering the message reliably; the Socket layer does not interfere with that process.

TCP is independent of the write()/read() functions. Data may be sent immediately after being written to the cache, or after the cache has accumulated to a certain extent. In addition, network conditions need to be taken into account.

Send ()/recv() is similar to read() in principle to write().
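To tie the pieces together, here is a compact Java sketch of the whole flow: one listening socket bound to a fixed port, a new connection socket returned by accept() for each client, and one thread per connection reading and echoing data (port 9000 and the thread-per-connection model are illustrative choices, not a recommendation):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoServer {
    public static void main(String[] args) throws Exception {
        // Listening socket: socket() + bind() + listen() rolled into one constructor.
        try (ServerSocket listener = new ServerSocket(9000)) {
            while (true) {
                // accept() blocks until a client connects, then returns a brand-new
                // connection socket dedicated to that client (a new quintuple).
                Socket connection = listener.accept();
                new Thread(() -> handle(connection)).start();   // one thread per connection
            }
        }
    }

    static void handle(Socket connection) {
        try (connection;
             InputStream in = connection.getInputStream();
             OutputStream out = connection.getOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            // read() blocks until the peer sends data or closes the connection.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);   // echo the bytes back (the "write to the socket" step)
            }
        } catch (Exception ignored) {
        }
    }
}
```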

How does the server handle concurrent sockets

If 10K clients initiate a connection through the connect() method at a given time, the server will have 10K connection sockets in the concurrent state. To understand concurrent operations on the server, we must drill down to the operating system level to find the causes. Before that, we first need to have the following basic understanding:

The first is the division of labor between CPU and I/O devices:

  • The CPU is mainly used to calculate and execute instructions. In the early days, the CPU also needed to handle I/O operations. The desire to separate the CPU from I/O operations as much as possible led to the development of various I/O control modes (see later DMA).
  • Tasks that require a lot of CPU computing are called CPU-intensive tasks, and those that require a lot of I/O operations are also called I/O intensive tasks. Highly concurrent Web applications fall into the latter category.
  • The CPU only reads and writes data and instructions from memory. Most of the data is stored in the external storage (refers to the disk and other media), when needed by I/O operations to bring the data from the external storage into memory.

The relationship between processes and threads:

  • At the process level, CPU scheduling is accomplished by the operating system through scheduling algorithms. A process is the smallest unit of resource allocation.
  • A CPU can be used by only one process in a time slice. A thread is the smallest unit in which a CPU performs a task.
  • A CPU can only process one thread at a time. If multithreading is enabled inside the process, it is up to the process to schedule how the CPU should be allocated.
  • In the following states, processes only have CPU usage in the Running state.
  • A process does not go directly from Waiting (or blocking) to Running. When the required I/O operations are complete, the process enters the Ready state and waits for the operating system to allocate CPUS to the process again.
  • If the number of cpus is greater than or equal to the number of threads, then these threads are in a Parallel state. Otherwise, multiple threads need to compete for the CPU. This is a common situation.

The following are several states of the process:

While the process is Running, its internal threads are in the following state, and only the threads in the Run state actually acquire the CPU.

Kernel mode and user mode

Kernel mode and user mode describe the two execution levels of the CPU.

  1. Kernel mode: THE CPU can access all data in memory, including peripheral devices, such as disks and network cards, and can execute privileged instructions.
  2. User mode: The CPU can only access limited memory, does not allow access to peripheral devices, and can only execute unprivileged instructions.

Why do we divide user mode and kernel mode? Core functions, such as process scheduling, I/O operations, memory allocation, and even powering off the computer, are not expected to be called directly by user processes. Instead, they are stored in kernel space, which user processes cannot access directly, but only the system kernel can. This tight access control improves system security and stability.

The CPU is mostly in user mode while the application process is in execution state. However, when a user process is involved in a “privileged” operation, a System call is required to temporarily give CPU access to the System kernel, which assigns a kernel thread to handle it. The CPU register temporarily holds the original user mode instruction location, then the CPU switches to kernel mode, and updates to the new location of kernel mode (privileged) instruction to run kernel tasks. After the system call ends, the CPU registers revert to their original context, the user process regains the CPU, and the CPU is back in user mode.

The switch from kernel mode to user mode is achieved by the CPU. Frequent switching reduces the efficiency of CPU task execution.

Two cases of system calls

System calls are actually divided into fast system calls and slow system calls.

Fast system calls: some system calls that can be returned immediately, or only in microscopic time. Calls that simply read the state of the system (for example, check the process number) or modify the state of the system (kill a process) are fast system calls, or do not block.

A slow system call is one that depends on an external event and blocks until the result of that event is available. For example, reading from pipes, terminals, or network devices may block because the data may simply not have been sent yet; the same applies when pause() or wait() is called deliberately to put the process into a paused or waiting state.

View the transmission process of Socket data from the bottom

The operating system divides a small portion of memory for kernel operations, called kernel space. The contents of kernel space are protected and accessible only when the CPU is in kernel state. During Socket communication, messages are sent and received in user space. So messages need to go through a copy from kernel space to user space, which involves system calls.

This is shown in the figure above. When sending data, the server user program switches to the kernel mode, copies the message from the user space to the kernel space through the system call, and then sends the message to the other side through the network interface.

After the data arrives at the peer’s network interface, its driver writes the message to the kernel space in memory, and then sends an interrupt request to its CPU to indicate that new data has arrived. When the client process on the other side wants to read the data, the data is copied from kernel space to user space through a system call.

Copies of data in user space and kernel space are pure value copies.

Kernel threads and user threads

Do threads within a process block? If so, how does thread blocking affect the process?

From the thread perspective, we can divide the user thread and the kernel thread.

User threads sit above the kernel and are managed within each user process; kernel threads are managed directly by the operating system. There are three mapping models between user threads and kernel threads: many-to-one, one-to-one, and many-to-many.

In the many-to-one model, all of a process's user threads are mapped to a single kernel thread. If one thread makes a blocking system call, the whole process blocks. Even when multiple threads make system calls, user-level multithreading does not improve concurrency, because only one kernel thread is available. Few operating systems use this model anymore.

The one-to-one model provides better concurrency performance, with the only drawback: it is too expensive to create kernel threads equal to user threads. Therefore, most operating systems actively limit the number of threads. Both Linux and Windows operating systems implement the one-to-one model.

The many-to-many model improves the thread “waste” phenomenon in the one-to-one model by reusing kernel threads and saves resources.

To sum up, whether thread blocking causes process blocking depends on what kind of threading model the system uses. In addition, if it is a single-process, single-threaded service, then thread blocking must result in process blocking.

DMA

There are many I/O control modes. Here the author introduces the common case of the DMA control mode, because it is the I/O control mode typically used by microcomputers (that is, the laptops we use); mainframes typically use the I/O channel mode.

In the old days, I/O operations would first send an interrupt signal to the CPU, and the CPU would set a breakpoint, pause the running thread to process the I/O, and then return to the suspended work. This approach is only suitable for burst, small amounts of input (such as keyboard input) that require frequent breakpoints and on-site protection and recovery.

When transferring large amounts of data, the CPU is forced to take a long time to process I/O operations, resulting in low processing efficiency of the original thread. So people want to keep I/O as separate from the CPU as possible, and DMA (Direct Memory Access) is one way to do that.

In DMA control, memory, the CPU, and the DMA controller are all connected to the bus, so the DMA device can interact with memory directly, without CPU intervention. When an I/O operation needs to be performed, the CPU does some simple preprocessing, then the DMA controller briefly takes control of the bus from the CPU and takes care of the actual data transfer, sending an interrupt request to the CPU when it is done. The CPU then executes the interrupt service routine to finish the process: verifying that the data was transferred to memory correctly, checking whether any errors occurred during the transfer, and deciding whether the DMA controller needs to process other data blocks.

Therefore, most of the time in the DMA control mode, CPU and read/write devices work separately. This has the advantage that even if a thread is blocked while waiting for an I/O result, the CPU can still be scheduled for use by another thread that is ready.

When is the blocking that we care about most?

We often talk about blocking situations where the server thread needs to perform an operation based on the data sent by the client (such as looking up a database table based on a field sent by the user), but the data is not sent from the other side.

A blocking problem occurs when a thread is “unable to read” or “unable to write” for a long time, but it must wait synchronously for the completion of the operation, or rely on the return of the operation, to continue executing the logic.

During a synchronous operation, whether a thread blocks depends on whether the system call returns immediately:

  • Blocking: after making the system call, the thread is suspended and loses the CPU until the DMA controller has finished handling the message.
  • Non-blocking: the thread makes the system call and returns immediately, continuing to use the CPU for other tasks, but at regular intervals it actively asks again through another system call.

Based on blocking and non-blocking, several synchronous processing models of concurrent server processing are derived: blocking I/O model, non-blocking I/O model, I/O multiplexing.

Single thread blocking model

Before exploring the three synchronization models, let’s start with the simplest model: a single main thread that handles requests. This means that the server process can only accept and process one connection at a time. When the last connection is processed, the server waits to accept the next connection.

The pseudo code for the server to process the request is as follows:

while(1) {
	accept();     // Accept a request and establish the TCP connection. The program blocks here when there are no new connections.
	read();       // Read the content of the client's request.
	queryData();  // Query the database.
	write();      // Send the response.
}

Obviously with this pipelined approach, thread blocking is inevitable. If the main thread is still stuck in response to the previous client request, a new connection request will fail because the server thread cannot call the Accept () function in time to process the connection.

Multi-threaded or multi-process?

As soon as the amount of concurrency increases even a little, a single thread can no longer cope. One optimization that comes to mind: every time accept() returns a socket fd, create a new thread or child process to handle that connection. Here is the pseudo code for multithreaded/multiprocess processing:

while(1){
	skfd = accept();
	new Thread(exec(skfd));
    # or create a new Process to tackle this new socketfd:
    # new Process(exec(skfd));
}
------------------------------
exec(fd) {
	read();
	queryData();
	write();
}

We have two options:

  1. If we choose multi-process, we need to call fork() to create a child process to handle the new connection.
  2. If we choose multithreading, we need to use new to create a Thread to handle it.

So should you choose multi-process or multi-threaded? Creating a subprocess consumes a lot of system resources, but the resources are independent of each other, and the maintenance cost is lower.

All threads in a process share the same global memory, global variables, and so on. Although starting a thread is much less expensive than a process, there are additional issues for the programmer to consider compared to multi-process concurrency:

  1. Access control for critical resources and locking must be introduced, which not only increases the complexity of the program but can easily hurt performance, and may even lead to deadlock.
  2. In addition, if one thread crashes (essentially a memory error, such as accessing an invalid memory address), the other threads are affected and crash with it, bringing down the entire server process.

The detailed differences between multithreading and multiprocess can be found in this article. As to which one to choose, it is different for different people. The three models we discuss next are all based on multithreading.

Blocking I/O model

In the blocking I/O model, the child threads created by the server process handle each connection with blocking logic: a thread blocks while making certain system calls. However, blocking a single thread is far cheaper than blocking the whole server process. Moreover, when a thread blocks because no message has arrived (the Socket connection is idle), the server process can reasonably schedule the CPU to other threads that need it.
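A hedged Java sketch of this model (port and echo logic are illustrative only): the main thread blocks in accept(), and each connection is handed to a fresh thread whose blocking read affects only that thread, not the whole process.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingIoServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket socket = server.accept();              // blocks until a new connection arrives
                new Thread(() -> handle(socket)).start();     // one thread per connection
            }
        }
    }

    private static void handle(Socket socket) {
        try (Socket s = socket;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {          // blocks this thread only, not the process
                out.println("echo: " + line);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}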

Non-blocking I/O model

In the non-blocking I/O model, the child threads created by the server process handle each connection with non-blocking logic. While a thread owns the CPU, it does two things:

  1. Every once in a while, a system call is made to inquire whether the message has arrived. If the message arrives, the data is returned. Otherwise, the system call simply returns an error indicating that the message did not arrive. The thread needs to try the next query.
  2. Handle the rest of the logic.

This may sound "cleverer" than the blocking I/O model, but it is not necessarily cheaper, because making frequent system calls means the CPU has to switch contexts frequently. And how often should threads poll? That is hard to pin down. In a multithreaded environment the cost of this extra work may be even greater, so the non-blocking I/O model is not necessarily more efficient than the blocking one.
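To make the polling cost concrete, here is an illustrative Java NIO sketch (the address and the "other work" are assumptions, not the article's code): the channel is put into non-blocking mode, so read() returns immediately with 0 bytes when nothing has arrived, and the thread has to keep asking.

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class NonBlockingPollingDemo {
    public static void main(String[] args) throws Exception {
        SocketChannel channel = SocketChannel.open(new InetSocketAddress("localhost", 8080));
        channel.configureBlocking(false);            // read() will no longer block
        ByteBuffer buffer = ByteBuffer.allocate(1024);

        while (true) {
            int n = channel.read(buffer);            // a system call on every iteration
            if (n > 0) {
                buffer.flip();
                System.out.println("received " + n + " bytes");
                buffer.clear();
            } else if (n == -1) {
                break;                               // the peer closed the connection
            } else {
                doOtherWork();                       // no data yet: spend the CPU elsewhere, then ask again
            }
        }
        channel.close();
    }

    private static void doOtherWork() { /* placeholder for other tasks */ }
}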

I/O multiplexing

Multiplexing means using one main thread as a "sentinel" to check the state of multiple socket fds: when a socket fd's message has arrived in kernel space, another thread (or the main thread itself) executes that socket fd's read() to copy the data from kernel space to user space, so that the user program can process the message.

In the two previous models, 10K connections mean the server creates 10K threads. However, most connections are not very active, and each thread stays idle most of the time. Creating and switching between that many threads consumes a lot of extra CPU resources, time the CPU could have spent doing more meaningful work.

This model uses as few threads as possible to complete concurrent operations and avoids the waste of system resources caused by the creation of a large number of additional threads, so it is called “reuse”. The I/O multiplexing model does not make a single thread faster, but rather supports more connections with less overhead.

The specific implementation methods are select, poll, epoll.

Thus, the process blocks only inside the select, poll, or epoll call, for example when no socket fd is registered or none is ready. I/O multiplexing is also known as the event-driven model, because it follows a "handle whoever has an event" approach.

By the way, epoll can suffer from the thundering herd problem, which I won't discuss in detail here.
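In Java, select/poll/epoll are wrapped by the NIO Selector (which uses epoll on Linux). The following is a minimal, illustrative single-threaded multiplexing sketch, not a production server:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class MultiplexingServer {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                       // the only place this thread blocks
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {            // a new connection is ready to be accepted
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {       // some socket already has data in kernel space
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    if (client.read(buffer) == -1) {
                        client.close();
                    }
                    // ...process the buffer, possibly on a worker thread...
                }
            }
        }
    }
}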

WebSocket

What does WebSocket have to do with sockets? This is like asking what the relationship between JavaScript and Java is: they are not directly related. Because sockets support long-lived connections, WebSocket borrowed the Socket name to signal that it is a long-connection protocol at the HTTP level; at the very least, it sounds much better than "WebLongConnectionProtocol".

Websockets, as the name suggests, are Sockets at the Web level. It was born in 2008 and became an international standard in 2011. For those who need to develop services based on WebSocket, you can use this page to test your interface.

If the Socket already exists, why WebSocket?

Since there is already a Socket API, why did HTML5 develop a separate WebSocket protocol to implement persistent HTTP connections? The reason is simple: because browsers work at the application layer and do not support calling the system Socket directly, each time the browser makes an HTTP request, the underlying Socket is done by the operating system, while the browser’s real purpose is to receive the HTTP response and render the page for the user to browse.

The browser is a sandbox execution environment; in other words, the browser itself exists to isolate the system environment. If the browser is involved in Socket communication, it has to deal with additional Socket communication, which is not the responsibility of an application-level program.

What is a sandbox? For example, we can create a virtual machine and do some destructive behavior inside the virtual machine — but this does not affect the host machine at all, because the two systems are completely isolated. The virtual machine is like a sandbox: you can paint on the inside of the sandbox, but it never affects the outside.

Furthermore, giving the browser the right to call the native Socket directly is a dangerous practice: the browser could secretly send your private data by calling the Socket. Socket is just a tool, it doesn’t care if the user’s intent is evil.

Therefore, from the perspective of security, the power of the browser is limited to the HTTP protocol system by W3C, ECMA and other organizations, and the responsible vendors will follow the specification. However, HTML5 wanted a socket-like full-duplex communication at the HTTP level, so it had to create another portal, which is how WebSocket was born.

What does it compensate for in the HTTP protocol?

The HTTP protocol is designed for short connections, so it doesn’t take into account long connections. If the client needs to obtain the latest data in a long period of time, the client has to choose short polling or long polling to implement:

  1. Short Polling – Ajax JSONP Polling: Using Ajax to send a request to a server at short intervals, and the server responds immediately.
  2. Ajax Long Polling: Use Ajax to send a request to the server to keep the connection open for a Long time until new data is available.

Short polling occupies a lot of bandwidth, while long polling makes the server have to keep the connection for a long time, increasing the pressure on the server. And as we explained earlier, setting up an HTTPS connection can take up to 4.5 RTT delays! For applications that rely on real-time communication, having to experience such a high latency for each message is intolerable.

Readers may wonder, since HTTP/1.1 already supports keep-alive by default, isn’t that enough? Servers can indeed set time to extend some connections, but for hours-long connections, relying on HTTP keep-alive alone is not enough.

One more question: HTTP/2 already supports server push, so why not take advantage of it? The main purpose of HTTP/2 server push is to place content in the client's cache in advance to minimize the number of requests; it is not meant for real-time, full-duplex communication between server and client.

As a truth has it, contradiction is the engine of progress. WebSocket was created to solve these problems. It naturally supports the following features:

  1. The two parties that establish communication through the WebSocket protocol can transmit data to each other at any time, in a form very close to the Socket.
  2. Instead of the "ask and answer" model, the two parties decide how to communicate and what to do when a message arrives based on four WebSocket-defined events.
  3. Good compatibility with the HTTP protocol.
  4. It is extensible.

The WebSocket protocol identifier is ws, or wss if the communication is encrypted with TLS. (In Tomcat/Spring Boot, the configuration is similar to HTTPS.)

ws://websocket.com:8080/endPoint
wss://secureWebsocket.com:8080/endPoint

WebSocket is intended to be compatible with existing network facilities built over the HTTP protocol, so port 80 or 443 is still used by default, depending on whether WS or WSS is used for the network connection.

In the WebSocket protocol, the initiative changes hands. Under HTTP, the server must passively wait for the client to send a request before it can respond; under WebSocket, whenever the server has a new message it can actively push it to the client, so the client no longer monopolizes the active role.

Before using the WebSocket protocol, front-end developers need to write logic to check whether the browser itself supports WS or WSS connections. Not all browsers and devices support it (e.g., browsers below IE9). Over time, however, WebSocket will become more and more supported across browsers and devices.

Establish WebSocket connections based on HTTP

WebSocket currently initiates connections based on HTTP/1.1. The client first sends a GET request. This GET request has the following header added compared to a normal HTTP request:

  • Upgrade: indicates that the request is to be upgraded to a WebSocket request. This header has only one value: websocket.
  • Connection: indicates that this connection contains an upgrade request. This header has only one value: Upgrade.
  • Sec-WebSocket-Key: generated by the browser to provide basic protection of the connection and prevent malicious connections.
  • Sec-WebSocket-Version: indicates the WebSocket version supported by the browser. For RFC 6455, which is used by default, the value is 13.

When a client initiates a WebSocket connection request, it cannot send data until it receives a correct response from the server. If the server agrees to establish a WebSocket connection, the status code 101 (Switching Protocols) is returned with the following header:

  • Upgrade: same as above.

  • Connection: ditto.

  • Sec-WebSocket-Accept: the server concatenates the request's Sec-WebSocket-Key with the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, hashes the result with SHA-1, and converts the digest into a Base64 string (a short Java sketch follows this list).

  • Sec-WebSocket-Protocol: indicates the subprotocol selected by the server, if the client requested one.

  • Access-Control-Allow-Origin: this is a cross-domain response header. I list it here to show that WebSocket connections are built on mutual trust under the CORS/same-origin rules.
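The Sec-WebSocket-Accept calculation mentioned above is easy to reproduce. The sketch below uses only the JDK; the sample key and expected result come from RFC 6455:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class WebSocketAccept {
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    public static String accept(String secWebSocketKey) throws Exception {
        // Concatenate the client's key with the fixed GUID, hash with SHA-1, then Base64-encode the digest.
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest((secWebSocketKey + GUID).getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        // Example key from RFC 6455; the expected output is "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".
        System.out.println(accept("dGhlIHNhbXBsZSBub25jZQ=="));
    }
}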

WebSocket uses readyState to indicate the connection state:

Field       Value   Meaning
CONNECTING  0       The connection is being established.
OPEN        1       The connection succeeded; the two parties can communicate normally.
CLOSING     2       The connection is closing.
CLOSED      3       The connection is closed, or the WebSocket connection failed to open.

After the handshake is successful, both parties can ping each other at any time to check whether the connection is still maintained. When the ping request reaches the other party, the other party immediately sends a Pong reply message, which ensures that the other party is still active and prevents one party from unwittingly sending a large amount of useless data after the party is disconnected.

The frame format

After a WebSocket communication is established, the two parties will use WebSocket data frames for data communication instead of traditional HTTP data frames. A message may be sent in multiple frames. At the same time, data frames can be sent in plain text or binary format files.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |            (16/64)            |
|N|V|V|V|       |S|             |  (if payload len==126/127)    |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|    Extended payload length continued, if payload len == 127   |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               | Masking-key, if MASK set to 1 |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

The following is a brief introduction to flag bits.

  • FIN: indicates whether this frame is the last frame of a message.
  • RSV1, RSV2, RSV3: the value is 0 unless an extension defines these extra flag bits.
  • opcode: a 4-bit value indicating the type of the frame.
    • x0: a continuation frame.
    • x1: this frame carries text.
    • x2: this frame carries binary content.
    • x8: the connection is being closed.
    • x9: a ping.
    • xA: a pong.
    • The remaining values are reserved. Frames with opcode x8, x9, or xA are control frames.
  • Payload Length: the payload length of this frame. A value of up to 125 is the length itself; the value 126 means the following 16 bits give the actual length; the value 127 means the following 64 bits give the length.
  • Masking-key: when the client sends a frame to the server, this key is used to mask the payload.
  • Payload Data: the payload portion of the frame. It can actually be divided into Extension Data and Application Data (in that order), but if no extension was negotiated, the Extension Data length is 0, so I will not distinguish between the two.

Some of the framing rules in RFC 6455 are as follows:

  1. If a message is not fragmented, then FIN = 1 and opcode != 0.
  2. A fragmented message consists of a first frame (FIN = 0, opcode != 0), intermediate frames (FIN = 0, opcode = 0), and a final frame (FIN = 1, opcode = 0). If there are only two frames, there are just the first and the final frame.
  3. The sender must send the frames of a message in order, unless a protocol extension specifies how to handle out-of-order frames.
  4. To ensure that the other party can respond to control frames promptly, control frames may be interleaved among the frames of a fragmented message.
  5. Control frames must not be fragmented.
  6. Regardless of how the two parties extend the WebSocket protocol, both must be able to handle both fragmented and unfragmented messages.
  7. The sender may fragment a message into blocks of any size.
  8. All frames of a message must be of the same type, either binary or text.

WebSocket event

WebSocket defines what action should be taken when a connection is established, a message is received, an error occurs, and the connection is closed.

  • onopen: specifies the action (or callback function) to take when the WebSocket connection succeeds.
  • onmessage: Specifies the action to be taken after receiving a message from the peer.
  • onerror: Specifies the action to be taken when the connection fails.
  • onclose: Specifies the action to take when the connection is closed.

Implementations vary in JavaScript, Java, Python, and other languages, and the actions taken by each event depend on the business logic the application is implementing, so I won’t describe them too much here.
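As a Java-side illustration of these four events, here is a hedged sketch using the standard javax.websocket annotations (the endpoint path and echo behavior are made up for the example; in Spring Boot such an endpoint usually also needs a ServerEndpointExporter bean registered):

import javax.websocket.OnClose;
import javax.websocket.OnError;
import javax.websocket.OnMessage;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/ws/chat")   // hypothetical endpoint path
public class ChatEndpoint {

    @OnOpen                   // corresponds to onopen
    public void onOpen(Session session) {
        System.out.println("connected: " + session.getId());
    }

    @OnMessage                // corresponds to onmessage
    public void onMessage(Session session, String message) throws Exception {
        session.getBasicRemote().sendText("echo: " + message);
    }

    @OnError                  // corresponds to onerror
    public void onError(Session session, Throwable error) {
        error.printStackTrace();
    }

    @OnClose                  // corresponds to onclose
    public void onClose(Session session) {
        System.out.println("closed: " + session.getId());
    }
}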

WebSocket is not a replacement for any one technology

For any technology, we need to put it in context of a usage scenario.

The classic HTTP protocol has been around for decades, and various specialized optimization mechanisms are well established. WebSocket is essentially a stand-alone protocol based on TCP (HTTP is used only to establish connections) : this means that it abandons many of the services supported by browsers and requires applications to provide their own implementations. From the message frames, we can see that WebSocket has “white space” in many places.

WebSocket is designed in a very different way from the short connection HTTP that many Web servers and their proxies and intermediaries maintain. Long connections and idle sessions consume memory and socket resources on all intermediate devices and servers and are costly. This is a challenge for WebSocket, SSE, or HTTP/2.

In addition, not all clients support the WebSocket protocol, and some networks even block WebSocket traffic. As the designer of a product, you need to prepare some fallback to keep the service running... For example, Node.js-based Socket.IO is a good choice, because its library provides a way to "degrade" the communication if the WebSocket protocol is unavailable.

In short, we should choose HTTP, WebSocket, or other SSE protocol, XHR technology according to their actual needs. If your Web project doesn’t rely on real-time messaging, the HTTP protocol is still the primary option.

Cross domain

Cross-domain issues must be taken into account in Web development, because the current development mode generally separates the front end from the back end: for example, the front-end service is built on the React framework, with a URL of, say, A:3000, while the real data source is another Tomcat-based JavaWeb service, with a URL of, say, B:8080.

When the user opens the page, the browser is requesting page resources from domain A, while the JavaScript written for the page in domain A uses Ajax to asynchronously access resources in domain B. The browser treats this as a cross-domain request and intercepts it.

Why would browsers intentionally intercept cross-domain requests? This is related to CSRF cross-domain attacks. To defend against this attack, browsers enact same-origin policies, which cause cross-domain problems.

Of course, from a Web developer’s point of view, the focus is on two things:

  1. The server can normally return resources (data) when a GET request is initiated to a trusted source.
  2. The server needs to set up a check mechanism to prevent cross-domain POST requests from changing/invading the database.

🌎 The content in this section is derived from: HTTP Access Control (CORS)

What is cross-domain

Cause of cross-domain problems – Same origin policy

The cross-domain problem arises because of the browser’s same-origin policy restrictions. The Same Origin Policy is a convention. It is the most core and basic security function of the browser. If the Same Origin Policy is missing, the normal functions of the browser may be affected. The Web is built on the same origin policy, and the browser is just an implementation of the same origin policy. The same origin policy prevents JavaScript scripts in one domain from interacting with the contents of another domain. Same-origin means that two pages have the same protocol, host, and port number.

The table below shows which requests count as cross-domain requests.

Current page URL            URL of the requested page          Cross-domain?   Why
http://www.test.com/        http://www.test.com/index.html     No              Same origin (same protocol, domain name, and port)
http://www.test.com/        https://www.test.com/index.html    Yes             Different protocols (HTTP/HTTPS)
http://www.test.com/        http://www.baidu.com/              Yes             Different primary domains (test/baidu)
http://www.test.com/        http://blog.test.com/              Yes             Different subdomains (www/blog)
http://www.test.com:8080/   http://www.test.com:7001/          Yes             Different ports (8080/7001)

Restrictions on non-same-origin requests

Non-same-origin requests are subject to the following restrictions:

  1. Cookies of a non-same-origin page cannot be read.

  2. The DOM of a non-same-origin page cannot be accessed.

  3. Ajax requests cannot be sent to non-same-origin addresses (the most common problem in front-end/back-end development).

Cross-domain CSRF attacks

CSRF stands for Cross-Site Request Forgery. It is also known as One-Click Attack or Session Riding, and is abbreviated CSRF or XSRF.

As with XSS attacks, CSRF can be extremely damaging. Here's an example: an attacker steals your identity and sends malicious requests in your name (using your cookies). To the server the requests look perfectly legitimate, yet they carry out whatever the attacker intended, such as sending emails or messages in your name, stealing your account, adding a system administrator, or even purchasing goods and transferring virtual currency.

But don’t worry too much. Banks have already figured out and solved all the problems we can think of. 🤗

Use an example to describe the process of cross-domain attack

Suppose there is a request:

http://bank.example/withdraw?account=bob&amount=1000000&for=bob2
Copy the code

This is a GET request for a URL. It allows Bob to transfer 1 million to boB2’s account. Typically, after the request is sent to a web site, the server first verifies that the request is from a valid session and that the session user Bob has logged in successfully.

Mallory himself had an account at the bank and knew that the URL could be used to transfer money. Mallory can send a request to the bank himself:

http://bank.example/withdraw?account=bob&amount=1000000&for=Mallory
Copy the code

But the request came from Mallory and not Bob. He cannot pass security authentication, so the request will not work.

At this time, Mallory thought of using CSRF attack method. He first made a website by himself and put the following code in the website:

src = "http://bank.example/withdraw? account=bob&amount=1000000&for=Mallory"Copy the code

And lure Bob to visit his site with ads and so on. When Bob visits the site, the URL is sent from Bob’s browser to the bank, and the request is sent to the bank server along with the cookie in Bob’s browser.

(Important) Most of the time, this request will fail because it requires Bob’s authentication information. However, if Bob happens to have just visited his bank and the session between his browser and the bank’s website has not expired, the browser’s cookie contains Bob’s authentication information. At that point, tragedy struck. The URL request would be answered, and the money would be transferred from Bob’s account to Mallory’s without Bob knowing it.

Later, when Bob finds out that his account is short of money, even if he checks the bank log, he can only find that there was a legitimate request from him to transfer the money, and there is no sign of any attack. Mallory gets the money and gets away with it.

How to defend against CSRF attacks

Currently there are three main strategies for defending against CSRF attacks: verifying the HTTP Referer header; adding a token to the request and validating it; and adding custom attributes to the HTTP header and validating them.

This is focused on requests that trigger updates to the database, such as POST behavior to submit a form.

For token authentication, here is a simple idea (a minimal Spring sketch follows the list):

  1. [Back end] When the user opens the page, generate a random token and store it in the session.
  2. [Page] The form puts the token in a hidden field and submits it (for example in a request header) together with the form.
  3. [Back end] Read the token carried by the request and compare it with the token in the session. If they match, accept the submission; otherwise reject it.
  4. [Back end] Generate a new token and pass it to the front end.
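A minimal sketch of steps 1 and 3 in Spring style (the paths, attribute name, and the X-CSRF-TOKEN header are illustrative; real projects usually rely on Spring Security's built-in CSRF support instead of hand-rolled code):

import java.util.UUID;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class CsrfTokenController {

    // Step 1: generate a random token when the page is requested and keep it in the session.
    @GetMapping("/form")
    public @ResponseBody String form(HttpSession session) {
        String token = UUID.randomUUID().toString();
        session.setAttribute("CSRF_TOKEN", token);
        return token; // in a real page this would be rendered into a hidden form field
    }

    // Step 3: on submit, compare the token carried by the request with the one in the session.
    @PostMapping("/transfer")
    public @ResponseBody String transfer(@RequestHeader("X-CSRF-TOKEN") String token,
                                         HttpServletRequest request) {
        Object expected = request.getSession().getAttribute("CSRF_TOKEN");
        if (expected == null || !expected.equals(token)) {
            return "rejected: CSRF token mismatch";
        }
        // ...perform the state-changing operation...
        return "ok";
    }
}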

CORS: You don’t have to stop eating for fear of choking

Before CORS, there is actually another way to solve cross-domain requests: JSONP. However, this method only supports GET requests and has many limitations for requests such as POST, so I won’t go into details here.

Cross-domain resource sharing (CORS) is a mechanism that uses additional HTTP headers to tell browsers that Web applications running on one domain are allowed to access specified resources from different source servers. When a resource requests a resource from a different domain, protocol, or port than the server where the resource itself resides, the resource initiates a cross-domain HTTP request.

For security reasons, browsers restrict cross-source HTTP requests from within scripts unless the response packet contains the correct CORS response header.

The cross-domain resource sharing (CORS) mechanism allows Web application servers to control cross-domain access, so that cross-domain data transmission can be carried out securely.

A simple request

Some requests do not trigger a CORS preflight request; such requests are called "simple requests." A request is considered a "simple request" if it meets all of the following conditions:

  1. It uses one of the following methods:

    {GET, HEAD, POST}.

  2. It sets no request headers other than the following (a request restricted to these headers is treated as simple by CORS):

    {Accept, Accept-Language, Content-Language, Content-Type, DPR, Downlink, Save-Data, Viewport-Width, Width}.

  3. The value of Content-Type is limited to one of the following:

    {text/plain, multipart/form-data, application/x-www-form-urlencoded}

During the exchange, CORS header fields are used between the client and the server to negotiate cross-domain permissions.

Preflight requests

Unlike the simple requests described above, a preflighted request must first send a preflight request to the server using the OPTIONS method to find out whether the server will allow the actual request. Preflighting avoids wasting network resources when a cross-domain client would otherwise transfer a large amount of data only to be rejected.

Here is an HTTP request that requires a precheck request:

var invocation = new XMLHttpRequest();
var URL = 'http://bar.other/resources/post-here/';
var body = '<?xml version="1.0"?><person><name>Arun</name></person>';

function callOtherDomain(){
  if(invocation)
    {
      invocation.open('POST', URL, true);
      invocation.setRequestHeader('X-PINGOTHER', 'pingpong');
      invocation.setRequestHeader('Content-Type', 'application/xml');
      invocation.onreadystatechange = handler;
      invocation.send(body);
    }
}

The code above sends an XML document using a POST request that carries a custom request header (X-PINGOTHER: pingpong). In addition, the Content-Type of the request is application/xml. Therefore, the request must be preflighted first.

The way the Spring framework addresses cross-domain requests for resources

Regardless of the server technology, the idea is the same: add HTTP response headers that tell the browser the server allows the cross-domain request.

For the Spring framework, the most common approach is to configure a WebMvcConfigurer with a CORS response header that passes the inspection on the browser side.

import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
 
@Configuration
public class CorsConfig implements WebMvcConfigurer {
 
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/ * *")
                .allowedOrigins("*")
                .allowedMethods("GET"."HEAD"."POST"."PUT"."DELETE"."OPTIONS")
                .allowCredentials(true)
                .maxAge(3600)
                .allowedHeaders("*"); }}Copy the code

For more solutions, click here

Cookie

HTTP is short-lived and has no memory, so we need some way for the server to retain its "memory." In the last two parts of this article, I introduce the use of cookies and sessions respectively.

Cookies are small text files, usually no more than 4 KB. When you browse a web site, the site stores a small text file on your machine that records information such as your user ID, the pages you visited, and the time of your visit. When you come back to the site, it reads the Cookie, learns this information, and can act accordingly, for example showing a welcome banner on the page or remembering your login state.

Because cookies typically contain encrypted information, servers tend to encrypt sensitive content.

A Cookie records a K-V (key-value) pair. When the server responds, the Cookie is written into the Set-Cookie field of the HTTP response header in the format {name}={value}. Each K-V pair is placed in a separate Set-Cookie line.

The browser automatically carries all applicable cookies when we visit a site, so we don't need to worry much about this process. During a request, the browser puts the cookies matching the request path into the request header, in the format {n1}={v1}; {n2}={v2}; ... The browser also sets a maximum number of cookies per domain, and this value cannot be changed; when the number of cookies from the same site exceeds the limit, browsers typically evict the oldest cookies first.

Again, cookies are stored on the client side. Cookies cannot be shared between browsers in different kernels because each browser kernel does not necessarily process and store cookies in the same way.

The structure of the Cookie

The main content of the Cookie is listed here.

  • name: the name of the Cookie.
  • value: the value of the Cookie. Unicode characters such as Chinese need to be encoded; binary data needs Base64 encoding.
  • expires: the expiration time. If the value is -1, the Cookie is a session Cookie.
  • secure: if true, the browser carries this Cookie only when sending requests over an SSL-based secure protocol (HTTPS).
  • domain: the domain of the Cookie. For example, if set to .baidu.com, the Cookie is valid for all domain names ending in .baidu.com. If the server is addressed by an IP address, only that exact IP can be specified.
  • path: the path for which the Cookie is used. If set to /hello/, only URLs prefixed with it carry this Cookie.

Note: Spring calculates the GMT time of Cookie Expires by setting the maxAge (valid time) relative to the time the request was received.

A session Cookie

In Spring Boot, a Cookie is considered a session Cookie when its maxAge property is set to -1 (no expires is set by default). Session cookies are stored only in client memory and disappear as soon as the user closes the browser.

Persistent cookies

If maxAge is greater than 0, the browser will save the Cookie to the hard disk. So even if the browser is opened and closed, the Cookie remains valid until it “spoils.”

Setting cookies is a cross-domain concern

Domain defaults to the current domain when nothing is set for it.

For IP addresses, there is no parent IP/ child IP. For example, if the Tomcat service 192.168.229.105:8080 is requested, the returned Cookie belongs to domain 192.168.229.105.

For domain names, domain can also (and only) be set as the parent domain. If the domain of the Cookie is set to a secondary domain name (such as taobao.com), all subdomains under it can be shared. If the domain of a Cookie is set to a third-level domain name (such as www.taobao.com), all subdomains under this domain (pay.www.taobao.com) are accessible, but other domains (such as buy.taobao.com) are inaccessible.

It is worth mentioning that the relationship between the subdomains under the same parent domain name belongs to cross-domain. For example, a client accessing a server through the edu.cookie.com domain will not carry a Cookie whose domain is set to www.cookie.com, even though their parent domain is the same. In this case, we need the CORS sharing mechanism to solve the problem.

If cross-domain requests (such as remittances) require authentication, the server-side program needs to set the Access-Control-Allow-Credentials response header to true. In addition, the Access-Control-Allow-Origin header must name the trusted domain explicitly; its value cannot be "*".

In addition, on the client (front-end) side, XMLHttpRequest.withCredentials should also be set to true, indicating that the request sent to the back end carries credentials.
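Translating these two server-side requirements into the Spring style used earlier in this article, a hedged sketch might look like this (the trusted origin is made up):

import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class CredentialedCorsConfig implements WebMvcConfigurer {

    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/api/**")
                // Access-Control-Allow-Credentials: true
                .allowCredentials(true)
                // Access-Control-Allow-Origin must name the trusted domain explicitly, not "*".
                .allowedOrigins("https://edu.cookie.com")
                .allowedMethods("GET", "POST");
    }
}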

Set cookies under Spring Boot

We mainly rely on two packages:

import org.springframework.web.bind.annotation.*;
import javax.servlet.http.*;

Here is a brief demonstration of how to add cookies to a response packet.

/**
 * A simple demonstration of setting a Cookie.
 * @param response javax.servlet.http.HttpServletResponse.
 * @return Returns a simple text.
 */
@RequestMapping("/hi")
public @ResponseBody
String setCookie(HttpServletResponse response) {

    Cookie cookie = new Cookie("name"."value");

    // All Cookie attributes have a corresponding set method.
    // Count in seconds, and then automatically convert to Greenwich mean Time.
    cookie.setMaxAge(60);

    // Add one or more cookies to the response header.
    response.addCookie(cookie);

    return "hi";
}

In general, we will specify several values for cookies, including domain and path. After all, smaller scopes mean more security. If you need to secure the transmission of this Cookie, you can set the Secure item to true. This way, the Cookie is only carried when the browser accesses the site over HTTPS.

Session

A Session describes, abstractly, one complete visit by a user: the user opens the browser, logs in to a site, keeps requesting or submitting resources on the site's pages for a while, and finally closes the browser page, which marks the end of the session. During this session, the user wants the server to remember his identity so he does not have to submit it with every request.

So the “Session” we’re talking about is actually the implementation of this requirement. Basically, the server generates a temporary credential and issues the credential number to the user, allowing the user to submit new requests over a period of time with only the credential number. If the server can verify that the credential is valid and valid, the user does not need to perform additional authentication. In addition, you want to be able to put some information on the credential so that the server can find them by the credential number. In terms of implementation, we can even use JWT (Json Web Token) for a similar purpose, but the details are quite different. We are only talking about traditional Session implementations here.

Is there any other way for the server to proactively send the credential number to the client? The most effective way of doing this is, of course, cookies.

In the JavaWeb implementation, for example, each time a new request is sent to the server, the server can optionally generate a JSESSIONID (its name doesn’t matter, it just stands for “Java SessionID”) as an identifier for a session. And returned to the user as a Cookie. Now, we are tacitly allowing clients to use cookies.

During the time the user keeps the browser page open, the browser will carry the Cookie with the JSESSIONID id to interact with the server the next time he makes a request. The server stores the jSessionids of multiple users in memory in a hash table-like structure (to allow the server to read them at high speed). Or it can be written to a file and persisted in a database (typically used for sharing across server sessions).

A typical application of sessions is the "shopping cart." Since the HTTP protocol itself does not preserve the user's state (it is stateless), we need a JSESSIONID to track the user as they switch between pages and browse items, and to store the chosen items in the Session scope when the user adds them to the cart.

How Spring Boot processes sessions

As with cookies, we rely on HttpServletRequest and HttpSession in the Java extension package javax.servlet.http to work with the Session.

For example, here’s a Controller that simply sets Session Session properties:

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

@Controller
@RequestMapping("/testSession")
public class SessionWaiter {

    @RequestMapping("/hi")
    public @ResponseBody String
    getSession(HttpServletRequest httpServletRequest){

        // Get the Session via getSession(); if no Session exists yet, one is created.
        // The JSESSIONID is automatically generated by the framework.
        HttpSession session = httpServletRequest.getSession();
        // Set the properties
        session.setAttribute("name"."tom");
        // Read the properties
        session.getAttribute("name");
        // Remove attributes
        session.removeAttribute("name");
        // Set the validity period in seconds. -1 indicates that the validity period will never expire
        session.setMaxInactiveInterval(1000);
        // Disable it
        session.invalidate();  
        return session.getId();
    }
}

We started the Spring Boot service and tested the interface with Postman software, which captured a session Cookie:

Duration of the Session

As you can see from the above example, the Cookie that records the JSESSIONID is a session Cookie. This means that a session ends when the user closes his or her browser page.

That was not a very precise statement. Whether the user’s browser is closed or not does not affect the server’s judgment of Session validity. The server doesn’t know when the user has closed the browser and uses that to determine the session is over.

The server actually sets a maximum inactivity time for each Session (the setMaxInactiveInterval() method in the code above). If no request carrying this JSESSIONID is processed for long enough, the server assumes the client has unilaterally ended the session and promptly deletes the Session and its attribute information from memory/files to save space.

The session Cookie is cleared when the user closes the browser, so the user will no longer use this JSESSIONID to maintain a session (the next time he opens the browser to view the page, the server returns a new JSESSIONID). However, the unilaterally abandoned JSESSIONID is still stored on the server for some time, until the maximum inactivity interval set by setMaxInactiveInterval() expires.

In other words, even if the user does not close the browser, the server will still clear the inactive JSESSIONID when he stays on a page in the “shopping cart” and does not make any further requests to keep the session active for a considerable period of time. How much time is “quite a long time”? Each server has its own definition.

We have all stayed on some site's page for a long time and then, on the next action, been greeted by a pop-up saying the login information has expired: the server set the Session's MaxInactiveInterval to a small value, so after a long period without any operation it decided the session had ended.

What if you want to keep a user's login state for a long time? In that case I recommend maintaining the login state with an ordinary persistent Cookie, and giving that Cookie a somewhat longer validity period.

Other ways to implement “Session”

In some cases, however, users have banned their browsers from storing cookies because they don’t like their browser to remember their sensitive information directly (it’s a little uncomfortable when you visit your bank’s website and your browser has kindly filled out your card number and password for you).

Disabling cookies means that your login status cannot be saved at all (you have to re-enter your account and password every time you log in to some page), and that Session has no way to track user sessions through cookies. As the developer, we must take this into account and ensure that the Session functionality can be implemented even if the client is not allowed to use cookies.

In the JSP era, we usually did this by appending the session ID to every hyperlink inside the returned page, that is, URL rewriting.

<a href = "www.example.com/subPage"> ↓ append the SessionID to the hyperlink, so that the next time the user clicks the link, it sends a GET request with parameters. <a href = "www.example.com/subPage?JSESSION=1234567">Copy the code

With the rise of front and back end separation, the front end and back end communicate in the form of a token token: for example, after receiving a user request, the server returns a token string. In subsequent requests, the front-end only needs to put the token into the HTTP header and hand it over to the back-end server for authentication and calculation.

This includes the JWT mentioned earlier, which is essentially "token exchange in JSON format." In addition, JWT uses a signature (HMAC or asymmetric keys) to guarantee that its contents have not been tampered with, but it does not encrypt the data itself.
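To see why such a token proves integrity but not confidentiality, here is a deliberately simplified, hand-rolled sketch of the JWT idea (an HMAC-SHA256 signature over a Base64URL-encoded header and payload; real projects should use a maintained JWT library rather than this toy code):

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ToyJwt {
    public static String sign(String secret) throws Exception {
        Base64.Encoder b64 = Base64.getUrlEncoder().withoutPadding();
        String header  = b64.encodeToString("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = b64.encodeToString("{\"sub\":\"bob\",\"role\":\"user\"}".getBytes(StandardCharsets.UTF_8));

        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        String signature = b64.encodeToString(mac.doFinal((header + "." + payload).getBytes(StandardCharsets.UTF_8)));

        // The payload is only Base64URL-encoded, so anyone can decode and read it;
        // but without the secret, nobody can forge a matching signature.
        return header + "." + payload + "." + signature;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sign("my-demo-secret"));
    }
}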

Note: Even if the browser turns off the Cookie function, it can still receive Cookie information by reading the set-cookie header of the response packet, but the difference is that the browser does not actively save the Cookie to disk.

A solution for storing sessions in distributed applications

Generally, sessions are stored directly in memory for read efficiency. Without any sharing strategy, other servers cannot access the Session records held by one server. That is why Nginx's load-balancing policies include ip_hash, which directs a user to the server that already holds the corresponding session information, so the session does not become invalid.

But with the gradual growth of Internet traffic, distributed application and cluster processing has gradually become a trend. People try to make each node in the cluster share sessions with each other. This way, the load balancer doesn’t have to deliberately direct every user to a fixed server node.

Session sharing can be implemented in several ways, for example having all nodes store session information in a shared Redis in-memory database. Spring already provides a ready-made solution, Spring Session, which implements session sharing while preserving service performance; developers can enable it with simple configuration.
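A hedged sketch of that configuration (the exact dependency coordinates depend on your Spring Boot version; spring-session-data-redis plus a Redis connection configured via spring.redis.host/port in application.properties are assumed):

import org.springframework.context.annotation.Configuration;
import org.springframework.session.data.redis.config.annotation.web.http.EnableRedisHttpSession;

// With this in place, HttpSession attributes set in the controllers above are stored in Redis
// and become visible to every node in the cluster.
@Configuration
@EnableRedisHttpSession
public class HttpSessionConfig {
}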

Cookie or Session ?

In terms of form and concept, sessions and cookies are two separate things. Many people (including myself) have chosen to compare the two together, probably because both cookies and sessions can implement Session tracking, and in the early days, the common implementation of sessions used Session cookies, so the two do have a bit of a connection. To be clear, however, sessions do not necessarily need to be implemented using cookies, as there are other technologies and frameworks that can achieve similar functionality.

The difference between Cookies and sessions

Access mode

Cookies are small files used by browsers to store user information. In a Cookie, only ASCII strings can be stored. If you need to access Unicode characters or binary data, you need to encode ahead of time. In the case of a Java program, if we wanted to save an object entity as a Cookie, we might need to do some complicated serialization.

Session is an abstract concept, and the Web framework of each major language provides its own implementation of Session, so the data can be saved with the data structure supported by each development language. In Java, for example, you can set the Attribute in Session to an entity type inherited from Object so that your persistence logic can quickly manipulate the entity without serializing it.

privacy

Cookies are stored on the client, that is, on the computer's disk. Normally we do not care where cookies are stored, but there are ways to find them, and other applications running on the computer may inadvertently or intentionally read Cookie contents. In addition, when HTTP packets sent by the browser are not protected by the TLS layer, the contents of cookies are not encrypted either, so sensitive data may be snooped on or copied by a middleman. That is why some users do not allow their browsers to store Cookie information.

If a Cookie is like a credential, then a Session simply provides the user with a credential number, whereas the actual contents of the credential are stored on the server and only an ID number is known to the outside, so the Session approach is superior in terms of security.

Different expiration dates

Although sessions have better security, the information held by the Session is lost after a certain period of time after the user closes the browser.

Cookies are divided into session cookies and persistent cookies. If you want to keep user information for a long time, just set the Expires attribute for cookies.

The pressure on the server is different

The Cookie is stored on the client side, so its preservation does not consume server resources. The management and implementation of Session are both on the server side. In the high concurrency scenario, a large number of sessions are generated, so a large amount of memory space will be consumed.

For high-concurrency scenarios, if I had to choose between cookies and sessions, I would choose the former to spread some of the pressure of storing data among clients.

Practice: receiving files with Spring Boot + Nginx

Requirement: users POST files from the front-end page to the Spring Boot server, and Spring Boot forwards the files to the /root/usr/images/{date}/ directory on the Nginx server, where {date} is the upload date, such as 20200624.

After the upload is complete, we only need to access the Nginx server via GET to access the static content.

The only tricky part of this small project is how to connect to the remote CentOS host from Java. For that we need the following Maven dependencies: JSch (from JCraft) is used to connect to the remote Linux host and open an SFTP connection on port 22:

<dependency>
    <groupId>com.jcraft</groupId>
    <artifactId>jsch</artifactId>
    <version>0.1.54</version>
</dependency>        
<dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
    <version>2.10.3</version>
</dependency>

The Nginx configuration file needs to be modified in two places. Because Nginx accesses a folder under the root user, we need to change the user from Nginx to root in /etc/nginx/nginx.conf. (Avoiding 403 errors)

+ user root;
- user nginx;
worker_processes 1;
...

Configure location in the server block:

# 127.0.0.1/images/a.jpg -> /root/usr/images/a.jpg
location ^~ /images {
    expires 1d;
    root /root/usr;
    autoindex on;
}

Create a new sftp.properties configuration file in the project resources directory to connect to the Nginx server:

sftp.host = hadoop102
# Username and password of the Nginx server.
sftp.user = root
sftp.pwd = 123456
# SFTP is a subservice belonging to SSHD.
sftp.port = 22
# (Important) The initial directory is /root because we log in as root,
# so the actual path of the resources is /root/usr/images.
sftp.rootPath = /usr/images

In addition, configure the UTF-8 decoding mode in the application.properties file to avoid the garbled text problem once and for all:

# Configure UTF-8 to decode Unicode characters.
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true

Create a connection tool and delegate Spring to auto-assemble from the configuration file with the @Component annotation:

package bin.util;

import com.jcraft.jsch.Channel;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.PropertySource;
import org.springframework.stereotype.Component;

import java.util.Properties;

/**
 * PropertySource => manually load the FTP-related configuration file, with the UTF-8 character set.
 * Component => delegate to Spring for auto-assembly.
 */
@PropertySource(value = "classpath:ftp.properties", encoding = "utf-8")
@Component
public class SftpUtil {

    @Value("${sftp.host}")
    private String host;

    @Value("${sftp.user}")
    private String user;

    @Value("${sftp.pwd}")
    private String pwd;

    @Value("${sftp.port}")
    private int port;

    @Value("${sftp.rootPath}")
    private String rootPath;

    public ChannelSftp getChannel() throws Exception {

        //Create a new session
        Session sshSession = new JSch().getSession(user, host, port);
        //set session
        sshSession.setPassword(pwd);
        Properties sshConfig = new Properties();
        sshConfig.setProperty("StrictHostKeyChecking"."no");
        sshSession.setConfig(sshConfig);
        //try connect;
        sshSession.connect();

        //get channel;
        Channel channel = sshSession.openChannel("sftp");
        channel.connect();

        return (ChannelSftp) channel;
    }
	//TODO omits the get and set methods.
}

Create a Service for file uploading, using the @Resource annotation to resolve dependencies between components.

package bin.service;

import bin.util.SftpUtil;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.SftpException;
import org.joda.time.DateTime;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;
import javax.annotation.Resource;
import java.io.IOException;
import java.io.InputStream;

/** * JavaWeb - - -> (trans2Nginx -> putFile) - - - > Nginx */
@Service
public class UploadFileService {

    @Resource
    private SftpUtil sftpUtil;

    /**
     * File transfer method exposed to the Controller.
     * @param upload The file passed as a parameter; UploadFileService is responsible for the transmission.
     * @return Returns true if the transmission is normal.
     */
    public boolean trans2Nginx(MultipartFile upload){

        try {
            // Create a dir for the date
            String subPath = new DateTime().toString("/yyyyMMdd/");
            return putFile(
                    upload.getInputStream(),
                    subPath,
                    upload.getOriginalFilename()
            );
        } catch (IOException e) {
            e.printStackTrace();
        }

        return false;
    }

    /**
     * Before uploading the file, first check whether the corresponding directory exists.
     * If it does not exist, create it first.
     * @param path the path passed in.
     * @param sftp the SFTP connection passed in.
     * @throws SftpException a connection exception may occur.
     */
    private void createDir(String path, ChannelSftp sftp) throws SftpException {

        String[] folders = path.split("/");

        //mkdir {path} -p
        for ( String folder : folders ) {
            if ( folder.length() > 0 ) {
                try {
                    sftp.cd( folder );
                }
                catch (SftpException e) {
                    // The directory does not exist yet: create it, then enter it.
                    sftp.mkdir(folder);
                    sftp.cd(folder);
                }
            }
        }
    }

    /**
     * @param inputStream the file converted to an input stream.
     * @param subPath the name of the date subdirectory under /root/usr/images/.
     * @param fileName the actual name of the transferred file.
     * @return true on success.
     */
    private boolean putFile(InputStream inputStream, String subPath, String fileName) {

        try {

            //get Channel
            //pwd> /root
            ChannelSftp remote = sftpUtil.getChannel();

            //put file
            String path = sftpUtil.getRootPath() + subPath;
            //real path => (/root)/usr/images/20200624/
            this.createDir(path,remote);

            // Add the full absolute path.
            remote.put(inputStream, "/" + sftpUtil.getUser() + path + fileName);

            //disconnect
            remote.quit();
            remote.exit();
            return true;

        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }
}
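One detail worth noting: getChannel() opens a new SSH Session on every call, but putFile() only closes the channel, so the underlying session is never disconnected. A minimal, hypothetical refinement of the disconnect step (replacing the quit()/exit() lines above, and requiring an extra import of com.jcraft.jsch.Session) could also close the session via JSch's Channel.getSession(); the JSchException it may throw is already covered by the surrounding catch (Exception e).

            // Grab the underlying SSH session before closing the channel.
            Session session = remote.getSession();
            // Disconnect the SFTP channel ...
            remote.quit();
            // ... then also close the session so it does not leak.
            session.disconnect();
            return true;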

Add the corresponding mapping method to the Controller, receiving the file submitted as multipart/form-data under the key file via @RequestParam.

@Resource
private UploadFileService uploadFileService;

@PostMapping("/postFile")
public @ResponseBody String postFile(@RequestParam("file") MultipartFile file) {
    if (uploadFileService.trans2Nginx(file)) {
        return file.getOriginalFilename() + " has been uploaded successfully.";
    }
    return "fail";
}
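To exercise this endpoint, a client needs to POST a multipart/form-data body whose key is file. Below is a minimal, hypothetical client sketch using Spring's RestTemplate; the URL, port, and local file path are assumptions, and it presumes the controller has no class-level request-mapping prefix.

import org.springframework.core.io.FileSystemResource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.RestTemplate;

public class UploadClientDemo {

    public static void main(String[] args) {
        // Build a multipart/form-data body; the key must match @RequestParam("file").
        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("file", new FileSystemResource("D:/test.jpg")); // hypothetical local file

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);

        // Hypothetical URL; adjust host, port, and path to the actual controller mapping.
        String reply = new RestTemplate().postForObject(
                "http://localhost:8080/postFile",
                new HttpEntity<>(body, headers),
                String.class);

        System.out.println(reply);
    }
}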

Practice: Build HTTPS services in a Spring project

To implement HTTPS, we first need to prepare a certificate. With a cloud server, we could apply for a certificate from an authoritative vendor. Here, however, we simply generate a self-signed certificate for testing with Java's keytool. Only certificates issued by authoritative CAs are trusted by browsers; in other words, our self-signed certificate will not be recognized by the browser.

keytool -genkey -alias myKey -keyalg RSA -keysize 2048  -keystore D:\myKey.p12 -validity 365

The meanings of parameters are as follows:

  • -genkey : generate a new key pair (public and private keys).
  • -alias : the alias of this certificate entry.
  • -keyalg : the key algorithm; here we choose the RSA asymmetric algorithm.
  • -keysize : the key length, in bits.
  • -keystore : the location (output path) of the keystore.
  • -validity : the validity period of the certificate, in days.

We need to remember the keystore password we set. After the command finishes, copy myKey.p12 into the resources folder of the Spring Boot project and add the following configuration to application.properties:

# Path where the generated certificate resides.
server.ssl.key-store=classpath:myKey.p12
# The alias used to generate the certificate.
server.ssl.key-alias=myKey
# The keystore password set when the certificate was generated.
server.ssl.key-store-password=12345678
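Since the file generated above is a PKCS12 keystore, depending on the Spring Boot and JDK versions it may also be necessary to declare the store type explicitly. The following property is an assumption of mine and was not part of the original configuration:

# (Assumed addition) declare the keystore type explicitly for a .p12 file.
server.ssl.key-store-type=PKCS12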

Start the service and visit http://localhost:8080/test/hi in the browser: the browser reports that the server did not respond to the request in time, and in debug mode a 504 error can be seen.

Visit https://localhost:8080/test/hi instead, and we get the response. Of course, since we generated the certificate ourselves, the browser does not recognize it and will show a warning. In a real deployment, we would replace this self-signed certificate with one issued by a CA and trusted by browsers.

Appendix: References

Thanks to the authors of the following posts, whose knowledge and insight informed my review of Internet protocols.

HTTP based

  • Is It so hard that Spring Boot supports Https?
  • How to use Cookies in Spring Boot
  • Annotations @ CookieValue
  • One article on the HTTP protocol is enough
  • HTTP request headers, the basics that you need to remember
  • The difference between a HTTP1.0/1.1/2.0
  • HTTP protocol super detailed
  • GET,POST request, common data encoding format
  • Structure of HTTP request/response packets
  • Why is a precheck request sent before every request?
  • HTTP request for the front test
  • The request method described in 8 HTTP request modes
  • Spring Boot starts Gzip compression
  • POSTMAN has several ways to submit
  • Are you still struggling with the concepts of Http?
  • Spring starts Gzip compression
  • ContentType and applications in Spring
  • This section describes the content-type in the Http request
  • What are the steps behind typing a URL into the browser address and pressing enter?

A TCP connection

  • Why does TCP need three handshakes, not two
  • The principle of TCP sliding window mechanism is summarized intuitively
  • Detailed description of TCP sliding window and congestion control mechanism
  • TCP congestion control
  • What are we talking about when we talk about HTTP header blocking?
  • Low latency and user experience talk
  • TCP timeout and retransmission mechanism
  • A TCP packet is confirmed to an ACK
  • Will TCP be replaced by UDP?
  • Is UDP the future of the new generation?
  • QUIC became the standard transport protocol for HTTP/3!
  • Everything you need to know about QUIC-HTTP/3

HTTPS

  • The Wireshark captures and analyzes HTTPS and HTTP packets
  • HTTP and HTTPS protocols, look at this one!
  • Note on illustrated HTTP
  • The principles and procedures of CA certification and HTTPS principles
  • Principles of HTTPS and CA
  • HTTPS principle and process
  • HTTPS encryption (handshake) process

TLS

  • TLS1.3 Specification (RFC documentation)
  • TLS1.3 development history
  • Seven handshakes with nine times latency
  • Reprint -HTTPS practical one-way and two-way authentication
  • This section describes the operation mechanism of the SSL/TLS protocol
  • Illustrate the SSL/TLS protocol
  • What are the fundamentals of ECDSA public and Private key encryption and signing?
  • ECC certificate and RSA certificate
  • TLSv1.3 Detailed handshake
  • TLSv1.3
  • TLSv1.3 0 RTT
  • Public key encryption algorithm | RSA and ECC system contrast to do those things
  • The popular science TLS1.3
  • TLS1.3 vs TLS 1.2
  • Introduction to DH key exchange

HTTP2.0

  • HTTP2.0
  • From HTTP/0.9 to HTTP/2: Understand the history and design of the HTTP protocol
  • HTTP2.0 new features

HTTP Connection Procedure

  • A complete HTTP request process flow chart for the client browser (the request process annotated with text and diagrams)
  • A complete HTTP request process
  • MDN web docs

Cross-domain implementation with Spring

  • Spring MVC interceptors defend against CSRF attacks

  • How is Session implemented? Where is it stored?

  • SpringBoot customizes Response Headers

  • In a clustered environment, Spring Sessions are used to share sessions

QUIC

  • How does TLS1.3/QUIC achieve 0-RTT
  • QUIC agreement
  • Interpret QUIC
  • How to play WITH HTTP 3?
  • QUIC Protocol learning 3: Connection lifetime
  • QUIC Protocol Probe – ios practice
  • The Road to QUIC
  • Science: Analysis of QUIC protocol principle
  • QUIC’s five characteristics and external performance
  • What do you think of HTTP/3?

Socket

  • What is a Socket
  • Details the Socket principle
  • SOCK_STREAM vs. SOCK_DGRAM socket types
  • I met a Socket
  • How to determine a protocol cluster?
  • SocketType enumeration
  • What types of sockets are there?
  • What is the port number generated by accept?
  • Socket accept parsing
  • Blocking, non-blocking, asynchronous, and synchronous?
  • The differences between Socket and HTTP, long vs. short connections, and TCP/UDP
  • Socket,Websocket,HTTP
  • What is Socket programming?
  • What is CPU intensive and what is I/O intensive?
  • Will I/O always occupy the CPU?
  • Timing of process scheduling
  • CPU context Switch
  • Port Occupation Problem
  • Can a process have more than one port?
  • The thundering herd problem
  • Nginx works in a multi-process manner
  • Nginx's solution to the thundering herd phenomenon
  • Port number & Can a port number be bound to multiple processes? & Can a process bind multiple port numbers?
  • User mode and kernel mode
  • Multithreaded model
  • Thread system call blocking causes the process to block
  • Does thread crash cause process crash?
  • Basic status of a process
  • Thread, process, multi-process, multi-thread. Concurrency, distinction and relationship of parallelism
  • Really understand the difference between blocking
  • Linux packet sending process
  • Underlying principle of Socket sending and receiving
  • Understanding of the DMA
  • Synchronous blocking and I/O multiplexing
  • I/O blocking and non-blocking, synchronous and asynchronous

The WebSocket protocol

  • WebSocket and HTTP2.0
  • WebSocket
  • WebSocket data frames
  • Getting to the bottom of HTTP and WebSocket Protocols (part 2)
  • WebSocket from beginner to proficient
  • MDN WebSocket
  • Why does the browser not support calling the system Socket? The higher level protocol, WebSocket?
  • Why not just use the socket instead of defining a new Websocket?
  • What exactly is a browser sandBox?
  • Can HTTP/2.0 replace WebSocket?
  • Why HTTP when you have sockets?

Session

  • How does the backend generate sessions
  • Based on the differences and pros and cons of JWT and Session
  • The difference between Cookies and sessions

Practical article

  • SpringBoot+ NGINx to achieve resource upload function
  • SpringBoot series – Loads custom configuration files
  • Use JSCH to create a recursive directory remotely