WebSocket for full-duplex communication

What is a WebSocket?

WebSocket is a network communication protocol. It first appeared in 2009 and was standardized by the IETF in 2011 as RFC 6455, later supplemented by RFC 7936. The WebSocket API is also a W3C standard.

WebSocket is a protocol introduced alongside HTML5 for full-duplex communication over a single TCP connection. It has no notion of request and response: the two endpoints are complete peers. Once the connection is established it is truly persistent, and either side can send data to the other at any time.

HTML5 is the latest version of HTML, with some new tags and a set of new APIs. HTTP is a protocol, with HTTP/2 among its recent versions. WebSocket and HTTP overlap somewhat, but there are still many differences between the two. The overlap is the HTTP handshake phase: after the handshake succeeds, data is transferred directly over the TCP channel.

Why was WebSocket invented?

Before WebSocket, the Web used the following schemes for real-time communication: first short polling, then long polling, then streaming, and finally SSE. Each went through several stages of evolution.

(1) Short polling

This approach is poorly suited to real-time information: the client asks the server for new messages at fixed intervals. The number of connections is large, because every exchange needs one request and one response. In addition, HTTP headers are sent with every request, which consumes bandwidth and CPU.

At this stage, one request corresponds to exactly one response, back and forth.

On the Web, short polling is implemented with AJAX or JSONP polling.

Because HTTP cannot hold a connection open indefinitely, data cannot be pushed between the server and the browser frequently over a long period, so Web applications implement polling through frequent Asynchronous JavaScript and XML (AJAX) requests.

Advantages: short-lived connections, simple server-side processing, cross-domain support, and good browser compatibility.
Disadvantages: noticeable latency, heavy server load, wasted bandwidth, and most requests return nothing new.

(2) Long polling (Comet long polling)

Long polling is an improved version of polling: the client sends an HTTP request to the server and waits for new messages. The server does not respond until it has a message or the request times out. When the response arrives, the client opens a new connection, and so on. This reduces network bandwidth and CPU usage to some extent.

This approach also has drawbacks: it is not truly real-time, so a system with strict real-time requirements would not adopt it. Because a GET request can cost two RTTs (connection setup plus the request and response), the data may well have changed in the meantime, so much of what the client receives is already stale.

In addition, the problem of low network bandwidth utilization is not solved at the root: every request still carries the same headers.

The Web version is AJAX long polling, also known as XHR long polling.

The client opens an AJAX request to the server and waits for the response. The server needs some special functionality to allow the request to be held open. As soon as an event occurs, the server sends the response on the pending request and closes it. After processing the response, the client issues a new request, re-establishing the connection, and so on.

Advantages: fewer polling requests, low latency, and good browser compatibility.
Disadvantages: the server needs to maintain a large number of connections.
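
The request, wait, and re-request cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions: longPoll, requestUpdate, and maxRounds are hypothetical names, and requestUpdate stands for any function that performs one held-open HTTP request and resolves with a message, or with null on timeout.

```javascript
// Hypothetical long-polling loop (a sketch, not a production client).
// `requestUpdate` is an assumed function: it performs ONE HTTP request and
// resolves when the server finally answers (a message) or times out (null).
async function longPoll(requestUpdate, onMessage, maxRounds = Infinity) {
  for (let round = 0; round < maxRounds; round++) {
    let message;
    try {
      message = await requestUpdate();
    } catch (err) {
      // Network error: wait a little before reconnecting.
      await new Promise((resolve) => setTimeout(resolve, 1000));
      continue;
    }
    if (message !== null) {
      onMessage(message);
    }
    // Falling through the loop immediately issues the next request.
  }
}
```

In a browser, requestUpdate could wrap a fetch call to an endpoint of your own; the loop itself does not depend on how the request is made.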

(3) Streaming (Comet streaming)

1. Iframe and htmlfile based streaming

The iframe streaming method inserts a hidden iframe into the page and uses its src attribute to create a long-lived connection between the server and the client. The server then transmits data to the iframe (usually HTML containing JavaScript that inserts the information) to update the page in real time. The advantage of iframe streaming is browser compatibility.

Using an iframe for a long-lived connection has one obvious drawback: the progress bar at the bottom of Internet Explorer and Mozilla Firefox shows that loading has not finished, and the icon at the top of Internet Explorer keeps spinning to indicate that loading is in progress.

The geniuses at Google solved this loading-indicator problem in IE with an ActiveX object called “htmlfile” and applied the approach in the Gmail + GTalk products. Alex Russell describes the method in the article “What Else is Burried Down in the depth of Google’s Amazing JavaScript?”. comet-iframe.tar.gz, which wraps a JavaScript Comet object based on iframe and htmlfile, supports Internet Explorer and Mozilla Firefox and can be used as a reference.

Advantages: simple to implement, available in every browser that supports iframes, and the client connects once while the server pushes many times.
Disadvantages: the connection state cannot be known accurately. During an iframe request in Internet Explorer, the browser title stays in the loading state and the status bar at the bottom also shows loading, which makes for a poor user experience (writing an htmlfile into memory dynamically through ActiveXObject solves this problem).

2. AJAX multipart streaming (XHR streaming)

Implementation idea: the browser must support the multipart flag. The client sends a request through AJAX and the server holds the connection open; data can then be pushed to the client continuously through HTTP/1.1's chunked transfer encoding until a timeout or a manual disconnect.

Advantages: the client connects once and the server can push data many times.
Disadvantages: not all browsers support the multipart flag.

3. Flash Socket (Flash streaming)

Implementation idea: embed in the page a Flash program that uses the Socket class. JavaScript communicates with the socket interface on the server by calling the socket interface exposed by the Flash program, and receives the server's data through the Flash socket.

Advantages: true real-time communication rather than an approximation of it.
Disadvantages: the Flash plug-in must be installed on the client; and because the protocol is not HTTP, it cannot traverse firewalls automatically.

4. Server-Sent Events

Server-Sent Events (SSE) is also an HTML5 technology, in which the server initiates data transfer to the browser client. Once the initial connection is created, the event stream stays open until the client closes it. SSE is sent over ordinary HTTP and has several features that WebSocket lacks, such as automatic reconnection, event IDs, and the ability to send arbitrary events.

With SSE, the server declares to the client that what follows is a stream that will be sent continuously. Instead of closing the connection, the client keeps waiting for new data from the server, much like a video stream. SSE uses this mechanism to push information to the browser. It is based on the HTTP protocol and is supported by all major browsers except IE and legacy (pre-Chromium) Edge.

SSE is a one-way channel: data can only be sent from the server to the browser, because a stream is essentially a download.

The SSE data that the server sends to the browser must be UTF-8 encoded text, with the following HTTP headers.

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

In the first of the three lines, Content-Type must specify the MIME type text/event-stream.

Advantages: suitable for frequent updates, low latency, and data is pushed from the server to the client.
Disadvantages: browser compatibility is harder to achieve.
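
To make the server side of SSE more concrete, here is a minimal sketch of how one event could be serialized. The text above does not spell out the wire format, so this assumes the standard text/event-stream field syntax (id:, event:, and data: lines, terminated by a blank line); formatSseEvent and sseHeaders are hypothetical names chosen for illustration.

```javascript
// Serializes one event in the text/event-stream format (a sketch).
// `event` and `id` are optional; multi-line data becomes several "data:" lines.
function formatSseEvent({ data, event, id }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  for (const line of String(data).split('\n')) {
    lines.push(`data: ${line}`);
  }
  // A blank line marks the end of the event.
  return lines.join('\n') + '\n\n';
}

// The three response headers listed above, as a plain object.
const sseHeaders = {
  'Content-Type': 'text/event-stream',
  'Cache-Control': 'no-cache',
  'Connection': 'keep-alive',
};
```

A server would send sseHeaders once and then write one formatSseEvent(...) result per update on the same open response; in the browser, the standard EventSource API consumes such a stream.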

The above are the four common stream-based approaches: iframe streaming, XHR streaming, Flash streaming, and Server-Sent Events.

Ranked by browser compatibility: short polling/AJAX > long polling/Comet > long-lived connection/SSE.

The arrival of WebSocket

The evolution above is a process of continuous improvement.

Short polling is inefficient and wastes resources (network bandwidth and computing resources). It has some latency, puts heavy load on the server, and most of its requests return nothing.

Long polling eliminates many of those useless requests and reduces both server load and bandwidth usage, but the server still has to maintain a large number of connections.

Finally, in the stream-based approaches the server pushes data to the client, and real-time behavior in that direction is better. But the stream is still one-way: when the client needs to send something to the server, it must issue a separate HTTP request.

So people began to ask whether there could be a better solution: one that supports two-way communication, avoids the network overhead of request headers, scales better, and ideally also supports binary frames, compression, and similar features.

So they invented a solution that currently seems “perfect”: WebSocket.

With the release of the WebSocket standard alongside HTML5, WebSocket has directly replaced Comet as the new method of server push.

Comet is a push technology for the Web that lets a server deliver updates to a client in real time without a request from the client. It is implemented in two ways: long polling and iframe streaming.

Advantages:
Less control overhead: once the connection is created, the headers used for protocol control when the server and client exchange data are relatively small. Without extensions, the header is only 2 to 10 bytes (depending on payload length) for server-to-client content; client-to-server content needs an additional 4-byte masking key. This overhead is far smaller than sending full HTTP headers with every request.
Better real-time behavior: because the protocol is full-duplex, the server can actively send data to the client at any time. Compared with HTTP, where the server can respond only after the client sends a request, latency is significantly lower; even compared with Comet-style long polling, data can be delivered more times in a shorter period.
Long-lived connection that keeps state: unlike HTTP, WebSocket must first create a connection, which makes it a stateful protocol; later messages can omit some state information. HTTP requests, by contrast, may need to carry state information (such as authentication) with every request.
Two-way communication and better binary support. It is also well compatible with HTTP: the default ports are likewise 80 and 443, and the handshake uses HTTP, so the handshake is unlikely to be blocked and can pass through various HTTP proxy servers.
Disadvantages: some browsers do not support it (though more and more do).
Application scenarios: supported by newer browsers, not limited by frameworks, and highly scalable.

To sum up WebSocket in one sentence:

WebSocket is a stateful protocol (unlike stateless HTTP), introduced alongside HTML5, that provides full-duplex communication over a single TCP connection. It also supports binary frames, protocol extensions, custom subprotocols, compression, and other features.

For now, WebSocket is an excellent replacement for AJAX polling and Comet. In some scenarios, however, it cannot replace SSE: WebSocket and SSE each have their own strengths.

WebSocket handshake

RFC 6455 defines two high-level components of WebSocket: an opening HTTP handshake for negotiating connection parameters, and a binary message framing mechanism that supports low-overhead, message-based transfer of text and binary data. Let’s take a closer look at both. This section covers the details of the handshake; binary message framing is covered in a later section.

First, RFC 6455 says the following:

The WebSocket protocol attempts to address the goals of existing bidirectional HTTP technologies within the existing HTTP infrastructure, so it is designed to work over HTTP ports 80 and 443…… However, the design does not limit WebSocket to HTTP, and future implementations could use a simpler handshake over a dedicated port without reinventing the entire protocol.

– The WebSocket Protocol, RFC 6455

In other words, the WebSocket handshake is not required to be an HTTP handshake.

In practice, most handshakes currently rely on HTTP, because the HTTP infrastructure is already well established.

Standard handshake procedure

Let’s look at a concrete example of a WebSocket handshake, taking my own website, threes.halfrost.com, as the example.

Open the site, and a wss handshake request is sent as soon as the page is rendered. The handshake request is as follows:

GET wss://threes.halfrost.com/sockjs/689/8x5nnke6/websocket HTTP/1.1   // the request method must be GET, and the HTTP version must be at least 1.1
Host: threes.halfrost.com
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket   // request an upgrade to the WebSocket protocol
Origin: https://threes.halfrost.com
Sec-WebSocket-Version: 13   // WebSocket protocol version used by the client
User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Mobile Safari/537.36
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Cookie: _ga=GA1.2.00000006.14111111496; _gid=GA1.2.23232376.14343448247; Hm_lvt_d60c126319=1524898423152574, 369152206, 975152784, 803; Hm_lpvt_d606319=1526784803; _gat_53806_2=1
Sec-WebSocket-Key: wZgx0uTOgNUsHGpdWc0T+w==   // an automatically generated key used to verify that the server supports the protocol; its value must be a randomly selected 16-byte nonce, Base64-encoded
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits   // optional list of protocol-level extensions that the client supports and wishes to use

There are several differences between this protocol and plain HTTP:

The requested URL uses ws:// or wss://, not http:// or https://. Because WebSocket may be used outside the browser, custom URI schemes are used here. By analogy with HTTP: ws is for plain requests and uses port 80, the same as HTTP; wss is for TLS-based secure transport and uses port 443, the same as HTTPS.

Connection: Upgrade
Upgrade: websocket

Ordinary HTTP requests generally do not carry these two headers. Here they request a protocol upgrade, indicating that the connection should be upgraded to the WebSocket protocol.

Sec-WebSocket-Version: 13
Sec-WebSocket-Key: wZgx0uTOgNUsHGpdWc0T+w==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

Sec-WebSocket-Version indicates the WebSocket version. In the early days there were many WebSocket protocol drafts and different vendors shipped their own versions, but the version has since been settled. If the server does not support the requested version, it must return a Sec-WebSocket-Version header containing the version numbers it does support. (See the multi-version WebSocket handshake below for details.)

The current version is 13, although the much earlier versions 7 and 8 may still be encountered.

Note: the values 9, 10, 11, and 12 were never used as valid Sec-WebSocket-Version values, even though draft versions 09, 10, 11, and 12 of the specification were published (they consisted mostly of editorial changes and clarifications rather than changes to the wire protocol). These values are reserved in the IANA registry but will not be used.

+--------+-----------------------------------------+----------+
|Version |                Reference                |  Status  |
| Number |                                         |          |
+--------+-----------------------------------------+----------+
| 0      + draft-ietf-hybi-thewebsocketprotocol-00 | Interim  |
+--------+-----------------------------------------+----------+
| 1      + draft-ietf-hybi-thewebsocketprotocol-01 | Interim  |
+--------+-----------------------------------------+----------+
| 2      + draft-ietf-hybi-thewebsocketprotocol-02 | Interim  |
+--------+-----------------------------------------+----------+
| 3      + draft-ietf-hybi-thewebsocketprotocol-03 | Interim  |
+--------+-----------------------------------------+----------+
| 4      + draft-ietf-hybi-thewebsocketprotocol-04 | Interim  |
+--------+-----------------------------------------+----------+
| 5      + draft-ietf-hybi-thewebsocketprotocol-05 | Interim  |
+--------+-----------------------------------------+----------+
| 6      + draft-ietf-hybi-thewebsocketprotocol-06 | Interim  |
+--------+-----------------------------------------+----------+
| 7      + draft-ietf-hybi-thewebsocketprotocol-07 | Interim  |
+--------+-----------------------------------------+----------+
| 8      + draft-ietf-hybi-thewebsocketprotocol-08 | Interim  |
+--------+-----------------------------------------+----------+
| 9      +                Reserved                 |          |
+--------+-----------------------------------------+----------+
| 10     +                Reserved                 |          |
+--------+-----------------------------------------+----------+
| 11     +                Reserved                 |          |
+--------+-----------------------------------------+----------+
| 12     +                Reserved                 |          |
+--------+-----------------------------------------+----------+
| 13     +                RFC 6455                 | Standard |
+--------+-----------------------------------------+----------+

[RFC 6455]

The |Sec-WebSocket-Key| header field is used in the WebSocket opening handshake. It is sent from the client to the server to provide part of the information used by the server to prove that it received a valid WebSocket opening handshake. This helps ensure that the server does not accept connections from non-WebSocket clients (e.g., HTTP clients) that are being abused to send data to unsuspecting WebSocket servers.

Sec-WebSocket-Key is generated randomly by the browser and provides basic protection against malicious or accidental connections.
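
For a non-browser client, generating such a key could look like the sketch below. It follows the rule stated in the request example (a random 16-byte nonce, Base64-encoded) and uses Node's crypto module; generateWebSocketKey is a hypothetical helper name.

```javascript
const { randomBytes } = require('node:crypto');

// Produces a Sec-WebSocket-Key value: 16 random bytes, Base64-encoded.
function generateWebSocketKey() {
  return randomBytes(16).toString('base64');
}
```

Sixteen bytes always encode to a 24-character Base64 string, which matches the shape of the key in the request above.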

Sec-WebSocket-Extensions is part of the upgrade negotiation and is covered in the upgrade negotiation section below.

Now look at the response:

HTTP/1.1 101 Switching Protocols   // the 101 response code confirms the upgrade to the WebSocket protocol
Server: nginx/1.12.1
Date: Sun, 20 May 2018 09:06:28 GMT
Connection: upgrade
Upgrade: websocket
Sec-WebSocket-Accept: 375guuMrnCICpulKbj7+JGkOhok=   // signed key value proving that the server supports the protocol
Sec-WebSocket-Extensions: permessage-deflate   // the WebSocket extension selected by the server

In the response, the HTTP 101 status code confirms the upgrade to the WebSocket protocol.

It also carries two WebSocket headers:

Sec-WebSocket-Accept: 375guuMrnCICpulKbj7+JGkOhok=   // signed key value proving that the server supports the protocol
Sec-WebSocket-Extensions: permessage-deflate   // the WebSocket extension selected by the server

Sec-WebSocket-Accept is the client's Sec-WebSocket-Key after the server has hashed it to confirm the handshake. It is a hash, not an encryption.

Sec-WebSocket-Accept is calculated as follows:

Concatenate Sec-WebSocket-Key with 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, a fixed Globally Unique Identifier (GUID, [RFC4122]).
Compute the SHA-1 hash of the result and Base64-encode it; the output is Sec-WebSocket-Accept.

Pseudo code:

toBase64(sha1(Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"))
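
The pseudo code translates almost directly into Node.js. The following is a minimal sketch using the built-in crypto module; computeAcceptValue and WEBSOCKET_GUID are hypothetical names, and the sample key in the usage note is the one given in RFC 6455, not the one from this site's handshake.

```javascript
const { createHash } = require('node:crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Computes Sec-WebSocket-Accept from the client's Sec-WebSocket-Key:
// Base64( SHA-1( key + GUID ) ).
function computeAcceptValue(secWebSocketKey) {
  return createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}
```

With the sample key from RFC 6455, computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ==') yields 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='. A client performs the same computation and compares the result with the header the server returned.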

Note that Sec-WebSocket-Key and Sec-WebSocket-Accept only verify that the handshake succeeded; they do not protect the data. Using wss:// is somewhat safer.

Subprotocols in the handshake

The WebSocket handshake may also involve subprotocols.

Let’s look at the WebSocket constructor:

WebSocket WebSocket(
  in DOMString url,   // the URL to connect to; it should be the address at which the WebSocket server responds
  in optional DOMString protocols   // either a single protocol name string or an array of protocol name strings; defaults to an empty string
);

The second parameter is optional: it is an array of subprotocols that can be negotiated.

var ws = new WebSocket('wss://example.com/socket', ['appProtocol', 'appProtocol-v2']);

ws.onopen = function () {
  // ws.protocol holds the subprotocol that the server selected.
  if (ws.protocol == 'appProtocol-v2') {
    ...
  } else {
    ...
  }
};

When you create a WebSocket object, you can pass an optional array of subprotocols to tell the server which protocols the client understands or would like the server to accept. The server selects one supported protocol from the list and returns it. If none is supported, the handshake fails: the onerror callback is triggered and the connection is closed.

The sub-protocol can be a custom protocol.
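
On the server side, the selection step could be sketched as follows. This assumes the client's list arrives as a comma-separated Sec-WebSocket-Protocol header value and that the server simply prefers the client's order; selectSubprotocol is a hypothetical helper, not part of any particular server library.

```javascript
// Picks the first client-offered subprotocol that the server also supports.
// Returns null when there is no overlap (the handshake should then fail).
function selectSubprotocol(offeredHeader, supported) {
  const offered = offeredHeader
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
  return offered.find((name) => supported.includes(name)) ?? null;
}
```

For example, selectSubprotocol('appProtocol, appProtocol-v2', ['appProtocol-v2']) returns 'appProtocol-v2', which is the value the client would then see in ws.protocol.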

Multi-version WebSocket handshake

Using the WebSocket version advertisement capability (the Sec-WebSocket-Version header field), a client can initially request the version of the WebSocket protocol that it prefers (not necessarily the latest version it supports). If the server supports the requested version and the handshake message is otherwise valid, the server accepts that version. If the server does not support the requested version, it must respond with a Sec-WebSocket-Version header field (or several of them) listing all the versions it is willing to use. At that point, if the client supports one of the advertised versions, it can repeat the WebSocket handshake with the new version value.

Here’s an example:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
...
Sec-WebSocket-Version: 25

If the server does not support version 25, it responds as follows:

HTTP/1.1 400 Bad Request
...
Sec-WebSocket-Version: 13, 8, 7

If the client supports version 13, it repeats the handshake:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
...
Sec-WebSocket-Version: 13
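
The server's side of this exchange can be sketched as a small decision function. The supported list mirrors the example response above; negotiateVersion, SUPPORTED_VERSIONS, and the shape of the returned object are assumptions made for illustration.

```javascript
// Assumed server capabilities, matching the example response above.
const SUPPORTED_VERSIONS = [13, 8, 7];

// Decides how to answer a handshake that requested `requestedVersion`.
function negotiateVersion(requestedVersion, supported = SUPPORTED_VERSIONS) {
  if (supported.includes(requestedVersion)) {
    // Version accepted: the handshake can continue with 101.
    return { status: 101, headers: {} };
  }
  // Version rejected: advertise every version the server is willing to use.
  return {
    status: 400,
    headers: { 'Sec-WebSocket-Version': supported.join(', ') },
  };
}
```

Calling negotiateVersion(25) produces the 400 response with the header value '13, 8, 7', after which a client that supports version 13 would retry.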

4. WebSocket upgrade negotiation

The WebSocket handshake phase involves five WebSocket headers, all of which relate to upgrade negotiation.

Sec-WebSocket-Version: the client states the version it wants to use (usually version 13). If the server does not support this version, it must return the versions it does support; on receiving that response, the client must repeat the handshake with a version it supports. The client must send this header.
Sec-WebSocket-Key: an automatically generated key sent by the client. The client must send this header.
Sec-WebSocket-Accept: the response value that the server computes from the client’s Sec-WebSocket-Key. The server must send this header.
Sec-WebSocket-Protocol: negotiates the application subprotocol. The client sends a list of supported protocols, and the server must respond with exactly one protocol name. The client need not send subprotocols at all, but if it does and the server supports none of them, the handshake fails. The client may send this header optionally.
Sec-WebSocket-Extensions: negotiates the WebSocket extensions for this connection. The client sends the extensions it supports, and the server confirms one or more of them by returning the same header. The client may send this header optionally. If the server supports none of them, the handshake does not fail, but no extensions are used on this connection.
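
As an illustration of the last header, here is a sketch of parsing a Sec-WebSocket-Extensions value such as the one in the handshake example. It is deliberately simplified (quoted parameter values are not handled), and parseExtensions is a hypothetical helper name.

```javascript
// Parses a Sec-WebSocket-Extensions value into [{ name, params }].
// Simplification: quoted parameter values are not handled.
function parseExtensions(headerValue) {
  return headerValue
    .split(',')
    .map((item) => item.trim())
    .filter(Boolean)
    .map((item) => {
      const [name, ...rawParams] = item
        .split(';')
        .map((part) => part.trim())
        .filter(Boolean);
      const params = {};
      for (const raw of rawParams) {
        const [key, value] = raw.split('=').map((s) => s.trim());
        // A parameter without "=value" is treated as a boolean flag.
        params[key] = value === undefined ? true : value;
      }
      return { name, params };
    });
}
```

Given 'permessage-deflate; client_max_window_bits', the function returns one entry named 'permessage-deflate' whose params object contains client_max_window_bits set to true.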

Negotiation takes place during the handshake phase. Once the handshake is complete, the HTTP exchange ends and all subsequent full-duplex communication is handled by WebSocket directly over the TCP connection.

5. WebSocket protocol extension

The HyBi Working Group, which is responsible for developing the WebSocket specification, has worked on two extensions that are negotiated through Sec-WebSocket-Extensions:

Multiplexing extension (A Multiplexing Extension for WebSockets): separates logical WebSocket connections from the transport so that they can share one underlying TCP connection.
Compression extension (Compression Extensions for WebSocket): adds compression to the WebSocket protocol (for example, the x-webkit-deflate-frame extension).

Without the multiplexing extension, each WebSocket connection has its own dedicated TCP connection, and a large message split into multiple frames is prone to head-of-line blocking. Head-of-line blocking causes latency, so when splitting a message into frames it is important to keep each frame as small as possible. Even with the multiplexing extension, where several connections share one TCP connection, each individual channel still suffers from head-of-line blocking. One workaround is to use several channels to send messages in parallel.

Performance is better if WebSocket runs over HTTP/2, which natively supports stream multiplexing. By using HTTP/2’s framing mechanism to carry WebSocket frames, multiple WebSockets can be transmitted in the same session.

6. WebSocket data frames

The other high-level component of WebSocket is the binary message framing mechanism. The sender splits an application message into one or more frames; the receiving side reassembles the frames and notifies the application once the complete message has arrived.
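
The split-and-reassemble idea can be sketched as below. The frame objects here are simplified descriptors rather than wire bytes, and the opcode values (0x1 for a text frame, 0x0 for a continuation frame) are the ones RFC 6455 defines; fragmentTextMessage and reassemble are hypothetical helper names.

```javascript
const OPCODE_CONTINUATION = 0x0;
const OPCODE_TEXT = 0x1;

// Splits a text message into frame descriptors of at most `maxPayload` bytes.
// Only the first frame carries the text opcode; only the last one has FIN set.
function fragmentTextMessage(text, maxPayload) {
  if (!(maxPayload >= 1)) throw new RangeError('maxPayload must be at least 1');
  const bytes = Buffer.from(text, 'utf8');
  const frames = [];
  let offset = 0;
  do {
    const end = Math.min(offset + maxPayload, bytes.length);
    frames.push({
      fin: end === bytes.length,
      opcode: offset === 0 ? OPCODE_TEXT : OPCODE_CONTINUATION,
      payload: bytes.subarray(offset, end),
    });
    offset = end;
  } while (offset < bytes.length);
  return frames;
}

// Receiver side: concatenate the payloads, then decode the whole message.
function reassemble(frames) {
  return Buffer.concat(frames.map((frame) => frame.payload)).toString('utf8');
}
```

Decoding only after concatenation matters because a fragment boundary may fall in the middle of a multi-byte UTF-8 character.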

WebSocket Data frame structure

WebSocket data frame format is as follows:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+

FIN: 0 indicates that this is not the last fragment of the message; 1 indicates that it is the last fragment.
RSV1, RSV2, RSV3:

These are normally all 0. When the client and server negotiate a WebSocket extension, the three flag bits may be non-zero, and their meaning is defined by that extension. If a non-zero value appears and no extension that defines it has been negotiated, the connection fails.

Opcode:

%x0: a continuation frame. An opcode of 0 means the message is fragmented and this frame is one of its fragments.
%x1: a text frame.
%x2: a binary frame.
%x3-7: reserved for non-control frames to be defined later.
%x8: connection close.
%x9: a heartbeat request (ping).
%xA: a heartbeat response (pong).
%xB-F: reserved for control frames to be defined later.

Mask:

Indicates whether the payload data is XOR-masked: 1 means yes, 0 means no. Masking applies only to frames sent from the client to the server.

Payload len:

Represents the length of the data payload, where there are three cases:

If the value is 0 to 125, the 7-bit Payload len is itself the payload length.
If the value is 126, the payload length is given by the following 2 bytes, interpreted as a 16-bit unsigned integer (7 + 16 bits in total).
If the value is 127, the payload length is given by the following 8 bytes, interpreted as a 64-bit unsigned integer (7 + 64 bits in total).
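
The three cases can be written out as a small encoder. This sketch produces only the second header byte (MASK bit plus Payload len) and any extended length bytes; the first byte carrying FIN, RSV, and opcode is left out, and encodePayloadLength is a hypothetical helper name.

```javascript
// Encodes the MASK bit and payload length part of a frame header.
// Returns 1, 3, or 9 bytes depending on which of the three cases applies.
function encodePayloadLength(length, masked) {
  const maskBit = masked ? 0x80 : 0x00;
  if (length <= 125) {
    return Buffer.from([maskBit | length]);
  }
  if (length <= 0xffff) {
    const buf = Buffer.alloc(3);
    buf[0] = maskBit | 126; // 126 signals a 16-bit extended length
    buf.writeUInt16BE(length, 1);
    return buf;
  }
  const buf = Buffer.alloc(9);
  buf[0] = maskBit | 127; // 127 signals a 64-bit extended length
  buf.writeBigUInt64BE(BigInt(length), 1);
  return buf;
}
```

Adding the first header byte and, for client frames, the 4-byte masking key gives total header sizes of 2 to 10 bytes unmasked and 6 to 14 bytes masked.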

Masking-key:

If Mask is 0, there is no masking key. If Mask is 1, the masking key is 4 bytes (32 bits) long.
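
Putting the fields described so far together, a frame header could be parsed as in the sketch below. It assumes the buffer already holds the complete header, does no bounds checking, and converts 64-bit lengths to a Number (so it assumes lengths below 2^53); parseFrameHeader is a hypothetical helper name.

```javascript
// Parses the header of one WebSocket frame from a Buffer (a sketch:
// no bounds checks, and 64-bit lengths are assumed to fit in a Number).
function parseFrameHeader(buf) {
  const fin = (buf[0] & 0x80) !== 0;
  const rsv = (buf[0] & 0x70) >> 4; // RSV1..RSV3 packed into three bits
  const opcode = buf[0] & 0x0f;
  const masked = (buf[1] & 0x80) !== 0;
  let payloadLength = buf[1] & 0x7f;
  let offset = 2;
  if (payloadLength === 126) {
    payloadLength = buf.readUInt16BE(offset);
    offset += 2;
  } else if (payloadLength === 127) {
    payloadLength = Number(buf.readBigUInt64BE(offset));
    offset += 8;
  }
  let maskingKey = null;
  if (masked) {
    maskingKey = buf.subarray(offset, offset + 4);
    offset += 4;
  }
  // headerLength is the offset at which the payload data begins.
  return { fin, rsv, opcode, masked, payloadLength, maskingKey, headerLength: offset };
}
```

Fed the masked single-frame "Hello" sample from RFC 6455 (bytes 81 85 37 fa 21 3d followed by the payload), it reports a final text frame with a 5-byte payload that starts at offset 6.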

The masking key is a 32-bit value chosen at random by the client. When preparing a masked frame, the client must pick a fresh masking key from the set of allowed 32-bit values. The masking key needs to be unpredictable; it must therefore be derived from a strong source of entropy, and the masking key for a given frame must not make it easy for a server or proxy to predict the masking key of a subsequent frame. This unpredictability is essential to prevent authors of malicious applications from choosing the bytes that appear on the wire. RFC 4086 [RFC4086] discusses what constitutes a suitable source of entropy for security-sensitive applications.

Masking does not affect the length of the payload data. To convert masked data into unmasked data, or vice versa, the following algorithm is applied. The same algorithm applies regardless of the direction of the conversion; that is, the same steps are used to mask the data as to unmask it.

original-octet-i: the i-th byte of the original data.
transformed-octet-i: the i-th byte of the transformed data.
j: the result of i mod 4.
masking-key-octet-j: the j-th byte of the masking key.

Octet i of the transformed data (“transformed-octet-i”) is the XOR of octet i of the original data (“original-octet-i”) with the octet of the masking key at index i modulo 4 (“masking-key-octet-j”):

j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j

In other words, the algorithm is a cyclic byte-wise XOR. For each byte, take its index modulo 4 to select the corresponding byte X of the masking key, then XOR the byte with X to obtain the transformed byte.
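
The same algorithm in JavaScript, as a minimal sketch; applyMask is a hypothetical name, and the sample bytes in the usage note are the masked "Hello" payload from RFC 6455.

```javascript
// Masks or unmasks payload bytes: the operation is its own inverse.
function applyMask(data, maskingKey) {
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) {
    // j = i MOD 4 selects the masking-key byte for this position.
    out[i] = data[i] ^ maskingKey[i % 4];
  }
  return out;
}
```

Applying the key 37 fa 21 3d to the masked bytes 7f 9f 4d 51 58 yields the text Hello, and applying the same key again restores the masked bytes.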

Note: masking is not meant to prevent data leaks. Its purpose is to prevent proxy cache poisoning attacks, launched by malicious scripts running in the client, against intermediaries that do not understand WebSocket.

For details of this attack, see the W2SP 2011 paper Talking to Yourself for Fun and Profit.

The attack has two main stages.

First, establish a WebSocket connection. The attacker sends a WebSocket handshake through the proxy server to a server that the attacker controls. Because the WebSocket handshake is an HTTP message, once the proxy has forwarded that server’s response back to the attacker, it considers the HTTP exchange finished.

Second, poison the proxy server. With the handshake complete, the attacker can send data to the attacker’s own server, and sends a text message carefully crafted to look like an HTTP request. Its Host header is forged to name the server that ordinary users will visit, and the requested resource is one that ordinary users will request. The proxy takes this for a new request and forwards it to the attacker’s server. That server cooperates: on receiving the crafted message it immediately returns the “poison”, such as a malicious script. At this point the poisoning has succeeded.

When a user later requests the legitimate resource through the proxy, the host and URL have already been cached by the proxy, together with the poisoned resource, as a result of the attacker’s HTTP-formatted text message. When the proxy cache sees a request for the same host and URL, it finds a cached entry and immediately returns the poisoned script or resource to the user. That is when the user is attacked.

Payload Data:

Payload data consists of extension data followed by application data.

Extension data: 0 bytes if no extension was negotiated. Any extension must specify the length of its extension data, or how that length can be calculated, and its use must be negotiated during the handshake phase. If present, the extension data counts toward the total payload length.

Application data: whatever follows the extension data (if any), occupying the rest of the frame.

WebSocket control frame

A control frame is identified by its opcode, whose highest bit is 1. The control opcodes currently defined are 0x8 (Close), 0x9 (Ping), and 0xA (Pong). Opcodes 0xB-0xF are reserved for control frames that have not yet been defined.

Control frames convey state about the WebSocket connection. A control frame can be inserted into the middle of a fragmented message.

All control frames must have a payload length of 125 bytes or less, and control frames must not be fragmented.
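The constraints on control frames listed above can be checked mechanically. The following is a small sketch, assuming the frame header has already been parsed into a plain object; the shape { fin, opcode, payloadLength } and the function names are hypothetical and chosen only for this example.

```javascript
// Control opcodes named in the text.
const OPCODE_CLOSE = 0x8;
const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xa;

// A control frame is any frame whose 4-bit opcode has its highest bit set.
function isControlOpcode(opcode) {
  return (opcode & 0x8) !== 0;
}

// Returns null when the header is acceptable, or a string describing the problem.
// `frame` is a hypothetical parsed header: { fin, opcode, payloadLength }.
function checkControlFrame(frame) {
  if (!isControlOpcode(frame.opcode)) return null; // data frame, nothing to check here
  if (!frame.fin) return "control frames must not be fragmented";
  if (frame.payloadLength > 125) return "control frame payload exceeds 125 bytes";
  if (frame.opcode > OPCODE_PONG) return "reserved control opcode (0xB-0xF)";
  return null;
}

console.log(checkControlFrame({ fin: true, opcode: OPCODE_PING, payloadLength: 5 }));    // null
console.log(checkControlFrame({ fin: true, opcode: OPCODE_CLOSE, payloadLength: 126 })); // "control frame payload exceeds 125 bytes"
```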

Once an endpoint has both sent and received a control frame with the 0x8 Close opcode, it closes the underlying TCP connection. The server closes it first; the client should wait for the server, but it may close its TCP connection itself if the server has not done so within a reasonable period of time.

RFC 6455 gives recommended status codes for closing. It does not prescribe how they must be used; it only predefines the codes below.

Status code	Description	Reserved (✔) or not used (✖)
0-999	Status codes in this range are not used.	✖
1000	Indicates a normal closure, meaning that the purpose for which the connection was established has been fulfilled.
1001	Indicates that an endpoint is “going away”, such as a server shutting down or a browser navigating to another page.
1002	Indicates that the endpoint is terminating the connection because of a protocol error.
1003	Indicates that the endpoint is terminating the connection because it received a type of data it cannot accept (for example, the endpoint only understands text data but received a binary message).
1004	Reserved. Its specific meaning may be defined in the future.	✔
1005	A reserved value that must not be set by an endpoint in a Close control frame. It is designated for use in applications that expect a status code, to indicate that no status code was actually present.	✔
1006	A reserved value that must not be set by an endpoint in a Close control frame. It is designated for use in applications that expect a status code, to indicate that the connection was closed abnormally.	✔
1007	Indicates that the endpoint is terminating the connection because the data received in a message is not consistent with the message type (for example, non-UTF-8 [RFC3629] data in a text message).
1008	Indicates that the endpoint is terminating the connection because a received message violates its policy. This is a generic status code that can be returned when no other status code is suitable (such as 1003 or 1009) or when the details of the policy need to be hidden.
1009	Indicates that the endpoint is terminating the connection because a received message is too large for it to process.
1010	Indicates that the endpoint (client) is terminating the connection because it expected the server to negotiate one or more extensions, but the server did not return them in the WebSocket handshake response. The list of required extensions should appear in the reason part of the Close frame.
1011	Indicates that the server is terminating the connection because it encountered an unexpected condition that prevented it from fulfilling the request.
1012
1013
1014
1015	A reserved value that must not be set by an endpoint in a Close control frame. It is designated for use in applications that expect a status code, to indicate that the connection was closed because the TLS handshake failed (for example, the server certificate could not be verified).	✔
1000-2999	Status codes in this range are reserved for definition by this protocol, its future revisions, and extensions specified in a permanent and readily available public specification.	✔
3000-3999	Status codes in this range are reserved for use by libraries, frameworks, and applications. They are registered directly with IANA. This specification does not define how they are interpreted.	✔
4000-4999	Status codes in this range are reserved for private use and therefore cannot be registered. They can be used by prior agreement between WebSocket applications. This specification does not define how they are interpreted.	✔
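The ranges in the table above map naturally onto a small lookup. The sketch below is only an illustration: the function names and the range labels ("unused", "protocol", "registered", "private") are invented for this example. It treats only 1005, 1006 and 1015 as forbidden on the wire, because those are the codes the table says an endpoint must not set; a stricter implementation might also reject 1004 and other codes that have no defined meaning.

```javascript
// Codes the table says must never be set by an endpoint in a Close frame.
const NEVER_SENT = new Set([1005, 1006, 1015]);

// Classify a close status code by the range it falls in.
// The labels are hypothetical names used only in this sketch.
function closeCodeRange(code) {
  if (!Number.isInteger(code) || code < 0 || code > 4999) return "invalid";
  if (code <= 999) return "unused";       // 0-999: not used
  if (code <= 2999) return "protocol";    // 1000-2999: the protocol and its extensions
  if (code <= 3999) return "registered";  // 3000-3999: libraries, frameworks, applications
  return "private";                       // 4000-4999: private use
}

// True when the code may appear in a Close frame sent over the connection.
function maySendInCloseFrame(code) {
  const range = closeCodeRange(code);
  if (range === "invalid" || range === "unused") return false;
  return !NEVER_SENT.has(code);
}

console.log(closeCodeRange(1000), maySendInCloseFrame(1000)); // protocol true
console.log(closeCodeRange(1006), maySendInCloseFrame(1006)); // protocol false
console.log(closeCodeRange(4001), maySendInCloseFrame(4001)); // private true
```

Note that the browser API is stricter than the protocol: close() accepts only 1000 or a code in the 3000-4999 range.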

On receiving a control frame with the 0x9 Ping opcode, an endpoint should send a Pong frame in response as soon as possible, unless it has already received a Close frame. Either endpoint may send a Ping frame at any time after the connection is established and before it is closed. A Ping frame may contain “application data”, and it can serve as a keepalive heartbeat.

Receiving a control frame with the 0xA Pong opcode tells us that the other party can still respond. A Pong frame must carry exactly the same application data as the Ping frame it answers. If an endpoint receives a Ping frame before it has sent a Pong for a previous Ping, it may choose to send a Pong only for the most recently processed Ping. A Pong frame may also be sent unsolicited, in which case it acts as a one-way heartbeat, but it is best to avoid sending unsolicited Pong frames.
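To make the "same application data" rule concrete, here is a minimal sketch that builds the bytes of a Pong frame echoing a Ping payload. It assumes the frame is sent by a server, so it is left unmasked; a client would additionally have to mask it. The function name buildPongFrame is hypothetical.

```javascript
// Build an unmasked Pong frame that echoes the payload of a received Ping.
// `pingPayload` is a Uint8Array holding the Ping's application data.
function buildPongFrame(pingPayload) {
  if (pingPayload.length > 125) {
    throw new RangeError("control frame payload must be 125 bytes or less");
  }
  const frame = new Uint8Array(2 + pingPayload.length);
  frame[0] = 0x80 | 0xa;          // FIN = 1, RSV1-3 = 0, opcode = 0xA (Pong)
  frame[1] = pingPayload.length;  // MASK = 0, 7-bit payload length
  frame.set(pingPayload, 2);      // application data copied unchanged from the Ping
  return frame;
}

const pong = buildPongFrame(new TextEncoder().encode("Hello"));
console.log(Array.from(pong, (b) => b.toString(16).padStart(2, "0")).join(" "));
// prints "8a 05 48 65 6c 6c 6f"
```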

WebSocket frame splitting rule

Fragmentation rules are defined by RFC 6455, and applications are not aware of how messages are split into frames. Framing is handled by the client and server implementations.

Fragmentation also benefits protocol extensions for multiplexing, which need to split messages into smaller pieces in order to share the output channel better.

RFC 6455 provides the following fragmentation rules:

An unfragmented message consists of a single frame with the FIN bit set and a non-zero opcode.
A fragmented message consists of a single frame with the FIN bit clear and a non-zero opcode, followed by zero or more frames with the FIN bit clear and the opcode set to 0, and terminated by a single frame with the FIN bit set and the opcode set to 0. A fragmented message is conceptually equivalent to a single larger message whose payload is the concatenation of the payloads of the fragments in order. When extensions are present, however, this may not hold, because the extension defines how “extension data” is interpreted. For example, “extension data” may appear only at the beginning of the first fragment and apply to subsequent fragments, or it may appear in every fragment and apply only to that fragment. The following example shows how fragmentation works without “extension data”.

Example: for a text message sent as three fragments, the first fragment has opcode 0x1 and the FIN bit clear, the second has opcode 0x0 and the FIN bit clear, and the third has opcode 0x0 and the FIN bit set. (Opcode 0x0, as explained above, denotes a continuation frame: it indicates that the message is being sent in fragments and that the received frame is one of them.)

A control frame may be injected into the middle of a fragmented message. Control frames themselves must not be fragmented.
Message fragments must be delivered to the recipient in the order in which the sender sent them.
The fragments of one message must not be interleaved with the fragments of another message unless an extension that can interpret the interleaving has been negotiated.
An endpoint must be able to handle control frames in the middle of a fragmented message.
A sender may create fragments of any size for a non-control message.
Clients and servers must support receiving both fragmented and unfragmented messages.
Because control frames cannot be fragmented, an intermediary must not attempt to change the fragmentation of a control frame.
An intermediary must not change the fragmentation of a message if any reserved bit values are used and the meaning of these values is unknown to the intermediary.
An intermediary must not change the fragmentation of any message on a connection where extensions have been negotiated and the intermediary does not know the semantics of those extensions. Similarly, an intermediary that did not see the WebSocket handshake (and was not notified of its contents) that resulted in a WebSocket connection must not change the fragmentation of any message on that connection.
As a consequence of these rules, all fragments of a message are of the same type, set by the opcode of the first fragment. Because control frames cannot be fragmented, the type of all fragments in a message must be text, binary, or one of the reserved opcodes. Note: if control frames could not be interleaved, the latency of a ping, for example, would be very long when it followed a large message. Hence the requirement that control frames be handled in the middle of a fragmented message.
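The rules above can be sketched in a few lines of JavaScript. This is a simplified illustration, not a full implementation: frames are represented as plain objects { fin, opcode, payload }, extensions are ignored, and the names fragment and createReassembler are made up for this example.

```javascript
const OP_CONTINUATION = 0x0;
const OP_TEXT = 0x1;

// Split a payload (Uint8Array) into fragments of at most `maxSize` bytes.
// The first fragment carries the real opcode, the rest carry 0x0,
// and only the last fragment has the FIN bit set.
function fragment(payload, opcode, maxSize) {
  if (!(maxSize >= 1)) throw new RangeError("maxSize must be at least 1");
  const frames = [];
  let offset = 0;
  do {
    const end = Math.min(offset + maxSize, payload.length);
    frames.push({
      fin: end === payload.length,
      opcode: offset === 0 ? opcode : OP_CONTINUATION,
      payload: payload.slice(offset, end),
    });
    offset = end;
  } while (offset < payload.length);
  return frames;
}

// Returns a function that accepts frames in order and calls
// onMessage(opcode, payload) once a whole message has arrived.
function createReassembler(onMessage) {
  let opcode = null; // type of the message being assembled, or null if none
  let chunks = [];
  return function onFrame(frame) {
    // Control frames may arrive between fragments; they are not part of the
    // message, so this sketch simply skips them.
    if ((frame.opcode & 0x8) !== 0) return;
    if (frame.opcode === OP_CONTINUATION) {
      if (opcode === null) throw new Error("continuation frame without a first fragment");
    } else {
      if (opcode !== null) throw new Error("new message started before the previous one finished");
      opcode = frame.opcode;
    }
    chunks.push(frame.payload);
    if (!frame.fin) return;
    const total = chunks.reduce((n, c) => n + c.length, 0);
    const message = new Uint8Array(total);
    let offset = 0;
    for (const c of chunks) {
      message.set(c, offset);
      offset += c.length;
    }
    const type = opcode;
    opcode = null;
    chunks = [];
    onMessage(type, message);
  };
}

const frames = fragment(new TextEncoder().encode("Hello, WebSocket"), OP_TEXT, 6);
console.log(frames.map((f) => [f.fin, f.opcode, f.payload.length]));
// [[false, 1, 6], [false, 0, 6], [true, 0, 4]]
```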

Implementation note: in the absence of any extension, a receiver does not have to buffer the entire frame sequentially to process it. For example, if a streaming API is used, a portion of a frame can be delivered to the application. Note, however, that this assumption may not apply to all future WebSocket extensions.

WebSocket API and data format

1. WebSocket API

The WebSocket API is extremely concise. The following is essentially all of it:

var ws = new WebSocket('wss://example.com/socket');
ws.onerror = function (error) { /* ... */ };
ws.onclose = function () { /* ... */ };
ws.onopen = function () {
  ws.send("Connection established. Hello server!");
};
ws.onmessage = function (msg) {
  if (msg.data instanceof Blob) {
    processBlob(msg.data);
  } else {
    processText(msg.data);
  }
};

Apart from creating the WebSocket object and calling send(), there are only four callbacks.

One more thing to note about send(): it is asynchronous, not synchronous. When we hand data to this function it returns immediately, so do not mistake its return for the data having been sent. WebSocket has its own queuing mechanism: data is first placed in a buffer and then sent in queue order.

Suppose a huge file is already in the queue and then a higher-priority message arrives, for example because the system has failed and the connection must be closed immediately. Because that message is queued behind the large file, it has to wait until the file has been sent. This is head-of-line blocking, and it delays the higher-priority message.

The designers of the WebSocket API took this into account and gave us two more properties, bufferedAmount and binaryType. binaryType is one of the few properties that change how a WebSocket object behaves, while bufferedAmount lets us observe the send queue.

if (ws.bufferedAmount === 0) {
  ws.send(evt.data);
}

Here bufferedAmount tells us how many bytes are still waiting in the buffer, so we can avoid head-of-line blocking by sending only when the buffer is empty. It can also be combined with a priority queue to send messages in order of priority.
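One possible way to combine bufferedAmount with a priority queue is sketched below. It is only an illustration under a few assumptions: createPrioritySender, enqueue and flush are names invented for this example, the socket is anything that exposes send() and bufferedAmount (such as a browser WebSocket), and the caller is expected to call flush() again later, for example from a timer, since the sketch does not wait for the buffer to drain by itself.

```javascript
// Wrap a socket so that queued messages leave in priority order and only
// while the socket's send buffer is (nearly) empty.
// `maxBuffered` is the number of buffered bytes we tolerate before holding back.
function createPrioritySender(socket, maxBuffered = 0) {
  const queue = []; // entries: { priority, seq, data }
  let seq = 0;      // tie-breaker that keeps equal priorities in arrival order

  // Send as many queued messages as the buffer allows; returns how many remain.
  function flush() {
    while (queue.length > 0 && socket.bufferedAmount <= maxBuffered) {
      socket.send(queue.shift().data);
    }
    return queue.length;
  }

  // Queue a message; a higher `priority` number is sent first.
  function enqueue(data, priority = 0) {
    queue.push({ priority, seq: seq++, data });
    queue.sort((a, b) => b.priority - a.priority || a.seq - b.seq);
    return flush();
  }

  return { enqueue, flush };
}

// Usage with a browser WebSocket might look like this (not executed here):
//   var sender = createPrioritySender(ws);
//   sender.enqueue(largeFileChunk, 0);
//   sender.enqueue("shutdown", 10);
//   setInterval(sender.flush, 50);
```

Splitting a large file into chunks before queuing it matters here: a single huge send() would still sit in the browser's buffer ahead of anything queued afterwards.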

2. Data format

WebSocket places no restrictions on the format of the data it carries: it can be text or binary. The protocol uses the opcode field to distinguish UTF-8 text from binary. The WebSocket API accepts DOMString objects, which are sent as UTF-8 text, as well as binary data such as ArrayBuffer, ArrayBufferView, or Blob.

The browser handles received data without requiring any extra options. By default, text is converted to a DOMString object, and binary data is handed to the application as a Blob object without any further processing.

var ws = new WebSocket('wss://example.com/socket'); 
ws.binaryType = "arraybuffer";

The only setting available here is binaryType, which makes all received binary data arrive as an ArrayBuffer rather than a Blob. As for why one would choose ArrayBuffer, the W3C Candidate Recommendation offers the following advice:

User agents can use this option as a hint for how to handle incoming binary data: if it is set to “blob”, it is safe to spool the data to disk; if it is set to “arraybuffer”, it is likely more efficient to keep the data in memory. Naturally, user agents are encouraged to use more subtle heuristics to decide whether to keep incoming data in memory or not.

— The WebSocket API W3C Candidate Recommendation

Put simply: a Blob represents an immutable file-like object or raw data. If you do not need to modify or slice it, leaving it as a Blob is a good choice. If you want to process the raw data, it is clearly better handled in memory, so convert it to an ArrayBuffer.
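As an example of processing binary data in memory, the sketch below decodes an ArrayBuffer with a DataView. The 6-byte message layout (a 16-bit sensor id followed by a 32-bit float reading) is entirely hypothetical and was invented for this illustration, as were the function names.

```javascript
// Hypothetical binary message layout, chosen only for illustration:
//   bytes 0-1: sensor id (unsigned 16-bit, big-endian)
//   bytes 2-5: reading   (32-bit float, big-endian)
function decodeReading(buffer) {
  if (buffer.byteLength < 6) {
    throw new RangeError("message too short");
  }
  const view = new DataView(buffer);
  return { sensorId: view.getUint16(0), value: view.getFloat32(2) };
}

// The matching encoder, useful for sending the same layout with ws.send().
function encodeReading(sensorId, value) {
  const buffer = new ArrayBuffer(6);
  const view = new DataView(buffer);
  view.setUint16(0, sensorId);
  view.setFloat32(2, value);
  return buffer;
}

// In a browser this could be wired up as follows (not executed here):
//   ws.binaryType = "arraybuffer";
//   ws.onmessage = function (msg) {
//     if (msg.data instanceof ArrayBuffer) {
//       console.log(decodeReading(msg.data));
//     }
//   };
console.log(decodeReading(encodeReading(513, 1.5))); // { sensorId: 513, value: 1.5 }
```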

WebSocket performance and Usage scenarios

Here’s a test from WebSocket.org that compares XHR polling with WebSocket:

The blue bars in the chart above show the traffic consumed by polling. In this test, the HTTP request and response headers total 871 bytes. The header overhead of course varies from request to request; this test used 871 bytes.

Use case A: 1,000 clients polling every second: network throughput is (871 x 1,000) = 871,000 bytes = 6,968,000 bits per second (6.6 Mbps)
Use case B: 10,000 clients polling every second: network throughput is (871 x 10,000) = 8,710,000 bytes = 69,680,000 bits per second (66 Mbps)
Use case C: 100,000 clients polling every second: network throughput is (871 x 100,000) = 87,100,000 bytes = 696,800,000 bits per second (665 Mbps)

With WebSocket, the frame overhead is just two bytes instead of 871 bytes:

Use case A: 1,000 clients receive 1 message per second: network throughput is (2 x 1,000) = 2,000 bytes = 16,000 bits per second (0.015 Mbps)
Use case B: 10,000 clients receive 1 message per second: network throughput is (2 x 10,000) = 20,000 bytes = 160,000 bits per second (0.153 Mbps)
Use case C: 100,000 clients receive 1 message per second: network throughput is (2 x 100,000) = 200,000 bytes = 1,600,000 bits per second (1.526 Mbps)

For the same number of clients at the same rate, the difference is striking: with 100,000 clients each sending one request per second, polling needs 665 Mbps while WebSocket needs only 1.526 Mbps, a difference of roughly 435 times.

As a result, WebSocket is far better than polling in both efficiency and bandwidth consumption.
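The arithmetic behind these figures is easy to reproduce. The short sketch below recomputes use case C; the function names are made up for this example. Note that the Mbps values quoted above appear to divide by 1024 x 1024 rather than by 1,000,000, which is an inference from the numbers rather than something the test states, and the sketch follows the same convention.

```javascript
// Bits per second spent on per-message overhead alone.
function overheadBitsPerSecond(overheadBytes, clients, messagesPerSecond = 1) {
  return overheadBytes * clients * messagesPerSecond * 8;
}

// Convert bits per second to Mbps using 1 Mbps = 1024 * 1024 bits per second,
// the convention the quoted figures appear to use.
function toMbps(bitsPerSecond) {
  return bitsPerSecond / (1024 * 1024);
}

const pollingBits = overheadBitsPerSecond(871, 100000);  // 696,800,000
const webSocketBits = overheadBitsPerSecond(2, 100000);  // 1,600,000

console.log(toMbps(pollingBits).toFixed(1));   // "664.5" (quoted above as 665)
console.log(toMbps(webSocketBits).toFixed(3)); // "1.526"
console.log(pollingBits / webSocketBits);      // 435.5
```

The ratio is simply 871 / 2, so it is independent of the number of clients.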

In terms of usage scenarios, XHR, SSE and WebSocket have their advantages and disadvantages.

XHR is simpler than the other two approaches and is easy to implement on top of HTTP’s well-established infrastructure. However, it does not support request streaming, and its support for response streaming is incomplete (the Streams API is needed for response streaming). As for data formats, it supports both text and binary, and it also supports compression. HTTP is responsible for framing its messages.

SSE does not support request streaming either. After the handshake, the server can send data to the client as a response stream using the event source protocol. SSE supports only text data, not binary: it was not designed for transferring binary data, so binary objects have to be encoded as Base64 before being sent over SSE if necessary. SSE also supports compression, and the event stream is responsible for its framing.

Of the three, WebSocket is the only one that provides full-duplex communication over a single TCP connection, so it supports both request streaming and response streaming well. It supports text and binary data and has its own binary framing. Compression is somewhat weaker, because some extensions are not supported everywhere; for example, the server did not support the x-webkit-deflate-frame extension in the WS request I captured above.

It would be nice if every network environment supported WebSocket or SSE, but that is not realistic. Network environments vary: some networks block WebSocket traffic, and some user devices do not support the WebSocket protocol. That is where XHR comes into play.

If the client does not need to send messages to the server and only needs continuous real-time updates, SSE is a good option to consider. However, SSE is currently poorly supported in IE and Edge, and WebSocket does better than SSE in this respect.

Therefore, different protocols should be selected according to different scenarios.

Reference:

RFC6455
Server-sent Events tutorial
Comet: “Server Push” Technology based on HTTP Long Connection
Definitive Guide to WEB Performance
10.3. Attacks On Infrastructure (Masking)
Why are WebSockets masked?
How does websocket frame masking protect against cache poisoning?
What is the mask in a WebSocket frame?

GitHub Repo: Halfrost-Field

Source: github.com/halfrost/Ha…