Introduction to WebSocket

WebSocket consists of several standards: the WebSocket API is defined by the W3C, while the WebSocket protocol (RFC 6455) and its extensions are defined by the IETF HyBi Working Group.

WebSocket is a network technology, introduced with HTML5, that provides full-duplex communication between browser and server. It is an application-layer protocol built on TCP that reuses the HTTP handshake channel, and it supports bidirectional, message-based transmission of text or binary data between client and server. A WebSocket connection is much more than a raw network socket: the browser hides the complexity behind a simple API and provides several additional services:

  • Connection negotiation and same-origin policy;
  • Interoperability with existing HTTP infrastructures;
  • Message-based communication and efficient message framing;
  • Subprotocol negotiation and extensibility.

Characteristics of WebSocket

WebSockets are needed even though polling already exists because both short and long polling share a drawback: communication can only be initiated by the client. WebSocket provides a clean full-duplex communication solution. It suits scenarios with strong real-time requirements, such as messaging, stock quotes, live broadcasting, and shared desktops, and especially those with frequent interaction between clients and servers, such as chat rooms, real-time sharing, and multi-user collaboration platforms. Its main characteristics are as follows:

  • It is built on the TCP protocol, so the server side is relatively easy to implement.
  • It is highly compatible with HTTP. The default ports are also 80 and 443, and the handshake phase uses HTTP, so the handshake is not easily blocked and can pass through various HTTP proxy servers.
  • The data format is lightweight, with low overhead and high communication efficiency. The frame header exchanged between server and client can be as small as 2 bytes.
  • Text or binary data can be sent.
  • There is no same-origin restriction: a client can communicate with any server.
  • The protocol identifier is ws (wss if encrypted), and the server address is written as a URL, for example ws://example.com:80/some/path
  • TCP connections are not frequently created and destroyed, which reduces network bandwidth usage and saves server resources.
  • WebSocket is purely event-driven: once a connection is established, the application listens for events to handle incoming data and connection state changes, and all data travels as a sequence of frames. Messages and events arrive asynchronously as the server sends data.
  • No timeout processing.

webSocket.readyState

The readyState property returns the current state of the instance object, of which there are four types.

  • CONNECTING: the value is 0, indicating that the connection is being established.
  • OPEN: the value is 1, indicating that the connection is established and communication can start.
  • CLOSING: the value is 2, indicating that the connection is closing.
  • CLOSED: the value is 3, indicating that the connection is closed or could not be opened.
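
As a minimal sketch of how these values are typically used, the helper below only calls send() when the socket is in the OPEN state. The name sendIfOpen and the READY_STATE table are chosen here for illustration and are not part of the WebSocket API; the numeric values mirror the four states listed above.

```javascript
// Numeric states as listed above; browsers also expose them as WebSocket.OPEN etc.
const READY_STATE = { CONNECTING: 0, OPEN: 1, CLOSING: 2, CLOSED: 3 };

// Hypothetical helper: send only when the connection is open.
// Returns true if the message was handed to send(), false otherwise.
function sendIfOpen(socket, message) {
  if (socket.readyState !== READY_STATE.OPEN) {
    return false;
  }
  socket.send(message);
  return true;
}
```

A caller could queue the message and retry from the onopen callback when sendIfOpen returns false.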

webSocket.onopen

The onopen property of the instance object specifies the callback function invoked when the connection is successfully established.

ws.onopen = function() {
  ws.send("Hello Server!");
};
// If you want to specify multiple callback functions, use the addEventListener method.
ws.addEventListener("open", function(event) {
  ws.send("Hello Server!");
});

webSocket.onclose

The onclose property of the instance object specifies the callback function invoked when the connection is closed.

ws.onclose = function(event) {
  var code = event.code;
  var reason = event.reason;
  var wasClean = event.wasClean;
  // handle close event
};
ws.addEventListener("close", function(event) {
  var code = event.code;
  var reason = event.reason;
  var wasClean = event.wasClean;
  // handle close event
});

webSocket.onmessage / webSocket.send()

The onmessage property of the instance object specifies the callback function invoked when data is received from the server. It can also handle binary data.

ws.onmessage = function(event) {
  var data = event.data;
  // Process data
};

ws.addEventListener("message", function(event) {
  var data = event.data;
  // Process data
});

// Note that the server data can be either text or binary (a Blob or an ArrayBuffer object).
ws.onmessage = function(event) {
  if (typeof event.data === "string") {
    console.log("Received data string");
  }

  if (event.data instanceof ArrayBuffer) {
    var buffer = event.data;
    console.log("Received arraybuffer");
  }
};

// Receive Blob data
ws.binaryType = "blob";
ws.onmessage = function(e) {
  console.log(e.data.size);
};

// Receive ArrayBuffer data
ws.binaryType = "arraybuffer";
ws.onmessage = function(e) {
  console.log(e.data.byteLength);
};

The send() method of the instance object is used to send data to the server.

// Send a text string.
ws.send("your message");

// Send a Blob object (here, a file chosen by the user).
var file = document.querySelector('input[type="file"]').files[0];
ws.send(file);

// Send an ArrayBuffer object:
// the canvas ImageData is copied into a Uint8Array and its buffer is sent.
var img = canvas_context.getImageData(0, 0, 400, 320);
var binary = new Uint8Array(img.data.length);
for (var i = 0; i < img.data.length; i++) {
  binary[i] = img.data[i];
}
ws.send(binary.buffer);

webSocket.bufferedAmount

The bufferedAmount property of the instance object indicates how many bytes of data have been queued by send() but not yet transmitted. It can be used to determine whether sending has finished.

var data = new ArrayBuffer(10000000);
socket.send(data);

if (socket.bufferedAmount === 0) {
  // Send complete
} else {
  // Send is not finished yet
}
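
Because send() only queues data, bufferedAmount can also serve as a simple backpressure signal. The sketch below is one possible approach rather than part of the WebSocket API: the helper name sendIfNotCongested and the 1 MiB limit are assumptions made for illustration.

```javascript
// Hypothetical limit on queued bytes; tune it for the application.
const HIGH_WATER_MARK = 1024 * 1024; // 1 MiB

// Sends the chunk only if the amount already queued is within the limit.
// Returns true if the chunk was queued, false if the caller should retry later.
function sendIfNotCongested(socket, chunk, limit = HIGH_WATER_MARK) {
  if (socket.bufferedAmount > limit) {
    return false;
  }
  socket.send(chunk);
  return true;
}
```

When the function returns false, the caller can wait briefly (for example with setTimeout) and try again, so that a slow network does not cause unbounded memory growth.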

webSocket.onerror

The onerror property of the instance object specifies the callback function invoked when an error occurs.

socket.onerror = function(event) {
  // handle error event
};
socket.addEventListener("error", function(event) {
  // handle error event
});

Learning the WebSocket protocol

When learning a network application-layer protocol, the most important parts are usually the connection establishment process and the data exchange process. The data format is also unavoidable, since it directly determines the capabilities of the protocol; a good data format makes a protocol more efficient and extensible. The protocol can be studied through the following aspects:

  • How to establish a connection
  • Data frame format
  • Data transfer
  • Keeping the connection alive + heartbeat
  • The role of Sec-WebSocket-Key/Accept
  • The function of the data mask

An example

Before going into the details of the protocol, let's look at a simple example to get a feel for it. The example includes a WebSocket server and a WebSocket client (a web page). The full code can be found here. The server uses the ws library, which is lighter than the more familiar socket.io and better suited to learning.

The server

The code is as follows: listen on port 8080. When a new connection request arrives, a log is printed and a message is sent to the client. Logs are also generated when a message is received from the client.

const express = require("express");
const app = express();
const server = require("http").Server(app);
const WebSocket = require("ws");

const wss = new WebSocket.Server({ port: 8080 });
wss.on("connection", function connection(ws) {
  console.log("server: receive connection");
  ws.on("message", function incoming(message) {
    console.log("server: received: %s", message);
  });
  ws.send("world");
});

app.get("/", function(req, res) {
  res.sendFile(__dirname + "/index.html");
});
app.listen(3000);

The running result of the server is as follows:

The client

Initiate a WebSocket connection to port 8080. After the connection is established, logs are generated and messages are sent to the server. Logs are also generated after receiving messages from the server.

const ws = new WebSocket("ws://localhost:8080");
ws.onopen = function() {
  console.log("ws onopen");
  ws.send("from client:hello");
};
ws.onmessage = function(e) {
  console.log("ws onmessage");
  console.log("from server:" + e.data);
};

The running result of the client is as follows:

How to establish a connection

As mentioned earlier, WebSockets reuse the HTTP handshake channel. Specifically, the client negotiates the upgrade protocol with the WebSocket server through HTTP requests. After the protocol upgrade, the subsequent data exchange follows the WebSocket protocol.

Client: requests a protocol upgrade

First, the client initiates a protocol upgrade request. As you can see, the standard HTTP packet format is adopted and only the GET method is supported.

    GET / HTTP/1.1
    Host: localhost:8080
    Origin: http://127.0.0.1:3000
    Connection: Upgrade
    Upgrade: websocket
    Sec-WebSocket-Version: 13
    Sec-WebSocket-Key: w4v7O6xFTi36lq3RNcgctw==

The key request headers have the following meaning:

  • Connection: Upgrade: indicates that the protocol is to be upgraded.
  • Upgrade: websocket: indicates that the upgrade target is the WebSocket protocol.
  • Sec-WebSocket-Version: 13: indicates the WebSocket version. If the server does not support this version, it must return a Sec-WebSocket-Version header listing the versions it supports.
  • Sec-WebSocket-Key: pairs with the Sec-WebSocket-Accept header in the server response, providing basic protection against malicious or unintentional connections.

Note that the request above omits some headers that are not the focus here. Because it is a standard HTTP request, headers such as Host, Origin, and Cookie are sent as usual. During the handshake phase, security restrictions and permission checks can be performed using the relevant request headers.
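
To make the header checks concrete, here is a rough sketch of how a server might validate an upgrade request before accepting it. The function name isValidUpgradeRequest and the ALLOWED_ORIGINS allow-list are assumptions made for this example; a library such as ws performs its own checks internally, and this is not its implementation. Header names are assumed to be lower-cased, as Node's http module provides them.

```javascript
// Hypothetical allow-list; the origin is the one from the example request.
const ALLOWED_ORIGINS = new Set(["http://127.0.0.1:3000"]);

// `method` is the HTTP method; `headers` maps lower-cased names to values.
function isValidUpgradeRequest(method, headers) {
  // Only GET is allowed for the handshake.
  if (method !== "GET") return false;

  const upgrade = (headers["upgrade"] || "").toLowerCase();
  if (upgrade !== "websocket") return false;

  // Connection may hold several comma-separated tokens, e.g. "keep-alive, Upgrade".
  const connectionTokens = (headers["connection"] || "")
    .toLowerCase()
    .split(",")
    .map((token) => token.trim());
  if (!connectionTokens.includes("upgrade")) return false;

  if (headers["sec-websocket-version"] !== "13") return false;

  // The key is a base64-encoded 16-byte value.
  const key = headers["sec-websocket-key"] || "";
  if (Buffer.from(key, "base64").length !== 16) return false;

  // Browsers send Origin; non-browser clients may omit it.
  const origin = headers["origin"];
  if (origin !== undefined && !ALLOWED_ORIGINS.has(origin)) return false;

  return true;
}
```

Cookie or token checks for permission verification would fit in the same place, before the server answers with status 101.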

Server: responds to the protocol upgrade

The status code 101 indicates protocol switchover. The protocol upgrade is completed here, and subsequent data interaction is based on the new protocol.

    HTTP/1.1 101 Switching Protocols
    Connection: Upgrade
    Upgrade: websocket
    Sec-WebSocket-Accept: Oy4NRAQ13jhfONC7bP8dTKb4PTU=

As shown below:

Note: each header line ends with \r\n, and an extra blank line (\r\n) follows the last header. In addition, HTTP status codes can only be used in the server response during the handshake phase; after the handshake, only the specific close codes defined by the WebSocket protocol can be used.
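
The line endings described in the note can be illustrated with a small sketch that builds the raw 101 response as a string. The function name buildHandshakeResponse is hypothetical; the Sec-WebSocket-Accept value is computed with the formula explained in the section that follows.

```javascript
const crypto = require("crypto");

// Fixed GUID defined by RFC 6455 for the handshake.
const MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Builds the raw HTTP response text for a successful upgrade.
function buildHandshakeResponse(secWebSocketKey) {
  const accept = crypto
    .createHash("sha1")
    .update(secWebSocketKey + MAGIC)
    .digest("base64");

  return [
    "HTTP/1.1 101 Switching Protocols",
    "Connection: Upgrade",
    "Upgrade: websocket",
    "Sec-WebSocket-Accept: " + accept,
    "", // joining the two empty strings yields the final "\r\n\r\n"
    "",
  ].join("\r\n");
}
```

In a real server this string would be written to the TCP socket, after which both sides switch to exchanging WebSocket frames.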

Calculating Sec-WebSocket-Accept

Sec-WebSocket-Accept is calculated from the Sec-WebSocket-Key header of the client request, as follows:

  • Concatenate Sec-WebSocket-Key with 258EAFA5-E914-47DA-95CA-C5AB0DC85B11.
  • Compute the SHA-1 digest of the result and encode it as a base64 string.

>toBase64(sha1(Sec-WebSocket-Key + 258EAFA5-E914-47DA-95CA-C5AB0DC85B11))

Verify the previous result:

const crypto = require("crypto");
const magic = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
const secWebSocketKey = "w4v7O6xFTi36lq3RNcgctw==";

let secWebSocketAccept = crypto
  .createHash("sha1")
  .update(secWebSocketKey + magic)
  .digest("base64");

console.log(secWebSocketAccept);
// Oy4NRAQ13jhfONC7bP8dTKb4PTU=

Data frame format

The exchange of data between client and server depends on the data frame format, so before discussing data exchange, let's look at the WebSocket data frame format. The smallest unit of communication between a WebSocket client and server is the frame; one or more frames make up a complete message.

  • Sender: splits the message into multiple frames and sends them to the receiver;
  • Receiver: receives the frames and reassembles the associated frames into a complete message.

For the detailed definition of the data frame format, refer to Section 5.2 of RFC 6455.

An overview of data frame formats

The unified format of a WebSocket data frame is given below. Readers familiar with TCP/IP will recognize this kind of diagram.

  • Read from left to right, in bits. For example, FIN and RSV1 each occupy 1 bit, and opcode occupies 4 bits.
  • The content includes flags, the opcode, the mask, the payload length, and the data. (Expanded in the next section.)

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+

Data frame format details

FIN: 1 bit. A value of 1 indicates that this is the last fragment of a message; a value of 0 indicates that it is not.

RSV1, RSV2, RSV3: 1 bit each. In general they are all 0. When the client and server negotiate a WebSocket extension, these three flag bits can be non-zero, and their meaning is defined by the extension. If a non-zero value appears and no extension defining it has been negotiated, the receiver must fail the connection.

Opcode: 4 bits. The value of the opcode determines how the payload data that follows should be interpreted. If the opcode is unknown, the receiver should fail the connection. The defined opcodes are as follows:

  • %x0: indicates a continuation frame. An opcode of 0 means that the message is fragmented and that the received data frame is one of its fragments.
  • %x1: indicates a text frame.
  • %x2: indicates a binary frame.
  • %x3-7: reserved for non-control frames to be defined later.
  • %x8: indicates a connection close.
  • %x9: indicates a ping.
  • %xA: indicates a pong.
  • %xB-F: reserved for control frames to be defined later.
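
The opcode table can be expressed as a small lookup, shown here as an illustrative sketch. The names OPCODE_NAMES, isControlOpcode, and opcodeName are chosen for this example and do not come from any library.

```javascript
// Opcodes currently defined by the protocol, as listed above.
const OPCODE_NAMES = {
  0x0: "continuation",
  0x1: "text",
  0x2: "binary",
  0x8: "close",
  0x9: "ping",
  0xa: "pong",
};

// Control frames are the opcodes with the highest of the 4 bits set (0x8-0xF).
function isControlOpcode(opcode) {
  return (opcode & 0x8) !== 0;
}

// Returns a name for defined opcodes, or null for reserved ones,
// in which case the receiver should fail the connection.
function opcodeName(opcode) {
  return OPCODE_NAMES[opcode] || null;
}
```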

Mask: 1 bit. Indicates whether the payload data is masked. Data sent from the client to the server must be masked; data sent from the server to the client is not masked. If the server receives data that has not been masked, it must close the connection. If Mask is 1, a masking key is carried in the Masking-key field and is used to unmask the payload data. Mask is 1 for all data frames sent by the client to the server.

Payload length: the length of the payload data, in bytes. It occupies 7 bits, 7+16 bits, or 7+64 bits. Let the value of the 7-bit field be x:

  • x is 0 to 125: the length of the data is x bytes.
  • x is 126: the next 2 bytes represent a 16-bit unsigned integer whose value is the length of the data.
  • x is 127: the next 8 bytes represent a 64-bit unsigned integer (the most significant bit must be 0) whose value is the length of the data.

In addition, if the payload length occupies more than one byte, it is encoded in big-endian (network) byte order.
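
The three cases can be sketched as an encoder for the length field. This is an illustration using Node.js Buffer, not code from any library; the function name encodePayloadLength is hypothetical, and the MASK bit that shares the first byte is left at 0 here.

```javascript
// Returns the bytes that encode `length` after the first frame byte.
// The top bit of the first returned byte (the MASK bit) is left at 0.
function encodePayloadLength(length) {
  if (length <= 125) {
    // The 7-bit field holds the length directly.
    return Buffer.from([length]);
  }
  if (length <= 0xffff) {
    // 126 followed by a 16-bit big-endian length.
    const buf = Buffer.alloc(3);
    buf[0] = 126;
    buf.writeUInt16BE(length, 1);
    return buf;
  }
  // 127 followed by a 64-bit big-endian length.
  const buf = Buffer.alloc(9);
  buf[0] = 127;
  buf.writeBigUInt64BE(BigInt(length), 1);
  return buf;
}
```

For example, a 5-byte payload needs a single length byte, a 126-byte payload needs three, and a 65536-byte payload needs nine.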

Masking-key: 0 or 4 bytes (32 bits). All data frames sent from the client to the server are masked: Mask is 1 and a 4-byte Masking-key is carried. If Mask is 0, there is no Masking-key.

Note: the payload length does not include the length of the masking key.

Payload data: (x+y) bytes. The payload data consists of extension data and application data, where the extension data is x bytes and the application data is y bytes.

Extension data: 0 bytes if no extension has been negotiated. Every extension must declare the length of its extension data, or how that length can be calculated. In addition, how the extension is used must be negotiated during the handshake phase. If extension data is present, the payload length includes its length.

Application data: arbitrary application data, placed after the extension data (if any) and occupying the rest of the data frame. The length of the application data is the payload length minus the length of the extension data.
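
Putting the fields together, the sketch below parses the header of a single frame held in a Node.js Buffer. It is a simplified illustration: the name parseFrameHeader and the shape of the returned object are assumptions, the buffer is assumed to contain the whole header, and lengths beyond Number.MAX_SAFE_INTEGER are not handled.

```javascript
// Parses the header of one WebSocket frame from a Buffer.
function parseFrameHeader(buf) {
  const fin = (buf[0] & 0x80) !== 0;     // bit 0
  const rsv = (buf[0] & 0x70) >> 4;      // RSV1-3 packed into 3 bits
  const opcode = buf[0] & 0x0f;          // low 4 bits of the first byte
  const masked = (buf[1] & 0x80) !== 0;  // MASK bit
  let payloadLength = buf[1] & 0x7f;     // 7-bit length field
  let offset = 2;

  if (payloadLength === 126) {
    payloadLength = buf.readUInt16BE(offset);
    offset += 2;
  } else if (payloadLength === 127) {
    payloadLength = Number(buf.readBigUInt64BE(offset));
    offset += 8;
  }

  let maskingKey = null;
  if (masked) {
    maskingKey = buf.subarray(offset, offset + 4);
    offset += 4;
  }

  // payloadOffset is where the payload data starts within buf.
  return { fin, rsv, opcode, masked, payloadLength, maskingKey, payloadOffset: offset };
}
```

Applied to the single-frame masked text message "Hello" given as an example in RFC 6455 (bytes 81 85 37 fa 21 3d 7f 9f 4d 51 58), this reports FIN set, opcode 0x1, a payload length of 5, and a payload starting at offset 6.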

Mask algorithm

Masking-key is a random 32-bit number selected by the client. The mask operation does not affect the length of the data payload. The following algorithms are used for mask and inverse mask operations:

First, assume:

  • original-octet-i: the i-th byte of the original data.
  • transformed-octet-i: the i-th byte of the transformed data.
  • j: the result of i mod 4.
  • masking-key-octet-j: the j-th byte of the masking key.

The algorithm: XOR original-octet-i with masking-key-octet-j to obtain transformed-octet-i.

j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
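
The formula translates almost directly into code. The following sketch uses Node.js Buffer, and the function name applyMask is chosen for illustration. Because XOR is its own inverse, the same function both masks and unmasks.

```javascript
// XORs every byte of `data` with the masking-key byte at index i mod 4.
// Returns a new Buffer; the input is left unchanged.
function applyMask(data, maskingKey) {
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] ^ maskingKey[i % 4];
  }
  return out;
}
```

Calling applyMask twice with the same key returns the original bytes, which is how the server recovers the data sent by the client.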

Data transfer

Once the WebSocket client and server establish a connection, subsequent operations are based on the transmission of data frames.

WebSocket distinguishes operation types based on Opcode. For example, 0x8 indicates disconnection, and 0x0-0x2 indicates data interaction.

Data fragmentation

Each WebSocket message may be split into multiple data frames. When the receiver gets a data frame, it uses the value of FIN to determine whether the last data frame of the message has arrived.

FIN=1 indicates that the current data frame is the last data frame of the message. In this case, the receiver has received the complete message and can process the message. If FIN=0, the receiver needs to continue listening to receive other data frames.

In addition, the opcode indicates the type of data being exchanged: 0x01 is text and 0x02 is binary. 0x00 is special: it marks a continuation frame, which, as the name suggests, carries a further fragment of a message whose first frame has already been sent.

Data fragmentation example

An example makes this clearer. The following example from MDN illustrates data fragmentation well. The client sends two messages to the server, and the server responds to the client after receiving each one. The focus here is on the messages sent from the client to the server.

First message: FIN=1 indicates that this is the last data frame of the current message, so the server can process the message as soon as it receives the frame. opcode=0x1 indicates that the client is sending a text message.

Second message

  • FIN=0, opcode=0x1: the message type is text, the message is not yet complete, and more data frames follow.
  • FIN=0, opcode=0x0: the message is not yet complete and more data frames follow. The current data frame continues the previous one.
  • FIN=1, opcode=0x0: the message is complete and no more data frames follow. The current data frame continues the previous one, and the server can now assemble the associated data frames into a complete message.

    Client: FIN=1, opcode=0x1, msg="hello"
    Server: (process complete message immediately) Hi.
    Client: FIN=0, opcode=0x1, msg="and a"
    Server: (listening, new message containing text started)
    Client: FIN=0, opcode=0x0, msg="happy new"
    Server: (listening, payload concatenated to previous message)
    Client: FIN=1, opcode=0x0, msg="year!"
    Server: (process complete message) Happy new year to you too!
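
The receiver side of this exchange can be sketched as a small reassembler. This is an illustration only: the name createReassembler and the frame shape { fin, opcode, payload } are assumptions, payloads are Node.js Buffers, and control frames (close, ping, pong) are deliberately left out.

```javascript
// Collects data frames into complete messages and passes each one to onMessage.
function createReassembler(onMessage) {
  let fragments = [];
  let messageOpcode = null; // opcode of the first frame of the message in progress

  return function handleFrame(frame) {
    if (frame.opcode === 0x0) {
      // A continuation frame is only valid while a message is in progress.
      if (messageOpcode === null) {
        throw new Error("continuation frame without a started message");
      }
    } else {
      // A text or binary frame starts a new message.
      if (messageOpcode !== null) {
        throw new Error("new data frame before the previous message finished");
      }
      messageOpcode = frame.opcode;
    }

    fragments.push(frame.payload);

    if (frame.fin) {
      const payload = Buffer.concat(fragments);
      const opcode = messageOpcode;
      fragments = [];
      messageOpcode = null;
      // Text messages are delivered as strings, binary ones as Buffers.
      onMessage(opcode === 0x1 ? payload.toString("utf8") : payload);
    }
  };
}
```

Feeding it the four client frames from the example produces two messages: "hello", and then the three fragments of the second message concatenated in order.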

Keeping the connection alive + heartbeat

To maintain real-time bidirectional communication, WebSocket needs to keep the TCP channel between client and server from being disconnected. However, keeping a connection open for a long time without any data exchange may waste connection resources. Still, in some scenarios the client and server need to stay connected even though no data has been exchanged for a long time. In that case, a heartbeat can be used.

  • Sender -> Receiver: ping
  • Receiver -> Sender: pong

Ping and pong operations correspond to two control frames of WebSocket with opcode 0x9 and 0xA respectively.

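For example, a WebSocket server can ping a client with the following code (using the ws module):

// The third argument follows the signature of older ws releases;
// recent releases accept an optional callback in that position instead.
ws.ping("", false, true);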
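
Building on ping and pong, a server can also detect dead connections. The sketch below follows a pattern commonly used with the ws module; isAlive, trackLiveness, and checkLiveness are names chosen for illustration, while ping(), terminate(), and the "pong" event are assumed to be provided by the socket object, as they are in ws.

```javascript
// Marks a socket as alive and refreshes the flag whenever a pong arrives.
function trackLiveness(socket) {
  socket.isAlive = true;
  socket.on("pong", () => {
    socket.isAlive = true;
  });
}

// Intended to run periodically (for example from setInterval) over all sockets.
// Sockets that did not answer the previous ping are terminated;
// the others are flagged as pending and pinged again.
function checkLiveness(sockets) {
  for (const socket of sockets) {
    if (!socket.isAlive) {
      socket.terminate();
      continue;
    }
    socket.isAlive = false;
    socket.ping();
  }
}
```

A server would call trackLiveness for each new connection and run checkLiveness at an interval longer than the expected round-trip time.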

The role of Sec-WebSocket-Key/Accept

As mentioned earlier, Sec-WebSocket-Key/Sec-WebSocket-Accept provides basic protection against malicious and unintended connections.

The functions are summarized as follows:

  • It prevents the server from accepting invalid WebSocket connections. For example, if an HTTP client accidentally requests a connection to the WebSocket service, the server can reject it directly.
  • It ensures that the server understands WebSocket connections. Because the WebSocket handshake uses HTTP, the connection request might be handled and answered by a plain HTTP server; the client can use Sec-WebSocket-Key to check that the server actually understands the WebSocket protocol. (This is not 100% safe: a careless HTTP server might handle Sec-WebSocket-Key without implementing the WebSocket protocol.)
  • Browsers forbid setting Sec-WebSocket-Key and related headers on Ajax requests. This prevents a client from accidentally triggering a WebSocket upgrade when sending an Ajax request.
  • It prevents a reverse proxy that does not understand the WebSocket protocol from returning wrong data. For example, if the reverse proxy receives two WebSocket upgrade requests, it might forward the first to the server and cache the response, then answer the second directly from the cache (a meaningless response).
  • The main purpose of Sec-WebSocket-Key is not to ensure data security, since the formula that converts Sec-WebSocket-Key into Sec-WebSocket-Accept is public and very simple. Its main function is to prevent common accidental (unintentional) situations.

Note: the Sec-WebSocket-Key/Sec-WebSocket-Accept exchange only provides a basic guarantee. It does not actually guarantee that the connection is secure, that the data is safe, or that the client and server are legitimate WebSocket clients and servers.
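
From the client's point of view, the check described in this section could look roughly like the sketch below. Browsers perform this step internally, so the code is only relevant to a hand-written client; the function names generateKey and isExpectedAccept are hypothetical.

```javascript
const crypto = require("crypto");

// Fixed GUID defined by RFC 6455 for the handshake.
const MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// A fresh random 16-byte key, base64-encoded, for each connection attempt.
function generateKey() {
  return crypto.randomBytes(16).toString("base64");
}

// Returns true if the server's Sec-WebSocket-Accept matches the key that was sent.
// A mismatch suggests the peer did not actually process the WebSocket handshake.
function isExpectedAccept(sentKey, receivedAccept) {
  const expected = crypto
    .createHash("sha1")
    .update(sentKey + MAGIC)
    .digest("base64");
  return expected === receivedAccept;
}
```

As the note above says, a matching value only shows that the server followed the handshake rules; it says nothing about whether the connection or the peer is trustworthy.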

The function of the data mask

In the WebSocket protocol, the data mask enhances the security of the protocol. However, the mask does not protect the data itself, because the algorithm is public and the operation is simple. Apart from encrypting the channel itself, there seem to be few effective ways to secure the communication.

  • The mask exists to defend against proxy cache poisoning attacks, in which intermediaries that do not understand WebSocket are tricked into caching attacker-controlled data.
  • The original proposal was to encrypt the data. After weighing security against efficiency, a compromise was adopted: masking the data payload.
