Preface

I recently built two systems, and this series summarizes what I learned from the real-time communication part of that work.

This is a series of articles, tentatively planned in four parts:

  • Websocket (1) Websocket protocol
  • Websocket (2) Distributed Websocket server cluster
  • socket.io source code and WS-wrapper analysis, and disconnection

Body

This article mainly introduces what Websocket is and its protocol content.

The WebSocket protocol enables full-duplex communication between a client running untrusted code in a controlled environment and a remote host that has opted in to communication from that code. The protocol consists of an opening handshake followed by basic message framing, layered over TCP. The goal of this technology is to provide a mechanism for browser-based applications that need two-way communication with a server without relying on opening multiple HTTP connections (for example, using XMLHttpRequest or <iframe>s and long polling).

What can Websocket do

In the past, Web applications that needed two-way communication between client and server (for example, instant messaging and gaming applications) had to poll the server over HTTP for updates and send every upstream message as a separate HTTP request (many applications still do this). This causes several problems:

  • The server is forced to provide two kinds of interface: one that the client polls for new messages, and one that the client uses to push messages to the server.
  • HTTP has high overhead: every message carries an HTTP header, and without keep-alive every message also pays for a new connection handshake.
  • Client-side scripts such as JavaScript may need to track the whole exchange: after sending a message, the script may have to match it to the reply that comes back later.

A simpler solution is to use a single TCP connection for traffic in both directions, and this is what the WebSocket protocol provides. Combined with the WebSocket API [WSAPI], it provides an alternative to HTTP polling for two-way communication from web pages to remote servers.

Protocol contents

The WebSocket protocol consists mainly of two parts: the handshake rules, and the data transfer method together with its carrier (frame) format. Plenty of live examples can be found online; open one and look at the Network panel of the browser developer tools to see the traffic.

Once the client and server have completed the handshake, the data transfer part begins, and it is full-duplex. The basic unit of data exchanged between client and server is called a "message" in the specification. On the wire, each message consists of one or more frames. These WebSocket frames do not necessarily correspond to frames at the network layer. The structure of a frame is described in detail later.

Handshake

The opening handshake is designed to be compatible with HTTP-based server software and intermediaries, so that a single port can serve both HTTP clients and WebSocket clients talking to the same server. The WebSocket client handshake is therefore an HTTP Upgrade request, which the server answers with HTTP status code 101 (Switching Protocols):

[Figure: handshake captured in Charles, request headers (top) and response headers (bottom)]

A few of these fields deserve attention:

Origin request header

Origin specifies the source of the request. The Origin header is used to protect the WebSocket server from unauthorized cross-domain scripts that call the WebSocket API. That is, you do not want unauthorized cross-domain pages to establish a connection with the server. The server can use this field to determine the source domain and selectively reject the connection.
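
As a rough illustration of that server-side check, the sketch below compares the Origin header of a handshake request against an allowlist. The function name isOriginAllowed and the example origins are assumptions made up for this illustration; they do not come from any library.

```javascript
// Hypothetical helper: decide whether a handshake request may proceed,
// based on its Origin header. Names and origins are illustrative only.
function isOriginAllowed(originHeader, allowedOrigins) {
  // Non-browser clients may omit Origin entirely; this sketch rejects them.
  if (typeof originHeader !== 'string' || originHeader === '') return false;
  try {
    // Normalize to scheme://host[:port] so paths or trailing slashes do not matter.
    const origin = new URL(originHeader).origin;
    return allowedOrigins.includes(origin);
  } catch (e) {
    // Not a parseable URL (for example the literal string "null").
    return false;
  }
}

const allowed = ['https://example.com', 'https://app.example.com'];
console.log(isOriginAllowed('https://example.com', allowed));      // true
console.log(isOriginAllowed('https://evil.example.net', allowed)); // false
```

Whether requests without an Origin header should be rejected is a policy decision; a server that also serves non-browser clients may want to let them through and rely on other authentication.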

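Sec-WebSocket-Key (request header) and Sec-WebSocket-Accept (response header)

In addition, the WebSocket protocol needs to ensure that connection requests initiated by clients are only accepted by servers that actually understand the WebSocket protocol. The client sends a random Sec-WebSocket-Key, and the server must answer with a Sec-WebSocket-Accept value derived from that key, which a server unaware of WebSocket would not produce.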

Really, as you are mentioned, if you are aware of websockets (that is what to be checked), you could pretend to be a websocket server by sending correct response. But then, if you will not act correctly (e.g. form frames correctly), it will be considered as a protocol violation. Actually, you can write a websocket server that is incorrect, but there will be not much use in it.

And another purpose is to prevent clients accidentally requesting websockets upgrade not expecting it (say, by adding corresponding headers manually and then expecting smth else). Sec-WebSocket-Key and other related headers are prohibited to be set using setRequestHeader method in browsers.

Stackoverflow reference
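
To make the relationship between the two headers concrete, here is a minimal sketch of how a server derives Sec-WebSocket-Accept according to RFC 6455: append a fixed GUID to the key, hash the result with SHA-1, and base64-encode the digest. The function name computeAccept is my own; the GUID and the sample key and result are the ones given in the RFC.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive the Sec-WebSocket-Accept value the server must return for a given key.
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

// Sample key taken from RFC 6455, section 1.3.
console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

Note that this only proves the server speaks WebSocket; it is not an authentication mechanism, since anyone who knows the algorithm can compute the value.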

Data transfer

The frame structure is described below.

As mentioned earlier, the basic unit for transmitting data between the client and server is called “Messages” in the specification. In a real network, these messages consist of one or more Frames.

  1. FIN: indicates whether this frame is the last frame of a message.
  2. RSV1-3: must be 0 unless an extension defines the meaning of a non-zero value.
  3. Opcode: the most important field. The protocol defines the following values:
    • %x0 denotes a continuation frame
    • %x1 denotes a text frame
    • %x2 denotes a binary frame
    • %x3-7 are reserved for further non-control frames
    • %x8 denotes a connection close
    • %x9 denotes a ping (used for heartbeat detection, more on that later)
    • %xA denotes a pong (used for heartbeat detection, more on that later)
    • %xB-F are reserved for further control frames
  4. Mask: indicates whether the payload data is masked. This bit goes together with the Masking-key field.
  5. Payload len: the length of the payload data.
  6. Masking-key: I won't go into detail here; see the article "Meaning of the mask in Websocket".
  7. Payload data: the actual data carried by the frame. It can be of any length, but although the size of a frame is not limited, the data sent in one frame should not be too large, otherwise network bandwidth cannot be used efficiently. As mentioned above, WebSocket provides fragmentation for this.
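
To show how these fields are laid out in practice, here is a minimal sketch that builds a single unmasked text frame, as a server would send it. It is only an illustration under simplifying assumptions (one frame per message, no extensions), and encodeTextFrame is a name I made up. It also uses one detail from RFC 6455 that the list above does not spell out: a 7-bit length of 126 means the real length follows in the next 2 bytes, and 127 means it follows in the next 8 bytes.

```javascript
// Build a single unmasked frame (server-to-client direction) for a text message.
// Assumption: the whole message fits in one frame, so FIN is always 1.
function encodeTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  const len = payload.length;

  let header;
  if (len < 126) {
    header = Buffer.alloc(2);
    header[1] = len;                          // MASK = 0, 7-bit length
  } else if (len < 65536) {
    header = Buffer.alloc(4);
    header[1] = 126;                          // 126: length is in the next 2 bytes
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[1] = 127;                          // 127: length is in the next 8 bytes
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  header[0] = 0x80 | 0x1;                     // FIN = 1, RSV1-3 = 0, opcode = %x1 (text)

  return Buffer.concat([header, payload]);
}

console.log(encodeTextFrame('Hello').toString('hex')); // 810548656c6c6f
```

The output matches the single-frame unmasked "Hello" example given in RFC 6455. Frames sent by a client would additionally need the Mask bit set and a masking key applied.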

Let's work through a frame captured in Charles:

// Hexadecimal
81 84 3A A6 AC E4 51 C3 C7 81
// Binary
10000001 10000100 00111010 10100110 10101100 11100100 01010001 11000011 11000111 10000001

FIN is 1, so this frame is the whole message

Opcode is 0001 (0x1), which denotes a text frame

Mask is 1, so the payload is masked

Payload len is 0000100 (0x4), a length of 4 bytes

The masking key is 00111010 10100110 10101100 11100100

The masked payload is 01010001 11000011 11000111 10000001

For the actual processing, see buffer-utils in the source code of the Node.js ws module.
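
The same arithmetic can be checked with a few lines of Node.js. This is a simplified sketch of my own, not the ws implementation: decodeSmallFrame is a made-up name, and it only handles frames whose length fits in the 7-bit field. Unmasking XORs each payload byte with the masking-key byte at index i mod 4.

```javascript
// Decode one frame whose payload length fits in the 7-bit field (< 126).
// Extended lengths and fragmentation are deliberately left out of this sketch.
function decodeSmallFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0;
  const opcode = buf[0] & 0x0f;
  const masked = (buf[1] & 0x80) !== 0;
  const len = buf[1] & 0x7f;
  if (len >= 126) throw new Error('extended payload lengths are not handled here');

  let offset = 2;
  let maskingKey = null;
  if (masked) {
    maskingKey = buf.subarray(offset, offset + 4);
    offset += 4;
  }

  // Copy the payload so the input buffer is left untouched.
  const payload = Buffer.from(buf.subarray(offset, offset + len));
  if (masked) {
    for (let i = 0; i < payload.length; i++) payload[i] ^= maskingKey[i % 4];
  }
  return { fin, opcode, masked, len, payload };
}

const frame = Buffer.from('81843AA6ACE451C3C781', 'hex');
const decoded = decodeSmallFrame(frame);
console.log(decoded.fin, decoded.opcode, decoded.len); // true 1 4
console.log(decoded.payload.toString('utf8'));          // keke
```

So the four masked bytes 51 C3 C7 81 unmask to 6B 65 6B 65, which is the text "keke".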

Websocket usage and API

With the protocol covered, let's look at the related Web API.

// Client
var ws = new WebSocket('wss://example.com/socket'); // ➊
ws.onerror = function (error) { /* ... */ } // ➋
ws.onclose = function () { /* ... */ } // ➌
ws.onopen = function () { // ➍
  ws.send("Connection established. Hello server!"); // ➎
}
ws.onmessage = function (msg) { // ➏
  if (msg.data instanceof Blob) { // ➐
    processBlob(msg.data);
  } else {
    processText(msg.data);
  }
}
  1. Open a new secure WebSocket connection (WSS)
  2. Optional callback to be called when a connection error occurs
  3. Optional callback called when the connection terminates
  4. Optional callback, called when a WebSocket connection is established
  5. The client first sends a message to the server
  6. Callback function, called every time the server sends back a message
  7. Depending on the message received, decide whether to invoke binary or text processing logic

Heartbeat detection

Sometimes the client's network goes down while a WebSocket is in use, and the close event is never raised on the server. The result is:

  • Redundant connections that stay open on the server
  • The server keeps sending data to the client, and that data is lost

Therefore, a mechanism is needed to detect whether the client and server are still properly connected. Heartbeat detection is such a mechanism: generally speaking, the client sends a heartbeat message at regular intervals, and a connection that stays silent for too long is treated as broken.
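
On the client side, the browser WebSocket API does not expose ping and pong frames to scripts, so a heartbeat there is usually built from ordinary application messages plus a timer. The sketch below shows only the timer part, under the assumption that the server sends something at least every so often; createWatchdog and the 45-second figure are illustrative choices of mine, not part of any API.

```javascript
// Hypothetical helper: calls onDead if beat() has not been called for timeoutMs.
function createWatchdog(timeoutMs, onDead) {
  let timer = null;
  return {
    // Call this whenever a heartbeat (or any message) arrives from the peer.
    beat() {
      clearTimeout(timer);
      timer = setTimeout(onDead, timeoutMs);
    },
    // Call this when the connection is closed on purpose.
    stop() {
      clearTimeout(timer);
      timer = null;
    }
  };
}

// Usage sketch with a browser WebSocket named `ws` (assumed to exist):
//   const watchdog = createWatchdog(45000, () => ws.close());
//   ws.onopen = () => watchdog.beat();
//   ws.onmessage = () => watchdog.beat();
//   ws.onclose = () => watchdog.stop();
```

The timeout should be somewhat longer than the interval at which the other side is expected to send, so that a single delayed message does not tear down a healthy connection.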

Heartbeat handling in the ws module

Here is how the ws module can detect and close broken connections using heartbeats:

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

function noop() {}

// Called on every pong: mark the connection as alive again
function heartbeat() {
  this.isAlive = true;
}

wss.on('connection', function connection(ws) {
  ws.isAlive = true;
  ws.on('pong', heartbeat);
});

// Every 30 seconds, terminate connections that never answered the previous ping,
// then ping the remaining ones
const interval = setInterval(function ping() {
  wss.clients.forEach(function each(ws) {
    if (ws.isAlive === false) return ws.terminate();

    ws.isAlive = false;
    ws.ping(noop);
  });
}, 30000);

According to the specification, a Pong response message is automatically sent when a Ping message is received.

Making ws and wss coexist

Here is my Nginx configuration, along with load balancing. The certificate is self-signed, so there is a problem.
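
Separately from the server-side setup, the client also has to pick the right scheme. One common approach, sketched here as an assumption rather than as part of the Nginx configuration mentioned above, is to derive ws or wss from the scheme of the page itself. The name toWebSocketUrl and the /socket path are made up for the example.

```javascript
// Hypothetical helper: choose ws:// or wss:// to match the page that opens the socket.
function toWebSocketUrl(pageUrl, path) {
  const page = new URL(pageUrl);
  // Pages served over HTTPS must use wss; plain HTTP pages can use ws.
  const scheme = page.protocol === 'https:' ? 'wss:' : 'ws:';
  return `${scheme}//${page.host}${path}`;
}

console.log(toWebSocketUrl('https://example.com/chat', '/socket')); // wss://example.com/socket
console.log(toWebSocketUrl('http://localhost:3000/', '/socket'));   // ws://localhost:3000/socket
```

In a browser this would typically be called as toWebSocketUrl(window.location.href, '/socket').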

How does Websocket do identity authentication

In general, WebSocket authentication takes place during the handshake phase: the server authenticates the client based on the contents of the handshake request. A common approach is to attach parameters to the URL.

new WebSocket("ws://localhost:3000?token=xxxxxxxxxxxxxxxxxxxx");

Taobao Live's bullet-screen comments also use this approach for identity authentication.
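
On the server, the token then has to be read back out of the request line. A minimal sketch of that step, using the standard URL class, might look like this; getTokenFromRequestUrl is a name I made up, and the placeholder base URL exists only because req.url in Node is relative.

```javascript
// Extract the `token` query parameter from the request line of a handshake.
// `requestUrl` is what Node exposes as req.url, for example "/?token=abc".
function getTokenFromRequestUrl(requestUrl) {
  // req.url is relative, so a dummy base is needed to parse it.
  const parsed = new URL(requestUrl, 'ws://placeholder.invalid');
  return parsed.searchParams.get('token'); // null when the parameter is absent
}

console.log(getTokenFromRequestUrl('/?token=abc123')); // abc123
console.log(getTokenFromRequestUrl('/chat'));          // null
```

One thing to keep in mind with this approach is that URLs tend to end up in access logs, so short-lived tokens are a safer fit for it.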

Take the ws module from npm as an example: it accepts a verifyClient option when creating a WebSocket server.

const url = require('url')
const jwt = require('jsonwebtoken')
const WebSocket = require('ws')

// SystemConfig and publicKey come from the application's own configuration
const wss = new WebSocket.Server({
  host: SystemConfig.WEBSOCKET_server_host,
  port: SystemConfig.WEBSOCKET_server_port,
  // Verify the token to identify the user
  verifyClient: (info) => {
    const token = url.parse(info.req.url, true).query.token
    let user
    console.log('[verifyClient] start validate')
    if (token) {
      try {
        // Verify the token and parse the user object;
        // TokenExpiredError is thrown if the token has expired
        user = jwt.verify(token, publicKey)
        console.log(`[verifyClient] user ${user.name} logined`)
      } catch (e) {
        console.log('[verifyClient] token expired')
        return false
      }
    }
    if (user) {
      info.req.user = user
      return true
    } else {
      // No token: let the client in as an anonymous tourist
      info.req.user = {
        name: `Tourist${parseInt(Math.random() * 1000000)}`,
        mail: ''
      }
      return true
    }
  }
})

The relevant ws source code is located in ws/websocket-server:

  // ...
  if (this.options.verifyClient) {
    const info = {
      origin: req.headers[`${version === 8 ? 'sec-websocket-origin' : 'origin'}`],
      secure: !!(req.connection.authorized || req.connection.encrypted),
      req
    };

    // A verifyClient that declares two parameters is treated as asynchronous
    if (this.options.verifyClient.length === 2) {
      this.options.verifyClient(info, (verified, code, message) => {
        if (!verified) return abortHandshake(socket, code || 401, message);
        this.completeUpgrade(extensions, req, socket, head, cb);
      });
      return;
    }

    if (!this.options.verifyClient(info)) return abortHandshake(socket, 401);
  }
  this.completeUpgrade(extensions, req, socket, head, cb);
}
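
The source above shows that a verifyClient declared with two parameters is called with a callback, which makes asynchronous checks possible. Here is a minimal sketch of that form. The lookupUser function is a hypothetical stand-in for a database or cache query, and the token value is invented for the example; only the (verified, code, message) callback shape is taken from the source shown above.

```javascript
// Hypothetical asynchronous user lookup; stands in for a database or cache call.
function lookupUser(token) {
  return Promise.resolve(token === 'valid-token' ? { name: 'alice' } : null);
}

// Two-parameter form: ws detects it via verifyClient.length === 2 and waits for done().
function verifyClient(info, done) {
  const token = new URL(info.req.url, 'ws://placeholder.invalid').searchParams.get('token');
  lookupUser(token).then(
    (user) => {
      if (!user) return done(false, 401, 'Unauthorized');
      info.req.user = user; // make the user available to later handlers
      done(true);
    },
    () => done(false, 500, 'Internal Server Error')
  );
}

// Usage sketch (requires the ws module):
//   const wss = new WebSocket.Server({ port: 8080, verifyClient });
```

Compared with the synchronous version, the handshake is simply held open until done() is called, so the lookup should be fast or have its own timeout.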

Afterword

References:

RFC 6455: The WebSocket Protocol

High Performance Browser Networking, Ilya Grigorik

Learning the WebSocket Protocol – Implementation from top to bottom (revised)