This article introduces WebSocket (abbreviated as WS below) and implements its basic functionality natively in Node. The main difficulty is parsing and assembling the data frames. Prerequisite knowledge:

  • WebSocket
  • Buffer
  • Bitwise operators
  • Binary numbers
  • Hexadecimal numbers

First let’s look at the WS data frame format:

  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+


The diagram above is essential to understanding WS, but it means little to readers unfamiliar with data frames. So let’s first explain what the diagram describes and how to read it.

Data frame

  • Bit
    • The smallest unit of data storage in a computer is the bit (b). Each 0 or 1 is one bit.
  • Byte
    • Eight bits make up one byte.
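
As a quick illustration of bits and bytes in Node, the sketch below prints each byte of the string Hello in binary and in hexadecimal. The helper name toBits is made up for this example.

```javascript
// Hypothetical helper: render one byte (0-255) as an 8-character bit string.
function toBits(byte) {
  return byte.toString(2).padStart(8, '0');
}

// "Hello" is plain ASCII, so it takes exactly one byte per character.
const bytes = Buffer.from('Hello', 'utf8');

for (const byte of bytes) {
  // For example, "H" is 0x48 in hexadecimal, which is 01001000 in binary.
  console.log(toBits(byte), '0x' + byte.toString(16).padStart(2, '0'));
}
```

Each printed line is one byte, that is, eight bits.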

With these two concepts in mind, look at the diagram above again:

  • First line (32 bits)

    • FIN is in the top left corner of the diagram. It occupies one bit, so its value can only be 0 or 1.
    • RSV1, RSV2, and RSV3 occupy 1 bit each.
    • Next comes opcode(4), the operation code of the frame. It occupies four bits, so its value ranges from 0000 to 1111 in binary.
    • Then comes MASK, the mask flag, which occupies 1 bit.
    • Payload len(7) is the length of the received data and occupies 7 bits.
    • Extended payload length(16/64) fills the last 16 bits of the first line. The meaning of these bits depends on the value of payload len; more on that later.
  • Second line (32 bits)

    • Extended payload length continued, if payload len == 127

      • The line breaks in the diagram are only there for display. The second line could just as well be appended to the first, and that is how the data is actually processed: as one continuous sequence with no line breaks.
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-------+-+-------------+-------------------------------+------------------------------------+
      |F|R|R|R| opcode|M| Payload len |    Extended payload length    |                                    |
      |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           | Extended payload length continued, |
      |N|V|V|V|       |S|             |   (if payload len==126/127)   | if payload len == 127              |
      | |1|2|3|       |K|             |                               |                                    |
      +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +------------------------------------+

The remaining lines can be appended one after another in the same way.

If the client (browser) sends Hello to the server, the server receives a sequence of binary 0s and 1s, such as 10001000111… To find out what was actually sent, we need to parse this sequence. The diagram above defines the rules for doing so, and following them step by step yields the data we want.

Here’s an example:

Suppose we receive 10000001 from the client (this is only the first byte of the data; much more follows). The corresponding values are:

FIN RSV1 RSV2 RSV3 opcode
1 0 0 0 0001
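
The same split can be done with bitwise operators. This is a minimal sketch; the helper name parseFirstByte and the shape of the object it returns are assumptions made for illustration.

```javascript
// Hypothetical helper: split the first byte of a frame into its fields.
function parseFirstByte(byte1) {
  return {
    fin: (byte1 >>> 7) & 0x1,  // highest bit
    rsv1: (byte1 >>> 6) & 0x1,
    rsv2: (byte1 >>> 5) & 0x1,
    rsv3: (byte1 >>> 4) & 0x1,
    opcode: byte1 & 0x0f,      // lowest four bits
  };
}

// 10000001 in binary: FIN = 1, RSV1-3 = 0, opcode = 0001
const fields = parseFirstByte(0b10000001);
console.log(fields);
```

Shifting right moves the wanted bit to the lowest position, and the & mask discards everything else.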

Data frame format details

  • FIN: 1bit

    Indicates whether this is the last frame of a message. The first frame may also be the last. %x0: more frames follow; %x1: last frame

  • RSV1, RSV2, and RSV3: 1 bit each

    Must be 0 unless an extension has been negotiated that gives a non-zero value some meaning. If a non-zero value is received and no negotiated extension defines it, the WebSocket connection fails.

  • Opcode: 4 bits

    The following opcode values are defined:
    %x0: continuation frame
    %x1: text frame
    %x2: binary frame
    %x3-7: reserved for further non-control frames
    %x8: connection close frame
    %x9: ping frame
    %xA: pong frame
    %xB-F: reserved for further control frames

  • Mask: 1bit

    Defines whether the “payload data” is masked. If set to 1, a “Masking-key” is present. All frames sent from the client to the server have this bit set to 1.

  • Payload length: 7 bits | 7 + 16 bits | 7 + 64 bits

    If the value is 0 to 125, it is the “payload length” itself. If it is 126, the following 2 bytes, interpreted as a 16-bit unsigned integer, are the “payload length”. If it is 127, the following 8 bytes, interpreted as a 64-bit unsigned integer, are the “payload length”.

    • Why are there three cases? The payload length field has only seven bits, whose largest binary value is 1111111, or 127 in decimal, and the values 126 and 127 are reserved as markers. A payload longer than 125 bytes therefore cannot be represented in the field itself. We need more bits, so the length is written in the additional bits that follow the field. Why not simply always use a 64-bit field? That would work, but it wastes space: the hello mentioned above has a length of only 5, which is 101 in binary, so three bits are enough and 64 would be excessive. That is why the three cases are defined separately.
  • Masking-key: 0 or 32 bits

    All frames sent from the client to the server contain a 32-bit masking key (the “mask bit” is set to 1); otherwise the field is absent (0 bits). When the mask is set, every byte of the received payload data must be XORed with the corresponding byte of the masking key to recover the real value.

  • Payload data: (x+y) bytes

    Consists of “Extension data” followed by “Application data”. Usually the Extension data is empty.

  • Extension data: x bytes

    This is 0 bytes unless an extension has been negotiated, and any extension must specify the length of its extension data.

  • Application data: y bytes

    Occupies the rest of the frame after “Extension data”.
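
The three payload length cases can be sketched as a small function. The name readPayloadLength and the shape of its return value are assumptions for illustration, and headerLength here does not include the masking key. Unlike a parser that reads only the low 32 bits, this sketch reads the full 64-bit field and rejects values JavaScript numbers cannot represent exactly.

```javascript
// Hypothetical helper: read MASK and the payload length from a frame header.
// Returns null when the buffer does not yet hold the whole length field.
function readPayloadLength(buffer) {
  if (buffer.length < 2) return null;

  const byte2 = buffer.readUInt8(1);
  const masked = (byte2 >>> 7) === 1;
  const len7 = byte2 & 0x7f;

  if (len7 < 126) {
    // Case 1: the 7-bit value is the length itself (0 to 125).
    return { masked, payloadLength: len7, headerLength: 2 };
  }
  if (len7 === 126) {
    // Case 2: the next 2 bytes hold a 16-bit unsigned length.
    if (buffer.length < 4) return null;
    return { masked, payloadLength: buffer.readUInt16BE(2), headerLength: 4 };
  }
  // Case 3 (len7 === 127): the next 8 bytes hold a 64-bit unsigned length.
  if (buffer.length < 10) return null;
  const length = buffer.readBigUInt64BE(2);
  if (length > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError('payload too large for this sketch');
  }
  return { masked, payloadLength: Number(length), headerLength: 10 };
}

console.log(readPayloadLength(Buffer.from([0x81, 0x85])));
console.log(readPayloadLength(Buffer.from([0x82, 0x7e, 0x01, 0x00])));
```

The first call describes a masked frame with a 5-byte payload, the second an unmasked frame whose 256-byte length is stored in the 16-bit extended field.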

In practice

Now that we know the frame structure and what each field means, we can parse the data according to the rules.

  • Parsing data
  function parseFrams() {
    // Buffer received data
    const buffer = this.buffer;
    // By default the payload starts at the third byte (index 2), which holds when the payload is at most 125 bytes long
    let payloadIndex = 2;

    // Get the first byte, containing FIN and opcode
    const byte1 = buffer.readUInt8(0);

    // 0: There are subsequent frames
    // 1: last frame
    const FIN = (byte1 >>> 7) & 0x1;

    // Get the opcode
    const opcode = byte1 & 0x0f;

    if (!FIN && opcode !== 0) {
      // This is the first frame of a fragmented message, so save its opcode.
      // The protocol requires continuation frames to carry opcode 0:
      // Fragment number  0    1   ...  N-2  N-1
      // FIN              0    0   ...  0    1
      // opcode           !0   0   ...  0    0
      this.frameOpcode = opcode;
    }

    // Get the MASK and the payload length.
    let byte2 = buffer.readUInt8(1);

    // Defines whether the payload data is masked
    // If set to 1, a Masking-key is present
    // All frames sent from the client to the server have it set to 1
    let MASK = (byte2 >>> 7) & 0x1;

    // Get the length of the data
    let payloadLength = byte2 & 0x7f;

    let mask_key;

    if (payloadLength === 126) {
      // 126 means the real length (126 to 65535) is stored in the next two bytes as a 16-bit unsigned integer
      payloadLength = buffer.readUInt16BE(payloadIndex);

      // The real data moves back by 2 bytes
      payloadIndex += 2;
    } else if (payloadLength === 127) {
      // 127 means the real length (65536 or more) is stored in the next eight bytes as a 64-bit unsigned integer.
      // Data that large is hard to handle, so this implementation only supports lengths that fit in 32 bits.
      // Bytes 2-5 should therefore always be 0, and the length is read from bytes 6-9.
      // 4: skip the four high-order bytes
      payloadLength = buffer.readUInt32BE(payloadIndex + 4);
      // 8: the length field occupies 8 bytes, so the real data moves back by 8 bytes
      payloadIndex += 8;
    }

    // If the MASK bit is 1, the masking key occupies 4 bytes (MASK_KEY_LENGTH === 4)
    const maskKeyLen = MASK ? MASK_KEY_LENGTH : 0;

    // If fewer bytes have arrived than header + masking key + payload, the frame is incomplete: do nothing until the rest has been received
    if (buffer.length < payloadIndex + maskKeyLen + payloadLength) {
      return;
    }

    // If there is a mask, the real data is preceded by a four-byte masking key
    let payload = Buffer.alloc(0);
    if (MASK) {
      // Get the masking key
      mask_key = buffer.slice(payloadIndex, payloadIndex + MASK_KEY_LENGTH);

      // The real data moves back by another 4 bytes
      payloadIndex += MASK_KEY_LENGTH;

      // Masked data must be decoded; the algorithm is fixed by the protocol (see the source code for unmask)
      payload = unmask(mask_key, buffer.slice(payloadIndex, payloadIndex + payloadLength));
    } else {
      // Without a mask the data can be sliced out directly
      payload = buffer.slice(payloadIndex, payloadIndex + payloadLength);
    }

    // The message may be fragmented, so cache the payload of each frame and process the complete data once all frames have been received
    this.payloadFrames = Buffer.concat([this.payloadFrames, payload]);
    this.buffer = Buffer.alloc(0);

    // All data has been received
    if (FIN) {
      const _opcode = opcode || this.frameOpcode;
      const payloadFrames = this.payloadFrames.slice(0);
      this.payloadFrames = Buffer.alloc(0);
      this.frameOpcode = 0;

      // Handle the data according to its opcode
      this.processPayload(_opcode, payloadFrames);
    }
  }
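
The parser above relies on MASK_KEY_LENGTH and unmask, which are not shown in this article. Below is a minimal sketch, assuming unmask takes the 4-byte masking key and the masked payload and returns a new Buffer; the version in the referenced source may differ. The sample bytes are the masked Hello example from RFC 6455.

```javascript
const MASK_KEY_LENGTH = 4;

// Assumed signature: XOR every payload byte with the key byte at position i mod 4.
function unmask(maskKey, data) {
  const output = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) {
    output[i] = data[i] ^ maskKey[i % MASK_KEY_LENGTH];
  }
  return output;
}

// Masking key and masked payload of the frame 81 85 37 fa 21 3d 7f 9f 4d 51 58
const key = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const maskedPayload = Buffer.from([0x7f, 0x9f, 0x4d, 0x51, 0x58]);

console.log(unmask(key, maskedPayload).toString('utf8')); // prints "Hello"
```

Because XOR is its own inverse, running the same function over plain data masks it again.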
  • Building the data to send back, which is the inverse of parsing
  /**
   * @param {number} opcode
   * @param {string|Buffer} payload
   * @param {boolean} isFinal
   */
  function encodeMessage(opcode, payload, isFinal = true) {
    // Accept strings as well as buffers, and measure the length in bytes
    if (typeof payload === 'string') {
      payload = Buffer.from(payload);
    }
    const len = payload.length;
    let buffer;
    let byte1 = (isFinal ? 0x80 : 0x00) | opcode;

    if (len < 126) {
      // The payload is 0 to 125 bytes long

      // Allocate the buffer for the frame
      buffer = Buffer.alloc(2 + len); // 2: [FIN + RSV1/2/3 + OPCODE] (1 byte) + [MASK + payload length] (1 byte)

      // Write FIN + RSV1/2/3 + OPCODE
      buffer.writeUInt8(byte1, 0);

      // Write MASK + payload length into the second byte
      buffer.writeUInt8(len, 1);

      // Write the real data starting at the third byte
      payload.copy(buffer, 2);
    } else if (len < 1 << 16) {
      // The payload is 126 to 65535 bytes long
      buffer = Buffer.alloc(2 + 2 + len);
      buffer.writeUInt8(byte1, 0);
      buffer.writeUInt8(126, 1);
      buffer.writeUInt16BE(len, 2);
      payload.copy(buffer, 4);
    } else {
      // The payload is 65536 bytes or longer
      buffer = Buffer.alloc(2 + 8 + len);
      buffer.writeUInt8(byte1, 0);
      buffer.writeUInt8(127, 1);
      buffer.writeUInt32BE(0, 2);
      buffer.writeUInt32BE(len, 6);
      payload.copy(buffer, 10);
    }
    return buffer;
  }
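
To exercise the parser without a browser, it can help to build a masked frame the way a client would. The helper below is a hypothetical test utility, not part of the original source, and it only covers payloads of up to 125 bytes.

```javascript
const crypto = require('crypto');

// Hypothetical test helper: build a masked, single-frame text message.
function buildMaskedTextFrame(text, maskKey = crypto.randomBytes(4)) {
  const payload = Buffer.from(text, 'utf8');
  if (payload.length > 125) {
    throw new RangeError('this sketch only handles payloads up to 125 bytes');
  }

  // 2 header bytes + 4 masking-key bytes + payload
  const frame = Buffer.alloc(2 + 4 + payload.length);
  frame.writeUInt8(0x80 | 0x1, 0);            // FIN = 1, opcode = 0x1 (text)
  frame.writeUInt8(0x80 | payload.length, 1); // MASK = 1, 7-bit length
  maskKey.copy(frame, 2);

  // Mask the payload: XOR each byte with the key byte at position i mod 4.
  for (let i = 0; i < payload.length; i++) {
    frame[6 + i] = payload[i] ^ maskKey[i % 4];
  }
  return frame;
}

// A fixed key makes the output reproducible.
const frame = buildMaskedTextFrame('Hello', Buffer.from([0x37, 0xfa, 0x21, 0x3d]));
console.log(frame.toString('hex')); // 818537fa213d7f9f4d5158
```

A real client picks a fresh random key for every frame, which is what the default argument does.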

Both pieces of code above are commented in detail and should be easy to follow, so they are not analyzed further here. See the source code on GitHub.