WebRTC is a peer-to-peer real-time communication technology. In this article, we will build a real-time online programming interview tool on top of it. During a remote interview, the two parties can not only hold an audio and video call, but the interviewer can also watch the candidate's coding in real time.

It looks something like this:

The Agora SDK is a real-time communication solution provided by Agora, and it includes a WebRTC-based package. We will build our WebRTC-related features on top of it to get a closer-to-production real-time communication experience.

The complete source code of this tool lives in this repository; you can read it alongside the article.

Requirements

This online programming interview tool needs to meet two requirements:

  1. The interviewer and the candidate can communicate in real time via audio and video calls.
  2. The candidate can answer questions in an online code editor, and the interviewer can see the code being written in real time. Ideally, the editor should provide syntax highlighting and code completion to make things easier for the candidate.

Design approach

Looking at the functionality the Agora SDK provides, we found two SDKs that can fulfill our requirements:

  • The Video SDK provides reliable real-time audio and video call services and can be used for communication between the interviewer and the candidate.
  • The Signaling SDK provides a stable messaging channel and can be used to deliver the data produced by the candidate's coding process to the interviewer.

The Video SDK already provides a rendering implementation that can output video to a specified DOM node, so it is basically usable out of the box. The code editor and its data transfer, however, require some development.

After some comparison and selection, we finally chose monaco-editor, the editor component of VS Code, as the built-in code editor, and rrweb, the open-source web recording and playback library, to record operations in monaco-editor. The recorded data is sent to the interviewer via the Signaling SDK and replayed in real time with rrweb to keep the code in sync.

Note that this tool is a proof of concept and is intended for discussion only. It is not optimized enough for production use, and the design itself leaves a lot of room for improvement. For example, by relying on the full VS Code to provide code execution, debugging, and other features, a scheme closer to Live Share could be achieved.

Encapsulating the SDK

The Agora SDK API itself is fairly clear and well documented, but most of the APIs are asynchronous and callback-based.

Taking video as an example, initialization, joining a channel, creating a stream, publishing, subscribing, and so on can easily pile up four or five levels of nested callbacks. So I first wrote a simple wrapper around the APIs we use to expose a Promise-style interface, which keeps the code structure clearer and gives better control flow via async/await.

Using initialization as an example, the SDK API is used as follows:

client.init(appId, function () {
  console.log("AgoraRTC client initialized");
}, function (err) {
  console.log("AgoraRTC client init failed", err);
});

We can turn this into a Promise in this way:

const init = appId =>
  new Promise((resolve, reject) => {
    client.init(appId, () => resolve(), err => reject(err));
  });
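Since most of the wrapped APIs share this (arguments, onSuccess, onFailure) callback shape, a small generic helper can cut down the boilerplate. A minimal sketch (the promisify helper is our own illustration, not part of the SDK; individual SDK methods may deviate from this shape, so check each signature before wrapping):

// Generic wrapper: turn fn(...args, onSuccess, onFailure) into a Promise.
const promisify = fn => (...args) =>
  new Promise((resolve, reject) => {
    fn(...args, result => resolve(result), err => reject(err));
  });

// Hypothetical usage with SDK methods that follow this shape:
const init = promisify(client.init.bind(client));
const join = promisify(client.join.bind(client));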

Wrapping all the APIs this way makes our basic flow code much simpler and clearer:

async function main() {
  try {
    await rtc.init(APP_ID);
    const uid = await rtc.join(null, CHANNEL_ID, ACCOUNT);
    const stream = rtc.createStream();
    await rtc.initStream(stream);
    await rtc.subscribe(...);
    await rtc.publish(...);
  } catch (error) {
    console.error(error);
  }
}

Refer to this file for the above asynchronous encapsulation of the SDK.

Audio and video calls

For the audio and video call feature, see the Quick Start Guide. The steps can be summarized as follows:

  1. Initialize a client based on the App ID.
  2. Join a channel. Each channel has its own unique ID, and users in a channel can subscribe to the video and audio streams published by other users in the same channel. In our tool, we store the channel ID in the URL query, e.g. ?id=abc123; both parties open a URL with the same query to ensure they join the same channel.
  3. Create and initialize a local audio and video stream (the video shows the user) and render it into the DOM (see the sketch after this list). In the tool we see our own video and the other party's video at the same time; the video rendered in this step is our own.
  4. Publish your own audio and video stream.
  5. Subscribe to the audio and video stream published by the other party, and render it into the DOM once received.
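As a concrete illustration of step 3, here is a minimal sketch of creating the local stream and rendering it into the page using the legacy Agora Web SDK's createStream/init/play calls (the local-player element id is our own assumption):

// Create a local stream capturing the user's camera and microphone.
const localStream = AgoraRTC.createStream({ audio: true, video: true });

// Initialize it (this triggers the browser's device permission prompt)...
localStream.init(
  () => {
    // ...then render our own video into <div id="local-player"></div>.
    localStream.play("local-player");
  },
  err => console.error("stream init failed", err)
);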

In the actual implementation, since we wrapped the SDK in Promises, the order of steps 4 and 5 was adjusted for the two sides of the interview:

  1. The interviewer subscribes to the other party's stream.
  2. The candidate publishes their own stream and subscribes to the other party's stream.
  3. After the interviewer's subscription succeeds, the interviewer publishes their own stream. At this point the candidate is already subscribed and can successfully receive the published stream.

This is mainly to avoid the problem of one party publishing before the other has subscribed, which would cause the connection to fail to be established.
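For reference, subscription in the legacy Agora Web SDK is event-driven: the stream-added event announces a newly published remote stream, and stream-subscribed fires once our subscription completes. A minimal sketch (the remote-player element id is our own assumption):

// Subscribe as soon as the remote side publishes its stream.
client.on("stream-added", evt => {
  client.subscribe(evt.stream, err => console.error("subscribe failed", err));
});

// Render the remote stream once the subscription is established.
client.on("stream-subscribed", evt => {
  evt.stream.play("remote-player");
});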

Real-time programming

As mentioned in the design section, we use monaco-editor as the online editor and rrweb to record the operations in it. Both tools' APIs are very easy to use, and a dozen or so lines of initialization code integrate them into the candidate's page:

import * as monaco from "monaco-editor/esm/vs/editor/editor.main.js";
import { record } from "rrweb";

self.MonacoEnvironment = {
  getWorkerUrl: function(moduleId, label) {
    // get worker urls
  }
};

monaco.editor.create(document.body, {
  value: ["function x() {", '\tconsole.log("Hello world!");', "}"].join("\n"),
  language: "javascript"
});

record({
  emit(event) {
    parent.postMessage({ event }, parent.origin);
  },
  inlineStylesheet: false
});

In the implementation, we embed the editor in the candidate's page as an iframe. rrweb records the operations and passes the data to the main page via parent.postMessage, where it is transmitted by the Signaling SDK.
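A minimal sketch of that bridge on the main page (the e.data.event guard is our own addition; in the real implementation the event would first go through the chunking and queuing described below rather than being sent directly):

// Receive rrweb events posted by the editor iframe and forward them over signaling.
window.addEventListener("message", e => {
  if (!e.data || !e.data.event) return; // ignore unrelated messages
  const eventStr = JSON.stringify(e.data.event);
  signal.messageInstantSend(interviewerAccount, eventStr);
});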

However, when using the Signaling SDK in practice, we encountered two typical problems:

  1. Data transmission is limited in size: the visible characters in each message cannot exceed 8 KB.
  2. rrweb's recording is log-structured, so strict order must be maintained even though each operation's data differs in size and may be transmitted at different speeds.

Data segmentation

One way to work around the size limit is to split the data into multiple chunks and mark each chunk as an incomplete record that must be reassembled before use.

The corresponding implementation is as follows (a rough marker is used here for identification; in practice, more metadata could be recorded to make the identification more reliable):

// Convert operation data to a string
const eventStr = JSON.stringify(e.data.event);

export const CHUNK_START = "_0_";
export const CHUNK_SIZE = 8 * 1024 - CHUNK_START.length;
export const CHUNK_REG = new RegExp(`.{1,${CHUNK_SIZE}}`, "g");

const chunks = [];
if (eventStr.length > CHUNK_SIZE) {
  for (const chunk of eventStr.match(CHUNK_REG)) {
    chunks.push(CHUNK_START + chunk);
  }
}
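For completeness, here is a sketch of how the sender might dispatch these messages (the sendEvent name is hypothetical; note that under this rough scheme the receiver only flushes an accumulated large message when the next non-chunk message arrives, as the receiving code below shows):

// Hypothetical helper: send small payloads directly, large ones as marked chunks.
async function sendEvent(account, eventStr) {
  if (eventStr.length <= CHUNK_SIZE) {
    await signal.messageInstantSend(account, eventStr);
    return;
  }
  for (const chunk of eventStr.match(CHUNK_REG)) {
    await signal.messageInstantSend(account, CHUNK_START + chunk);
  }
}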

On the receiving side, when data arrives from the Signaling SDK, the CHUNK_START marker at the head of a message tells us whether the data is complete or needs to be concatenated:

let largeMessage = "";
on("messageInstantReceive", (messageAccount, uid, message) => {
  const events = [];
  if (message.startsWith(CHUNK_START)) {
    largeMessage += message.slice(CHUNK_START.length, message.length);
  } else {
    if (largeMessage) {
      // reset chunks
      events.push(JSON.parse(largeMessage));
      largeMessage = "";
    }
    events.push(JSON.parse(message));
  }
});
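On the interviewer side, the parsed events can then be fed into rrweb for real-time playback. A minimal sketch using rrweb's live mode as we understand it (check the rrweb documentation for the exact API in your version):

import { Replayer } from "rrweb";

// A live replayer that starts empty; events are appended as they arrive.
const replayer = new Replayer([], { liveMode: true });
replayer.startLive();

// Feed each event parsed from the signaling channel into the replayer.
for (const event of events) {
  replayer.addEvent(event);
}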

Ensuring order

As mentioned above, the data rrweb transmits may be a large full snapshot or a small single oplog. Under varying network transmission speeds, without any control, a later operation could arrive before an earlier one and break the playback. We therefore need to preserve the order of the transmitted data.

The Signaling SDK's send API, messageInstantSend, accepts a third parameter, a callback invoked when the send succeeds. In our tests, however, the callback firing does not guarantee that the receiver has finished downloading the data, so we still need to implement order preservation, including the download, ourselves. (If my understanding or my test is wrong, please correct me.)

A simple implementation is to add a message queue on the candidate side: whenever rrweb records a new operation, the data is enqueued.
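A minimal sketch of such a queue, shaped to match the eventQueue.dequeue() and eventQueue.length usage in the example further below (the implementation itself is our own illustration):

// A simple FIFO queue for serialized rrweb events.
const eventQueue = {
  items: [],
  get length() {
    return this.items.length;
  },
  enqueue(item) {
    this.items.push(item);
  },
  dequeue() {
    return this.items.shift();
  }
};

// The candidate's main page enqueues each event received from the editor iframe.
window.addEventListener("message", e => {
  if (e.data && e.data.event) {
    eventQueue.enqueue(JSON.stringify(e.data.event));
  }
});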

When the interviewer side is ready to receive data, it sends a START signal to the other side; on receiving the signal, the candidate side takes the first piece of data out of the queue and sends it. After that, the interviewer side replies with an ACK signal for each piece of data it receives, and the candidate side only takes the next message out of the queue and sends it after receiving that ACK. This guarantees that the data received on the interviewer side is in strict order.

The following is an example:

on("messageInstantReceive".async (messageAccount, uid, message) => {
  if (message === START) {
    // Send the first data
    await signal.messageInstantSend(interviewerAccount, eventQueue.dequeue());
  }
  if (message === ACK && eventQueue.length > 0) {
    awaitsignal.messageInstantSend(interviewerAccount, eventQueue.dequeue()); }});Copy the code

A more complete practical implementation can be found in this file.

Optimization

The Signaling SDK is based on TCP, and the extra round trips introduced by this receive-acknowledgement mechanism add considerable latency, which hurts the real-time experience of the interviewer watching the playback.

Some possible optimization ideas include:

  1. Stop strictly enforcing the order in which data is received; instead, record a sequence number in the data's meta area. When the receiver finds a "gap" between two received items, it does not play them immediately, but waits for the missing data to arrive, reorders, and then resumes playback. This removes the ACK signals and saves a round trip of latency (see the sketch after this list).
  2. When there is more than one record in the sender's queue, try to merge them into a single record within the size limit and send them to the peer in one batch. This reduces the overhead of data transmission and connection establishment; the effect is especially noticeable when there are many small data blocks.
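As a rough illustration of the first idea, here is a sketch of sequence-numbered delivery with a reorder buffer (all names are our own, chunking is omitted for brevity, and replayer is the rrweb live replayer from the previous section):

// Sender: tag each event with an incrementing sequence number; no ACK needed.
let sendSeq = 0;
function sendWithSeq(account, event) {
  signal.messageInstantSend(account, JSON.stringify({ seq: sendSeq++, event }));
}

// Receiver: buffer out-of-order events and only play contiguous sequences.
let nextSeq = 0;
const pending = new Map();
function onSeqMessage(message) {
  const { seq, event } = JSON.parse(message);
  pending.set(seq, event);
  while (pending.has(nextSeq)) {
    replayer.addEvent(pending.get(nextSeq)); // strict in-order playback
    pending.delete(nextSeq);
    nextSeq += 1;
  }
}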

We believe that with the above optimizations, our online programming interview tool will be much more practical.

Conclusion

With the evolution of Web APIs and the emergence of more mature tools and services, developers can quickly build useful tools and products on top of them. For the project in this article, using three tools/services, the Agora SDK, monaco-editor, and rrweb, we were able to verify the functionality with very little code.

As VS Code's remote and browser-related capabilities mature, the editor part of this tool can be further enhanced and may grow into a practical product. It is reasonable to believe that as browser APIs become more powerful and more performant, more browser-based services will emerge, and the real-time communication WebRTC provides will be a valuable building block among them.
