The Chinese version of this article was originally published on the WebRTC Chinese Website; the original article is by Chad Hart.

Stop touching your face! To prevent the spread of the novel coronavirus, medical authorities advise us not to touch our faces with unwashed hands. But that is hard to do when you sit in front of a computer for hours on end. I couldn't help but wonder: could a browser help solve this problem?

We have run many computer vision and WebRTC experiments here. I had been planning to experiment with running computer vision locally in the browser using TensorFlow.js, and this seemed like a good opportunity. A quick search turned up someone who had the same idea a couple of weeks ago. But the model that site uses requires some training by the user, which is interesting but also potentially unreliable, and it is not open source for others to extend. So over the weekend I wrote my own version to see what results I could get.

You can check it out at facetouchmonitor.com, or read on to see how it works. All of the code is available at github.com/webrtchacks/facetouchmonitor. I'll share some highlights and alternatives in this article.

TensorFlow.js

TensorFlow.js is the JavaScript version of TensorFlow. You have probably heard of TensorFlow, since it is one of the most popular machine learning frameworks in the world.

TensorFlow.js lets you use machine learning from JavaScript, both in Node.js and in the browser. Even better, TensorFlow.js includes several pre-built computer vision models from the main TensorFlow model library. This article uses two of them.

TensorFlow.js + WebRTC

TensorFlow.js is very WebRTC-friendly. The base models work on still images, but TensorFlow.js includes helper functions that automatically extract frames from a video feed. Functions like tf.data.webcam(webcamElement) will even call getUserMedia for you.
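For example, here is a minimal sketch (not taken from the project's code) of using tf.data.webcam to grab a single frame as a tensor, assuming the page already has a <video id="webcam"> element and the tfjs script loaded:

const webcamElement = document.getElementById('webcam');

async function grabFrame() {
  // tf.data.webcam() requests camera access (getUserMedia) and attaches the
  // stream to the video element for us.
  const webcam = await tf.data.webcam(webcamElement);
  const img = await webcam.capture(); // a tf.Tensor3D of the current frame
  console.log(img.shape);             // e.g. [224, 224, 3]
  img.dispose();                      // release the tensor's memory
}

grabFrame();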

Transfer learning with a KNN classifier

Donottouchyourface.com launched a few weeks ago and has received some positive press. When you first use it, you have to train it on your face: you hold still for about five seconds, then touch your face and repeat the process several times. When it detects a touch, the program plays an audio clip and sends a browser notification. If you stay in the same position you trained in, it works well. But if you move around, or don't provide enough training samples, it can behave very erratically. Here is an example:

[animated GIF from the original post illustrating this]

How it works

Even if the results aren't perfectly accurate, it is still cool that this runs in a browser. So how did the developers do it? Looking at their JavaScript, they use the mobilenet and knnClassifier libraries:

import * as mobilenet from '@tensorflow-models/mobilenet'; 
import * as knnClassifier from '@tensorflow-models/knn-classifier';

MobileNet appears to be used as the base image recognition model. During training, the app grabs a series of images from the camera and assigns them to either a "touch" or a "no touch" class. The KNN classifier then performs transfer learning on top of MobileNet's activations. Finally, new images are classified into one of those two classes.

In fact, Google has a TensorFlow.js transfer-learning image classifier codelab that does much the same thing. I tweaked a couple of lines of the codelab code to replicate what Donottouchyourface.com does in the JSFiddle below:

const webcamElement = document.getElementById('webcam');
const classifier = knnClassifier.create();
let net;
async function app() {
  console.log('Loading mobilenet.. ');
  // Load the model.
  net = await mobilenet.load();
  console.log('Successfully loaded model');
  // Create an object from Tensorflow.js data API which could capture image
  // from the web camera as Tensor.
  const webcam = await tf.data.webcam(webcamElement);
  // Reads an image from the webcam and associates it with a specific class
  // index.
  const addExample = async classId => {
    for (let x = 50; x > 0; x--) {
      // Capture an image from the web camera.
      const img = await webcam.capture();
      // Get the intermediate activation of MobileNet 'conv_preds' and pass that
      // to the KNN classifier.
      const activation = net.infer(img, 'conv_preds');
      // Pass the intermediate activation to the classifier.
      classifier.addExample(activation, classId);
      // Dispose the tensor to release the memory.
      img.dispose();
      // Add some time between images so there is more variance
      await new Promise(resolve => setTimeout(resolve, 100));
      console.log('Added image');
    }
  };

  // When clicking a button, add an example for that class.
  document.getElementById('class-a').addEventListener('click', () => addExample(0));
  document.getElementById('class-b').addEventListener('click', () => addExample(1));
  while (true) {
    if (classifier.getNumClasses() > 0) {
      const img = await webcam.capture();
      // Get the activation from mobilenet from the webcam.
      const activation = net.infer(img, 'conv_preds');
      // Get the most likely class and confidence from the classifier module.
      const result = await classifier.predictClass(activation);
      const classes = ['notouch', 'touch'];
      document.getElementById('console').innerText = `
        prediction: ${classes[result.label]}\n
        probability: ${result.confidences[result.label]}
      `;
      // Dispose the tensor to release the memory.
      img.dispose();
    }
    await tf.nextFrame();
  }
}
app();
<html>
  <head>
    <!-- Load the latest version of TensorFlow.js -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/knn-classifier"></script>
  </head>
  <body>
    <div id="console">Remember to allow your camera first</div>
    <video autoplay playsinline muted id="webcam" width="224" height="224"></video>
    <button id="class-a">No touch</button>
    <button id="class-b">Touch Face</button>
    <span id="message"></span>
    <!-- Load index.js after the content of the page -->
    <script src="index.js"></script>
  </body>
</html>

BodyPix method

As the example above shows, a model trained in just a few seconds struggles to be reliable. Ideally, we would have hundreds of people share images of themselves touching and not touching their faces in different environments and positions, and build a new model from that data. I don't know of such a labeled dataset, but TensorFlow.js has a tool that comes close: BodyPix. BodyPix identifies people and segments their body parts (arms, legs, face, etc.). The new 2.0 version of BodyPix can even detect poses, like PoseNet does.

The hypothesis: we can use BodyPix to detect hands and faces, and when a hand overlaps the face, we can flag it as face touching.

Is BodyPix accurate enough?

The first thing to check is the quality of the model. If it can't reliably detect hands and faces, the whole approach is a non-starter. The repo includes a demo page, so it's easy to test how well it works:

2.0 does an excellent job, especially compared to the 1.0 model. I was impressed.

Using the BodyPix API

The API has the following options:

  • segmentPerson: segments people from the background of the image; all people share a single mask;

  • segmentMultiPerson: segments people from the background and gives each person an individual mask;

  • segmentPersonParts: segments the individual body parts of one or more people into a single data set;

  • segmentMultiPersonParts: segments the individual body parts into a separate data set for each person.

What I want is the "personParts" functionality. But the README.md warns that segmentMultiPersonParts is slower and returns a more complex data structure, so I chose segmentPersonParts. To see all of the options in action, check the documentation in the BodyPix repo.
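Under my assumptions (this is a minimal sketch, not the project's exact code), a basic loop using segmentPersonParts looks roughly like this, given a <video id="webcam"> element already playing a getUserMedia stream and the @tensorflow-models/body-pix script loaded:

const video = document.getElementById('webcam');

async function segmentLoop() {
  const net = await bodyPix.load(); // default MobileNetV1 settings
  while (true) {
    const segmentation = await net.segmentPersonParts(video, {
      flipHorizontal: false,
      internalResolution: 'medium',
      segmentationThreshold: 0.7,
    });
    // segmentation.allPoses, segmentation.data, etc. are examined below
    await tf.nextFrame(); // yield so the page stays responsive
  }
}

segmentLoop();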

The operation returns an object similar to the following:

allPoses: Array(1)
  0:
    keypoints: Array(17)
      0: {score: 0.9987769722938538, part: "nose", position: {…}}
      1: {score: 0.9987848401069641, part: "leftEye", position: {…}}
      2: {score: 0.9993035793304443, part: "rightEye", position: {…}}
      3: {score: 0.4915933609008789, part: "leftEar", position: {…}}
      4: {score: 0.9960852861404419, part: "rightEar", position: {…}}
      5: {score: 0.7297815680503845, part: "leftShoulder", position: {…}}
      6: {score: 0.8029483556747437, part: "rightShoulder", position: {…}}
      7: {score: 0.010065940208733082, part: "leftElbow", position: {…}}
      8: {score: 0.01781448908150196, part: "rightElbow", position: {…}}
      9: {score: 0.0034013951662927866, part: "leftWrist", position: {…}}
      10: {score: 0.005708293989300728, part: "rightWrist", position: {…}}
      …
    score: 0.3586419759016922
  length: 1
data: Int32Array(307200) [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, …, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …]
height: 480
width: 640

The basics include the image dimensions, a score and position for each pose keypoint, and a data array of integers where each pixel is labeled with one of the 24 body-part IDs, or -1 if it is not part of a person. The pose keypoints are essentially a less accurate version of what another TensorFlow library, PoseNet, returns.

The algorithm

Although it took a while, I eventually settled on the following algorithm (a rough code sketch follows the list):

  1. Ignore the image unless it contains, with high probability, a nose and at least one eye;
  2. Check whether a hand overlapped the face in the previous frames – count 1 point for each overlap;
  3. Check whether a hand is touching the face – count 1 point for each touch;
  4. When the total exceeds a threshold, trigger an alarm.
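The sketch below is my own rough, hypothetical rendering of that scoring idea; it is not the project's actual implementation, and the part IDs, thresholds, and face bounding box are assumptions (BodyPix labels the two face halves as parts 0 and 1 and the hands as parts 10 and 11):

const FACE_KEYPOINTS = ['nose', 'leftEye', 'rightEye', 'leftEar', 'rightEar'];
const HAND_PARTS = new Set([10, 11]); // left_hand, right_hand part IDs
const ALARM_THRESHOLD = 10;           // illustrative value, not tuned

let touchScore = 0;

function updateScore(segmentation) {
  const pose = segmentation.allPoses[0];
  if (!pose) return;

  // Step 1: only proceed if a nose and at least one eye are confidently visible.
  const scores = {};
  pose.keypoints.forEach(k => (scores[k.part] = k.score));
  if (!(scores.nose > 0.9 && (scores.leftEye > 0.9 || scores.rightEye > 0.9))) return;

  // Rough face bounding box from the confident face keypoints.
  const facePoints = pose.keypoints.filter(
    k => FACE_KEYPOINTS.includes(k.part) && k.score > 0.5);
  const xs = facePoints.map(k => k.position.x);
  const ys = facePoints.map(k => k.position.y);
  const box = {
    left: Math.min(...xs), right: Math.max(...xs),
    top: Math.min(...ys), bottom: Math.max(...ys),
  };

  // Steps 2 and 3: count hand pixels that fall inside the face box.
  const { width, data } = segmentation;
  let overlap = 0;
  for (let i = 0; i < data.length; i++) {
    if (!HAND_PARTS.has(data[i])) continue;
    const x = i % width;
    const y = Math.floor(i / width);
    if (x >= box.left && x <= box.right && y >= box.top && y <= box.bottom) overlap++;
  }
  if (overlap > 0) touchScore++;

  // Step 4: alarm once the accumulated score crosses the threshold.
  if (touchScore > ALARM_THRESHOLD) {
    console.log('Stop touching your face!');
    touchScore = 0;
  }
}

Calling updateScore(segmentation) on each new frame would accumulate the score over time.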

Real-time monitoring

It is useful to see the classifier in action in real time. BodyPix includes several APIs that help draw the segmentation data and pose keypoints. If you want to see all of the segmented parts, just pass the segmentation object returned by segmentPersonParts to them:

// drawCanvas and sourceVideo are the <canvas> and <video> elements on the page
const coloredPartImage = bodyPix.toColoredPartMask(targetSegmentation);
const opacity = 0.7;
const maskBlurAmount = 0;
const flipHorizontal = false;

bodyPix.drawMask(
   drawCanvas, sourceVideo, coloredPartImage, opacity, maskBlurAmount,
   flipHorizontal);

I wanted to focus on just the hands and face, so I modified the segmentation data to include only those parts before passing it in as targetSegmentation above.
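A hypothetical sketch of that filtering step (the part IDs are assumptions, and the repo's actual code may differ): copy the segmentation and mark every pixel that is not a face or hand part as -1, so only those parts show up in the drawn mask.

const KEEP_PARTS = new Set([0, 1, 10, 11]); // left/right face, left/right hand

function filterParts(segmentation) {
  // TypedArray.map returns a new Int32Array, so the original stays untouched.
  const data = segmentation.data.map(partId =>
    KEEP_PARTS.has(partId) ? partId : -1);  // -1 means "no part" to BodyPix
  return { ...segmentation, data };
}

const targetSegmentation = filterParts(segmentation);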

I also reused the demo's code for drawing the pose estimation points from the keypoints array after drawing the mask.

Optimization

Making the model usable on a variety of devices was the most time-consuming part of the work. I haven't solved the problem completely, but with the right settings it runs smoothly.

Sensitivity and false alarms

Touch detection only runs when a face is in the picture, and a face is defined as a nose plus at least one eye. I found the nose and eye detection to be very reliable, so I set that sensitivity threshold quite high.

The segmentPersonParts call takes a configuration object with several parameters that affect sensitivity. The two I use are segmentationThreshold and scoreThreshold.

But it doesn’t recognize multiplayer scenes very well. In this case, segmentMultiPersonParts is more appropriate. Inside the configuration parameters, I set maxDetections to 1. This tells the posture estimator how many people to look for (but does not affect site detection).

Finally, there are several model options that trade accuracy for performance, which I'll cover in the next section.

CPU usage

BodyPix has high CPU usage. I have a 2.9 GHz 6-core Intel Core i9 in my MacBook, so this isn't a big deal for me, but it is more pronounced on slower machines. TensorFlow.js also makes use of the GPU wherever possible, so without a capable GPU it slows down. This shows up clearly in the frames-per-second (FPS) rate.

Adjusting model parameters

BodyPix has several model parameters that trade accuracy for speed. To help test them, I added four option buttons that load the model with different settings.
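For reference, the kinds of bodyPix.load() options that trade accuracy for speed look like this; these are illustrative presets, not the exact settings behind the four buttons:

async function loadModels() {
  // Faster, less accurate: small MobileNetV1 with quantized weights.
  const fastNet = await bodyPix.load({
    architecture: 'MobileNetV1',
    outputStride: 16,
    multiplier: 0.5,
    quantBytes: 2,
  });

  // Slower, more accurate: ResNet50 with full-precision weights.
  const accurateNet = await bodyPix.load({
    architecture: 'ResNet50',
    outputStride: 16,
    quantBytes: 4,
  });

  return { fastNet, accurateNet };
}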

More accurate settings would certainly reduce false positives.

What I found: the regular model consumed an entire vCPU core on my new MacBook and ran at about 15 FPS.

It's not just BodyPix. The same is true of the MobileNet + KNN classifier example I adapted earlier, even though its video is smaller.

I also have a Windows Surface Go with a slower 1.6 GHz dual-core Pentium; it runs the regular model at about 5 FPS, and loading the model takes noticeably longer on that machine.

Tweaking the settings improves CPU usage and FPS, but only by a few percentage points of CPU and 1 or 2 FPS, which is much less improvement than I expected.

Can it be improved?

TensorFlow.js actually has a WebAssembly backend that can run significantly faster. I hoped this would help with the CPU consumption problem, but I ran into two issues:

  1. The WASM backend only works with BodyPix 1.0, and 1.0 is nowhere near as accurate as 2.0;

  2. I couldn't get it to load with BodyPix at all.

I could probably solve the second problem, but if the result still isn't accurate enough there isn't much point. I expect there will be an official tfjs-backend-wasm release that works with BodyPix 2.0 in the near future.
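For reference, switching TensorFlow.js to the WASM backend looks roughly like this (assuming the @tensorflow/tfjs-backend-wasm script is loaded); as noted, I couldn't get this combination working with BodyPix:

async function useWasmBackend() {
  await tf.setBackend('wasm'); // the backend is registered by the tfjs-backend-wasm script
  await tf.ready();
  console.log('Active backend:', tf.getBackend());
}

useWasmBackend();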

The throttling problem

I noticed that detection slows down when the page is no longer visible on screen, and this is more pronounced on slower machines. Don't forget that when you have many tabs open, browsers throttle inactive tabs to save resources. I've come up with some ways to cope with this, and I'll explore them further.

Browser support

Unsurprisingly, the model works best in Chrome, since TensorFlow.js is a Google project. It also works in Firefox and the Chromium-based Edge, but I couldn't get it to load in Safari.

The experiment

All of the code can be found on GitHub: https://github.com/webrtchacks/facetouchmonitor.

In experiments over the past few days, the model has worked well.

Don’t touch your face!

I could keep tweaking it, but in most of the tests that I and others have run, the model works well enough. Suggestions for further modifications and optimizations are welcome.

I hope this encourages you to touch your face less and helps you stay healthy. If you really want to touch your face, consider wearing a helmet.