To get started with AR, I made a simple image-recognition demo. When different pictures are held up to the detection area, different weather effects appear, as shown below.



AR Demo (demo address)

To play this demo you need a computer with a camera and a mobile phone. Move the mouse over a weather icon and a QR code appears; scan the QR code with the phone to open that weather icon's web page, then hold the icon up to the black detection area on the screen and the corresponding animation plays.

This is a marker-based AR program. Marker-based AR recognizes a specific, predefined image (think of it as a more advanced QR code), whereas more advanced AR can be markerless, recognizing things like cars, houses and pedestrians in the environment. Due to technical limitations, most AR today is still marker-based.

In fact, I think AR is harder than VR. VR only needs to "output" a virtual world to the user, while AR also has to "output" virtual effects that match the real world the user "inputs". "Outputting" animation is familiar territory for the front end, but handling the real-world "input" is a big technical challenge. Modern JS engines are fast, but when it comes to processing the stream of image data coming from the camera, the single-threaded JS wagon is still far behind the contemporary native rocket.

With WebRTC we can take the data from the camera and draw it into an image. Once we start processing those pixels with JS, the nightmare begins. AR applications demand very high computing speed: an animation needs at least 30 frames per second to be grudgingly accepted by users, which means image processing -> image recognition -> animation rendering must all finish within roughly 33 ms. All three are CPU-intensive operations; if they are all done in JS, the frame rate cannot be guaranteed on an ordinary computer. This is where WebGL and WebWorker come in to accelerate things.
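A minimal sketch of that capture step, assuming a video and a canvas element already exist in the page (the element selectors and loop structure are my own illustration, not the demo's code):

// Grab the camera with WebRTC and copy each frame onto a canvas,
// where the raw pixels become available to JS.
const video = document.querySelector('video');    // assumed <video> element
const canvas = document.querySelector('canvas');  // assumed <canvas> element
const ctx = canvas.getContext('2d');

navigator.mediaDevices.getUserMedia({ video: true }).then(stream => {
  video.srcObject = stream;
  video.play();
  requestAnimationFrame(grabFrame);
});

function grabFrame() {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const frame = ctx.getImageData(0, 0, canvas.width, canvas.height); // RGBA pixels
  // frame.data is now available for image processing and recognition
  requestAnimationFrame(grabFrame);
}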

The first step of image recognition is image processing. A color picture contains too much useless information, and the camera produces noise during continuous shooting, so we need to extract the main information and reduce the noise as much as possible.
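As a simple illustration of what "extracting the main information" means, here is a CPU-side grayscale pass over the captured pixels (just a sketch; the demo does this kind of per-pixel work on the GPU instead):

// Convert RGBA pixels to grayscale using the common luminance weights.
function toGrayscale(imageData) {
  const d = imageData.data;
  for (let i = 0; i < d.length; i += 4) {
    const gray = 0.299 * d[i] + 0.587 * d[i + 1] + 0.114 * d[i + 2];
    d[i] = d[i + 1] = d[i + 2] = gray; // alpha (d[i + 3]) is left untouched
  }
  return imageData;
}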

There are many algorithms for converting the image to black and white; the one used here is edge detection computed on the GPU. Pixels with edge features are marked white, and the white regions in the image are then filtered by size, since very small white dots are likely camera noise. WebGL can do this at 60 frames per second without pressure.
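A sketch of what such an edge-detection fragment shader could look like (a plain Sobel operator with a threshold; the uniform names, the varying and the threshold value are my own assumptions, not taken from the demo):

// Fragment shader: mark pixels with strong gradients as white, everything else as black.
const edgeShader = `
  precision mediump float;
  uniform sampler2D uFrame;   // grayscale camera frame
  uniform vec2 uTexelSize;    // 1.0 / resolution
  varying vec2 vUv;

  void main() {
    float tl = texture2D(uFrame, vUv + uTexelSize * vec2(-1.0,  1.0)).r;
    float tr = texture2D(uFrame, vUv + uTexelSize * vec2( 1.0,  1.0)).r;
    float bl = texture2D(uFrame, vUv + uTexelSize * vec2(-1.0, -1.0)).r;
    float br = texture2D(uFrame, vUv + uTexelSize * vec2( 1.0, -1.0)).r;
    float l  = texture2D(uFrame, vUv + uTexelSize * vec2(-1.0,  0.0)).r;
    float r  = texture2D(uFrame, vUv + uTexelSize * vec2( 1.0,  0.0)).r;
    float t  = texture2D(uFrame, vUv + uTexelSize * vec2( 0.0,  1.0)).r;
    float b  = texture2D(uFrame, vUv + uTexelSize * vec2( 0.0, -1.0)).r;

    float gx = (tr + 2.0 * r + br) - (tl + 2.0 * l + bl); // horizontal gradient
    float gy = (tl + 2.0 * t + tr) - (bl + 2.0 * b + br); // vertical gradient
    float edge = length(vec2(gx, gy)) > 0.3 ? 1.0 : 0.0;  // arbitrary threshold

    gl_FragColor = vec4(vec3(edge), 1.0);
  }
`;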

Then the black-and-white pixels are read for image recognition. Doing image recognition on the GPU is genuinely rare, so here I chose a third-party neural network library (ConvNetJS). For the front end, "neural network" is still a relatively unfamiliar term; roughly speaking, it is a machine learning algorithm. If you are not feeling particularly adventurous, you can use it as a tool without knowing how it is implemented, just as you can use React without knowing how JSX is turned into DOM. Below I will focus on how to use a neural network for recognition.

"Machine learning" and "big data" usually appear together. We build a neural network and feed it classified images; the network automatically adjusts its parameters according to each image's class, and once enough images have been fed in, it has learned the parameters needed to tell the image types apart.
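With ConvNetJS, building and training such a classifier looks roughly like this (the input size, layer configuration and number of classes are my own assumptions for illustration, not the demo's actual settings):

// Define a small convolutional network: grayscale 32x32 input,
// three marker classes plus one "anything else" class.
const layerDefs = [
  { type: 'input', out_sx: 32, out_sy: 32, out_depth: 1 },
  { type: 'conv', sx: 5, filters: 8, stride: 1, pad: 2, activation: 'relu' },
  { type: 'pool', sx: 2, stride: 2 },
  { type: 'conv', sx: 5, filters: 16, stride: 1, pad: 2, activation: 'relu' },
  { type: 'pool', sx: 2, stride: 2 },
  { type: 'softmax', num_classes: 4 },
];

const net = new convnetjs.Net();
net.makeLayers(layerDefs);

const trainer = new convnetjs.Trainer(net, { method: 'adadelta', batch_size: 4, l2_decay: 0.001 });

// Feed labelled samples; the trainer adjusts the weights automatically.
function trainSample(pixels, label) {
  const vol = new convnetjs.Vol(32, 32, 1, 0.0);
  vol.w.set(pixels);         // pixels: 32*32 grayscale values in [0, 1]
  trainer.train(vol, label); // label: 0..3, where 3 is the "anything else" class
}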

The whole training process also runs in the web page. First, each marker image is sampled, and JS rotates, scales and shifts these samples to generate a large training set. After training, the neural network can be used for image recognition, and whenever we change the marker images we can simply retrain a new network.
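Generating that training set from a handful of marker photos can be done with ordinary canvas transforms; a sketch under assumed sizes and ranges:

// Produce one randomly rotated, scaled and shifted 32x32 variant of a marker image.
function augment(markerImage) {
  const canvas = document.createElement('canvas');
  canvas.width = canvas.height = 32;
  const ctx = canvas.getContext('2d');

  const angle = (Math.random() - 0.5) * Math.PI / 6; // +/- 15 degrees
  const scale = 0.8 + Math.random() * 0.4;           // 0.8x to 1.2x
  const dx = (Math.random() - 0.5) * 6;              // small shift in pixels
  const dy = (Math.random() - 0.5) * 6;

  ctx.translate(16 + dx, 16 + dy);
  ctx.rotate(angle);
  ctx.scale(scale, scale);
  ctx.drawImage(markerImage, -16, -16, 32, 32);

  return ctx.getImageData(0, 0, 32, 32);             // one new training sample
}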

Every picture fed into the neural network has to be assigned to some class, which would make the network misclassify whenever the user shows no marker at all. So an extra "exception" class is added during training, and the training pictures for that class can be anything.

A training set generated from the transformed samples


Once the neural network is trained, it can be saved and loaded as JSON. To protect the page's rendering frame rate, the neural-network recognition algorithm runs inside a WebWorker, so the animation keeps running smoothly even if recognition cannot reach 30 frames per second. On a low-end computer recognition may be slow, but animation playback is not affected.
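A sketch of that split: the trained network is serialized with toJSON/fromJSON, and the classification loop lives in a worker (the worker file name, message shapes and the showWeatherEffect handler are my own assumptions):

// Main thread: save the trained network and hand frames to a worker.
const netJSON = net.toJSON();
const worker = new Worker('recognize-worker.js'); // assumed worker file
worker.postMessage({ type: 'init', net: netJSON });

function recognize(pixels) {                       // pixels: Float32Array, 32*32
  worker.postMessage({ type: 'frame', pixels }, [pixels.buffer]);
}
worker.onmessage = e => showWeatherEffect(e.data.classIndex); // assumed handler

// recognize-worker.js: load the network and classify frames off the main thread.
importScripts('convnet-min.js');
let net;
onmessage = e => {
  if (e.data.type === 'init') {
    net = new convnetjs.Net();
    net.fromJSON(e.data.net);
  } else if (e.data.type === 'frame') {
    const vol = new convnetjs.Vol(32, 32, 1, 0.0);
    vol.w.set(e.data.pixels);
    net.forward(vol);
    postMessage({ classIndex: net.getPrediction() });
  }
};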

Then there is the animation. There is not much to say about it: thanks to the browser's inherent strengths, animation runs smoothly in modern browsers whether it is WebGL, Canvas2D or SVG. Making a truly amazing animation, though, takes real skill.

A neural network is mainly useful for classifying images; it is not really image recognition in the full sense. Our application might want to display a 3D model on the marker and have the model follow when the marker's position changes. Mainstream AR applications use the SIFT algorithm for this. Its computational cost is too large for JS to handle in real time, but GPU implementations of the algorithm exist; if some expert manages to implement SIFT with WebGL, it will play a historic role in pushing WebAR forward.

This demo sits at the cutting edge of the technology, combining machine learning to realize AR. On the Web side, the various MV* frameworks represent the advanced productivity of UI development, but when it comes to images and graphics the Web is still in the Stone Age: basic algorithms are missing, computing performance is poor, and even current iOS cannot call WebRTC. Still, there is a famous saying: "Anything that can be written in JS will eventually be written in JS." After all, the technology WebAR needs is already in place: WebRTC provides the camera, WebGL provides parallel computing, and WebWorker provides multithreading. The algorithms are already mature on Native and will be translated into JS before long. It is only a matter of time before WebAR catches on. Maybe someday your React code will contain something like:

render() {
  return <ARCamera size={size} />;
}


Of course, my personal technical vision is limited. If you know a better algorithm or have links to related work, please point them out in the comments, and please give this a like!