Preface

I’ve been reading up on graphics recently and built a small stroke gesture recognition demo, which looks something like this:

If this is your first time seeing something like this, it should look pretty interesting 🤩. Without further ado, let’s get straight into the code.

Take a few seconds here to think about how you would do it 🤔; you’ll follow the rest of the article better with a question in mind. For example, you are probably most curious about how to recognize and compare gestures. I’ll paste the project demo address up front, so help yourself to it whenever you need it.

Specific steps

Let’s get started 🏎 💨💨💨

Step 1: Gesture drawing

The first step is drawing the gesture, which is relatively simple. Anyone who has studied canvas has probably seen a drawing-board implementation, and this is the same idea. The only difference is that we also record (and draw) each coordinate point as we go. The core code is as follows (feel free to skip it):

handleMousemove(e: MouseEvent) {
    if (!this.isMove) return;
    const curPoint = this.getCanvasPos(e);
    const lastPoint = this.inputPoints[this.inputPoints.length - 1];
    // Draw the line segment
    CanvasUtils.drawLine(this.ctx2d, lastPoint[0], lastPoint[1], curPoint[0], curPoint[1], 'blue', 3);
    // Draw the coordinate point
    CanvasUtils.drawCircle(this.ctx2d, curPoint[0], curPoint[1], 5);
    // If you feel there are too many raw points, you can throttle here
    this.inputPoints.push(curPoint);
}
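
The CanvasUtils.drawLine and CanvasUtils.drawCircle helpers used above aren’t shown in this article; here is a minimal sketch of what they might look like, inferred from the call sites (the names and parameter order are assumptions):

export class CanvasUtils {
    // Draw a straight segment from (x1, y1) to (x2, y2)
    static drawLine(ctx: CanvasRenderingContext2D, x1: number, y1: number, x2: number, y2: number, color = 'blue', width = 1) {
        ctx.beginPath();
        ctx.moveTo(x1, y1);
        ctx.lineTo(x2, y2);
        ctx.strokeStyle = color;
        ctx.lineWidth = width;
        ctx.stroke();
    }

    // Draw a filled dot of the given radius at (x, y)
    static drawCircle(ctx: CanvasRenderingContext2D, x: number, y: number, radius = 5, color = 'red') {
        ctx.beginPath();
        ctx.arc(x, y, radius, 0, Math.PI * 2);
        ctx.fillStyle = color;
        ctx.fill();
    }
}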

When you’re done, it looks something like this. As you can see from the figure above, the red dots are not evenly spaced: hand speed varies over the course of a stroke, so the point density varies too. To remove the influence of this factor, we need to resample.

Step 2: Resample

The sampling method varies from scene to scene. Here we simply sample at equal arc-length intervals: first compute the total length of the gesture (the sum of all segment lengths), then split it into n equal parts (the exact number is up to you, tune it by eye and don’t agonize over it). Note that we do not modify the original coordinate points; the gesture is still drawn from the raw points, so we add a separate variable to store the newly sampled points (all subsequent calculations are based on these new sample points). The calculation is a little fiddly, so I’ve prepared a diagram to help you understand 👇🏻. Then comes the concrete code implementation (skip it if you already get the idea):

export type Point = [number, number];
static resample(inputPoints: Point[], sampleCount: number): Point[] {
    const len = GeoUtils.getLength(inputPoints);
    const unit = len / (sampleCount - 1);
    const outputPoints: Point[] = [[...inputPoints[0]]];

    let curLen = 0;
    let prevPoint = inputPoints[0];

    for (let i = 1; i < inputPoints.length; i++) {
        const curPoint = inputPoints[i];
        let dx = curPoint[0] - prevPoint[0];
        let dy = curPoint[1] - prevPoint[1];
        let tempLen = GeoUtils.getLength([prevPoint, curPoint]);

        while (curLen + tempLen >= unit) {
            const ds = unit - curLen;
            const ratio = ds / tempLen;
            const newPoint: Point = [prevPoint[0] + dx * ratio, prevPoint[1] + dy * ratio];
            outputPoints.push(newPoint);

            curLen = 0;
            prevPoint = newPoint;
            dx = curPoint[0] - prevPoint[0];
            dy = curPoint[1] - prevPoint[1];
            tempLen = GeoUtils.getLength([prevPoint, curPoint]);
        }
        prevPoint = curPoint;
        curLen += tempLen;
    }
    while (outputPoints.length < sampleCount) {
        outputPoints.push([...prevPoint]);
    }
    return outputPoints;
}
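
The resample function above relies on a GeoUtils.getLength helper that isn’t shown; a minimal sketch, assuming it simply sums the segment lengths of a polyline:

// Total length of a polyline: sum the distances between consecutive points
static getLength(points: Point[]): number {
    let len = 0;
    for (let i = 1; i < points.length; i++) {
        const dx = points[i][0] - points[i - 1][0];
        const dy = points[i][1] - points[i - 1][1];
        len += Math.hypot(dx, dy);
    }
    return len;
}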

After resampling it looks like this. Note that once you choose n, every gesture must be resampled to the same n points, and n must not change afterwards, otherwise comparison becomes difficult. We also compute the gesture’s center along the way (simply the average of all the sample point coordinates) and connect the gesture’s starting point (the end point would work too) to that center. For now you can loosely think of this line as the general direction of the gesture; if that’s unclear, skip it, we’ll come back to it later.
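
The center point mentioned above is just the average of the sample point coordinates; a minimal sketch (the helper name getCenter is my own):

// Centroid of the sample points: average the x values and the y values
static getCenter(points: Point[]): Point {
    let sumX = 0;
    let sumY = 0;
    points.forEach(([x, y]) => {
        sumX += x;
        sumY += y;
    });
    return [sumX / points.length, sumY / points.length];
}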

Step 3: Translate

To compare anything, you have to compare numbers, not feelings. I can’t just declare that two gestures look alike because they feel alike to me; that’s the “artificial” without the “intelligence”. So how do we solve this? We need a common standard so that every gesture is compared in the same mold (much like looking for a partner: you can’t be without standards), for example the same size and the same orientation. Think about it: how would you compare a big “3” with a tiny “3”? So what we need to do is normalize the gestures (the dashed box in each screenshot is exactly that mold), which sets the stage for the later comparison through translation, rotation, and scaling. As for translation, we already computed the gesture’s center point; now we just move it to the center of the canvas: compute the translation offset, then apply it to every new sample point. Example code:

// Shift each coordinate point
static translate(points: Point[], dx: number, dy: number) {
    points.forEach((p) => {
        p[0] += dx;
        p[1] += dy;
    });
}

The effect looks like this. Note that when drawing we also move the canvas origin from the top-left corner to the center of the canvas; this makes the math much easier, and the subsequent rotation and scaling are performed in this translated coordinate system as well.
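
Putting it together, a sketch of how the translation step might be wired up, assuming the origin has already been moved to the canvas center (so the target is (0, 0)) and that the static helpers shown in this article, including the hypothetical getCenter above, all live on a GeoUtils class:

// Move the origin to the canvas center once, before drawing the normalized gesture:
// ctx2d.translate(canvas.width / 2, canvas.height / 2);

// With the origin at the center, moving the gesture's center to (0, 0)
// just means shifting every sample point by the negated centroid.
const center = GeoUtils.getCenter(samplePoints);
GeoUtils.translate(samplePoints, -center[0], -center[1]);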

Step 4: Rotate

Careful readers will notice that besides the dashed box in the middle, the whole canvas is divided into eight equal sectors. Why? As mentioned above, gestures are directional: “|” and “/” should count as very similar gestures, but their directions differ, so some rotation is needed. The eight dividing lines are the directions we snap to (the number of divisions is also up to you). We simply compute the gesture’s direction (the green line in the picture), rotate it onto the nearest dividing line, and then rotate all the points by the same amount. The code is as follows (feel free to skip it):

const TWO_PI = Math.PI * 2; // constant used below (not shown in the original snippet)
// Compute the radian to rotate onto the nearest guide line: center is the gesture's center point,
// startPoint is the gesture's starting point, sublineCount is the number of equal divisions
static computeRadianToSubline(center: Point, startPoint: Point, sublineCount: number): number {
    const dy = startPoint[1] - center[1];
    const dx = startPoint[0] - center[0];

    let radian = Math.atan2(dy, dx);
    if (radian < 0) radian += TWO_PI;

    const unitRadian = TWO_PI / sublineCount;
    const targetRadian = Math.round(radian / unitRadian) * unitRadian;
    radian -= targetRadian;
    return radian;
}
// Rotate each coordinate point
static rotate(points: Point[], radian: number) {
    const sin = Math.sin(radian);
    const cos = Math.cos(radian);
    points.forEach((p) => {
        let [x, y] = p;
        p[0] = cos * x - sin * y;
        p[1] = sin * x + cos * y;
    });
}

Many of you may feel that rotation is harder than translation, but it’s actually simple: you only need to know how to rotate a single point (rotating a line is just rotating its two endpoints, and rotating a polygon is rotating its vertices). I drew a derivation diagram to make it easier to follow (skip it if you’re not interested). Then take a look at the result of this step:
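
A sketch of how the two functions above might be combined for the rotation step (SUBLINE_COUNT = 8 matches the eight divisions; the gesture center is (0, 0) after translation; the sign of the rotation may need flipping depending on your coordinate conventions):

// Snap the gesture's start-to-center direction onto the nearest guide line
const SUBLINE_COUNT = 8;
const radian = GeoUtils.computeRadianToSubline([0, 0], samplePoints[0], SUBLINE_COUNT);
GeoUtils.rotate(samplePoints, -radian); // rotate back by the offset from the guide line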

Step 5: Scale

We draw gestures at different sizes each time, so we have to unify them to one size, which means scaling. Say we want to fit a gesture drawn in a 600 × 600 area into a 100 × 100 container (the dashed box in the picture); it gets shrunk by a factor of 6. How do we compute that? First we need the size of the gesture’s bounding box. Here we use the AABB model (there are also OBB, bounding spheres, and so on). What is an AABB bounding box? Simply the minimum and maximum x and y values over all the sample points, as in the picture below. Then just divide the container’s side length by the longer side of the AABB and you have the scale factor. Finally, iterate over all the points and scale them, with the following code:

// Again, since the coordinate system has been moved to the canvas center and the gesture's center coincides with it, we can simply multiply each coordinate by the scale factor
static scale(points: Point[], scale: number) {
    points.forEach((p) => {
        let [x, y] = p;
        p[0] = x * scale;
        p[1] = y * scale;
    });
}

The result looks like this. Note that the scaled figure doesn’t have to land exactly inside the dashed box; it only has to end up roughly the same size as the dashed box.
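
The AABB computation mentioned above isn’t shown either; a minimal sketch of finding the bounding box and deriving the scale factor (the 100 used below stands for the side length of the dashed target box and is an assumption, as is hanging the helpers on GeoUtils):

// Axis-aligned bounding box of the sample points: just the min/max of x and y
static computeAABB(points: Point[]): { width: number; height: number } {
    let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
    points.forEach(([x, y]) => {
        minX = Math.min(minX, x);
        minY = Math.min(minY, y);
        maxX = Math.max(maxX, x);
        maxY = Math.max(maxY, y);
    });
    return { width: maxX - minX, height: maxY - minY };
}

// Usage: scale factor = container side / longer AABB side
const { width, height } = GeoUtils.computeAABB(samplePoints);
GeoUtils.scale(samplePoints, 100 / Math.max(width, height));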

Step 6: Gesture input

This step is simply saving the data, and it can be split into two parts:

  • Thumbnail: dynamically create a small canvas and draw the gesture onto it. This is exactly the same as step 1, just at a smaller size. You can draw from either the raw points or the sample points (the raw points are more faithful); it’s only a thumbnail, so you can hardly tell the difference either way.
  • Save the data: the sampled coordinates definitely get saved (we worked hard on standardizing them, after all), and anything else is optional; a sketch of a possible record shape follows this list.
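
For reference, a minimal sketch of what a saved gesture record might look like (the field names are my own; the article only requires that the standardized sample points, or the vector derived from them, are stored):

interface GestureRecord {
    name: string;      // label shown alongside the thumbnail
    points: Point[];   // standardized sample points (translated, rotated, scaled)
    vector: number[];  // flattened unit vector used for the cosine comparison (see step 7)
    thumbnail: string; // e.g. the small preview canvas exported via canvas.toDataURL()
}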

Step 7: Compare (the highlight)

Suppose we now have two standardized gestures; how do we know whether they are similar? If you haven’t read about this before, you may have no clue. Neither did I at first 😂… Again, stop and think for a few seconds 🤔… OK: gesture similarity can be turned into the question of whether two sets of sample points are close enough to each other. An intuitive solution is to compute the distance between the two sets of sample points and check whether it is below some threshold, roughly like the figure below. If that’s hard to picture, imagine each gesture has only a single sample point (then it’s just the distance between two points 🐶). The code is as follows:

static squaredEuclideanDistance(points1: Point[], points2: Point[]): number {
    let squaredDistance = 0;
    const count = points1.length;
    for (let i = 0; i < count; i++) {
        const p1 = points1[i];
        const p2 = points2[i];
        const dx = p1[0] - p2[0];
        const dy = p1[1] - p2[1];
        squaredDistance += dx * dx + dy * dy;
    }
    return squaredDistance;
}

The fancy name for this method is Euclidean distance (OK, OK, enough showing off 😂). But for our scenario there is a better similarity measure (a better algorithm? do tell!), so next let’s introduce cosine similarity (it’s not hard; I’ve drawn a picture and I guarantee you’ll get it at a glance). In the figure above, comparing by Euclidean distance, A and C are clearly closer and therefore more similar; comparing by cosine, A and B are clearly more similar. That’s because Euclidean distance measures absolute difference, while cosine similarity measures relative difference (savor that for a moment 🍚).

So why can the size of an angle decide how similar two points are? Because what this method really judges is the direction of the two points: even though vector B is long, that doesn’t affect its direction, so B’s direction is closer to A’s. It may be easier to understand in terms of forces; look at the picture below. The smaller the angle, the more aligned the two forces, and the better they can pull an object together (people with the same goal are, well, similar). So how do we compute cosine similarity for a whole set of points? Going back to the angle formula: since it depends only on direction and not on the lengths of vectors A and B, we can first turn A and B into unit vectors (that is, divide each vector by its own length), so that both lengths are 1 and the cosine simplifies to the form shown. Suddenly it’s much easier. Now let’s try turning a gesture into a vector (that is, one long array). We can call this flattened one-dimensional array the gesture’s feature, save it as data, and the next time we compare, just take out the two arrays and compute their cosine.
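
“Turning the gesture into a vector” just means flattening the sample points into one long array and dividing by its length so it becomes a unit vector; a minimal sketch (the helper name toUnitVector is my own):

// Flatten [[x1, y1], [x2, y2], ...] into [x1, y1, x2, y2, ...] and normalize to unit length
static toUnitVector(points: Point[]): number[] {
    const vector = points.flat();
    const magnitude = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
    return vector.map((v) => v / magnitude);
}

With both inputs normalized this way, the calcCosDistance below reduces to a plain dot product.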

// Calculate the cosine similarity
static calcCosDistance(vector1: number[], vector2: number[]): number {
    let similarity = 0;
    vector1.forEach((v1, i) => {
        const v2 = vector2[i];
        similarity += v1 * v2;
    });
    return similarity; // The similarity is between -1 and 1
}

Cosine similarity is useful in many other situations, for example word vectors in article-similarity tasks (that’s digressing too far), so let’s briefly recap the general idea:

  1. Find a way to convert the raw data into one-dimensional arrays of the same length, [a, b, c, ..., n] (it is stored as a one-dimensional array, but conceptually it is an n-dimensional vector).
  2. Iterate over the existing data, compute the cosine against each entry, and pick the one with the highest similarity (a sketch of this lookup follows).
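
Putting it all together, a sketch of the lookup over stored gestures (GestureRecord and toUnitVector are the hypothetical pieces sketched earlier, assumed to live alongside calcCosDistance on GeoUtils; the threshold is an arbitrary value to tune):

// Find the stored gesture whose feature vector is most similar to the input
function recognize(input: Point[], records: GestureRecord[], threshold = 0.9): GestureRecord | null {
    const inputVector = GeoUtils.toUnitVector(input);
    let best: GestureRecord | null = null;
    let bestScore = -Infinity;
    records.forEach((record) => {
        const score = GeoUtils.calcCosDistance(inputVector, record.vector);
        if (score > bestScore) {
            bestScore = score;
            best = record;
        }
    });
    return bestScore >= threshold ? best : null;
}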

Things to note

  • Gestures are directional: we can recognize | and / as similar because after rotation they both end up close to the y-axis, but | and 一 (a horizontal stroke) won’t match, since one snaps to the y-axis and the other to the x-axis. If we want | and 一 to be recognized as the same thing, one way is to rotate | through a few extra angles and check the similarity at each angle (a sketch follows this list).
  • The aspect ratio of a gesture affects the result: a square, for example, won’t look similar to a long, flat rectangle.
  • The number of sample points: neither too many nor too few is good. Too many is inefficient and makes different shapes look overly alike; too few and you get the opposite problems.
  • Gesture complexity: the recognition rate has little to do with how complex the shape is. Simple shapes such as polygons and circles are error-prone because their features aren’t distinctive, while for complex shapes the sample points get diluted and the extracted features become coarse.
  • Application scenarios: besides gestures, where else could this be used? One example: a math teacher in a remote class who writes on a whiteboard often needs to hand-draw circles or squares. We could auto-correct them: if a stroke looks like a circle, automatically regenerate a perfect circle. My description may be pale; picture it for yourself 😂.
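
A sketch of the “rotate a few extra angles and compare each time” idea from the first point above (everything here builds on the hypothetical helpers sketched earlier):

// Compare the input against one stored record at several rotations and keep the best score
function bestScoreOverRotations(points: Point[], record: GestureRecord, steps = 8): number {
    let best = -Infinity;
    for (let i = 0; i < steps; i++) {
        const rotated = points.map(([x, y]): Point => [x, y]); // work on a copy
        GeoUtils.rotate(rotated, (Math.PI * 2 * i) / steps);
        const score = GeoUtils.calcCosDistance(GeoUtils.toUnitVector(rotated), record.vector);
        best = Math.max(best, score);
    }
    return best;
}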

The general routine for comparison (can skip)

Here’s a quick supplement on the general routine for comparing two things to see whether they are similar:

  • Feature extraction (the data-processing step): whatever the things are, they have corresponding raw data. What we need to do is convert that data (through layer after layer of processing) into the same frame and the same dimensions (i.e. standardize it), usually into one-dimensional arrays of the same length (again: stored as a one-dimensional array, but really an n-dimensional vector).
  • Algorithmic recognition (the data-comparison step): compare the data item by item with some algorithm, such as the Euclidean distance and cosine similarity mentioned above. Along the same lines there is grid-based recognition (mosaic the image so the pixel granularity becomes coarser, then compare by pixel color difference; this suits finding the original image from a thumbnail), direction-based recognition (e.g. a gesture matches as long as its stroke order goes right, then down, then left, then up), and so on.

Obviously, different features and algorithms can produce wildly different results (efficiency, accuracy, and, well, salary 🤨?), and the optimization tricks are just as varied, so there is no universal algorithm, only the algorithm that fits the situation. Let’s take an extremely simple recommendation algorithm as an example: to some extent, the recommendation question can be translated into “how similar are two people’s preferences”:

                                   | Likes eating | Slacking off | Sleeping | Gaming | …
A (salted fish)                    | ✅           | ✅           | ✅       | ✅     | …
B (trying to turn things around)   | ❌           | ❌           | ❌       | ❌     | …
Me                                 | ✅           | ?            | ✅       | ?      | …

This and our gesture recognition are not just similar, they’re practically identical: among the existing rows (A or B), find the one most similar to me. Each row is really just a series of “sample points”, and in the end we can infer that I am (probably) also a salted fish 😂 who likes slacking off. Another example: if you want to buy a computer, chances are you’ll first look at what the people around you are using and then buy accordingly. Herd behavior is similar in nature (the crowd’s choice points the direction). If you say you have your own mind and buy whatever you like, that’s fine too; after all, this kind of prediction can never be 100% accurate.

About multiple strokes (can skip)

What we’ve covered is single-stroke recognition. Now think for a moment: what if a gesture takes multiple strokes? Again, take a few seconds 🤔… For a single stroke the approach above works very well, in both efficiency and accuracy, but multiple strokes get complicated, for example recognizing Chinese characters (just thinking about it is a headache 🤯). One simple method is to split the multiple strokes into single strokes; with what you’ve learned in this article you can compute the similarity of each single stroke, then simply sum them to get the similarity of the whole character, and finally take the candidate with the highest total (that’s it??). Here’s a more concrete example, the character 十 (“ten”); it’s only an illustration, not an exact procedure:

  • Extract the stroke features of each character: generally you can collect each stroke’s start point, end point, and the turning points in between. The data would look something like this:

  • Process the data (the standardization steps, such as moving each character to the center of the canvas and scaling it to the same size).
  • Compare the data (choose an algorithm; here we first check the stroke count, then simply sum the per-stroke similarities; a sketch follows this list).
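
A sketch of the “sum the per-stroke similarities” idea, assuming each character is stored as an array of per-stroke feature vectors (all names here are hypothetical):

// Similarity of a multi-stroke input to one stored character:
// compare stroke by stroke and add up the cosine scores
function multiStrokeSimilarity(inputStrokes: number[][], storedStrokes: number[][]): number {
    if (inputStrokes.length !== storedStrokes.length) return -Infinity; // stroke counts must match (or be close, see below)
    let total = 0;
    inputStrokes.forEach((stroke, i) => {
        total += GeoUtils.calcCosDistance(stroke, storedStrokes[i]);
    });
    return total;
}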

Is that it? Of course not, not even close. There are plenty of remaining problems, for example:

  • Because strokes can be joined, one stroke may be written as two, so we should allow a stroke-count error of about 2, while still ranking candidates with closer stroke counts higher.
  • Each stroke contains at least a start point and an end point, and possibly several inflection points in between. What do we do if the two strokes being compared have different numbers of coordinate points? One way is to interpolate; another is to fall back to the original sample-point information.
  • What if I write a 丁 (ding)? It seems it could still be recognized as 十, since both are roughly one horizontal stroke plus one vertical stroke. Is there a way to avoid that? Of course: instead of storing each point’s coordinates, store the angle of the line between each point and the previous point; for the first point of a stroke, treat the end point of the previous stroke as the “previous point”. That’s rather abstract, so I drew another diagram 👇🏻 (a very simple one, don’t be scared 😂):

With this, 十 and 丁 are clearly distinguishable at the second angle (green 2) in the figure above. And since we only store the angles between consecutive points, we also save quite a bit of space.

Looks like we’re done? No, not even close. Think about what happens if the strokes are written in a different order. Take 十 again: what if I write the vertical stroke first and then the horizontal one? Ah well… There are still other recognition methods, such as splitting the character into four quadrants along its axes and verifying each quadrant separately, but I won’t go deeper here and will stop while I’m ahead (after all, I barely know the surface myself).

Summary

That’s the general idea of gesture recognition. It may sound like a lofty topic, but after reading this you should feel that… no… it’s not that hard. Some things only seem out of reach because we never try, heh heh. Finally, here is the project address portal once more, along with two other hands-on articles from my canvas column:

  • Html2canvas giving you problems? Write one by hand and you’ll know why 💥
  • 🔥 Draw function curves with canvas! Silky smooth

OK, you’re welcome to like and comment. See you next time, that’s all 👋🏻.