Cost polymerization of binocular stereo matching

Make writing a habit together! This is the third day of my participation in the “Gold Digging Day New Plan · April More text Challenge”. Click here for more details.

This article is my own study notes, record the problems in the process of learning cost matching, here to share, reprint please attach the original link

Cost aggregation has been briefly introduced in the previous step of binocular stereo matching. Frankly speaking, its main purpose is to make the matching cost calculation better.

Before also introduced the first step of stereo matching – matching cost calculation matching cost calculation, I introduced the relevant matching cost calculation method in this paper, such as AD, AD-Cencus and so on. Through these methods, we can obtain a cost matrix C(DSI), which stores the matching generation value of each pixel within the parallax range.

Can I just use the cost matrix C that I get? Yes, of course, but the effect is relatively poor. At this time, we need to improve the previous matrix C, and the way to improve it is cost aggregation. First of all, let’s take an intuitive look at the disparity of the parallax diagram before and after cost aggregation:

FIG. 1 Parallax diagram before and after cost aggregation

It can be seen that the cost of polymerization before and after the effect gap is still very large. I’m sure a lot of people ask why is there such a big difference? This is because the cost calculation in the first step only considers the local correlation, which is very sensitive to noise. As a result, the parallax cost corresponding to the correct point with the same name is not the minimum, so the correct point with the same name cannot be found. May someone ask again, that cost aggregation can make up for the above defects? Don’t worry, let’s talk about the process of cost aggregation in detail.

First of all, it should be made clear that only local matching algorithm and semi-global matching algorithm (SGM) need cost aggregation, and global matching algorithm is not required. The following uses the semi-global matching algorithm (SGM) as an example. In order to obtain better matching effect, THE SGM algorithm still adopts the idea of global stereo matching algorithm, namely the global energy optimization strategy. Simply speaking, it is to find the optimal parallax of each pixel to minimize the global energy function of the whole image. The global energy function is defined as formula 1:

Edata(d) is the data item, and is the measure of the overall matching cost corresponding to the response parallax graph. Esmooth(d) is a smoothing term. In order to make the parallax graph meet the constraints of some conditional assumptions, such as the continuity assumption of the scene surface, the smoothing term will punish the parallax change of adjacent pixels exceeding a certain pixel (the punishment is generally to increase the energy value). The continuity of the above red characters has been mentioned in the second section of the steps of binocular stereo matching.

For formula one, our goal is to minimize the energy function, but the energy function minimization is a two-dimensional optimal problem, is a np-complete problem (what is a np-complete problem you can check data, anyway I don’t understand, are interested can look at, I think it is for this problem is more troublesome, so can change an idea to solve it!!!) . Enmm, to say the least, nP-complete problem is too difficult, so SGM algorithm uses a method similar to scan line or single-direction dynamic programming, using one-dimensional path aggregation to approximate two-dimensional optimal, which is more efficient and equivalent to other solutions.

Then SGM proposed a more specific energy function expression, as shown in Formula 2:

Where, C is the matching cost; the first item of the formula is the data item, representing the sum of the matching cost of all pixels when the parallax graph is D; the second and third items are smoothing items, representing N of pixel P_pAll pixels q in the neighborhood are penalized. The second of these is less punitive (P₁Small), which means that the parallax change of adjacent pixels is very small (1 pixel) is punished; The third is more punitive (P₂>P₁), which penalizes the case where the parallax of adjacent pixels varies significantly (greater than 1 pixel). Smaller penalty term can make small algorithm can adapt to changes of parallax, such as continuous curved surface, inclined plane or larger penalty term can let algorithm dealing with parallax discontinuous, but because of the brightness of the image edge position (that is, the location of the gradient is larger) border is prospects background probability, the location of the pixels in the neighborhood is often not parallax continuous, In order to protect the parallax discontinuity in real scenes, P₂It is usually dynamically adjusted according to the gray difference of adjacent pixels, as shown in Formula 3:

P2′ is the initial value of P2, usually set to something much greater than that. The meaning of this formula is: if the brightness difference between a pixel and its neighbor pixel is large, then the pixel is likely to be located in the parallax discontinuous region, and the parallax difference between the pixel and its neighbor pixel is allowed to exceed 1 pixel to a certain extent, so the punishment for exceeding 1 pixel will be appropriately reduced. Where I represents gray value.

But in fact, equation 2 is also an NP-complete problem, ENMM, which is more difficult… Therefore, THE SGM algorithm proposed the idea of cost aggregation, that is, the matching costs under all parallax of pixels are aggregated by all paths around the pixel in one dimension to obtain the path costs under each path, and then all path costs are added to obtain the matching generation value after the pixel aggregation. The path cost of pixel P along a certain path r is calculated as shown in Formula 4:

The first thing to note is that formula 4 and Formula 2 are basically the same. The first item is the matched generation value C, which belongs to the data item. The second term is the smoothing term, indicating that the cumulative value to the path cost is the minimum value in the three cases of no punishment, P1 punishment and P2 punishment. The third item is to ensure that the value Lr of the new path generation does not exceed a certain numerical upper limit, i.e

Here is the relevant explanation of Formula 4. First to overall analyze the formula, that is, a pixel in the aggregate price when parallax for d is equal to the initial generation of C plus four operation value of the minimum value minus another minimum (no extra calculation, the minimum value in the previous four calculation, the minimum value also does not have any special meaning, just to let the L value does not exceed a certain limit, As shown in Formula 5).

Secondly, p in the formula represents pixel and R represents path. In the case of left-right path, P − R is the adjacent pixel to the left (left-to-right aggregation) or right (right-to-left aggregation) of P, with the same row number and 1 difference in column number. L is the aggregate value and C is the initial value.

If you’re still confused about the second term of the formula, here’s how to explain it:

In fact, the core of cost aggregation here has been finished, the rest is to use the above formula to calculate according to the picture. That’s it!! ? In fact that’s the theory, but if you like I was full of doubt, don’t know how to implement this process, the formula of the expression is not well understood, the following about the problems encountered during my study and my thinking (PS: may not be very correct, only represent my personal understanding, if there are any errors, hope bosses pointed out).

Firstly, as for p-r in the formula, my understanding is that this algorithm actually needs to be solved continuously by iteration, and every time it needs to be calculated along a path to the end, so P-R is the point on a path where p is one unit away, and p-2r is the point on a path where P is two units away, and so on.

Second: about the formula in the second minLr (p – r, I), my understanding is that it should meet the | | I – d > 1, the related pixels of parallax is not the second top three conditions, namely pixels of parallax > 1. Actually I think theory also is such, that is, I want to meet | | I – d > 1, but for the second formula, have this limitation is actually does not affect the results, because if I don’t meet | | I – d > 1, namely | | I – d < = 1, then the first four parts of binomial son minimum value will not be in in, (P2>P1) in the first three parts, this term will be meaningless, so the range of I is not discussed in this formula.

Thirdly, I did not understand the specific process of this aggregation before. After reading many articles, I have my own understanding (please point out the mistakes). Now I will explain it through some illustrations. , for example, we now need to compute aggregate generation value of point p, so before we calculate is certainly know p each parallax, the initial matching cost (is stored in stereo matching of cost calculation of matrix, C), all for p, request p path aggregation cost, then according to the formula 4, The first part of C(p,d) is definitely given. If we understand it here, the initial matching cost under various parallax can be found in the initial cost matrix C for any pixel in the image. In other words, C(p,d) of the first part is known to Formula 4. Looking at the second term of Formula 4, it is not hard to see that this term is mainly to find the path aggregation cost of the previous pixel. Path cost of previous pixel?? We don’t know, it’s just a recursive process, finding the path cost of the previous pixel up to a particular pixel. So this particular pixel here can be interpreted as if I set a number of recursions, and when I recurse, I recurse to some pixel P, and that p pixel is what I call a particular pixel. Somebody’s going to ask, p pixel and I don’t know the path aggregation cost, by the way, we’re going to initialize it so that the aggregation value of p pixel L is equal to its initial value C. At this point, we have the path aggregation cost of the first pixel, so we can calculate the path aggregation cost of other pixels step by step, of course, the path aggregation cost of the target pixel can also be calculated. The following graph shows the cost aggregation process of a pixel in one direction.

FIG. 2 Cost aggregation process of a pixel in one direction

Tutu 2 represents a direction of the cost of a pixel polymerization process, assuming that the scope of the parallax only 10, then the pixels in the initial matching costs under each parallax is known, and in the matrix C, figure of the current pixel matching cost is the initial matching cost, a pixel matching cost generation value after polymerization, The current matching cost after pixel update is calculated according to Formula 4 (note: the third term is not subtracted, because the values of P1 and P2 are not fixed and the smallest one cannot be determined, so the operation of subtraction is omitted).

Shout… Finally, I’m almost done… Formula 4 above introduces the value of aggregation cost on a certain path, so the aggregation cost formula of the total path is as follows:

Type r polymerization in the path, in general, there are four paths (red arrow direction), 8 path (red + black arrow direction) and 16 path (white arrow direction), three kinds of polymerization route number, the more the better the results, but also more time consuming, often need to balance the pros and cons, according to the actual requirements of applications to choose the right path. (Note: In practice, the effect and cost performance of 4 paths are usually good.) The following is a schematic diagram of path aggregation:

Polymerization diagram we often use the path to figure 3 is 4,8,16, so we are starting from the common situation, can say path for no more than 16, Then according to formula 5 and 6, we can easily deduce that S<=16(Cmax+P2). According to this equation, we can control the range of aggregation cost by choosing an appropriate value. What is the use of controlling the range of aggregation cost? This reduces the memory footprint of the aggregate cost array. For example, if the maximum matching value C obtained by the 5×5 window Cencus transformation method is not more than 25 (because the matching cost of Cencus transformation is the number of two bits xor the last binary bit is 1, and the maximum effective length of all Census bit string after Census transformation is 25), the matching cost only needs to be stored by one byte. When P2≤ 65,55/16 −25 (216=65536), S can be stored with only two bytes, because the storage cost of C and S space size is W × H × D. When the image size is large, the memory occupation is huge, so it is necessary to reduce the number of bytes required for element storage.

Shout shout shout… Finally finished, this article is an own study notes, which is not rigorous and incorrect place I hope you point out, common progress!!

Swish swish duang give it a thumbs up

Refer to the article: ethanli.blog.csdn.net/article/det…

Parallax optimization of binocular stereo matching

Cost polymerization of binocular stereo matching

Related Posts

How to start machine learning from scratch

Object-oriented programming ideas and Python classes, class methods and properties, instance methods

Robot Internet and Machine Evolution from RoboEarth