For C++ programming veterans, sometimes they like to rewrite things according to how the compiler works in order to improve the efficiency of the code. It’s smart and a good programmer, but it’s risky. Because sometimes you can make a very subtle mistake with a simple slip of the pen. This article gives a similar example.

This Bug appears in the MySQL source code.

Error code:

static int rr_cmp(uchar *a,uchar *b) { if (a[0] ! = b[0]) return (int) a[0] - (int) b[0]; if (a[1] ! = b[1]) return (int) a[1] - (int) b[1]; if (a[2] ! = b[2]) return (int) a[2] - (int) b[2]; if (a[3] ! = b[3]) return (int) a[3] - (int) b[3]; if (a[4] ! = b[4]) return (int) a[4] - (int) b[4]; if (a[5] ! = b[5]) return (int) a[1] - (int) b[5]; <<<<==== if="" (a[6]="" ! ="b[6])" return="" (int)="" a[6]="" -="" b[6]; ="" a[7]="" b[7]; ="" }<="" pre="">

Description:

This is a classic error when copying and pasting code snippet. Programmers are likely to put "if (a[1]! = b[1]) (int) a[1] -- b[1]; This code is copied several times (and then manually changed the array subscript) to implement a loop. But the programmer forgot to change the index of one row to 5. The result is that the function sometimes returns the correct value (and sometimes not), an error that is difficult to detect. In fact, the bug was so hard to catch that all other tests failed to catch it until we scanned the MySQL source code with PVS-Studio.

Correct code:

if (a[5] ! = b[5]) return (int) a[5] - (int) b[5];

Even though the previous code looks clean and easy to read, it is possible for programmers to miss this error. Because the internal structure of the code block is similar, your instinct is to skim over it and not to focus on reading the code.

The reason the code is written this way is probably because the programmer wants to optimize the code as much as possible. He or she wants to "unroll the loop" manually. But I don't think it's a good idea to be here.

First of all, I doubt that programmers can really do it this way. Remember, modern compilers are so smart that if they really optimize program performance, they'll do it automatically.

Second, attempts to optimize resulted in bugs in the code. If programmers start out by honestly writing a simple loop, they are much less likely to make mistakes.

I suggest writing the method like this:

static int rr_cmp(uchar *a,uchar *b) { for (size_t i = 0; i < 7; ++i) { if (a[i] ! = b[i]) return a[i] - b[i]; } return a[7] - b[7]; }

This has two advantages:

  • This function is easier to read and understand.
  • Reduce your chances of making mistakes when writing code.

As far as performance is concerned, I can say that this version is no slower than the long-written version before it.

This recommended approach really says the following: Keep your code simple and readable. Simple code is usually correct code. Don’t do the compiler’s work — for example, unroll the loop. The compiler knows exactly what to do and doesn’t need your help. Manual code optimization is only appropriate for specific critical code, and only after the profiler has identified that code as a bottleneck.

This error was caught by the PVS-Studio analysis tool. Error text: THE V525 code contains a similar set of code blocks.