To measure coverage of incremental (delta) code, you first need to identify which code changed. This article describes how to get that diff data quickly and accurately.

All scripts covered in this article are hosted in the GitHub project UnitTestParser. Demo project: UnitTestDemo.

Get the code diff

Git diff output analysis

Run git diff on the two commits you want to compare from the command line:

git diff e53b8299 a25e1cc3

You should get the following output:

Here, - marks a deleted line and + marks an added line (a modification is shown as a deletion plus an addition). The hunk header @@ -30,8 +30,8 @@ means that the hunk covers 8 lines starting at line 30 in the old file and 8 lines starting at line 30 in the new file.
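As a rough illustration of how such a header can be read programmatically, here is a minimal Ruby sketch. The helper name parse_hunk_header is hypothetical and is not part of UnitTestParser; it only assumes the standard unified-diff header layout, in which a missing count means 1.

```ruby
# Hypothetical helper (not part of UnitTestParser): read a unified-diff hunk header.
HUNK_HEADER = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/

def parse_hunk_header(line)
  m = HUNK_HEADER.match(line)
  return nil unless m

  {
    old_start: m[1].to_i,
    old_count: m[2] ? m[2].to_i : 1, # an omitted count means 1 line
    new_start: m[3].to_i,
    new_count: m[4] ? m[4].to_i : 1
  }
end

header = parse_hunk_header("@@ -30,8 +30,8 @@")
last_line = header[:new_start] + header[:new_count] - 1
puts "new file lines #{header[:new_start]}..#{last_line}" # prints "new file lines 30..37"
```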

The following information can be obtained from the above diff results:

  • The file targetCoverage.md has been changed.
  • Lines 30 through 37 of targetCoverage.md are the lines of code for this change.

Diff data optimization

But only 4 lines of targetCoverage.md were actually changed (the red/green lines above), so why does the diff result say 8 lines were changed?

In fact, git diff does not report changed lines in isolation. To keep patches readable and to help them apply and merge accurately, it surrounds each change with 3 lines of unchanged context by default, and the line counts in the hunk header include those context lines. That is why the header reports 8 lines even though only 4 lines changed. Why it is exactly 8 lines for this diff is left for you to work out (hint: context is cut short at the start or end of a file).

This default behavior does not affect how the diff is displayed (the red/green parts of the image above do cover exactly 4 lines), but it does distort the count of new/modified lines of code. In this example only 4 lines were modified, yet the hunk header reports 8, which inflates the statistics.
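The gap between the two numbers can be reproduced with a small Ruby sketch. The hunk below is made-up sample data shaped like the case described here (3 leading context lines, 4 modified lines, 1 trailing context line); it is not the real output for these commits.

```ruby
# Made-up hunk: 3 context lines, 4 modified lines, 1 trailing context line.
# Context lines start with a space, removed lines with "-", added lines with "+".
sample_hunk = <<~DIFF
  @@ -30,8 +30,8 @@
   context a
   context b
   context c
  -old 1
  -old 2
  -old 3
  -old 4
  +new 1
  +new 2
  +new 3
  +new 4
   context d
DIFF

lines = sample_hunk.lines.map(&:chomp)

# Line count claimed by the header for the new file (includes context lines).
header_count = lines.first[/\+\d+,(\d+) @@/, 1].to_i

# Lines that were really added or modified.
added_count = lines.count { |l| l.start_with?("+") }

puts "header says #{header_count} lines, actually added #{added_count}"
# prints "header says 8 lines, actually added 4"
```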

How can we solve this problem? We need to take a closer look at git diff.

The git diff help documentation describes an interesting option:

-U<n>
--unified=<n>
Generate diffs with <n> lines of context instead of the usual three. Implies --patch.

That is, you can change the number of context lines git diff emits by setting --unified. Try running the command again with it:

git diff e53b8299 a25e1cc3 --unified=0

You should get the following output:

As you can see, with --unified=0 git diff reports exactly the new/changed lines, with no context lines included.

Parse diff data

Once you have the exact diff data, you need to convert the git diff output into something your code can read. For incremental code statistics, we want to get the following data:

  • The names of all changed files
  • The changed line numbers for each changed file

The git diff output contains both pieces of data, so we need to parse them out of it.

The UnitTestParser project provides diffParser.rb to parse the final data needed from git diff output.

Execute on the command line:

git diff e53b8299 a25e1cc3 --unified=0 > a.diff

ruby utils/diffParser.rb --diff-file=a.diff

You get the following data:

{"docs/targetCoverage.md"=>[33, 34, 35, 36]}

In this output, each key is the name of a changed file, and its value is the list of changed (new/modified) line numbers in that file.
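For readers curious how such a mapping might be built, here is a minimal Ruby sketch. It is not the actual utils/diffParser.rb, whose implementation is not shown in this article; the function name parse_diff and the sample diff text are assumptions made for illustration. It assumes the diff was generated with --unified=0, so every line range in a hunk header is a real change.

```ruby
# Sketch only: NOT the real utils/diffParser.rb from UnitTestParser.
# Assumes `git diff --unified=0` output, so hunk ranges contain no context lines.
# Simplification: an added line whose content begins with "++ " would be
# mistaken for a file header; a robust parser would track hunk boundaries.
def parse_diff(diff_text)
  changes = Hash.new { |hash, key| hash[key] = [] }
  current_file = nil

  diff_text.each_line do |line|
    if line.start_with?("+++ ")
      path = line.sub("+++ ", "").strip
      # A deleted file is reported as /dev/null; it has no new lines to track.
      current_file = path == "/dev/null" ? nil : path.sub(%r{\Ab/}, "")
    elsif current_file && (m = line.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/))
      start = m[1].to_i
      count = m[2] ? m[2].to_i : 1 # an omitted count means 1 line
      # A count of 0 means a pure deletion: nothing was added in the new file.
      changes[current_file].concat((start...(start + count)).to_a) if count > 0
    end
  end

  changes
end

# Made-up sample input shaped like --unified=0 output.
sample = <<~DIFF
  diff --git a/docs/targetCoverage.md b/docs/targetCoverage.md
  --- a/docs/targetCoverage.md
  +++ b/docs/targetCoverage.md
  @@ -33,4 +33,4 @@
  -old line 1
  -old line 2
  -old line 3
  -old line 4
  +new line 1
  +new line 2
  +new line 3
  +new line 4
DIFF

p parse_diff(sample)
```

For this sample, the result maps docs/targetCoverage.md to lines 33 through 36. Reading only the hunk headers keeps the sketch short; files that contain nothing but deletions are left out of the result.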

Conclusion

Using the git diff command and custom parsing of its output, you can easily retrieve the incremental code data. The next article will show how to compute incremental code coverage in a single step, based on the incremental code data obtained here.