This article walks through the OCLint source code and the workflow it follows when analyzing code.

Target readers

Front-line engineers, architects

Estimated reading time

15 to 20 min 🐢

What you will gain from reading

An understanding of the principles behind static code review techniques
An understanding of the technical workflow of static code review

We have to start with Clang

OCLint is a static code analysis tool built on Clang tooling, so we have to start with Clang. Clang, a subproject of LLVM, is a compiler front end for C, C++, and Objective-C.

OCLint is essentially a layer of encapsulation over Clang tooling. Its core capability is to analyze the Clang AST, find the code that violates the rules, and export a report in the specified format.

Let’s take a look at what the Clang AST looks like as input.

Clang AST

The Clang AST is an intermediate product of compilation. The compiler front end runs lexical analysis, then syntactic analysis (which generates the AST), then semantic analysis, and finally generates intermediate code.

Abstract syntax tree example

Here’s a glimpse of the abstract syntax tree.

//Example.c
#include <stdio.h>
int global;
void myPrint(int param) {
    if (param == 1)
        printf("param is 1");
    for (int i = 0; i < 10; i++) {
        global += i;
    }
}

int main(int argc, char *argv[]) {
    int param = 1;
    myPrint(param);
    return 0;
}

In the syntax tree you can clearly see how each element of this code relates to its children. There are two main kinds of nodes. The first is the Stmt class: a statement, which performs some operation. The Expr (expression) class also inherits from Stmt. The second is the Decl class: a declaration or definition. Classes, methods, functions, and variables are all Decl nodes. The two hierarchies are not compatible with each other, so a special container node such as DeclStmt is needed to place one inside the other. The data structure also shows that the tree is one-way: it is only traversed from a top element downward.
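To make the two node families concrete, here is a minimal self-contained sketch. Every `Toy*` type is a hypothetical stand-in invented for illustration, not one of Clang's real classes (for instance, this toy DeclStmt carries exactly one declaration, which is a simplification). It only shows how a DeclStmt-like container lets a declaration sit where a statement is expected, and how traversal runs from parent to child.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Decl family: things that introduce a name.
struct ToyDecl {
    virtual ~ToyDecl() = default;
    std::string name;
};
struct ToyVarDecl : ToyDecl {};

// Hypothetical stand-in for the Stmt family: things that perform an operation.
// A statement owns its children, so traversal only goes from parent to child.
struct ToyStmt {
    virtual ~ToyStmt() = default;
    std::vector<std::unique_ptr<ToyStmt>> children;
};
struct ToyExpr : ToyStmt {};  // expressions are statements too

// The bridge: a statement whose only job is to carry a declaration.
struct ToyDeclStmt : ToyStmt {
    std::unique_ptr<ToyDecl> decl;
};

// Counts the declarations reachable from a statement subtree.
int countDeclarations(const ToyStmt &stmt) {
    int count = 0;
    if (const auto *declStmt = dynamic_cast<const ToyDeclStmt *>(&stmt)) {
        if (declStmt->decl) {
            ++count;
        }
    }
    for (const auto &child : stmt.children) {
        count += countDeclarations(*child);
    }
    return count;
}

// Models the shape of `for (int i = 0; i < 10; i++)` from Example.c:
// the init part is a DeclStmt wrapping the variable `i`.
std::unique_ptr<ToyStmt> makeToyForLoop() {
    auto loop = std::make_unique<ToyStmt>();
    auto init = std::make_unique<ToyDeclStmt>();
    init->decl = std::make_unique<ToyVarDecl>();
    init->decl->name = "i";
    loop->children.push_back(std::move(init));
    loop->children.push_back(std::make_unique<ToyExpr>());  // i < 10
    loop->children.push_back(std::make_unique<ToyExpr>());  // i++
    return loop;
}
```

Calling `countDeclarations(*makeToyForLoop())` finds the single variable hidden behind the DeclStmt-like node.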

In the terminal, you can view the syntax tree with the following command:

clang -Xclang -ast-dump -fsyntax-only Example.c

Access the abstract syntax tree

Both Stmt and Decl have their own iterators, so you can walk through all the nodes and check each one's type before operating on it. But Clang offers a more convenient way: inherit from the RecursiveASTVisitor class. It recursively visits every node of an AST. Its most commonly used methods are TraverseStmt and TraverseDecl.

For example, to visit every function declaration (FunctionDecl) in a piece of code and print its name, I would override a method like this in a custom checker:

bool VisitFunctionDecl(FunctionDecl *decl) {
    std::string name = decl->getNameAsString();
    printf("%s\n", name.c_str());
    return true;
}

Thus, we can access all of the FunctionDecl nodes in the AST tree and print out the function names.
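The visitor idea can be tried without a Clang build. The sketch below is a toy, not Clang's implementation: `ToyNode`, `ToyRecursiveVisitor`, `FunctionNameCollector`, `makeExampleTree`, and `collectFunctionNames` are all hypothetical names. It borrows only the general shape of the pattern, where a base class walks the tree and a derived class supplies the Visit hooks it cares about.

```cpp
#include <string>
#include <vector>

// Hypothetical toy node; real Clang nodes are separate classes, not one struct.
struct ToyNode {
    enum Kind { TranslationUnit, FunctionDecl, VarDecl, IfStmt, ForStmt };
    Kind kind;
    std::string name;               // used by declarations only
    std::vector<ToyNode> children;
};

// The base class walks the tree; the derived class shadows only the
// Visit* hooks it is interested in.
template <typename Derived>
class ToyRecursiveVisitor {
public:
    // Pre-order walk. A hook returning false stops the whole traversal.
    bool Traverse(const ToyNode &node) {
        if (!dispatch(node)) {
            return false;
        }
        for (const ToyNode &child : node.children) {
            if (!Traverse(child)) {
                return false;
            }
        }
        return true;
    }

    // Default hooks do nothing and let the traversal continue.
    bool VisitFunctionDecl(const ToyNode &) { return true; }
    bool VisitIfStmt(const ToyNode &) { return true; }

private:
    bool dispatch(const ToyNode &node) {
        Derived &self = static_cast<Derived &>(*this);
        switch (node.kind) {
        case ToyNode::FunctionDecl:
            return self.VisitFunctionDecl(node);
        case ToyNode::IfStmt:
            return self.VisitIfStmt(node);
        default:
            return true;
        }
    }
};

// Collects the name of every function declaration it visits.
class FunctionNameCollector : public ToyRecursiveVisitor<FunctionNameCollector> {
public:
    std::vector<std::string> names;

    bool VisitFunctionDecl(const ToyNode &decl) {
        names.push_back(decl.name);
        return true;
    }
};

// A hand-built tree with roughly the shape of Example.c.
ToyNode makeExampleTree() {
    ToyNode myPrint{ToyNode::FunctionDecl, "myPrint", {}};
    myPrint.children.push_back(ToyNode{ToyNode::IfStmt, "", {}});
    myPrint.children.push_back(ToyNode{ToyNode::ForStmt, "", {}});

    ToyNode mainFn{ToyNode::FunctionDecl, "main", {}};
    mainFn.children.push_back(ToyNode{ToyNode::VarDecl, "param", {}});

    ToyNode root{ToyNode::TranslationUnit, "", {}};
    root.children.push_back(ToyNode{ToyNode::VarDecl, "global", {}});
    root.children.push_back(myPrint);
    root.children.push_back(mainFn);
    return root;
}

std::vector<std::string> collectFunctionNames(const ToyNode &root) {
    FunctionNameCollector collector;
    collector.Traverse(root);
    return collector.names;
}
```

Running `collectFunctionNames(makeExampleTree())` yields the two function names in source order. The collector never writes a loop over children itself, which is the convenience the visitor approach buys.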

Let’s take a look at the source code for OCLint and see how it works!

OCLint source code analysis

Take a look at the core class diagram first to get a rough impression, then let’s move on to the code 👀

Find the entry file oclint/driver/main.cpp and the entry function main().

A simplified code framework for this file is shown below:

int main(int argc, const char **argv)
{
    llvm::cl::SetVersionPrinter(oclintVersionPrinter);
    // Construct the options parser
    CommonOptionsParser optionsParser(argc, argv, OCLintOptionCategory);
    // Process the configuration
    oclint::option::process(argv[0]);
    // Construct the analyzer
    oclint::RulesetBasedAnalyzer analyzer(oclint::option::rulesetFilter().filteredRules());
    // Construct the driver
    oclint::Driver driver;

    // Perform the analysis
    driver.run(optionsParser.getCompilations(), optionsParser.getSourcePathList(), analyzer);

    std::unique_ptr<oclint::Results> results(std::move(getResults()));

    ostream *out = outStream();
    // Output the report
    reporter()->report(results.get(), *out);
    disposeOutStream(out);

    return handleExit(results.get());
}

Next, look at three functions in the driver: constructCompilers(), invoke(), and run().

// Build the compilers
static void constructCompilers(std::vector<oclint::CompilerInstance *> &compilers,
    CompileCommandPairs &compileCommands, std::string &mainExecutable)
{
    for (auto &compileCommand : compileCommands) // Iterate over the compile commands
    {
        std::vector<std::string> adjustedCmdLine =
            adjustArguments(compileCommand.second.CommandLine, compileCommand.first);

#ifndef NDEBUG
        printCompileCommandDebugInfo(compileCommand, adjustedCmdLine);
#endif

        LOG_VERBOSE("Compiling ");
        LOG_VERBOSE(compileCommand.first.c_str());
        std::string targetDir = stringReplace(compileCommand.second.Directory, "\\ ", " ");

        if (chdir(targetDir.c_str()))
        {
            throw oclint::GenericException("Cannot change dictionary into \"" +
                targetDir + "\","
                "please make sure the directory exists and you have permission to access!");
        }
        // Create the clang CompilerInvocation object
        clang::CompilerInvocation *compilerInvocation =
            newCompilerInvocation(mainExecutable, adjustedCmdLine);
        // Use it to create oclint's own CompilerInstance, which wraps clang's
        oclint::CompilerInstance *compiler = newCompilerInstance(compilerInvocation);
        // At its core, start() obtains the clang::FrontendAction and executes it
        compiler->start();
        if (!compiler->getDiagnostics().hasErrorOccurred() && compiler->hasASTContext())
        {
            LOG_VERBOSE(" - Success");
            compilers.push_back(compiler); // Store the oclint CompilerInstance in the collection
        }
        else
        {
            LOG_VERBOSE(" - Failed");
        }
        LOG_VERBOSE_LINE("");
    }
}

// Invoke the actual analysis
static void invoke(CompileCommandPairs &compileCommands, std::string &mainExecutable, oclint::Analyzer &analyzer)
{
    std::vector<oclint::CompilerInstance *> compilers; // The compiler container
    constructCompilers(compilers, compileCommands, mainExecutable); // Build the compilers

    // collect a collection of AST contexts
    std::vector<clang::ASTContext *> localContexts;
    for (auto compiler : compilers) // Iterate over the compiler collection
    {
        localContexts.push_back(&compiler->getASTContext()); // Add the AST context to the collection
    }

    // use the analyzer to do the actual analysis
    analyzer.preprocess(localContexts);  // Preprocess: hand the contexts to the analyzer
    analyzer.analyze(localContexts);     // Analyze
    analyzer.postprocess(localContexts); // Postprocess

    // send out the signals to release or simply leak resources
    for (size_t compilerIndex = 0; compilerIndex != compilers.size(); ++compilerIndex)
    {
        compilers.at(compilerIndex)->end();
        delete compilers.at(compilerIndex);
    }
}

// The core method that main.cpp calls to perform the analysis
void Driver::run(const clang::tooling::CompilationDatabase &compilationDatabase,
    llvm::ArrayRef<std::string> sourcePaths, oclint::Analyzer &analyzer)
{
    CompileCommandPairs compileCommands; // Container for the compile commands
    constructCompileCommands(compileCommands, compilationDatabase, sourcePaths); // Build the compile command pairs

    static int staticSymbol; // static symbol
    std::string mainExecutable = llvm::sys::fs::getMainExecutable("oclint", &staticSymbol); // Get the path to the oclint executable

    if (option::enableGlobalAnalysis()) // Global analysis is enabled
    {
        invoke(compileCommands, mainExecutable, analyzer); // Note that the analyzer is passed in as an argument
    }
    else
    {
        // Without global analysis, each compile command is analyzed on its own
        for (auto &compileCommand : compileCommands)
        {
            CompileCommandPairs oneCompileCommand { compileCommand };
            invoke(oneCompileCommand, mainExecutable, analyzer);
        }
    }

    if (option::enableClangChecker()) // The clang checker is enabled
    {
        invokeClangStaticAnalyzer(compileCommands, mainExecutable); // Call the Clang static analyzer
    }
}

The last piece is the RulesetBasedAnalyzer class, which has very little code, as shown below:

void RulesetBasedAnalyzer::analyze(std::vector<clang::ASTContext *> &contexts)
{
    for (const auto& context : contexts)
    {
        LOG_VERBOSE("Analyzing ");
        auto violationSet = new ViolationSet();
        // The violationSet stores the violations found for this file
        auto carrier = new RuleCarrier(context, violationSet);
        LOG_VERBOSE(carrier->getMainFilePath().c_str());
        for (RuleBase *rule : _filteredRules) // Iterate over the rule set
        {
            rule->takeoff(carrier); // Call the rule's takeoff method
        }
        ResultCollector *results = ResultCollector::getInstance();
        results->add(violationSet); // Add the data produced by the rules to the collector
        LOG_VERBOSE_LINE(" - Done");
    }
}

From the code above, you can see that the analyzer iterates over the rule set and calls each rule’s takeoff method. Every rule derives from the base class RuleBase, which holds a RuleCarrier instance as a member. The RuleCarrier contains the ASTContext and the violationSet for each file, and the violationSet stores information about violations. A rule’s job is to inspect the ASTContext held by its ruleCarrier member and write the results into that ruleCarrier’s violationSet.
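The relationship between analyzer, rule, carrier, and violation set can be sketched in a few lines of standalone C++. This is a toy model under stated assumptions, not OCLint's code: all `Toy*` types, `TabCharacterRule`, and the free function `analyze` are hypothetical, and plain source text stands in for the ASTContext.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; plain text replaces the real ASTContext.
struct ToyViolation {
    std::string ruleName;
    std::string message;
};
using ToyViolationSet = std::vector<ToyViolation>;

struct ToyContext {
    std::string mainFilePath;
    std::string sourceText;
};

// Pairs one file's context with the set that collects its violations.
struct ToyCarrier {
    const ToyContext *context;
    ToyViolationSet *violations;
};

class ToyRule {
public:
    virtual ~ToyRule() = default;
    virtual std::string name() const = 0;

    // Entry point called by the analyzer: remember the carrier, then check.
    void takeoff(ToyCarrier *carrier) {
        _carrier = carrier;
        apply();
    }

protected:
    virtual void apply() = 0;

    void addViolation(const std::string &message) {
        _carrier->violations->push_back({name(), message});
    }

    ToyCarrier *_carrier = nullptr;
};

// Example rule: flags any file whose text contains a tab character.
class TabCharacterRule : public ToyRule {
public:
    std::string name() const override { return "tab character"; }

protected:
    void apply() override {
        if (_carrier->context->sourceText.find('\t') != std::string::npos) {
            addViolation("tab found in " + _carrier->context->mainFilePath);
        }
    }
};

// One violation set per context; every rule runs against every context.
std::vector<ToyViolationSet> analyze(const std::vector<ToyContext> &contexts,
                                     const std::vector<ToyRule *> &rules) {
    std::vector<ToyViolationSet> results;
    for (const ToyContext &context : contexts) {
        ToyViolationSet violations;
        ToyCarrier carrier{&context, &violations};
        for (ToyRule *rule : rules) {
            rule->takeoff(&carrier);
        }
        results.push_back(violations);
    }
    return results;
}
```

The design point worth noticing is that a rule never returns anything. It writes into the carrier it was handed, so the analyzer can run any number of rules over a file and collect their output in one place.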

Advanced: User-defined rules

So far, we have seen the basic usage of OCLint and its workflow.

Next comes the part that is more flexible, but also harder to use: custom rules.

Rules must implement the RuleBase class or an abstract class derived from it. Different rules focus on different levels of abstraction; for example, some rules may have to delve very deeply into the control flow of code, whereas some rules detect defects simply by reading strings of source code.

OCLint provides three abstract classes for writing custom rules: AbstractSourceCodeReaderRule (source code reader rules), AbstractASTVisitorRule (AST visitor rules), and AbstractASTMatcherRule (AST matcher rules).

According to the official documentation, AST matcher rules are the most readable, so unless performance is a big issue, we will probably choose to write AST matcher rules most of the time.

The AST visitor rule is based on the visitor pattern. You only need to override certain methods (the abstract class provides an interface with one visit method per kind of node) and put the validation logic for the corresponding node inside them. Because OCLint works on the abstract syntax tree generated by Clang, knowing the Clang AST API is very helpful when writing rules.

The AST matcher rules are based on pattern matching: you construct some matchers and register them. Once a match is found, the callback method is called with the matched AST node as an argument, and you can collect violation information inside the callback. (See the Clang documentation for more information on matchers.)
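To contrast the matcher style with the visitor style, here is a toy sketch that runs without Clang. Nothing in it is the Clang matcher API: `ToyNode`, `ToyMatcher`, `isKind`, `hasAtLeastChildren`, `both`, `findMatches`, and `countIfWithElse` are hypothetical names, and the convention that an IfStmt node has two or three children is an assumption of this model only.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy node, not a Clang type.
struct ToyNode {
    std::string kind;               // e.g. "IfStmt", "CompoundStmt"
    std::vector<ToyNode> children;
};

using ToyMatcher = std::function<bool(const ToyNode &)>;
using ToyCallback = std::function<void(const ToyNode &)>;

// Matches nodes of the given kind.
ToyMatcher isKind(const std::string &kind) {
    return [kind](const ToyNode &node) { return node.kind == kind; };
}

// Matches nodes with at least `count` direct children.
ToyMatcher hasAtLeastChildren(std::size_t count) {
    return [count](const ToyNode &node) { return node.children.size() >= count; };
}

// Matches nodes that satisfy both matchers.
ToyMatcher both(ToyMatcher first, ToyMatcher second) {
    return [first, second](const ToyNode &node) {
        return first(node) && second(node);
    };
}

// Walks the tree and fires the callback for every matching node.
void findMatches(const ToyNode &node, const ToyMatcher &matcher,
                 const ToyCallback &callback) {
    if (matcher(node)) {
        callback(node);
    }
    for (const ToyNode &child : node.children) {
        findMatches(child, matcher, callback);
    }
}

// In this toy model an IfStmt has children {condition, then} or
// {condition, then, else}, so "has an else" means "has 3 children".
int countIfWithElse(const ToyNode &root) {
    int count = 0;
    findMatches(root, both(isKind("IfStmt"), hasAtLeastChildren(3)),
                [&count](const ToyNode &) { ++count; });
    return count;
}
```

The rule author describes what to look for by composing small matchers and only writes code for what happens on a match, which is where the readability advantage mentioned above comes from.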

For now, all we need to know is that OCLint provides these abstract classes for implementing custom rules. How to write a rule is covered in the following sections.

Create rules: the scaffoldRule script

scaffoldRule is a scaffolding script provided by OCLint. You can use it to easily create custom rules.

Write rules

Having read OCLint’s official documentation and the introduction to the Clang AST, we now know roughly how OCLint works. First, it calls the Clang API to generate the AST for each source file, one by one. Second, it traverses every node in the AST and, according to the applicable rules, writes violations into the violation result set. Finally, it outputs the violations in the report format that was configured.

Start with a mind map of the thinking behind writing OCLint rules to get a first impression.

Following the steps above, we now have an Xcode project (xcodeproj). Open the .cpp source file of the rule we created.

The first thing you will notice is that a scaffold-generated rule contains nearly 2000 lines of template code. Isn’t that a bit scary? Don’t worry. Most of the template consists of methods whose names start with Visit. These are the callbacks that OCLint provides, and each one is triggered when the corresponding node in the AST is visited.

Let’s look at a practical example: a rule that our iOS team has used in code reviews. It checks that if/else conditional branches are formatted as required by the Cocoa specification: if and else must be separated from the adjacent parentheses and curly braces by a space or a newline. Example code is as follows:

void example()
{
    int a = 1;
    if(a > 0) { // Not compliant: no space or newline to the left of "("
        a = 10;
    }

    if (a > 0){ // Not compliant: no space or newline to the right of ")"
        a = 10;
    }

    if (a > 0)
    {
        a = 10;
    }else { // Not compliant: no space or newline to the right of "}"
        a = -1;
    }

    if (a > 0)
    {
        a = 10;
    } else{ // Not compliant: no space or newline to the left of "{"
        a = -1;
    }
}

First, dump the AST in your terminal. (The command was shown earlier; if you have not tried it yet, now is a good time.)

A series of colorful characters flashed across the screen and finally stopped here! Yeah, that’s exactly what we’re looking for.

You can clearly see the variable declaration VarDecl at the top and the conditional statement IfStmt below.

The name of the node to check has now been determined: IfStmt.
Next, look for the corresponding callback method in the generated rule template. I suppose it’s called VisitXXIfStmt or something. Sure enough, we found it! VisitIfStmt looks like just what we need.
Next, we need to get the source text of the node and a description for the violation. (See the full rule file below for the detailed code.)
The final step is to determine whether that text conforms to the rule. (Use LLVM, Clang, and the various std functions as needed.)
If the text does not conform, add the node and its description to the violationSet.
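The text-checking step can be isolated from Clang and tested on plain strings. The sketch below is a minimal illustration, assuming the caller has already extracted the source text of one if statement and knows whether it has an else branch. The function name `checkIfElseFormat` and its messages are hypothetical. Like any plain substring search, it can be fooled by nested blocks or by braces inside string literals.

```cpp
#include <string>
#include <vector>

// Returns one message per formatting problem found in the source text of a
// single if statement (from the `if` keyword through its last brace).
std::vector<std::string> checkIfElseFormat(const std::string &source, bool hasElse) {
    std::vector<std::string> problems;

    // True when neither acceptable spelling occurs anywhere in the text.
    const auto neitherFound = [&source](const char *first, const char *second) {
        return source.find(first) == std::string::npos &&
               source.find(second) == std::string::npos;
    };

    // "if" must be followed by a space before "(".
    if (source.find("if (") == std::string::npos) {
        problems.push_back("expected a space between 'if' and '('");
    }
    // ")" must be followed by a space or a newline before "{".
    if (neitherFound(") {", ")\n")) {
        problems.push_back("expected a space or newline after ')'");
    }
    // Nothing else to check when there is no else branch.
    if (!hasElse) {
        return problems;
    }
    // "}" must be followed by a space or a newline before "else".
    if (neitherFound("} else", "}\n")) {
        problems.push_back("expected a space or newline after '}'");
    }
    // "else" must be followed by a space or a newline before "{".
    if (neitherFound("else {", "else\n")) {
        problems.push_back("expected a space or newline after 'else'");
    }
    return problems;
}
```

For instance, `checkIfElseFormat("if(a > 0) {\n    a = 10;\n}", false)` reports a single problem, while the same text written with `if (` reports none. Keeping the check in a pure function like this makes it easy to exercise the four cases from the example before wiring it into a Visit callback.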

At this point, the overall writing process is complete. After reading the sample code below and reading a few more official rule codes, you will soon be able to write your own rules.

Here is the full implementation of the above rule:

#include "oclint/AbstractASTVisitorRule.h"
#include "oclint/RuleSet.h"

using namespace std;
using namespace clang;
using namespace oclint;

class KirinzerTestRule : public AbstractASTVisitorRule<KirinzerTestRule>
{
public:
    virtual const string name() const override
    {
        return "if else format";
    }

    virtual int priority() const override
    {
        return 2;
    }

    virtual const string category() const override
    {
        return "controversial";
    }

#ifdef DOCGEN
    virtual const std::string since() const override
    {
        return "20.11";
    }

    virtual const std::string description() const override
    {
        return "Used to check if the parentheses in the if else condition branch conform to the coding specification.";
    }

    virtual const std::string example() const override
    {
        return R"rst(
.. code-block:: cpp

    void example()
    {
        int a = 1;
        if(a > 0) { // no space or newline to the left of "("
            a = 10;
        }

        if (a > 0){ // no space or newline to the right of ")"
            a = 10;
        }

        if (a > 0)
        {
            a = 10;
        }else { // no space or newline to the right of "}"
            a = -1;
        }

        if (a > 0)
        {
            a = 10;
        } else{ // no space or newline to the left of "{"
            a = -1;
        }
    }
        )rst";
    }

#endif

    bool VisitIfStmt(IfStmt *node)
    {
        clang::SourceManager *sourceManager = &_carrier->getSourceManager();
        SourceLocation begin = node->getIfLoc();
        SourceLocation elseLoc = node->getElseLoc();
        SourceLocation end = node->getEndLoc();
        // Calculate the length of the source code for this node
        int length = sourceManager->getFileOffset(end) - sourceManager->getFileOffset(begin) + 1;
        // Read `length` characters of source text, starting at the `if` keyword
        string sourceCode = StringRef(sourceManager->getCharacterData(begin), length).str();
        // printf("%s\n", sourceCode.c_str());

        // Check the opening parenthesis of the if
        std::size_t found = sourceCode.find("if (");
        if (found == std::string::npos) {
            // printf("if ( format is incorrect\n");
            AppendToViolationSet(node, Description());
        }

        // Check the closing parenthesis of the if
        found = sourceCode.find(") {");
        if (found == std::string::npos) {
            found = sourceCode.find(")\n");
            if (found == std::string::npos) {
                // printf("if closing parenthesis format is incorrect\n");
                AppendToViolationSet(node, Description());
            }
        }

        // No more checking if there is no else branch
        if (!elseLoc.isValid()) {
            return true;
        }

        // Check the brace to the left of else
        found = sourceCode.find("} else");
        if (found == std::string::npos) {
            found = sourceCode.find("}\n");
            if (found == std::string::npos) {
                // printf("} else format is incorrect\n");
                AppendToViolationSet(node, Description());
            }
        }

        // Check the brace to the right of else
        found = sourceCode.find("else {");
        if (found == std::string::npos) {
            found = sourceCode.find("else\n");
            if (found == std::string::npos) {
                // printf("else { format is incorrect\n");
                AppendToViolationSet(node, Description());
            }
        }

        return true;
    }

    // Append the violation information to the result set
    void AppendToViolationSet(IfStmt *node, string description) {
        addViolation(node, this, description);
    }

    string Description() {
        return "Format is not correct";
    }
};

static RuleSet rules(new KirinzerTestRule());


Debugging rules

From what we learned earlier, we know that a rule ultimately takes the form of a dylib file. Writing C++ without being able to debug it would be a nightmare. So how do we debug OCLint rules?

First you need an Xcode project.

The OCLint project uses CMakeLists files to manage its dependencies, and CMake can generate an Xcode project (xcodeproj) from them. You can generate an Xcode project for each folder; here we generate one for oclint-rules.

# Create a folder named oclint-xcoderules in the OCLint source directory
mkdir oclint-xcoderules
cd oclint-xcoderules
cmake -G Xcode \
    -D CMAKE_CXX_COMPILER=../build/llvm-install/bin/clang++ \
    -D CMAKE_C_COMPILER=../build/llvm-install/bin/clang \
    -D OCLINT_BUILD_DIR=../build/oclint-core \
    -D OCLINT_SOURCE_DIR=../oclint-core \
    -D OCLINT_METRICS_SOURCE_DIR=../oclint-metrics \
    -D OCLINT_METRICS_BUILD_DIR=../build/oclint-metrics \
    -D LLVM_ROOT=../build/llvm-install/ \
    ../oclint-rules

Once the Xcode project has been created, add the launch arguments to the chosen scheme. Then, under the scheme’s Info tab, set Executable to the oclint executable compiled earlier.

Tip: The oclint executable produced by the build is stored in the build/oclint-release/bin directory under the source root. With the latest version, OCLint 20.11, the generated file is named oclint-20.11, and Finder treats it as a document because it takes .11 to be a file extension. This does not affect calling it directly from the terminal, but for debugging we need to pick the executable through Finder inside Xcode, and the misidentified type makes it impossible to select. In that case, simply delete the decimal point and rename the executable to oclint-2011 so that nothing looks like an extension. (When renaming, right-click the file, choose Get Info, edit the name and extension there, and check whether the extension is hidden.)

The startup parameters are as follows: (the first parameter is the rule load path, and the second parameter is the test rule file)

-R=/Users/developer/TempData/oclint/oclint-xcoderules/rules.dl/Debug /Users/developer/TempData/oclint/oclint-xcoderules/test2.m -- -x objective-c -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk

With that in place, run the scheme: the results and debugging output of your rule appear in the console.

Use rules

Once the rules written in Xcode have been compiled, the corresponding dylib file can be found in Xcode’s Products Group.

By default, rules are loaded from the $(/path/to/bin/oclint)/../lib/oclint/rules directory, which we call the "rule search path" or "rule load path". A rule search path contains a set of dynamic libraries with the extension so, dylib, or dll on Linux, macOS, and Windows respectively.
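As a rough illustration of that layout, the sketch below computes a default rule directory from the location of the executable and filters files by extension. It is not OCLint's loader: `defaultRulePath` and `isRuleLibrary` are hypothetical helpers, and reading the path expression as "the bin directory's sibling lib/oclint/rules" is my assumption. It needs C++17 for std::filesystem.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Assumed reading of $(/path/to/bin/oclint)/../lib/oclint/rules:
// start from the directory that holds the executable, go up one level,
// then descend into lib/oclint/rules.
fs::path defaultRulePath(const fs::path &oclintExecutable) {
    const fs::path binDir = oclintExecutable.parent_path();
    return (binDir / ".." / "lib" / "oclint" / "rules").lexically_normal();
}

// A file counts as a rule library if it carries one of the three
// dynamic-library extensions named in the text.
bool isRuleLibrary(const fs::path &file) {
    const std::string extension = file.extension().string();
    return extension == ".so" || extension == ".dylib" || extension == ".dll";
}
```

With an executable at /usr/local/bin/oclint, `defaultRulePath` resolves to /usr/local/lib/oclint/rules on a POSIX system. The extension check rejects a file such as oclint-20.11, whose apparent extension is .11.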

New rules can be used immediately by dragging and dropping them into the rule load path. Therefore, we just need to put the dylib generated by our custom rules into the default rule load directory. Of course, the rules directory here can also be configured. A project can use multiple rule search paths, and different rule load paths can be specified for different projects.

For more detailed configuration, refer to the official documentation: Select the OCLint Inspection Rules.

Conclusion

Static code inspection tools can efficiently catch potential problems in code. During continuous business delivery, they help developers pay more attention to code conventions, prevent code from deteriorating, and reduce mistakes caused by carelessness. Hopefully, the static checking tools mentioned in this article, along with the instructions for writing custom rules, will help you write higher quality, more elegant, and prettier code.

References

Clang Tutorial
Clang Users Manual
OCLint Docs v20.11