I. Background

Static application security testing (SAST) tools, which analyze source code for security defects, have become an important part of software security and are widely used across many fields. With the growing adoption of open source SAST tools and the increasing number of tool types, users find it hard to judge the strengths and weaknesses of each tool and to match it to their enterprise's application scenarios. Starting from Java, the language most commonly analyzed by financial and Internet companies, this article gives a brief evaluation of static analysis tools and provides a basis for choosing among them. In addition, it analyzes the technical problems that still exist in static code analysis tools and proposes basic criteria for tool evaluation.

II. Overview

Generally speaking, false positives and false negatives are the most important technical evaluation indicators for SAST, but there is no universal test set that can fully reflect the detection accuracy and analysis sensitivity of static analysis tools. Therefore, to simplify testing, we chose the OWASP Benchmark, a relatively authoritative, internationally used Java test set, to reflect how strong each code analysis tool is at detecting Java security issues.

The OWASP Benchmark is a sample application that contains thousands of vulnerabilities across 11 categories. The benchmark includes code snippets that are difficult to handle with static analysis, such as indirect calls, unreachable branches, collections (maps), and values that depend on configuration files.

The test cases can be downloaded from github.com/OWASP/Bench… and can, to a certain extent, reflect the detection capability of code analysis tools.

1. Test basis

The following table shows the OWASP Benchmark results of one tool. The left column lists the vulnerability categories; P/N is the number of positive/negative samples (bad cases/good cases); TP/FP is the number of true/false positives (bad cases correctly reported/good cases wrongly reported); TN/FN is the number of true/false negatives (good cases correctly passed/bad cases missed); TPR and FPR are the true positive rate (recall) and the false positive rate; Y is Youden's Index, which for this benchmark ranges from 0 to 1, and the larger the Youden's Index, the more faithfully the results reflect the true situation. The formulas for TPR, FPR and Y are as follows:

TPR = TP / P

FPR = FP / N

Y = [TP / (TP + FN)] + [TN / (FP + TN)] - 1 = TPR - FPR
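
As a quick check of these formulas, the short sketch below recomputes TPR, FPR and Y for the Command Injection row of the table that follows (P = 126, N = 125, TP = 126, FP = 45); the class and variable names are illustrative only.

public class YoudenExample {
    public static void main(String[] args) {
        // Command Injection (cmdi) row: 126 bad cases, 125 good cases
        double p = 126, n = 125;   // positive (bad) and negative (good) samples
        double tp = 126, fp = 45;  // true positives and false positives
        double fn = p - tp;        // bad cases missed
        double tn = n - fp;        // good cases correctly not reported

        double tpr = tp / p;                    // 1.0
        double fpr = fp / n;                    // 0.36
        double y = tpr + (tn / (fp + tn)) - 1;  // Youden's Index, equal to TPR - FPR

        System.out.printf("TPR=%.2f FPR=%.2f Y=%.2f%n", tpr, fpr, y);  // TPR=1.00 FPR=0.36 Y=0.64
    }
}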

| Category | P | N | TP | FP | TN | FN | TPR | FPR | Y |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Command Injection (cmdi) | 126 | 125 | 126 | 45 | 80 | 0 | 1.0 | 0.36 | 0.64 |
| Weak Cryptography (crypto) | 130 | 116 | 130 | 0 | 116 | 0 | 1.0 | 0.0 | 1.0 |
| Weak Hashing (hash) | 129 | 107 | 129 | 0 | 107 | 0 | 1.0 | 0.0 | 1.0 |
| LDAP Injection (ldapi) | 27 | 32 | 27 | 13 | 19 | 0 | 1.0 | 0.41 | 0.59 |
| Path Traversal (pathtraver) | 133 | 135 | 133 | 36 | 99 | 0 | 1.0 | 0.27 | 0.73 |
| Secure Cookie Flag (securecookie) | 36 | 31 | 36 | 0 | 31 | 0 | 1.0 | 0.0 | 1.0 |
| SQL Injection (sqli) | 272 | 232 | 272 | 87 | 145 | 0 | 1.0 | 0.375 | 0.63 |
| Trust Boundary Violation (trustbound) | 83 | 43 | 83 | 24 | 19 | 0 | 1.0 | 0.56 | 0.44 |
| Weak Randomness (weakrand) | 218 | 275 | 218 | 0 | 275 | 0 | 1.0 | 0.0 | 1.0 |
| XPath Injection (xpathi) | 15 | 20 | 15 | 7 | 13 | 0 | 1.0 | 0.35 | 0.65 |
| Cross Site Scripting (xss) | 246 | 209 | 246 | 48 | 161 | 0 | 1.0 | 0.23 | 0.77 |
| All (total / average) | 1415 | 1325 | 1415 | 290 | 1035 | 0 | 1.0 | 0.23 | 0.77 |

2. Test results

Mainstream SAST tools on the market were selected for testing. The tools cover a wide range, including foreign commercial tools, open source tools and domestic self-developed tools. For example, SonarQube[1], an automatic code review tool, is the most widely used open source option; being open source and free, it is especially popular with financial enterprises. Checkmarx CxSAST[2] from Israel and Micro Focus Fortify[3] are heavily promoted in electric power, finance and other industries, and are among the most widely used Java tools in the Chinese market. Coverity[4] and IBM AppScan[5] are widely used in the Internet sector. Parasoft Jtest[6] and Veracode[7] are widely used in the military industry. The following table shows OWASP Benchmark results for 10 static analysis tools. The commercial SAST tools anonymized as 01-06 include Checkmarx CxSAST, Micro Focus Fortify, IBM AppScan Source, Coverity Code Advisor, Parasoft Jtest, SourceMeter[9] and Veracode.

 

| Analysis tool | OWASP Benchmark version | TPR | FPR | Y |
| --- | --- | --- | --- | --- |
| FindBugs with FindSecBugs plugin[10] | 1.2 | 0.97 | 0.58 | 0.39 |
| SonarQube | 1.2 | 0.5 | 0.17 | 0.33 |
| Hung-chien SAST | 1.2 | 1.0 | 0.12 | 0.88 |
| SAST-04 | 1.1 | 0.61 | 0.29 | 0.33 |
| SAST-06 | 1.1 | 0.85 | 0.52 | 0.33 |
| SAST-02 | 1.1 | 0.56 | 0.26 | 0.31 |
| SAST-03 | 1.1 | 0.46 | 0.214 | 0.25 |
| SAST-05 | 1.1 | 0.48 | 0.29 | 0.19 |
| SAST-01 | 1.1 | 0.29 | 0.12 | 0.17 |
| PMD | 1.2 | 0.00 | 0.00 | 0.00 |

As the table above shows, the best coverage achieved is 1.0 (100% coverage, no misses). Among tools with non-zero coverage, the lowest false positive rate is 0.12 and the highest is 0.58.

III. Technical analysis

The basic flow of static code analysis is as follows: first, the source code is parsed into an abstract syntax tree (AST); then control flow analysis, polymorphism analysis, points-to analysis, call graph analysis and other algorithms build the basic analysis model; on top of that, path-sensitive program models are constructed using symbolic execution, abstract interpretation and graph reachability; finally, for each defect pattern, first-order logic constraints are generated over the program model and solved with an SMT solver to produce the final results. At present many tools, especially open source ones, only perform the basic analysis; many do not support full path-sensitive analysis, or support it poorly, which can lead to large numbers of false positives and false negatives. The following are several typical problems.
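
To make the data-flow side of this pipeline concrete, here is a minimal, illustrative sketch (not how any particular tool is implemented) of forward taint propagation over straight-line statements: a value coming from user input is marked tainted, assignments copy the mark, and a finding is raised when a tainted value reaches a sink. All class and method names are hypothetical.

import java.util.HashSet;
import java.util.Set;

// Minimal forward taint propagation over a sequence of "statements".
// Real tools work on an AST/CFG with path sensitivity; this only shows the core idea.
public class TaintSketch {
    // variables currently considered tainted
    private final Set<String> tainted = new HashSet<>();

    void source(String var) { tainted.add(var); }                  // e.g. request.getParameter(...)
    void assign(String to, String from) {                          // to = from
        if (tainted.contains(from)) tainted.add(to); else tainted.remove(to);
    }
    void sink(String var, String sinkName) {                       // e.g. Runtime.exec(var)
        if (tainted.contains(var))
            System.out.println("Potential injection: tainted '" + var + "' reaches " + sinkName);
    }

    public static void main(String[] args) {
        TaintSketch analysis = new TaintSketch();
        analysis.source("param");               // param = request.getParameter("x")
        analysis.assign("bar", "param");        // bar = param
        analysis.sink("bar", "Runtime.exec");   // reported: tainted value reaches command execution
    }
}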

Based on the points above, we analyze the causes of false positives to understand the limitations of the tools. Through analysis, false positives essentially arise in the following situations.

1. Whole-collection tainting problem

When only some elements of a collection contain tainted data, many tools mark the whole collection as tainted, so retrieving an untainted element still produces a false positive. For example, in the following code the map is filled with a tainted value (param) and untainted values ("a_Value", "another_Value"); bar is first assigned the tainted value and then overwritten with the safe value retrieved under "keyA-831", so the method returns untainted data. A tool that assumes the entire collection is tainted once any part of it is will still report the returned value, producing a false positive.

private class Test {
    public String doSomething(String param) throws ServletException, IOException {
        String bar = "safe!";
        java.util.HashMap map831 = new java.util.HashMap();
        map831.put("keyA-831", "a_Value");      // put some stuff in the collection
        map831.put("keyB-831", param);          // put it in a collection
        map831.put("keyC", "another_Value");    // put some stuff in the collection
        bar = (String) map831.get("keyB-831");  // get it back out
        bar = (String) map831.get("keyA-831");  // get safe value back out
        return bar;
    }
} // end inner class Test
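
One way to see why this case is hard is to compare a coarse collection model with a key-sensitive one. The sketch below is illustrative only and not how any specific tool is implemented: it tracks taint per map key instead of per map, so the lookup under "keyA-831" is correctly treated as clean. The class name KeySensitiveTaintMap is hypothetical.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical key-sensitive taint model for a map.
// A coarse model marks the whole map tainted as soon as any value is tainted;
// tracking taint per key avoids the false positive in the benchmark case above.
public class KeySensitiveTaintMap {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> taintedKeys = new HashSet<>();

    public void put(String key, String value, boolean tainted) {
        values.put(key, value);
        if (tainted) taintedKeys.add(key); else taintedKeys.remove(key);
    }

    public String get(String key) { return values.get(key); }

    public boolean isTainted(String key) { return taintedKeys.contains(key); }

    public static void main(String[] args) {
        KeySensitiveTaintMap map = new KeySensitiveTaintMap();
        map.put("keyA-831", "a_Value", false);
        map.put("keyB-831", "userInput", true);        // the tainted entry
        System.out.println(map.isTainted("keyB-831")); // true  -> reporting this lookup is correct
        System.out.println(map.isTainted("keyA-831")); // false -> no false positive for the safe key
    }
}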

2. Over-tainting during propagation

Here, some elements of a collection are tainted, and after operations are performed on the collection the tool can no longer track which element is actually tainted, so clean data is treated as affected and a false positive is produced. For example, in the following code an untainted value ("safe"), a tainted value (param) and another untainted value ("moresafe") are added to the list. After the first element "safe" is removed, the element at index 1 is retrieved, which is now the untainted value "moresafe"; a tool that does not model the index shift caused by remove() still reports the result as tainted, leading to a false positive.

public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html");

    String param = "";
    boolean flag = true;
    java.util.Enumeration names = request.getParameterNames();
    while (names.hasMoreElements() && flag) {
        String name = (String) names.nextElement();
        String[] values = request.getParameterValues(name);
        if (values != null) {
            for (int i = 0; i < values.length && flag; i++) {
                String value = values[i];
                if (value.equals("vector")) {
                    param = name;   // param is tainted: it comes from a request parameter name
                    flag = false;
                }
            }
        }
    }

    String bar = "alsosafe";
    if (param != null) {
        java.util.List valuesList = new java.util.ArrayList();
        valuesList.add("safe");
        valuesList.add(param);
        valuesList.add("moresafe");
        valuesList.remove(0);        // remove the 1st safe value
        bar = valuesList.get(1);     // get the last 'safe' value ("moresafe")
    }

    String cmd = org.owasp.benchmark.helpers.Utils.getInsecureOSCommandString(this.getClass().getClassLoader());
    String[] args = {cmd};
    String[] argsEnv = {bar};
    Runtime r = Runtime.getRuntime();
    try {
        Process p = r.exec(args, argsEnv, new java.io.File(System.getProperty("user.dir")));
        org.owasp.benchmark.helpers.Utils.printOSCommandResults(p, response);
    } catch (IOException e) {
        System.out.println("Problem executing cmdi - TestCase");
        throw new ServletException(e);
    }
}
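
A quick way to convince yourself that the value reaching the sink is clean is to run the list part of this case in isolation. The snippet below is a standalone illustrative check, not part of the benchmark; the placeholder string stands in for the tainted parameter name.

import java.util.ArrayList;
import java.util.List;

public class ListShiftCheck {
    public static void main(String[] args) {
        String param = "tainted-from-request";  // stands in for the tainted parameter name
        List<String> valuesList = new ArrayList<>();
        valuesList.add("safe");
        valuesList.add(param);
        valuesList.add("moresafe");

        valuesList.remove(0);                   // list is now [param, "moresafe"]
        System.out.println(valuesList.get(1));  // prints "moresafe": the value used later is clean
    }
}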

3. Unreachable branch problem

In conditional branches, some branches may never execute because of how the condition is written, and a tool that cannot determine which branches are unreachable may produce false positives. For example, in the following code, because num is the constant 106, (7*18) + num is always greater than 200, so bar is always the constant string "This_should_always_happen". The other branch, param, which contains tainted data, will never be taken, yet the tool still reports a finding, producing a false positive.

private class Test {
    public String doSomething(String param) throws ServletException, IOException {
        // Simple ?: condition that assigns a constant to bar on the true branch
        String bar;
        int num = 106;
        bar = (7 * 18) + num > 200 ? "This_should_always_happen" : param;
        return bar;
    }
}
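
A tool can prune this branch with simple constant folding: once num is known to be the constant 106, the condition evaluates to a compile-time truth value and the param branch is unreachable. The sketch below is illustrative only and shows the folding step a path-sensitive analysis would perform.

public class ConstantFoldingSketch {
    public static void main(String[] args) {
        // The analysis knows num is the constant 106, so the condition can be folded.
        int num = 106;
        boolean conditionAlwaysTrue = (7 * 18) + num > 200;  // 126 + 106 = 232 > 200, always true

        if (conditionAlwaysTrue) {
            System.out.println("Only the constant branch is reachable; "
                    + "the tainted 'param' branch can be pruned, so no finding should be reported.");
        }
    }
}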

IV. Conclusion

Comprehensive, efficient identification of vulnerabilities and reduction of false positives are crucial in static analysis, and there is still a lot of room for progress in this field. On the OWASP Benchmark, the best tool achieved 100% coverage with a 12% false positive rate, indicating that the tools are maturing, but these results apply only to the OWASP test cases. Going forward, the static data-flow tracking in these tools needs continuous optimization to improve detection accuracy.

In order to better evaluate code analysis tools, the author proposes several evaluation dimensions and corresponding grades:

| Evaluation dimension | Description | Level 1 | Level 2 | Level 3 |
| --- | --- | --- | --- | --- |
| Youden index | Recall rate minus false positive rate | 0.9-1.0 | 0.7-0.9 | Below 0.7 |
| Throughput | Maximum code size that can be analyzed | 10 million+ lines | 1 million+ lines | 100,000+ lines |
| Detection efficiency | Lines of code checked per hour | 1 million+ | 500,000+ | 100,000+ |
| Code snippet detection | Can analyze code that fails to compile | Supported for all languages | C/C++ and Java only | Not supported |
| Concurrent detection | Supports multiple concurrent scans on a single CPU | Maximum allowed by hardware | Single process | Single process |
| Cross-language and framework analysis | Supports the latest frameworks and cross-language calls | 70+ frameworks supported | Fewer than 10 supported | Not supported |
| Supported languages | Languages that can be analyzed | 20+ | 10+ | 3+ |
| DevOps integration | Integration with mainstream DevOps tools | Continuous integration, version management and testing tools | Version management only | Not supported |
| CWE pattern coverage | Number of defect patterns covered | 200+ | 100+ | 50+ |
