Author: Jin-Chi Pan, Director of Coding Products


Head of the Tencent Cloud R&D Platform, with ten years of experience in R&D and in building R&D effectiveness


Product lead for Coding code scanning

A young man was smoking at the door of an office building. A passer-by walking past said to him, “Do you know this is a health hazard? Didn’t you notice the Warning on the cigarette pack?” The young man said, “It’s OK, I’m a programmer.” The passer-by said, “So what?” The programmer replied, “We never care about Warnings, only Errors.”

That joke aside, this article is a primer for people who rarely use code scanning tools or know little about them. On the one hand, code scanning has a real technical barrier: it involves lexical and syntax analysis, compile-time injection, pattern recognition, security, and other related fields, so the subject can be hard to grasp. On the other hand, the public holds many misconceptions about code scanning products and the field, which badly hurt the experience of using them; some people even treat code scanning as nothing more than Lint/Style checking, which is ironic.

Since our code scanning product opened for trial, it has provided scanning services to more than 5,000 teams, helping development teams promptly find a large number of potential code defects, security vulnerabilities, and nonstandard code. Using some common scenarios as examples, this article aims to explain simply what code scanning is worth and how to use it, so that readers can understand it deeply and get started quickly, and so that code scanning products can deliver their full value as enterprises build DevSecOps.

What is the value of code scanning

Setting aside clichéd notions such as “shift quality left” or “build quality in”, in practice code scanning is often the second step in a team’s transition to DevOps (the first being continuous integration and pipelines). First, a pipeline that only compiles, packages, and deploys is still a bit thin. Second, compared with unit tests, API test automation, and end-to-end (E2E) test automation, code scanning is the cheapest to adopt. With Jenkins, for example, all the operations team needs to do is install the SonarQube plugin on the Jenkins cluster and add one command to the Jenkinsfile; code scanning is then in place without any developer involvement.

In addition, developers with some awareness of code quality culture also install plugins in their local IDE for local checks: syntax or style problems are flagged directly in the IDE and can even be fixed automatically.

What is easiest to get is often easiest to ignore. Because IDE auto-inspect/format runs by itself and pipeline scans run silently, developers easily lose sight of code scanning and its value: “I ran the style check, so the code inspection task is done”; “the release cycle is too tight, I’ll look at the scan results later”; “the problems the scanner finds aren’t serious”; “a code scanning tool or pipeline step isn’t necessary.” In fact, software and Internet companies spend millions a year on scanning software licenses (SonarQube, Coverity, Checkmarx, etc.), and these vendors are leaders in a billion-dollar industry (Sonar raised $45 million in 2016; Coverity was acquired in 2014 for $375 million). Why is there such a gap between this huge market value and such a low profile? To answer that, let’s first look at what code scanning can help us find.

0. Programming syntax problems

I list it as item 0 because I don’t think this problem even belongs within the scope of code scanning. Many IDEs and plugins already integrate syntax checking that flags, and sometimes automatically fixes, syntax problems during development. That does address some code quality issues, but it is the job of the parser, not of code scanning.

1. Coding standard issues

Many readers may now be thinking “I know this one already”: this is by far the most common impression code scanning leaves. It checks for comments, whether indentation uses spaces or tabs, whether an opening curly brace goes on a new line or at the end of the previous one, and so on. Criteria like these can spark controversy within a team, and since they don’t stop the functional logic from running correctly (they are Warnings, not Errors), this is where many attempts at code scanning end.

However, do coding standards really matter?

Without a warning here, would you notice that this function, written in a dynamic language, returns inconsistent values, and that this may cause problems later?
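
As a hypothetical illustration (our own example, not the screenshot the original article referred to): in Python, a function that returns a number on one path and simply falls off the end on another silently returns None, and the caller only fails later, somewhere else. Pylint reports this pattern as `inconsistent-return-statements`.

```python
from typing import Callable


def discounted_buggy(price: float) -> float:
    # Declared to return float, but when price <= 100 execution falls off the
    # end and the function implicitly returns None.
    if price > 100:
        return price * 0.9


def discounted_fixed(price: float) -> float:
    # Every path returns a float, so callers can rely on the return type.
    if price > 100:
        return price * 0.9
    return price


def order_total(prices: list[float], discount: Callable[[float], float]) -> float:
    # Sums discounted prices; a None from `discount` makes sum() raise
    # TypeError here, far away from the function that caused it.
    return sum(discount(p) for p in prices)


if __name__ == "__main__":
    print(order_total([150.0, 200.0], discounted_fixed))
    try:
        order_total([150.0, 50.0], discounted_buggy)
    except TypeError as exc:
        print("TypeError:", exc)
```

A test suite that only uses prices above 100 would never notice the problem; a linter sees it without running anything.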

Without an explicit warning here, would you be aware of the problems that passing a mutable object here might introduce?
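
Assuming the warning in question is the common one about mutable default arguments (our reading; the original screenshot is not available), here is the classic Python case. The default list is created once, when the function is defined, so every call that relies on the default shares and mutates the same object. Pylint reports this as `dangerous-default-value`; the function names are hypothetical.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is evaluated once at definition time and then shared
    # by every call that omits `tags`.
    tags.append(tag)
    return tags


def add_tag_fixed(tag, tags=None):
    # Use None as a sentinel and create a fresh list on each call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


if __name__ == "__main__":
    print(add_tag_buggy("a"))  # ['a']
    print(add_tag_buggy("b"))  # ['a', 'b']: state leaked from the first call
    print(add_tag_fixed("a"))  # ['a']
    print(add_tag_fixed("b"))  # ['b']
```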

If for, if, and try blocks are allowed to keep nesting deeper and deeper, how readable will this code be later?
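
To show roughly how a scanner measures this, here is a minimal sketch using Python’s standard `ast` module: it walks each function’s syntax tree and reports functions whose block nesting exceeds a limit. The set of node types and the default limit of 3 are our own assumptions; real linters (for example pylint’s `too-many-nested-blocks`) apply more refined rules.

```python
import ast

# Statement types that open a new nesting level (an illustrative choice).
# Note: an `elif` is an `If` inside its parent's `orelse`, so this naive
# count treats long elif chains as extra depth.
NESTING_NODES = (ast.For, ast.While, ast.If, ast.Try, ast.With)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest nesting of NESTING_NODES found below `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def check_functions(source: str, limit: int = 3) -> list[tuple[str, int]]:
    """Report (function name, depth) for functions nested deeper than `limit`."""
    report = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            depth = max_nesting(node)
            if depth > limit:
                report.append((node.name, depth))
    return report


SAMPLE = '''
def load(rows):
    for row in rows:
        if row:
            try:
                for cell in row:
                    if cell is None:
                        continue
            except ValueError:
                pass
'''

if __name__ == "__main__":
    print(check_functions(SAMPLE))
```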

When the model’s fields change later, will you remember that the same duplicated code has to be updated in several places?

Scanning for coding standard issues is the most effective way to keep bad code from “poisoning” the codebase. And in collaborative projects, a uniform coding standard is necessary to avoid the classic scenario: “When I wrote this code, only God and I knew what it meant; a month later, only God knows.”

2. Functional defects

Many people count on luck: “Maybe this is the only version of the code I’ll ever have to maintain, so as long as it runs, it’s fine.” Then let code scanning help confirm: does your code really run?

Are you sure your tests can catch all of these null pointer problems?
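
Python has no null pointers, but `None` plays the same role. In this hypothetical sketch, `dict.get` returns None for a missing key; a type checker such as mypy reports the unguarded call on an `Optional[str]` statically, whereas a test only catches it if someone thought to test an unknown user. The data and function names are illustrative.

```python
from typing import Optional

USERS = {"alice": "alice@example.com"}  # hypothetical sample data


def email_domain_buggy(user: str) -> str:
    email: Optional[str] = USERS.get(user)
    # For an unknown user `email` is None and this line raises AttributeError;
    # a type checker flags it without needing a failing test.
    return email.split("@")[1]


def email_domain_fixed(user: str) -> Optional[str]:
    email = USERS.get(user)
    if email is None:  # handle the "null" case explicitly
        return None
    return email.split("@")[1]
```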

How hard is it to find out-of-bounds array access through manual code review alone?
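
A small hypothetical example of why reviewers miss these: the off-by-one below looks plausible, and a test that only uses an empty list passes. Python raises IndexError rather than reading past the buffer as C would, but the bug still surfaces only on the inputs that reach it.

```python
def pairwise_diffs_buggy(values: list[int]) -> list[int]:
    # Off-by-one: when i == len(values) - 1, values[i + 1] is out of bounds.
    return [values[i + 1] - values[i] for i in range(len(values))]


def pairwise_diffs_fixed(values: list[int]) -> list[int]:
    # zip stops at the shorter sequence, so no index can go out of range.
    return [b - a for a, b in zip(values, values[1:])]
```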

Not to mention memory leaks: without a tool to help locate them, tracking down memory problems takes real effort.

From this perspective, code scanning plays a role similar to testing: it is an effective way to make sure an application works correctly, and often a more efficient way to uncover deeper technical problems.

3. Security defects

Some readers may also think, “My feature is simple; it passed testing after a few clicks, so there’s nothing wrong with it.” But an application must not only satisfy its users functionally; it must also stand up to the cybercrime underground. Given cases like Marriott, which was heavily fined after leaking user data, how can we be sure we won’t be the next target?

[Image source: InfoQ Manka]

SQL injection is a major entry point for dumping a database, and this type of problem is easy to detect with code scanning tools.
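
A minimal sketch with Python’s built-in `sqlite3` module and an in-memory database (the table and data are hypothetical) shows the pattern a SAST tool flags: untrusted input formatted into the SQL text, next to the parameterized form that treats the input purely as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with two hypothetical users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "s1"), ("bob", "s2")])
    return conn


def find_user_vulnerable(conn: sqlite3.Connection, name: str):
    # String formatting splices untrusted input into the SQL statement:
    # this is the pattern scanners report as SQL injection.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized query: the driver binds `name` as a value, never as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(find_user_vulnerable(conn, payload))  # every row leaks
    print(find_user_safe(conn, payload))        # []
```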

Remote command execution is another common way to attack a target machine, and vulnerabilities of this kind have been disclosed in many popular open source components. Are you sure your security awareness is better than Apache’s?
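
A hedged Python sketch of the command-injection pattern (the ping helpers are hypothetical and nothing is executed here; Unix-style ping flags are assumed). A single command string handed to a shell lets input like `example.com; rm -rf ~` smuggle in a second command. Passing an argument list to `subprocess.run` without `shell=True` keeps the input as one argument, and `shlex.quote` is the fallback when a shell truly cannot be avoided.

```python
import shlex


def ping_command_vulnerable(host: str) -> str:
    # One shell string: if run via subprocess.run(cmd, shell=True), a ";" in
    # `host` would start a second command.
    return f"ping -c 1 {host}"


def ping_command_safe(host: str) -> list[str]:
    # An argv list for subprocess.run(args) (no shell): the whole host string
    # stays a single argument and is never interpreted by a shell.
    return ["ping", "-c", "1", host]


def ping_command_quoted(host: str) -> str:
    # If a shell string is unavoidable, quote the untrusted part.
    return f"ping -c 1 {shlex.quote(host)}"
```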

Then there are CSRF, XSS, XXE, insecure deserialization, and many other attack techniques. If every front-line programmer had to understand all of them in detail and avoid them by hand, the cost of control would skyrocket. Code scanning quickly identifies and locates these risks, protecting digital assets at the lowest cost. Static code analysis (SAST, static application security testing) is also one of the most basic, lowest-barrier detection methods in DevSecOps.

4. Public relations risk

“You’re kidding. How did PR problems get into a discussion of code quality?” Don’t laugh yet; take a look at this news story: “Vivo’s pop-up camera: rogue software detector or IQ tester?”

[Image source: product play]

Put simply, when an Android app queried camera parameters, it could call a function that made the pop-up camera rise. Onlookers don’t dig into technical implementation details; to them, a camera popping up means the app is secretly photographing the user. At the time this became a genuine PR crisis that stirred up a small storm and affected every party involved. Tencent has also organized an internal sensitive-API scanning program that uses code scanning tools to find calls to sensitive interfaces in projects and reminds developers to self-check and confirm them, preventing greater risks.
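
A toy sketch of what a sensitive-API scan does, written in Python over Java source text. The watch list is our own illustrative assumption (the three names are real Android methods, but this is not Tencent’s rule set), and the purely textual regex match would also hit comments and strings; real tools work on syntax trees and data flow.

```python
import re
from dataclasses import dataclass

# Hypothetical watch list: which APIs count as "sensitive" is a policy choice.
SENSITIVE_APIS = {
    "getCameraCharacteristics": "camera access",
    "getDeviceId": "device identifier",
    "getLastKnownLocation": "location",
}
CALL_RE = re.compile(r"\b(" + "|".join(map(re.escape, SENSITIVE_APIS)) + r")\s*\(")


@dataclass
class Finding:
    path: str
    line: int
    api: str
    category: str


def scan_source(path: str, text: str) -> list[Finding]:
    """Report each line that appears to call a watched API (textual match only)."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            api = match.group(1)
            findings.append(Finding(path, lineno, api, SENSITIVE_APIS[api]))
    return findings


SAMPLE = """\
CameraManager manager = (CameraManager) ctx.getSystemService(Context.CAMERA_SERVICE);
for (String id : manager.getCameraIdList()) {
    CameraCharacteristics c = manager.getCameraCharacteristics(id);
}
"""

if __name__ == "__main__":
    for f in scan_source("Sample.java", SAMPLE):
        print(f"{f.path}:{f.line}: {f.api} ({f.category}) - please confirm this call")
```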

How should code scanning be used

By now you may be starting to see the value code scanning can bring to a team: it safeguards code quality and security without being intrusive. So you download tools such as SonarQube, SpotBugs, and Checkstyle, configure them, and run them locally or in a Jenkins pipeline. But if code scanning is mostly a local, offline tool, why does CODING offer code scanning as an online platform?

Local scans, with rules synchronized with the remote side

Even when scanning locally, we don’t want the local rules to differ so much from the remote rules that code passes the local scan and is then rejected after it is pushed. The most natural solution is IaC (infrastructure as code): store the scan scheme and filter conditions in a local configuration file.

However, not every configuration item can be managed locally; filter conditions, comparison branches, and other items closely tied to the usage scenario are examples. There are two ways to handle such needs:

  1. Users complete the unified configuration (tool rules, filter conditions, comparison branches, etc.) on the platform, which generates a configuration ID once the configuration is saved. Local scans then use that remote configuration ID instead of a local configuration file, for example:
     codedog_client localscan --config 001
  2. Localize the platform configuration: the scanning platform defines a complete rule-file format. Local scans follow this file, and the platform can also parse it to render a visual view, so a single IaC configuration serves both sides.
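
A minimal sketch of the second approach, assuming a JSON scan scheme committed to the repository. The keys below are hypothetical and are not CODING’s actual file format; the rule IDs are real pylint and bandit codes used only as examples. Because the local scanner and the platform read the same file, both apply the same rules and filters.

```python
import fnmatch
import json
from dataclasses import dataclass, field

# Hypothetical in-repo scan scheme (illustrative keys, not a real format).
EXAMPLE_CONFIG = """
{
  "tools": {"pylint": ["W0102", "R1710"], "bandit": ["B608"]},
  "exclude_paths": ["tests/*", "vendor/*"],
  "compare_branch": "master"
}
"""


@dataclass
class ScanScheme:
    tools: dict[str, list[str]]
    exclude_paths: list[str] = field(default_factory=list)
    compare_branch: str = "master"

    @classmethod
    def from_json(cls, text: str) -> "ScanScheme":
        data = json.loads(text)
        return cls(
            tools=data["tools"],
            exclude_paths=data.get("exclude_paths", []),
            compare_branch=data.get("compare_branch", "master"),
        )

    def should_scan(self, path: str) -> bool:
        # Apply the filter conditions: skip any file matching an exclude glob.
        return not any(fnmatch.fnmatch(path, pat) for pat in self.exclude_paths)
```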

Focus on people: turn problems into responsibilities

A local scan can find problems, but it struggles to tell who introduced a problem and when, which leaves room for disputes and denial. Using the code’s commit history, the platform can trace when code changed and identify who is responsible for each problem, track the problem from that person’s perspective, and even turn it into a dedicated bug for follow-up. Whoever makes the mess cleans it up; that is only reasonable.
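
A hedged sketch of the attribution step: assuming a per-file map from line number to the author who last changed that line (for example, derived from `git blame` output), issues can be grouped by owner. The data shapes here are hypothetical, and the last person to touch a line is not always the one who introduced the problem, so a platform would treat this as a starting point rather than a verdict.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    path: str
    line: int
    rule: str


# Hypothetical input: file path -> {line number -> author who last changed it}.
BlameMap = dict[str, dict[int, str]]


def assign_owners(issues: list[Issue], blame: BlameMap) -> dict[str, list[Issue]]:
    """Group issues by the author who last touched the offending line."""
    owners: dict[str, list[Issue]] = defaultdict(list)
    for issue in issues:
        author = blame.get(issue.path, {}).get(issue.line, "unknown")
        owners[author].append(issue)
    return dict(owners)
```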

In addition, the platform can automatically close issues that have been fixed, based on the results of the next scan, saving manual effort.
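
One way such auto-closing could work, sketched under our own assumptions: give each finding a fingerprint that leaves out the line number (here, a hash of rule, path, and whitespace-normalized line text), then compare the fingerprint sets of two scans. Issues whose fingerprints disappear are closed; ones that only moved keep their identity. This fingerprint scheme is hypothetical, not a description of the platform’s internals.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str
    line: int
    code: str  # source text of the flagged line


def fingerprint(f: Finding) -> str:
    # The line number is deliberately excluded so that an issue keeps its
    # identity when code above it moves. Identical lines with the same rule
    # in one file collide; real tools disambiguate further.
    key = f"{f.rule}|{f.path}|{' '.join(f.code.split())}"
    return hashlib.sha1(key.encode()).hexdigest()


def diff_scans(previous: list[Finding], current: list[Finding]) -> tuple[set[str], set[str]]:
    """Return (new, closed) fingerprints between two scans."""
    prev = {fingerprint(f) for f in previous}
    curr = {fingerprint(f) for f in current}
    return curr - prev, prev - curr
```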

Code base quality tracking

Another benefit of archiving issues on the platform is a clear view of each repository’s code quality trend: for example, when new problems are introduced and overall quality deteriorates, or when legacy debt is paid down and quality improves. Visualized quality trends also help team managers judge whether their team’s code quality needs attention.

Quality gates: let bygones be bygones

Even with local scanning, there will still be careless developers who push code to the remote repository without fixing the problems found; the quality gate provided by the platform can block them. A quality gate defines how many issues the repository is allowed to contain; if that number is exceeded, the commit or merge request is blocked.

A legacy project typically turns up hundreds or thousands of existing issues on its first scan, and teams are unlikely to set aside time to fix them all at once, so the effort is abandoned almost as soon as it begins. For this scenario, our advice is to set a merge request (MR) quality gate on the number of new issues, ensuring that new or changed code does not introduce new quality problems. Control the increment while gradually cleaning up the existing stock (whenever a business requirement changes a file, fix that file’s code quality problems). This way, code quality will slowly get back on track.
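
A minimal sketch contrasting the two gates, with hypothetical function names: `baseline` and `current` are sets of issue identifiers (for example, fingerprints that survive line moves) from scans of the target branch and the MR branch. A total-count gate fails a legacy project from day one, while the incremental gate tolerates existing issues and only blocks MRs that add new ones.

```python
from dataclasses import dataclass


def total_gate(current: set[str], max_total: int) -> bool:
    # Blocks whenever the repository holds more than `max_total` issues,
    # which a legacy project with a large backlog fails immediately.
    return len(current) <= max_total


@dataclass
class GateResult:
    passed: bool
    new_issues: set[str]


def incremental_gate(baseline: set[str], current: set[str], max_new: int = 0) -> GateResult:
    """Block the MR if it adds more than `max_new` issues absent from the baseline."""
    new = current - baseline
    return GateResult(passed=len(new) <= max_new, new_issues=new)
```

Legacy issues still count toward the trend, but they no longer block day-to-day work; they shrink as files are touched and fixed.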

Conclusion

Broadly speaking, we believe that the larger the team, the more it needs code scanning tools to help raise standards and efficiency when facing coding standards and complex problems. For small and midsize businesses (SMBs) and individual developers, code scanning also remains the cheapest quality-improvement tool to adopt. I hope the cases and scenarios above help readers quickly pinpoint the sticking points in their projects and resolve them smoothly, pay attention to every line of code in each iteration, and carry forward a culture of excellent code.
