Author | YuYang

The problems we face

In day-to-day development, the problems we face with code assets fall into two main categories: code quality problems and code security vulnerabilities.

1. Code quality problems

Code quality is an old topic. Everyone knows it matters, but few teams know how to improve and maintain what is a shared team asset. On the one hand, developers may neglect quality control in order to ship features on time. On the other hand, developers differ in their coding habits and in how they understand a program.

In the long run, declining code quality tends to reinforce itself: business pressure erodes code quality, lower quality reduces development efficiency, and lower efficiency further increases business pressure, forming a vicious cycle.

2. Code security issues

Security problems tend to hide in code written without security awareness and in open source dependencies that have never been scanned or are no longer maintained. They are hard to spot in daily development and code review.

Code security can also be viewed from two angles:

  • Coding security issues, that is, violations of security specifications: keeping non-conforming code out of the enterprise code base reduces the risk of private data leaks, injection, and security policy vulnerabilities.

  • Dependency security issues: security vulnerabilities introduced through open source and third-party components. According to the Synopsys 2020 Open Source Security report, more than 99% of organizations use open source technology. There is no question that using open source components, sharing technology, and collaborating on the shoulders of giants lowers development cost, shortens iteration cycles, and improves software quality. But along with these conveniences, open source software also hides many security risks. According to the report's audit, 75% of the code bases examined contained security vulnerabilities, 49% contained high-risk issues, and 82% were still using outdated components more than 4 years old.

Dealing with code security issues takes two things. First, admission checks: configure security specification checks and checkpoints according to business scenarios and specifications. Second, regular maintenance, so that newly disclosed security vulnerabilities are detected and repaired in time.

5 code inspection tools commonly used at Alibaba

1. Code quality inspection

  • Java code specification detection

In Alibaba's practice, historical silos and differing business styles led to different project structures, code styles, and conventions across organizations, which resulted in high communication costs, inefficient collaboration, and high maintenance costs. At the group's current scale, development needs a professional, coordinated engineering force that iterates in a focused way rather than reinventing the wheel. A truly professional team must have a unified development specification, which stands for efficiency, shared understanding, craftsmanship, and sustainability.

Against this background, Alibaba formulated the Alibaba Java Development Manual, the development specification followed by Java engineers at Alibaba. It covers programming conventions, unit testing conventions, exception and logging conventions, MySQL conventions, project conventions, security conventions, and more. It sums up the experience of nearly ten thousand Alibaba Java engineers and has been tested and refined in many large-scale production scenarios.

Traffic laws appear to restrict the freedom to drive, but they actually protect everyone's personal safety. Imagine there were no speed limits, no traffic lights, and no rule about driving on the right: who would dare to drive? Similarly, for software, a development specification is not about eliminating creativity and elegance in code, but about limiting excessive personalization, promoting relative standardization, and working together in a generally accepted way.

The goals of the code specification are therefore: 1. Code for efficiency: unified standards improve communication efficiency and development efficiency. 2. Code for quality: prevent problems before they occur, raise quality awareness, improve system maintainability, and reduce the failure rate. 3. Code with craftsmanship: the pursuit of excellence, polishing high-quality code.

The code specification has been deeply integrated into Alibaba's development activities through tools such as IDE inspection plug-ins, pipeline-integrated checks, and code review integration. In addition, Codeup, a cloud-based code hosting platform, has built-in Java code specification detection, giving developers a convenient and quick check during code submission and code review.

  • Intelligent patch recommendation

Defect detection and patch recommendation has been a hard problem in software engineering for decades, and it is one of the problems that researchers and front-line developers care about most. The defects meant here are not network vulnerabilities or system defects, but defects hidden in the code. Helping developers identify and fix them can greatly improve software quality. Building on defect detection methods popular in industry and academia, and after analyzing and working around their limitations, the algorithm engineers of Alibaba Codeup proposed a new algorithm that analyzes code defects more accurately and efficiently and recommends fixes. A paper on the algorithm has been accepted by the International Conference on Software Engineering (ICSE). The algorithm works in four steps:

1. Identify fix commits by the keywords in their commit messages. Only commits that touch fewer than 5 files are kept, because too many files may dilute the repair behavior. This step relies heavily on good commit habits, so we hope developers commit in a disciplined way and write clear messages.

2. Extract the deleted and the added content from these fix commits at the file level, yielding Defect and Patch pairs (DP pairs). The result of this step can be quite noisy.

3. Use an improved DBSCAN method to cluster the defect and patch fragments simultaneously, grouping similar defects and patches together. Clustering similar defects and fixes reduces the noise left over from the previous step and gives a strong reference for the mistakes commonly made in the history of code submissions.

4. Use a self-developed template extraction method to summarize the defect code and patch code into templates, which adapt to the context by accounting for differing variables.
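Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the Codeup implementation: the keyword list, the `Commit` structure, and the function names are all assumptions made for the example, and the diff parsing is deliberately simplistic. Clustering (step 3) and template extraction (step 4) are not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the article does not say which keywords are used.
FIX_KEYWORDS = re.compile(r"\b(fix|fixes|fixed|bug|defect|repair)\b", re.IGNORECASE)
MAX_FILES = 5  # the article keeps only commits with fewer than 5 files


@dataclass
class Commit:
    message: str
    diffs: dict  # file path -> unified diff text for that file


def is_fix_commit(commit):
    """Step 1: keyword match on the message plus the file-count filter."""
    return bool(FIX_KEYWORDS.search(commit.message)) and len(commit.diffs) < MAX_FILES


def extract_dp_pair(diff_text):
    """Step 2: collect deleted lines (defect) and added lines (patch) of one file."""
    defect, patch = [], []
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++", "@@")):
            continue  # skip file headers and hunk headers (simplistic)
        if line.startswith("-"):
            defect.append(line[1:].strip())
        elif line.startswith("+"):
            patch.append(line[1:].strip())
    return defect, patch


def collect_dp_pairs(commits):
    """Return (path, defect_lines, patch_lines) for every file of every fix commit."""
    pairs = []
    for commit in commits:
        if is_fix_commit(commit):
            for path, diff_text in commit.diffs.items():
                pairs.append((path, *extract_dp_pair(diff_text)))
    return pairs


# Made-up example commit.
example = Commit(
    message="Fix NPE when user name is missing",
    diffs={
        "UserService.java": (
            "--- a/UserService.java\n"
            "+++ b/UserService.java\n"
            "@@ -10,1 +10,1 @@\n"
            "-    return user.getName().trim();\n"
            '+    return user.getName() == null ? "" : user.getName().trim();\n'
        )
    },
)
print(collect_dp_pairs([example]))
```

The raw pairs produced this way still contain unrelated edits, which is why the later clustering step matters.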

Currently, the code patch recommendation service is applied to automatic code scanning of merge requests. During code review it detects code fragments that can be improved and suggests fixes, and it captures the manual experience accumulated in historical reviews to keep improving the quality of enterprise code.

2. Code security detection

  • Sensitive information detection

In recent years, sensitive information (API keys, database credentials, OAuth tokens, and so on) has repeatedly been leaked by accident through public sites, bringing security risks and even direct financial losses to enterprises.

We face a similar problem in our own practice: hard-coded secrets occur very frequently, and there was no effective mechanism for recognizing them. Developers and enterprise managers therefore urgently need a stable and sound method and system for detecting sensitive information. Our investigation showed that most existing sensitive information detection tools rely only on rule matching or information entropy, so their recall or precision falls short of expectations. Building on rule matching and information entropy and adding contextual semantics, we propose SecretRadar, a sensitive information detection tool that uses a multi-layer detection model.

The technical design of SecretRadar has three layers. The first layer uses the traditional technique of rule matching. Rule matching has good accuracy and extensibility, but it depends heavily on the length, prefix, and variable name being matched, so it copes poorly with the different coding styles of different developers and easily produces false negatives.

For cases that fixed rules can hardly capture, the second layer uses an information entropy algorithm. Information entropy measures how random a line of code is, and it works well for recognizing randomly generated keys and random identity information. The entropy algorithm has its own limitation, however: false positives increase as recall increases.

The third layer therefore applies template clustering and contextual semantic analysis to filter and refine the results. Common keywords are extracted by aggregating the entropy results, and the accuracy of the model is improved by combining contextual semantics with the current syntactic structure.
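The first two layers described above can be sketched as follows. This is only an illustration of the general techniques (rule matching followed by Shannon entropy), not SecretRadar's code: the rules, the token pattern, and the threshold value are assumptions chosen for the example, and the third layer (template clustering and contextual semantics) is not shown.

```python
import math
import re
from collections import Counter

# Layer 1 (rule matching): hypothetical rules, not SecretRadar's actual rule set.
RULES = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(password|passwd|secret|token)\s*[:=]\s*['\"][^'\"]{6,}['\"]"),
]

# Layer 2 (information entropy): candidate tokens and an assumed threshold.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; an illustrative value only


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(line):
    """Return 'rule', 'entropy', or None for one line of code."""
    if any(rule.search(line) for rule in RULES):
        return "rule"
    for token in TOKEN.findall(line):
        if shannon_entropy(token) >= ENTROPY_THRESHOLD:
            return "entropy"
    return None
```

Lowering `ENTROPY_THRESHOLD` catches more random-looking strings but also flags more ordinary identifiers, which is the recall versus false-positive trade-off that motivates the third layer.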

The sensitive information detection tool serves not only our internal developers. On the Cloud Effect platform it also supports more than 20,000 code bases and 3,000 enterprises, and it has helped developers resolve more than 90,000 hard-coding problems.

  • Source code vulnerability detection

For source code vulnerability detection, Alibaba uses the Sourcebrella Pinpoint detection engine. It mainly covers injection risks and security policy risks.

The Sourcebrella detection engine is the result of ten years of technical research by the Prism group at the Hong Kong University of Science and Technology. The engine absorbs nearly a decade of international research on software verification, improves on it, and implements an independently designed, technically leading software verification system. Its main verification method is to translate the program into first-order logic and linear algebra and to deduce the causes of defects through formal verification. So far, four papers on its core technologies have been published, one at PLDI and three at ICSE. Interested readers can find them under Further reading at the end of this article.

The Sourcebrella detection engine can find defects that have been hidden for more than 10 years in large, highly active open source projects. Taking the MySQL scan [5] as an example, these defects could not be found by other inspection tools on the market, and the engine can scan a large open source project of 2 million lines within 1.5 hours. While maintaining this scanning efficiency, the false positive rate can be kept to about 15%. For complex, large-scale projects, the scan efficiency and false positive rate of the Sourcebrella detection engine are at an industry-leading level.

Source code vulnerability detection integrates the security analysis capability of the Sourcebrella detection engine and strikes a good balance between analysis accuracy, speed, and depth. It has the following core advantages:

1. Bytecode analysis is supported, so code logic in second-party and third-party packages is not missed;

2. It is good at logical analysis of long call chains that cross functions;

3. It can handle indirect data modification through references, pointers, and the like;

4. High accuracy: compared with similar tools such as Clang and Infer, it performs better in accuracy and in identifying real problems;

5. Good performance: analysis of a single application currently completes in about 5 minutes on average.

The Sourcebrella detection engine can accurately track data flow in the code. With deep, high-precision analysis of function call chains, it can find deep problems that span multiple layers of functions. Besides finding defects, it can show how a problem is triggered and display the complete control flow and data flow involved. This helps developers understand and fix problems quickly, improves software quality at lower cost in the early stage of development, and so greatly reduces production cost and raises development efficiency.

  • Dependency package vulnerability detection

We want to give developers an effective mechanism for detecting and managing the security and trustworthiness of open source components, so we built a dependency package vulnerability detection service and a dependency package security report. In practice, developers generally report that fixing a dependency vulnerability costs more than fixing a vulnerability in their own code, so they are reluctant to deal with such problems or find them hard to deal with. There are two reasons. First, most vulnerabilities are not introduced directly; they come in through third-party components that in turn depend on other components. Second, it is unclear which version is clean, usable, and compatible.

To make repairs easier, we further identify and analyze the reference relationships between dependencies, clearly mark direct and indirect dependencies, and locate the specific file that introduces each dependency package, so that developers can quickly find where the key problem lies. We also aggregate vulnerability data to recommend an upgrade version that fixes the vulnerabilities. Because one dependency may correspond to multiple vulnerabilities, developers can evaluate whether to accept the recommendation. By analyzing API changes and code call chains between releases, measuring upgrade cost, and automatically creating fix reviews for developers, we help developers maintain code security as efficiently as possible.
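Marking a vulnerable package as direct or indirect, and showing how it was introduced, amounts to a search over the dependency graph. The sketch below uses breadth-first search over a made-up graph. The package names, the `VULNERABLE` set, and the function names are hypothetical, and this is not how the service itself is implemented.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it depends on.
DEPENDENCIES = {
    "my-app": ["web-framework", "json-lib"],
    "web-framework": ["http-core", "logging-lib"],
    "http-core": ["logging-lib"],
    "json-lib": [],
    "logging-lib": [],
}
# Hypothetical vulnerability data.
VULNERABLE = {"logging-lib"}


def find_vulnerable_paths(root, graph, vulnerable):
    """Breadth-first search from the project.

    Returns, for each reachable vulnerable package, the shortest chain of
    dependencies that introduces it.
    """
    paths = {}
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        for dep in graph.get(path[-1], []):
            if dep in seen:
                continue
            seen.add(dep)
            new_path = path + [dep]
            if dep in vulnerable:
                paths[dep] = new_path
            queue.append(new_path)
    return paths


def classify(path):
    """A path of length 2 (project -> package) means a direct dependency."""
    return "direct" if len(path) == 2 else "indirect"


for package, path in find_vulnerable_paths("my-app", DEPENDENCIES, VULNERABLE).items():
    print(package, classify(path), " -> ".join(path))
```

Reporting the whole chain, rather than only the vulnerable package, tells the developer which declared dependency to upgrade.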

Whether for code quality inspection or code security inspection, developers can try the 5 automated code inspection tools above for free in Cloud Codeup.

Applying the inspection services

1. Code submission

The most direct application of the inspection services is at code submission. Enterprises can define and configure inspection schemes for different projects according to their business scenarios and specifications. When a developer pushes a code change to the server, the inspection services configured for that code base are triggered automatically. They scan the current commit in full, helping the developer catch new problems early and confirm that existing problems have been resolved. With these inspection services in place, testing shifts left across code specification, code quality, and code security, giving developers fast detection and feedback as soon as they finish coding.

2. Code review

In enterprise collaboration, developers usually merge feature branches into the main branch through merge requests, which require code review and manual inspection by the project lead or module owner. Manual review takes considerable effort, and it can hardly cover potential problems in every dimension of the code. A well-configured set of inspection services can therefore greatly reduce the manual review workload and speed up code review. Meanwhile, by enriching, filtering, and accumulating detection rule sets and manual experience, the inspection services can block merges in a way that fits the enterprise's business scenarios, keeping non-conforming or risky code out of the enterprise code base.
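A blocking rule of this kind can be pictured as a simple gate over the issues reported for a merge request. The sketch below is illustrative only: the severity names, the `Issue` fields, and the `gate` function are assumptions made for the example, not Codeup configuration.

```python
from dataclasses import dataclass

# Hypothetical severity levels, from least to most serious.
SEVERITY_ORDER = {"info": 0, "minor": 1, "major": 2, "blocker": 3}


@dataclass
class Issue:
    rule: str
    severity: str
    file: str
    line: int


def gate(issues, block_at="major"):
    """Return (passed, blocking_issues) for a merge request.

    The merge is blocked when any issue is at or above the configured severity.
    """
    threshold = SEVERITY_ORDER[block_at]
    blocking = [i for i in issues if SEVERITY_ORDER[i.severity] >= threshold]
    return (len(blocking) == 0, blocking)


# Made-up scan result for one merge request.
issues = [
    Issue("hardcoded-secret", "blocker", "config.py", 12),
    Issue("naming-convention", "minor", "util.py", 3),
]
passed, blocking = gate(issues)
print("merge allowed" if passed else f"merge blocked by {len(blocking)} issue(s)")
```

Making the threshold configurable per project lets a team start with a lenient gate and tighten it as existing problems are cleaned up.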

3. Code metrics

Besides helping developers find and resolve problems early during code submission and code review, the inspection services also help managers measure enterprise code quality and visualize risk. With enterprise-level reporting and project task management built on top of them, security and quality problems can be measured more intuitively as a project evolves.

Further reading

  1. Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code t.tb.cn/0qxIpFV5sRD…

  2. SMOKE: Scalable Path-sensitive Memory Leak Detection for Millions of Lines of Code t.tb.cn/2l96Jh2yqOG…

  3. Pipelining Bottom-up Data Flow Analysis qingkaishi.github.io/public_pdfs…

  4. Conquering the Extensional Scalability Problem for Value-Flow Analysis Frameworks qingkaishi.github.io/public_pdfs…
