Preface

Earlier I talked about the bug logging platform and how to analyze bugs. That work looks at one bug at a time: we analyze and improve the details of a single bug in order to raise the quality of our projects and products.

We also need to think at a higher level: how can we use existing data to evaluate the quality of a project? That is the problem of quality measurement.

Quality measurement simply means evaluating whether a project or product meets expectations after a period of product, development, and testing iterations. It covers many aspects. In the short term: does the software work well, are there potential risks or legacy issues, and can it be released directly to the market? In the long term: is there room to optimize the process, are developers too busy fixing bugs to build new features or improve the architecture, and does a feature have to be reworked repeatedly before it meets business needs? That is a lot of questions, and without an intuitive, objective, quantifiable quality measure, I believe even the best testers cannot answer them.

The origin of quality measurement

Any discussion of quality measurement has to start with its origin:

SATC (Software Assurance Technology Center)

SATC is a division of the National Aeronautics and Space Administration (NASA), established in 1992 as part of the Office of Systems Reliability and Safety at Goddard Space Flight Center (GSFC). Its mission is to “become a center of excellence in software assurance, dedicated to making measurable improvements in the quality and reliability of software GSFC develops for NASA.” The center has been a source of research papers on software measurement, assurance, and risk management. Much of the theoretical basis for testing also comes from papers published by the center.

SATC described the relationship between bug rate and source code complexity, and built a quality measurement model on top of it. Because the analysis was based on C, the specific thresholds may not carry over to the Java and Go back-end code that most Internet companies write today. The conclusions are still worth considering, for example that each function should have fewer than 100 lines of code and a cyclomatic complexity below 10. As far as I know, IT companies in traditional fields, such as Huawei, still enforce these as strict code inspection rules. After SATC, many organizations at home and abroad proposed their own dimensions and indicators of quality measurement, such as SEI (Software Engineering Institute), DARPA (Defense Advanced Research Projects Agency, part of the United States Department of Defense), IBM, and HP. For example, indicators in the following dimensions:

  • Progress measurement
  • Requirements measurement
  • Lines-of-code metrics
  • Defect metrics
  • Test coverage measures

Survey data from about 20 years ago shows that quality measurement was already widely adopted in the industry.



Source: a survey of software measurement practices worldwide, conducted in the fourth quarter of 2001 by KLCI, a consulting firm for the global high-tech industry.

Note: the data is old but still fairly authoritative. After so many years of development in the testing industry, I believe measurement coverage can only have become more comprehensive.

Each of these dimensions is a large topic in its own right, so this time we will focus on defect metrics and the benefits they can bring us.

Quality measures of defects

We collect defect-related metrics for one purpose: to know the quality of our current project or product.

For the people who care about a project or product, a subjective description of quality as good or bad is not acceptable, so some indicators need to be quantified.

In detail, they can be roughly divided into process indicators before launch and result indicators after launch.

  1. Indicators after launch are simple and blunt: were there incidents or online bugs after release, and what was the direct impact on users? Incidents and online bugs are also graded by severity so that their quality impact can be compared. For example, between 10 minor user experience bugs and one online incident that affects the SLA, the latter clearly matters more.
  2. There can be many process indicators before launch. One is the number and severity of problems found by the testers themselves, which reflects whether testing was sufficient and whether the code itself hides pitfalls. Another is the ratio of problems to development effort or lines of code. Generally speaking, the more code a requirement or feature needs, the more people it occupies and the more complex it is, so the more bugs it may introduce. This is not entirely reliable: a new graduate will not have the same bug rate (number of bugs / number of days worked) as an experienced developer. Hence the joke developers make about each other: you're not building features, you're building bugs.

The above two aspects can probably be covered by the following indicators:

Online indicators:

  • Number and grade of incidents: Grades are generally P0, P1, P2, and so on. The exact definitions depend on the core value of the project or product, for example how long the service was unavailable or how many online requests were affected.
  • Online bug rate: The core idea is to relate the number and severity of bugs found online to the amount of development work in the same period. The recommended formula is to take the online bugs found in a period (three or six months), weight each by its severity level (e.g. Critical 10, Major 8, Normal 2, Minor 1), sum the weights, and divide by the number of developer days.
  • Online missed-test rate: Analyze all online bugs found in a period. If a bug is strongly related to the testing process, for example no test case covered it, or a case existed but was not executed, it is a missed-test problem. The missed-test rate is the number of missed-test problems divided by the total number of online problems. If a bug has no strong link to the test cases, it may be a design-level or requirements-level problem, or an untestable problem introduced by operations or configuration. There is also another possible definition: divide online bugs by offline bugs, which shows what proportion of problems still escaped to production even though that many bugs were caught before release. The ultimate purpose of this indicator is to improve our testing process, make test case design more complete, and dig out as many bugs as possible without slowing down releases.
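The online bug rate and missed-test rate formulas above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Bug` class and its `missed_by_testing` flag are hypothetical names, and the weights are the example values from the text, not a standard.

```python
from dataclasses import dataclass

# Example severity weights from the text; adjust them to your own grading scheme.
SEVERITY_WEIGHTS = {"critical": 10, "major": 8, "normal": 2, "minor": 1}


@dataclass
class Bug:
    severity: str                    # one of the SEVERITY_WEIGHTS keys
    missed_by_testing: bool = False  # hypothetical flag set during bug analysis


def weighted_bug_rate(bugs, developer_days):
    """Sum of severity weights divided by developer days in the same period."""
    if developer_days <= 0:
        raise ValueError("developer_days must be positive")
    return sum(SEVERITY_WEIGHTS[b.severity] for b in bugs) / developer_days


def missed_test_rate(online_bugs):
    """Share of online bugs whose root cause was a gap in testing."""
    if not online_bugs:
        return 0.0
    missed = sum(1 for b in online_bugs if b.missed_by_testing)
    return missed / len(online_bugs)


# Made-up sample: four online bugs found in one quarter, 280 developer days.
online_bugs = [
    Bug("critical", missed_by_testing=True),
    Bug("normal"),
    Bug("minor", missed_by_testing=True),
    Bug("minor"),
]
print(weighted_bug_rate(online_bugs, developer_days=280))
print(missed_test_rate(online_bugs))
```

The same `weighted_bug_rate` function works for offline bugs, since only the input list changes.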

Offline indicators:

  • Offline bug rate (service dimension): Same formula as the online bug rate. Take the offline bugs found in a period (three or six months), weight each by its severity level, sum the weights, and divide by the number of developer days. If you do not want the calculation to be that complicated, simply count the bugs per service or project in the period. A large number of offline bugs suggests, to some extent, that code quality is poor and that some problems are likely to escape to production. Compared against a baseline or tracked over time, this indicator reflects changes in the quality of a project or product.
  • Offline bug rate (personal dimension): Computed the same way for an individual developer: the severity-weighted number of offline bugs divided by that developer's working days, or simply the number of offline bugs in a period. It can be used for horizontal comparison within a small project. Take individual developers' feelings into account when publishing the data, or share it only with project managers.
  • Relationship between offline bugs and test types or stages: Offline bugs can be classified and counted by the stage or label under which they were found. For example, the stages may include functional testing, specialized testing, walk-through acceptance, and delivery acceptance. In a healthy project, bugs converge gradually from stage to stage. If walk-through acceptance and delivery acceptance turn up more bugs than functional and specialized testing, focus on the earlier stages and improve them.
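As a rough sketch of the stage analysis above, the following Python counts offline bugs per discovery stage and checks two things: whether the counts shrink from stage to stage, and whether the acceptance stages found more than the test stages. The short stage labels are hypothetical, and treating "convergence" as a non-increasing sequence is my own simplification.

```python
from collections import Counter

# Hypothetical stage labels, in the order the stages happen.
STAGES = ["functional", "specialized", "walk-through", "delivery"]


def bugs_per_stage(stage_labels):
    """Return bug counts in stage order, given one label per bug."""
    counts = Counter(stage_labels)
    return [counts.get(stage, 0) for stage in STAGES]


def is_converging(counts):
    """True if no stage finds more bugs than the stage before it."""
    return all(later <= earlier for earlier, later in zip(counts, counts[1:]))


def acceptance_heavy(counts):
    """True if the two acceptance stages found more bugs than the two test stages."""
    return counts[2] + counts[3] > counts[0] + counts[1]


# Made-up sample: one label per offline bug.
labels = ["functional"] * 6 + ["specialized"] * 3 + ["walk-through"] * 2 + ["delivery"]
counts = bugs_per_stage(labels)
print(dict(zip(STAGES, counts)))
print(is_converging(counts), acceptance_heavy(counts))
```

If `acceptance_heavy` returns True for a project, that is the signal described above to look harder at functional and specialized testing.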

How to put quality measurement into practice

We have covered many defect indicators, along with some counting rules and suggested formulas. How do we keep the labor cost of quality measurement down? The statistical cycle and the tools or platforms matter most.

Take the extreme case first. If the statistical cycle were one day, we would struggle to collect and compute the numbers every day, and the comparison would tell us little. Better numbers today than yesterday do not mean the software is better: perhaps no testing was carried out today. Likewise, an unlucky day on which several online bugs suddenly appear does not mean quality is worse than yesterday. I recommend tying the statistical period of offline indicators to the release cycle: some services are released once a month, some twice a month, and some once a week. The period for online indicators can be longer and can be aligned with the performance evaluation cycle. For example, three months or half a year makes a measurement cycle with more reference value. Some indicators can also feed into overall project scoring or personal performance reviews.

Now tools and platforms. If all the statistics are compiled by hand, there are two drawbacks: the cost is high, and the data is not fully reliable because deviations can creep in. I recommend using a platform or tool. On the one hand, the source data must be filled in accurately, so the fields in Jira or whichever defect management platform you use have to be correct. On the other hand, the platform or tool needs to support pulling data on a schedule, pulling data on manual trigger, and automatically calculating the preset indicators.
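To make the point about longer cycles concrete, here is a small sketch that groups bug discovery dates into calendar quarters, one possible three-month cycle. The function names and sample dates are made up for illustration.

```python
from collections import Counter
from datetime import date


def quarter_of(day):
    """Label a date with its calendar quarter, e.g. 2023-Q1."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def bugs_per_quarter(found_dates):
    """Count bugs per quarter, given the date each bug was found."""
    return dict(Counter(quarter_of(d) for d in found_dates))


# Made-up discovery dates for four online bugs.
found = [date(2023, 1, 15), date(2023, 3, 31), date(2023, 4, 1), date(2023, 12, 25)]
per_quarter = bugs_per_quarter(found)
print(per_quarter)
```

Daily counts for this sample would be mostly zeros with an occasional one, which says little; the quarterly totals are the numbers worth comparing.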

The following is offline quality indicator data pulled regularly by a tool:



Quality data, personal dimension

Online quality indicator data (compiled by hand):



As you can see, online metrics are useful for horizontal comparison between projects, or for comparison across several time periods.

Horizontal comparison, given a large enough number of samples, shows which project or product had better quality in the same period.

Comparison across time periods within one project shows whether quality improved significantly, changed, or fluctuated after some improvement to testing activities.

Because online indicators are collected less often than offline indicators, compiling them by hand is acceptable in the short term. In the long term, tools and platforms should do as much of this work as possible.

Tool idea: a script calls the Jira platform's API to pull the source data, applies the formulas to that data, and displays the results by dimension.
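A minimal sketch of such a script follows. It assumes the issues arrive in the usual Jira search result shape, a page with `total` and `issues`, where each issue carries `fields.priority.name` and `fields.project.key`. The HTTP call itself is left to a caller-supplied `fetch_page` function, which is hypothetical, and the priority names and weights are the example values from this article; real Jira instances may use different names.

```python
from collections import defaultdict

# Example weights from this article; priority names depend on your Jira configuration.
WEIGHTS = {"Critical": 10, "Major": 8, "Normal": 2, "Minor": 1}


def fetch_all(fetch_page, page_size=50):
    """Page through search results until `total` issues have been collected."""
    issues, start = [], 0
    while True:
        page = fetch_page(start, page_size)
        batch = page.get("issues", [])
        issues.extend(batch)
        start += len(batch)
        if not batch or start >= page.get("total", 0):
            return issues


def score_by_dimension(issues, dimension_of):
    """Sum severity weights per dimension (project, assignee, ...)."""
    scores = defaultdict(int)
    for issue in issues:
        priority = (issue["fields"].get("priority") or {}).get("name")
        scores[dimension_of(issue)] += WEIGHTS.get(priority, 0)
    return dict(scores)


# Stand-in for the real HTTP call, so the sketch runs without a Jira server.
_SAMPLE = [
    {"fields": {"project": {"key": "PAY"}, "priority": {"name": "Critical"}}},
    {"fields": {"project": {"key": "PAY"}, "priority": {"name": "Minor"}}},
    {"fields": {"project": {"key": "SEARCH"}, "priority": {"name": "Major"}}},
]


def fake_fetch_page(start_at, max_results):
    return {"total": len(_SAMPLE), "issues": _SAMPLE[start_at:start_at + max_results]}


all_issues = fetch_all(fake_fetch_page, page_size=2)
scores = score_by_dimension(all_issues, lambda i: i["fields"]["project"]["key"])
print(scores)
```

Keeping the fetch step separate from the scoring step means the formulas can be tested against saved data, and the same scoring code can later run inside a scheduled job or a manually triggered one.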

Platform idea: take our company's quality data warehouse architecture as an example, as shown below. Raw data is collected from Jira and other source platforms and then summarized. Through data summarization, data modeling, and per-dimension calculation, the indicators people care about can be filtered quickly and served to the presentation layer.

Summary

This article introduced the origin of quality measurement, picked defect metrics out of the five broad dimensions of quality measurement for detailed breakdown and analysis, offered six core online and offline indicators for reference, and gave concrete examples of how to put quality measurement into practice.

If you are interested, let's discuss this further. Feel free to talk to me in the comments or by message. If you have read this far, a like and a follow are appreciated!
