Abstract: This article describes how to use AI techniques to analyze failed test cases intelligently.

This article is shared from the Huawei Cloud community post “Test Cases Failed Yet Again, Why? Let NLP Analyze It for You”, by Agile Xiaozhi.

With the rapid development of the software industry, more and more companies are promoting test automation to shorten the test cycle and deliver high-quality iterations quickly; mature software companies now pursue test automation rates of 80% or even higher. A round of manual testing that once took a week can be fully executed automatically in a day or less. In every round of automated testing, root cause analysis of failed cases is a critical task, and manual analysis of massive test logs becomes the bottleneck. This article describes how to use AI techniques for intelligent analysis of failed test cases.

Using log analysis to help developers find and locate system problems is not a new topic; it has been studied extensively over the past decades. As data continues to accumulate, both academic research and industrial practice have tried machine learning approaches, supervised and unsupervised alike. On the problem-detection side, the 2017 paper DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning attracted wide attention. It introduces LSTM models that learn the characteristics of time-series data composed of log templates and log parameters, detecting anomalies in system behavior and system state so that developers can perceive potential risks in advance. This article, by contrast, describes how to use log analysis to locate the root cause of failed cases in test scenarios.

The algorithm used here is much simpler, but it places higher demands on the test logs and on historical analysis data. First, by the law of large numbers, the larger the sample, the closer it approximates the underlying distribution of the data; the failed cases to be analyzed must therefore be of sufficient volume, otherwise the effort is wasted and the results will be poor. Second, some analyzed failure data must already exist: this article presents a supervised learning solution, and the quality of the labeled data determines how well the model learns.

Ok, so here we go: use a text classification model to locate the root cause of failed test cases. Here is the complete solution.

Figure 1: Failed test log analysis service build process

As shown in Figure 1, the failure test log analysis service build process can be summarized as the following steps:

1. Prepare test log analysis data: Ensure that test logs are correctly recorded and saved, and that the root cause analyses of failed cases are preserved alongside them. Logs should contain information that can be used to locate root causes, such as error codes and error messages returned by the system.

2. Model training: Train the model according to the existing analysis data.

A) Log cleaning: The logs of failed cases are filled with content irrelevant to the cause of failure, and this noise increases the uncertainty of model training. Cleaning the log data and extracting key information based on historical experience is an indispensable step in initial training. For example, practice shows that almost all failed cases contain an error message returned by the system; if one does not, check closely whether the log design is flawed or whether the failure can be attributed directly to the test environment. A tester who sees the error message of a failed case can usually identify the probable cause, so extracting only the error message has proven to be the better preprocessing strategy in current practice.

Sample:

Case001 configuration error parameter comparison fails

Case002 cannot connect *.*.*.*
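The error-extraction step above can be sketched as follows; the keyword pattern and the sample log are illustrative assumptions, not the article's actual cleaning rules, which must be tuned to the real log format:

```python
import re

# Hypothetical keywords marking lines worth keeping; real rules depend on the log format.
ERROR_PATTERN = re.compile(r"(error|fail|exception|cannot|timeout)", re.IGNORECASE)

def extract_error_lines(raw_log):
    """Keep only the lines that look like error messages; drop the rest as noise."""
    kept = [line.strip() for line in raw_log.splitlines() if ERROR_PATTERN.search(line)]
    return " ".join(kept)

sample = """\
2024-01-01 10:00:00 INFO  test case Case002 started
2024-01-01 10:00:05 DEBUG sending request to server
2024-01-01 10:00:30 ERROR cannot connect *.*.*.*
"""
print(extract_error_lines(sample))
```

Only the ERROR line survives; the INFO and DEBUG noise is discarded before the text reaches the model.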

B) Log preprocessing: The fewer manual steps involved, the lower the later maintenance cost, so this stage performs only simple preprocessing, such as word segmentation and stop-word removal.
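A minimal sketch of this preprocessing for English-language logs (the stop-word list is an illustrative assumption; Chinese logs would additionally need a word segmenter such as jieba):

```python
import re

# Illustrative stop words only; a real deployment would use a fuller, domain-tuned list.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}

def preprocess(text):
    """Lowercase, split into tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9*.]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Case002 cannot connect to *.*.*.*"))
```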

C) Model training: Feed the historical analysis data into a TextCNN text classification model. TextCNN’s biggest advantage is its simple network structure, which nonetheless easily beats benchmarks on multiple datasets. With a simple structure, few parameters, and little computation, it trains quickly. See Convolutional Neural Networks for Sentence Classification.
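The core of TextCNN is convolving filters of several widths over the token embedding matrix, then taking the maximum activation of each filter over time. The pure-Python sketch below illustrates only that forward pass, with randomly initialised embeddings and filters as a stand-in for trained weights (a real implementation would use a deep learning framework and learn these parameters):

```python
import random

def textcnn_features(tokens, embed_dim=8, windows=(2, 3, 4), n_filters=2, seed=0):
    """One forward pass of TextCNN's convolution + max-over-time pooling.
    Embeddings and filters are random here, purely for illustration."""
    rng = random.Random(seed)
    # Embedding lookup: each token maps to a dense vector.
    emb = {t: [rng.uniform(-1, 1) for _ in range(embed_dim)] for t in set(tokens)}
    x = [emb[t] for t in tokens]
    feats = []
    for k in windows:                     # filter widths of 2, 3 and 4 tokens
        for _ in range(n_filters):
            f = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(k)]
            # Convolve the filter over every window of k consecutive tokens.
            scores = [sum(f[i][d] * x[p + i][d]
                          for i in range(k) for d in range(embed_dim))
                      for p in range(len(x) - k + 1)]
            feats.append(max(scores))     # max-over-time pooling
    return feats  # in the real model this vector feeds a softmax classifier

v = textcnn_features("case002 cannot connect host timeout".split())
print(len(v))  # 3 window sizes x 2 filters = 6 pooled features
```

The pooled feature vector has one entry per filter, regardless of input length, which is what lets the final fully connected layer classify variable-length logs.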

D) Model tuning: Attempt to obtain the optimal model by varying the embedding dimension and adjusting the randomization strategy. When the model achieves an accuracy of about 85% on a train/test split in the lab, it can be considered ready to deploy.
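The readiness check above amounts to a hold-out evaluation. A generic sketch (the 85% threshold is the article's rule of thumb; the split ratio and helper names are assumptions):

```python
import random

def holdout_split(samples, labels, test_ratio=0.2, seed=42):
    """Shuffle labelled data and split it into train and test sets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the labelled root causes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

READY_THRESHOLD = 0.85  # the article's bar for a deployable model
print(accuracy(["env", "config", "env", "net"], ["env", "config", "net", "net"]))  # 0.75
```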

3. Deploy the model obtained in Step 2 as an online analysis service.

4. During automated test execution, the logs of failed test cases are preprocessed and automatically posted to the prediction service, which returns the predicted results, including the predicted root cause and a confidence score.
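The request/response handling for this step might look like the sketch below. The JSON field names (`case_id`, `text`, `root_cause`, `confidence`) are assumptions for illustration; the article does not specify the service's actual interface:

```python
import json

def build_prediction_request(case_id, error_text):
    """Package a failed case's cleaned log text for POSTing to the prediction service."""
    return json.dumps({"case_id": case_id, "text": error_text})

def parse_prediction_response(body):
    """Extract the predicted root cause and its confidence from the service reply."""
    reply = json.loads(body)
    return reply["root_cause"], reply["confidence"]

req = build_prediction_request("Case002", "cannot connect *.*.*.*")
cause, conf = parse_prediction_response('{"root_cause": "environment", "confidence": 0.92}')
print(cause, conf)
```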

Testers can get an analysis report of test results immediately after a test run.

First, based on historical experience, the tester promptly handles the test cases whose cause of failure is intuitively clear. For example, if the test environment is faulty, the environment is repaired and the test case is re-executed.

Secondly, the prediction results are graded by the model’s output confidence. Failures matching a large body of historical error logs generally receive high confidence, and the root cause can be reported directly. Low-confidence failures may be new problems and should be flagged for timely attention.
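The grading rule above can be sketched as a simple threshold check; the 0.9 cutoff and the action names are illustrative assumptions to be tuned against historical data:

```python
HIGH_CONFIDENCE = 0.9  # assumed cutoff; tune against historical analysis data

def grade_prediction(root_cause, confidence):
    """Report high-confidence predictions directly;
    flag low-confidence ones as possible new problems for manual review."""
    if confidence >= HIGH_CONFIDENCE:
        return {"root_cause": root_cause, "action": "report"}
    return {"root_cause": root_cause, "action": "manual_review"}

print(grade_prediction("environment", 0.95)["action"])  # report
print(grade_prediction("unknown", 0.40)["action"])       # manual_review
```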

Different service scenarios generate different logs, and as the number of scenarios increases, the log feature space grows without bound, so a single model cannot cover all scenarios. Minimizing human involvement and using lightweight models that train and iterate quickly for specific business scenarios is the key to making this approach practical in industry. Current practice shows that the TextCNN text classification model introduced in this article meets these requirements. To further improve model accuracy, combining active learning to improve data quality and introducing few-shot learning to reduce manual effort will be the key directions for future exploration.

References

1. Experience Report: System Log Analysis for Anomaly Detection, 2016.

2. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning, 2017.

3. Convolutional Neural Networks for Sentence Classification, 2014.
