Stability testing verifies an app's stability by running it continuously for a long time. It can effectively surface intermittent crashes, memory leaks, performance degradation, and other problems that only appear after prolonged use. On iOS, stability tests are usually driven through Apple's system APIs, which can dispatch tap events rapidly. Excellent tools such as FastMonkey offer many benefits, but as a long-running test service they need functional adjustments to fit enterprise-level scenarios. Out of the box they cannot solve problems such as customizing the event-execution sequence through external requests, setting startup parameters dynamically, or avoiding excessive disk usage from local screenshots.

The iQiyi test team has explored iOS stability testing persistently and accumulated some relevant experience. In this article we share what we learned while practicing and optimizing iOS stability testing, and we take the opportunity to invite inspiration and further discussion from peers.

1. Solution in Practice

1.1 Basic Framework

The iQiyi iOS stability test is built on our existing cloud real-device system and is divided broadly into three parts: device management at the bottom, result aggregation at the top, and the test strategy at the core. The basic framework is shown in Figure 1. Real devices connect to the remote-control system through the driver layer and are scheduled and managed uniformly by the back end. The core strategy interacts with real devices through the device driver to simulate user behavior for testing.

Figure 1. System framework

The system structure is shown above; the choice of test strategy is discussed below.

Common stability-testing approaches include Monkey testing, record-and-replay, and element traversal. Monkey testing is cheap to implement, but on pages with few elements, blind taps often miss actionable elements, so test efficiency is low. Record-and-replay fixes the execution path and couples it to business features, so scripts must be maintained over the long term, which hinders rapid expansion of stability-test scenarios across business lines. Element traversal avoids these problems, but it costs more to implement than the Monkey strategy and requires sustained engineering effort from the test team.

We have tried both a random swipe/tap strategy and an element-based traversal strategy. The details and difficulties of the two approaches are analyzed below.

1.2 Generating test events

We abstracted out a module called the event generator, whose main function is to continuously generate a stream of user events on the app in order to verify app stability. This article focuses on random generation and generation based on page elements.

1.2.1 Randomly Generating Events

The typical random-event implementation is a Monkey, which sends a random stream of user events (taps, inputs, swipes, and so on) to the system to test the stability of an application under development. This can be simulated quickly using the interfaces provided by XCTest and is relatively inexpensive to implement. Early on, to open up the testing process quickly and verify the approach's effectiveness, we implemented a Monkey test. As shown in Figure 2: first obtain the configured proportions of each event type for this task (restarting the app, pressing the Home button, rotating between portrait and landscape, tapping, swiping, going back, etc.), then choose an event according to those probabilities, assemble the event to be executed, and finally request the driver service to execute it.

Figure 2. Random event generator
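The weighted selection step can be sketched as follows. This is a minimal illustration, not iQiyi's actual implementation; the event names and proportions are hypothetical stand-ins for the task configuration described above.

```python
import random

# Hypothetical event types and weights; in the real system these
# proportions come from the task configuration.
EVENT_WEIGHTS = {
    "tap": 50,
    "swipe": 25,
    "back": 10,
    "home": 5,
    "rotate": 5,
    "restart_app": 5,
}

def next_event(rng: random.Random) -> str:
    """Pick the next event type according to the configured proportions."""
    events = list(EVENT_WEIGHTS)
    weights = list(EVENT_WEIGHTS.values())
    return rng.choices(events, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded so a failing run can be reproduced
sample = [next_event(rng) for _ in range(1000)]
print(sample.count("tap") / len(sample))  # roughly 0.5
```

Seeding the generator is worth the extra line: when a random sequence triggers a crash, the same seed replays the same event stream.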

1.2.2 Generating Events Based on Page Elements

The random strategy has a high tap frequency, but invalid events account for up to 90%. Analysis of test reports showed that when a page had few clickable elements, the events produced by the random generator triggered a large number of invalid actions. Traversing a page based on its actionable elements can greatly reduce the number of invalid events, so we tried an element-based click strategy implementing depth-first/breadth-first traversal to improve test efficiency.

How, on iOS, do we identify elements, record the execution path, and execute a traversal-based test strategy?

  • Identify the elements on the page

Clicking by element first requires solving element identification.

Method 1: have the app under test integrate an SDK. The access cost is high, and release builds generally cannot ship with such an SDK;

Method 2: parse elements from the DOM tree. Elements accumulate in the DOM tree, and on paginated pages the elements belonging only to the current screen cannot be distinguished, which hurts parsing accuracy.

Both methods have problems, so we turned to method 3: AI-based image recognition. As a relatively mature technology, it avoids the problems above and provides fairly accurate element data. For element generation we therefore use the AI service provided by iQiyi: before each event operation, we send a screenshot to the service, which quickly identifies the vast majority of element regions, as shown in Figure 3 below.

Figure 3. The AI service identifies the elements of the page

  • De-duplicating pages and locating the current page

Having solved element identification, the next step is the element-traversal logic. To improve efficiency, traversal should avoid re-visiting the same page as much as possible. To record the execution path and avoid repeated AI-service requests for the same page, we need to determine whether the current page has been visited before. We considered two methods:

Method 1: extract key information from the DOM tree to generate a fingerprint that identifies the current page. In practice, however, fetching the DOM tree takes an unstable amount of time, often more than 4 s, which is too slow for fast page identification.

Method 2: generate pixel fingerprints from the screenshots taken before and after each event. In practice this identifies pages in milliseconds.

The general process is shown in Figure 4: capture a screenshot, convert it to grayscale, scale it down to an 8×9 thumbnail, and generate a fingerprint encoding the image information. Two screenshots are judged to be the same page when the similarity between their fingerprints exceeds a threshold. The advantages are that similarity can be computed directly from the fingerprints, which is fast, and that tuning the threshold prevents small local changes from disturbing the result.

Figure 4. Use screen shot to generate fingerprint to judge similarity
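The fingerprint-and-compare step can be sketched as below. This is a minimal illustration of the mean-hash idea on an already-grayscaled, already-scaled 8×9 thumbnail (a 2D list of 0–255 pixel values); a real pipeline would first resize the screenshot with an image library such as Pillow, and the threshold value is an assumption.

```python
def mean_hash(thumb):
    """Mean hash: each bit records whether a pixel is above the mean brightness."""
    pixels = [p for row in thumb for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(fp_a, fp_b):
    """Number of differing bits between two fingerprints."""
    return sum(a != b for a, b in zip(fp_a, fp_b))

def same_page(fp_a, fp_b, threshold=5):
    """Pages are judged identical when their fingerprints are close enough."""
    return hamming(fp_a, fp_b) <= threshold

# Two thumbnails differing only in one pixel should map to the same page.
page_a = [[10] * 9 for _ in range(4)] + [[200] * 9 for _ in range(4)]
page_b = [row[:] for row in page_a]
page_b[0][0] = 30  # a small local change, e.g. a blinking cursor
print(same_page(mean_hash(page_a), mean_hash(page_b)))  # True
```

Because each fingerprint is just 72 bits, comparing a new screenshot against every previously seen page is cheap, which is what makes millisecond-level page identification possible.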

  • Return to previous page

With breadth-first traversal, after each successful tap we must return to the previous page and continue clicking its remaining elements. With depth-first traversal, when we reach a page with no elements left to click, we likewise need to return and continue. A general-purpose return strategy is therefore required. Unlike Android, which has a physical back key, iOS must implement its own return mechanism. Based on the characteristics of the app under test, we encapsulated the following return logic: most pages can return to the previous page with an edge swipe, and pages where the swipe is disabled are handled as follows.

  1. For example, the back/close navigation buttons on H5 pages can be recognized in the DOM tree by attributes containing the words back or close; we try clicking them to return to the previous page;

  2. Some page navigation buttons are rendered as "back" text that method 1 cannot find. We upload a screenshot to an OCR text-recognition service, which locates the "back" text on screen, and we tap that location to try to return to the previous page;

  3. The close button of some advertisement overlays is an "X"-shaped image that neither of the above methods can handle. We trained the AI tool to recognize the location of the close button, and tap the identified location to return to the previous page.
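The fallback chain above can be sketched as a list of handlers tried in order until the page actually changes. The handlers here are injected callables so the chain can be demonstrated without a real device; in practice each one would drive the edge swipe, the DOM query, the OCR service, or the AI close-button detector.

```python
def try_return(handlers, page_changed):
    """Try each return handler in order until the page actually changes.

    handlers: list of (name, action) pairs, e.g. edge swipe, DOM
              back/close button, OCR "back" text, AI-recognized "X".
    page_changed: callable reporting whether the last action left the page.
    Returns the name of the handler that worked, or None if all failed.
    """
    for name, action in handlers:
        action()
        if page_changed():
            return name
    return None

# Demo with stubs: the edge swipe is disabled, the DOM button is missing,
# and the OCR-located "back" text finally works.
state = {"page": "ad_overlay"}

def ocr_back():
    state["page"] = "previous"

handlers = [
    ("edge_swipe", lambda: None),       # swipe disabled on this page
    ("dom_back_button", lambda: None),  # no back/close attribute in DOM
    ("ocr_back_text", ocr_back),        # OCR finds the "back" text
]
print(try_return(handlers, lambda: state["page"] == "previous"))
```

Ordering the handlers from cheapest (a swipe) to most expensive (an AI round trip) keeps the common case fast while still covering stubborn pages.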

Thus, as shown in Figure 5, each step's event is executed according to the element-based depth-first traversal strategy.

Figure 5 Element event generator

1.3 Handling unexpected interference at run time

In practice we found that occasional pop-ups may block taps on target elements, mis-operations may push the app to the background, or a page may get stuck with no way to exit. How do we handle the various exceptions that hinder testing, so that more pages can be reached during a run and test coverage improves?

1.3.1 Handling apps that cannot enter the home page directly

Some apps require login or other operations before the home page can be reached and the test task can start. Because business lines differ widely, the framework supports running custom scripts: users can set up their own differentiated preconditions by executing custom scripts before the stability test runs.

1.3.2 Handling unexpected pop-ups

Various pop-ups appear while the app runs; they can be roughly divided into the categories in Table 6.

Table 6 Classification of pop-ups

Pop-up detection and handling were initially simple: the texts to recognize were configured in advance, and the WDA interface checked for an element with a matching text attribute and tapped it away directly. But as business lines multiplied, the number of texts to match grew sharply, and checking every configured text for a matching pop-up before each event operation consumed too much time.

We then found that iOS's NSPredicate query mode supports checking for multiple elements in a single request, and predicates can be extended quickly as requirements grow, avoiding exhaustive one-by-one text checks. With this approach, each pop-up-detection step needs only one WDA request to test N texts, saving N−1 network requests and improving execution efficiency.

For example, in Figure 7: when such dialogs appear, the predicate "label == 'Cancel' OR label == 'Agree and continue' OR label == 'Got it'" lets a single request determine which pop-up texts exist, after which they are handled one by one.

Figure 7. Various pop-ups
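Building the combined predicate is mechanical, as the sketch below shows. The label list is illustrative; in the real system the texts come from the business-line configuration, and the resulting string is sent in a single WDA element query.

```python
POPUP_LABELS = ["Cancel", "Agree and continue", "Got it"]

def popup_predicate(labels):
    """Build one NSPredicate string matching any known pop-up button label."""
    return " OR ".join(f"label == '{t}'" for t in labels)

print(popup_predicate(POPUP_LABELS))
# label == 'Cancel' OR label == 'Agree and continue' OR label == 'Got it'
```

Adding a new business line's pop-up text is then a one-line configuration change rather than an extra network request per step.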

1.3.3 Keep the application under test in the foreground

Some operations during the run may jump out of the current app, so we must ensure the app under test is always running in the foreground. There are two common solutions:

Method 1: parse the current page's DOM tree directly and query the XCUIElementTypeApplication attribute via XPath; if its value matches the name of the app under test, the app is still in the foreground. However, when the page under test is complex, XPath queries slow down, severely inflating the per-step time of task execution.

Method 2: the activeApplication interface provided by Apple directly reports the bundle ID of the current foreground application. If the returned bundle ID matches that of the app under test, we can quickly confirm the current app. This avoids querying the page's full XPath information and returning too much useless data, so it is both more stable and faster.
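The check in method 2 reduces to a bundle-ID comparison, sketched below. The bundle IDs are hypothetical, and the foreground-app info is faked as a plain dict since the exact endpoint and field names vary by WebDriverAgent version; only the comparison logic is the point.

```python
TARGET_BUNDLE_ID = "com.example.app"  # hypothetical app under test

def is_app_in_foreground(active_app_info, target=TARGET_BUNDLE_ID):
    """Compare the reported foreground bundle id with the app under test."""
    return active_app_info.get("bundleId") == target

print(is_app_in_foreground({"bundleId": "com.example.app"}))        # True
print(is_app_in_foreground({"bundleId": "com.apple.springboard"}))  # False
```

When the check fails, the test harness would relaunch or re-activate the app under test before dispatching the next event.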

However, method 2 cannot handle every case. For example, when the AssistiveTouch panel (the "little white dot") is open, or a "back to previous app" button appears in the upper-left corner, method 2 misjudges the foreground app as not being the app under test.

1.3.4 Checking whether a dispatched event took effect

While a stability task runs, we may encounter a static page where a tap does not change anything; such an operation contributes nothing to exercising the app, and we call it an invalid click. To reduce invalid clicks, we judge each event after it executes, and the most direct idea is to check whether the page changed. We therefore take screenshots before and after the event operation and use the mean-hash fingerprint generated from each screenshot as its signature. If the Hamming distance between the fingerprints of two consecutive steps is below the threshold, the pages are judged identical (Figure 8), and an event whose before and after pages are identical is considered invalid.

Figure 8 Determine the similarity of screen shots

With test-event generation solved and unexpected situations handled, the test service can run in a loop through the service framework shown in Figure 9 below.

Figure 9. Service framework

2. Applying the Solution

To verify the effects of the different strategies, we ran a comparative experiment: each strategy was run 20 times, with each run executing continuously for 8 hours. Four metrics were evaluated: total number of events, invalid-event ratio, per-step time, and page coverage. The comparison data are shown in Table 10.

Table 10 Data comparison table of different policies

For page coverage, since iOS cannot count activities the way Android does, we use a proxy metric: the screenshots taken during the test are used to approximate page coverage.

Page-coverage calculation rule: compute a fingerprint for each screenshot and compute similarity between screenshots from their fingerprints; screenshots that are too similar count as a single page. The resulting total number of distinct screenshots is taken as the task's page coverage.

In Figure 11 below, the X-axis is test execution time and the Y-axis is screen coverage. After 8 hours of execution, the new element-traversal strategy improves coverage by nearly 30% over the random-click strategy.

Figure 11. Growth over time in the number of screens covered by each strategy

Across several sets of data, random clicking has a short average per-step time and a high tap frequency, but a low proportion of effective events. The element-traversal strategy significantly improves the proportion of valid events and the page coverage, but its many screenshot and recognition operations make each step slower. The two methods have their own strengths and can be chosen by scenario: to verify the app's resistance to stress, choose the random strategy; to cover more pages, choose element traversal.

2.1 Running against a whole app

Both strategies are currently used within iQiyi. Stability tests run 800+ times a month and crash detection 200+ times, playing an important foundational service role in the entire DevOps process.

2.2 Restricting execution to specified module pages

Besides running against a whole app, some teams need stability tests limited to the pages of a specified module. We tried adding hidden markers to the DOM tree of the pages under test and periodically checking whether the markers indicate the test has jumped outside the allowed range. This basically meets the business scenario and keeps tests within the page range, but it is somewhat invasive: developers must add custom attributes to the pages so the test tool can judge whether the limits have been exceeded.

3. Follow-up Optimization

The current stability-testing strategies are brute-force clicking and element traversal; more targeted traversal strategies will be added in the future. For example, to better match real user behavior, we can mine online user traffic to extract behaviors and generate operation sequences closer to actual use; to improve bug-finding efficiency, we can raise the traversal weights of pages that have historically been more prone to crashes; and so on.