Zhang Zhao, QualityLab

1. Background

In recent years, AI-based intelligent testing technology has gradually become a basic capability of large Internet companies and major test service providers at home and abroad. This intelligence covers automatic generation of test code, analysis of large-scale test results, automatic exploratory testing, defect localization and repair, and more. Representative companies, products, and services include Test.AI, Applitools, Totoro, Eggplant, Appdiff, etc.

Among them, automatic test generation has long been an industry focus. In 2019, ByteDance Quality Lab explored automatic test generation in depth and developed Fastbot, a stability testing service for Android. Fastbot's core technologies include:

  • Intelligent traversal: uses model-based testing (MBT) and provides multiple algorithm strategies to achieve high Activity coverage and strong bug-finding capability;
  • Multi-machine cooperation: supports up to hundreds of devices traversing cooperatively for long periods, working together toward the same target;
  • Personalized expert system: the business side can apply a variety of personalized configurations, such as restricting the test to specified Activities or shielding certain test scenarios;
  • Model reuse: uses historical test experience data to learn and improve the current test strategy;
  • Complex use-case generation: imitates and learns from manual use cases, generating complex use cases during traversal;
  • Precise targeting: automatically generates targeted tests for changed scenarios based on code call-chain changes.

Meanwhile, cross-platform research in the industry shows that iOS market share has remained high; high-end consumers in particular tend to use iPhones in pursuit of a better performance experience, and they also have high expectations for application stability. However, because the market lacks iOS stability testing tools, stability and regression testing of iOS products is mostly done by manual verification, with relatively low efficiency and output. At the same time, as product lines diversify, grow more complex, and expand rapidly, the human cost of quality assurance is huge. To alleviate this, there was an urgent need for an iOS app stability testing service that could be deployed in the shift-left phase of the company's product-line testing with ultra-low access cost and unattended operation. To extend Fastbot's intelligent capabilities to other platforms, ByteDance Quality Lab started developing a stability testing service for iOS in early 2020. Two questions were worth thinking about first:

  1. Can Android traversal be used across platforms?
  2. Is there a universal cross-platform page recognition method for machine vision?

Both answers are yes.

The rest of this paper focuses on the cross-platform design, technology evolution, and application of Fastbot, the intelligent testing system developed by ByteDance.

2. Test generation

2.1 Introduction to Automatic Test Generation

Automated Test Generation (ATG), also known as Automated Input Generation (AIG), addresses a pain point of traditional automation. Traditional approaches such as Record & Replay rely on testers writing test scripts, and as testing requirements change, testers must spend time maintaining and adjusting those scripts. Compared with record-and-replay, ATG greatly reduces the workload of writing and maintaining test scripts by abstracting the general services that test activities depend on and automatically generating the operations those activities require.

| Approach | Manpower | Script workload | Reusability | Execution efficiency | Universality |
|---|---|---|---|---|---|
| Record and replay | more | more | medium | low | tied to a specific APK (low) |
| Native Monkey | less | less | less | high | built into Android (high) |
| Test generation | less | less | less | high | APK-independent (high) |

At present, typical ATG technologies include:

  • Code-based testing (program analysis);
  • Model-based testing (MBT);
  • Combinatorial testing;
  • Search-based testing (e.g., Facebook's Sapienz);
  • Adaptive random testing.

Figure 1 Brief introduction to ATG technologies

Their core logic focuses on how to generate test logic. Taking MBT as an example, a page during GUI testing (client-side testing) can be defined as a State, and from its GUI control tree we can extract meaningful operations: for example, State1 reaches State3 via Event1, and State1 can be reached from State2 via Event2. In this way, test generation is transformed into a traversal problem on a directed graph (a minimal sketch follows the list below). Random testing tools like Monkey, by contrast, often worry developers precisely because they lack such higher-level representations:

  1. The event sequences generated by Monkey are hard to document as use cases;
  2. Bugs are harder to reproduce because detailed reproduction steps are missing.
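The sketch below shows the digraph model idea; the class and method names are illustrative, not Fastbot's implementation:

```python
from collections import defaultdict

class GuiModel:
    """A GUI model as a directed graph: States are nodes, Events are edges."""

    def __init__(self):
        # transitions[state][event] = resulting state
        self.transitions = defaultdict(dict)

    def record(self, state, event, next_state):
        """Record an observed transition, e.g. State1 --Event1--> State3."""
        self.transitions[state][event] = next_state

    def unexplored_events(self, state, known_events):
        """Events on this page whose outcome has not been observed yet."""
        return [e for e in known_events if e not in self.transitions[state]]

model = GuiModel()
model.record("State1", "Event1", "State3")
model.record("State2", "Event2", "State1")
print(model.unexplored_events("State1", ["Event1", "Event3"]))  # ['Event3']
```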

2.2 Automated test tools

ATG techniques for apps fall mainly into two categories.

The first is white-box automated testing at the code level. This approach usually requires obtaining the App source code in advance, analyzing it to generate a control-flow graph, and generating test cases on that basis. White-box testing is more precise but also more limited: apps whose source code is unavailable cannot be tested effectively, and achieving high code coverage inevitably generates too many test cases.

The second is black-box testing based on the App's GUI information. This type of testing does not require the App's source code; we only need to monitor the UI information of the phone's pages during the test and inject actions to achieve continuous interactive testing.

Other popular black-box test automation tools include:

  • Sapienz, developed by Facebook, uses genetic algorithms and a search-based approach to generate test cases;
  • Dynodroid, developed at Georgia Tech, views an app as a set of actionable events and generates event sequences as tests;
  • EHBDroid (grey-box) bypasses the UI layer, using static plus dynamic analysis to trigger events directly through event-handler callbacks;
  • Stoat first constructs a state-action probability graph model, then optimizes the model with MCMC sampling to maximize App coverage;
  • APE dynamically adjusts page State abstraction, choosing an appropriate abstraction granularity for different apps;
  • TimeMachine runs on an emulator and optimizes testing and enables precise replay by saving and loading emulator state at critical test moments;
  • Q-testing pre-trains a page-abstraction machine learning model, then explores via curiosity-driven reinforcement learning;
  • ComboDroid abstracts and extends manual use cases, identifies the connectivity of states, and generates richer test cases;
  • Monkey, Android's built-in random testing tool, mentioned above.

In addition, the DroidBot and Humanoid tools developed at Peking University also use model-based GUI testing: Humanoid mimics user behavior, while DroidBot abstracts pages and actions into a graph model and traverses the graph with traditional DFS and BFS algorithms to achieve high coverage.

However, during our testing we found that traditional graph traversal algorithms do not work well in model-based GUI testing, for two reasons:

  • The graph contains many loops, so a DFS easily falls into a local loop, covering only a limited number of pages without being able to exit.
  • Real apps under test are dynamic and updated in real time. Some pages (such as feed and search pages) cannot be reached again after exiting; even a simple back operation cannot guarantee a return to the previous page, and actions such as pull-to-refresh have no corresponding back operation at all.

Moreover, all the methods above store the App model on the client. Due to the memory and performance limits of mobile devices, the model size is severely constrained and the test cannot run for long. In addition, because many A/B experiments are bucketed by device model, OS version, and other device data, the set of states reachable on each device also differs.

Fastbot on Android therefore leverages a wider range of devices, using a device farm to model the App collaboratively and guide future testing tasks. At the same time, we optimized the traditional graph search into a heuristic search to obtain higher test coverage in a shorter time.

3. Design principles of Fastbot

3.1 Client-server separation

As mentioned above, to prevent model-based GUI testing from being limited by the phone's memory and computing capacity, we deployed the memory- and compute-intensive parts to the cloud, keeping only UI information monitoring and action injection on the client side. Figure 2 shows how the client-server separation works.

Figure 2 Fastbot workflow flowchart

In terms of the specific workflow, we run a lightweight client driver on each device, which mainly monitors page GUI information and sends it to the server, receives actions sent back by the server, and injects the corresponding events on the device. Correspondingly, there are agents on the server side: each server Agent is responsible for one device, receives its page information, encapsulates it, and generates a State node. The server Agent interacts with the task model to make an action decision based on the current State and the assigned algorithm, then sends the decision back to the client driver. A simplified sketch follows.
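Here the TaskModel stub, message fields, and function names are assumptions for illustration; the real model lives in the cloud service:

```python
import json

class TaskModel:
    """Stub of the shared task model; the real one runs in the cloud."""
    def abstract_state(self, page):
        # Encapsulate a raw GUITree dump into a hashable State node.
        return tuple(sorted(w["id"] for w in page["widgets"]))
    def decide(self, state):
        # Algorithm strategy picks the next Action for this State.
        return {"type": "CLICK", "target": state[0]}

def handle_message(model, raw):
    """One round trip of the server Agent: GUI dump in, action decision out."""
    page = json.loads(raw)               # GUITree reported by the client driver
    state = model.abstract_state(page)   # generate the State node
    action = model.decide(state)         # decide based on State + algorithm
    return json.dumps(action)            # sent back for on-device injection

print(handle_message(TaskModel(), '{"widgets": [{"id": "btn_login"}]}'))
```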

3.2 Algorithm principle of Fastbot

3.2.1 State-based Exploration and Exploitation

Algorithmically, we abstract the page's GUI information into a State in the model and the executed operations into Actions; States are the graph's nodes and Actions its edges, forming a directed cyclic graph model. The traversal decision idea derives from the Monte Carlo tree search used in AlphaGo. On this basis, we also apply other reinforcement learning methods, designing an n-step Q-learning algorithm with a reward function based on the degree of page change: for each Action on a page we compute a Q value and select the optimal Action accordingly.

The whole process resembles a map-exploring robot. Our goal is to cover all paths on the map and, given time constraints, to prioritize the higher-value paths based on the information available. Value here is a fairly broad concept that we can define according to our own goals; if the goal is simply to get from A to B, we can learn one or several fixed paths. Another key idea: when our pathfinding robot arrives at a new intersection with N forks that we have not yet explored, we do not know the value of the paths behind those forks and cannot make the right decision, so we must strike a balance between Exploration and Exploitation. As we explore a path, we relay information back so the robot records the value of the whole link; exploration is only valuable if it is sufficiently exploited (by exploitation we mean taking the best Action). Moreover, if we could explore the whole map infinitely many times, the Q value of every Action would converge, giving us enough information to make accurate decisions on the map. The same holds for traversal.

Simply put, traversal selects the Action with the largest Value in the current State, i.e., the Action that brings the maximum Value bonus. For example, in the figure below there are three possible Actions, and Action2 has the largest Value, so when the Agent enters StateA it selects Action2. (Note that in reinforcement learning the Values are not known at the start; we usually initialize them to 0, let the Agent try various Actions, interact with the environment, and collect rewards, then update the Values continually according to our Value formula. After N rounds of training, each Value converges toward a stable number that tells us what Value to expect if we select a given Action in a given State.)

Figure 4 Reinforcement learning event decision

It is important to note that this Value is not just the immediate reward the environment returns for moving from the current State to the next. Since actual training must weigh both immediate and long-term benefits, the Value is obtained through a calculation formula rather than only the immediately fed-back reward; the formula depends on whether one-step or n-step updates are used. In addition, the Value is obtained by sampling, and training can be considered finished only after many rounds of iteration once the loss converges.
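For reference, the standard n-step Q-learning target takes the following form (this is the textbook formulation; Fastbot's actual reward and update rules are adaptations of it, with the page-change-based reward r being its own design):

G_t = r_{t+1} + γ * r_{t+2} + … + γ^{n-1} * r_{t+n} + γ^n * max_a Q(s_{t+n}, a)

Q(s_t, a_t) ← Q(s_t, a_t) + α * (G_t − Q(s_t, a_t))

Here γ is the discount factor weighing long-term against immediate benefit, and α is the learning rate; with n = 1 this reduces to ordinary one-step Q-learning.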

Figure 5 Backward Value update

Another problem: in StateA above, the Values of Action1, 2, and 3 all start at 0, because we know nothing before execution. If Action1 is selected randomly the first time, StateA transitions to StateB and yields Value = 2, and the system records Value = 2 for Action1 under StateA. The next time the Agent returns to StateA, if we always choose the Action with the maximum Value, we must still choose Action1, because the Values of Action2 and Action3 under StateA remain 0; the Agent has never tried them to learn their Values.

Therefore, when traversal is combined with reinforcement learning, it leans toward Exploration at the beginning instead of executing the Action with the largest Value: Actions are selected with randomness (we use a UCB decision mechanism based on visit counts, layered over the Values) so as to cover as many Actions as possible and try every possibility. After many rounds of training, once the various Actions in the various States have basically been tried, we greatly reduce the proportion of exploration and make traversal lean toward Exploitation: whichever Action returns the largest Value is the one we choose. A sketch of the UCB idea follows.
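This is the textbook UCB1 form layered over Q values; the exploration constant and bookkeeping are illustrative, not Fastbot's exact mechanism:

```python
import math

def ucb_choose(actions, q_value, visits, total_visits, c=1.4):
    """Pick the action maximizing Q + exploration bonus.

    Unvisited actions get an infinite bonus, so early on the choice is
    effectively exploration; as visit counts grow, the bonus shrinks and
    the max-Q action (exploitation) dominates.
    """
    def score(a):
        n = visits.get(a, 0)
        if n == 0:
            return float("inf")
        return q_value.get(a, 0.0) + c * math.sqrt(math.log(total_visits) / n)
    return max(actions, key=score)

# Example: Action2 has the highest Q, but the unvisited Action3 is tried first.
print(ucb_choose(["Action1", "Action2", "Action3"],
                 {"Action1": 2.0, "Action2": 5.0},
                 {"Action1": 3, "Action2": 4}, total_visits=7))
```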

3.2.2 Reward sparsity problem

Another difficulty is that rewards are often sparse during traversal. Initially, our reward function was computed from coverage statistics of Activities and controls, but these metrics tend to flatten out in the middle and late stages of traversal; the reward barely grows, so learning based on such rewards performs poorly later on. After some experiments, we chose curiosity-driven reinforcement learning to address reward sparsity, combined with natural language processing to abstract features from page information, adding a curiosity reward on top of the original reward function. The process is as follows:

Figure 6 Flowchart of curiosity-driven RL

So the reward calculation is updated as:

reward = (1 - c) * reward_{old} + c * reward_{curiosity}

This assigns a distinct reward value to every state-action pair, rather than a fixed reward value based on hand-crafted subgoals.
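A sketch of this mixed reward, modeling the curiosity term as the prediction error of a learned forward model; the vector-distance stand-in below is illustrative (Fastbot abstracts page features with NLP):

```python
import numpy as np

def curiosity_reward(predicted_next, actual_next):
    """Intrinsic reward = error of the forward model's next-state prediction.
    Well-learned transitions predict accurately and earn little curiosity."""
    return float(np.linalg.norm(predicted_next - actual_next))

def mixed_reward(reward_old, reward_cur, c=0.3):
    """reward = (1 - c) * reward_old + c * reward_curiosity."""
    return (1 - c) * reward_old + c * reward_cur

pred = np.array([0.2, 0.8])
actual = np.array([0.9, 0.1])
print(mixed_reward(1.0, curiosity_reward(pred, actual)))
```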

We also ran several ablation experiments and reached the following conclusion: rewards computed without curiosity (c = 0, blue line in Figure 7) and rewards computed from curiosity alone (c = 0.99, green line) both perform slightly worse than the original reward combined with the curiosity reward (c = 0.3, orange line). The data show that introducing curiosity-driven rewards has a positive effect on test coverage, especially in the early stage.

Figure 7 Curiosity-driven RL ablation experiment

To sum up, curiosity-driven learning fits the needs of temporal-difference-based learning well, adding a time-difference factor to the reward.

But the technique isn't perfect. A known problem is that agents may be attracted by random or noisy elements in the environment, disturbing curiosity. This condition is known as the "white noise" or "noisy TV" problem, and is also described as "procrastination".

To illustrate, imagine an Agent learning to explore a maze to find a given object (the blue ball) by looking at the pixels it sees.

Animation 1 Exploring the maze

Curiosity about predicting the next state drives the Agent to learn to explore the maze. It tends to seek out unexplored areas because it already predicts well in well-explored areas (or rather, because it cannot predict well in unexplored ones).

Now suppose a "TV" is placed on a maze wall, playing random animations in rapid succession. Because the image source is random, the Agent cannot accurately predict what will appear next; the prediction model generates high loss and thus a high "intrinsic" reward. The end result is that the Agent tends to stop and watch the "TV" rather than continue exploring the maze.

Animation 2 Stuck in front of the TV

In such an environment, when the Agent faces a "TV" or another source of random noise, next-state prediction arouses its curiosity and eventually leads to "procrastination".

The same is true in traversal, where whether a "TV" exists depends entirely on how precisely the "pixels" are defined. Imagine a page playing a short video, or an advertisement carousel that keeps rotating: will the Agent treat it as a place full of curiosity and stand there staring at it?

3.2.3 Reuse of test experience

Considering that the duration of each test run is not fixed and differs across apps, training may be insufficient when runs are short. We therefore persist the model after each training run and load the previous model to continue training before the next test, so the "map" becomes more and more complete. We also store the traversal data as (GUITree1, Action, GUITree2) triples in a persistent DB for improving the natural language model and the curiosity model; a minimal persistence sketch follows.
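A minimal sketch, using SQLite as a stand-in for the actual DB (the schema and names are illustrative):

```python
import sqlite3

def init_db(path="traversal.db"):
    """Create the transitions table for (GUITree1, Action, GUITree2) triples."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS transitions (
                    gui_tree_before TEXT, action TEXT, gui_tree_after TEXT)""")
    return db

def record_transition(db, tree1_xml, action_json, tree2_xml):
    """Store one traversal step for later NLP / curiosity-model training."""
    db.execute("INSERT INTO transitions VALUES (?, ?, ?)",
               (tree1_xml, action_json, tree2_xml))
    db.commit()

db = init_db(":memory:")
record_transition(db, "<GUITree1/>", '{"type": "CLICK"}', "<GUITree2/>")
```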

Actual test data show that model reuse also has a positive effect on test coverage. As shown in Figure 8 below, A and B are two different types of ByteDance apps; after several rounds of accumulated testing, coverage in a single long-duration test increased by 17.9% and 33.3% respectively.

Figure 8 Model reuse

To verify the tool's effectiveness, we compared Fastbot (Re) with several other state-of-the-art testing tools, including APE (A), based on dynamically adjusted page State abstraction, and Stoat (St), based on a sampling-optimized probability graph model. The experiment covered 40 representative apps, each tested on a single device for one hour. Fastbot (with three rounds of accumulated testing experience) outperformed the other tools. Figure 9 compares the tools across repeated single-device runs: Fastbot achieves higher code coverage than the other state-of-the-art tools on large apps, suggesting Fastbot has an advantage when facing large apps.

Figure 9 Fastbot (Re), APE (A), Stoat (St) evaluation data

3.3 Foundation of cross-platform generality

Cross-platform generality was fully considered in the design of the overall architecture, which decouples client capabilities from algorithm decisions: by moving algorithm decisions to a server-side backend, one set of algorithms supports all platforms.

The advantages of this decoupling are self-evident. To support a new platform such as iOS, only the platform-specific capabilities differ from Android, chiefly two: obtaining GUI page information and injecting the various events. As for client-server communication, we only need to agree in advance on the per-platform differences to achieve cross-platform compatibility, e.g., standardizing the protocol for structured GUITree reporting and for event types and operation targets. An illustrative message pair follows.
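All field names below are hypothetical; only the ideas of structured GUITree reporting and platform-neutral action types come from the text:

```python
# Client -> server: structured GUITree report (fields are hypothetical).
page_report = {
    "platform": "iOS",             # or "Android"; the server is platform-agnostic
    "app": "com.example.demo",
    "gui_tree": {
        "class": "Window",
        "children": [{"class": "Button", "text": "Login",
                      "bounds": [0, 0, 200, 80], "clickable": True}],
    },
}

# Server -> client: action decision using platform-neutral event types.
action_decision = {
    "type": "CLICK",               # CLICK / SWIPE / BACK / INPUT ...
    "bounds": [0, 0, 200, 80],     # the client driver maps this to native injection
}
```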

Figure 3 Fastbot cross-platform architecture diagram

4. Cross-platform application of Fastbot

4.1 iOS Automated testing tools and frameworks

Due to the strong closedness of the iOS platform, most research and engineering on automated and intelligent testing, academic or industrial, targets Android; the market still has a relative vacuum of intelligent testing solutions for iOS.

In landing intelligent GUI testing on iOS, one key point is performing process operations on the App under test, such as launch/kill/restart/foreground-background switching. Another is capturing the current GUI page information (the GUITree control tree) and abstracting it into a representation of the App's current running state and page characteristics. These basic capabilities are usually provided by the platform's automated testing framework; on Android, for example, the GUI page information captured by UIAutomator (or UIAutomator2, AccessibilityService) generally serves as the input to the abstract State.

| Framework | Company (organization) | Key technology | Advantages | Disadvantages |
|---|---|---|---|---|
| UIAutomation | Apple (native) | Based on Apple's underlying Accessibility UI library; drives UI automation via TCP communication | Official and native, compatibility guaranteed; no instrumentation needed | Deprecated after Xcode 8.x; supports debugging only one device (Instruments limits a Mac to one iOS device) |
| XCTest/XCUITest | Apple (native) | Introduced in Xcode 7.x; UI testing framework based on Accessibility that fully replaces UIAutomation and lifts the single-device restriction | Official and native, compatibility guaranteed; stronger than UIAutomation; supports locating UI elements by predicates and UI assertions; supports unit, interface, and UI testing; no instrumentation needed | Runs via xcodebuild; some basic capabilities missing, e.g., efficient event injection and getting the current foreground process |
| KIF | Open source | Based on the XCTest framework; references some private interfaces | Supports Xcode 11.6 (iOS 11-13); supports unit and UI testing | Private interfaces mean forward compatibility cannot be guaranteed; runs slowly |
| WDA (WebDriverAgent) | Facebook/Appium | Based on the XCTest framework; references more private interfaces than KIF | Not limited to a single Instruments instance; exposed private interfaces cover most test scenarios; good stability; well-known test frameworks extend it | Private interfaces mean forward compatibility cannot be guaranteed; control query/matching is slow; Facebook no longer maintains WDA (Appium has taken it over) |
| Appium | Open source community | Cross-platform UI testing framework based on the WebDriver JSON protocol; drives the App through WDA on iOS | No instrumentation; supports image recognition | Heavyweight; environment hard to set up; slow execution (~10 s) |
| Airtest | NetEase | Locates UI elements via image recognition; the POCO instrumentation library captures GUITree control trees | Image recognition suits automated testing of game scenes | High adaptation and compatibility cost for new Xcode versions |
| EarlGrey | Google | Based on the XCTest framework; black-box testing via XCUITest or white-box testing via XCUnitTest | Synchronizes UI, network requests, and queue callbacks to ensure state is stable before testing | Requires source-level instrumentation |
| tidevice | Alibaba | Launches WebDriverAgent (WDA) without Xcode | Can run iOS automation scripts on Windows | Depends on WDA |

Table 1 iOS UI automation framework

Table 1 lists several common iOS UI automation frameworks currently on the market. Overall they fall into three categories: 1) App source-code instrumentation: an instrumented SDK obtains the host page's control tree and performs in-process action injection. Instrumentation executes fast, but a poorly built SDK may harm the host App, e.g., degrading its stability. 2) WDA private interfaces: the advantage is that no instrumentation is needed, and this is the mainstream iOS UI automation approach, but relying on private interfaces inevitably brings compatibility problems, and the performance of fetching the control tree through them is sometimes worrying. 3) Image recognition combined with WDA private interfaces: automation built this way depends entirely on image capability and inherits the pros and cons of the other two.

Table 2 lists several relatively good iOS stability (Monkey) testing tools currently on the market. In summary, they are basically implemented on top of XCTest and WDA. Their common problem is that maintenance is not timely and has in some cases stopped entirely. The primary burden in developing such tools is the huge cost of adapting to each new iOS version, especially keeping the private interfaces of WDA (WebDriverAgent, which provides cross-process App scheduling and control-tree fetching capabilities) compatible; we often had to wait for Facebook to resolve WDA compatibility before we could proceed. Unfortunately, Facebook has since abandoned WDA in favor of idb (iOS Development Bridge), which is similar to Android's ADB but has stability issues on real phones. WDA has now been taken over by the Appium community and continues to iterate.

| Tool | Key technology | Advantages | Disadvantages |
|---|---|---|---|
| ui-auto-monkey | The earliest iOS Monkey tool; JavaScript-driven, based on UIAutomation | — | As iOS and Xcode evolved, UIAutomation was removed; works only before Xcode 7.x and is now deprecated |
| SwiftMonkey | Based on the XCUITest framework, written in Swift; random clicks on pure coordinates | Fast (millisecond level), lightweight, compatible | Must be instrumented into the App source; does not support parsing the control tree |
| FastMonkey | Secondary development based on XCTestWD and SwiftMonkey; optimized WDA private interfaces and XCUITest; event-driven Monkey with control-tree parsing | Fast (millisecond level), lightweight; no instrumentation; optional control-tree parsing (second level); custom event configuration | Supports only Xcode 8.x, 9.x, and 10.1 (forks of it support Xcode 10.x and 11.x) |
| OCMonkey | Developed in Objective-C, integrating WDA private interfaces; Monkey driven by control-type weights | No instrumentation; supports control-tree parsing | Control-tree parsing is slow (100 ms to second level); no Xcode 10.x support; maintenance has stopped |
| Macaca/iOSMonkey | Node.js wrapper over Macaca integrating WDA private interfaces; Appium-like server-client framework driving Instruments externally | Cross-platform; supports control-tree parsing | Heavyweight; complex environment setup; slow event-driven response (10+ seconds); maintenance has stopped |
| sjk_swiftmonkey | Secondary development of SwiftMonkey; WDA private interfaces adapted for compatibility; control-tree parsing; event-probability-driven Monkey | Lightweight; no instrumentation; control-tree parsing; custom event configuration; supports Xcode 11.x | Control-tree parsing is slow (second level); no Xcode 12.x support |

Table 2 iOS Monkey tools

Beyond compatibility, there is also the question of how GUI page information is retrieved on iOS. Many of the Monkey tools in Table 2 integrate GUITree control-tree parsing. The advantage is that control-aware operation is far more efficient than pure coordinate clicking: repeated clicks on several coordinates are likely redundant operations within the same control area. Moreover, with control parsing, one can build behavior trees or control-shielding configuration mechanisms that enrich the tool's capabilities. At the same time, control-parsing speed is an important metric to weigh: as a stress-testing tool we certainly do not want each operation to take 10 seconds; we want control-parsing ability at a speed close to coordinate-based event generation. Having control-tree recognition versus the time it costs is clearly a trade-off; after all, no "crazy" racer wants a tire change that lets the runner-up gain a whole lap!

4.2 Fastbot-iOS cross-platform solution

Weighing these pros and cons, the Fastbot-iOS architecture adopts a technical solution built on a lightweight, minimal set of WDA private interfaces, an optional instrumentation SDK (an extension providing additional plug-in capabilities), and pure image-recognition-based page parsing.

The specific workflow is shown in Figure 10, which highlights the differences between Fastbot-iOS and the Android version.

First, we developed the Fastbot Native library, which uses machine vision to parse page information purely from images: it translates screenshots into the page's GUITree XML structure. With OpenCV and machine-vision algorithms, we can identify the layout structure of GUI pages and control information, and perform structural clipping for popover pages. Fastbot Native is implemented in C++ with a general-purpose internal design, so the library can be ported to Android and Mac PCs at low cost. A toy sketch of the screenshot-to-GUITree idea follows.
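Element and attribute names here are invented for illustration, not Fastbot's actual schema; the detected boxes come from the vision pipeline described in section 5:

```python
import xml.etree.ElementTree as ET

def boxes_to_guitree(boxes):
    """Serialize vision-detected controls into a GUITree-style XML page dump.

    Each box is (x, y, w, h, kind), e.g. produced by segmentation and
    classification steps over a screenshot.
    """
    root = ET.Element("GUITree", platform="iOS")
    for x, y, w, h, kind in boxes:
        ET.SubElement(root, "Node", type=kind,
                      bounds=f"[{x},{y}][{x + w},{y + h}]")
    return ET.tostring(root, encoding="unicode")

print(boxes_to_guitree([(10, 40, 300, 60, "Text"), (10, 120, 120, 48, "Button")]))
```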

Second, we optimized WDA for performance and compatibility, keeping only a minimal set of WDA private interfaces. The benefit of this design is high availability and fast compatibility with new iOS versions, often with no modification at all: for example, on the day Apple released iOS 15, Fastbot-iOS was seamlessly compatible and ran directly.

Finally, we provided plug-in extensibility, such as the Shoots SDK plug-in integrated into the App (a general UI automation framework developed internally by ByteDance for writing UI automation test cases, similar to Airtest's POCO SDK). The plug-in uses in-process reflection to access the App's GUITree control tree and is typically needed only for WebViews, Lynx, games, and businesses with special page-parsing needs; more commonly, the Fastbot Native library handles page parsing. The extension mechanism also supports internally developed custom plug-ins, which only need to conform to our Fastbot-iOS communication protocol.

Figure 10 Fastbot-iOS action sequence diagram

4.2.1 Testability improvement

In addition to the Shoots plug-in, Fastbot-iOS also provides the AAFastbotTweak testability plug-in. As its name suggests, it too is integrated into the App, providing testability and extensibility enhancements including but not limited to:

  • Scene restriction: restrict the host App to a given scene; any sub-page of the scene may be entered, and if the test leaves the scene, Fastbot-iOS immediately re-enters the restricted scene page. A blacklist/whitelist mechanism specifies which pages must not, or which pages may only, be jumped to.
  • Jump blocking: block jumps to third-party apps such as QQ, WeChat, and Taobao.
  • Upgrade blocking: shield the host App from automatic updates.
  • Automatic login: obtain a specified account from the account pool and complete login automatically.
  • Data mock: mock preset A/B testing keys and values.
  • Forced kill: forcibly kill the App after receiving an execution message, WatchDog-style.
  • Schema jump: read a preset schema list and jump automatically to the specified scene pages.

These features are pluggable, enabled on demand, and highly customizable.

4.2.2 Fault Injection

In addition to the two SDKs above, Fastbot-iOS also provides a fault-injection SDK, likewise integrated into the App. This plug-in simulates the various complex and extreme conditions of devices in the field, injecting transient or persistent faults while Fastbot traversal tests run, to test the App's stability under extremes. Its capabilities include but are not limited to:

  • Simulate high CPU load: use high-frequency computation to raise CPU load, keeping most threads saturated and a single thread fluctuating (a platform-agnostic sketch follows this list).
  • Simulate CPU down-clocking: adjust the "maximum CPU frequency" value of a jailbroken iPhone in low-battery mode to simulate a device running hot at low frequency.
  • Simulate low available memory: preempt memory by allocating non-releasable memory in one shot, so the App runs the test in a low-memory state.
  • Simulate disk exceptions: generate large files from random numbers, copy them, and insert characters randomly into the copies to create low or zero available disk space.
  • Simulate high I/O: keep the disk in a write-and-erase state for a long time, using a small amount of disk and memory for I/O emulation.
  • Simulate thread/thread-pool high concurrency: add counting locks before thread execution to create highly concurrent access to critical resources.
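Below is the platform-agnostic sketch of the busy-loop idea referenced above; the real SDK runs inside the iOS App, so this Python version only illustrates saturated plus fluctuating worker threads:

```python
import threading
import time

def burn_cpu(stop, duty=1.0, period=0.1):
    """High-frequency computation to raise CPU load.

    duty=1.0 keeps a thread fully saturated; duty<1.0 produces the
    fluctuating single-thread profile described in the text.
    """
    x = 1
    while not stop.is_set():
        t0 = time.monotonic()
        while time.monotonic() - t0 < duty * period:
            x = x * 31 % 1_000_003          # meaningless hot arithmetic
        time.sleep((1 - duty) * period)

stop = threading.Event()
threads = [threading.Thread(target=burn_cpu, args=(stop, 1.0)) for _ in range(3)]
threads.append(threading.Thread(target=burn_cpu, args=(stop, 0.5)))  # fluctuating
for t in threads: t.start()
time.sleep(1); stop.set()
for t in threads: t.join()
```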

These functions and services can be accessed on demand, and multiple faults can be mixed.

4.2.3 WDA optimization

The modification of WDA was for performance and compatibility. We kept trimming until we met the principle of minimal use: in the end we retained only the following three private interfaces and replaced all the others with XCUITest native interfaces:

  • Foreground-process handling: - (NSArray *)activeApplications;
  • Application initialization: - (id)initPrivateWithPath:(id)arg1 bundleID:(id)arg2;
  • Device event generation: - (void)synthesizeEvent:(XCSynthesizedEventRecord *)arg1 completion:(void (^)(NSError *))arg2;

Decoupled this way, Fastbot-iOS is much lighter, and for subsequent iOS compatibility iterations we only need to focus on these few private interfaces.

In addition, iOS Monkey tools such as OCMonkey (Table 2) typically invoke the XCUITest automation framework or WDA to resolve GUI page information, which has a stability problem: Fastbot-iOS runs as a third-party process, and parsing the GUITree through XCUITest or WDA requires recursively resolving page elements. On complex pages this recursion causes resource problems, with a high probability of disconnection or timeout; worse, after running 10+ hours on a low-end iPhone, the battery visibly overheats, with a long-term risk of battery swelling. Therefore, for businesses that have not integrated the plug-ins above (by default we want the tool to work entirely without instrumentation, since integrating plug-ins imposes a modification cost on the App and is unsuitable for testing release packages; they are only for special cases or extended capabilities), we abandon conventional page parsing entirely and switch to our cross-platform image structure encoding. In practice this relies on only a single XCUITest screenshot interface, which captures all on-screen content at the system level and thus natively supports parsing pages both inside and outside the App.

5. Intelligent image processing extends Fastbot's cross-platform capabilities

5.1 Applications of image algorithms in testing

Intelligent image processing refers to computer-adaptive image processing and analysis techniques serving various applications. It is an independent theoretical and technical field, and also a very important technology within machine vision.

The origins of machine vision trace back to the 1960s, when the American scholar L.G. Roberts studied image processing of the polyhedral blocks world, and to the 1970s, when MIT's Artificial Intelligence Laboratory opened its "machine vision" course. In the 1980s a global machine-vision research boom began, producing the first application systems based on machine vision. After the 1990s, with the rapid development of computing and semiconductor technology, machine-vision theory and applications developed further.

Since the turn of the 21st century, machine vision has developed faster still and has been applied at scale in many fields, such as intelligent manufacturing, intelligent transportation, healthcare, and security monitoring. Now, with the rise of artificial intelligence, machine-vision technology is breaking through continuously and moving toward maturity.

In research and testing, more and more companies and academic organizations have introduced image processing and machine vision into the field; usage scenarios are gradually enriching, and many excellent tools have emerged. Table 3 lists several test tools representative of image capabilities; spotting the bright ideas in them felt "like sighting a guiding light from a boat in the middle of the night".

| Tool | Time | Company (organization) | Image technology | Application field |
|---|---|---|---|---|
| Sikuli | 2009 | MIT (open source) | Recognizes on-screen image controls; technically uses OpenCV template matching and SIFT feature matching | 1. UI automation; 2. Image matching |
| Applitools | 2017.07 | Applitools | Visual testing with adaptive algorithms, finding potential UI errors via diff; baseline checkpoints are defined manually at each step, then image-algorithm comparison of checkpoints serves as the assertion | 1. Functional testing; 2. Regression testing |
| AirTest | 2018.03 | NetEase | Automated test framework based on image recognition; principle derived from Sikuli | 1. Game UI recognition; 2. Cross-platform App UI recognition |
| Test.ai | 2018.08 | Test.ai | Dynamically identifies screens and elements in the application and automatically drives it to execute test cases | 1. UI traversal testing; 2. Object detection |
| Appium 1.9 | 2018.08 | Appium | Adds image-based control recognition and location | 1. UI automation |
| AppiumPro | 2018.11 | Cloud Grey | Uses Test.ai as a plug-in, realizing control recognition via deep-learning object detection | 1. Object detection |

Table 3 Applications of image algorithms in testing

5.2 Image UI recognition

Under Fastbot's constraints of low power consumption, low time cost, and high performance, we give priority to the most basic image-processing techniques to identify GUI interface information, building a page's information within milliseconds. Basic image processing includes (see the sketch after this list):

  • Basic segmentation:
    • Preprocessing: cropping, grayscale histogram equalization, and binarization. Cropping mainly targets the draggable bars at the page edges, which cause bad cases in row scanning; the side columns are cropped first during processing. Histogram equalization mainly targets overall dark images, usually in night mode: after equalization the contrast between background and UI is higher. Binarization sets pixels below a given threshold to 0 and those above it to 1.
    • Row and column scan: scan lines of pixel values from top to bottom and from left to right of the binarized image. If all pixels in a line are 1 (light), it is treated as a non-UI area; otherwise it is a UI area. Alternating row and column scans over several iterations can basically segment an image, as shown in Figure 11 below.

Figure 11 Row and column scan

  • Text block aggregation: merges adjacent UI elements of type Text into a whole; line aggregation is performed first, followed by column aggregation, as shown in Figure 12 below.

Figure 12 text block aggregation

  • Night mode: if too few regions are segmented, the image is judged to be in night mode; the grayscale histogram is equalized first and the binarization threshold adjusted before re-segmenting, as shown in Figure 13.

Figure 13 Night mode
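The sketch referenced above compresses the preprocessing and row-projection scan with OpenCV; thresholds are illustrative and the column pass is omitted, so treat it as an illustration rather than Fastbot Native's actual C++ implementation:

```python
import cv2
import numpy as np

def segment_rows(img_bgr, thresh=200):
    """Binarize a screenshot and find UI bands by scanning pixel rows.

    A row whose pixels are all 1 (light) is background; consecutive
    non-background rows form one UI band. Columns would be handled the
    same way inside each band, alternating over several iterations.
    """
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)                 # helps dark / night-mode pages
    _, binary = cv2.threshold(gray, thresh, 1, cv2.THRESH_BINARY)
    is_ui = (binary.min(axis=1) == 0)             # any dark pixel -> UI row
    bands, start = [], None
    for y, ui in enumerate(is_ui):
        if ui and start is None:
            start = y
        elif not ui and start is not None:
            bands.append((start, y)); start = None
    if start is not None:
        bands.append((start, len(is_ui)))
    return bands

print(segment_rows(np.full((8, 8, 3), 255, np.uint8)))  # blank page -> []
```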

When performance requirements are looser, we also introduce deep learning techniques to improve page-parsing accuracy:

  • Classification: classifies detected controls, distinguishing buttons / search boxes / pictures / text / short text, etc.
  • OCR: text recognition, which can be used to match custom events.
  • Target detection: a YOLOv3 object detection model directly locates pre-labeled controls.

Figure 14 Target detection

5.3 Image UI anomaly detection

In addition to recognizing UI interface information, we also developed rich image-based UI anomaly detection. Its capabilities include but are not limited to the following (a sketch of the simplest check follows this list):

  • Black/white screen: an all-black or all-white screen is abnormal; usually an image fails to load because of a wrong image path, missing application permission, or network disconnection, so nothing renders on the interface.
  • Image overlap: multiple images overlap each other, usually due to performance stalls during asynchronous image rendering and loading.
  • Purple block anomaly: commonly found in game scenes, usually caused by corrupted or missing texture or model images.
  • White block anomaly: commonly found in game scenes, caused by corrupted or missing UI images.
  • Black border: a black area around an image exceeds the threshold width, usually caused by insufficient compatibility between the device model and the layout.
  • Overexposure: usually occurs in game scenes, generally due to a game-engine rendering error.
  • Control occlusion: one control is stacked on another, completely blocking the lower control, often due to the control's aspect ratio or text-size settings.
  • Text overlap: text in two text boxes overlaps, usually due to text-size errors. Text overlap differs from control occlusion: in text overlap two pieces of text are jumbled together, while in control occlusion one control completely blocks the other.
  • Image loss: a wrong image path, missing permission, network disconnection, etc. causes an image-loading error, leaving the image incompletely displayed.
  • Null value: a text display error caused by a wrong parameter setting or a database read error.
  • Glitch: a glitchy screen in a game or video, usually due to a hardware defect or an error when using GPU/CPU acceleration instructions.
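The sketch referenced above: black/white-screen detection via simple pixel statistics, with illustrative thresholds:

```python
import cv2
import numpy as np

def detect_blank_screen(img_bgr, ratio=0.99, dark=10, light=245):
    """Flag a screenshot as a black-screen or white-screen anomaly.

    If almost all pixels are near-black or near-white, the page most
    likely failed to render (bad image path, permissions, network loss).
    """
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    total = gray.size
    if (gray <= dark).sum() / total >= ratio:
        return "black_screen"
    if (gray >= light).sum() / total >= ratio:
        return "white_screen"
    return None

print(detect_blank_screen(np.zeros((4, 4, 3), np.uint8)))  # 'black_screen'
```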

6. Application of Fastbot in game testing

In recent years, reinforcement learning has learned to play Go, StarCraft, Dota, and other games, even surpassing professional human players. These breakthroughs bring not only innovation in game AI design but also the possibility of intelligent game testing. For the real needs of the game business, Fastbot has made many explorations combining current AI techniques with game testing.

  • Multi-language detection: with a number of games launching overseas one after another and manpower short for international multi-language game testing, we use Fastbot to traverse the game UI while capturing game screens; OCR text recognition then checks the text content and text regions to find translation errors, text overflowing its box, and similar problems.

Figure 16 RO Legends of Wonderland – language translation error – English text appears in the Thai version

  • Automatic mission AI: for scenarios where story missions must be completed before the main game can be reached, we developed a Fastbot-A3C Agent algorithm that combines the game state graph, behavior-tree rule priors, and imitation learning to complete game missions automatically, supporting long-duration stability testing, compatibility testing, and international multi-language detection.

Animation 3 RO Legends of Wonderland automatic mission AI

7. Summary

At present, Fastbot is widely used in stability and compatibility testing of ByteDance's client products. More than 10,000 tasks are started daily, and on average more than 50,000 crashes are found every month. With Fastbot, we can fix the majority of crashes before release, ensuring a good experience for users. Fastbot also plays an important foundational service role in the entire DevOps process.

Meanwhile, we have open-sourced:

  • Fastbot-iOS:github.com/bytedance/F…
  • Fastbot-Android:github.com/bytedance/F…

We hope for in-depth cooperation and exchange with industry peers. We believe that more and more intelligent testing tools will accelerate change in quality engineering and push China's quality engineering technology to the forefront of the global industry.

Finally, we would like to express our heartfelt thanks to the production quality engineering, production iOS client platform architecture, data vision technology, and production game AI and game quality-and-effectiveness teams for their support.

  • Fastbot QQ group: 1164712203
  • Fastbot WeChat group: if the QR code has expired, please add WeChat ID 18610309004

8. Join us

ByteDance Quality Lab is an innovative team dedicated to theoretical research and technical pre-research in software engineering for the Internet industry. Our mission is to become the world's top intelligent-tools team. We are committed to applying cutting-edge AI to quality and engineering efficiency, providing the industry with intelligent testing tools such as Fastbot, ByQI, SmartEye, and SmartUnit. On the way to becoming the world's top intelligent-tools team, we hope to bring more intelligent means to the quality field.

Here, you can use machine vision and reinforcement learning to create test robots with superior abilities and verify your algorithms on thousands of devices. You can also put textbook testing theory into practice to help businesses improve testing efficiency: combinatorial testing, program analysis, precise testing, automatic unit-test generation, and automatic defect repair all await your exploration. Moreover, you can exchange ideas and cooperate with top institutions at home and abroad, exploring the possibilities of software engineering together with scholars from all over the world. Welcome to join us. Resume: [email protected]; email subject: Name – Years of experience – Quality Lab – Fastbot.

9. References

  • Sapienz: Intelligent Automated Software Testing at Scale, engineering.fb.com/2018/05/02/…
  • Dynodroid: An Input Generation System for Android Apps, dl.acm.org/doi/10.1145…
  • EHBDroid: Beyond the GUI Testing for Android Applications, ieeexplore.ieee.org/document/81…
  • Stoat: Guided, Stochastic Model-Based GUI Testing of Android Apps, tingsu.github.io/files/…
  • APE: Practical GUI Testing of Android Applications via Model Abstraction and Refinement, helloqirun.github.io/…
  • TimeMachine: Time-Travel Testing of Android Apps, www.comp.nus.edu.sg/~dongz/res/…
  • Q-testing: Reinforcement Learning Based Curiosity-Driven Testing of Android Applications, minxuepan.github.io/Pubs/Q-testing…
  • ComboDroid: Generating High-Quality Test Inputs for Android Apps via Use Case Combinations, dl.acm.org/doi/10.1145…
  • Google Monkey, developer.android.com/studio/test…
  • DroidBot: A Lightweight UI-Guided Test Input Generator for Android, ieeexplore.ieee.org/document/79…
  • Humanoid: A Deep Learning-Based Approach to Automated Black-Box Android App Testing, ieeexplore.ieee.org/document/89…
  • Wuji: Automatic Online Combat Game Testing Using Evolutionary Deep Reinforcement Learning, yanzzzzz.github.io/files/PID61…
  • The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI, arxiv.org/abs/1702.05…
  • Automated Video Game Testing Using Synthetic and Human-Like Agents, ieeexplore.ieee.org/document/88…
  • Counter-Strike Deathmatch with Large-Scale Behavioural Cloning, arxiv.org/pdf/2104.04…
  • Developed based on the following tools:
    • zalando/SwiftMonkey
    • b1ueshad0w/OCMonkey
    • zhangzhao4444/Fastmonkey
    • facebook/WebDriverAgent
    • AirtestProject/Airtest
    • tianxiaogu/ape
    • zhangzhao4444/Maxim
    • tingsu/Stoat
    • skull591/ComboDroid-Artifact
    • yzygitzh/Humanoid
    • anlalalu/Q-testing