Author: jiwenqiang, DFX technical expert

When developing a product, we usually think first about the functions we want it to provide. However, non-functional attributes also shape the user experience to a large extent: think of applications that freeze or crash from time to time. Why do some systems fail frequently while others rarely do? The answer brings us to today’s topic, DFX.

What is DFX?

DFX is a product design concept that appeared in the 1960s and 1970s, but it remains unfamiliar to many developers. DFX (Design for X) refers to the design of a product’s non-functional attributes, where X represents a certain characteristic of the product or a certain stage of its life cycle. As the figure below shows, products have a rich set of non-functional attributes, which directly affect long-term core competitiveness in areas such as quality, efficiency, and cost.

Figure 1 Product DFX

In the past few years, the delivery efficiency and quality of Huawei software have improved continuously. Compared with the previous version, each major software version has had a shorter delivery time and a much lower failure rate. DFX has played an important role in these improvements. As industry awareness has grown, DFX has become a cornerstone of product design excellence and a piece of infrastructure for enterprise product design and development. DFX is now also defined as “Design for eXcellence.”

What is operating system DFX?

Knowing what DFX is and how much it matters to a product, we brought DFX into the design of HarmonyOS as common infrastructure of the operating system, to enable the design, implementation, testing, and maintenance of high-quality products. By looking at the applications and devices that the operating system serves, we identified the non-functional requirements that the system can address, then extracted a common, basic DFX framework and incorporated it into HarmonyOS. The result is operating system DFX. Developers can use these DFX capabilities in HarmonyOS directly or tailor them to the needs of the product.

Figure 2 Operating system DFX

At this point you may wonder: isn’t operating system DFX just product DFX copied into the operating system? In fact, there are two significant differences between operating system DFX and product DFX:

  1. Because the operating system is not customized for one product category but is full-stack, common infrastructure, operating system DFX focuses on the common capabilities needed to develop products: recording, diagnosis, recovery, observation, analysis, maintenance, and service.
  2. Operating system DFX focuses more on the development experience of developers and device vendors, with the goal of helping them design better products.

HarmonyOS’s requirements for DFX capabilities

Given that operating system DFX is designed to help developers create superior products, and that the DFX framework and capabilities have been built into HarmonyOS, you may wonder what DFX looks like in HarmonyOS and what it brings to HarmonyOS. Before we answer these questions, let’s take a look at HarmonyOS’s requirements for DFX capabilities.

Almost all operating systems have the following three requirements for DFX:

  1. Lightweight and effective: low resource cost, easy to use, easy to learn, accurate, and effective.
  2. Basic and universal: covers key, basic, universal capabilities; easy to extend; easy for developers to tailor and enhance.
  3. Comprehensive coverage: serves all application and device categories, serves both developers and device vendors, and covers the whole product life cycle.

In addition to these basic requirements, HarmonyOS places new requirements on DFX:

  1. As we know, HarmonyOS is a HyperTerminal-oriented system, and resources can differ enormously between a rich device with 8GB of RAM and 512GB of ROM and a thin device with 128KB of RAM and 2MB of ROM. Faced with such differences, HarmonyOS requires DFX to support the full stack, multiple languages, devices large and small, and flexible deployment.
  2. In addition to being HyperTerminal-oriented, HarmonyOS extensively supports distributed HyperTerminal scenarios, so it requires the system’s DFX capabilities to support distributed scenarios, such as distributed logging, distributed tracing, and distributed debugging and tuning.

Figure 3 HarmonyOS’s requirements for DFX capabilities

HarmonyOS DFX framework and capabilities

Now that you’ve been introduced to the concept of operating system DFX, let’s get down to business and introduce the framework and capabilities of HarmonyOS DFX.

Figure 4 Overview of the HarmonyOS DFX framework and capabilities

The brownish center portion of the panorama in Figure 4 shows the capabilities provided by HarmonyOS DFX.

HarmonyOS DFX offers the following capabilities:

(1) Recording: provides lightweight logging, event, and tracing functions that record how a program runs and lay a foundation for subsequent analysis and measurement.

(2) Fault management: provides accurate and effective fault detection, location, and recovery.

(3) Observation and analysis: provides unified and convenient observation and analysis tools, mainly for information export, information analysis, and linkage debugging.

So what role do these DFX capabilities play? As the relationship between the central DFX portion of the panorama and its surroundings shows, these capabilities serve the other subsystems of the operating system, and they have an even more important mission: supporting software applications such as audio and video entertainment and smart travel, as well as hardware devices such as “1+8+N”. In addition, these capabilities are the foundation of the toolchain for product development and operations: IDE tools rely on them to support development and debugging, and the big data analysis platform for product operation and maintenance is built on them.

As the framework shows, HarmonyOS DFX consists of logging, events, tracing, fault management, and observation and analysis. Logs, events, and traces embody DFX’s recording capabilities; fault management helps developers discover and locate problems quickly; and observation and analysis is a set of tools that helps developers use DFX capabilities in an integrated environment. Let’s take a look at each of the DFX capabilities that HarmonyOS offers.

1. Log (HiLog)

Logging is often viewed as the simplest of functions, but two problems stand out in practice. One is excessive logging calls. The other is that, as software and organizations grow, system logs become cluttered and voluminous: privacy leaks become more likely, and developers find it increasingly hard even to find their own logs. To address these two issues, HarmonyOS DFX has designed a new logging feature called HiLog. Here is a schematic of HiLog.

Figure 5 Log (HiLog)

As you can see from the figure above, HiLog not only provides log collection for JS, Java, C, and C++, but also focuses on classified log queries, flow control, and privacy handling. Let’s take a look at each of these designs.

(1) Classified query

HiLog classifies logs and provides commands to query them by category. In addition to viewing logs by Level, Type, and Tag, you can view them by Domain. A domain is a vertical business area that cuts across the layers of the software stack. Why view logs by domain? Consider the following scenario: the Camera domain includes applications, services, and drivers. If a developer wants to pick the Camera domain’s logs out of a pile of logs, the old filtering methods offer no way to do so. To solve this, we define a DomainID for each such domain and filter by domain.
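
The idea of domain filtering can be illustrated with a minimal Python sketch. It is not the HiLog implementation: the LogRecord type, the filter_by_domain helper, and the numeric domain IDs are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical domain ID for illustration only; not an actual HarmonyOS value.
CAMERA_DOMAIN = 0xD001


@dataclass(frozen=True)
class LogRecord:
    domain: int   # vertical business domain, shared across stack layers
    level: str    # e.g. "DEBUG", "INFO", "ERROR"
    tag: str      # per-module tag chosen by each component
    message: str


def filter_by_domain(records, domain):
    """Return the records that belong to the given domain, in order."""
    return [r for r in records if r.domain == domain]


records = [
    LogRecord(CAMERA_DOMAIN, "INFO", "CameraApp", "preview started"),
    LogRecord(0xD002, "INFO", "AudioService", "stream opened"),
    LogRecord(CAMERA_DOMAIN, "ERROR", "CameraDriver", "sensor timeout"),
]

# The application and the driver use different tags but share one domain.
camera_logs = filter_by_domain(records, CAMERA_DOMAIN)
```

Filtering by tag alone would require knowing every tag used by the Camera application, services, and drivers; a shared domain ID lets one filter cover all of them.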

(2) Flow control

Classified queries solve the problem of logs being hard to view, but excessive logging also has a huge impact on system performance. In our experience, if all logs in the system are turned on, system performance may drop to 70% in severe cases. So how do you solve the log overload problem?

HiLog solves this problem by controlling the total log volume of each domain. While collecting logs, it records each domain’s total, identifies the domains that exceed the threshold, and restricts the excess logs in those domains. Excess logs are handled differently in Debug mode and commercial Release mode. In Debug mode, HiLog reports that logging is excessive but does not actually discard anything. In Release mode, excess logs are discarded and a log-drop message is printed.
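
The two handling strategies can be sketched in Python as follows. This is only an illustration under stated assumptions: the DomainFlowControl class, the fixed per-domain quota, and the choice to print the notice once per domain are hypothetical and are not taken from HiLog.

```python
from collections import defaultdict


class DomainFlowControl:
    """Per-domain log quota; names and threshold are illustrative assumptions."""

    def __init__(self, quota_per_domain, release_mode):
        self.quota = quota_per_domain
        self.release_mode = release_mode
        self.counts = defaultdict(int)   # logs seen so far, per domain
        self.output = []                 # lines that were actually emitted

    def log(self, domain, message):
        self.counts[domain] += 1
        over_by = self.counts[domain] - self.quota
        if over_by <= 0:
            self.output.append(f"[{domain}] {message}")
        elif self.release_mode:
            # Release: discard the excess log, announce the drop once.
            if over_by == 1:
                self.output.append(f"[{domain}] quota exceeded, dropping logs")
        else:
            # Debug: warn once, but still keep the log.
            if over_by == 1:
                self.output.append(f"[{domain}] quota exceeded (not dropped)")
            self.output.append(f"[{domain}] {message}")


debug = DomainFlowControl(quota_per_domain=2, release_mode=False)
release = DomainFlowControl(quota_per_domain=2, release_mode=True)
for controller in (debug, release):
    for i in range(4):
        controller.log("camera", f"frame {i}")
```

With a quota of two, the Debug controller keeps all four messages plus one warning, while the Release controller keeps the first two messages and a single drop notice.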

Figure 6 Two modes of flow control

(3) Privacy control

In addition to inconvenient queries and excessive logs, log privacy also needs attention. When developing and debugging, we tend to print more information, which makes it possible to print private user information such as names and visited URLs. The penalties for privacy disclosure are now quite severe: the EU’s General Data Protection Regulation (GDPR) allows fines of up to 20 million euros or 4% of annual worldwide turnover, whichever is higher. We therefore need to be very careful when printing logs. Private user data must never be printed to logs.

To control privacy, HiLog provides control over how variables are printed. Developers can mark each variable with {private} or {public}. A variable marked {private} is treated as private, and its content is hidden in Release mode. Variables that need no such control can be marked {public}.
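
The effect of the {private} and {public} markers can be sketched in Python. This is a toy formatter, not HiLog: the placeholder spelling %{public}s / %{private}s, the format_log function, and the mask text are assumptions made for this example.

```python
import re

# Assumed placeholder syntax for this sketch: %{public}s and %{private}s.
_PLACEHOLDER = re.compile(r"%\{(public|private)\}s")


def format_log(fmt, args, release_mode):
    """Substitute args into fmt, masking private values in Release mode."""
    values = iter(args)

    def substitute(match):
        value = str(next(values))   # consume one argument per placeholder
        if match.group(1) == "private" and release_mode:
            return "<private>"      # mask text chosen for this sketch
        return value

    return _PLACEHOLDER.sub(substitute, fmt)


fmt = "user %{private}s opened %{public}s"
debug_line = format_log(fmt, ["Alice", "settings"], release_mode=False)
release_line = format_log(fmt, ["Alice", "settings"], release_mode=True)
```

Because the marker lives in the format string, the decision about what is private is made once, at the call site, and the same code is safe to ship in a Release build.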

Figure 7 Variable printing control of HiLog

2. Event (HiView)

In addition to logging, HarmonyOS DFX provides the ability to record events and has designed a new Event Framework (HiView) for this purpose.

Figure 8 Event framework HiView

Events may come from applications or from the system, so the HiView framework is divided into two parts: a system event framework and an application event framework. Each part provides an event collection interface: the system event framework uses the HiSysEvent interface, and the application event framework uses the HiAppEvent interface. In addition, HiView provides a flexible subscription and query interface that shares collected events with back-end processors. This interface can be used in a variety of scenarios. For example, an IDE can subscribe to events through it and display them in the debugging interface, and a system vendor can subscribe to events through it and then process them in a customized way.

HiView also provides a plug-in mechanism for system event processing logic. Plug-ins can be configured and deployed on HarmonyOS as needed, to adapt flexibly to terminals of different sizes.
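
The subscription model described above can be sketched as a small publish/subscribe hub in Python. The EventHub class and its subscribe and report methods are hypothetical; they do not reproduce the HiSysEvent or HiAppEvent interfaces, and the event names are invented for illustration.

```python
from collections import defaultdict


class EventHub:
    """Toy publish/subscribe hub; not the HiView interface."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # source -> callbacks

    def subscribe(self, source, callback):
        self._subscribers[source].append(callback)

    def report(self, source, name, params):
        event = {"source": source, "name": name, "params": params}
        for callback in self._subscribers[source]:
            callback(event)


hub = EventHub()
ide_view = []        # stands in for an IDE debugging panel
vendor_store = []    # stands in for a vendor's custom back end

hub.subscribe("app", ide_view.append)
hub.subscribe("system", vendor_store.append)
hub.subscribe("app", vendor_store.append)

hub.report("system", "LOW_MEMORY", {"free_kb": 512})
hub.report("app", "PAGE_SHOWN", {"page": "home"})
```

The collection side does not need to know who consumes the events, which is what lets an IDE and a vendor back end share the same stream.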

3. Trace (HiTrace)

Next, let’s take a look at HarmonyOS DFX’s final recording capability, tracing.

As a HyperTerminal-oriented system, HarmonyOS needs to track application interactions across devices, in addition to tracking interactions between applications and processes as a conventional operating system does. In HarmonyOS, this distributed tracing capability is provided by HiTrace, which tracks the entire business chain by passing along a TraceID. The TraceID can be passed across layers, processes, and even devices, through the application, native, and kernel layers. It’s worth noting that HiTrace is a lightweight tracing mechanism that adds only microsecond-level latency under Wi-Fi conditions, so its impact on the system is minimal.
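
The principle of passing one TraceID along a business chain can be sketched in Python. Everything here is hypothetical (the function names, the message layout, the device names); the sketch shows only the idea that every hop reuses the ID it received, not how HiTrace is implemented.

```python
import uuid

records = []   # shared sink standing in for each device's trace log


def record(device, layer, trace_id, what):
    records.append({"device": device, "layer": layer,
                    "trace_id": trace_id, "what": what})


def kernel_write(device, trace_id):
    record(device, "kernel", trace_id, "write to socket")


def native_send(device, trace_id, payload):
    record(device, "native", trace_id, "serialize request")
    kernel_write(device, trace_id)
    # The ID travels with the payload to the peer device.
    return {"trace_id": trace_id, "payload": payload}


def remote_service(device, message):
    # The receiver reuses the incoming ID instead of creating a new one.
    record(device, "app", message["trace_id"], "handle " + message["payload"])


def start_business_flow():
    trace_id = uuid.uuid4().hex   # one ID for the whole chain
    record("phone", "app", trace_id, "user taps cast")
    message = native_send("phone", trace_id, "cast video")
    remote_service("tv", message)
    return trace_id


trace_id = start_business_flow()
chain = [(r["device"], r["layer"]) for r in records if r["trace_id"] == trace_id]
```

Selecting records by the shared ID reconstructs the chain across layers and devices, even when the records are interleaved with unrelated activity.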

Figure 9 HiTrace distributed tracing

4. Fault management

Beyond the recording capabilities described above, fault management is an important feature of HarmonyOS DFX. To help developers discover and locate problems quickly, HarmonyOS DFX deploys a comprehensive and accurate fault detection mechanism on the system side. It contains seven types of single-system fault detectors (process crash, application freeze, resource leak, memory corruption, restart, boot failure, and system crash) and one type of distributed fault detector. With these detectors, the fault detection rate can exceed 80%. To suit HarmonyOS’s HyperTerminal-oriented nature, the detectors can be deployed flexibly on different devices according to their resources.

Figure 10 Fault detector

Due to space constraints, we focus on three of the seven types of fault detectors: the process crash detector, the application freeze detector, and the system crash detector.

(1) Process crash detector

Process crashes will be familiar to everyone. They are among the most common faults, and the detection mechanisms are fairly mature. However, current mechanisms still have problems: an application cannot directly access the crash logs of its own process; crash logs contain a lot of useless and repetitive information; and capturing the call stack at crash time can fail. To address these issues, HarmonyOS DFX gives its process crash detector the following special design:

  • Supports Java/JS/Native full stack detection.
  • A dedicated API lets an application process query the crash logs of its own process, and only of its own process. This solves the problem that applications have no permission to obtain their own crash logs.
  • Crash logs are trimmed to remove a large amount of invalid information, which helps developers locate the problem more accurately.
  • The call stacks of multiple processes are captured at the same time, which avoids incomplete logs and ensures that the fault scene is reconstructed accurately.

(2) Application freeze & system crash detector

Application freezes and system crashes are also common faults. They occur only intermittently, but they seriously affect user experience. The difficulty in detecting them lies in matching software faults with the crashes that users actually perceive. If every software fault is reported, developers will be overwhelmed; if faults are missed, problems cannot be located accurately. To this end, HarmonyOS DFX has designed the application freeze & system crash detector as follows:

  • Thirty-two detection points are deployed in the system to comprehensively detect software crash faults.
  • In addition, four user behavior detection points are added to accurately detect the user’s reaction to the crash phenomenon.

These detection points can be deployed flexibly according to the failure modes of different devices. If a device has no screen, for example, there is no need to deploy the screen on/off timeout and rapid screen tap detection points. In addition to the detection points, the decision rules can be adjusted dynamically based on big data analysis of fault detection results. With these optimizations, the detection rate for crash faults rose from 30% to 80%.

Figure 11 Application freeze & system crash detection
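
Selecting detection points by device capability can be sketched in Python. The detection point names and the capability model below are hypothetical; the source does not list the actual points or how deployment is configured.

```python
# Hypothetical detection points and the capability each one needs
# (None means the point can be deployed on any device).
DETECTION_POINTS = {
    "watchdog_timeout": None,
    "main_thread_blocked": None,
    "screen_on_off_timeout": "screen",
    "rapid_screen_taps": "screen",
}


def select_detection_points(capabilities):
    """Return, in sorted order, the points deployable on a device."""
    return sorted(
        name for name, needs in DETECTION_POINTS.items()
        if needs is None or needs in capabilities
    )


phone_points = select_detection_points({"screen"})
sensor_points = select_detection_points(set())   # a screenless device
```

The screenless device receives only the two capability-independent points, which mirrors the example of skipping screen-related detection on devices without a screen.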

5. Observation and analysis

How can developers use the logging, event, tracing, and fault management capabilities provided by HarmonyOS DFX? Let’s take a look at our observation and analysis tools, which help developers analyze, locate, and tune problems.

(1) Information Export Tool (HiDumper)

During development, debugging, testing, and maintenance, developers frequently need to observe all kinds of system information, and they generally obtain it through information export. Although operating systems typically provide a variety of information export tools, the conventions of these tools vary widely, which makes them hard to adapt to automated test tools or IDEs. As product types multiply, the information to be exported from the system becomes extremely rich, and the problems of disorderly export interfaces, mixed capabilities, and difficult adaptation become more prominent.

To avoid information export problems, HarmonyOS provides HiDumper, a unified system information export tool. Compared with other information export tools, HiDumper standardizes command parameters and classifies, schedules, and outputs all exported information, simplifying the adaptation of back-end tools.

Figure 12 Information export tool HiDumper

(2) Distributed linkage debugging tool

Currently, app debugging generally uses a local debugger, and each device being debugged requires its own debugging terminal and IDE instance. This clearly cannot support distributed business scenarios well, because they require linkage debugging across multiple devices. To address this, HarmonyOS has developed a new distributed debugging tool that associates logs, events, traces, and fault logs from multiple devices in the same IDE debug window, giving developers the experience of debugging a single device. At run time, the IDE can automatically capture exception information, associate it with the related event list and log stream, and then use the exception log to pinpoint the line of code, which greatly improves debugging efficiency.
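
Associating records from several devices in one view can be sketched in Python as a timestamp merge. This is an assumption-laden illustration: the record layout is invented, and the sketch assumes the devices’ clocks are already synchronized, which a real tool would have to handle.

```python
import heapq

# Each record: (timestamp_ms, device, kind, text), sorted per device.
phone = [(100, "phone", "log", "cast requested"),
         (140, "phone", "event", "CAST_SENT")]
tv = [(120, "tv", "log", "cast received"),
      (130, "tv", "fault", "decoder crashed")]

# Merge the per-device streams into one timeline ordered by timestamp.
timeline = list(heapq.merge(phone, tv))

# Find the first fault and collect everything that led up to it.
first_fault = next(r for r in timeline if r[2] == "fault")
context = [r for r in timeline if r[0] <= first_fault[0]]
```

Viewing the merged timeline shows that the fault on the second device followed a request from the first, which separate per-device windows would not make obvious.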

Figure 13 Distributed linkage debugging

(3) Distributed tuning tools

Having introduced the observation and debugging tools, let’s finally look at the tuning tools. HarmonyOS’s new distributed tuning tool accurately tracks the JS/Java/C/C++ call chain across the stack, recording activity at thread, process, device, and other granularities, and generates normalized HiTrace files. By displaying HiTrace files in the IDE’s graphical tools, developers can easily analyze performance bottlenecks in distributed applications.

Figure 14 Distributed tuning

That concludes our introduction to the key parts of HarmonyOS DFX, and we hope you now have an idea of what DFX is all about. HarmonyOS DFX will continue to work on fault detection, fault recovery, big data analytics, and more debugging and tuning tools, to give developers the power to deliver even better products.