Author: Liu Tianyu (Modest Wind)

This article is part of the series "Firing at Engineering Corruption", which also covers ProGuard governance, manifest governance, Java code governance, resource governance, and dynamic link library (so) governance. It is the last article in the series and focuses on the overall governance ideas, the scheme design, and the thinking and trade-offs behind them.

Engineering quality is the foundation that lets any product iterate on business features quickly, efficiently, and stably. It is also essential to giving users a good product experience, and it is something every good engineer expects and pursues. However, every large project has to face engineering corruption. It is widespread and subtle, hides in "corners" that are not easily noticed, and affects every aspect of the project.

Corruption is born with the project itself and runs through every stage of its life cycle. A change in any factor (time, people, code, processes, rules) can cause decay. This article walks through, one by one, how we became aware of the problem and fixed isolated cases, analyzed it systematically, designed a plan, and finally put sustainable governance into practice.

Origin

Before a project matures, corruption is already embedded in the code. At that stage it can significantly reduce development efficiency, but it rarely causes online problems, and those that do occur are easily fixed as isolated issues. As the corruption deepened, however, the same kinds of problems appeared more and more often, and we gradually caught the "smell of corruption". That led to the subsequent analysis, solution design, tool and platform development, and governance practice. Take a look at the following chart, which may look familiar to many R&D engineers:

1.1 Smell of corruption

With years of experience in Android architecture, the author has been directly responsible for or indirectly involved in stability, startup performance, package slimming, engineering efficiency, adaptation to new OS versions, and other areas. As these governance efforts deepened over time, I ran into all kinds of problems, for example:

  • Conflicting resources cause the resource values in the APK to differ between builds even when the code has not changed, which leads to online problems;
  • Java code changes introduce incompatible calls that eventually throw Java exceptions online;
  • Threads are used at will with no unified control: performance suffers, and the thread count can exceed the custom limit on some devices, causing OOM exceptions;
  • Useless code, resources, and functional modules keep increasing the package size;
  • APK builds take longer and longer, which seriously hurts R&D efficiency.

Dozens of such examples could be cited; they are not repeated here.

When we look at these problems from a holistic perspective, we find the powerful enemy hidden behind them: engineering corruption. Put simply, engineering corruption is the continuous accumulation of useless, redundant, or unreasonable code. It makes problems more likely and harder to locate, and the faster a project iterates, the faster it decays. Even without any iteration, changes in the external environment, such as the release of new OS versions and increasingly strict privacy compliance regulation, cause problems in existing code. Next, we dive into the development iteration process to see where the rot comes from.

1.2 Analysis of the generation of corruption

As mentioned above, many factors can lead to engineering corruption, but there are only two primary ones: time and people. Time stands for changes in the project's external environment. For example, the OS version on target devices keeps being upgraded, and the development toolchain, the IDE, and so on are updated iteratively, so even a static code base slowly decays over time. Compared with this slow degeneration, rapid engineering iteration driven by people is the biggest source of rapid decay. With that said, let's look at who is involved in an app version's iteration and delivery process, what their core demands are, and how engineering corruption accumulates in this "soil".

The figure above shows a typical iteration and delivery process for a mobile app version. In a large app and R&D team, each role may have dedicated positions and people; in a small one, one person may play several roles:

  • Product and design are responsible for function, UI, and interaction design. They care about the value that creativity and features bring to users, and about smooth, polished visuals and interaction;
  • R&D and testing are responsible for development, implementation, and assurance of code behavior and quality once they receive the product requirements and design drafts. They usually hope that requirements and designs will not change once settled, and that existing logic and functions can be reused as much as possible, so they have a natural "resistance" to requirements and designs that keep being overturned and redone. They also hope for more time to ensure code quality and acceptance results;
  • PMO and PTM are responsible for release pacing and managing the release process. They care about overall requirement throughput, and about process and online quality;
  • Channels and operations are responsible for delivering the new app version to users on time through various channels, and for acquiring new users and growing the usage of app features through a wide variety of operational methods;
  • Security and legal affairs must ensure, throughout the process above, that the app's security vulnerabilities are fixed in a timely manner and that privacy compliance and related matters carry no risk.

Finally, users install or upgrade to the latest version of the app. Their core concern is "Is it useful? Is it fun?". Once the new version is available, regulators and testing agencies check, according to current laws and regulations, whether the app commits any "violations" while in use.

In this delivery process, the priorities of the roles differ, yet the demands of all roles are ultimately carried by the code. Engineering corruption comes from developers' code production activities. A developer's own willingness, skill, and experience do greatly affect code quality. But given the functional complexity of a modern enterprise app, no developer can know all of its code, so this locality of each person's grasp of the project and its code may be an even more important factor in the generation of engineering corruption.

1.3 Breaking down corruption

Having analyzed how corruption is generated, we break down the corruption items of an Android project at a finer granularity. By the types of "code" that an Android project contains, they fall into the following five categories: project configuration, manifest, Java code, resources, and dynamic link libraries (so).

Project configuration refers to the configuration used during the APK build; the configuration content itself does not enter the final APK. Corruption of project configuration mainly affects the complexity of the project itself and even the build time, as with a large number of ProGuard configuration items. The other four types (manifest, Java code, resources, and dynamic link libraries) are all "elements" that may make up the APK. Each can have corruption problems of its own, or in combination with the others, that directly cause, or increase the likelihood of, APK stability, performance, package size, UI and functional anomalies, privacy compliance risks, and so on.

In actual tool development and governance practices, divide and conquer is implemented along these lines.

Response

After the analysis of decay generation and the breakdown by type is completed, an effective response plan needs to be developed.

First, the guiding principle to state clearly and always keep in mind is: "Do the right thing, in the right way, no matter whether it is easy or difficult". "The right thing" is usually easy to define and agree on, but "the right way" can be hard, because "the wrong way" is sometimes a shortcut to the desired result. For example, suppose we need to switch all threads in the app to a unified thread pool implementation. There are two ways to do it. One is to use build-time AOP to replace the thread-related calls in the code directly. The other is to use a detection and monitoring mechanism for non-unified thread pool usage: with incremental code effectively prevented and controlled, the existing code is changed gradually. The first approach obviously reaches the goal quickly, but it adds to the APK build time, and if the AOP processing itself goes wrong and a replacement fails, or the replacement terminates unexpectedly and leaves the bytecode replacement incomplete, the result is another kind of "engineering corruption". The second approach cannot reach the goal quickly, but it effectively stops the trend of corruption and gradually works off the existing problems. The checkpoint itself requires day-to-day approval and evaluation, and cleaning up existing code is not accomplished overnight, but correcting the code at its source is the "right way" to solve engineering corruption.
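
The second approach can be illustrated with a small sketch: existing direct thread creations are recorded in a baseline (a whitelist of stock code), and only usages beyond that baseline are reported. This is not Youku's actual tool. The regular expression, the `scan` and `new_violations` helpers, and the file paths are assumptions made for illustration, and a real detector would more likely work on bytecode than on source text.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for direct thread creation in Java source text.
DIRECT_THREAD = re.compile(r"new\s+Thread\s*\(")


@dataclass(frozen=True)
class Finding:
    path: str
    extra_usages: int  # usages beyond what the baseline tolerates


def scan(sources: dict[str, str]) -> dict[str, int]:
    """Count direct thread creations per source file."""
    counts = {}
    for path, text in sources.items():
        n = len(DIRECT_THREAD.findall(text))
        if n:
            counts[path] = n
    return counts


def new_violations(current: dict[str, int], baseline: dict[str, int]) -> list[Finding]:
    """Report only the usages that exceed the whitelisted stock code."""
    findings = []
    for path, n in sorted(current.items()):
        allowed = baseline.get(path, 0)
        if n > allowed:
            findings.append(Finding(path, n - allowed))
    return findings


# Stock code is tolerated; the new file is reported.
baseline = {"legacy/Downloader.java": 2}
sources = {
    "legacy/Downloader.java": "new Thread(a).start(); new Thread(b).start();",
    "feature/Feed.java": "new Thread(task).start();",
}
violations = new_violations(scan(sources), baseline)
```

As stock code is cleaned up, the baseline entries shrink, so the same gate also prevents regressions in files that have already been fixed.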

2.1 People vs process

Engineering corruption comes from unreasonable changes of engineering code made by people in the version iteration process. Therefore, engineering corruption governance needs to be carried out around “people” and “process”.

On the people side there are well-established practices, such as code review, coding standards, IDE lint rules, and ongoing technical training. They improve developers' design and coding and reduce corruption at the source. They also subtly raise the R&D team's overall engineering quality and competence, bringing a broader improvement to the project. However, this approach has problems that should not be ignored: the developers on a project differ in technical understanding, skill, and experience, so the implementation of these standards and rules cannot be guaranteed, and the potential cost can be high.

For engineering corruption, relying entirely on these people-centered schemes carries great uncertainty. Prevention and control needs a mechanism that "guards the door", and it must itself be low-cost, so we focus on the process. A process is objective, fixed, and guaranteed. On the one hand, with comprehensive APK detection and analysis technology at the core, corruption items can be located accurately, and checkpoints deployed at key nodes of the process detect problems in time and handle them on the spot, achieving zero new corruption. On the other hand, diverse auxiliary tools are provided to reduce the risk and cost of remediation and improve efficiency. Rome was not built in a day, so governance cannot be carried out as one great leap forward. Instead, it has to proceed step by step without affecting daily R&D activities, until the existing backlog is finally cleared.

The people-centered and process-centered solutions are not an either/or choice; they complement each other. The former focuses on reducing corruption at the source. The latter indiscriminately blocks every detectable corruption item from entering the final APK, and at the same time strengthens developers' awareness of preventing corruption and promotes the effective implementation of code review, standards, and so on, forming a virtuous circle.

2.2 Analysis Tools

APK detection and analysis technology is the core of this approach. What specific capabilities does it include? Take a look at the chart below:

The figure above summarizes the current detection and analysis technologies, which fall into four types: redundancy and conflict, key configuration, reference relationship, and auxiliary efficiency improvement. The first three correspond directly to specific corruption items, while the last helps developers locate and analyze problems during daily development. The individual detection capabilities are not detailed here; the articles in the "Firing at Engineering Corruption" series explain them alongside concrete practice.

2.3 Checkpoint system

How are these detection capabilities combined with the process? Take a look at the following schematic diagram of the process checkpoints:

For developers and testers, an APK build is required at key nodes such as submitting for test, integration, and grayscale/official release, and every detection and analysis that has been deployed is triggered automatically at the same time. With local packaging, a failed checkpoint fails the build directly, and the failure reason carries the relevant information. With packaging on the CI/CD platform, the checkpoint results are presented on a platform page. In either mode the process is interrupted and continues only after the developer has fixed the problem. In this way corruption problems are detected in time and fixed on the spot.
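
The interrupt-until-fixed flow described above can be sketched as a small runner that executes every deployed check against a build and raises if any of them reports a problem. This is a minimal illustration, not the platform's real interface: the `run_checks` and `enforce` functions, the `BuildBlocked` exception, the shape of the `build` dictionary, and the `duplicate_resources` check are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    passed: bool
    details: list[str]


class BuildBlocked(Exception):
    """Raised to interrupt the build when at least one check fails."""


# A check receives a description of the build and returns a list of problems.
Check = Callable[[dict], list[str]]


def run_checks(build: dict, checks: dict[str, Check]) -> list[CheckResult]:
    """Run every deployed check and collect the results."""
    results = []
    for name, check in checks.items():
        problems = check(build)
        results.append(CheckResult(name, not problems, problems))
    return results


def enforce(results: list[CheckResult]) -> None:
    """Fail the build, with the reasons attached, if any check failed."""
    failed = [r for r in results if not r.passed]
    if failed:
        report = "; ".join(f"{r.name}: {', '.join(r.details)}" for r in failed)
        raise BuildBlocked(report)


def duplicate_resources(build: dict) -> list[str]:
    """Hypothetical check: report resource names declared more than once."""
    names = build.get("resources", [])
    return sorted({n for n in names if names.count(n) > 1})
```

Keeping detection (`run_checks`) separate from enforcement (`enforce`) lets the same results either fail a local build or be rendered on a platform page.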

Take platform mode as an example: each time code is submitted for test or integration, the APK build triggers checkpoint detection, and the process is blocked if any checkpoint item fails. The following is an example of a checkpoint result:

With this set of capabilities and mechanisms in place, let's look at how to prevent and control the various kinds of corruption. First, we need to clarify the concept of a "module", its impact on engineering corruption and governance, and the related tool building and governance practices.

Module control

Generating a complete APK can be seen as a process of "stacking building blocks". Each block may contain Java code and resources, Android resources, an AndroidManifest file, dynamic link libraries (so), and ProGuard configuration. The blocks are joined according to certain rules, and elements of the same kind are merged and compressed into the final APK file. In more technical terms, these "building blocks" are modules. Modules make functional reuse possible and provide the basis for parallel development. Generally speaking, the larger and more complex the project, the higher its degree of modularization.

Engineering corruption is essentially caused by functional complexity and code changes. Although modularization brings some corruption problems of its own, more importantly it makes engineering corruption easier to govern. Imagine an app with hundreds of people, divided into more than ten teams, taking part in its iteration. If all the code were developed in a single app project, then whenever a corruption problem appeared, dispatching it to the right owner would be a huge challenge, not to mention the problem of code collaboration. In real projects, the degree of modularization generally (as the normal engineering choice) rises as functions and developers increase. On this premise, the first thing engineering corruption governance must do is know clearly which modules each specific corruption problem comes from; this is the precondition for dispatching and handling problems. Next, I will first give a classification of modules, then describe several "auxiliary analysis capabilities" for modular development and the governance practices built on top of them.

3.1 Module Classification

JARs/AARs introduced as external dependencies of the app project, and sub-projects parallel to the app project, are probably the most common module types in daily development. Android also supports other types of modules. From the perspective of the APK build, the complete classification of modules is as follows:

The figure above shows the five module types along several dimensions: whether source compilation is required during the APK build, whether the module exists in a Maven repository, and which dependencies it may have. Each type is explained below:

  • App-project: there is exactly one. It is used to generate the APK and contains source code, so source compilation is required. It can depend on sub-projects, local JARs, flat AARs, and external modules;
  • Sub-project: there can be zero or more, generally parallel to the app-project. They also contain source code and can depend on sub-projects, local JARs, and external modules;
  • Local JAR: cannot exist on its own. Its Java code already exists as compiled class bytecode, and it cannot depend on other types of modules;
  • Flat AAR: Android's native way of introducing AARs outside Maven. It needs no source compilation and cannot depend on other types of modules;
  • External module: needs no source compilation and can depend on other external modules. The dependency information lives in the corresponding POM file in the Maven repository.
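
The classification above can be written down as a small lookup table, which makes it possible to check a dependency graph against the rules. The sketch below encodes only what the list states (which types need source compilation and which dependencies each type may have); the enum, the table names, and the `invalid_edges` helper are illustrative assumptions rather than part of any Android tooling.

```python
from enum import Enum


class ModuleType(Enum):
    APP_PROJECT = "app-project"
    SUB_PROJECT = "sub-project"
    LOCAL_JAR = "local jar"
    FLAT_AAR = "flat aar"
    EXTERNAL_MODULE = "external module"


# Module types that contain source code and are compiled during the APK build.
NEEDS_SOURCE_COMPILE = {ModuleType.APP_PROJECT, ModuleType.SUB_PROJECT}

# Which module types each type may depend on, as listed in the text.
ALLOWED_DEPENDENCIES = {
    ModuleType.APP_PROJECT: {
        ModuleType.SUB_PROJECT,
        ModuleType.LOCAL_JAR,
        ModuleType.FLAT_AAR,
        ModuleType.EXTERNAL_MODULE,
    },
    ModuleType.SUB_PROJECT: {
        ModuleType.SUB_PROJECT,
        ModuleType.LOCAL_JAR,
        ModuleType.EXTERNAL_MODULE,
    },
    ModuleType.LOCAL_JAR: set(),
    ModuleType.FLAT_AAR: set(),
    ModuleType.EXTERNAL_MODULE: {ModuleType.EXTERNAL_MODULE},
}


def invalid_edges(edges):
    """Return the (from_type, to_type) pairs that the table does not allow."""
    return [(src, dst) for src, dst in edges if dst not in ALLOWED_DEPENDENCIES[src]]
```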

Generally speaking, an app is "born" from an app-project: all code and resources are written in this project, and of course some second-party and third-party libraries are introduced (depended on) as external modules. As the app's functionality grows more complex, more developers are likely to join, and after a period of continuous iteration the first modularization "change" may arrive: common functionality is split out into sub-projects. As the number of developers keeps growing, the cost of code collaboration rises, and it may become necessary to split the single code repository into several to facilitate parallel development. This is the second modularization "change": code repositories are split, modules are split at a finer granularity, and the degree of R&D parallelism keeps rising. Eventually the project evolves into the ultimate form of modularization: the app-project becomes a "shell" for packaging the APK, almost all code is split into separate modules and repositories that the app-project depends on as external modules, and R&D is highly parallel.

Many large apps have basically completed the above evolution process, but at the same time, new problems have arisen. Next, we will talk about what tools have been developed and what governance has been carried out in the module dimension.

3.2 Auxiliary analysis capabilities

The auxiliary analysis capabilities provide, from the perspective of the complete APK build, information about modules and their dependencies, helping developers solve all kinds of daily problems, such as:

  • "I updated a module's version number, so why is the code in the APK still old?" Check which version of the target module the APK build actually used. If it was not updated, this problem is bound to occur;
  • "I removed the module, so why are its code and resources still in the APK?" Check whether the target module takes part in the APK build, and whether it is introduced by a direct dependency of the app project or by an indirect dependency of other modules, to locate the cause quickly;
  • "I call a method from another module in my module project, but it cannot be found in the APK. Why?" Check the version of that other module used in the APK build, upgrade the dependency in the target project to that version, and recompile the target project to see whether the method has been removed, moved, or had its signature changed.
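
The second question (why a removed module still ends up in the APK) comes down to finding a dependency chain from the app project to that module. A minimal sketch, assuming the resolved dependency graph is already available as a plain mapping from each module to the modules it depends on; the `find_path` helper and the module names are hypothetical.

```python
from collections import deque


def find_path(graph, root, target):
    """Return one shortest dependency chain from root to target, or None."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


# The app no longer declares "legacy-lib" directly, yet "feature-b" still pulls it in.
graph = {
    "app": ["feature-a", "feature-b"],
    "feature-b": ["legacy-lib"],
}
chain = find_path(graph, "app", "legacy-lib")
```

A chain of length two means a direct dependency of the app project; anything longer points at the intermediate module that still needs to be changed.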

Next, each auxiliary analysis capability is briefly introduced.

List of external dependent modules

This capability outputs, in one place, all external dependency modules involved in the current APK build, together with their version numbers and types. Example result:

com.youku.arch:testlib:0.1-SNAPSHOT@aar
com.youku.arch:testlib2:0.3@aar

Dependency detection

During the APK build, some external dependency modules are introduced through indirect dependencies (no dependency is declared directly in the app project). Such indirect dependencies are recorded in the POM files of modules in the Maven repository. With dependency detection, you can easily find which other modules directly depend on a given module. This is useful for taking a module offline or for determining ownership (deciding from the dependency relationship which upper-layer business a module belongs to). Example analysis result:

com.youku.android:y-core
|-- [provided] com.youku.android:ct-ad
|-- [compile] com.youku.android:catl
|-- [runtime] com.youku.android:MtRec

com.tb.android:z_dev
|-- [compile] com.tb.android:zcore

Note that the result shows who depends on each module. In this example, the com.youku.android:ct-ad module declares a dependency on the com.youku.android:y-core module with the provided type, the com.youku.android:catl module declares a dependency on com.youku.android:y-core with the compile type, and so on. The dependency types generally include the following:

  • compile. This type of dependency causes the module to be packaged into the APK unless an exclude setting is added;
  • provided. This type of dependency does not cause the module to be packaged into the APK;
  • runtime. This type of dependency does not cause the module to be packaged into the APK.

Of course, the contents of the POM file can be customized when a module is published to the Maven repository. If a module's actual dependencies on other modules are not written correctly into its POM at release time, the detection results above will contain the corresponding errors, such as missing a module that is actually depended on, a declared dependency type that does not match the actual one, or extra modules that are not actually depended on.
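
Dependency detection is essentially a reverse index over the dependencies declared in the POM files. The sketch below assumes the POMs have already been parsed into a mapping from each module to its declared (type, dependency) pairs; that input shape and the `reverse_dependencies` and `render` helpers are assumptions for illustration, not the actual tool.

```python
from collections import defaultdict


def reverse_dependencies(poms):
    """Map each module to the (type, dependent module) pairs that declare it."""
    dependents = defaultdict(list)
    for module, declared in poms.items():
        for dep_type, dependency in declared:
            dependents[dependency].append((dep_type, module))
    return {dep: sorted(users) for dep, users in dependents.items()}


def render(module, users):
    """Format one entry in the same layout as the example result."""
    lines = [module]
    lines += [f"|-- [{dep_type}] {user}" for dep_type, user in users]
    return "\n".join(lines)


# Hypothetical parsed POM data matching the first example entry.
poms = {
    "com.youku.android:ct-ad": [("provided", "com.youku.android:y-core")],
    "com.youku.android:catl": [("compile", "com.youku.android:y-core")],
    "com.youku.android:MtRec": [("runtime", "com.youku.android:y-core")],
}
index = reverse_dependencies(poms)
```

Because the index is built purely from POM contents, it inherits any errors in those files, which is why incorrect POMs show up as incorrect detection results.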

Mismatch dependency detection

In modular development, each module is developed independently and only eventually takes part in the APK build, so it is hard for a module to notice that a module it depends on has been upgraded. The module itself is built against an older version of the dependency, so it compiles; but by the time the APK is built, the dependency has probably been upgraded, which can produce mismatched references. Mismatch dependency detection lets module developers see clearly the difference between the versions of other modules used when the module was compiled and the versions of those modules used when the APK is built, so that they can upgrade the dependency versions in the module project in time. Example analysis result:

com.youku.android:YTask
|-- com.youku.android:BFra:1.0.0-SNAPSHOT ==> 1.0.0.44
|-- com.youku.android:BUIKit:20190617-SNAPSHOT ==> 1.0.1.66
|-- com.youku.android:YUI:1.4.2.16-SNAPSHOT ==> 1.4.10

In the example above, the YTask module depends on version 1.0.0-SNAPSHOT of the BFra module at its own compile time, while the APK build uses version 1.0.0.44 of BFra, and so on. In addition, an extra capability exports the POM files of all external dependency modules together into the APK build artifacts, for centralized viewing and problem locating.
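
The comparison behind mismatch dependency detection can be sketched as a diff between two version tables: the versions each module was compiled against (from its POM) and the versions selected for the APK build. The input shapes and the `mismatched_dependencies` function are assumptions for illustration.

```python
def mismatched_dependencies(compiled_against, resolved):
    """Report dependencies whose compile-time version differs from the APK build.

    compiled_against: module -> {dependency: version used when the module was built}
    resolved: dependency -> version selected for the APK build
    """
    report = {}
    for module, deps in compiled_against.items():
        diffs = {
            dep: (version, resolved[dep])
            for dep, version in deps.items()
            if dep in resolved and resolved[dep] != version
        }
        if diffs:
            report[module] = diffs
    return report


compiled_against = {
    "com.youku.android:YTask": {
        "com.youku.android:BFra": "1.0.0-SNAPSHOT",
        "com.youku.android:YUI": "1.4.10",
    }
}
resolved = {
    "com.youku.android:BFra": "1.0.0.44",
    "com.youku.android:YUI": "1.4.10",
}
report = mismatched_dependencies(compiled_against, resolved)
```

Dependencies that do not take part in the APK build at all are skipped here; a real tool might report them separately.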

3.3 Governance Practices

Building on the auxiliary analysis capabilities above, two situations bring uncertainty to the built APK and therefore become direct targets of module corruption governance.

Snapshot version numbers

At the start of the APK build, the JAR/AAR files for the corresponding versions of the external dependency modules are downloaded directly from the Maven repository to take part in the rest of the build. Under a SNAPSHOT version number, the JAR/AAR in the Maven repository can be updated at any time. That is not something we want to happen during a release build of the app, because it brings all kinds of unexpected online risks. Whether any external dependency module with a SNAPSHOT version takes part in the APK build therefore needs to be strictly controlled.

For this purpose we developed the snapshot version detection function, which picks out every external module with a SNAPSHOT version that takes part in the APK build. The following is an example:

com.youku.arch:testlib:0.1-SNAPSHOT
com.youku.arch:testlib2:0.2-SNAPSHOT

Furthermore, this detection capability is used to form checkpoints at key nodes of app version iteration, such as integration and grayscale/official release. Youku launched this function as a local checkpoint (failing the APK build) several years ago. In 2021 it was integrated into the overall checkpoint system as one of its checkpoints. It has blocked 7 times in total, effectively preventing snapshot-version modules from being introduced into the APK build.
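
Snapshot version detection itself is simple once the list of external modules is available: by Maven convention, a snapshot version ends with `-SNAPSHOT`. The sketch below assumes coordinates in the `group:name:version[@type]` form used in the example listings; `parse_coordinate` and `snapshot_modules` are hypothetical helper names.

```python
def parse_coordinate(coordinate):
    """Split 'group:name:version[@type]' into (group, name, version, type)."""
    main, _, packaging = coordinate.partition("@")
    group, name, version = main.split(":")
    return group, name, version, packaging or None


def snapshot_modules(coordinates):
    """Return the coordinates whose version is a SNAPSHOT version."""
    return [c for c in coordinates if parse_coordinate(c)[2].endswith("-SNAPSHOT")]


modules = [
    "com.youku.arch:testlib:0.1-SNAPSHOT@aar",
    "com.youku.arch:testlib2:0.3@aar",
]
flagged = snapshot_modules(modules)
```

At a release node, a non-empty result would fail the build; during day-to-day development the same list could be shown as a warning only.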

Snapshot dependencies

During development, to make joint debugging between modules easier, the version of a dependency is often changed to a SNAPSHOT. If, after joint debugging, the dependency's SNAPSHOT version is not changed back to an official version when the module's official version is packaged, then once the SNAPSHOT version of that dependency is updated within this time window, the official version of the module is compiled against unexpected code. This eventually leads to various incompatibility problems at APK runtime, such as API incompatibility (mismatched classes, variables, or method signatures) and inconsistent constants (constants are inlined when the module is compiled).

This is what the snapshot dependency detection function is for: it lists, for each module, the snapshot-version modules it depends on, together with the version of each of those modules used when the APK is built. The following is an example:

com.youku.android:YHPage:1.9.35.5
|-- com.ali.android:VCommon:20210309-SNAPSHOT ==> 11.1.6.4
|-- com.youku.android:YRes:20210309-SNAPSHOT ==> 1.0.44.2
com.youku.android:OUtil:1.0.4.11
|-- com.youku.android:OService:20210105-SNAPSHOT ==> 1.3.8.2

Youku launched this function as a corruption control item in early 2021. At that time, more than 200 modules depended on snapshot modules in their POM files; these existing cases were added to a whitelist. At the same time, a corresponding process checkpoint was set up at the key nodes of app version iteration. It has intercepted 25 times in the past year, effectively preventing the online risks this problem would cause.
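
Combining the detection with a whitelist could look like the sketch below: modules already known to declare SNAPSHOT dependencies are tolerated, while any other module doing so is reported. How the whitelist is actually keyed is not described in the text, so keying it by `group:name` (without the version) is an assumption, as are the function names.

```python
def is_snapshot(coordinate):
    """True if the version part of 'group:name:version' is a SNAPSHOT version."""
    return coordinate.rsplit(":", 1)[-1].endswith("-SNAPSHOT")


def module_key(coordinate):
    """Drop the version so that whitelist entries survive version bumps."""
    return coordinate.rsplit(":", 1)[0]


def snapshot_dependency_violations(pom_dependencies, whitelist):
    """Report non-whitelisted modules whose POM declares SNAPSHOT dependencies."""
    violations = {}
    for module, deps in pom_dependencies.items():
        if module_key(module) in whitelist:
            continue
        snapshots = [d for d in deps if is_snapshot(d)]
        if snapshots:
            violations[module] = snapshots
    return violations


pom_dependencies = {
    "com.youku.android:YHPage:1.9.35.5": [
        "com.ali.android:VCommon:20210309-SNAPSHOT",
        "com.youku.android:YRes:20210309-SNAPSHOT",
    ],
    "com.youku.android:OUtil:1.0.4.11": [
        "com.youku.android:OService:20210105-SNAPSHOT",
    ],
}
whitelist = {"com.youku.android:OUtil"}  # hypothetical stock entry
violations = snapshot_dependency_violations(pom_dependencies, whitelist)
```

Shrinking the whitelist over time is what turns the stock of existing cases into cleared ones, while the gate keeps new cases from being added.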

Other governance practices

The module-related corruption governance above is only the outpost of the long war against engineering corruption. The following "five battlefields" follow from the earlier element-by-element breakdown of engineering corruption; see the corresponding articles in this series for details:

  • ProGuard configuration
  • manifest
  • Java code
  • resources
  • Dynamic link library so

What else can be done

During Youku's engineering corruption governance over the past two years, we received support from many R&D colleagues. With ingenuity, enthusiasm, and courage, they resolved newly arising problems in time and worked off the existing technical debt bit by bit. Thanks to their long-term persistence and effort, engineering corruption problems have been significantly reduced. "Do the right thing, in the right way, no matter whether it is easy or difficult" is not only the principle Youku firmly follows in designing engineering corruption solutions and carrying out governance, but also the technical philosophy this series of articles wants to convey.

At present, tools can detect only a little over 20 specific corruption problems. Compared with the iceberg of engineering corruption, it is no exaggeration to say that this is just the tip. Moreover, the solutions presented here address only isolated problems; there is still no effective solution for corruption problems that are extremely complex or even systemic. In the fight against engineering corruption there is still a long way to go, and there is much that can and needs to be done. Firing at engineering corruption is a direct and resolute attitude toward solving the problem.
