I. Context

The issue surfaced in early January. After the Feishu main repository was split into source modules, incremental build times were still fluctuating at a fairly high level, and had not dropped to the speed-up we expected from the large reduction in source modules being compiled.

Our first guess was that the benefit simply had not arrived yet because of lingering data from old branches, i.e. a deferred payoff. But after analyzing charts from Hummer (ByteDance's internal build-monitoring platform), we found that the effect of the split was in fact reflected in the production environment almost immediately: the charts showed the number of source modules in the repository dropping sharply.

Which raises the question: what went wrong?

II. Online data analysis

Here we sampled the build list from this period and picked a build with an incremental time of about 2 minutes (with real code modifications) for detailed analysis.

Sample:

From the sample data we can see several things:

  1. After adding the FarSeer Optimizer, our shell project (top module) has, as expected, had the javac tasks for R files optimized away.
  2. Not many modules participate in incremental compilation, and their BuildCache status and execution times are as expected.
  3. Several tasks are obviously time-consuming. Besides Kapt, which is known to be slow, mergeExtDex, pangaTransform and manisTransform stand out.

At the same time, we sampled more online data from the same period; combined with local test data, the results were consistent. These tasks therefore became our key targets: the culprit had to be among them.

III. Offline problem analysis

With this initial online data in hand, we tackled the problems one by one. As the analysis above shows, the problems all revolve around the transform process, whether in our own custom bytecode operations or in Gradle's native DexMergingTask.

Let’s briefly introduce the mechanism of transform:

Android Gradle Transform Incremental Build optimization

A Transform is a process that processes build artifacts. A complete APK build has multiple transforms that are executed sequentially. The output of the former Transform is the input of the latter Transform.

So how is Transform incrementality implemented? The smallest execution unit in Gradle is the task; a Transform registered in the build process becomes a separate TransformTask. Incremental building for a Gradle task works by detecting changes in its input and output files. When changed files are detected in the input, Gradle places them in a change list; if the task's output has been deleted, a full re-run of the task is triggered. After Gradle's configuration phase completes, all task inputs and outputs are wired up: if taskA depends on taskB, the output of taskB becomes taskA's input. Transform incrementality is built on this same mechanism.
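The change-detection idea above can be sketched in plain Java. This is a simplified model, not Gradle's actual implementation: builds are represented as snapshots of input path to content hash, changed files are diffed into a change list, and a missing output forces a full run.

```java
import java.util.*;

// Simplified model of Gradle's incremental input detection (illustrative only).
class IncrementalModel {
    // Diff two snapshots (path -> content hash) into a change list.
    static List<String> changedInputs(Map<String, String> previous, Map<String, String> current) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String old = previous.get(e.getKey());
            if (old == null || !old.equals(e.getValue())) {
                changes.add(e.getKey() + (old == null ? " [ADDED]" : " [CHANGED]"));
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) changes.add(path + " [REMOVED]");
        }
        return changes;
    }

    // If the task's previous output was deleted, incremental state is unusable: run full.
    static boolean mustRunFull(boolean outputExists) {
        return !outputExists;
    }
}
```

In the real build, the "snapshot" bookkeeping is done by Gradle itself; a Transform only sees the resulting change list through its incremental inputs.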

With this basic knowledge of Transform in mind, a few things become clear:

  1. Transforms execute in sequence, and Gradle's native Transform tasks execute last. Each link in the chain directly affects the next Transform task.
  2. A Transform declares whether it supports incremental compilation when it registers; if one link in the chain is broken, the efficiency of the whole chain suffers.
  3. Anyone who has written a Transform knows that obtaining the input sources is very simple: transformInvocation.getInputs() returns a Collection (in reality an ImmutableList), and traversing it yields all input sources. Intuitive and simple, but not robust enough.

Checking against the points above, we uncovered some hidden problems affecting compilation speed. The three problems below each point in a different direction; we believe they have some general applicability and hope they offer inspiration and ideas for your own build optimization work.

3.1 Panga: no incremental compilation, and duplicated functionality

What is Panga? According to our offline investigation, Panga is a DI framework introduced by the Feishu mini-program team. We could not find its source code, so we pulled the JAR from Maven directly and found that Panga consistently takes around 10 seconds per run because it implements no incremental handling at all. On top of its low execution efficiency, this affects both its own full execution and the execution of the subsequently registered transforms.

Normally we would optimize such a task's logic to implement proper incremental processing, but here we found another angle that made the optimization easier: the project had introduced multiple transform frameworks with duplicated functionality.

In Feishu we found no fewer than three similar DI/SPI frameworks whose functionality largely overlaps. In the end we adopted a unified plan and removed Panga and ServiceManager. After all, in a large project each redundant Transform task adds roughly 10s+ of overhead to a full compilation.

A further idea we advocate here is to converge our self-developed, controllable transforms as much as possible to avoid repeated IO overhead.

Use the ByteX framework to converge and speed up transform compilation in Android projects

Github.com/bytedance/B…

3.2 Does the incremental implementation meet the bar? – Manis

Manis is an IPC framework. Its basic flow is to collect all Manis-annotated implementation classes at the bytecode stage, together with information about their specific execution processes, and finally insert the collected data into a mapping table inside the Manis framework, which provides service-process lookup and mapping at runtime.
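As we understand it, that mechanism can be modeled roughly like this (all names here are hypothetical; Manis is internal and its real API differs): the build step registers each annotated implementation class into an interface-to-implementation mapping table, and the runtime looks services up by interface name.

```java
import java.util.*;

// Rough, hypothetical model of a Manis-style mapping table: the transform
// collects annotated impl classes at build time and writes
// interface -> implementation entries; the runtime queries them.
class ServiceMapping {
    private final Map<String, String> table = new HashMap<>();

    // Called from the bytecode-processing step for each annotated class found.
    void register(String serviceInterface, String implClass) {
        table.put(serviceInterface, implClass);
    }

    // Runtime-side lookup: which implementation serves this interface?
    String lookup(String serviceInterface) {
        return table.get(serviceInterface);
    }
}
```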

Similarly, here we find two sample data for comparison:

Sample A (full):

Sample B (incremental):

Here the full transform takes 17.259s and the incremental one 6.542s; over the same period, full and incremental times fluctuate around these levels. The incremental saving basically comes from reduced IO on unchanged inputs. By comparison, plugins such as ServiceManager and Claymore fluctuate around the 1s level for incremental builds, a clear performance gap.

Analyzing the implementation, we found that although the IO on inputs is handled incrementally, the annotation collection and the lookup of the Manis target jar perform a full traversal every time, loading and analyzing every class file in every JarInput. This takes about 5 to 7 seconds regardless of whether the build is incremental.

Therefore, we relied on an unwritten rule of transform task execution: keep your own output order stable, so as not to disturb subsequent tasks.

After testing confirmed that the outputs of the preceding tasks were stable, we cached the annotations collected during a full run along with the file path of the target jar to inject, and on incremental runs processed only the annotations collected from the changed inputs, improving incremental times. The optimization was verified both by local testing and by online sampling:
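The caching strategy can be sketched as follows (simplified, with hypothetical names; the real plugin works on JarInput entries and class bytecode): keep the scan result per jar, and on an incremental run re-scan only the jars reported as changed, merging with the cached entries.

```java
import java.util.*;

// Simplified sketch of the incremental annotation cache:
// scan results are cached per jar; only changed jars are re-scanned.
class AnnotationCache {
    // jarPath -> annotated classes previously collected from that jar
    private final Map<String, Set<String>> cache = new HashMap<>();

    interface Scanner { Set<String> scan(String jarPath); } // full class-file scan (expensive)

    // changedJars: jars reported ADDED/CHANGED by the incremental inputs;
    // removedJars: jars no longer present. Everything else is served from cache.
    Set<String> collect(Collection<String> changedJars, Collection<String> removedJars, Scanner scanner) {
        for (String jar : removedJars) cache.remove(jar);
        for (String jar : changedJars) cache.put(jar, scanner.scan(jar)); // only changed jars pay the scan cost
        Set<String> all = new TreeSet<>();
        for (Set<String> classes : cache.values()) all.addAll(classes);
        return all;
    }
}
```

On a full build, every jar is in `changedJars` and the behavior degrades gracefully to a full scan; on an incremental build, the expensive scan runs only for the change list.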

Sample A after optimization:

Sample B:

Finally, its effectiveness was confirmed: the incremental cost dropped stably by about 7s, from 7-9s down to 0.5-1.5s.

3.3 Unsafe multithreading in tasks – VCXP

After the optimizations above, we found that the mergeExtDex task mentioned earlier would still irregularly miss its cache and re-run. After repeated offline testing and confirmation, we found that the culprit was VCXP, an IPC + DI framework introduced by Feishu's video-conferencing business.

Problem sample:

Analysis showed no obvious problem in the framework's own execution logic: the task's own cost was not serious, and its incremental time fluctuated within about 1s. On the face of it, nothing was wrong. But this framework performs its output writes from multiple threads.

Does that mean a transform can only run single-threaded and cannot use multiple threads for efficiency? No; the problem lies in the design of the Transform API itself.

```java
File dest = outputProvider.getContentLocation(
        jarInput.getFile().getAbsolutePath(),
        jarInput.getContentTypes(),
        jarInput.getScopes(),
        Format.JAR);
```

The code above is a familiar piece of logic: it obtains the dest File for an output jar from the outputProvider and the jarInput object. The generated path is the familiar xxx.jar under the build/intermediates/transforms/xxx directory, and stepping through the implementation shows that the 0.jar, 1.jar, ... names are handed out in the order the calls are made. In VCXP, the dest jar is resolved and written on worker subthreads, and these output jars form the inputs list of the next transform task. Multithreading therefore makes the order, and thus the naming, of the output jars nondeterministic, which breaks the cache strategy of subsequent tasks.

We can still use multiple threads to speed up the IO logic, as long as the calls that resolve the dest jar file stay synchronized on a single thread so that their order is fixed. Online sample analysis finally verified our change: the unexpected mergeExtDex time during incremental compilation disappeared, and buildCache was hit correctly.
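The fix can be sketched like this (simplified; the real code calls outputProvider.getContentLocation and copies or rewrites jars): allocate every dest name on a single thread, in the fixed input order, then hand only the IO work to a thread pool. The name assignment, and therefore the output order seen by the next task, is then deterministic across builds.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the fix: dest-name allocation stays single-threaded (deterministic
// order), while the expensive copy/rewrite work runs on a thread pool.
class ParallelTransformIo {
    static Map<String, String> process(List<String> jarInputs) {
        Map<String, String> destMapping = new LinkedHashMap<>();
        int index = 0;
        // Phase 1: ordered, single-threaded allocation
        // (stands in for the getContentLocation calls).
        for (String input : jarInputs) {
            destMapping.put(input, index++ + ".jar");
        }
        // Phase 2: parallel IO; order no longer matters because names are fixed.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Map.Entry<String, String> e : destMapping.entrySet()) {
                futures.add(pool.submit(() -> {
                    // real code would copy/transform the input jar to its dest here
                }));
            }
            for (Future<?> f : futures) f.get(); // wait for all IO to finish
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
        return destMapping;
    }
}
```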

Sample A after the fix:

IV. Optimization effect and summary

By the end of the month, after thorough troubleshooting of the existing issues, we had completed this phase of incremental compilation optimization. The average incremental build time went from fluctuating around 2m to around 1.1m, and in good conditions the PCT50 could even drop below one minute. Clearly, these hidden pitfalls had a serious impact on compilation efficiency.

From this round of fixes we have summarized the following directions for your reference:

  1. Since the design of Transform is not robust enough, take extra care when introducing a Transform plugin; do not accept one lightly, and test and verify it thoroughly.
  2. For Transform plugins with similar functionality, it is best to push business teams toward convergence, avoiding too many reinvented wheels in the project, which bring unnecessary compile-time and runtime overhead as well as inconsistency in the basic component architecture.
  3. Have clear criteria for judging a transform: what a reasonable full and incremental cost looks like. Dare to question time-consuming tasks; a well-implemented incremental path should have very low impact on build time.
  4. Make good use of infrastructure tooling, regularly observe and study the fleet-wide online build data, and find problems early to avoid unnecessary loss of R&D efficiency.

Join us

Feishu, ByteDance's enterprise collaboration platform, is a one-stop communication and collaboration platform integrating video conferencing, online documents, mobile office and collaboration software. Feishu's business is developing rapidly, with R&D centers in Beijing, Shenzhen and other cities, and plenty of open positions in front-end, mobile, Rust, server, testing and product roles. We look forward to your joining us and doing challenging things together (application link: future.feishu.cn/recruit).

We also welcome fellow Feishu engineers to discuss technical issues with us. If you are interested, click to join the Feishu technology exchange group.