Over the past six months, the Android hot patch craze has continued, with one company after another releasing its own open source framework. Tinker recently passed the company’s internal review, and we are proud that it is the first project officially published on github.com/Tencent.

Looking back on more than half a year of work, this was a road riddled with pitfalls, one that we finished on our knees. Perhaps only those who have lived through it and studied the problem in depth will truly understand:

Hot patching is not a dinner party.

This is true of hot patch technology itself and equally true for those who adopt it. By sharing WeChat’s thinking and experience along the way, I hope to make it easier for everyone to decide whether to use hot patching in their own projects and which solution to choose.

Hot patch technical background

For an introduction to what hot patching is and where it applies, see the article “The Evolution of WeChat Android Hot Patch Practice”.

In my opinion, Android hot patch technology can be divided into two schools:

  • Native, represented by Alibaba’s Dexposed and AndFix and by Tencent’s internal solution KKFix;

  • Java, represented by Qzone’s super patch, Dianping’s Nuwa, Baidu Finance’s RocooFix, Ele.me’s Amigo, and Meituan’s Robust.

Both the Native and the Java school have their pros and cons; see the article above for the details of how they differ. In fact, there is no best solution, only the one that best suits your own needs.

For WeChat, we wanted a “highly available” patch framework, which should meet the following conditions:

  1. Stability and compatibility; WeChat needs to run on hundreds of millions of devices, and even a 1 percent exception rate from the patch framework will affect tens of thousands of users.

  2. Performance; first, the patch framework should not affect the performance of the application, given that in most cases users are not running a patch at all. Second, patch packages should be as small as possible, because size affects users’ data traffic and the patch success rate.

  3. Ease of use; we also want the patch framework to be easy to use and comprehensive in what it supports, even to the point of releasing features through patches.

With “high availability” as the premise, WeChat studied the two existing approaches of the time in depth:

  1. Dexposed/AndFix; the biggest challenges are stability and compatibility, and native exceptions are harder to troubleshoot. In addition, feature-release-level patching cannot be achieved, because variables and classes cannot be added;

  2. Qzone; the biggest challenge is performance: on the Dalvik platform there is a performance loss caused by instrumentation (stub insertion), and on the Art platform the patch package may become too large because of address offsets.

In March 2016, in pursuit of “high availability”, WeChat decided to try building its own patch framework: Tinker.

The Tinker framework did not evolve overnight. Its development falls roughly into three stages, each with a different core problem to solve. The core problem of Tinker V1.0 was to implement a Dex patch framework that meets the performance requirements.

Tinker V1.0 – The road to extreme performance

For the sake of stability and compatibility, WeChat chose the Java school. The biggest difficulty then was how to overcome the performance problems of the Qzone approach. We found inspiration in the cold swap mode of Instant Run and in Buck’s Exopackage: the idea is to replace the Dex entirely.

Put simply, we use the new Dex in its entirety, so there is no address confusion on Art and no instrumentation on Dalvik. Of course, given the size of the patch package, we cannot simply ship the new Dex. We can, however, put the differences between the old and the new Dex into the patch package. We investigated the following methods:



  1. BsDiff; it is format independent, but it does not work particularly well on Dex, and the size of its output is very unstable. WeChat currently still uses the BsDiff algorithm for .so libraries and some resources.

  2. DexMerge; the main problem is that it uses too much memory during synthesis. For a 12 MB Dex, peak memory may exceed 70 MB.

  3. DexDiff; an algorithm built on a deep understanding of the Dex format that produces small output, uses little memory, and supports adding, deleting, and modifying elements.

Which to choose? Within the core goal of “high availability”, performance is especially important. We are very glad that WeChat resolutely chose the self-developed DexDiff algorithm at that time. The process was painful enough to bring us to tears, but it is because of it that Tinker exists today.

DexDiff technology practice

After studying the Dex format in ever greater depth, we found that we had jumped into a deep pit, with three main difficulties:

  1. The Dex format is complex; a Dex is roughly divided into Index sections such as StringID and TypeID, and Data sections that are addressed by Offset. They reference each other heavily, so a small change can cause a large number of Index and Offset changes;

  2. Verification by dexopt and dex2oat; in both cases the system performs checks such as four-byte alignment and the sort order of certain elements. For example, StringIDs are sorted by Unicode value, TypeIDs are sorted by StringID…

  3. Low memory and high speed; this requires us to read and write each section of the Dex only once, rather than building a fully structured representation as baksmali and DexMerge do.
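To make the first two difficulties concrete, here is a minimal Python sketch of how one insertion ripples through a sorted index table. It is an illustration only, not Tinker's code: the helpers `insert_sorted`, `remap`, and `align4` are hypothetical names, and short ASCII strings stand in for real Dex string data.

```python
import bisect

def insert_sorted(table, value):
    """Insert value into a sorted table; return the new table and an old->new index map."""
    pos = bisect.bisect_left(table, value)
    new_table = table[:pos] + [value] + table[pos:]
    # Every entry at or after the insertion point shifts by one.
    index_map = {old: old if old < pos else old + 1 for old in range(len(table))}
    return new_table, index_map

def remap(references, index_map):
    """Rewrite references (indices into the old table) so they point into the new table."""
    return [index_map[r] for r in references]

def align4(offset):
    """Round an offset up to the next multiple of four."""
    return (offset + 3) & ~3

string_ids = ["a", "c", "d"]  # stand-in for a sorted StringID section
type_ids = [1, 2]             # stand-in for TypeIDs that refer to "c" and "d"

new_strings, index_map = insert_sorted(string_ids, "b")
new_type_ids = remap(type_ids, index_map)
```

Adding a single string changes the index of every later string, so every reference into the table has to be rewritten, and whatever is written afterward still has to land on a four-byte boundary.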



Looking back, it really was a road walked on our knees. As with our study of how Dalvik and Art execute code, the result came from reading the source code again and again, building ROMs to inspect logs again and again, and dumping memory structures again and again.

Here is an example of the simplest Index field:



To turn the sequence on the left into the one on the right, the core of the Diff algorithm is to generate the smallest possible sequence of operations while correcting Index and Offset values, so that elements can be added, deleted, and modified.

  1. Del 2; the “b” element is deleted, and its Index is 2. To reduce the size of the patch package, only the Index is stored, except for newly added elements;

  2. The “c”, “d”, and “e” elements move forward automatically and need no operation;

  3. Add f(5); the element “f” is added at the fifth position.
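The three steps above can be sketched in Python. This is a simplified illustration rather than Tinker's DexDiff: it assumes the old section is `a b c d e` and the new one is `a c d e f` (inferred from the operations listed), that the section is sorted, and that indices are 1-based. The function names are hypothetical.

```python
def diff_sorted(old, new):
    """Yield ("del", old_index) and ("add", new_index, value) using 1-based indices.

    Both inputs must be sorted. The walk looks at one pair of elements at a
    time, so it never needs more than the current element of each sequence.
    """
    i = j = 0
    while i < len(old) or j < len(new):
        if j == len(new) or (i < len(old) and old[i] < new[j]):
            yield ("del", i + 1)          # only the index is stored
            i += 1
        elif i == len(old) or new[j] < old[i]:
            yield ("add", j + 1, new[j])  # new elements carry their value
            j += 1
        else:
            i += 1                        # unchanged element: no operation
            j += 1

def apply_ops(old, ops):
    """Rebuild the new sequence from the old one and the operation list."""
    dels = {op[1] for op in ops if op[0] == "del"}
    adds = {op[1]: op[2] for op in ops if op[0] == "add"}
    result, i = [], 1
    while i <= len(old) or len(result) + 1 in adds:
        if len(result) + 1 in adds:
            result.append(adds[len(result) + 1])
        elif i in dels:
            i += 1
        else:
            result.append(old[i - 1])
            i += 1
    return result

old = ["a", "b", "c", "d", "e"]
new = ["a", "c", "d", "e", "f"]
ops = list(diff_sorted(old, new))
rebuilt = apply_ops(old, ops)
```

Here `ops` contains only the deletion of index 2 and the addition of “f” at position 5; the elements that merely shift forward cost nothing in the patch.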

For Offsets this is more complicated, because each Section can contain many elements. In the end we obtain the final operation queue. Why can DexDiff do this with so little memory? Because the DexDiff algorithm handles one operation at a time and does not need to read all the data in at once. The data for DexDiff are as follows:



By implementing the DexDiff algorithm, we solved not only the performance loss on the Dalvik platform but also the problem of oversized patch packages on the Art platform. The drawback of this approach is that it takes up a relatively large amount of ROM (storage) space. Considering how quickly storage on mobile devices is growing, WeChat can accept the cost of a few dozen megabytes of additional ROM space.

The challenges of Android N

After the launch we were full of confidence, but we soon received a crash report from Huawei:



The crash appeared only on Android N, which shocked us at the time. Did Android N not support Java hot patches? Had two months of hard work been for nothing? Speculation was pointless; all we could do was keep looking for the cause in the source code.

Building on our earlier work, this investigation did not take long. The cause was mainly the hybrid compilation mode of Android N. For a more detailed analysis, see the article “Android N Hybrid Compilation and the Impact on Hot Patches”.

Challenges of OTA

We had just solved the Android N problem and were still basking in our triumph when bad news arrived from the front line: Xiaomi reported that some users of its development version saw a black screen, or even an ANR, when WeChat launched.

Our first reaction was that this was impossible, since all DexOpt operations run in a separate process. Why did it appear only on the Art platform? And why did more of the reports come from users of the Xiaomi development version? After analysis, we found that optimized odex files carry validity checks:

  • Dalvik platform: Modtime/CRC…

  • Art platform: checksum/image_checksum/image_offset…

This is easy to understand: after an OTA the system image has changed, so the image offset addresses used in the odex file are probably wrong. For the app’s own classN.dex files, dex2oat is rerun as part of the OTA upgrade, whereas a patch is loaded dynamically, so its dex2oat can only run synchronously the first time the patch is executed.

This can take more than ten seconds, so a black screen or even an ANR is hardly surprising. Why, then, did more of the reports come from Xiaomi users? Because the Xiaomi development version pushes a system upgrade every week.
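The validity check described above can be pictured with a small Python sketch. It is purely illustrative and does not reflect ART's real file format: the `OdexHeader` class and `is_still_valid` function are hypothetical, the field names loosely mirror the checksum/image_checksum/image_offset list, and all values are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OdexHeader:
    """Values recorded in the compiled file at the time it was generated (hypothetical)."""
    dex_checksum: int
    image_checksum: int
    image_offset: int

def is_still_valid(cached, dex_checksum, image_checksum, image_offset):
    """The compiled file is reusable only if every recorded value still matches."""
    return (cached.dex_checksum == dex_checksum
            and cached.image_checksum == image_checksum
            and cached.image_offset == image_offset)

cached = OdexHeader(dex_checksum=0x1234, image_checksum=0xAAAA, image_offset=0x1000)

# Same system image: the compiled patch can be loaded directly.
before_ota = is_still_valid(cached, 0x1234, 0xAAAA, 0x1000)
# After an OTA the image has changed, so the patch must be compiled again on first load.
after_ota = is_still_valid(cached, 0x1234, 0xBBBB, 0x2000)
```

The Dex itself is unchanged in the second call; only the system image differs, and that alone is enough to invalidate the compiled file.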

At that point we re-examined the idea of full synthesis and once again questioned the principle of the approach itself, which brings the following costs on the Art platform:

  1. Black screen after OTA; showing a loading screen might be feasible, but it is not a good solution.

  2. ROM footprint; for a 10 MB Dex, the odex produced on Dalvik is only about 11 MB, but on the Art platform it can exceed 30 MB.

  3. Android N issues; Android N went to great lengths to introduce hybrid compilation, yet the full-synthesis patch mechanism throws that benefit away, because a dynamically loaded Dex is still fully compiled.

In retrospect, the Qzone approach packs only the classes that are needed into the patch. That may make patches large on the Art platform, but they are certainly far smaller than a fully synthesized Dex. So we proposed per-platform synthesis: synthesize the full Dex on the Dalvik platform, and synthesize only the small Dex that is needed on the Art platform.

The DexDiff algorithm was already very complex, and implementing per-platform synthesis on top of it was harder still.



The main difficulties are as follows:

  • Class collection for the small Dex; which classes should go into this small Dex?

  • ClassN handling; how should ClassN be handled, given that classes may move from one Dex to another?

  • Secondary offset correction; how can the operation sequence in the patch package be corrected a second time?

  • Size of art.info; how large is the info file introduced to correct the offsets?

Fortunately, we did not flinch in the face of these difficulties, and in the end we implemented this solution, something that other full-synthesis approaches cannot do:

  1. Full synthesis on Dalvik solves the performance loss caused by instrumentation;

  2. Small-Dex synthesis on the Art platform solves the problems of large ROM footprint, OTA upgrades, and Android N;

  3. In most cases art.info is only 1-20 KB, which solves the problem of patch packages that may be too large.

With the DexDiff algorithm having become so complex, how can its correctness be guaranteed? WeChat does the following three things:

  1. Verification with randomly composed Dex files, to cover most cases;

  2. Random Diff verification across 200 versions of WeChat, to cover everyday usage;

  3. A validity check on the synthesized Dex file, so that even if the algorithm has a bug, the patch package cannot be built.

Every update to the DexDiff algorithm must pass these three tests before it can be submitted, which closes the loop for the whole DexDiff algorithm.
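The shape of such a closed loop can be sketched in Python as a randomized round-trip test. This is a toy model under stated assumptions, not WeChat's test suite: sorted lists of unique integers stand in for Dex sections, `make_patch` and `apply_patch` are a deliberately simple hypothetical diff, and the round count and seed are arbitrary.

```python
import random

def make_patch(old, new):
    """Diff two sorted, duplicate-free tables using 0-based indices."""
    old_set, new_set = set(old), set(new)
    dels = [i for i, x in enumerate(old) if x not in new_set]
    adds = [(j, x) for j, x in enumerate(new) if x not in old_set]
    return dels, adds

def apply_patch(old, patch):
    """Drop deleted entries, then insert added ones at their target index."""
    dels, adds = patch
    drop = set(dels)
    result = [x for i, x in enumerate(old) if i not in drop]
    for j, x in adds:  # adds are in ascending target order
        result.insert(j, x)
    return result

def is_valid(table):
    """Validity check on the synthesized output: strictly ascending order."""
    return all(a < b for a, b in zip(table, table[1:]))

def random_table(rng, universe=50):
    return sorted(rng.sample(range(universe), rng.randint(0, universe)))

def run_round_trips(rounds, seed=0):
    """Return True only if every random old/new pair survives diff + apply."""
    rng = random.Random(seed)
    for _ in range(rounds):
        old, new = random_table(rng), random_table(rng)
        merged = apply_patch(old, make_patch(old, new))
        if merged != new or not is_valid(merged):
            return False
    return True

all_passed = run_round_trips(200)
```

The important property is that the check runs on the synthesized output itself, so a bug in the diff shows up as a failed build rather than as a broken patch on a user's device.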

Other technical challenges

During the implementation process, we also found some other problems:

1. Xposed and other WeChat plug-ins; there are all kinds of WeChat plug-ins on the market that load WeChat’s classes before WeChat itself starts, which causes two problems:

  1. Dalvik platform: a crash occurs with the message “Class ref in pre-verified class resolved to unexpected implementation”;

  2. Art platform: some classes use the old code, which may make the patch ineffective or lead to incorrect addresses.

WeChat handles this as follows: if a crash occurs and Xposed is found to be installed, the patch is removed and no longer applied.

2. Dex insertion via reflection succeeds but does not take effect; on some Samsung Android 19 builds, the reflection succeeds, but when a class exists in more than one place, the lookup always starts from base.apk.

WeChat handles this by adding a check that the Dex reflection really took effect. Specifically, the framework contains a class whose isPatch variable is false. When a patch is built, we automatically change this variable to true. The value finally read from this variable tells us whether the reflection succeeded.
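The logic of this check can be modeled in a few lines of Python. This is a simulation under stated assumptions rather than Android code: dictionaries stand in for Dex files, a list stands in for the class loader's search order, and `PatchFlag`, `find_class`, and `patch_took_effect` are hypothetical names.

```python
def find_class(search_path, name):
    """Return the first definition of name along the search path, like a class loader."""
    for dex in search_path:
        if name in dex:
            return dex[name]
    raise LookupError(name)

# The base build ships the flag as False; the patch build flips it to True.
base_dex = {"PatchFlag": {"is_patch": False}}
patch_dex = {"PatchFlag": {"is_patch": True}}

# Normal device: the patch Dex is searched before the base APK.
normal_path = [patch_dex, base_dex]
# Problem device: insertion reported success, but lookup still starts from the base APK.
broken_path = [base_dex, patch_dex]

def patch_took_effect(search_path):
    """Read the flag through the normal lookup path and report what was found."""
    return find_class(search_path, "PatchFlag")["is_patch"]
```

Because the flag is read through the same lookup path as every other class, it reports what the runtime actually resolved, not what the insertion step claimed to have done.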

Tinker V1.0 summary

I. About performance

Through the work on Tinker V1.0, we solved the performance problems of the Qzone approach and obtained a patch framework that meets the performance requirements of “high availability”.

  • The patch package is very small, usually under 10 KB;

  • There is almost no impact on performance; the 2% overhead comes mainly from WeChat verifying the MD5 of the Dex file at runtime (the file is in the /data/data/ directory, but WeChat verifies it anyway for a higher level of security);

  • On the Art platform, the address offset problem is solved, and thanks to our revolutionary per-platform synthesis the ROM footprint is on par with the Qzone approach.



II. About the success rate

Some people may ask why WeChat’s success rate is so low when other solutions report more than 99%. In fact, our success rate is calculated as follows:

Application success rate = number of users converted to the patch version / number of users who installed the base version

That is, after three days, 94.1% of base-version users had been successfully upgraded to the patch version. Because the number of base-version users keeps growing, and because users of the base or patch version may have installed other versions in the meantime, this figure is biased slightly low, but it realistically reflects the overall online coverage of the patch.
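As a worked example of the formula, the following Python sketch computes the rate from two counts. The function name and the user counts are made up for illustration; they were chosen only so that the first result reproduces the 94.1% quoted above.

```python
def application_success_rate(converted_to_patch, installed_base):
    """Percentage of base-version installs that ended up on the patch version."""
    if installed_base <= 0:
        raise ValueError("installed_base must be positive")
    return round(100.0 * converted_to_patch / installed_base, 1)

# Hypothetical counts three days after the patch was published.
rate = application_success_rate(941_000, 1_000_000)

# If the base version keeps gaining installs while conversions stay flat,
# the same patch appears to perform worse.
rate_with_new_installs = application_success_rate(941_000, 1_020_000)
```

The second call shows why the metric runs low: a growing denominator pulls the rate down even when nothing about the patch has changed.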

In fact, with the Qzone approach the three-day success rate was about 96.3%, so we still have considerable room for improvement.

Tinker V2.0 – The road to stability

In V1.0, most exceptions reached us through vendor feedback, and Tinker had not yet solved the stability and compatibility problems at the core of “high availability”.

We need to build a complete monitoring and patch rollback mechanism that watches for anomalies at every stage. This is the core mission of Tinker V2.0, which will be covered in the next article for reasons of space.

Follow Tinker and give us a star on GitHub

Github.com/Tencent/tin…
