1.1 What Is Hot Repair
For most mobile developers, shipping release updates is routine. However, if you find an urgent bug in a package you have just shipped, the traditional update process requires you to fix it, rebuild and republish the package, and wait for users to download and install the new version. The steps are tedious, and the traditional process has the following disadvantages:
- Re-releasing is expensive.
- Downloading and installing the update is costly for users.
- Bug fixes do not reach users promptly, which hurts the user experience.
In response, developers have tried several alternatives:
- Hybrid scheme: business logic that changes frequently is moved into H5 (HTML5) pages. This requires traditional Java developers to learn front-end languages, which raises the learning cost, and the existing logic must be suitably abstracted and rewritten. Code that cannot be converted to H5 still cannot be fixed.
- Plugin frameworks, such as Atlas or DroidPlugin. Migration costs are very high: developers must learn the complete plugin toolchain and refactor existing code.
Hot repair technology emerged to solve these problems.
1.2 Technical Background
Alibaba:
- Dexposed: an improved version of Xposed that provides runtime Java method hooking on the Android Dalvik virtual machine, but it is incompatible with the virtual machine used from Android 5.0 onward.
- AndFix: a low-level method replacement solution that is compatible with both Dalvik and ART.
- HotFix: Alibaba Baichuan HotFix, launched based on experience using AndFix in production projects. It only fixes code; resource and SO library fixes were not supported.
- Sophix: launched in June 2017, Sophix stood out from the field and achieved industry-leading code repair, resource repair, and SO library repair.
Other well-known hot fix solutions exist, but each has its own limitations, such as oversized patches, poor efficiency, instability, or cumbersome integration:
- Tencent Qzone Super Patch
- WeChat Tinker
- Ele.me Amigo
- Meituan Robust
1.3 Detailed Comparison
Comparison of Sophix with Tinker and Amigo:
The only capability Sophix does not support is fixing the four major Android components (Activity, Service, BroadcastReceiver, and ContentProvider).
1.4 Technical Overview
1.4.1 Design Concept
Sophix is designed to be non-invasive:
- The patch tool needs only the old and the new APK, and the app needs only two lines of code: one to initialize Sophix and one to request a patch
- No intrusion into the APK build process
- No changes to any packaged components
- No AOP code is inserted
1.4.2 Code Repair
There are two main approaches to code repair: the underlying (low-level) replacement scheme used by Alibaba's solutions, and the class loading scheme used by Tencent's solutions.
Both schemes have their advantages and disadvantages:
- The underlying replacement scheme has many limitations, but it has the best timeliness: the patch loads quickly and takes effect immediately.
- The class loading scheme has poor timeliness, since a cold restart is required for the fix to take effect, but it can fix a wide range of changes with few restrictions.
Underlying replacement scheme
The underlying replacement scheme replaces the original method directly inside the already loaded class, so the patch has to work within the structure of the original class. Methods and fields cannot be added to or removed from the original class, because doing so would break its structure.
Adding or removing methods in a patched class changes the number of methods in the class and in the dex as a whole. A change in the method count shifts the method indexes, so accesses can no longer resolve to the correct method. Adding or removing fields shifts the field indexes in the same way. A more serious problem arises if a field is added to a class while the program is running: instances created earlier keep their original layout, which cannot be changed. When a new method uses one of these old instances and accesses the new field, the results are unpredictable.
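The index problem can be made concrete with a toy Java model. It is not a dex parser and not Sophix code; it only assumes, as in the dex format, that method references live in a sorted table and that already compiled code refers to a method by its position in that table. The method names are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model, not a dex parser: method references are kept in a sorted table
// (dex method_ids are sorted), and compiled call sites store an index into it.
final class MethodIndexShift {

    // Build a sorted method table from a list of method names.
    static List<String> methodTable(List<String> names) {
        List<String> table = new ArrayList<>(names);
        Collections.sort(table);
        return table;
    }

    // Resolve a call site that stores a raw index into a method table.
    static String resolve(List<String> table, int index) {
        return table.get(index);
    }

    public static void main(String[] args) {
        List<String> oldTable = methodTable(List.of("onCreate", "render", "update"));
        int callSite = oldTable.indexOf("render"); // index 1, baked into already compiled code

        // The patch adds "parse", which sorts before "render" and pushes it to index 2.
        List<String> patchedTable = methodTable(List.of("onCreate", "render", "update", "parse"));

        System.out.println("old table:     index " + callSite + " -> " + resolve(oldTable, callSite));
        System.out.println("patched table: index " + callSite + " -> " + resolve(patchedTable, callSite));
    }
}
```

Running it prints `render` for the old table and `parse` for the patched one: the old call site now silently points at a different method, which is why the underlying replacement scheme cannot allow methods to be added or removed.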
This is an inherent limitation of this type of scheme. Beyond it, the underlying replacement scheme is most criticized for its instability.
By rethinking the underlying principle of code replacement, we overcame these limitations and compatibility problems, and achieved code hot repair that takes effect immediately, using a more elegant replacement approach.
This replacement approach ignores the concrete layout of the underlying structure. That not only solves the compatibility problems but also removes the need to handle each Android version separately, because differences in the ArtMethod structure no longer matter, which greatly reduces the amount of code. Even if later Android versions keep changing the members of ArtMethod, as long as ArtMethod entries are still laid out linearly in an array, the approach applies directly to newer versions such as Android 8.0 and 9.0 without any adaptation to the new system version.
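One way to replace a method without knowing its layout, and the idea the "linear array" condition above relies on, is to measure the size of one ArtMethod at runtime as the address distance between two adjacent methods and then copy the whole block (in native code this would be a memcpy). The Java sketch below only imitates that on a byte array; the 12-byte record size and all names are assumptions for illustration.

```java
import java.util.Arrays;

// Toy model of a layout-agnostic method replacement. "Memory" is a byte array in which
// fixed-size method records sit back to back, like ArtMethod entries in a method array.
// The replace step never knows the record's fields, only the measured distance.
final class WholeStructReplace {

    // Measure one record's size as the address distance between two adjacent records.
    static int measureRecordSize(int firstAddr, int secondAddr) {
        return secondAddr - firstAddr;
    }

    // Overwrite the whole original record with the patched record, byte for byte.
    static void replace(byte[] memory, int patchedAddr, int originalAddr, int size) {
        System.arraycopy(memory, patchedAddr, memory, originalAddr, size);
    }

    public static void main(String[] args) {
        int hiddenSize = 12; // differs between Android versions in reality; only used to lay out this demo
        byte[] memory = new byte[hiddenSize * 4];

        // Records 0 and 1: two adjacent methods of a probe class, used only for measuring.
        int probeA = 0;
        int probeB = hiddenSize;
        // Records 2 and 3: the original method and its patched replacement.
        int original = 2 * hiddenSize;
        int patched = 3 * hiddenSize;
        Arrays.fill(memory, original, original + hiddenSize, (byte) 1);
        Arrays.fill(memory, patched, patched + hiddenSize, (byte) 7);

        int size = measureRecordSize(probeA, probeB);
        replace(memory, patched, original, size);
        System.out.println(Arrays.toString(Arrays.copyOfRange(memory, original, original + size)));
    }
}
```

The replace step only uses the measured distance, so the same code keeps working if the record grows or shrinks in a future version, provided the records stay contiguous.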
Class loading scheme
The class loading scheme works by having the ClassLoader load the new classes after the app restarts. By the time a fix is needed, the classes that must change have already been loaded while the app was running, and Android cannot unload a class. Without restarting the VM, the original class stays in the virtual machine and the new class cannot be loaded. Therefore, the new classes in the patch can only be loaded on the next restart, before the business logic runs, so that subsequent accesses resolve to the new classes. This is how the class loading scheme achieves hot repair.
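The key constraint is that a loaded class stays loaded. The toy model below is plain Java, not Android's class loader; it assumes the patch dex is placed ahead of the app's dex (a common way class loading fixes are applied) and shows why the patch only wins after a restart. Class names and version labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the class loading scheme. Each "dex" maps class names to a version label.
// Lookup scans the dex list in order and caches the first match; cached classes are never
// unloaded, mirroring the constraint described above.
final class ToyClassLoader {
    private final List<Map<String, String>> dexElements = new ArrayList<>();
    private final Map<String, String> loaded = new HashMap<>();

    ToyClassLoader(List<Map<String, String>> dexElements) {
        this.dexElements.addAll(dexElements);
    }

    // Place a patch dex in front so its classes are found first.
    void prependDex(Map<String, String> dex) {
        dexElements.add(0, dex);
    }

    String loadClass(String name) {
        String cached = loaded.get(name);
        if (cached != null) {
            return cached; // already loaded: a newly added patch cannot replace it
        }
        for (Map<String, String> dex : dexElements) {
            String version = dex.get(name);
            if (version != null) {
                loaded.put(name, version);
                return version;
            }
        }
        throw new IllegalStateException("class not found: " + name);
    }

    public static void main(String[] args) {
        Map<String, String> appDex = Map.of("com.example.Cart", "buggy");
        Map<String, String> patchDex = Map.of("com.example.Cart", "fixed");

        ToyClassLoader running = new ToyClassLoader(List.of(appDex));
        System.out.println(running.loadClass("com.example.Cart"));
        running.prependDex(patchDex); // patch arrives while the app is running
        System.out.println(running.loadClass("com.example.Cart"));

        // Cold restart: a fresh loader receives the patch before business logic runs.
        ToyClassLoader restarted = new ToyClassLoader(List.of(appDex));
        restarted.prependDex(patchDex);
        System.out.println(restarted.loadClass("com.example.Cart"));
    }
}
```

The program prints `buggy`, `buggy`, then `fixed`: only the restarted loader, which sees the patch before the class is first resolved, picks up the fix.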
Tencent's class loading solutions come in three variants:
- The Qzone scheme intrudes into the packaging process and injects a hack (a reference to an otherwise useless class) into the code, which is an inelegant implementation.
- QFix must obtain and call functions inside the underlying virtual machine, which is neither stable nor reliable, and it has the major limitation that it cannot add public methods.
- WeChat's Tinker performs complete full-dex loading and could be said to take patch synthesis to the extreme. Tinker synthesizes the full dex at the method and instruction level, using a diff and merge process developed entirely in-house. Although this saves a great deal of patch space, the fine granularity of the dex comparison makes the implementation complicated and the performance cost high. Meanwhile, dex files make up a relatively small share of an APK; resource files take most of the space. Therefore, Tinker's trade of time for space is not cost-effective.
The best granularity for dex comparison is the class level. It is neither as fine-grained as the method and instruction level nor as coarse as a BsDiff binary comparison, and it achieves the best balance between time and space. Based on this principle, we implemented a completely different full-dex replacement scheme:
- It directly reuses Android's original class lookup and dex composition mechanisms to quickly synthesize the new full dex. This avoids handling the case where synthesis would exceed the dex method-count limit, and requires no destructive restructuring of the dex.
- It reorders the dex files in the package: classes.dex comes first, then classes2.dex, then classes3.dex, and so on, which amounts to class instrumentation at the dex file level. Breaking up and reorganizing the classes.dex order of the old package and the patch package lets the system find classes in its natural order, so the patched classes take precedence over the old ones. This greatly reduces the cost of synthesizing patches.
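The two ideas above, diffing at class granularity and reordering dex files so the patch is found first, can be sketched in Java as follows. Class contents are stood in for by hashes and the method names are hypothetical; real tooling would compare class definitions inside dex files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Sketch of a class-granularity diff plus dex reordering. Class bodies are represented by
// content hashes. All names are hypothetical, not Sophix's patch tool API.
final class ClassLevelPatch {

    // Classes that are new in the new build or whose content differs from the old build.
    static TreeSet<String> changedClasses(Map<String, String> oldClasses, Map<String, String> newClasses) {
        TreeSet<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> entry : newClasses.entrySet()) {
            if (!entry.getValue().equals(oldClasses.get(entry.getKey()))) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    // Target dex name -> source file: the patch dex becomes classes.dex and the original
    // dex files shift down to classes2.dex, classes3.dex, ... so lookup reaches the patch first.
    static LinkedHashMap<String, String> reorder(String patchDex, List<String> originalDexes) {
        LinkedHashMap<String, String> layout = new LinkedHashMap<>();
        layout.put("classes.dex", patchDex);
        for (int i = 0; i < originalDexes.size(); i++) {
            layout.put("classes" + (i + 2) + ".dex", originalDexes.get(i));
        }
        return layout;
    }

    public static void main(String[] args) {
        Map<String, String> oldBuild = Map.of("app.Cart", "h1", "app.Pay", "h2", "app.Util", "h3");
        Map<String, String> newBuild = Map.of("app.Cart", "h1", "app.Pay", "h9", "app.Util", "h3", "app.Coupon", "h4");

        System.out.println(changedClasses(oldBuild, newBuild));
        System.out.println(reorder("patch.dex", List.of("classes.dex", "classes2.dex")));
    }
}
```

The run prints `[app.Coupon, app.Pay]` (only the new and modified classes go into the patch) and `{classes.dex=patch.dex, classes2.dex=classes.dex, classes3.dex=classes2.dex}`. Unchanged classes such as `app.Cart` are never re-shipped, which is where the class-level granularity saves space compared with a full-file diff.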
Combining the two schemes
Since both the underlying replacement scheme and the class loading scheme have their merits, why not combine them?
Sophix's code repair system covers both approaches. Combined, they complement each other, and Sophix switches between them automatically and flexibly according to the actual situation.
In the patch generation phase, the patch tool automatically chooses a scheme based on the actual code changes:
- For small changes that stay within the limits of the underlying replacement scheme, the underlying replacement fix is used directly, so the code fix takes effect immediately.
- For code changes beyond those limits, class loading replacement is used. Its timeliness is worse, but it can always achieve the hot fix.
In the runtime phase, Sophix also checks whether the device supports hot fixes. Even if a patch qualifies for an immediate hot fix, if the device's underlying virtual machine does not support it, Sophix still loads the fix through the cold-start class loading path, for the best compatibility.
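The decision logic in the two phases above can be sketched as follows. This is a simplified model under stated assumptions: the only rule encoded is the one given earlier in this chapter (adding or removing methods or fields rules out underlying replacement), the runtime device check is a plain boolean, and every name is hypothetical rather than part of Sophix's API.

```java
import java.util.List;

// Simplified sketch of the patch-generation and runtime decisions. Only the structural
// rule stated in this chapter is modeled; names are hypothetical, not Sophix's API.
final class FixStrategy {

    enum Mode { IMMEDIATE_UNDERLYING_REPLACEMENT, COLD_START_CLASS_LOADING }

    // Structural changes the patch tool found for one class when diffing the old and new APKs.
    static final class ClassChange {
        final int methodsAdded, methodsRemoved, fieldsAdded, fieldsRemoved;

        ClassChange(int methodsAdded, int methodsRemoved, int fieldsAdded, int fieldsRemoved) {
            this.methodsAdded = methodsAdded;
            this.methodsRemoved = methodsRemoved;
            this.fieldsAdded = fieldsAdded;
            this.fieldsRemoved = fieldsRemoved;
        }

        boolean changesStructure() {
            return methodsAdded + methodsRemoved + fieldsAdded + fieldsRemoved > 0;
        }
    }

    // Patch generation phase: can every change be handled by underlying replacement?
    static boolean patchAllowsImmediateFix(List<ClassChange> changes) {
        for (ClassChange change : changes) {
            if (change.changesStructure()) {
                return false;
            }
        }
        return true;
    }

    // Runtime phase: fall back to cold-start class loading if the device's VM is unsupported.
    static Mode choose(boolean patchAllowsImmediateFix, boolean deviceSupportsReplacement) {
        return patchAllowsImmediateFix && deviceSupportsReplacement
                ? Mode.IMMEDIATE_UNDERLYING_REPLACEMENT
                : Mode.COLD_START_CLASS_LOADING;
    }

    public static void main(String[] args) {
        List<ClassChange> bodyOnly = List.of(new ClassChange(0, 0, 0, 0));
        List<ClassChange> newField = List.of(new ClassChange(0, 0, 1, 0));

        System.out.println(choose(patchAllowsImmediateFix(bodyOnly), true));
        System.out.println(choose(patchAllowsImmediateFix(bodyOnly), false));
        System.out.println(choose(patchAllowsImmediateFix(newField), true));
    }
}
```

The three lines printed are `IMMEDIATE_UNDERLYING_REPLACEMENT`, `COLD_START_CLASS_LOADING`, and `COLD_START_CLASS_LOADING`: either a structural change or an unsupported VM sends the fix down the cold-start path.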
1.4.3 Resource Repair
At present, most resource hot fix solutions on the market are based on the implementation of Instant Run. In fact, the launch of Instant Run was the main driver of the hot repair wave: many hot repair schemes, for both code and resources, borrow heavily from Instant Run's code, and its resource repair scheme is the most widely reused.
Briefly, resource hot repair in Instant Run is divided into two steps:
- Construct a new AssetManager and, via reflection, call addAssetPath to add the complete new resource package to it. The result is an AssetManager that contains all the new resources.
- Find all existing references to the original AssetManager and, via reflection, replace each of them with the new AssetManager.
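Both Instant Run steps rely on ordinary Java reflection. The self-contained sketch below uses stand-in classes instead of Android's AssetManager and Resources (on a device, addAssetPath is a hidden, non-SDK method); only the reflection pattern carries over, and all class and field names are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the two Instant Run steps using stand-in classes.
final class InstantRunStyleSwap {

    // Stand-in for AssetManager: addAssetPath is private, so it must be reached by reflection.
    static final class FakeAssetManager {
        final List<String> paths = new ArrayList<>();

        private void addAssetPath(String path) {
            paths.add(path);
        }
    }

    // Stand-in for an object (such as a Resources instance) that holds an AssetManager reference.
    static final class FakeResources {
        private FakeAssetManager assets;

        FakeResources(FakeAssetManager assets) {
            this.assets = assets;
        }

        FakeAssetManager assets() {
            return assets;
        }
    }

    // Step 1: create a new manager and add the complete new resource package via reflection.
    static FakeAssetManager buildNewManager(String fullResourcePackage) {
        try {
            FakeAssetManager manager = new FakeAssetManager();
            Method add = FakeAssetManager.class.getDeclaredMethod("addAssetPath", String.class);
            add.setAccessible(true);
            add.invoke(manager, fullResourcePackage);
            return manager;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 2: overwrite every known reference to the old manager via reflection.
    static void replaceReferences(List<FakeResources> holders, FakeAssetManager newManager) {
        try {
            Field field = FakeResources.class.getDeclaredField("assets");
            field.setAccessible(true);
            for (FakeResources holder : holders) {
                field.set(holder, newManager);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FakeAssetManager old = new FakeAssetManager();
        List<FakeResources> holders = List.of(new FakeResources(old), new FakeResources(old));

        FakeAssetManager fresh = buildNewManager("new-resources.apk");
        replaceReferences(holders, fresh);

        System.out.println(holders.get(0).assets() == fresh && holders.get(1).assets() == fresh);
        System.out.println(fresh.paths);
    }
}
```

It prints `true` and `[new-resources.apk]`. Step 2 is the fragile part in practice: every holder of the old AssetManager has to be found and patched, and any missed reference keeps serving old resources.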
Our implementation: construct a resource package with package ID 0x66 that contains only the changed resource items, and call addAssetPath to add it directly to the original AssetManager. Because the patch package ID is 0x66, it does not conflict with the already loaded 0x7f package, so it can be added to the existing AssetManager and used directly.
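Why 0x66 does not clash with 0x7f follows from the layout of an Android resource ID: a 32-bit value of the form 0xPPTTEEEE, with an 8-bit package ID, an 8-bit type ID, and a 16-bit entry ID. App resources normally use package 0x7f. The helper below is a minimal sketch of that encoding; the type and entry values are arbitrary examples, and a real patch package may assign different type and entry IDs.

```java
// Minimal sketch of the 0xPPTTEEEE resource ID layout (package, type, entry).
final class ResourceIds {

    static int make(int packageId, int typeId, int entryId) {
        return (packageId << 24) | (typeId << 16) | entryId;
    }

    static int packageOf(int resId) {
        return resId >>> 24;
    }

    static int typeOf(int resId) {
        return (resId >>> 16) & 0xff;
    }

    static int entryOf(int resId) {
        return resId & 0xffff;
    }

    public static void main(String[] args) {
        // Same illustrative type and entry, different package IDs.
        int original = make(0x7f, 0x02, 0x0003);
        int patched = make(0x66, 0x02, 0x0003);
        System.out.printf("original 0x%08x, patch 0x%08x%n", original, patched);
        System.out.println(original != patched);
    }
}
```

This prints `original 0x7f020003, patch 0x66020003` and `true`: because the top byte differs, no patch resource ID can ever equal an ID in the base 0x7f package, so both packages can live in one AssetManager.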
The patch package contains only resources that are new (present in the new package but not in the original) and resources whose content has changed. In addition, we use a more elegant replacement: the original AssetManager object is destructed and reconstructed in place, so every existing reference to it remains valid and there is no need for the tedious reference rewriting that Instant Run performs.
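Both points of the paragraph above can be illustrated with a small Java sketch: selecting only added or changed resources for the patch, and rebuilding a manager's state in place so that references held elsewhere stay valid. Resource contents are represented by hashes, the manager is a plain Java stand-in rather than Android's AssetManager, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch of patch resource selection and in-place reinitialization.
final class InPlaceResourcePatch {

    // Keep only resources that are new or whose content changed; unchanged ones stay in the base APK.
    static TreeMap<String, String> patchEntries(Map<String, String> oldRes, Map<String, String> newRes) {
        TreeMap<String, String> patch = new TreeMap<>();
        for (Map.Entry<String, String> entry : newRes.entrySet()) {
            if (!entry.getValue().equals(oldRes.get(entry.getKey()))) {
                patch.put(entry.getKey(), entry.getValue());
            }
        }
        return patch;
    }

    // Stand-in for an asset manager that can be torn down and rebuilt without changing identity.
    static final class ReusableAssetManager {
        final List<String> paths = new ArrayList<>();

        void addAssetPath(String path) {
            paths.add(path);
        }

        // Imitates "destruct and reconstruct in place": the state is reset, the object reference is not.
        void reinitialize(List<String> newPaths) {
            paths.clear();
            paths.addAll(newPaths);
        }
    }

    public static void main(String[] args) {
        Map<String, String> oldRes = Map.of("drawable/logo", "a1", "string/title", "b1");
        Map<String, String> newRes = Map.of("drawable/logo", "a2", "string/title", "b1", "layout/banner", "c1");
        System.out.println(patchEntries(oldRes, newRes));

        ReusableAssetManager shared = new ReusableAssetManager();
        shared.addAssetPath("base.apk");
        ReusableAssetManager heldElsewhere = shared; // e.g. a reference cached by some other object

        shared.reinitialize(List.of("base.apk", "patch.apk"));
        System.out.println(heldElsewhere.paths);
    }
}
```

The output is `{drawable/logo=a2, layout/banner=c1}` followed by `[base.apk, patch.apk]`: the unchanged `string/title` is left out of the patch, and the cached reference sees the new paths without being rewritten.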
Our resource repair solution improves on Google's official Instant Run solution. The advantages of the whole resource replacement scheme are:
- AssetManager references do not need to be modified, so replacement is faster and more complete (compared with Instant Run and all implementations derived from it).
- There is no need to deliver the full package, because the patch package contains only the changed resources (compared with Instant Run, Amigo, and other implementations).
- The complete package does not need to be synthesized at run time, so no run-time CPU or memory is spent on it (compared with Tinker's implementation).
1.4.4 SO Library Repair
Repairing an SO library is essentially repairing and replacing native methods.
We use a reflection injection approach similar to the one used for class repair: the patch SO directory is inserted at the front of the nativeLibraryDirectories array, so the system loads libraries from the patch SO directory instead of the original SO directory.
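The effect of putting the patch directory first can be modeled in a few lines of plain Java. The sketch simulates the file system with a set of paths and stands in for the real reflection step with a list insert; the directory paths and library names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of native library lookup over an ordered directory list. The field name echoes
// nativeLibraryDirectories from the text, but this is plain Java, not Android's internals.
final class NativeLibLookup {
    private final List<String> nativeLibraryDirectories;
    private final Set<String> filesOnDisk; // simulated file system contents

    NativeLibLookup(List<String> directories, Set<String> filesOnDisk) {
        this.nativeLibraryDirectories = new ArrayList<>(directories);
        this.filesOnDisk = filesOnDisk;
    }

    // Done by reflection in the real scheme; here just an insert at index 0.
    void injectPatchDirectory(String patchDir) {
        nativeLibraryDirectories.add(0, patchDir);
    }

    // Return the first matching libNAME.so in directory order, or null if none exists.
    String findLibrary(String name) {
        String fileName = "lib" + name + ".so";
        for (String dir : nativeLibraryDirectories) {
            String candidate = dir + "/" + fileName;
            if (filesOnDisk.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> disk = Set.of("/app/lib/libcore.so", "/patch/lib/libcore.so", "/app/lib/libui.so");
        NativeLibLookup lookup = new NativeLibLookup(List.of("/app/lib"), disk);
        System.out.println(lookup.findLibrary("core"));

        lookup.injectPatchDirectory("/patch/lib");
        System.out.println(lookup.findLibrary("core"));
        System.out.println(lookup.findLibrary("ui"));
    }
}
```

It prints `/app/lib/libcore.so`, then `/patch/lib/libcore.so`, then `/app/lib/libui.so`: a library the patch does not contain is still found, because lookup simply falls through to the original directory.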
With this scheme, Sophix injects the patch's SO libraries entirely by reflection during startup, so the process remains transparent to developers. Unlike some other solutions, you do not need to manually replace the system's System.load calls.
1.5 Chapter Summary
This chapter describes the main application scenarios of hot repair technology and the changes it has brought to the industry. It explains in detail the origin of Alibaba's hot repair solution Sophix and compares it with the other mainstream solutions. It also gives an overview of the various aspects of hot repair, which the subsequent chapters cover in detail.
Reference
A Deep Dive into the Principles of Android Hotfix Technology [Book], Chapter 1: Introduction to Hot Fixes