The body of the article was published in the February 2018 issue of Programmer magazine, with some changes.

Introduction to ProGuard

ProGuard is an excellent open source optimization and obfuscation tool for Java and Android applications, released in 2002 by Belgian programmer Eric Lafortune. Its goal is to make programs smaller and faster, and it holds a dominant position in the Java world. It is mainly divided into four modules: Shrinker, Optimizer, Obfuscator, and Retrace.

  • Shrinker uses a reference-marking algorithm to remove unused code.
  • Optimizer optimizes bytecode with complex algorithms such as Partial Evaluation and Peephole Optimization, which we will not cover in detail. Code optimization changes the structure of some code blocks. A few examples: a non-static method that does not use this internally and is not involved in inheritance can be changed to static; a method (not very long) that is called only once can be inlined; an unused method parameter can be removed; local variable allocation can be adjusted, for example a variable that is initialized outside an if but used only inside it can be moved inside the if.
  • Obfuscator replaces normal names with short, meaningless names such as a, b, and c, making reverse engineering harder.
  • Retrace: a stack trace from bytecode processed by ProGuard differs from one produced without ProGuard. Besides the name changes, there are also logical changes. After a crash, developers need Retrace to restore the stack trace to its pre-ProGuard state.
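To make the Optimizer examples concrete, here is a hand-written Java sketch that contrasts a class before optimization with an equivalent class after the kinds of changes listed above. It is not ProGuard output, and all names (OptimizerSketch, Before, After, scale, helper) are made up for illustration.

```java
final class OptimizerSketch {
    // Before: scale never touches `this` and ignores its second parameter;
    // helper is short and called exactly once.
    static final class Before {
        int scale(int value, int unused) { return value * 2; }
        private int helper(int v) { return scale(v, 0); }
        int run(int v) { return helper(v) + 1; }
    }

    // After (hand-written equivalent): scale is static, the unused
    // parameter is gone, and helper has been inlined into run.
    static final class After {
        static int scale(int value) { return value * 2; }
        int run(int v) { return scale(v) + 1; }
    }

    public static void main(String[] args) {
        System.out.println(new Before().run(5)); // prints 11
        System.out.println(new After().run(5));  // prints 11
    }
}
```

Both versions return the same result for the same input; only the structure of the code differs.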

Background

When implementing plug-ins and hot patches, we must rely on the -applymapping feature of ProGuard for incremental obfuscation, so that the plug-in or patch stays compatible with the original host. However, when -applymapping is used, some methods are obfuscated incorrectly, and the ProGuard log contains warnings such as: ... is not being kept as ..., but remapped to ... We investigated this problem in depth and found a solution. This article explains the cause of the problem and the solution in turn.

Symptoms

The following warnings are printed by ProGuard when -applymapping is used. We also found that the names of these methods in the -applymapping output are inconsistent with their names in the obfuscation output of the original host, so the -applymapping result is incompatible with the host.

Printing mapping to [.../mapping.txt]...
...
Warning: com.bumptech.glide.load.resource.gif.GifFrameLoader: method 'void stop()' is not being kept as 'b', but remapped to 'c'
Warning: there were 6 kept classes and class members that were remapped anyway.
         You should adapt your configuration or edit the mapping file.
         (http://proguard.sourceforge.net/manual/troubleshooting.html#mappingconflict1)
...
Warning: com.bumptech.glide.load.resource.gif.GifFrameLoader: method 'void stop()' can't be mapped to 'c' because it would conflict with method 'clear', which is already being mapped to 'c'
Warning: there were 2 conflicting class member name mappings.

Mapping changes before and after -applymapping

@@ -1491,7 +1491,7 @@ BitmapRequestBuilder -> com...glide.a:
-    264:265:BitmapRequestBuilder transform(BitmapTransformation[]) -> a
+    264:265:BitmapRequestBuilder transform(BitmapTransformation[]) -> b
@@ -3532,7 +3532,7 @@ GifFrameLoader -> com.bumptech.glide.load.r
-    77:78:void stop() -> b
+    77:78:void stop() -> c_

Initial obfuscation    Incremental obfuscation
transform -> a         transform -> b
stop -> b              stop -> c_

stop is a public method in the host, and submodules depend on it. After a submodule is upgraded it still depends on the interfaces and public methods of the host, so the name of stop must stay the same before and after the upgrade. With incremental compilation using -applymapping, however, stop is remapped from b to c_. The submodule can no longer be upgraded, because the stop method it depends on is no longer compatible.

Understanding the mapping file

mapping.txt is the output of the code obfuscation phase.

The purpose of the mapping

  1. Retrace uses the mapping file and a stack trace to restore the stack to its pre-ProGuard form.
  2. -applymapping uses the mapping file for incremental obfuscation.

The composition of the mapping

The original name and the new name are separated by ->.

  1. Class mapping. Characteristic: the line ends with a colon (:).
  2. Field mapping. Characteristic: the line contains no ().
  3. Method mapping. Characteristic: the line contains (), and the two numbers on the left are the line number range of the method body.
  4. Inline. Characteristic: compared with a method mapping, there is a second line number range; the range on the right holds the original line numbers and the range on the left holds the new line numbers.
  5. Closure. Characteristic: only three line numbers; it appears in a pair with an inline entry.
  6. Comment. Characteristic: the line begins with #; comments usually do not appear in the mapping.

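The six rules above can be turned into a small classifier. The following is a minimal sketch under the assumption that each mapping line follows exactly the shapes listed; the class name MappingLineSketch, the Kind enum, and the regular expression are ours and are not part of ProGuard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MappingLineSketch {
    enum Kind { COMMENT, CLASS, FIELD, METHOD, INLINE, CLOSURE, UNKNOWN }

    // [first:last:]returnType name(args)[:origFirst[:origLast]] -> newName
    private static final Pattern METHOD_LINE = Pattern.compile(
        "(?:(\\d+):(\\d+):)?\\S+\\s+\\S+\\(.*\\)(?::(\\d+)(?::(\\d+))?)?\\s*->\\s*\\S+");

    static Kind classify(String rawLine) {
        String line = rawLine.trim();
        if (line.startsWith("#")) return Kind.COMMENT;   // rule 6
        if (!line.contains("->")) return Kind.UNKNOWN;
        if (line.endsWith(":")) return Kind.CLASS;       // rule 1
        if (!line.contains("(")) return Kind.FIELD;      // rule 2
        Matcher m = METHOD_LINE.matcher(line);
        if (!m.matches()) return Kind.UNKNOWN;
        if (m.group(4) != null) return Kind.INLINE;      // rule 4: two ranges
        if (m.group(3) != null) return Kind.CLOSURE;     // rule 5: three numbers
        return Kind.METHOD;                              // rule 3
    }

    public static void main(String[] args) {
        String[] lines = {
            "GifFrameLoader -> g:",
            "    77:78:void stop() -> b",
            "    2077:2078:void stop():77:78 -> c",
            "    2077:2078:void clear():81 -> c"
        };
        for (String line : lines) {
            System.out.println(classify(line) + "  " + line.trim());
        }
    }
}
```

Run on the four sample lines, the classifier reports a class mapping, a method mapping, an inline entry, and a closure entry, in that order.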
A mapping fragment related to the -applymapping error

GifFrameLoader -> g:
    com.bumptech.glide.load.resource.gif.GifFrameLoader$FrameCallback callback -> a
    60:64:void setFrameTransformation(com.bumptech.glide.load.Transformation) -> a
    67:74:void start() -> a
    77:78:void stop() -> b
    81:88:void clear() -> c
    2077:2078:void stop():77:78 -> c
    2077:2078:void clear():81 -> c
    91:91:android.graphics.Bitmap getCurrentFrame() -> d
    95:106:void loadNextFrame() -> e

GifFrameLoader maps to g. Each class and class member should have exactly one new name, yet stop appears twice, with two different mappings. Why are there two different mappings? Do they affect incremental obfuscation?

The ProGuard documentation gives neither a specific cause nor a reliable solution for this problem. The -applymapping section mentions that the warning above may be printed if the structure of the code has changed, and recommends the -useuniqueclassmembernames option to reduce the risk of conflicts. That option does not solve this problem.

To solve this problem, we decided to take a look at the ProGuard source code to see why this problem occurs and how to fix it.

Look for the answer in the source code

Let’s look at how ProGuard represents a method:

ProGuard classifies its input classes into ProgramClass and LibraryClass. The former includes the code we write and third-party SDKs, while the latter are usually system libraries that do not need to be compiled into the program, such as the referenced android.jar and rt.jar. ProgramMember is an abstract class with the subclasses ProgramField and ProgramMethod, representing fields and methods respectively. It has an Object visitorInfo member that stores the obfuscated name.

Code obfuscation

Code obfuscation can be thought of as the process of renaming classes, methods, and fields; the -applymapping parameter makes that process incremental. Obfuscation with -applymapping can be divided into three parts: mapping reuse, name obfuscation, and name conflict handling.

The simplified process looks like this (the two large dotted boxes on the left and right represent the two processing steps applied to a single class: name obfuscation and conflict handling):

[Figure: the -applymapping process]

1. MappingKeeper

MappingKeeper reuses the mappings in the mapping file and restores the visitorInfo of each ProgramMember to its previously obfuscated state.

  • For a new method, visitorInfo is null.
  • If more than one mapping exists for a method, the later mapping overwrites the earlier one and a warning is printed: Warning: ... is not being kept as ..., but remapped to ...

public void processMethodMapping(String className,
                                 int    firstLineNumber,
                                 int    lastLineNumber,
                                 ...
                                 int    newFirstLineNumber,
                                 int    newLastLineNumber,
                                 String newMethodName)
{
    if (clazz != null && className.equals(newClassName))
    {
        String descriptor = ClassUtil.internalMethodDescriptor(methodReturnType,
                                                               ListUtil.commaSeparatedList(methodArguments));
        Method method = clazz.findMethod(methodName, descriptor);
        if (method != null)
        {
            // Print out a warning if the mapping conflicts with a name that
            // was set before.

            // Make sure the mapping name will be kept.
            MemberObfuscator.setFixedNewMemberName(method, newMethodName);
        }
    }
}

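The overwrite behavior can be reproduced with a much smaller model. The sketch below is a simplified stand-in, not ProGuard code: the class MappingKeeperSketch and its fields are hypothetical, and it only assumes that method lines are processed in file order and that a later line for the same method replaces the earlier name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MappingKeeperSketch {
    // original method signature -> obfuscated name restored from the mapping file
    final Map<String, String> restoredNames = new LinkedHashMap<>();
    final List<String> warnings = new ArrayList<>();

    // Called once per method line of the mapping file, in file order.
    void processMethodMapping(String method, String newName) {
        // A later line for the same method overwrites the earlier name.
        String previous = restoredNames.put(method, newName);
        if (previous != null && !previous.equals(newName)) {
            warnings.add("Warning: method '" + method + "' is not being kept as '"
                    + previous + "', but remapped to '" + newName + "'");
        }
    }

    // Replays the method lines of the GifFrameLoader fragment shown earlier.
    static MappingKeeperSketch replayExample() {
        MappingKeeperSketch keeper = new MappingKeeperSketch();
        keeper.processMethodMapping("void stop()", "b");   // 77:78:void stop() -> b
        keeper.processMethodMapping("void clear()", "c");  // 81:88:void clear() -> c
        keeper.processMethodMapping("void stop()", "c");   // 2077:2078:void stop():77:78 -> c
        keeper.processMethodMapping("void clear()", "c");  // 2077:2078:void clear():81 -> c
        return keeper;
    }

    public static void main(String[] args) {
        MappingKeeperSketch keeper = replayExample();
        System.out.println(keeper.restoredNames);
        keeper.warnings.forEach(System.out::println);
    }
}
```

In this model stop ends up with the name c, the same name as clear, and exactly one warning is recorded.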
2. Obfuscation processing

Obfuscation works class by class and can be divided into two parts: the first collects existing mappings, and the second obfuscates names, checking whether a member already has a mapping and assigning a new name if it does not.

Part one: mapping name collection. MemberNameCollector collects the visitorInfo of each ProgramMember and puts methods or fields with the same descriptor into the same map<obfuscated name, original name>.

        String newName = MemberObfuscator.newMemberName(member); // get visitorInfo
        if (newName != null)
        {
            String descriptor = member.getDescriptor(clazz);
            Map nameMap = MemberObfuscator.retrieveNameMap(descriptorMap, descriptor);
            String otherName = (String)nameMap.get(newName);
            if (otherName == null                              ||
                MemberObfuscator.hasFixedNewMemberName(member) ||
                name.compareTo(otherName) < 0)
            {
                nameMap.put(newName, name);
            }
        }

If several members carry the same visitorInfo, the key-value pair in the map is overwritten by the later member (in the order in which members appear in the class), so an incorrect mapping may overwrite the correct one.
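The overwrite can be seen in a stripped-down model of the collection step. The sketch below is only an assumption-laden illustration: NameCollectorSketch is a hypothetical class, the descriptor string "()V" is illustrative, and the condition simply mirrors the three-way test in the excerpt above.

```java
import java.util.HashMap;
import java.util.Map;

final class NameCollectorSketch {
    // descriptor -> (obfuscated name -> original name)
    final Map<String, Map<String, String>> descriptorMap = new HashMap<>();

    void collect(String name, String descriptor, String newName, boolean fixed) {
        if (newName == null) return; // member has no name yet
        Map<String, String> nameMap =
                descriptorMap.computeIfAbsent(descriptor, d -> new HashMap<>());
        String otherName = nameMap.get(newName);
        // Same condition as in the excerpt: a fixed name always wins the slot.
        if (otherName == null || fixed || name.compareTo(otherName) < 0) {
            nameMap.put(newName, name);
        }
    }

    // Members in class order, with the names restored by the mapping step.
    static Map<String, String> example() {
        NameCollectorSketch collector = new NameCollectorSketch();
        collector.collect("start", "()V", "a", true);
        collector.collect("stop",  "()V", "c", true); // wrong name taken from the inline entry
        collector.collect("clear", "()V", "c", true); // overwrites the entry for stop
        return collector.descriptorMap.get("()V");
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

After collection the key c points to clear, and stop no longer owns any entry in the map, which is the state the conflict check later detects.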

Part two: name obfuscation

If visitorInfo is null, the member is assigned a new name. The map collected in part one ensures that a name generated by NameFactory does not conflict with an existing name. nextName() keeps a counter that is incremented every time a new name is created, which is why names such as a, b, and c appear. This step only guarantees that newly created mappings do not conflict with mappings already in the map.

        Map nameMap = retrieveNameMap(descriptorMap, descriptor);
        String newName = newMemberName(member);
        if (newName == null)
        {
            nameFactory.reset();
            do
            {
                newName = nameFactory.nextName();
            }
            while (nameMap.containsKey(newName));
            nameMap.put(newName, name);
            setNewMemberName(member, newName);
        }
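The reset-and-skip loop can be tried out with a toy name factory. This is a simplified stand-in, assuming names are generated as a, b, ..., z, aa, ab, ...; NameFactorySketch and assign are hypothetical names, and the generator is not ProGuard's NameFactory implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class NameFactorySketch {
    private int index;

    void reset() { index = 0; }

    // Counter-based names: a, b, ..., z, aa, ab, ...
    String nextName() {
        int i = index++;
        StringBuilder name = new StringBuilder();
        do {
            name.insert(0, (char) ('a' + i % 26));
            i = i / 26 - 1;
        } while (i >= 0);
        return name.toString();
    }

    // Mirrors the loop above: restart the counter, skip names already in the map.
    static String assign(Map<String, String> nameMap, String name) {
        NameFactorySketch factory = new NameFactorySketch();
        factory.reset();
        String newName;
        do {
            newName = factory.nextName();
        } while (nameMap.containsKey(newName));
        nameMap.put(newName, name);
        return newName;
    }

    public static void main(String[] args) {
        Map<String, String> nameMap = new HashMap<>();
        nameMap.put("a", "start");
        nameMap.put("b", "stop");
        nameMap.put("c", "clear");
        System.out.println(assign(nameMap, "getCurrentFrame")); // prints d
        System.out.println(assign(nameMap, "loadNextFrame"));   // prints e
    }
}
```

Because the counter restarts for every member, the first free name is always chosen, so names stay as short as the existing map allows.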
3. Handling obfuscated-name conflicts

The first step of conflict handling is the same as in obfuscation processing: the visitorInfo of each ProgramMember is collected, and the resulting map is in the same state as in the obfuscation step.

Conflict judgment code:

        Map nameMap = MemberObfuscator.retrieveNameMap(descriptorMap, descriptor);
        String newName = MemberObfuscator.newMemberName(member);
        String previousName = (String)nameMap.get(newName);
        if (previousName != null && !name.equals(previousName))
        {
            MemberObfuscator.setNewMemberName(member, null);
            member.accept(clazz, memberObfuscator);
        }

Take the visitorInfo of the current ProgramMethod and use it as the key to look up the value in the map. If the value differs from the name of the current ProgramMethod, the map entry of this ProgramMethod has been overwritten by another member. Set its visitorInfo to null, then call MemberObfuscator again to generate a new name for it. The NameFactory used here appends an _ to the new name, which is why some obfuscated method names end with an underscore.
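The detection half of this step can be sketched in a few lines. The model below is hypothetical (ConflictFixerSketch and its helper methods are ours) and covers only the check and the reset to null; how the replacement name with the underscore is chosen is left out, because that depends on ProGuard's name factory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConflictFixerSketch {
    // nameMap:     obfuscated name -> original name (as collected)
    // visitorInfo: original name   -> obfuscated name (per member, in class order)
    static List<String> resetConflicts(Map<String, String> nameMap,
                                       Map<String, String> visitorInfo) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, String> member : visitorInfo.entrySet()) {
            String name = member.getKey();
            String previousName = nameMap.get(member.getValue());
            if (previousName != null && !name.equals(previousName)) {
                member.setValue(null); // the member must be renamed afterwards
                conflicts.add(name);
            }
        }
        return conflicts;
    }

    static Map<String, String> exampleNameMap() {
        Map<String, String> nameMap = new HashMap<>();
        nameMap.put("a", "start");
        nameMap.put("c", "clear"); // the slot that stop also claims
        return nameMap;
    }

    static Map<String, String> exampleVisitorInfo() {
        Map<String, String> visitorInfo = new LinkedHashMap<>();
        visitorInfo.put("start", "a");
        visitorInfo.put("stop", "c");
        visitorInfo.put("clear", "c");
        return visitorInfo;
    }

    public static void main(String[] args) {
        System.out.println(resetConflicts(exampleNameMap(), exampleVisitorInfo()));
    }
}
```

With the example data only stop is reported, matching the ProGuard warning that stop cannot be mapped to c because clear already holds that name.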

4. Final code output

Up to this point, after code optimization, the bytecode has not yet been modified: the steps above only assign name mappings to classes and class members and resolve mapping conflicts. Once the conflicts are resolved, ProGuard writes mapping.txt, modifies the bytecode, fixes references, and generates output.jar.

5. Generation of the Mapping

When the mapping is generated, besides the mappings of classes, methods, and fields, the inlining information of methods is also recorded.

    2077:2078:void stop():77:78 -> c
    2077:2078:void clear():81 -> c

The first line describes the shift from the code range on the right to the range on the left (lines 2077-2078 of method c come from lines 77-78 of the stop method). The second line gives the final position of the shifted code (the method call on line 81 is replaced by lines 2077-2078). Neither line is a normal mapping.

Code optimization

Earlier we saw that the mapping contains inlining information. Why does the mapping contain information that seems to have nothing to do with obfuscation? The reason is that javac does little code optimization at compile time, only some simple ones, so ProGuard performs its own optimizations, including inlining. An example of a javac optimization: the string concatenation str1+str2+str3 is compiled to use StringBuilder, reducing object allocation.
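As a small illustration of the string concatenation case, the two methods below produce the same result. The class ConcatSketch is made up for this example, and the second method is only a hand-written approximation of what older javac versions (Java 8 and earlier) generate; newer JDKs compile concatenation through invokedynamic instead.

```java
final class ConcatSketch {
    // What the programmer writes.
    static String withPlus(String str1, String str2, String str3) {
        return str1 + str2 + str3;
    }

    // Roughly what older javac versions emit for the expression above:
    // one StringBuilder instead of intermediate String objects.
    static String withBuilder(String str1, String str2, String str3) {
        return new StringBuilder().append(str1).append(str2).append(str3).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("a", "b", "c"));    // prints abc
        System.out.println(withBuilder("a", "b", "c")); // prints abc
    }
}
```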

When a large amount of imported code, libraries, and obsolete code remains in the repository, this redundant code occupies a lot of disk space, network bandwidth, and memory. ProGuard's code optimization addresses this by removing unused code and by optimizing instructions, logic, local variable allocation within methods, and inlining, making applications run faster and use less disk and memory. Inlining: calls to an inlined method are expanded at the call site during compilation, reducing method calls and CPU consumption. This inlining is performed by ProGuard; the Java language itself provides no way to declare a method inline.

Inlining

During code optimization some methods are inlined: the body of the inlined method is copied into the caller at the call site, which expands the code and changes the code structure of the caller. The side effects of this copy therefore need to be considered. When the copied code crashes, the Java stack trace no longer reflects the real crash stack and call relationships, because it is shaped both by the caller's own code and by the code copied in. There are two main kinds of inlining: unique methods, which are called only once, and short methods, which are called multiple times but whose code_length is less than 8 (code_length is not the number of lines of code). Only methods that fall into one of these two categories may be inlined.

Take clear calling stop as an example, as shown below:

    81:88:void clear() -> c
    2077:2078:void stop():77:78 -> c   // lines 77-78 of stop are copied into c at the shifted lines 2077-2078
    2077:2078:void clear():81 -> c     // lines 2077-2078 are inserted at line 81 of c; c is the clear method

When a crash is reported in lines 2077-2078 of c, these entries allow the real call chain to be recovered: line 81 of clear calls stop, and the crash is in lines 77-78 of stop.
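The lookup can be sketched as follows. This is a simplified model, not the Retrace implementation: InlineRetraceSketch and Entry are hypothetical names, and the sketch assumes that a range maps line by line while a single original line (the call site) maps as is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InlineRetraceSketch {
    // One method line of the mapping for obfuscated method c.
    static final class Entry {
        final int first, last;          // line range in the obfuscated method
        final String method;            // original method name
        final int origFirst, origLast;  // original line range
        Entry(int first, int last, String method, int origFirst, int origLast) {
            this.first = first; this.last = last; this.method = method;
            this.origFirst = origFirst; this.origLast = origLast;
        }
    }

    // Returns the original frames for one obfuscated line, innermost first.
    static List<String> retrace(List<Entry> entries, int line) {
        List<String> frames = new ArrayList<>();
        for (Entry e : entries) {
            if (line < e.first || line > e.last) continue;
            int original = e.origFirst == e.origLast
                    ? e.origFirst                    // call site: a single line
                    : e.origFirst + (line - e.first); // range: shift back
            frames.add(e.method + ":" + original);
        }
        return frames;
    }

    static List<Entry> example() {
        return Arrays.asList(
                new Entry(81, 88, "clear", 81, 88),      // 81:88:void clear() -> c
                new Entry(2077, 2078, "stop", 77, 78),   // 2077:2078:void stop():77:78 -> c
                new Entry(2077, 2078, "clear", 81, 81)); // 2077:2078:void clear():81 -> c
    }

    public static void main(String[] args) {
        System.out.println(retrace(example(), 2077)); // prints [stop:77, clear:81]
    }
}
```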

The line number rule, simplified, is:

new line number = ((number of inlined lines + 1000) / 1000) × 1000 × number of inlining occurrences + offset

where offset is the start line number of the inlined code and the division is rounded down to a whole number. The minimum line number of the copied code is therefore 1000 + the start line number. If the number of inlined lines exceeds 1000, the shift is rounded to a larger multiple of 1000 before the start line number is added.
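The simplified rule fits in a one-line function. The sketch below assumes integer division (our reading of "rounded" in the rule) and uses made-up names; it mirrors the simplified formula in the text, not ProGuard's actual line number handling.

```java
final class LineShiftSketch {
    // inlinedLineCount: number of lines of code being inlined
    // inliningCount:    how many inlining occurrences have happened so far
    // startLine:        original start line of the inlined code (the offset)
    static int shiftedLine(int inlinedLineCount, int inliningCount, int startLine) {
        return (inlinedLineCount + 1000) / 1000 * 1000 * inliningCount + startLine;
    }

    public static void main(String[] args) {
        System.out.println(shiftedLine(2, 1, 77));    // prints 1077
        System.out.println(shiftedLine(2, 2, 77));    // prints 2077
        System.out.println(shiftedLine(1500, 1, 10)); // prints 2010
    }
}
```

With a two-line method starting at line 77 and an inlining count of 2, the function returns 2077, the value seen in the example mapping; which count ProGuard actually used there is not stated in the source.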

Does a method still exist after being inlined?

It may or may not. If it still exists, the mapping contains a method mapping for it. The method no longer exists if nothing else calls it after it has been inlined, but it usually remains if it takes part in inheritance (a subclass inherits it from its parent class).

Putting it all together

These modules are not independent of each other, so let us walk through the whole process.

1. Initialization

At initialization ProGuard reads the proguard-rule.txt we configured, together with the input classes and the libraries they depend on. The input classes are managed by ClassPool. The rule file contains the conditions for keeping classes; the resulting list of kept classes and members is called seeds.txt. All subsequent operations (obfuscation, shrinking, optimization) take seeds as their baseline: code that is not referenced from seeds can be removed.

2. shrink

Based on seeds, classes and class members that are not used are removed from the ClassPool. The first shrink pass produces usage.txt, which records the removed classes, methods, and fields.

3. optimize

Code optimization is more complicated. This stage optimizes logic, variable allocation, and instructions, removes dead code and unused method parameters, and inlines methods. As we know, inlining copies code, and the method whose code was copied is then no longer called by the current method. After optimization, shrink is run again. An inlined method may have no remaining references and so be removed, but if it takes part in inheritance it is kept.

4. obfuscate

Obfuscation assigns names to classes and class members, handles conflicting names, writes the mapping file, and then outputs the optimized and obfuscated JAR. Incremental compilation with -applymapping takes names from the mapping file; methods and fields that are not found there are assigned new names. The mapping file records two kinds of information: plain mappings and inlining information (which comes from optimize and is not directly related to obfuscation). Retrace needs both, but incremental obfuscation needs only the first.
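Since incremental obfuscation needs only the plain mappings, one conceivable way to separate the two kinds of entries is to filter a copy of the mapping file before it is passed to -applymapping. The sketch below illustrates that idea only; it is our assumption and not necessarily the fix the authors applied, and PlainMappingFilterSketch and its pattern are hypothetical. The unfiltered file would still be needed for Retrace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PlainMappingFilterSketch {
    // Method lines with an original-line suffix, e.g. "... stop():77:78 -> c"
    // or "... clear():81 -> c", carry inlining information.
    private static final Pattern INLINE_INFO =
            Pattern.compile(".*\\):\\d+(:\\d+)?\\s*->.*");

    static List<String> plainMappings(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!INLINE_INFO.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    static List<String> example() {
        return Arrays.asList(
                "GifFrameLoader -> g:",
                "    77:78:void stop() -> b",
                "    81:88:void clear() -> c",
                "    2077:2078:void stop():77:78 -> c",
                "    2077:2078:void clear():81 -> c");
    }

    public static void main(String[] args) {
        plainMappings(example()).forEach(System.out::println);
    }
}
```

On the example fragment the filter keeps the class line and the two plain method lines, so stop keeps the single name b.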

Back to the mapping file

What goes wrong when MappingKeeper reads the mapping?

During obfuscation, MappingKeeper assigns the mappings found in the mapping file to the visitorInfo of each ProgramMethod, but it does not distinguish between plain mappings and inlining entries. The stop method is first correctly assigned b, but the inlining entry then incorrectly assigns it c. At this point the visitorInfo of clear is also c.

Entering conflict resolution

The visitorInfo of stop is set to null, and stop is then assigned a new name with an underscore suffix.

Suppose instead that the descriptor of clear were not void and that clear were obfuscated as f. The map state would then be as follows:

stop()->f

Sample

This bug is hard to reproduce in some projects, and the projects in which it does reproduce are too complex, so we wrote a sample that triggers it. Run ./gradlew assembleDebug to generate a mapping file and copy the mapping to the app directory. Then enable -applymapping in the ProGuard rules and build again; the warning ... is not being kept as ..., but remapped to ... appears.

Some common questions about ProGuard

In addition to the incremental obfuscation mapping problem described in this article, developers may also run into the following situations:

  1. Reflection, for example Class clazz = Class.forName("xxxx"); clazz.getMethod("method_name").invoke(...) compared with xxxx.class.getMethod("method_name").invoke(...). With the former, the method may not be found, so the reflected method must be kept in the rules.

  2. Mixing rules can cause configuration errors. For example, in -optimizations !code/**,method/** only positive rules or only negative rules may be used; the ! sign marks a negative rule.

  3. Versions before 6.0 perform many operations on a single thread, so the whole process is time-consuming. Where possible, set the -optimizationpasses parameter to 1 so that only one optimization pass is run; subsequent passes bring little improvement.
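The reflection case in item 1 can be illustrated with plain Java. In the sketch below the class ReflectionSketch and its greet method are made up; the point is only that the method is looked up by a string at run time, so if an obfuscator renamed greet without a matching keep rule, the lookup would fail.

```java
import java.lang.reflect.Method;

final class ReflectionSketch {
    public static String greet() { return "hello"; }

    // Looks the class and method up by name, as in Class.forName(...).getMethod(...).
    static Object callByName(String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className);
            Method method = clazz.getMethod(methodName);
            return method.invoke(null); // static method, so no receiver
        } catch (ReflectiveOperationException e) {
            // A renamed or removed method ends up here (NoSuchMethodException).
            throw new IllegalStateException("lookup failed: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        String className = ReflectionSketch.class.getName();
        System.out.println(callByName(className, "greet")); // prints hello
    }
}
```

Looking up a name that does not exist, which is what happens after an unkept method is renamed, throws instead of returning a value.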

Conclusion

This article introduced the basic principles of ProGuard, the Java optimization and obfuscation tool, the relationships and interactions between its modules, and the bug in which some method mappings become inconsistent during incremental obfuscation with -applymapping, along with the cause of the bug and how to fix it. The compiler theory behind code optimization is abstract and its implementation is complex. Due to space constraints, we only described how code optimization affects the overall process. Readers interested in code optimization can consult books on compilers.

About the authors

Ting Li is a technical expert at Meituan-Dianping. He joined Meituan in 2014, has been responsible for several business and technical projects, and is committed to promoting the use of AOP and bytecode technology at Meituan. He was independently responsible for the Meituan app preinstallation project and drove its automation, and he led the design and development of the Meituan plug-in framework. His current focus is on advocating and promoting that framework.

Wei Xia is a senior engineer at Meituan-Dianping. He joined Meituan in 2017 and currently works on plug-in development and on optimizing underlying tools of the Meituan platform, such as AAPT and ProGuard. He focuses on hooking techniques and reverse engineering, and is used to finding solutions in source code.

The Meituan platform client technology team is responsible for the basic business and mobile infrastructure of the Meituan platform. The rapid development of Meituan-Dianping has been supported by a platform with a large user base. We have also actively explored mobile development technology and built up experience in dynamization, quality assurance, development models, and other areas. While actively adopting open source technology, the team also contributes some of that experience back to the open source community, hoping to advance the efficiency and quality of mobile development together with the industry.

If you are interested in our team, you can follow our column.