Keywords: code obfuscation, Android application hardening, mobile application protection, app protection, Royal Security

With the rapid development of the mobile Internet, application security problems keep emerging. As a result, more and more developers move their core code from the Java layer to the native layer to counter mature Java reverse-engineering tools. However, if the native-layer code is left unprotected, reverse engineers can still recover its logic with relative ease and go on to crack the application or carry out other attacks. So is there a good way to improve the security of native code? The answer is yes. This article introduces an effective defense against native-layer code analysis: code obfuscation.

So what is code obfuscation? The academic definition of code obfuscation is as follows:

Code obfuscation is a transformation that converts the code of a computer program into a functionally equivalent form, meaning the program has the same or similar functionality before and after the transformation. More precisely, suppose a program P is transformed by obfuscation into P′. If P fails to terminate or terminates with an error, then P′ may also fail to terminate or terminate with an error; otherwise, P′ must terminate and produce the same output as P. If these conditions do not hold, P′ is not a valid obfuscation of P. Obfuscation techniques are generally classified according to Collberg's taxonomy, which distinguishes four types: layout obfuscation, data obfuscation, control obfuscation, and preventive obfuscation.

  1. Layout obfuscation

Layout obfuscation removes or scrambles auxiliary text in source code or intermediate code that is irrelevant to execution, making the code harder for an attacker to read and understand. Comments and debugging information can be deleted outright, as can unused code and data structures such as unneeded methods and classes. This makes the semantics of the code harder to recover, and it also shrinks the software, which improves loading and execution efficiency.

The naming conventions and literal meanings of identifiers such as constant, variable, class, and method names help an attacker understand the code, so layout obfuscation also scrambles these identifiers. Common approaches include hash-based naming, identifier swapping, and overload induction. Hash-based naming replaces each identifier string with the hash of that string, so the resulting name says nothing about the code it labels. Identifier swapping collects all identifier strings in the code and reassigns them at random to different identifiers, which is hard for an attacker to detect. Overload induction exploits naming rules of high-level languages, such as the rule that variables in different namespaces may share a name, so that as many different identifiers as possible use the same string. Layout obfuscation is the simplest form of obfuscation, and it does not change the program's logic or execution.
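The hash-based naming described above can be sketched in a few lines of C. This is a minimal illustration, not the scheme of any particular tool: the function names `fnv1a32` and `obfuscated_name`, the choice of the 32-bit FNV-1a hash, and the `f_` prefix are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a32(const char *s) {
    uint32_t h = 2166136261u;              /* FNV offset basis */
    for (; *s != '\0'; ++s) {
        h ^= (uint8_t)*s;
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}

/* Write a replacement name for `ident` into `out`, e.g. "f_e40c292c".
 * The "f_" prefix (an assumption of this sketch) keeps the result a valid
 * C identifier even when the hex digits start with 0-9.
 * Returns 0 on success, -1 if `out` is too small. */
int obfuscated_name(const char *ident, char *out, size_t out_len) {
    int n = snprintf(out, out_len, "f_%08x", (unsigned)fnv1a32(ident));
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

The same identifier always maps to the same replacement, so every reference can be rewritten consistently. A real renaming tool would also have to detect hash collisions and leave exported symbols that other modules depend on untouched.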

  2. Data obfuscation

Data obfuscation modifies the data of a program rather than its code segments. Common methods include variable merging, variable splitting, array restructuring, and string encryption. Variable merging combines several variables into a single piece of data, with each original variable occupying one region of it, much like a large data structure. Variable splitting divides one variable into two, defines a mapping between the original value and the two parts, and rewrites every operation on the original variable as operations on the two parts. Array restructuring covers splitting, merging, folding, and flattening: splitting divides an array into two or more arrays of the same dimension, and merging does the opposite; folding increases the number of dimensions of an array, and flattening does the opposite.

In ELF files, global variables and constant strings are stored in data segments, and disassemblers can easily find cross-references between strings and code. When cracking software, an attacker can often locate the key statements simply by following prompt strings. String encryption stores these revealing strings in encrypted form and decrypts them only when they are needed.
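As a rough sketch of two of the data obfuscation methods above, the C fragment below shows string encryption and variable splitting. Everything in it is hypothetical: the single-byte XOR key, the hidden string "PASS", and the names `decrypt_msg`, `split_u32`, `split_make`, `split_value`, and `split_add` are chosen only for illustration, and a single-byte XOR is far weaker than what a production tool would use.

```c
#include <stddef.h>
#include <stdint.h>

#define XOR_KEY 0x5Au   /* illustrative single-byte key (assumption) */

/* String encryption: the bytes of "PASS", each XORed with XOR_KEY.
 * The plaintext never appears in the binary's data segment. */
static const uint8_t ENC_MSG[] = { 0x0A, 0x1B, 0x09, 0x09 };

/* Decrypt the hidden string into `out` only when it is needed.
 * Returns 0 on success, -1 if `out` cannot hold the text plus the NUL. */
int decrypt_msg(char *out, size_t out_len) {
    size_t n = sizeof ENC_MSG;
    if (out_len < n + 1) return -1;
    for (size_t i = 0; i < n; ++i) out[i] = (char)(ENC_MSG[i] ^ XOR_KEY);
    out[n] = '\0';
    return 0;
}

/* Variable splitting: one logical value is stored as two parts,
 * with the mapping value = a - b (mod 2^32). */
typedef struct { uint32_t a, b; } split_u32;

/* Split `v` using an arbitrary mask chosen by the obfuscator. */
split_u32 split_make(uint32_t v, uint32_t mask) {
    split_u32 s = { v + mask, mask };
    return s;
}

/* Recover the logical value from its two parts. */
uint32_t split_value(split_u32 s) { return s.a - s.b; }

/* "value += k" rewritten as an operation on the parts:
 * a - (b - k) = (a - b) + k. */
split_u32 split_add(split_u32 s, uint32_t k) {
    s.b -= k;
    return s;
}
```

In this sketch neither `s.a` nor `s.b` alone reveals the value, and the addition never materializes the original variable. Unsigned arithmetic is used so that wrap-around is well defined in C.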

  3. Control obfuscation

Control obfuscation, also called control-flow obfuscation, changes the execution flow of a program so as to break the reverse analyst's train of thought while tracing it, and thereby protects the software. Commonly used techniques include instruction insertion, disguised conditional statements, and breakpoints. A disguised conditional statement works as follows: where a program executes A and then B in sequence, the obfuscator inserts a conditional statement between A and B. The condition evaluates to TRUE or FALSE after A runs, but B executes whichever value it produces.

Control obfuscation is implemented with opaque predicates, inlining and outlining, reordering, and similar methods. Opaque predicates rely on information asymmetry: the obfuscator knows the value of the predicate when inserting it, but a deobfuscator finds that value hard to infer, so the inserted predicate interferes with automated analysis of the program. Opaque predicates are typically used to insert dead or bogus code, or are placed in loops and branch statements to disrupt the apparent flow of execution. Inlining embeds the body of a small routine at every point where it is called. Outlining extracts a sequence of code that has no logical connection into a separate routine that can be called many times.

Reordering breaks the locality of a program. Because programmers tend to put related code together, rearranging the spatial structure of the program forces the analyst to jump between distant locations, which makes the logic harder to follow.
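The opaque predicates and bogus code described above can be illustrated with a small C sketch. It assumes a textbook predicate, the fact that the product of two consecutive integers is always even; the function names `opaque_true`, `checksum_plain`, and `checksum_obfuscated` and the arithmetic inside them are hypothetical.

```c
#include <stdint.h>

/* Opaquely true predicate: x * (x + 1) is a product of two consecutive
 * integers, so it is always even. Wrap-around modulo 2^32 does not change
 * the lowest bit, so this returns 1 for every x. */
static int opaque_true(uint32_t x) {
    return ((x * (x + 1u)) & 1u) == 0u;
}

/* Original logic, shown for comparison. */
uint32_t checksum_plain(uint32_t v) {
    return v * 31u + 7u;
}

/* Same logic guarded by the opaque predicate. `noise` can be any value
 * available at run time; the result does not depend on it. */
uint32_t checksum_obfuscated(uint32_t v, uint32_t noise) {
    if (opaque_true(noise)) {
        return v * 31u + 7u;          /* real path, always taken */
    }
    return (v ^ noise) * 17u;         /* bogus path, never executed */
}
```

A predicate this simple is only for illustration: an optimizing compiler or a deobfuscator may be able to prove it constant and remove the bogus branch. Practical obfuscators insert predicates at a later stage of compilation and pick ones that are much harder to analyze.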

  4. Preventive obfuscation

Preventive obfuscation is usually designed against specific decompilers, to keep them from decompiling the program. It exploits weaknesses in a particular decompiler or deobfuscator. Because it is very effective against the tools it targets, its design has to take the characteristics of the various decompilers into account.

The Royal Security protection scheme covers all four obfuscation categories above. For layout obfuscation, Royal Security can obfuscate function names in the native code layer and strip debugging information. For data obfuscation, it provides constant-string encryption and global-variable obfuscation. For control obfuscation, it provides control-flow flattening, bogus-branch insertion, and equivalent code transformation. For preventive obfuscation, Royal Security inserts code during the obfuscation process that targets mainstream decompilers and effectively resists their analysis. Royal Security also offers application developers different protection levels and a choice of obfuscation features, so users can tailor the protection to their own needs. Beyond code obfuscation, the scheme also provides code virtualization, anti-reverse-engineering, anti-debugging, and other protections; combining several of them can effectively improve the security of your code. You are welcome to try it.
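For readers unfamiliar with the control-flow flattening mentioned above, here is a minimal hand-written sketch in C. It illustrates the general technique only and makes no claim about how Royal Security implements it; the function names `gcd_plain` and `gcd_flattened` and the state numbering are assumptions for the example.

```c
#include <stdint.h>

/* Original code: Euclid's algorithm with an ordinary loop. */
uint32_t gcd_plain(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Flattened version: every basic block becomes a case of one switch,
 * and a state variable decides which block runs next. The loop shape
 * of the original is no longer visible in the control-flow graph. */
uint32_t gcd_flattened(uint32_t a, uint32_t b) {
    uint32_t t = 0;
    int state = 4;                                   /* entry block */
    for (;;) {
        switch (state) {
        case 2: a = b; b = t; state = 4; break;      /* second half of loop body */
        case 4: state = (b != 0) ? 7 : 9; break;     /* loop condition */
        case 7: t = a % b; state = 2; break;         /* first half of loop body */
        case 9: return a;                            /* exit block */
        default: return 0;                           /* never reached */
        }
    }
}
```

Both functions compute the same result, but in the flattened one all blocks sit at the same nesting level behind a single dispatcher, so an analyst has to track the values of `state` to rebuild the original order.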

About Tencent Security Lab

Tencent Mobile Security Lab: building on Tencent Mobile Manager products and services, the lab uses its terminal security, network security, and hardware security platforms to deliver a full range of cloud and terminal security solutions for the mobile industry. Tencent Royal Security focuses on providing comprehensive application security services to individual and enterprise mobile application developers.

Tencent Security Anti-Fraud Lab: the lab brings together top white-hat hackers from around the world and a number of Tencent's expert-level big-data specialists, and focuses on research into anti-fraud technology and security attack-and-defense systems. The Anti-Fraud Lab has the world's largest security cloud database and serves 99% of China's Internet users.

(Original content by the Tencent Royal Security team. Please credit the source when reposting.)