Zyq8709 · 2015/10/03 10:03

0x00 Preface


I had been studying the source code of the two Android runtimes just as Android app-hardening (packing) services were taking off, which gave me the idea of building a modified runtime for unpacking. The result is DexHunter (source: github.com/zyq8709/Dex…). In this article I will share some of my simple ideas for your reference and discussion.

0x02 Related mechanisms


First, let’s take a look at some of the mechanisms associated with Android runtime and see what we can do.

First of all, unpacking requires an understanding of the Dex file format. It is documented clearly in the official Android documentation, so I will only mention it briefly here. The overall structure is shown in Figure 1:

Figure 1 Dex file structure

The header holds an offset that points to the start of each section. class_defs and data are the two sections we care about most.
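As a rough illustration of how those header offsets can be read, here is a minimal Python sketch. The field order follows the header_item layout in the official dex format documentation; the function name parse_dex_header and the dictionary it returns are illustrative choices, not part of DexHunter.

```python
import struct

# header_item: 8-byte magic, uint32 checksum, 20-byte SHA-1 signature,
# followed by 20 little-endian uint32 fields (0x70 bytes in total).
HEADER_FORMAT = "<8sI20s20I"
FIELD_NAMES = (
    "file_size", "header_size", "endian_tag",
    "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off",
    "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off",
    "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off",
    "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)


def parse_dex_header(buf):
    """Return the size/offset fields of a dex header as a dict (hypothetical helper)."""
    if len(buf) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("buffer is shorter than a dex header")
    magic, checksum, signature, *values = struct.unpack_from(HEADER_FORMAT, buf, 0)
    if not magic.startswith(b"dex\n"):
        raise ValueError("not a dex file")
    header = dict(zip(FIELD_NAMES, values))
    header["magic"] = magic
    header["checksum"] = checksum
    return header
```

With a dump in hand, parse_dex_header(data)["class_defs_off"] and ["data_off"] give the start of the two sections discussed here.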

class_defs lists all classes, each described by a class_def_item. Figure 2 expands the structure of a class_def_item:

Figure 2 class_def_item structure

Each class_def_item refers to a class_data_item, which contains the data of one class. Each method is described by an encoded_method structure, which in turn refers to a code_item, where all the instructions of that method are stored.
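To make the chain of references more concrete, the following Python sketch walks the class_defs section and exposes the class_data_off of each entry. The eight uint32 fields follow the class_def_item layout in the official dex format documentation; the names ClassDefItem and iter_class_defs are illustrative only.

```python
import struct
from collections import namedtuple

# class_def_item: eight little-endian uint32 fields, 32 bytes per entry.
ClassDefItem = namedtuple(
    "ClassDefItem",
    "class_idx access_flags superclass_idx interfaces_off "
    "source_file_idx annotations_off class_data_off static_values_off",
)
CLASS_DEF_SIZE = 32


def iter_class_defs(buf, class_defs_off, class_defs_size):
    """Yield one ClassDefItem per entry of the class_defs section."""
    for index in range(class_defs_size):
        offset = class_defs_off + index * CLASS_DEF_SIZE
        yield ClassDefItem(*struct.unpack_from("<8I", buf, offset))
```

A class_data_off of 0 means the class has no class data (for example, a marker interface); any other value is the file offset of its class_data_item.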

Under ART, the dex file is compiled at install time into an OAT file, which is actually an ELF file. Figure 3 shows its structure:

Figure 3 OAT file structure

As the figure shows, the region that oatdata points to contains the original Dex file, which is our target. oatexec points to the compiled ARM instructions, but those are of no use to us for now.
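One crude way to see that the original dex really is embedded in such a file is to scan the raw bytes for the dex magic and read file_size from the header that follows. This is only a heuristic sketch for offline inspection; it is not how the runtime or DexHunter locates the dex, and the function name find_embedded_dex is hypothetical.

```python
import struct

DEX_HEADER_SIZE = 0x70
FILE_SIZE_OFFSET = 32  # file_size follows magic (8), checksum (4), signature (20)


def find_embedded_dex(blob):
    """Return (offset, file_size) pairs for plausible dex headers inside blob."""
    hits = []
    pos = blob.find(b"dex\n")
    while pos != -1:
        has_room = pos + DEX_HEADER_SIZE <= len(blob)
        # The 8-byte magic ends with a NUL byte, e.g. b"dex\n035\x00".
        if has_room and blob[pos + 7:pos + 8] == b"\x00":
            (file_size,) = struct.unpack_from("<I", blob, pos + FILE_SIZE_OFFSET)
            if DEX_HEADER_SIZE <= file_size <= len(blob) - pos:
                hits.append((pos, file_size))
        pos = blob.find(b"dex\n", pos + 1)
    return hits
```

Each hit can then be sliced out as blob[offset:offset + file_size] for further parsing. For a packed app the slice may of course still contain invalid or erased data, which is exactly the problem the rest of this article deals with.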

0x03 Four Opportunities


To unpack, we first need to establish one concept: timing. For packers that are not virtualization-based, dumping from memory is the most effective and most uniform technique, so we must find a moment at which the data in memory is exactly correct.

In Android, there are four opportunities:

  1. Open the Dex file

    The dex file in the APK is extracted and cached, so the file that is finally opened is the odex or OAT file.

  2. Load a class

    The runtime reads each class stored in the Dex and builds a Class object from it. That object contains all the members of the class, so that the class can be used. Figure 4 shows the structure of the Class object under ART and DVM:

    Figure 4. Structure of the Class

  3. Initialize the Class

    If a class has a static block, it is compiled into the class’s initializer, the <clinit> method, which runs when the class is actually about to be used. Naturally, a packer can use this hook to do many things.

  4. Call a concrete method

    Needless to say, the concrete code instructions are located through the generated Class object and then executed.

0x04 Two loads


Okay, so how do we do that? Very simply, we start with the loading of the class.

In general, there are two ways to load a class. One is explicit loading, mainly used for reflection, which actively loads a class by calling Class.forName() or ClassLoader.loadClass(). The other is implicit loading, which mainly happens when the first instance of a class is created or when a static member is accessed before the class has been loaded. Behind each of these operations there is a runtime function that does the actual work.

In ART:

Explicit loading:

ClassLoader.loadClass corresponds to DexFile_defineClassNative

Class.forName corresponds to Class_classForName

Implicit loading:

Corresponds to artAllocObjectFromCode

Figure 5 illustrates this relationship:

Figure 5 Implementation in ART

In DVM:

Explicit loading:

ClassLoader.loadClass corresponds to Dalvik_dalvik_system_DexFile_defineClassNative

Class.forName corresponds to Dalvik_java_lang_Class_classForName

Implicit loading:

Corresponds to dvmResolveClass

Figure 6 shows the implementation in DVM:

Figure 6 Implementation in DVM

0x05 Making the modifications


We have clearly found the key points: DefineClass in ART and Dalvik_dalvik_system_DexFile_defineClassNative in DVM. This is where we start and where the main changes are made. Put simply, the idea is to actively load and initialize all classes in one pass.

There are several principles implicit in this:

  • When a class is loaded, the corresponding part of the dex must be valid;
  • When a class is initialized, the contents of the dex, including the generated Class object, may be modified;
  • A code_item only has to be valid at the moment its method is executed.

Figure 7 shows a workflow of DexHunter:

Figure 7. DexHunter principle

Here are the steps:

(1) Locate the memory

For each of the entry functions mentioned earlier, there is an argument that represents the file being manipulated.

In ART, this parameter is a DexFile object. Its location_ member is a string that can simply be understood as the path of the file. In DVM, the parameter is a DexOrJar, and the corresponding string member is fileName. Once we specify a target string, we can pick out the dex file we want from the many that are loaded, and from either object it is easy to obtain the start address and length of the file in memory.
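The selection logic can be pictured with the following Python sketch. LoadedDex is a hypothetical stand-in for the runtime’s DexFile or DexOrJar object, and the substring match is an assumption made for illustration; the text above only says that a target string is compared against location_ or fileName.

```python
from collections import namedtuple

# Hypothetical stand-in for ART's DexFile (location_) or DVM's DexOrJar (fileName).
LoadedDex = namedtuple("LoadedDex", "location begin size")


def locate_target(loaded_files, target):
    """Return (begin, size) of the first loaded dex whose path contains target."""
    for dex in loaded_files:
        if target in dex.location:
            return dex.begin, dex.size
    return None
```

Everything else the runtime loads (framework jars, other apps’ code) is skipped because its path does not contain the target string.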

(2) Active loading and initialization

This step traverses every class_def_item in the class_defs section of the dex file, loading and initializing the classes one by one. In ART, we use the FindClass function to load a class and EnsureInitialized to initialize it. In DVM, we load with dvmDefineClass and initialize with dvmIsClassInitialized and dvmInitClass.
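The control flow of this step looks roughly like the Python sketch below. The two callables are hypothetical placeholders for the runtime functions named above (FindClass and EnsureInitialized in ART, dvmDefineClass and dvmInitClass in DVM); continuing past a class that fails to load, rather than aborting, is an assumption of this sketch.

```python
def load_and_init_all(descriptors, load_class, ensure_initialized):
    """Load and initialize every class; return the descriptors that failed."""
    failed = []
    for descriptor in descriptors:
        try:
            klass = load_class(descriptor)
            if klass is None:
                failed.append(descriptor)
                continue
            ensure_initialized(klass)
        except Exception:
            # Assumption: one broken class should not stop the whole pass.
            failed.append(descriptor)
    return failed
```

After this loop has run, every class that could be loaded has passed through both the load and the initialize opportunities, so whatever the packer restores at those moments is now present in memory.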

(3) Dump and repair automatically

Finally, it is time to actually grab the dex. We divide the dex into three parts:

  • Part 1: everything before class_defs
  • Part 2: the class_defs section
  • Part 3: everything after class_defs

We save Part 1 in a file named part1 and Part 3 in a file named data. Part 2 is set aside for the moment.

Now we have to parse the class_defs section. I will not go through the code in full. Put simply, we mimic what Android itself does and decode each class_data_item (which is LEB128-encoded) into an in-memory object that we can repair.
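For readers unfamiliar with the encoding, here is a minimal Python sketch of decoding a class_data_item. The layout (four ULEB128 counts, then encoded_field and encoded_method entries whose indices are stored as differences) follows the official dex format documentation; the function names and the dictionary shape are illustrative and are not DexHunter’s own data structures.

```python
def read_uleb128(buf, pos):
    """Decode one unsigned LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_class_data(buf, pos):
    """Decode a class_data_item starting at pos; return (item, next_position)."""
    counts = []
    for _ in range(4):
        value, pos = read_uleb128(buf, pos)
        counts.append(value)
    static_n, instance_n, direct_n, virtual_n = counts

    item = {}
    for name, count in (("static_fields", static_n), ("instance_fields", instance_n)):
        index, entries = 0, []
        for _ in range(count):
            diff, pos = read_uleb128(buf, pos)
            flags, pos = read_uleb128(buf, pos)
            index += diff  # indices are stored as differences from the previous entry
            entries.append({"field_idx": index, "access_flags": flags})
        item[name] = entries

    for name, count in (("direct_methods", direct_n), ("virtual_methods", virtual_n)):
        index, entries = 0, []
        for _ in range(count):
            diff, pos = read_uleb128(buf, pos)
            flags, pos = read_uleb128(buf, pos)
            code_off, pos = read_uleb128(buf, pos)
            index += diff
            entries.append({"method_idx": index, "access_flags": flags, "code_off": code_off})
        item[name] = entries
    return item, pos
```

Because every value is variable-length, a class_data_item cannot be patched in place when an offset grows; it has to be decoded, modified, and re-encoded, which is why the repair works on in-memory objects.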

The following checks decide whether a repair is needed:

Check whether class_data_off in the class_def_item falls within the memory range of the dex file. If it falls outside, the class_data_item has to be placed at the tail of the dex; modify the class_def_item accordingly and save it.

Compare the parsed accessFlag and codeOff with the accessFlag and codeOff of the method object generated at run time. If they are inconsistent, the run-time values prevail and are written back and saved.

Likewise, check whether code_item_off is out of range. If it is, fetch the code_item, append it to the tail as well, and modify the contents of the class_def_item and save again.

Of course, the so-called “tail” only means that the offset values are counted from the end of the dex; the actual content is first stored in a file named extra. The modified class_defs section is saved in a file named classdef.
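The range check and the “tail” bookkeeping described above can be sketched in Python as follows. The names Tail, fix_offset, and reconcile_method, the fetch callback that reads the out-of-range item from memory, and the 4-byte alignment are all assumptions made for illustration; only the idea (offsets counted from the end of the dex, content collected separately for the extra file) comes from the text.

```python
class Tail:
    """Collects relocated items; their offsets are counted from the end of the dex."""

    def __init__(self, dex_size):
        self.dex_size = dex_size
        self.extra = bytearray()  # would later be written to the "extra" file

    def append(self, payload, align=4):
        # Pad so the new item starts at an aligned file offset (assumed alignment).
        pad = -(self.dex_size + len(self.extra)) % align
        self.extra += b"\x00" * pad
        new_off = self.dex_size + len(self.extra)
        self.extra += payload
        return new_off


def fix_offset(off, dex_size, fetch, tail):
    """Return off unchanged if it lies inside the dex, else relocate the item."""
    if off == 0 or off < dex_size:  # 0 means "no item"
        return off
    return tail.append(fetch(off))


def reconcile_method(parsed, runtime):
    """Overwrite parsed access_flags/code_off with run-time values; report changes."""
    changed = False
    for key in ("access_flags", "code_off"):
        if parsed[key] != runtime[key]:
            parsed[key] = runtime[key]
            changed = True
    return changed
```

The same fix_offset call serves both class_data_off and code_item_off, and whenever it returns a new value the structure that holds the offset has to be saved again.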

Finally, we put the four files back together and obtain the original dex or odex.
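A minimal Python sketch of that last step is shown below. The four file names come from the text; the concatenation order (part1, classdef, data, extra) is inferred from the layout described above, and the output name whole.dex is hypothetical. The sketch only concatenates bytes and does not touch the header.

```python
from pathlib import Path

# Order inferred from the layout: before class_defs, class_defs, after class_defs, tail.
PART_NAMES = ("part1", "classdef", "data", "extra")


def reassemble(directory, out_name="whole.dex"):
    """Concatenate the four dumped files into one output file and return its path."""
    directory = Path(directory)
    out_path = directory / out_name
    with open(out_path, "wb") as out_file:
        for name in PART_NAMES:
            out_file.write((directory / name).read_bytes())
    return out_path
```

For example, reassemble("/path/to/dump") would write whole.dex next to the four parts, where the directory path is again a placeholder.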

0x06 Interesting observations


Finally, some interesting things we’ve seen.

360 essentially keeps the original dex encrypted inside a .so file and decrypts it before loading.

Ali (Alibaba) moves some class_data_items and code_items out of the dex and fixes up the references between them when the dex is opened. In addition, some annotation_off values are invalid in order to prevent static parsing.

Baidu moves some class_data_items out, much like Ali, and erases the header of the dex file. It also wraps individual methods so that their code is restored just before the call and erased again after it. That code can be recovered by monitoring DoInvoke (ART) and dvmMterp_invokeMethod (DVM).

Bangcle and Ijiami work much like 360. Bangcle hooks a number of libc functions such as read, write, and mmap to prevent the relevant dex region from being read. Ijiami’s string changes, but only the file name changes; the directory stays the same.

Tencent creates a fake class_data_item for each protected class or method, which does not contain the protected content. The real class_data_item is released and attached at run time, but the code_item is always present in the dex file. Tencent also fills annotation_off and debug_info_off with invalid data to interfere with decompilation.

0x07 References


  • https://source.android.com/devices/tech/dalvik/dex-format.html
  • /libcore/libart/src/main/java/java/lang/ClassLoader.java
  • /libcore/libdvm/src/main/java/java/lang/ClassLoader.java
  • /libcore/dalvik/src/main/java/dalvik/system/DexClassLoader.java
  • /libcore/dalvik/src/main/java/dalvik/system/PathClassLoader.java
  • https://github.com/anestisb/oatdump_plus#dalvik-opcode-changes-in-art