Author: Shao Mai

Preface

The author previously shared an article on static checking for Android privacy compliance on the Cloud Music front-end team's official account: Android privacy compliance static check

The previous article found privacy method calls inside the app by decompiling the APK and scanning the result. That approach leaves two problems:

  • It cannot detect privacy method calls that may be made from so files.
  • When a full scan finds a privacy method call somewhere, we still do not know where the actual call entry point is.

Calls in so files

Sometimes privacy methods are called from native code, which executes Java-layer code through JNI reflection. Such calls cannot be found by scanning the Java-layer files, so so files need special treatment.

Let's sort out our requirements. Generally speaking, the app's business side only needs to know whether certain privacy methods may be called through so files, and in which so file the call might occur. The rest can be left to the so developers to investigate.

With the requirements clear, how do we know whether a method is called from a so file? In Java, if a method is called by reflection, the class name and method name must be stored as string constants in the class file's constant pool. Is there a similar form of storage in so files?

The answer is yes. The strings of a C program on Linux may live in the following two sections:

  • .text: the code segment, which stores the code the program executes. Its size is fixed before the program runs and it is usually read-only, although some architectures allow it to be writable so that a program can modify itself. It may also contain read-only constants, such as string constants.
  • .rodata: also called the constant area, it stores constant data (ro stands for read-only). String literals in C, including those introduced through #define macros, are stored here.

We can use the strings command on Linux to list the strings contained in a so file:

strings xx.so

We extract the strings of every so file in the APK and flag a so file as containing a suspicious call if any of them matches a configured privacy method name. The check process is as follows:
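
As a rough illustration of this check (not the author's actual tool), the sketch below opens an APK as a zip archive, pulls printable ASCII runs out of each so entry in roughly the way the strings command does, and reports which configured names appear. The names PRIVACY_METHODS, printable_strings and scan_apk_so_files are hypothetical, and the two method names in the configuration are only examples.

```python
import re
import zipfile

# Hypothetical configuration: method names we treat as privacy-sensitive.
PRIVACY_METHODS = ["getDeviceId", "getMacAddress"]


def printable_strings(data, min_len=4):
    """Yield runs of printable ASCII bytes, roughly what `strings` prints."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    for match in re.finditer(pattern, data):
        yield match.group().decode("ascii")


def scan_apk_so_files(apk_path, method_names=PRIVACY_METHODS):
    """Map each .so entry in the APK to the configured names found in it."""
    suspicious = {}
    with zipfile.ZipFile(apk_path) as apk:
        for entry in apk.namelist():
            if not entry.endswith(".so"):
                continue  # only native libraries are of interest here
            strings = list(printable_strings(apk.read(entry)))
            found = [m for m in method_names if any(m in s for s in strings)]
            if found:
                suspicious[entry] = found
    return suspicious
```

A match only means the name occurs as a string in the library, so each flagged so file is a candidate for its developers to review rather than proof of a call.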

The output of the check is shown in the following demo:

Method call chain analysis

Often we do not know where an Android API is being called from, so we can only deal with it at run time, for example by replacing its implementation through a hook. But run-time checks do not cover every scenario, so statically analyzing the APK's method call chains is necessary. At the very least we can see which class a sensitive method is called from, so that the call can be traced and attributed.

Building on the technical solution shared in the previous article, the author goes one step further and analyzes method call chains. As described there, decompiling the APK generates smali files that contain the relevant method call information. We can use this information to work out the method call relationships of the entire app.

Method collection

At the beginning of the smali file, information about the current class is marked:

.class public final Lokhttp3/OkHttp;
.super Ljava/lang/Object;

From this we get the modifiers of the current class and its full type descriptor.

The .method directive in smali describes the methods declared in the class:

.method constructor <init>(Lokhttp3/Call$Factory;Lokhttp3/HttpUrl;Ljava/util/List;Ljava/util/List;Ljava/util/concurrent/Executor;Z)V
.method private validateServiceInterface(Ljava/lang/Class;)V
.method public baseUrl()Lokhttp3/HttpUrl;

Taking Retrofit as an example, these method declarations from Retrofit.smali tell us:

  • The constructor takes a Call.Factory, an HttpUrl, two Lists, an Executor and a boolean
  • The private validateServiceInterface method takes a Class and returns void
  • The public baseUrl method takes no arguments and returns an HttpUrl
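
The collection step can be sketched as follows, assuming each directive sits on one line with its tokens separated by whitespace, as in the excerpts above. The function name collect_methods and the dictionary keys are hypothetical choices for illustration, not part of any smali tooling.

```python
def collect_methods(smali_text):
    """Return (class_info, methods) parsed from the text of one smali file."""
    class_info = None
    methods = []
    for raw_line in smali_text.splitlines():
        tokens = raw_line.split()
        if not tokens:
            continue
        if tokens[0] == ".class":
            # The last token is the type descriptor, the rest are modifiers.
            class_info = {"descriptor": tokens[-1], "modifiers": tokens[1:-1]}
        elif tokens[0] == ".method":
            # The last token has the form name(params)return.
            name, _, rest = tokens[-1].partition("(")
            params, _, return_type = rest.partition(")")
            methods.append({
                "name": name,
                "modifiers": tokens[1:-1],
                "params": params,       # raw parameter descriptors
                "return": return_type,  # raw return descriptor
            })
    return class_info, methods
```

The parameter and return descriptors are kept raw here; turning them into readable type names is a separate step.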

With the information above we can collect every method in an app. Each method needs a unique identity, which we build from the following fields:

  • The class in which the method is defined, as the full package name + class name

  • The fields of the method signature, including:

    • The method name
    • The parameters passed in

In smali, method descriptors follow the JVM descriptor format, so we need to parse the descriptor and store each of our fields for output. The descriptor rules map symbols to types. The mapping for basic types is:

Symbol  Type
V       void
Z       boolean
B       byte
S       short
C       char
I       int
J       long
F       float
D       double

Object types are written as the full package and class name, prefixed with L, with / as the package separator, and terminated by a semicolon. For example, String is:

Ljava/lang/String;
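
A minimal sketch of such a descriptor parser is shown below. The names PRIMITIVES, parse_type and parse_method_descriptor are hypothetical. The sketch also handles array types, which the JVM descriptor format marks with a leading [ and which the text above does not cover.

```python
# Symbol-to-type mapping for the basic types of the JVM descriptor format.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def parse_type(desc, pos=0):
    """Parse one type descriptor at pos; return (readable_name, next_pos)."""
    if desc[pos] == "[":  # array: '[' followed by the element type
        element, end = parse_type(desc, pos + 1)
        return element + "[]", end
    if desc[pos] == "L":  # object: L<package/Class>;
        end = desc.index(";", pos)
        return desc[pos + 1:end].replace("/", "."), end + 1
    return PRIMITIVES[desc[pos]], pos + 1


def parse_method_descriptor(descriptor):
    """Split '(<params>)<return>' into (list_of_param_types, return_type)."""
    close = descriptor.index(")")
    params, pos = [], 1  # position 0 is the opening parenthesis
    while pos < close:
        name, pos = parse_type(descriptor, pos)
        params.append(name)
    return_type, _ = parse_type(descriptor, close + 1)
    return params, return_type
```

For example, parse_method_descriptor("(Ljava/lang/Class;)V") yields the parameter list ["java.lang.Class"] and the return type "void".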

Method relationship establishment

Once we have collected all the methods, we need to know which methods each method calls and which methods call it. In smali, the invoke- instructions tell us which other methods are called inside a method.

The invoke- instructions include:

  • invoke-direct: calls a method directly
  • invoke-static: calls a static method
  • invoke-virtual: calls a virtual method
  • invoke-super: calls the virtual method of the parent class directly
  • invoke-interface: calls a method of an interface

Apart from invoke-interface, where the object actually being called is only confirmed at run time, the text that follows an invoke- instruction tells us which method the current method calls:

invoke-virtual {v2, p2, v1}, Ljava/util/HashMap;->put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;

The latter part of an invoke- instruction names the class and the method being invoked, separated by ->. By parsing this part of the instruction, we get complete information about the called method.
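
One way to parse such a line is a regular expression, as in the sketch below. The names INVOKE_RE and parse_invoke and the returned keys are hypothetical. The pattern assumes the single-line layout shown above and is not meant to cover every Dalvik instruction form.

```python
import re

INVOKE_RE = re.compile(
    r"^\s*(?P<kind>invoke-[\w/-]+)\s+"   # e.g. invoke-virtual, invoke-static/range
    r"\{(?P<registers>[^}]*)\},\s*"      # argument registers
    r"(?P<owner>\S+?)->"                 # class that declares the called method
    r"(?P<name>[^(]+)"                   # method name
    r"(?P<descriptor>\(.*\)\S+)\s*$"     # (params)return
)


def parse_invoke(line):
    """Return the called method described by one invoke- line, or None."""
    match = INVOKE_RE.match(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "owner": match.group("owner"),
        "name": match.group("name"),
        "descriptor": match.group("descriptor"),
    }
```

Lines that are not invoke- instructions simply return None, so the function can be applied to every line of a method body.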

We can collect the call relationships from all decompiled smali files of the app. During collection every method is stored, and each stored method holds its own method information together with a list of the methods that call it:

  • calleds: the list of methods that call this method

When a method call is scanned, we add the calling method to the calleds list of the method being called (and, if a callers list is also kept, the called method to the callers list of the calling method). The resulting method relationships are shown in the figure below:

We end up with a graph shaped like a multi-way tree, in which the privacy methods whose call chains we need to check can be regarded as the leaf nodes.
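
A minimal sketch of this structure, assuming the call pairs have already been extracted from the invoke- instructions, could look like this. MethodNode, build_graph and the string method ids are hypothetical; only the calleds field name is taken from the text above.

```python
class MethodNode:
    """One method in the app, identified by a string such as 'Lcom/Foo;->bar()V'."""

    def __init__(self, method_id):
        self.method_id = method_id
        self.calleds = []  # methods that call this method


def build_graph(call_pairs):
    """Build nodes from (caller_id, callee_id) pairs found while scanning."""
    nodes = {}
    for caller_id, callee_id in call_pairs:
        for method_id in (caller_id, callee_id):
            if method_id not in nodes:
                nodes[method_id] = MethodNode(method_id)
        caller, callee = nodes[caller_id], nodes[callee_id]
        if caller not in callee.calleds:  # record each call edge only once
            callee.calleds.append(caller)
    return nodes
```

Keying the nodes by a full method id means the same method reached from different smali files maps to a single node.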

Of course, we can also add a callers array holding the list of methods that each method calls, which gives us a tree structure whose nodes are linked in both directions:

In this bidirectional structure we can analyze the call chain of a method starting from the method itself. We can also start from the top and analyze every possible call chain below certain entry points. For example, when we suspect that a page makes improper calls, we can start from the relevant Activity classes and work our way down to see whether privacy methods are called.
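
The top-down direction can be sketched as a depth-first search over the callers lists. In the sketch below the graph is simplified to a dictionary from a method id to the ids of the methods it calls. The function name find_chains and all method ids are hypothetical.

```python
def find_chains(callers, entry, targets):
    """Return every call chain from entry down to a method in targets.

    callers maps a method id to the ids of the methods it calls.
    """
    chains = []

    def dfs(method, path):
        path = path + [method]  # copy, so sibling branches do not interfere
        if method in targets:
            chains.append(path)
            return
        for callee in callers.get(method, []):
            if callee not in path:  # skip methods already on the chain (cycle)
                dfs(callee, path)

    dfs(entry, [])
    return chains
```

Starting from an Activity method as entry and passing the configured privacy methods as targets gives the chains that lead from that page to a privacy call.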

Call chain traversal

Once the method call relationships are established, we need to traverse all call chains and output them to the user. This part is simpler: a depth-first traversal finds every possible path:

There is a special case here. During recursion, A may be called by B while B is also called by A. In the current data structure this shows up as a cycle in the graph, so we need to check whether a cycle exists.

When the current call chain contains a repeated node, we can conclude that there is a cycle. At that point we can simply end the recursion on that chain; this does not affect our later compliance analysis of the call chain.

This part of the logic can be expressed in pseudocode:

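// A method node; calleds holds the methods that call this method.
class MethodNode(val name: String) {
    val calleds = mutableListOf<MethodNode>()
}

// Returns every call chain that ends at the given method, outermost caller first.
fun traversal(method: MethodNode): List<List<MethodNode>> {
    val paths = mutableListOf<List<MethodNode>>()
    dfs(method, emptyList(), paths)
    return paths
}

fun dfs(method: MethodNode, path: List<MethodNode>, paths: MutableList<List<MethodNode>>) {
    // Prepend the current method so the chain reads from caller to callee.
    val newPath = listOf(method) + path
    if (method.calleds.isEmpty()) {
        // Nobody calls this method: the chain is complete.
        paths.add(newPath)
        return
    }
    for (called in method.calleds) {
        if (called in newPath) {
            // The caller is already on the chain, so there is a cycle: stop here.
            if (newPath !in paths) paths.add(newPath)
        } else {
            dfs(called, newPath, paths)
        }
    }
}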

The final effect of call chain analysis is shown as follows:

Conclusion

That is all we have to share on static checking of Android privacy compliance, but there is much more that can be done in this area. Static checks help us locate and check possible problems. Many run-time monitoring options remain to be explored, and the two approaches work better together.

This article was published by the NetEase Cloud Music front-end team. Reproduction in any form without authorization is prohibited. We recruit front-end, iOS and Android engineers all year round. If you are ready for a change and you like Cloud Music, join us at grp.music-fe(at)corp.netease.com!