
This article is part 9 of jakewharton’s series on D8 and R8.

  • R8 Optimization: Class Constant Operations
  • Originally written by jakewharton
  • Translator: Antway

In the previous article, we showed how D8 and R8 operate directly on string constants at compile time. R8 can do this because the contents of string constants can be retrieved at the IR layer.

However, there is another kind of object that can be manipulated at compile time: classes. Classes are the templates for the instances we interact with at run time. Because bytecode fundamentally exists to store these templates, some basic operations on classes can be performed at compile time.

1. Log Tags

There is an ongoing debate (if you can even call it that) about the best way to define a tag string in a class. Historically, there have been two strategies: a string literal, or calling getSimpleName() on the class.

private static final String TAG = "MyClass";
// or
private static final String TAG = MyClass.class.getSimpleName();

Let’s write an example to test the pros and cons.

class MyClass {
  private static final String TAG_STRING = "MyClass";
  private static final String TAG_CLASS = MyClass.class.getSimpleName();

  public static void main(String... args) {
    Log.d(TAG_STRING, "String tag");
    Log.d(TAG_CLASS, "Class tag");
  }
}

Compiling and dexing the code above, then dumping the Dalvik bytecode, gives the following.

[000194] MyClass.<clinit>:()V
0000: const-class v0, LMyClass;
0002: invoke-virtual {v0}, Ljava/lang/Class;.getSimpleName:()Ljava/lang/String;
0005: move-result-object v0
0006: sput-object v0, LMyClass;.TAG_CLASS:Ljava/lang/String;
0008: return-void

[000120] MyClass.main:([Ljava/lang/String;)V
0000: const-string v1, "MyClass"
0002: const-string v0, "String tag"
0004: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
0007: sget-object v1, LMyClass;.TAG_CLASS:Ljava/lang/String;
0009: const-string v0, "Class tag"
000b: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
000e: return-void

In main, index 0000 loads the string constant for the first tag, while index 0007 looks up the static field and reads its value. That field is assigned in the <clinit> method, which loads the MyClass class and calls getSimpleName on it at run time. The <clinit> method runs when the class is first initialized.

The string constant is more efficient, but MyClass.class.getSimpleName() holds up better under changes such as refactoring, because the tag follows the class when it is renamed. Let's compile the same code with R8.

[000120] MyClass.main:([Ljava/lang/String;)V
0000: const-string v1, "MyClass"
0002: const-string v0, "String tag"
0004: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
0007: const-string v0, "Class tag"
0009: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
000c: return-void

Notice that after index 0004 there is no second load into v1. Its value, "MyClass", is reused for the second Log.d call.

Because the name of MyClass is known at compile time, R8 has replaced MyClass.class.getSimpleName() with the string literal "MyClass". Because the field value is now a constant, the <clinit> method becomes empty and is removed. At the usage site, the sget-object bytecode is replaced with a const-string. Finally, the two const-string bytecodes that reference the same string are deduplicated and the value is reused.

As a result, R8 ensures that no additional load takes place at run time. Because the getSimpleName() computation is trivial, D8 actually performs this optimization too!
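The replacement is only safe because both tag styles evaluate to the same string. The following is a minimal sketch of that equivalence on a plain JVM. The class name TagDemo is a placeholder invented for this illustration, and the code says nothing about what R8 or D8 actually emit.

```java
// Hypothetical demo class; the name TagDemo is an assumption made for illustration.
final class TagDemo {
    // Tag written as a string literal: a compile-time constant.
    static final String TAG_STRING = "TagDemo";
    // Tag computed from a class literal: evaluated in <clinit> unless a tool folds it.
    static final String TAG_CLASS = TagDemo.class.getSimpleName();

    // True when both tag styles produce the same string.
    static boolean tagsMatch() {
        return TAG_STRING.equals(TAG_CLASS);
    }

    public static void main(String[] args) {
        System.out.println(TAG_STRING + " / " + TAG_CLASS + " -> " + tagsMatch());
    }
}
```

Since the two values agree for a top-level class, swapping the call for the literal does not change observable behavior, which is the property the optimization depends on.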

2. Applicability

Being able to compute getSimpleName() (as well as getName() and getCanonicalName()) on a MyClass.class reference seems to be of limited use, perhaps only in this log tag case. After all, the optimization only applies to class literal references, so getClass() will not work! Once again, it is in combination with other R8 features that this optimization starts to apply more widely.
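To see why getClass() is out of reach for a compile-time replacement, consider the small sketch below. The types Base, Derived and GetClassDemo are hypothetical and exist only for this illustration. The result of getClass() depends on the runtime type of the receiver, whereas a class literal always names one fixed class.

```java
// Hypothetical types for illustration; they do not appear in the original article.
class Base {
    // Dynamic: the result depends on the runtime type of 'this'.
    String dynamicTag() {
        return getClass().getSimpleName();
    }

    // Static: a class literal always names Base, so the result is known at compile time.
    String literalTag() {
        return Base.class.getSimpleName();
    }
}

class Derived extends Base { }

final class GetClassDemo {
    public static void main(String[] args) {
        Base value = new Derived();
        System.out.println(value.dynamicTag()); // prints "Derived"
        System.out.println(value.literalTag()); // prints "Base"
    }
}
```

Only literalTag() has a single possible answer, so only that form is a candidate for being replaced by a string constant.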

Let’s look at the following example:

class Logger {
  static Logger get(Class<?> cls) {
    return new Logger(cls.getSimpleName());
  }
  private Logger(String tag) { /* ... */ }
}

class MyClass {
  private static final Logger logger = Logger.get(MyClass.class);
}

If Logger.get is inlined at all of its call sites, the call to Class.getSimpleName, which previously had a dynamic input from the method parameter, now has a static input in the form of a class reference (in this case, MyClass.class). R8 can then replace the call with a string literal, resulting in a field initializer that invokes the constructor directly (whose private modifier is also removed).

class MyClass {
  private static final Logger logger = new Logger("MyClass");
}

This relies on the get method being small enough, or being called in a way that R8 is willing to inline.
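The chain of transformations can be spelled out by hand. The sketch below is a source-level illustration only: R8 works on bytecode, and the names InlineLogger and InlineDemo are placeholders invented here. Each method represents one stage, and all three produce the same tag.

```java
// Hand-written illustration of inlining followed by constant replacement.
// InlineLogger and InlineDemo are hypothetical names, not from the original article.
final class InlineLogger {
    final String tag;

    InlineLogger(String tag) {
        this.tag = tag;
    }

    // 'cls' is a dynamic input here, so getSimpleName() cannot be folded inside this method.
    static InlineLogger get(Class<?> cls) {
        return new InlineLogger(cls.getSimpleName());
    }
}

final class InlineDemo {
    // Stage 0: the call as written in source.
    static InlineLogger original() {
        return InlineLogger.get(InlineDemo.class);
    }

    // Stage 1: the body of get() inlined; getSimpleName() now sees a class literal.
    static InlineLogger afterInlining() {
        return new InlineLogger(InlineDemo.class.getSimpleName());
    }

    // Stage 2: the call on the class literal replaced by a string constant.
    static InlineLogger afterFolding() {
        return new InlineLogger("InlineDemo");
    }

    public static void main(String[] args) {
        System.out.println(original().tag);
        System.out.println(afterInlining().tag);
        System.out.println(afterFolding().tag);
    }
}
```

The second stage is only possible once the first has happened, which is why the optimization depends on the inlining decision.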

The Kotlin language provides the ability to force functions to be inlined. It also allows a generic type parameter on an inline function to be marked as reified, which ensures that the compiler knows the concrete class at each call site at compile time. Using these features, we can ensure that our function is always inlined and that getSimpleName is always called on an explicit class reference.

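// The constructor is @PublishedApi internal rather than private because a public
// inline function is not allowed to call a private constructor.
class Logger @PublishedApi internal constructor(val tag: String) {
  // ...
}
inline fun <reified T : Any> logger() = Logger(T::class.java.simpleName)

class MyClass {
  // ...
  companion object {
    private val logger = logger<MyClass>()
  }
}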

The initializer of the logger field will always contain bytecode equivalent to MyClass.class.getSimpleName(), which R8 can then replace with a string constant.

In other Kotlin examples, type inference often allows the explicit type argument to be omitted.

inline fun <reified T> typeAndValue(value: T) = "${T::class.java.name}: $value"

fun main() {
  println(typeAndValue("hey"))
}

The example above prints "java.lang.String: hey". The compiled bytecode contains only two string constants, a StringBuilder that concatenates them, and a call to System.out.println. If that StringBuilder concatenation were also resolved at compile time, only a single string constant and the call to System.out.println would remain.
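The stages described above can be approximated in Java. This is a rough sketch under stated assumptions: the class name ConcatDemo is invented, and how the two remaining constants are split in withBuilder() is a guess, since the text does not show the bytecode. The point is only that every stage yields the same string.

```java
// Hypothetical sketch; ConcatDemo and its method names are assumptions for illustration.
final class ConcatDemo {
    // Before any optimization: the class name is looked up at run time.
    static String unoptimized(String value) {
        return String.class.getName() + ": " + value;
    }

    // Roughly the state the text describes: constants joined through a StringBuilder.
    // Which two constants remain is an assumption of this sketch.
    static String withBuilder() {
        return new StringBuilder().append("java.lang.String: ").append("hey").toString();
    }

    // The fully folded form: a single string constant.
    static String folded() {
        return "java.lang.String: hey";
    }

    public static void main(String[] args) {
        System.out.println(unoptimized("hey"));
        System.out.println(withBuilder());
        System.out.println(folded());
    }
}
```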

3. Obfuscation and optimization

Because this optimization operates on classes, it has to interact with other R8 features that may affect those classes, such as obfuscation and other optimizations.

Let’s go back to our original example.

class MyClass {
  private static final String TAG_STRING = "MyClass";
  private static final String TAG_CLASS = MyClass.class.getSimpleName();

  public static void main(String... args) {
    Log.d(TAG_STRING, "String tag");
    Log.d(TAG_CLASS, "Class tag");
  }
}

What happens if this class is obfuscated? If R8 did not replace the call to getSimpleName, the first log message would have the tag "MyClass" and the second would have a tag matching the obfuscated class name, such as "a".

For R8 to be allowed to replace getSimpleName, the replacement value must match what the runtime behavior would have been. Fortunately, because R8 is also the tool that performs obfuscation, it can wait until the class has been given its final name before doing the replacement.

[000158] a.main:([Ljava/lang/String;)V
0000: const-string v1, "MyClass"
0002: const-string v0, "String tag"
0004: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
0007: const-string v1, "a"
0009: const-string v0, "Class tag"
000b: invoke-static {v1, v0}, Landroid/util/Log;.d:(Ljava/lang/String;Ljava/lang/String;)I
000e: return-void

Notice how index 0007 now loads a separate tag value for the second log call (unlike the original R8 output) and how that value correctly reflects the obfuscated name.

Even with obfuscation disabled, R8 has other optimizations that affect class names. Although I intend to cover it in a future article, R8 will sometimes merge a superclass into a subclass if it can prove that the superclass is not needed on its own and that it has only that one subclass. When this happens, the class name string optimization will correctly reflect the subclass name, even if the original code was the equivalent of SuperType.class.getSimpleName().

4. String Data Section

The previous article discussed how performing operations such as String.substring or string concatenation at compile time can increase the size of the string section of the dex file. The optimization discussed in this article can also produce strings that did not previously exist, so it can have the same effect.

There are two scenarios to consider: obfuscation enabled and obfuscation disabled.

When obfuscation is enabled, replacing a call to getSimpleName() is unlikely to create a new string. Classes and methods are both obfuscated using the same dictionary, which by default starts with single letters. This means that for an obfuscated class named b, inserting the string "b" is almost always free, because there will also be a method or field named b. In a dex file, all strings are stored in a single pool that contains literals, class names, method names, and field names, which makes a match very likely when obfuscating.

However, replacing getSimpleName() with obfuscation disabled is never free. Although dex files have a unified string section, class names are stored as type descriptors. A descriptor includes the package name, uses / as the separator, and has the prefix L and the suffix ;. For MyClass in a hypothetical com.example package, the string data contains an entry for Lcom/example/MyClass;. Because of this format, the string "MyClass" does not already exist and has to be added.

Both getName() and getCanonicalName() also always produce a new string. Although they return fully qualified names, those names (for example com.example.MyClass) do not match the type descriptor form that already exists in the string data.
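The two scenarios can be sketched with a toy model of the string pool. Everything here is an assumption made for illustration: StringPoolDemo and its methods are invented names, a set of strings stands in for the dex string data, the obfuscated class is assumed to end up in the default package, and counting entries is only a stand-in for the real size in bytes.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of a dex string pool; all names here are hypothetical.
final class StringPoolDemo {
    // Builds the type descriptor for a top-level class, e.g. "Lcom/example/MyClass;".
    static String descriptor(String qualifiedName) {
        return "L" + qualifiedName.replace('.', '/') + ";";
    }

    // Number of entries the pool gains if 'candidate' has to be stored.
    static int costOfAdding(Set<String> pool, String candidate) {
        return pool.contains(candidate) ? 0 : 1;
    }

    // Obfuscation disabled: the class name is only present in descriptor form.
    static Set<String> plainPool() {
        Set<String> pool = new TreeSet<>();
        pool.add(descriptor("com.example.MyClass"));
        pool.add("main");
        return pool;
    }

    // Obfuscation enabled: a class and a member both happen to be renamed to b.
    static Set<String> obfuscatedPool() {
        Set<String> pool = new TreeSet<>();
        pool.add("Lb;"); // descriptor of the renamed class (default package assumed)
        pool.add("b");   // a method or field renamed to b
        return pool;
    }

    public static void main(String[] args) {
        System.out.println(costOfAdding(plainPool(), "MyClass"));             // getSimpleName()
        System.out.println(costOfAdding(plainPool(), "com.example.MyClass")); // getName()
        System.out.println(costOfAdding(obfuscatedPool(), "b"));              // obfuscated simple name
    }
}
```

In this model the first two lookups each add an entry, because only the descriptor Lcom/example/MyClass; is present, while the obfuscated lookup finds an existing "b" and adds nothing.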

Because this optimization can create a large number of strings, it is currently enabled only for top-level types. It works for MyClass in the examples above, but not for nested or anonymous classes. Whether the optimization should also be skipped within individual methods, to avoid increasing the dex file size, is still being investigated.

5. Summary

In the next article, we’ll discuss another optimization of R8.