Jake Wharton’s two practical lessons on code generation:

1. The economics of code generation (this article).

2. Optimizing bytecode by manipulating source code (this applies to Android's latest ViewBinding, a library that also replaces ButterKnife).

Original address: jakewharton.com/the-economi…

Among the many things I learned from Jesse Wilson was the phrase "the economics of generated code." It expresses the idea that what we value when generating code is different from what we value when writing code by hand.

A code generator is written only once, but the code it generates appears many times. Therefore, any investment in making the generator produce more efficient code pays off quickly. This means generating as little code and allocating as few objects as possible. I'd like to expand on this with two concrete, real-world examples.

Additional method references

Although not as problematic as it once was, the method reference count still deserves attention. This is especially true for generated code, where a small change in the generator can raise or lower the count by hundreds or thousands of references.

A generated class is usually a subtype of a class in a runtime library. In addition to enabling polymorphism, this lets common functionality and behavior be consolidated. In the case of JSON model classes, you may want to preserve unknown keys and values encountered during parsing. Each generated class could maintain its own Map<String, ?> of unknown pairs, but consolidating it into the library's base class is ideal.

abstract class JsonModel {
    private final Map<String, ?> unknownPairs;

    // ...

    public final Map<String, ?> getUnknownPairs() {
        return unknownPairs;
    }
}

Not having a getUnknownPairs() method in each generated class obviously reduces the number of declared methods. However, because the count covers referenced methods and not just declared ones, reducing the methods referenced by generated code also has an impact.

Each generated class extends JsonModel and implements toString(), which outputs its own fields as well as the getUnknownPairs() map.

final class UserModel extends JsonModel {
    private final String name;
    private final String email;

    // ...

    @Override public String toString() {
        return "UserModel{"
            + "name=" + name + ","
            + "email=" + email + ","
            + "unknownPairs=" + getUnknownPairs()
            + "}";
    }
}

After compiling, dexing, and dumping the Dalvik bytecode of the above class with dexdump, the way toString calls the getUnknownPairs() method is surprising.

[00024c] UserModel.toString:()Ljava/lang/String;
0000: iget-object v0, v5, LUserModel;.name:Ljava/lang/String;
0002: iget-object v1, v5, LUserModel;.email:Ljava/lang/String;
0004: invoke-virtual {v5}, LUserModel;.getUnknownPairs:()Ljava/util/Map;
0007: move-result-object v2

Although the getUnknownPairs() method is placed in the JsonModel superclass, a reference to the method is generated for each generated class as if the method had been defined on the generated type. Moving this method does not actually decrease the count!

The API layer of a medium-sized application might have 100 model classes. If each generated class contained four calls to methods defined in the superclass, 400 method references would be created for no reason.

Changing the generated code to call through super explicitly makes every one of these method references point directly at the superclass method.

 @Override public String toString() {
   return "UserModel{"
       + "name=" + name + ","
       + "email=" + email + ","
-      + "unknownPairs=" + getUnknownPairs()
+      + "unknownPairs=" + super.getUnknownPairs()
       + "}";
 }

 [00024c] UserModel.toString:()Ljava/lang/String;
 0000: iget-object v0, v5, LUserModel;.name:Ljava/lang/String;
 0002: iget-object v1, v5, LUserModel;.email:Ljava/lang/String;
-0004: invoke-virtual {v5}, LUserModel;.getUnknownPairs:()Ljava/util/Map;
+0004: invoke-virtual {v5}, LJsonModel;.getUnknownPairs:()Ljava/util/Map;
 0007: move-result-object v2

These 400 additional references now collapse into just 1! This is not a change you would normally make by hand, but because we control both the base class and the generated class, it is safe and significantly reduces method references.
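The deduplication at work here can be sketched with a small model. This is only an illustration, not dex tooling: it assumes a method reference is identified by its owner type and method name, models a single inherited method, and uses hypothetical generated class names (Model0, Model1, and so on).

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class MethodRefCount {
  /**
   * Counts unique method references for classCount generated classes that each call
   * getUnknownPairs(). A reference is modeled as "OwnerType.methodName" and identical
   * entries are stored once, which is the assumed deduplication behavior.
   */
  static int countReferences(int classCount, boolean callThroughSuper) {
    Set<String> methodRefs = new LinkedHashSet<>();
    for (int i = 0; i < classCount; i++) {
      String generatedType = "Model" + i; // hypothetical generated class name
      // Without super, the reference is recorded against the generated type itself.
      String owner = callThroughSuper ? "JsonModel" : generatedType;
      methodRefs.add(owner + ".getUnknownPairs");
    }
    return methodRefs.size();
  }

  public static void main(String[] args) {
    // One reference per generated class when calling through the subtype.
    System.out.println(countReferences(100, false));
    // A single shared reference when calling through super.
    System.out.println(countReferences(100, true));
  }
}
```

In this model every additional inherited method called without super adds another reference per generated class, which is why the savings grow with the number of models.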

One caveat is that optimizing your application with R8 makes this method reference change automatically. But not every user of the code generator runs an optimizer. Making this small change in the generator ensures that everyone benefits.

String repetition

Strings are not a given in generated code, but they occur often enough that their impact deserves some thought. In my experience, strings in generated code usually fall into two categories: keys for some type of serialization, or error messages for exceptions. There is little we can do about the former, but the latter is interesting because these strings live in code paths that are expected to be used rarely.

For example, consider a code generator that binds the Android views in a layout to the fields of a class. A view is required when it appears in every configuration of the layout, but we still verify its presence at runtime with a null check.

public final class MainBinding {
  // ...

  public static MainBinding bind(View root) {
    TextView name = root.findViewById(R.id.name);
    if (name == null) {
      throw new NullPointerException("View 'name' required but not found");
    }
    TextView email = root.findViewById(R.id.email);
    if (email == null) {
      throw new NullPointerException("View 'email' required but not found");
    }
    return new MainBinding(root, name, email);
  }
}

If you compile, dex, and dump the contents of the .dex file with baksmali, you can see these strings in the string data section of the output.

                           |[20] string_data_item
00044f: 22                 |  utf16_size = 34
000450: 5669 6577 2027 6e61|  data = "View \'name\' required but not found"
000458: 6d65 2720 7265 7175|
000460: 6972 6564 2062 7574|
000468: 206e 6f74 2066 6f75|
000470: 6e64 00            |
                           |[21] string_data_item
000473: 23                 |  utf16_size = 35
000474: 5669 6577 2027 656d|  data = "View \'email\' required but not found"
00047c: 6169 6c27 2072 6571|
000484: 7569 7265 6420 6275|
00048c: 7420 6e6f 7420 666f|
000494: 756e 6400          |

Encoded in the dex file format, these strings require 36 and 37 bytes, respectively (the two extra bytes hold the encoded length and the trailing null byte).

In a real application, we can quickly estimate what these strings cost. Each string takes 32 bytes plus the length of the view ID, which we can assume averages 12 characters. A medium-sized application has 50 layouts with 10 views per layout. So 50 * 10 * (32 + 12) gives a total cost of about 22KB. That is not a lot of space, but considering we never expect to see these strings unless there is a programming bug, it is a lot to pay.
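The byte counts above can be reproduced with a short sketch. It assumes ASCII-only messages, so that the encoded payload length equals the character count, and all class and method names here are hypothetical helpers rather than part of any dex tooling.

```java
import java.nio.charset.StandardCharsets;

final class DexStringCost {
  /** Number of bytes needed to encode a non-negative value as ULEB128. */
  static int uleb128Size(int value) {
    int size = 1;
    while ((value >>>= 7) != 0) {
      size++;
    }
    return size;
  }

  /**
   * Approximate size of one string data entry: the ULEB128 length prefix, the
   * character data, and one trailing null byte. Assumes ASCII-only text.
   */
  static int stringDataItemSize(String s) {
    int payload = s.getBytes(StandardCharsets.UTF_8).length;
    return uleb128Size(s.length()) + payload + 1;
  }

  /** Ballpark total: layouts * views per layout * (fixed bytes + average ID length). */
  static int estimateTotalBytes(int layouts, int viewsPerLayout, int fixedBytes, int avgIdLength) {
    return layouts * viewsPerLayout * (fixedBytes + avgIdLength);
  }

  public static void main(String[] args) {
    System.out.println(stringDataItemSize("View 'name' required but not found"));  // 36
    System.out.println(stringDataItemSize("View 'email' required but not found")); // 37
    // The message with an empty ID gives the fixed 32-byte part of every string.
    int fixedBytes = stringDataItemSize("View '' required but not found");
    System.out.println(estimateTotalBytes(50, 10, fixedBytes, 12)); // 22000
  }
}
```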

Strings in a dex file are deduplicated, so if we split out the common part of the string we only pay for it once. In addition, the string data section also stores field names, so a string that matches a field name costs nothing extra. With this in mind, it is tempting to split each string into three parts.

if (name == null) {
-  throw new NullPointerException("View 'name' required but not found");
+  throw new NullPointerException("View '" + "name" + "' required but not found");
 }
 TextView email = root.findViewById(R.id.email);
 if (email == null) {
-  throw new NullPointerException("View 'email' required but not found");
+  throw new NullPointerException("View '" + "email" + "' required but not found");
 }

Unfortunately, javac treats the concatenation of constants as something it can optimize, so it folds the parts back into a single unique string per view. To outsmart it, the generated code needs to use a StringBuilder or the little-known String.concat method.

if (name == null) {
-  throw new NullPointerException("View 'name' required but not found");
+  throw new NullPointerException("Missing required view with ID: ".concat("name"));
 }
 TextView email = root.findViewById(R.id.email);
 if (email == null) {
-  throw new NullPointerException("View 'email' required but not found");
+  throw new NullPointerException("Missing required view with ID: ".concat("email"));
 }
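The javac behavior described above can be observed in plain Java. The sketch below uses reference equality only as a visible proxy for constant folding: a folded constant expression is the same interned object as the equivalent literal, while the result of String.concat is built at runtime. The class and method names are hypothetical.

```java
final class ConstantFoldingDemo {
  static final String FULL = "View 'name' required but not found";

  /** The + concatenation of literals is a compile-time constant, so javac folds it. */
  @SuppressWarnings("StringEquality")
  static boolean foldedIsSameObject() {
    String folded = "View '" + "name" + "' required but not found";
    return folded == FULL; // intentional identity comparison
  }

  /** String.concat is an ordinary method call, so the pieces stay separate constants. */
  @SuppressWarnings("StringEquality")
  static boolean concatIsSameObject() {
    String built = "View '".concat("name").concat("' required but not found");
    return built == FULL; // intentional identity comparison
  }

  /** The text produced by concat is still identical to the single literal. */
  static boolean concatHasSameText() {
    return "View '".concat("name").concat("' required but not found").equals(FULL);
  }

  public static void main(String[] args) {
    System.out.println(foldedIsSameObject()); // true: folded into one literal
    System.out.println(concatIsSameObject()); // false: assembled at runtime
    System.out.println(concatHasSameText());  // true
  }
}
```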

The dex file now contains only a single prefix string, and the ID strings cost nothing because they are already present as the names of the R.id fields.

                           |[17] string_data_item
00046a: 1f                 |  utf16_size = 31
00046b: 4d69 7373 696e 6720|  data = "Missing required view with ID: "
000473: 7265 7175 6972 6564|
00047b: 2076 6965 7720 7769|
000483: 7468 2049 443a 2000|

22KB of string data reduced to 33 bytes! It is worth noting that we now spend an extra 7 bytes of bytecode to load the second string and call String.concat, but since the original string was always larger than 32 bytes, this is still a clear win. There is also room to de-duplicate the actual concatenation and exception-throwing code so that it is paid for once per class rather than once per view. But I'll leave that for another article.
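One possible shape for that per-class de-duplication is sketched below. This is an assumption for illustration only and not the approach the author later describes: the helper missingView is hypothetical, and a plain Map stands in for the Android view lookup so the example runs without the Android SDK.

```java
import java.util.HashMap;
import java.util.Map;

final class MainBindingSketch {
  final Object name;
  final Object email;

  private MainBindingSketch(Object name, Object email) {
    this.name = name;
    this.email = email;
  }

  /** Binds both views; each failed null check delegates to one shared helper. */
  static MainBindingSketch bind(Map<String, Object> views) {
    Object name = views.get("name"); // stand-in for root.findViewById(R.id.name)
    if (name == null) {
      throw missingView("name");
    }
    Object email = views.get("email"); // stand-in for root.findViewById(R.id.email)
    if (email == null) {
      throw missingView("email");
    }
    return new MainBindingSketch(name, email);
  }

  /** Hypothetical helper: the concat call and exception construction appear once per class. */
  private static NullPointerException missingView(String id) {
    return new NullPointerException("Missing required view with ID: ".concat(id));
  }

  public static void main(String[] args) {
    Map<String, Object> views = new HashMap<>();
    views.put("name", new Object());
    try {
      bind(views);
    } catch (NullPointerException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Whether a helper like this actually saves space depends on how many views a class binds, since the helper itself adds a method and a method reference.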

Seeing either of these optimizations in handwritten code should be a cause for concern. The individual savings do not justify how unidiomatic they are. With code generation, however, the economics are different. A single change to the generator applies the optimization to hundreds or thousands of locations, which multiplies its effect.

–Jake Wharton