1. Introduction

This article is a translation of Brian Goetz's "Translation of Lambda Expressions", with some of my own notes added. It does not cover every chapter of the original, only the most important parts. It provides the theoretical basis for later articles on handling lambda expressions and method references with ASM.

2. About this article

This article outlines a strategy for translating lambda expressions and method references from Java source code into bytecode. Lambda expressions for Java are specified by JSR 335 and implemented in the OpenJDK Lambda Project. An overview of the language feature can be found in State of the Lambda.

This article focuses on the bytecode the compiler generates when it encounters a lambda expression and on how the Java language runtime participates in evaluating lambda expressions at run time. Most of it covers the mechanics of functional interface conversion.

Functional interfaces are a core concept of lambda expressions in Java. A functional interface is an interface that has exactly one abstract method, such as Runnable or Comparator. A lambda expression can only be used where a functional interface is expected, as in the following two examples:

Runnable r = () -> { System.out.println("hello"); };
Collections.sort(strings, (String a, String b) -> a.compareTo(b));

The code the compiler generates to capture a lambda expression depends on both the lambda expression itself and the functional interface type it is being converted to.
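As a small illustration of that dependence on the target type, the sketch below converts the same lambda text to two different functional interfaces. The class and method names (TargetTypeDemo, viaSupplier, viaCallable) are made up for this example.

```java
import java.util.concurrent.Callable;
import java.util.function.Supplier;

class TargetTypeDemo {
    // The same lambda text, converted to Supplier.
    static Supplier<String> viaSupplier() {
        return () -> "hello";
    }

    // The same lambda text, converted to Callable.
    static Callable<String> viaCallable() {
        return () -> "hello";
    }

    public static void main(String[] args) throws Exception {
        Supplier<String> s = viaSupplier();
        Callable<String> c = viaCallable();
        System.out.println(s.get());
        System.out.println(c.call());
        // Each lambda object implements only the interface it was converted to.
        System.out.println(s instanceof Callable);
        System.out.println(c instanceof Supplier);
    }
}
```

The two lambda bodies are identical, yet the resulting objects are not interchangeable, which is why the capture code has to know the target type.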

3. Dependencies and notation

The translation scheme relies on several features from JSR 292, including invokedynamic, method handles, and the enhanced LDC bytecode forms for method handles and method types. Since these features cannot be expressed in Java source code, the examples use a pseudo-syntax for them:

  • For method handle constants: MH([refKind] class-name.method-name)
  • For method type constants: MT(method-signature)
  • For invokedynamic: INDY((bootstrap, static args...)(dynamic args...))

Readers are assumed to have some familiarity with the features of JSR 292.
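Although the constant forms themselves have no source syntax, the same concepts can be explored through the java.lang.invoke API. The following is a rough sketch, not what the compiler emits: it builds a method type and two method handles at run time instead of loading them from the constant pool. NotationDemo and its method names are assumptions made for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class NotationDemo {
    // Roughly MT((String)int): takes a String, returns int.
    static MethodType parseIntType() {
        return MethodType.methodType(int.class, String.class);
    }

    // Roughly MH(invokeStatic Integer.parseInt).
    static MethodHandle parseIntHandle() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(Integer.class, "parseInt", parseIntType());
    }

    // Roughly MH(invokeVirtual String.length); the receiver becomes the first argument.
    static MethodHandle lengthHandle() throws ReflectiveOperationException {
        return MethodHandles.lookup()
                .findVirtual(String.class, "length", MethodType.methodType(int.class));
    }

    static int parse(String s) {
        try {
            return (int) parseIntHandle().invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static int length(String s) {
        try {
            return (int) lengthHandle().invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntType().toMethodDescriptorString());
        System.out.println(parse("42"));
        System.out.println(length("hello"));
    }
}
```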

4. Translation strategy

There are several ways to represent a lambda expression in bytecode, such as inner classes, method handles, and dynamic proxies, each with its own advantages and disadvantages. Two competing goals guide the choice of strategy: maximizing flexibility for future optimization by not committing to a specific strategy, and keeping the class file representation stable.

We can achieve both goals by using the invokedynamic instruction from JSR 292, which separates the binary representation of lambda creation in the bytecode from the mechanics of evaluating the lambda expression at run time. Instead of generating bytecode that creates the object implementing the lambda expression (for example, by calling the constructor of an inner class), we describe a recipe for constructing the lambda and delegate the actual construction to the language runtime. That recipe is encoded at compile time in the static and dynamic argument lists of an invokedynamic instruction.

Using invokedynamic lets us defer the choice of translation strategy until run time. The runtime implementation is free to choose a strategy dynamically when it evaluates the lambda expression.

The runtime implementation choice is hidden behind a standardized API for lambda construction, so that the static compiler can emit calls to this API and each JRE implementation can choose its preferred strategy. The invokedynamic mechanics make this possible without the performance cost that such late binding might otherwise impose.

When the compiler encounters a lambda expression, it first desugars the lambda body into a method whose argument list and return type match those of the lambda expression, possibly with additional arguments for values captured from the enclosing scope.

At the point where the lambda expression would be captured, the compiler generates an invokedynamic call site. When invoked, this call site returns an instance of the functional interface to which the lambda is being converted, whose implementation contains the lambda body. The call site for a given lambda is called its lambda factory. The dynamic arguments of the lambda factory are the values captured from the enclosing scope. The bootstrap method of the lambda factory is a standard method in the Java runtime library, called the lambda metafactory.

The static bootstrap arguments capture what is known about the lambda at compile time (the functional interface it will be converted to, a method handle for the desugared lambda body, whether the SAM type is serializable, and so on). Method references are treated in the same way as lambda expressions, except that most method references do not need to be desugared into a new method; we can simply load a constant method handle for the referenced method and pass it to the metafactory.

5. Desugaring lambda bodies

The first step in translating a lambda expression into bytecode is to desugar the lambda body into a method.

There are several choices to make when desugaring:

  • Should the lambda body be desugared into a static method or an instance method?
  • In which class should the desugared method be placed?
  • What should the accessibility of the desugared method be?
  • What should the desugared method be named?
  • If an adapter is needed to bridge differences between the lambda body signature and the functional interface method signature (such as boxing, unboxing, primitive widening or narrowing conversions, varargs conversions, etc.), should the desugared method follow the signature of the lambda body, the functional interface method, or something in between? And who is responsible for the adaptation?
  • If the lambda captures values from the enclosing scope, how should they be represented in the desugared method signature? (For example, they could be individual arguments added to the beginning or end of the argument list, or the compiler could collect them into a single "frame" argument.)

Related to desugaring lambda bodies is the question of whether a method reference requires an adapter or bridge method to be generated.

The compiler infers a method signature for the lambda expression, including argument types, return type, and thrown exceptions; we call this the natural signature. A lambda expression also has a target type, which is a functional interface; we call the method signature of the functional interface method of the erased target type the lambda descriptor. The value returned from the lambda factory, which implements the functional interface and captures the behavior of the lambda, is called the lambda object.

All things being equal, private methods are preferable to non-private ones, and static methods are preferable to instance methods. Ideally, the lambda body is desugared into the innermost class in which the lambda expression appears, the desugared signature matches the signature of the lambda body, extra arguments for captured values are prepended to the argument list, and method references are not desugared at all. In some cases, however, we have to deviate from this baseline strategy.

5.1 Desugaring example: stateless lambdas

The simplest form of lambda expression is one that captures no state from its enclosing scope (a stateless lambda):

class A {
    public void foo() {
        List<String> list = ...
        list.forEach( s -> { System.out.println(s); } );
    }
}

The natural signature of this lambda expression is (String)V. (Translator's note: this is the signature that the method of an equivalent anonymous class passed to forEach would have.) The compiler desugars the lambda body into a static method whose signature is the natural signature and generates a name for the desugared method. In the pseudo-code below, Block is the functional interface name used in the original document; the released Java 8 API calls it Consumer.

class A {
    public void foo() {
        List<String> list = ...
        list.forEach( [lambda for lambda$1 as Block] );
    }

    static void lambda$1(String s) {
        System.out.println(s);
    }
}

5.2 Desugaring example: lambdas capturing immutable values

The other form of lambda expression captures final (or effectively final) local variables from the enclosing scope, and/or fields of the enclosing instance (which can be treated as capturing the enclosing this reference):

class B {
    public void foo() {
        List<Person> list = ...
        final int bottom = ..., top = ...;
        list.removeIf( p -> (p.size >= bottom && p.size <= top) );
    }
}

In the example above, the lambda captures the final local variables bottom and top from the enclosing scope. The signature of the desugared method is the natural signature (Person)Z with extra arguments added for the captured values. The compiler decides how these extra arguments are represented: they can be added one by one to the argument list, boxed into a frame class, or boxed into an array. The simplest choice is to prepend them one by one to the argument list:

class B {
    public void foo() {
        List<Person> list = ...
        final int bottom = ..., top = ...;
        list.removeIf( [ lambda for lambda$1 as Predicate capturing (bottom, top) ] );
    }

    static boolean lambda$1(int bottom, int top, Person p) {
        return (p.size >= bottom && p.size <= top);
    }
}

Alternatively, the captured values (bottom and top) could be boxed into a frame or an array. The key requirement is agreement between the types of the extra arguments as they appear in the signature of the desugared method and the types as they appear as dynamic arguments to the lambda factory. Because the compiler controls both and generates them at the same time, it has some flexibility in how captured arguments are packaged.
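The prepending choice can be observed with reflection. The sketch below lists the parameter types of the compiler-generated methods in a class containing one capturing lambda. It relies on how current javac desugars lambdas, which is an implementation detail rather than something the language specification guarantees; CaptureDemo and its methods are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class CaptureDemo {
    // A lambda that captures two effectively final locals.
    static Predicate<String> inRange(int bottom, int top) {
        return s -> s.length() >= bottom && s.length() <= top;
    }

    // Parameter lists of the compiler-generated (synthetic) methods in this class.
    static List<String> syntheticSignatures() {
        List<String> out = new ArrayList<>();
        for (Method m : CaptureDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                out.add(Arrays.toString(m.getParameterTypes()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(inRange(2, 4).test("abc"));
        // With javac, the two captured ints are expected before the String argument.
        System.out.println(syntheticSignatures());
    }
}
```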

6. The lambda metafactory

Lambda capture is implemented by an invokedynamic call site whose static parameters describe the characteristics of the lambda body (the desugared method) and the lambda descriptor (the functional interface method). The dynamic parameters of the call site, if any, are the captured values.

When invoked, the call site returns a lambda object for the corresponding lambda body and descriptor, bound to the captured values.

The bootstrap method of the call site is a specified platform method called the lambda metafactory. The virtual machine calls the metafactory only once per capture site; after that it links the call site and gets out of the way. Call sites are linked lazily, so factory sites that are never executed are never linked. The static argument list of the basic metafactory looks like this:

metaFactory(MethodHandles.Lookup caller, // provided by VM
            String invokedName,          // provided by VM
            MethodType invokedType,      // provided by VM
            MethodHandle descriptor,     // lambda descriptor
            MethodHandle impl)           // lambda body

The first three arguments (caller, invokedName, invokedType) are supplied automatically by the virtual machine when it links the call site. The descriptor argument identifies the functional interface method to which the lambda is being converted. The impl argument identifies the lambda method, which is either the desugared lambda body or the method named in a method reference. The signatures of the functional interface method and the implementation method may differ: the implementation method may have extra arguments corresponding to captured values, and the remaining arguments need not match exactly, as long as they can be adapted as described in section 6.5.
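In released JDKs the entry point is java.lang.invoke.LambdaMetafactory.metafactory. Its signature differs slightly from the sketch above: the interface method is described by a MethodType rather than a MethodHandle, and there is an additional MethodType argument. The following is a minimal sketch that calls it by hand for a stateless lambda, which normally only the JVM does while linking an invokedynamic call site. MetafactoryDemo, shout and link are assumed names for this example.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

class MetafactoryDemo {
    // Stands in for a desugared, stateless lambda body.
    static String shout(String s) {
        return s.toUpperCase() + "!";
    }

    @SuppressWarnings("unchecked")
    static Function<String, String> link() {
        try {
            MethodHandles.Lookup caller = MethodHandles.lookup();
            MethodType natural = MethodType.methodType(String.class, String.class);
            MethodHandle impl = caller.findStatic(MetafactoryDemo.class, "shout", natural);
            CallSite factory = LambdaMetafactory.metafactory(
                    caller,
                    "apply",                                           // name of the interface method
                    MethodType.methodType(Function.class),             // ()Function: nothing is captured
                    MethodType.methodType(Object.class, Object.class), // erased Function.apply
                    impl,                                              // the lambda body
                    natural);                                          // signature checked at invocation
            // Invoking the factory with no dynamic arguments yields the lambda object.
            return (Function<String, String>) factory.getTarget().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(link().apply("hi"));
    }
}
```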

6.1 Lambda capture

We are now ready to describe how functional interface conversion is translated for lambda expressions and method references. Example class A is translated as follows:

class A {
    public void foo() {
        List<String> list = ...
        list.forEach(INDY((MH(metaFactory), MH(invokeVirtual Block.apply),
                           MH(invokeStatic A.lambda$1))( )));
    }

    private static void lambda$1(String s) {
        System.out.println(s);
    }
}

Because the lambda in A is stateless, the dynamic argument list of the lambda factory call site is empty. For class B, the dynamic argument list is not empty, because the values of bottom and top must be passed to the lambda factory:

class B {
    public void foo() {
        List<Person> list = ...
        final int bottom = ..., top = ...;
        list.removeIf(INDY((MH(metaFactory), MH(invokeVirtual Predicate.apply),
                            MH(invokeStatic B.lambda$1))( bottom, top )));
    }

    private static boolean lambda$1(int bottom, int top, Person p) {
        return (p.size >= bottom && p.size <= top);
    }
}
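The capturing case can also be reproduced by hand with the real java.lang.invoke.LambdaMetafactory.metafactory method. In the sketch below the factory type is (int,int)Predicate, so the two captured values are passed as dynamic arguments when the factory is invoked. As before, this is an illustration under assumed names (CapturingMetafactoryDemo, inRange, capture), using String in place of the article's Person.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Predicate;

class CapturingMetafactoryDemo {
    // Stands in for the desugared lambda$1: captured values first, then the interface argument.
    static boolean inRange(int bottom, int top, String s) {
        return s.length() >= bottom && s.length() <= top;
    }

    @SuppressWarnings("unchecked")
    static Predicate<String> capture(int bottom, int top) {
        try {
            MethodHandles.Lookup caller = MethodHandles.lookup();
            MethodHandle impl = caller.findStatic(CapturingMetafactoryDemo.class, "inRange",
                    MethodType.methodType(boolean.class, int.class, int.class, String.class));
            CallSite factory = LambdaMetafactory.metafactory(
                    caller,
                    "test",                                                       // Predicate's method
                    MethodType.methodType(Predicate.class, int.class, int.class), // (int,int)Predicate
                    MethodType.methodType(boolean.class, Object.class),           // erased Predicate.test
                    impl,
                    MethodType.methodType(boolean.class, String.class));
            // The captured values are the dynamic arguments of the factory.
            return (Predicate<String>) factory.getTarget().invoke(bottom, top);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Predicate<String> p = capture(2, 4);
        System.out.println(p.test("abc"));
        System.out.println(p.test("abcdef"));
    }
}
```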

6.2 Static vs. instance methods

Lambdas like the ones in the previous section can be translated to static methods because they do not use the enclosing object instance in any way (they do not refer to this, super, or members of the enclosing instance). Lambdas that do use this, super, or members of the enclosing instance are called instance-capturing lambdas. Non-instance-capturing lambdas are desugared into private static methods; instance-capturing lambdas are desugared into private instance methods. When an instance-capturing lambda is captured, the receiver (this) is passed as the first dynamic argument.

As an example, consider the following lambda expression, which captures a field minSize:

list.filter(e -> e.getSize() < minSize )

We desugar this lambda into an instance method and pass the receiver (this) as the first captured argument:

list.filter(INDY((MH(metaFactory), MH(invokeVirtual Predicate.apply),
                  MH(invokeVirtual B.lambda$1))( this )));

private boolean lambda$1(Element e) {
    return e.getSize() < minSize;
}

Because lambda bodies are translated to private methods, the capture site loads a constant method handle from the constant pool and passes it to the metafactory. For instance methods the reference kind of this handle is REF_invokeSpecial; for static methods it is REF_invokeStatic. We can desugar to private methods because the capturing class has access to its own private methods and can therefore obtain a method handle to them, which the metafactory can then invoke.
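The static versus instance distinction can be checked with reflection on a class that contains one lambda of each kind. The sketch below reports, for each compiler-generated method, whether it is static and what it returns. It reflects what current javac does and is not guaranteed by the language specification; ReceiverDemo and its members are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

class ReceiverDemo {
    private int minSize = 3;

    // Uses no instance state: expected to desugar to a static method.
    IntUnaryOperator doubler() {
        return x -> x * 2;
    }

    // Reads this.minSize: expected to desugar to an instance method.
    IntPredicate smallerThanMin() {
        return size -> size < minSize;
    }

    // One entry per synthetic method: "static" or "instance", then the return type.
    static List<String> describeSyntheticMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : ReceiverDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                String kind = Modifier.isStatic(m.getModifiers()) ? "static" : "instance";
                out.add(kind + " " + m.getReturnType().getName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ReceiverDemo demo = new ReceiverDemo();
        System.out.println(demo.doubler().applyAsInt(21));
        System.out.println(demo.smallerThanMin().test(2));
        System.out.println(describeSyntheticMethods());
    }
}
```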

6.3 Capturing method references

There are several forms of method reference, and like lambdas they can be divided into instance-capturing and non-instance-capturing. Non-instance-capturing method references include static method references (Integer::parseInt), unbound instance method references (String::length), and top-level constructor references (Foo::new). When a non-instance-capturing method reference is captured, the dynamic argument list is always empty, for example:

list.filter(String::isEmpty)

The above example would be converted to:

list.filter(INDY((MH(metaFactory), MH(invokeVirtual Predicate.apply),
                  MH(invokeVirtual String.isEmpty))( )))

Instance-capturing method references include bound instance method references (s::length), super method references (super::foo), and inner class constructor references (Inner::new). When an instance-capturing method reference is captured, the captured argument list always has exactly one argument: this for super method references and inner class constructor references, and the specified receiver for bound instance method references.
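At the source level, the forms listed above look as follows. This is a plain usage sketch; MethodRefKinds and its method names are made up for the example, and ArrayList::new stands in for the article's Foo::new.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

class MethodRefKinds {
    // Static method reference: nothing is captured.
    static Function<String, Integer> staticRef() {
        return Integer::parseInt;
    }

    // Unbound instance method reference: the receiver is the interface argument.
    static Function<String, Integer> unboundRef() {
        return String::length;
    }

    // Constructor reference: nothing is captured.
    static Supplier<List<String>> constructorRef() {
        return ArrayList::new;
    }

    // Bound instance method reference: the receiver s is captured.
    static Supplier<Integer> boundRef(String s) {
        return s::length;
    }

    public static void main(String[] args) {
        System.out.println(staticRef().apply("42"));
        System.out.println(unboundRef().apply("abc"));
        System.out.println(constructorRef().get().isEmpty());
        System.out.println(boundRef("hello").get());
    }
}
```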

6.4 Varargs

If a method reference to a varargs method is converted to a functional interface whose method is not varargs, the compiler must generate a bridge method and capture a method handle for the bridge instead of the target method itself. The bridge must handle any required adaptation of argument types as well as the conversion from varargs to non-varargs. For example:

interface IIS {
    void foo(Integer a1, Integer a2, String a3);
}

class Foo {
    static void m(Number a1, Object... rest) { ... }
}

class Bar {
    void bar() {
        IIS x = Foo::m;
    }
}

Here the compiler needs to generate a bridge method that performs the adaptation. The bridge passes its first argument, an Integer, where Foo.m expects a Number, and collects the remaining arguments into an Object array:

class Bar {
    void bar() {
        IIS x = INDY((MH(metafactory), MH(invokeVirtual IIS.foo),
                      MH(invokeStatic m$bridge))( ));
    }

    static private void m$bridge(Integer a1, Integer a2, String a3) {
        Foo.m(a1, a2, a3);
    }
}
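The observable effect of this bridging can be tried out directly. The sketch below mirrors the IIS and Foo example and records what the varargs method receives; the static field last, the nesting inside VarargsBridgeDemo, and the body of m are assumptions added only so that the result can be inspected. It shows the behavior, not the generated bridge itself.

```java
import java.util.Arrays;

class VarargsBridgeDemo {
    interface IIS {
        void foo(Integer a1, Integer a2, String a3);
    }

    // Records what Foo.m received; for demonstration only.
    static String last;

    static class Foo {
        static void m(Number a1, Object... rest) {
            last = a1 + " + " + Arrays.toString(rest);
        }
    }

    static String call() {
        IIS x = Foo::m;       // varargs method converted to a non-varargs interface
        x.foo(1, 2, "three"); // the last two arguments are collected into rest
        return last;
    }

    public static void main(String[] args) {
        System.out.println(call());
    }
}
```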

6.5 Parameter Adaptation

The desugared lambda method has an argument list and a return type: (A1..An) -> Ra (if the desugared method is an instance method, the receiver this is considered to be the first argument). Similarly, the functional interface method has an argument list and a return type: (F1..Fm) -> Rf (with no receiver argument). The dynamic argument list of the factory call site has types (D1..Dk). If the lambda is instance-capturing, the first dynamic argument must be the receiver.

The lengths must add up: k + m == n. That is, the length of the desugared method's argument list equals the length of the dynamic argument list plus the length of the functional interface method's argument list.

We split the argument list A1..An of the desugared method into (D1..Dk H1..Hm), where the D arguments correspond to the dynamic (captured) arguments and the H arguments correspond to the arguments of the functional interface method.

We require that Hi be adaptable to Fi for i = 1..m, and likewise that Ra be adaptable to Rf. Type T is adaptable to type U if any of the following holds:

  • T == U
  • T is a primitive type, U is a reference type, and T can be converted to U by boxing
  • T is a reference type, U is a primitive type, and T can be converted to U by unboxing
  • T and U are both primitive types, and T can be widened to U (for example, int to long)
  • T and U are both reference types, and T can be cast to U

These adaptations are validated by the metafactory at link time.
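The same kinds of adaptation are visible at the source level when the signature of a referenced method differs from that of the functional interface method. The sketch below shows one conversion of each kind; the helper methods len, boxedFortyTwo and describe are hypothetical, and whether the compiler hands each case to the metafactory directly or goes through a generated method is an implementation detail not shown here.

```java
import java.util.function.Function;
import java.util.function.IntSupplier;
import java.util.function.ToLongFunction;

class AdaptationDemo {
    static int len(String s) { return s.length(); }
    static Integer boxedFortyTwo() { return 42; }
    static String describe(Object o) { return "value:" + o; }

    // Boxing: the method returns int, the interface expects Integer.
    static Function<String, Integer> boxing() { return AdaptationDemo::len; }

    // Unboxing: the method returns Integer, the interface expects int.
    static IntSupplier unboxing() { return AdaptationDemo::boxedFortyTwo; }

    // Primitive widening: the method returns int, the interface expects long.
    static ToLongFunction<String> widening() { return AdaptationDemo::len; }

    // Reference conversion: the interface passes a String, the method accepts Object.
    static Function<String, String> referenceConversion() { return AdaptationDemo::describe; }

    public static void main(String[] args) {
        System.out.println(boxing().apply("abc"));
        System.out.println(unboxing().getAsInt());
        System.out.println(widening().applyAsLong("abcd"));
        System.out.println(referenceConversion().apply("x"));
    }
}
```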

7. Summary

The original document has further chapters, which mainly describe the metafactory in pseudo-code and cover serialization; I will not go through them here. The core material is in the chapters above, and many of the details matter. For example, if you are not aware of the varargs bridging described earlier, you can easily forget to handle that case when processing method references.

Stay tuned for the next post, which covers "ASM Hook Lambda and method references."

References:

  • twitter.com/BrianGoetz
  • cr.openjdk.java.net/~briangoetz…
  • jcp.org/en/jsr/deta…
  • openjdk.java.net/projects/la…
  • cr.openjdk.java.net/~briangoetz…
  • jcp.org/en/jsr/deta…