The type system of SIL, the Swift compiler’s intermediate language

This article is a personal translation of "How to talk to your kids about SIL type use". Read with caution.

SIL object and address types

include/swift/SIL/SILType.h

enum class SILValueCategory : uint8_t {
  /// An object is a value of the type.
  Object,

  /// An address is a pointer to an allocated variable of the type
  /// (possibly uninitialized).
  Address,
};

SIL types fall into two main categories: object types and address types. Object types include integers, references to class instances, struct values, and functions. An address is a value that holds a pointer to storage for a value of an object type.

class SILType {
public:
  /// The unsigned is a SILValueCategory.
  using ValueType = llvm::PointerIntPair<TypeBase *, 2, unsigned>;
private:
  ValueType value;

  SILType(CanType ty, SILValueCategory category)
      : value(ty.getPointer(), unsigned(category)) {
  }

As the constructor shows, a SILType holds a single field of type ValueType, an llvm::PointerIntPair that packs a pointer to the canonical type (the CanType) together with a small integer flag recording whether the type is an object or an address.

/// CanType - This is a Type that is statically known to be canonical.  To get
/// one of these, use Type->getCanonicalType().  Since all CanType's can be used
/// as 'Type' (they just don't have sugar) we derive from Type.
class CanType : public Type {


A CanType is a Type that is statically known to be canonical, meaning that it carries no sugar.

When SIL is parsed or printed, an object type is written as $ followed by the canonical formal type (see the definition of canonical types in the previous article if you have forgotten what this means). An address type is written with the prefix $*. Here is a simple example:

$(Int, Optional<(Int) -> ()>) // A tuple containing an integer and an optional function

Address types appear in the SIL generated for Swift expressions that load from and store to lvalues (assignable locations), as well as for inout parameters. There are also some formal types that cannot be represented as values in SIL and must be manipulated indirectly through an address; these are called address-only types.

Note that not all formal types are legal SIL types. In particular, function types and metatypes are special; they are examined later.

SILType has the following important interfaces:

SILValueCategory getCategory() const {
  return SILValueCategory(value.getInt());
}

SILType getAddressType() const {
  return SILType(getASTType(), SILValueCategory::Address);
}

SILType getObjectType() const {
  return SILType(getASTType(), SILValueCategory::Object);
}

CanType getASTType() const {
  return CanType(value.getPointer());
}

/// Whether this is an address type.
bool isAddress() const { return getCategory() == SILValueCategory::Address; }

/// Whether this is an object type.
bool isObject() const { return getCategory() == SILValueCategory::Object; }
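To make the pairing of a canonical type with a category concrete, here is a minimal toy model in Swift. It is only a sketch: ToySILType and ValueCategory are hypothetical names, a plain String stands in for CanType, and nothing here is the compiler’s actual implementation.

```swift
// Toy model only: a String stands in for CanType, and an enum stands in
// for SILValueCategory. Neither type exists in the Swift compiler.
enum ValueCategory { case object, address }

struct ToySILType: Equatable, CustomStringConvertible {
    let astType: String
    let category: ValueCategory

    var isAddress: Bool { category == .address }
    var isObject: Bool { category == .object }

    // Same underlying type, different category.
    var addressType: ToySILType { ToySILType(astType: astType, category: .address) }
    var objectType: ToySILType { ToySILType(astType: astType, category: .object) }

    // Printed the way SIL writes types: "$T" for objects, "$*T" for addresses.
    var description: String { (isAddress ? "$*" : "$") + astType }
}

let intObject = ToySILType(astType: "Int", category: .object)
let intAddress = intObject.addressType
print(intObject)    // $Int
print(intAddress)   // $*Int
```

Switching category never changes the underlying type, which is why getAddressType() and getObjectType() can both be built from getASTType().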

SIL type lowering

Type lowering converts formal types into SIL types. The type system we use when writing Swift is the formal type system. It intentionally abstracts away many representation issues, such as ownership transfer conventions and whether parameters are passed directly or indirectly. SIL is intended to represent most of these implementation details, so they must be reflected in the SIL type system, which is therefore much richer. The operation that converts a formal type to a SIL type is called type lowering, and SIL types are also known as lowered types.

Since SIL is an intermediate language, a SIL value roughly corresponds to one of an abstract machine’s infinitely many registers. Address-only types are essentially those that are too complex to be stored in a register. Types that are not address-only are called loadable types, meaning that they can be loaded into a register.

It is legal for an address type to point to a loadable (non-address-only) type, but it is not legal for an object type to contain an address-only type.

All of the type lowering logic lives in lib/SIL/IR/TypeLowering.cpp (on the master branch).

It defines a class TypeLowering, which holds extended information about a lowered SIL type. Its getLoweredType() method returns the SIL type that a formal type was lowered to.

/// getLoweredType - Get the type used to represent values of the Swift type in SIL.
SILType getLoweredType() const {
	return LoweredType;
}


A helper class TypeConverter is defined, which creates and manages TypeLowerings, and provides getTypeLowering to return an object of the TypeLowering class.

 const TypeLowering & getTypeLowering(SILType t, SILFunction &F);

Sometimes you already have an SIL type and need to check whether it is trivial, loadable, or address-only. To do this, the SILType class defines various methods:

/// True if the underlying AST type is trivial, meaning it is loadable and can
/// be trivially copied, moved or destroyed. Returns false for address types
/// even though they are technically trivial.
bool isTrivial(const SILFunction &F) const;

/// True if the type, or the referenced type of an address type, is loadable.
/// This is the opposite of isAddressOnly.
bool isLoadable(const SILFunction &F) const {
  return !isAddressOnly(F);
}

/// True if the type, or the referenced type of an address type, is
/// address-only. This is the opposite of isLoadable.
bool isAddressOnly(const SILFunction &F) const;

Trivial, loadable, and address-only types

Two key properties can force a type to be address-only:

Values of the type must always live in memory, because each value has to register its address in some global list. It makes no sense to pass such a value in a register, because a register has no global address.
Values of the type may not have a size that is known at compile time. A SIL value can be larger than a single machine register, but its size must be known at compile time, because IRGen splits each SIL value into zero or more scalar LLVM values, such as floating-point numbers and integers.

The canonical example of the first kind is a weak reference to a class instance. In Swift, weak references are used to break reference cycles in a memory-safe manner. They are implemented by registering every weak reference in a global structure in the Swift runtime (the code can be found in stdlib/public/runtime/SwiftObject.mm). When the last strong reference to a class instance is destroyed, the runtime checks for outstanding weak references to that instance and sets them to nil.
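The source-level behavior described above can be observed with a short Swift sketch. Payload and Observer are hypothetical names chosen for illustration; the example shows only what the language guarantees (the weak reference reads as nil once the last strong reference is gone), not how the runtime tracks it.

```swift
final class Payload {}

struct Observer {
    // A weak reference stored as a struct field.
    weak var target: Payload?
}

var strong: Payload? = Payload()
let observer = Observer(target: strong)

// While a strong reference exists, the weak reference can be read.
let aliveBefore = observer.target != nil

// Destroy the last strong reference; the weak reference is reset to nil.
strong = nil
let aliveAfter = observer.target != nil

print(aliveBefore, aliveAfter)   // true false
```

Because the runtime must be able to find and reset the field, a struct such as Observer cannot simply be copied around in registers, which is the reason the text gives for treating it as address-only.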

The canonical example of the second kind is a value whose type is a generic parameter. Recall that, unlike C++ or Clay, Swift does not have to fully instantiate generic functions and types. Code that uses generics can be compiled without knowing, at compile time, every concrete type that might be bound to its generic parameters. This works by passing values of generic type indirectly, together with runtime metadata that describes the size and alignment of the type and how to manipulate its values.
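As a small illustration of generic code that learns the size and alignment of its argument type from the type it is handed, here is a hedged Swift sketch. The function name layout(of:) is made up for this example, and the optimizer is free to specialize such a function, so this shows the programming model rather than the generated code.

```swift
// One generic body serves every T. Size and alignment are asked of the
// type parameter rather than being fixed when this function is written.
func layout<T>(of value: T) -> (size: Int, alignment: Int) {
    return (size: MemoryLayout<T>.size, alignment: MemoryLayout<T>.alignment)
}

let byte = layout(of: UInt8(7))
let word = layout(of: UInt64(7))
print(byte.size, word.size)   // 1 8
```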

Beyond the distinction between loadable and address-only types, there is a further refinement among loadable types. A loadable type is trivial if values of the type can be freely copied and destroyed without running any additional logic. Examples of trivial types include integers, floating-point values, and pointers to permanent structures such as metatypes.

The canonical example of a loadable but non-trivial type is a strong reference to a class instance. As long as we own the value, we can load the class reference into a register. Single-assignment semantics ensure that reference counting semantics are preserved. There is no need to register every strong reference globally, or to keep it in memory for any other reason. However, copying a strong reference requires incrementing its reference count, and destroying a value that contains a strong reference requires decrementing it.

When SIL is asked to lower an aggregate type (a struct, enum, or tuple type), the code first looks at the lowering of each member of the aggregate. If all members are trivial, the aggregate is trivial. If all members are loadable, the aggregate is loadable. If at least one member is address-only, the aggregate is address-only.
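The aggregate rule amounts to "an aggregate is as restrictive as its most restrictive member". A minimal sketch of that rule as a toy Swift model follows; Lowering, ToyType, and lower(_:) are hypothetical names, and the real logic in TypeLowering.cpp handles far more cases.

```swift
// Toy classification, ordered from least to most restrictive.
enum Lowering: Int, Comparable {
    case trivial, loadable, addressOnly

    static func < (lhs: Lowering, rhs: Lowering) -> Bool {
        return lhs.rawValue < rhs.rawValue
    }
}

// Toy type descriptions; these are illustrative, not compiler types.
enum ToyType {
    case integer
    case classReference
    case weakReference
    case opaqueExistential
    case aggregate([ToyType])   // members of a struct, enum or tuple
}

func lower(_ type: ToyType) -> Lowering {
    switch type {
    case .integer:
        return .trivial
    case .classReference:
        return .loadable
    case .weakReference, .opaqueExistential:
        return .addressOnly
    case .aggregate(let members):
        // The most restrictive member decides; an empty aggregate is trivial.
        return members.map(lower).max() ?? .trivial
    }
}

let point = ToyType.aggregate([.integer, .integer])
let node = ToyType.aggregate([.integer, .classReference])
let observer = ToyType.aggregate([node, .weakReference])
print(lower(point), lower(node), lower(observer))   // trivial loadable addressOnly
```

A class reference is a leaf in this model on purpose: the fields of the class never enter the computation.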

Note that lowering a class type never requires looking at the fields of the class. A reference to a class instance is always a single reference-counted pointer, which is loadable.

In addition, values of non-class-bound protocol types (called opaque existentials) must be address-only, because they can contain a value of any concrete type that conforms to the protocol. Since the full set of conforming types is not known at compile time, we must assume that at least one of them contains a weak reference (or another opaque existential).

An existential type is a way of abstracting over types: it asserts that some concrete type exists without saying which one it is.

Recall the material on member reference types from the previous section. It comes into play here, because lowering an aggregate type that has generic parameters requires applying substitutions. For example, consider the following code:

struct Box<T> {
  var value: T
}

$*Box<Any>      // address-only -- the aggregate contains an existential
$Box<Int>       // trivial -- the aggregate contains only trivial fields
$Box<NSObject>  // loadable

struct Transform<T> {
  var fn: (T) -> T
}

struct Phantom<T> {
  var counter: Int
}

$Phantom<() -> ()>  // always trivial
$Phantom<NSObject>  // always trivial

The Box examples show that the lowering of a generic struct depends on its generic arguments. It makes no sense to ask whether Box itself is loadable or address-only; the question can only be asked of Box<Foo> for some concrete type Foo.

The Transform example shows that an aggregate is not forced to be address-only just because its generic argument is address-only: here the generic parameter does not appear directly as the type of a field, only as the parameter and result type of fn. Another such case is the Phantom type above, whose generic parameter does not appear in any of its field types at all.
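Lowering categories are not visible from Swift source, but memory layout gives a rough proxy for the same dependence on generic arguments. The sketch below assumes that Box stores a T, that Transform stores a (T) -> T, and that the counter field of Phantom is an Int (the field type is an assumption made for this example); it compares sizes only and says nothing about trivial versus loadable.

```swift
struct Box<T> { var value: T }
struct Transform<T> { var fn: (T) -> T }
struct Phantom<T> { var counter: Int }   // field type assumed for illustration

// Box stores a T inline, so its layout follows the generic argument.
let boxFollowsArgument =
    MemoryLayout<Box<UInt8>>.size == MemoryLayout<UInt8>.size &&
    MemoryLayout<Box<Any>>.size == MemoryLayout<Any>.size

// Transform stores a function value, whose size does not depend on T.
let transformIsFixed =
    MemoryLayout<Transform<UInt8>>.size == MemoryLayout<Transform<Any>>.size

// Phantom never stores a T at all.
let phantomIsFixed =
    MemoryLayout<Phantom<Any>>.size == MemoryLayout<Int>.size

print(boxFollowsArgument, transformIsFixed, phantomIsFixed)   // true true true
```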

SIL function types

Look at the following code:

struct Transform<T> {
  let fn: (T) -> T
}

func double(x: Int) -> Int {
  return x + x
}

let myTransform = Transform(fn: identity)

Here we have a generic Transform type that stores a function whose input and output both have the generic parameter type. A value of generic parameter type is address-only, so it must be passed indirectly. You can imagine that, at the machine level, Transform.fn takes a pointer to the argument and a pointer to the storage for the return value.

The double function, on the other hand, operates on integers, which are trivial. Naturally we want the input value x to arrive in a register and the return value to be handed back in another register. Swift’s formal type system allows double to be stored in myTransform.fn, because after the substitution T := Int the formal type of myTransform.fn exactly matches the formal type of double. However, if this were compiled as naively as described so far, the code would misbehave at run time: anyone calling myTransform.fn with an integer would pass the address of the integer value, not the integer value itself as double expects.

The identity in the code above should be double; I guess the mismatch is there because the article was written early on, when Swift did not support this syntax. The corrected code:

let myTransform = Transform(fn: double)

myTransform.fn(1) // 2

Clearly, lowering a function type involves more than lowering its parameter and result types. We need a more flexible representation for cases where a parameter type may be trivial but must still be passed indirectly. We also see that lowering a formal type must take substitutions into account in some way. In fact, you want to pass the fully substituted formal type to TypeLowering::getLoweredType(). When lowering function types, metatypes, and tuples, you must use the longer form of this function, which takes two parameters: the substituted formal type and an abstraction pattern.

The abstraction pattern is essentially the original, unsubstituted type from which the substituted type was derived. When a function type is lowered, the parameter passing conventions are based on the abstraction pattern rather than on the formal type of the function. The result of lowering a FunctionType is an instance of the SILFunctionType class, which adds details that FunctionType lacks, namely the conventions for passing parameters and results: whether they are passed or returned by value or by address, and whether ownership of loadable values is transferred (trivial values need no ownership transfer).

The final complication is that the function itself has a convention that describes how to call it. In our previous example, double is a global function that captures no values from its lexical context, so it can be passed around as a single function pointer. This is called a thin function. If, on the other hand, we take a closure value and store it in myTransform.fn, we must keep the context alive so that the captured values can be referenced. This is called a thick function. A thick function is represented as two values: a function pointer, followed by a strong reference to the context object.
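The two-value representation of a thick function can be glimpsed from Swift source by comparing sizes. In this sketch, @convention(c) is used as a stand-in for a context-free function pointer, because the thin convention cannot be written in ordinary Swift source; that substitution, and the names SwiftFunction, CFunction, and makeAdder, are assumptions of the example.

```swift
func double(x: Int) -> Int { return x + x }

typealias SwiftFunction = (Int) -> Int               // thick by default
typealias CFunction = @convention(c) (Int) -> Int    // bare function pointer, no context

let pointerSize = MemoryLayout<UnsafeRawPointer>.size
let thickSize = MemoryLayout<SwiftFunction>.size     // function pointer + context reference
let bareSize = MemoryLayout<CFunction>.size          // function pointer only

// A closure that captures a value needs the context slot.
func makeAdder(_ amount: Int) -> SwiftFunction {
    return { x in x + amount }
}

let addTen = makeAdder(10)
let stored: SwiftFunction = double   // a global function also fits in a thick value

print(thickSize == 2 * pointerSize, bareSize == pointerSize)   // true true
print(addTen(1), stored(1))                                    // 11 2
```

Storage that must accept any function value, such as the fn field of Transform, therefore has to be thick even when the function actually stored captures nothing.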

Here are some examples:

// Original type:    (Any, Int) -> ()
// Substituted type: (Any, Int) -> ()
$@convention(thick) (@in Any, Int) -> ()

// Original type:    (T) -> T
// Substituted type: (T) -> T
$@convention(thick) (@in T) -> @out T

// Original type:    (Int) -> Int
// Substituted type: (Int) -> Int
$@convention(thick) (Int) -> Int

// Original type:    (NSObject) -> NSObject
// Substituted type: (NSObject) -> NSObject
$@convention(thick) (@owned NSObject) -> @owned NSObject

// Original type:    (T) -> T
// Substituted type: (Int) -> Int
$@convention(thick) (@in Int) -> @out Int


Let’s revisit the example at the top of this section. The lowered type of myTransform.fn is:

$@convention(thick) (@in T) -> @out T

We must use the thick function convention because the user can store any function value there, including a closure. We must also pass and return T indirectly, because a value of generic parameter type is address-only. The lowered type of double, on the other hand, is:

$@convention(thin) (Int) -> Int

At this point we still cannot compile the code, but at least we can detect the type mismatch at the level of SILFunctionTypes instead of silently miscompiling. When the formal types of two expressions match but their lowered types do not, we have an abstraction difference. SILGen handles abstraction differences by wrapping the function value in a re-abstraction thunk.

A re-abstraction thunk forwards the arguments, calls the function, and forwards the result, taking care of any abstraction differences in the parameters and results along the way. If a substituted parameter is trivial but the original parameter is passed indirectly, the thunk loads the value from its address and passes it to the substituted function. Similarly, if the substituted result is trivial but the original result is returned indirectly, the thunk takes the result value and stores it into the indirect return address that was passed to the thunk.
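What such a thunk does can be imitated by hand in Swift, using explicit pointers to play the role of the indirect @in and @out conventions. This is a sketch under stated assumptions: DirectFn, IndirectFn, and reabstract(_:) are hypothetical names, and real thunks are emitted by SILGen rather than written in source.

```swift
// Stand-ins for the two lowered function types discussed above.
typealias DirectFn = (Int) -> Int                                               // (Int) -> Int
typealias IndirectFn = (UnsafePointer<Int>, UnsafeMutablePointer<Int>) -> Void  // (@in Int) -> @out Int

func double(x: Int) -> Int { return x + x }

// Hypothetical thunk: adapts a direct function to the indirect convention.
func reabstract(_ function: @escaping DirectFn) -> IndirectFn {
    return { argumentAddress, resultAddress in
        let argument = argumentAddress.pointee   // load the trivial value from its address
        let result = function(argument)          // call the wrapped function directly
        resultAddress.initialize(to: result)     // store into the indirect return slot
    }
}

let thunk = reabstract(double)

// The caller only knows the indirect convention, so it works through memory.
let argumentSlot = UnsafeMutablePointer<Int>.allocate(capacity: 1)
let resultSlot = UnsafeMutablePointer<Int>.allocate(capacity: 1)
argumentSlot.initialize(to: 21)
thunk(UnsafePointer(argumentSlot), resultSlot)
let output = resultSlot.pointee
argumentSlot.deallocate()
resultSlot.deallocate()

print(output)   // 42
```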

If you have ever seen re-abstraction thunks in a debugger backtrace, you now know what they are. For the most part you can ignore them; they simply mean that you are using generics and Swift is doing some magic behind the scenes.

Re-abstraction thunks are implemented in lib/SILGen/SILGenPoly.cpp. The main entry points are SILGenFunction::emitOrigToSubstValue() and SILGenFunction::emitSubstToOrigValue().

Conceptually, these operations take a substituted type S and an original type O:

emitOrigToSubstValue() converts a value of type S at abstraction level O to a value of type S at abstraction level S.
emitSubstToOrigValue() converts a value of type S at abstraction level S to a value of type S at abstraction level O.

In practice, these functions take a pair of types and a pair of abstraction patterns, because the same machinery is also used to thunk function conversions of various kinds. For example, Int is a formal subtype of Any, so () -> Int is a formal subtype of () -> Any. When a () -> Int is passed as a value of type () -> Any, the function must be wrapped in a thunk that calls the function, wraps the result in an existential, and returns it to the caller.
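The function conversion mentioned above is an ordinary assignment at the source level. A minimal Swift sketch (the function name answer is made up for the example):

```swift
func answer() -> Int { return 42 }

// Accepted because Int is a formal subtype of Any, so () -> Int
// converts to () -> Any.
let erased: () -> Any = answer

// The Int comes back wrapped in an existential and can be unwrapped again.
let value = erased()
print(value is Int, (value as? Int) == 42)   // true true
```

Nothing in the source hints at a thunk; the wrapping of the result happens in the code SILGen emits for the conversion.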

Lowered metatypes

If none of this makes sense, don’t sweat it. It took me some time to understand how function types are lowered and why re-abstraction is needed, and I probably didn’t do a good job of explaining it. Metatypes involve something similar, but are easier to describe.

A value of metatype type must uniquely identify a formal type at run time. A value of type NSObject.Type can hold any subclass of NSObject, of which there are thousands. Class metatypes are therefore lowered to a pointer to a runtime metatype object.

A value of type Int.Type, however, only has to uniquely identify a subtype of Int, and there is exactly one, Int itself. So Int.Type does not need to be stored at all, and is in fact lowered to an empty value.

But once again, lowering differs before and after substitution. If I have a generic type parameter T, then T.Type must be lowered to a pointer to a runtime value. If I use such a value in a context where the substitution T := Int has been applied, how can I store an Int.Type in it when the lowered Int.Type value is empty?

As before, the answer is that metatypes must be lowered according to an abstraction pattern, which tells SIL the most general context in which the value will be used. The resulting metatype is annotated with a convention.

// Original type:    NSObject.Type
// Substituted type: NSObject.Type
$@convention(thick) NSObject.Type

// Original type:    Int.Type
// Substituted type: Int.Type
$@convention(thin) Int.Type

// Original type:    T.Type
// Substituted type: Int.Type
$@convention(thick) Int.Type

A thick metatype has a run-time representation, but a thin metatype does not. Note that class metatypes are always thick (arguably the metatypes of final classes could be thin, but we don’t do that today). Only value types can have thin metatypes.

Metatypes never need re-abstraction thunks, only thickness conversions. Converting from thin to thick loads the runtime value that corresponds to the unique compile-time metatype. Converting from thick to thin simply discards the runtime value, because we know statically which unique value it must be.
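The difference between thin and thick metatypes shows up in their storage size. The following Swift sketch uses hypothetical classes Animal and Dog in place of NSObject so that it needs no Foundation import; describe(_:) is likewise a made-up helper.

```swift
class Animal {}
final class Dog: Animal {}

let pointerSize = MemoryLayout<UnsafeRawPointer>.size
let intMetatypeSize = MemoryLayout<Int.Type>.size        // thin: nothing to store
let classMetatypeSize = MemoryLayout<Animal.Type>.size   // thick: pointer to a runtime object

// A class metatype value really can vary at run time.
let runtimeClass: Animal.Type = Dog.self

// In generic code, T.Type needs a runtime representation even when T is Int.
func describe<T>(_ metatype: T.Type) -> String {
    return String(describing: metatype)
}
let described = describe(Int.self)

print(intMetatypeSize, classMetatypeSize == pointerSize)   // 0 true
print(runtimeClass == Dog.self, described)                 // true Int
```

Passing Int.self to describe is an instance of the thin-to-thick conversion described above: the compile-time-unique metatype has to be materialized as a runtime value for the generic callee.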

One last thing: SIL box types

You may have seen SILBoxTypes in SIL code. They look like this:

$@box Int

A box type is a heap-allocated container for a value. Boxes appear as the types of captured mutable variables and of the payloads of indirect enum cases. In the former case the payload is shared and has mutable reference semantics; in the latter it is immutable and behaves as a value. Unfortunately, we can’t distinguish between the two, which may or may not be a problem.
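Both uses of boxes correspond to familiar Swift source constructs. The sketch below shows one of each; makeCounter and IntList are hypothetical names, and whether a particular captured variable really ends up in a heap box is up to the compiler and its optimizations.

```swift
// A captured mutable variable: both closures share a single storage slot.
func makeCounter() -> (increment: () -> Void, read: () -> Int) {
    var count = 0
    return (increment: { count += 1 }, read: { count })
}

// An indirect enum: the payload is stored out of line but behaves like a value.
indirect enum IntList {
    case empty
    case node(Int, IntList)

    var sum: Int {
        switch self {
        case .empty:
            return 0
        case .node(let head, let tail):
            return head + tail.sum
        }
    }
}

let counter = makeCounter()
counter.increment()
counter.increment()

let list = IntList.node(1, .node(2, .empty))
print(counter.read(), list.sum)   // 2 3
```

The counter shows the shared, mutable flavor (two closures observe the same count), while the list shows the immutable, value-like flavor.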

Substitutions with SIL types

Sometimes it is necessary to perform substitution on SIL types that contain dependent types, in order to produce fully concrete SIL types.

Recall that a substitution maps GenericTypeParamTypes to formal types. Also, when a SILType contains a BoundGenericType, the arguments of the BoundGenericType are formal types, not lowered SIL types.

The elements of a tuple type, on the other hand, are lowered SIL types. So when we apply the substitution T := Int.Type, we expect SIL type substitution to behave as follows:

$Array<T> => $Array<Int.Type>
$(T, T)   => $(@convention(thick) Int.Type, @convention(thick) Int.Type)

The logic for SIL type substitution can be found in the SILType::subst() method. It handles the second case correctly by lowering the result of each substitution, using the original generic parameter as the abstraction pattern.

Optional payloads are maximally abstract

You now know enough to understand an interesting limitation of the current implementation. Recall the definition of the Optional type in Swift:

enum Optional<T> {
  case some(T)
  case none
}

If I store a function value of type (Int) -> Int in an optional payload, someone else might operate on it as if it were a (T) -> Int, an (Int) -> T, or a (T) -> T. Functions stored in optionals must therefore be maximally abstract.

It’s a tradeoff: converting Optional<(Int) -> Int> to Optional<(T) -> T> requires no work at run time, at the cost of always storing the payload in the latter, maximally abstract form.

It would be good to change this at some point, so that optional payloads can be re-abstracted. That will require additional special support for optionals in the compiler, but they are already special-cased in semantic analysis anyway.

For now, though, it is important to remember that when you work with optionals in SIL, you must never lower the payload type the ordinary way. For example, if you call SILType::getAnyOptionalObjectType() on an optional SILType, you must then lower the result using the maximally opaque abstraction pattern, AbstractionPattern::getOpaque(). It is easy to forget to lower the payload type, or to lower it with the type itself as the abstraction pattern; either mistake can easily cause SIL verifier failures or miscompiles.

Next steps

SIL types give IRGen more detail about function calling conventions and about how values are stored in memory and passed around. At this point we still do not know the size and alignment of values, or how they map to machine registers. That is IRGen’s job, and I’ll try to explain it in the next article.

Conclusion

The SIL type system introduces the concept of an address on top of the formal type system. SIL types are constructed from formal types by type lowering. Type lowering classifies types into those that can be passed around trivially, those that can be loaded into registers but need special copy and destroy behavior, and address-only types that must always be passed indirectly. This classification is also reflected in the lowered parameters and results of function types. Two expressions may have the same formal type but different SIL types, in which case SILGen knows to emit the conversions that bridge the abstraction difference. Lowered metatypes and SIL boxes round out the SIL type system.

How to talk to your kids about SIL type use

Add a section to SIL.rst describing the SIL Ownership Model. #13546

Swift Intermediate Language (SIL)

Ownership SSA and Safe Interior Pointers

SIL Ownership Model