A deep dive into API design and native calling conventions

Original address: medium.com/dartlang/im…

Original author: medium.com/dacoharkes

Published: June 9, 2021 (8-minute read)

In Dart 2.12, we extended the C interop feature, known as Dart FFI, to allow structs to be passed by value. This article describes the process of adding this functionality to the Dart SDK. If you’re interested in low-level language implementation details or platform conventions for passing structs by value, read on.

This article walks through how we developed the API and worked out the ABIs (Application Binary Interfaces) needed to implement structs by value. In the two years we’ve been working on this feature (and other Dart FFI features), we found a number of limitations in the API that required changes. The ABI tour is equally interesting, showing several ways to dig into the details of a tricky problem.

Passing by value and passing by reference in C/C++

If you don’t code in C every day, here’s a quick refresher. Suppose we have the following struct and functions in C.

struct Coord {
  double x;
  double y;
  Coord* next;
};

Coord TranslateByValue(Coord coord) {
  coord.x = coord.x + 10.0;
  coord.y = coord.y + 10.0;
  return coord;
}

void TranslateByPointer(Coord* coord) {
  coord->x = coord->x + 10.0;
  coord->y = coord->y + 10.0;
}

We can then use these functions in some simple C code. Let’s say we have a local variable c1.

Coord c1 = {10.0, 10.0, nullptr};

If we pass c1 to TranslateByValue, the argument is passed by value, so the callee effectively operates on a copy of the struct.

Coord c2 = TranslateByValue(c1);

That means c1 stays the same.

However, if we pass c1 by reference, using a pointer to the memory that contains c1, then c1 is changed in place.

TranslateByPointer(&c1);

c1.x now contains 20.0.
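
The difference between the two calls can be checked with a few comparisons. The following sketch repeats the struct and functions from above so that it compiles on its own; the helper CheckValueVersusPointer is a hypothetical name introduced only for this illustration.

```cpp
#include <cassert>

struct Coord {
  double x;
  double y;
  Coord* next;
};

// Receives a copy of the caller's struct; the caller's object is untouched.
Coord TranslateByValue(Coord coord) {
  coord.x = coord.x + 10.0;
  coord.y = coord.y + 10.0;
  return coord;
}

// Receives the address of the caller's struct and modifies it in place.
void TranslateByPointer(Coord* coord) {
  coord->x = coord->x + 10.0;
  coord->y = coord->y + 10.0;
}

// Hypothetical helper: returns true when both behaviors described in the
// text hold for a struct that starts at (10.0, 10.0).
bool CheckValueVersusPointer() {
  Coord c1 = {10.0, 10.0, nullptr};

  Coord c2 = TranslateByValue(c1);
  bool copy_translated = (c2.x == 20.0 && c2.y == 20.0);
  bool original_untouched = (c1.x == 10.0 && c1.y == 10.0);

  TranslateByPointer(&c1);
  bool original_translated = (c1.x == 20.0 && c1.y == 20.0);

  return copy_translated && original_untouched && original_translated;
}
```

Calling CheckValueVersusPointer() from any test or main function should return true.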

API design tour

The original Dart FFI prototype already supported passing pointers to structs. However, we redesigned the API several times to accommodate various use cases and limitations.

The original design

Our original design allowed us to allocate structs in memory, pass pointers to them to C, and modify the fields of the struct. In that design, a struct class extended the Pointer class.

@struct
class Coordinate extends Pointer<Void> {
  @Double()
  double x;

  @Double()
  double y;

  @Pointer()
  Coordinate next;

  /// generated by @ffi.struct annotation
  external static int sizeOf();

  static Coordinate allocate({int count: 1}) =>
    allocate<Uint8>(count: count * sizeOf()).cast();
}

final c = Coordinate.allocate()
  ..x = 10.0
  ..y = 10.0;

A Dart FFI user would write the preceding snippet, and Dart FFI would internally generate the implementation of sizeOf and the getters and setters for x, y, and next.

However, two years ago we realized there was a problem with this design: by having Coordinate extend Pointer, we could not distinguish between Coordinate and Coordinate*.

Distinguish between Coordinate and Coordinate*

We introduced Struct in Dart FFI and had user-defined structs extend this class.

abstract class Struct<S extends NativeType> extends NativeType {
  final Pointer<S> addressOf;
}

Now Pointer<Coordinate> in Dart represents Coordinate* in C, and Coordinate in Dart represents Coordinate in C.

This means that the next field has type Pointer<Coordinate>, which makes the @Pointer annotation redundant. So, we got rid of the Pointer annotation.

class Coordinate extends Struct<Coordinate> {
  @Double()
  double x;

  @Double()
  double y;

  Pointer<Coordinate> next;
}

Because pointers to structs were now represented as Pointer objects, we started using the allocation factory on Pointer.

final c = Pointer<Coordinate>.allocate();

To access a field through a Pointer<Coordinate>, we need an object of type Coordinate, because that is the type that has the x, y, and next fields. For this, Pointer already had a load method.

c.load<Coordinate>().x = 10.0;

Of course, it’s tedious to write <Coordinate> on every call to load. (The same was true for loading a Dart int from a Pointer to a native integer type.) The reason we needed this type argument on load was to tell the Dart type system the return type of the method.

Extension methods to the rescue

Dart 2.7 introduced extension methods. With an extension method, we can pattern match on the type argument T in Pointer<T>.

extension StructPointer<T extends Struct> on Pointer<T> {
  external T get ref;
}

Pattern matching on the type argument removes the verbosity at call sites.

c.ref.y = 10.0; // ref is pattern matched to be of type Coordinate.

Pattern matching in extension methods also makes the type parameter of Struct redundant, so the user’s struct definition becomes:

class Coordinate extends Struct {
  @Double()
  double x;

  @Double()
  double y;

  Pointer<Coordinate> next;
}

Previously, the type parameter constrained the type of the Struct field addressOf, which was a Pointer<S>. Instead, we changed that field into an extension getter.

extension StructAddressOf<T extends Struct> on T {
  external Pointer<T> get addressOf;
}

Stop leaking the backing store

When a struct is returned by value from C to Dart, we don’t want to keep it in C memory, because doing so would be slow and would burden the user with freeing that memory. Therefore, the struct is copied into a TypedData, and a Coordinate can have either a Pointer or a TypedData as its backing store.

However, addressOf, introduced in the first redesign, has type Pointer. That type says the struct is always backed by C memory, which is no longer true.

So, we scrapped addressOf.

Optimization

The final step was to require that all calls to Dart FFI methods, including the struct-specific ones, have compile-time constant type arguments.

extension StructPointer<T extends Struct> on Pointer<T> {
  /// Must invoke with a constant [T].
  external T get ref;
}

This requirement lets us optimize the code better and is more consistent with the semantics of C.

Note that this last change resulted in a deprecation notice in Dart 2.12 and is enforced in Dart 2.13.

ABI discovery tour

Now that the API is in place, the next question is: where does C expect these structs to be when they are passed or returned by value? This is defined by the Application Binary Interface (ABI).

Documentation

The most natural thing to do is look for documentation. For ARM, there is the Procedure Call Standard for the Arm Architecture (ABI 2019Q1), with a counterpart for AArch64. However, the official documentation for x86 and x64 has disappeared from the internet, leaving people to resort to unofficial mirrors or reverse engineering when they need this information.

A quick glance through the documentation shows a variety of locations in which structs are passed by value.

  • In multiple CPU and FPU registers.
  • On the stack.
  • As a pointer to a copy. (The copy is on the caller’s stack frame.)
  • Partly in CPU registers and partly on the stack.

When a struct is passed on the stack, there are further questions about what alignment is required and whether the remaining unused CPU and FPU registers are blocked or can be backfilled.

When a struct is returned by value, it can be returned in one of two ways.

  • In multiple CPU and FPU registers.
  • Written by the callee to a memory location, in which case the caller passes in a pointer to that location. This reserved memory is also on the caller’s stack frame.

When a pointer to the result location is passed in, there is also the question of whether that pointer takes up one of the normal CPU argument registers.

Refactoring the Dart FFI compilation

This initial investigation was enough for us to realize that we had to redesign part of the Dart FFI compiler pipeline. We had been reusing the Location type, which was originally designed for compiling Dart code to assembly.

However, the Dart ABI never uses stack locations that are not word-aligned, and never uses more than two registers at once. An experiment that tried to extend the Location type to support these additional locations resulted in a huge and complex diff, because Location is used heavily throughout the Dart VM.

So instead, we replaced the compilation pipeline for Dart FFI.

Exploring native ABIs

Let’s explore the ABIs.

Suppose we have the following struct and C function signature.

struct Struct3Bytes {
  uint8_t a0;
  uint8_t a1;
  uint8_t a2;
};

Struct3Bytes MyFunction(Struct3Bytes, Struct3Bytes, Struct3Bytes,
                        Struct3Bytes, Struct3Bytes, Struct3Bytes,
                        Struct3Bytes, Struct3Bytes);

How do the various ABIs pass these structs to MyFunction?

On Linux x64, there are six CPU argument registers. The struct is small enough to fit in a single register, so the first six arguments go into the six CPU argument registers, and the last two arguments go on the stack. Stack arguments are aligned to 8 bytes. The return value also fits in a CPU register.

rdi int64 Compound(size: 3)
rsi int64 Compound(size: 3)
rdx int64 Compound(size: 3)
rcx int64 Compound(size: 3)
r8 int64 Compound(size: 3)
r9 int64 Compound(size: 3)
S+0 Compound(size: 3)
S+8 Compound(size: 3)
=>
rax int64 Compound(size: 3)
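
The Linux x64 locations listed above can be reproduced with a small model. The following C++ sketch is illustrative only: the function name AssignSmallStructLocations is hypothetical, and it assumes that every argument is a struct of at most 8 bytes made of integer fields, which covers only a small corner of the real System V classification rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model (an assumption of this sketch): each argument is a small
// integer-only struct that fits in one register or one 8-byte stack slot.
std::vector<std::string> AssignSmallStructLocations(std::size_t argument_count) {
  static const char* const kArgumentRegisters[] = {"rdi", "rsi", "rdx",
                                                   "rcx", "r8",  "r9"};
  const std::size_t kRegisterCount = 6;
  const std::size_t kStackSlotSize = 8;  // stack arguments are 8-byte aligned

  std::vector<std::string> locations;
  std::size_t stack_offset = 0;
  for (std::size_t i = 0; i < argument_count; ++i) {
    if (i < kRegisterCount) {
      // The first six arguments each take one CPU argument register.
      locations.push_back(kArgumentRegisters[i]);
    } else {
      // Remaining arguments go on the stack, one 8-byte slot each.
      locations.push_back("S+" + std::to_string(stack_offset));
      stack_offset += kStackSlotSize;
    }
  }
  return locations;
}
```

For the eight arguments of MyFunction, this model yields the six registers followed by S+0 and S+8, matching the listing.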

So what happens on Windows?

It’s totally different. Windows has only four argument registers, and the first of them is used to pass a pointer to the memory location where the return value must be written. Also, all arguments are passed as pointers to copies, because the struct size is 3 bytes, which is not a power of 2.

Locations on Windows
Pointer(rdx int64) Compound(size: 3)
Pointer(r8 int64) Compound(size: 3)
Pointer(r9 int64) Compound(size: 3)
Pointer(S+0 int64) Compound(size: 3)
Pointer(S+8 int64) Compound(size: 3)
Pointer(S+16 int64) Compound(size: 3)
Pointer(S+24 int64) Compound(size: 3)
Pointer(S+32 int64) Compound(size: 3)
=>
Pointer(rcx int64, ret:rax int64) Compound(size: 3)
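
The size rule behind the Windows x64 listing can be written as a pair of predicates. This is a sketch under simplifying assumptions: both function names are hypothetical, and the model looks only at the struct size (1, 2, 4, or 8 bytes), ignoring the additional type requirements that the Microsoft convention places on structs returned in a register.

```cpp
#include <cassert>
#include <cstddef>

// True when a struct of this size is passed like an integer in a register
// (or stack slot). Other sizes are passed as a pointer to a copy.
bool IsPassedDirectlyOnWindowsX64(std::size_t struct_size) {
  return struct_size == 1 || struct_size == 2 || struct_size == 4 ||
         struct_size == 8;
}

// Hypothetical helper: how many of the four argument registers
// (rcx, rdx, r8, r9) remain for real arguments, given the size of the
// struct being returned. Size-only model; see the assumptions above.
std::size_t RegistersLeftForArguments(std::size_t return_struct_size) {
  const std::size_t kArgumentRegisterCount = 4;
  if (IsPassedDirectlyOnWindowsX64(return_struct_size)) {
    return kArgumentRegisterCount;  // result comes back in rax
  }
  return kArgumentRegisterCount - 1;  // rcx carries the result pointer
}
```

For the 3-byte struct, the first predicate is false and only three registers remain, which is why the listing starts the arguments at rdx.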

Let’s look at another example: ARM32 on Linux and Android. Suppose we have the following struct and C function signature.

struct Struct16Bytes {
  float a0;
  float a1;
  float a2;
  float a3;
};

Struct16Bytes MyFunction2(Struct16Bytes, float, Struct16Bytes);

Structs of this particular kind are called homogeneous composites, because all of their members have the same type. Homogeneous composites of floats with up to four members are treated differently from normal structs. In this case, Linux uses a floating-point register for each float in the struct.

Multiple(s0 float, s1 float, s2 float, s3 float) Compound(size: 16)
s4 float
Multiple(s5 float, s6 float, s7 float, s8 float) Compound(size: 16)
=>
Multiple(s0 float, s1 float, s2 float, s3 float) Compound(size: 16)

Android uses softfp instead of hardfp. This means that floats are passed in integer registers instead of floating-point registers. In addition, the result is returned through a pointer that the caller passes in. This leads to the strange situation where the first argument is passed partly in integer registers and partly on the stack.

M(r1 int32, r2 int32, r3 int32, S+0 int32) Compound(size: 16)
S+4 float
M(S+8 int32, S+12 int32, S+16 int32, S+20 int32) Compound(size: 16)
=>
P(r0 uint32) Compound(size: 16)
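
The split between registers and stack in the Android listing follows from handing out 4-byte words one at a time. The sketch below is a simplified model, not the full ARM procedure call standard: the function name AssignSoftFpLocations is hypothetical, and it assumes that every argument is a multiple of 4 bytes with 4-byte alignment and that the result pointer occupies r0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns, for each argument, the location of each of its 4-byte words.
// Assumptions of this sketch: no 8-byte-aligned arguments, and r0 is
// already taken by the pointer to the result.
std::vector<std::vector<std::string>> AssignSoftFpLocations(
    const std::vector<std::size_t>& argument_sizes) {
  const std::size_t kWordSize = 4;
  const std::size_t kCoreRegisterCount = 4;  // r0 through r3
  std::size_t next_register = 1;             // r0 holds the result pointer
  std::size_t stack_offset = 0;

  std::vector<std::vector<std::string>> locations;
  for (std::size_t size : argument_sizes) {
    std::vector<std::string> words;
    for (std::size_t word = 0; word < size / kWordSize; ++word) {
      if (next_register < kCoreRegisterCount) {
        words.push_back("r" + std::to_string(next_register));
        ++next_register;
      } else {
        // Registers are exhausted; the rest spills to the stack.
        words.push_back("S+" + std::to_string(stack_offset));
        stack_offset += kWordSize;
      }
    }
    locations.push_back(words);
  }
  return locations;
}
```

With argument sizes of 16, 4, and 16 bytes (the arguments of MyFunction2), the model places the first struct in r1, r2, r3, and S+0, the float at S+4, and the second struct at S+8 through S+20.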

Getting any one of these details wrong can lead to a segmentation fault at run time. Therefore, it is important to get every aspect of the ABI right for each combination of hardware and operating system.

Exploring with godbolt.org

Because the documentation is terse, we discovered many corner cases using Compiler Explorer (godbolt.org). Compiler Explorer shows C code and the compiled assembly side by side.

For example, the compiled output for sizeof(Struct3Bytes) on Windows x86 moves 3 into the return register eax, which shows that the struct is 3 bytes.

When we change the struct slightly, we can check whether the size is still 3.

typedef struct {
  int16_t a0;
  int8_t a1;
} Struct3Bytes;

The size is not 3: mov eax, 4. Because an int16 must be 2-byte aligned, the struct must be 2-byte aligned as well. This means that when an array of these structs is allocated, each struct is followed by 1 byte of padding so that the next struct is 2-byte aligned. Therefore, in the native ABI, the struct is 4 bytes.
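
The padding argument can be turned into a small calculation. The following sketch applies the usual C layout rules (each member is placed at the next offset that satisfies its alignment, and the total size is rounded up to the largest member alignment); the Member type and the function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size and alignment of one struct member, in bytes.
struct Member {
  std::size_t size;
  std::size_t alignment;
};

// Rounds value up to the next multiple of alignment.
std::size_t RoundUp(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Computes the size of a struct from its members, including the padding
// between members and the trailing padding needed for arrays.
std::size_t StructSize(const std::vector<Member>& members) {
  std::size_t offset = 0;
  std::size_t struct_alignment = 1;
  for (const Member& member : members) {
    offset = RoundUp(offset, member.alignment) + member.size;
    if (member.alignment > struct_alignment) {
      struct_alignment = member.alignment;
    }
  }
  return RoundUp(offset, struct_alignment);
}
```

Three 1-byte members give a size of 3, while a 2-byte member followed by a 1-byte member gives 4, in line with the two compiler outputs described above.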

Exploring with generated tests

Unfortunately, Compiler Explorer does not support macOS and iOS. So, to make manual exploration more efficient (and to provide a nice, large test suite for this feature), we wrote a test generator.

The main idea is to generate tests in such a way that, if they crash, GDB can be used to see what went wrong.

One way to make it easier to see what went wrong when you hit a segmentation fault is to use predictable, easily recognizable values for all arguments. For example, the following test uses consecutive integers, so these values are easy to spot in registers and on the stack.

void testPassStruct3BytesHomogeneousUint8x10() {
  final a0Pointer = calloc<Struct3BytesHomogeneousUint8>();
  final Struct3BytesHomogeneousUint8 a0 = a0Pointer.ref;
  final a1Pointer = calloc<Struct3BytesHomogeneousUint8>();
  // ...
  a0.a0 = 1;
  a0.a1 = 2;
  a0.a2 = 3;
  a1.a0 = 4; 
  // ...

  final result = passStruct3BytesHomogeneousUint8x10(
      a0, a1, a2, a3, a4, a5, a6, a7, a8, a9);
  print("result = $result");
  Expect.equals(465, result);

  calloc.free(a0Pointer);
  calloc.free(a1Pointer);
  // ...
}
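
The expected value 465 in the test follows from the consecutive integers: ten structs with three fields each hold the values 1 through 30, and those sum to 465. A minimal sketch of that arithmetic, with ExpectedSum as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Sums the field values when struct_count structs, each with
// fields_per_struct fields, are filled with 1, 2, 3, ... in order.
std::int64_t ExpectedSum(int struct_count, int fields_per_struct) {
  std::int64_t next_value = 1;
  std::int64_t sum = 0;
  for (int s = 0; s < struct_count; ++s) {
    for (int f = 0; f < fields_per_struct; ++f) {
      sum += next_value;
      ++next_value;
    }
  }
  return sum;
}
```

ExpectedSum(10, 3) evaluates to 465, the value the Dart test compares against.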

Another way to make problems easier to find is to add prints here and there. For example, if the transition from Dart to C doesn’t hit a segmentation fault but still manages to mess up the arguments, then printing the arguments helps.

int64_t
PassStruct3BytesHomogeneousUint8x10(Struct3BytesHomogeneousUint8 a0,
                                    Struct3BytesHomogeneousUint8 a1,
                                    // ...
                                   ) {
  std::cout << "PassStruct3BytesHomogeneousUint8x10"
            << "((" << static_cast<int>(a0.a0) << ", "
            << static_cast<int>(a0.a1) << ", " << static_cast<int>(a0.a2)
            << "), (" << static_cast<int>(a1.a0) << ", " << // ...

  int64_t result = 0;
  result += a0.a0;
  result += a0.a1;
  result += a0.a2;
  result += a1.a0;
  // ...

  std::cout << "result = " << result << "\n";
  return result;
}

Adding a test is as simple as adding a function type to a configuration file. The ability to quickly add tests results in a huge test suite.

Sure enough, the test suite caught another strange case in the native ABIs, this time on iOS ARM64. On iOS ARM64, non-struct arguments on the stack are aligned not to the word size but to their own size. Structs are aligned to the word size, unless the struct is a homogeneous struct containing only floats, in which case it is aligned to the size of a float.
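
The effect of aligning stack arguments to their own size rather than to the word size shows up in the resulting offsets. The sketch below is a simplified model that handles only non-struct arguments whose alignment equals their size (1, 2, 4, or 8 bytes); the function name StackOffsets and the align_to_word flag are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the stack offset of each argument. When align_to_word is true,
// every argument starts on an 8-byte boundary; when false, each argument
// is aligned to its own size, as described for iOS ARM64.
std::vector<std::size_t> StackOffsets(
    const std::vector<std::size_t>& argument_sizes, bool align_to_word) {
  const std::size_t kWordSize = 8;
  std::vector<std::size_t> offsets;
  std::size_t offset = 0;
  for (std::size_t size : argument_sizes) {
    const std::size_t alignment = align_to_word ? kWordSize : size;
    // Round the current offset up to the required alignment.
    offset = (offset + alignment - 1) / alignment * alignment;
    offsets.push_back(offset);
    offset += size;
  }
  return offsets;
}
```

For arguments of 1, 4, 2, and 8 bytes, word alignment gives offsets 0, 8, 16, and 24, whereas own-size alignment gives 0, 4, 8, and 16, so reading an argument from the wrong convention's offset yields garbage.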

Conclusion

This concludes our tour of API design and ABI discovery. With a good test suite and thorough code reviews, we landed support for passing structs by value on the main branch of Dart FFI in December 2020, and it’s available in Dart 2.12! If you’re interested in using Dart FFI, you can start with the C interop documentation on dart.dev. If you have any questions or comments about the API design or the ABI exploration, feel free to leave them in the comments below. We’d love to hear from you!

Thanks to the Dart language team and the Dart VM team for contributing to this Dart FFI feature, and thanks to Kathy Walrath and Michael Thomsen for shaping this blog post.

