Author: Chen Tian/Post editor: Zhang Handong

The original link: mp.weixin.qq.com/s/SJsEurfZr…

In Fundamentals of Generic Programming[1], Alexander Stepanov (founder of the concept of generics) describes the history of generalized computer technology in an elegant passage:

So far, he says, we have taken three steps:

  • The first step, a generic computer architecture, is to think of memory as a contiguous set of addressable Spaces
  • Step 2, common computer language: use Pointers as identifiers for uniform reference types
  • Step 3: Generic programming

Today we’re going to talk about generic programming.

Generalized Generic programming is divided into two parts: Generic Data Types, or Parameterized Types, and Generic functions.

Parameterized type

Let’s look at parameterized types first. Parameterized type refers to the definition of a data structure or type without specifying a specific type. Instead, it is used as a parameter, making the definition applicable to all specific types. The benefit of parameterized typing is that languages can be more expressive (somewhat similar to dynamically typed languages) while still maintaining full static type safety.

A less apt but more understandable analogy: Types are to data what generics are to general types. We abstract the core elements from the concrete data and construct the types that can contain the data. U8, for example, covers all possible values from 0 to 255. We further extract the common elements from the types to construct meta-types that can contain these types. For example, the metatype Add, which includes u32, F64, usize and even Complex, can be added.

Let’s look at a practical example.

We’re going to implement a data structure that reads a piece of data from a file, puts it in a cache, and then does a bunch of things to that data. We can easily define a structure like this:

struct MyReader {
  reader: File,
  buf: Vec<u8>,
}

impl MyReader {
  pub fn process(&mut self) {
    self.reader.read(&mut self.buf);
    // further processing on buf
  }
}
Copy the code

However, if the requirement reads not only from a file, but also from a network, from an encrypted data stream, or even from tape, doing the same caching and processing, then we have to use a nearly identical data structure for each requirement. If we use parameterized types, we can define them as follows:

struct MyReader<R> {
  reader: R,
  buf: Vec<u8>
}

impl<R> MyReader<R> {
  pub fn process(&mut self) {
    self.reader.read(&mut self.buf);
    // further processing on buf
  }
}
Copy the code

From the code, we can see more clearly that “the benefit of parameterized typing is that the language can be more expressive while still maintaining full static type safety.” It’s a very powerful tool.

But this poses a problem for the compiler: how does the compiler know at compile time that a Reader can perform a read() operation?

Can’t. Here, the argument R can be of any type, and most types do not directly support the read() operation. So, here we need to restrict the parameter R. The process is similar to the definition of a function:

Fn add(a: usize, b:) // error - We do not know how to allocate memory for a and b during function calls, so we need to restrict a and b further. Usize) // ok - Now we know the limits of A, B (memory size, allowed operations, etc.)Copy the code

Different languages use this in different ways. Java can require

Rust uses traits and Swift uses Protocol, but the goal is the same: the compiler needs enough information to determine if the code above compiles.

If you compile the above code with the Rust compiler, the compiler will give detailed errors:

It even recommends the right traits for you to limit R to, knowing you better than your girlfriend (or boyfriend) does.

Generic function

Static dispatching

Functions operate on objects that are types. When a data type uses a generic type, functions that use it as arguments or return values are also called generic functions. For example:

fn generic<T>(t: T) { todo! () } fn main() { generic::<u8>(42); generic::<&str>(&"hello"); }Copy the code

In the two calls to generic() from main(), the u8 type is used the first time and the &str type is used the second time. The compiler captures the type used at compile time and processes it accordingly, which is called static dispatch. Different languages treat static dispatch quite differently. In Rust, the method of processing is called monomorphization — in English, the compiler compiles a copy of all the types used in your code. For this example, you can imagine the compiler generating code similar to the following:

fn main() {
  generic_int(42);
  generic_ref_str(&"hello");
}
Copy the code

In addition to Rust, C++ also uses singularization to handle generic functions, except that the template C++ uses does not carry a type and is expanded directly during compilation.

Java treats generics differently than Rust does. Java erases types and compiles code like this:

void main() {
  generic((Object) 42);
  generic((Object) "hello");
}
Copy the code

Obviously, the Java approach incurs a performance penalty at runtime (additional Typecast going back and forth between specific types and objects), and because types are erased, it is difficult to make specific optimizations for the results of each generics compilation.

Dynamic dispatch

Static dispatch is great and efficient, but many times, types can be difficult to determine at compile time. Let’s say I want to write a formatting tool, which is common in ides. We can define a Formatter interface and create a series of implementations:

pub trait Formatter { fn format(&self, input &mut str) -> bool; } struct MarkdownFormatter; impl Formatter for MarkdownFormatter { fn format(&self, input &mut str) -> bool { todo! () } } struct RustFormatter; impl Formatter for RustFormatter { fn format(&self, input &mut str) -> bool { todo! () } } struct HtmlFormatter; impl Formatter for HtmlFormatter { fn format(&self, input &mut str) -> bool { todo! ()}}Copy the code

What formatting method to use can only be determined once we open the file and parse its contents. It is not possible to give a specific type at compile time that satisfies functions such as (a file may have one or more formatting tools, such as rust code in a Markdown file, So we need MarkdownFormatter and RustFormatter for formatting, so here we use a Vec to provide all the formatting tools needed) :

pub fn format(input: &mut str, formatters: Vec<???>) {
    for formatter in formatters {
        formatter.format(input);
    }
}
Copy the code

Normally, the types in the Vec<> container need to be consistent, but we can’t give a consistent type here.

So we need a way to tell the compiler that we need and only need any data type that implements the Formatter interface. In Rust, this type is called a Trait Object and is expressed as &dyn Trait or Box

. Here, the dyn keyword is simply used to help better distinguish between common types and Trait types. Thus, the above code can be written as:

pub fn format(input: &mut str, formatters: Vec<Box<dyn Formatter>>) { for formatter in formatters { formatter.format(input); }}Copy the code

In this way, we can construct a list of Formatters at run time, which we pass to the format function to format the file. This is called Dynamic Dispatching.

Although the Trait Object is a concept unique to Rust, it is not new. Let’s see how it works. For ease of introduction, let’s use the Write interface of the Rust library as an example:

pub trait Write { fn write(&mut self, buf: &[u8]) -> Result<usize>; fn flush(&mut self) -> Result<()>; fn write_all(&mut self, buf: &[u8]) -> Result<()> { ... }... }Copy the code

The Write interface contains several methods, of which Write () and flush() must be implemented. In the standard library, Vec

implements the Write interface.

When we want to use this interface for dynamic dispatch, we can assign a reference to a concrete type, such as Vec< U8 >, to the Write interface as shown in the following example:

This results in a Trait Object, whose underlying logic, as you can see above, is nothing more than a Fat pointer — a data structure that contains two Pointers. One pointer points to the data itself and the other to the virtual function table (VTable). This table contains specific types of information, such as size, aligment, and a series of function Pointers:

  • All methods supported by this interface (e.gwrite().flush()
  • A concrete type of drop trait, when a trait object (e.gBox<dyn Writer>It frees all resources it uses.

In this way, when writer.write() is executed, the corresponding function pointer can be found in the vtable to perform specific operations.

So the Trait Object in Rust is no mystery. It is just a variation of the well-known vtable in C++/Java. Of course, C++/Java Pointers to vtables are placed in the class structure at compile time, while Rust is placed in a Trait object. This is why Rust is easy to dynamically dispatch primitive types in a way that C++/Java cannot. In fact, Rust does not distinguish between basic types; for Rust, all types have the same status.

Most languages that implement dynamic dispatch through interfaces use Vtables to assist in dynamic invocation of interface methods, as does Golang (interfaces). Although Similar to Rust in many respects, Swift takes a relatively unique path in support of generics: Witness Table [2]. When I have time to talk about Swift, I can focus on Witeness Table, which is fun (can’t wait to see the video in Resources).

The following flow chart clearly illustrates how different languages implement static and dynamic dispatch, so you can read it carefully. If you read this picture, your understanding of generic functions should be clear enough:

(Sources: Models of Generics and Metaprogramming: Go, Rust, Swift, D and More[4])

The practice of generic programming

Generic programming is more of an idea than a technology. It is not only the type parameterization, function generization so simple, behind reflects the programmer’s abstract thinking ability. The ability of abstract thinking has nothing to do with language and tools. It is the ability that comes from continuous exploration, continuous learning and continuous practice. A language that supports generics doesn’t make you better at generic programming, just because you give me a Steinway doesn’t make me better at liszt’s bell.

The code on the left is the familiar C version of the binary_search algorithm that you can probably write blindfolded. On the right is a similar algorithm designed by Alexander Stepanov for the first version of C++ STL (called lower_bound because it not only uses binary search to return the result of a match, but also the position it should be in if it is not found) :

As you can see, the IMPLEMENTATION of the C version of the algorithm is tightly tied to the details of the parameters, while the Lower_bound version of Alex replaces all implementation details except the abstraction of the parameters with different functions — they have different implementations in the context of different data structures. In the case of distance(), for an array, it is the distance between two indexes; for a linked list, it is the number of hops between two nodes. In my opinion, the ability to distinguish between implementation details and core elements of the algorithm, and to defer implementation details as far back as possible (when the caller calls), is at the heart of generic programming. Alan Perlis has an excellent argument in Epigrams of Programming:

Functions delay binding: data structures induce binding. Moral: Structure data data in the programming process.

Let’s look at another example: the client and the server send a protobuf message, and both sides agree that the first four bytes of the message are the length of a protobuf, followed by a protobuf. We need to make a library to handle message sending and receiving: receive a complete message from the bottom layer and deserialize it to the upper layer, and when the upper layer needs to send a message, serialize the message, add length information and submit it to the bottom layer for sending out.

This requirement often starts with a defined underlying transport mechanism, such as WebSocket, and a defined upper-level message type. If according to the requirements of the direct processing, it is easy to do the system dead: the change of the upper message, will affect the design of the message sending and receiving library; Changes in the underlying transport mechanism, such as switching to HTTP/2 in another project, also affect the design of the message sending and receiving library. So if you don’t abstract your requirements well enough at first, you can get overwhelmed with changes later on.

We can extract the invariable parts: read with length/write with length, and serialization, deserialization. We try not to care what the underlying transport mechanism is when sending and receiving messages; And what kind of message will be sent from the top, do not care. The low-level transmission can be abstracted as Reader/Writer (Stream/Sink under Async), while the upper-level Message can be abstracted as Message. The data received and sent can be thought of as a Frame with different formats, so we can have a design like this:

Even frame structure changes can be quickly implemented without much fuss if they are well defined. You can see the specific code [5].

Sage moment

As I have stated in previous articles, the worldview formed during the development of a language greatly influences and limits the behavior and capabilities of the language itself. In pursuit of extreme efficiency (and zero-cost abstraction), Rust chose singleton over generics to handle static dispatch, which greatly affects compilation speed, and singleton means that Rust code does not distribute well in binary. Let other Rust code be called in the Rust ABI (instead of the C FFI). This also means that even though Rust can replace C in many cases, Rust may never replace C in the binary interface to operating system applications (ABI).

This post is a summary and review of Rust’s BBL, which I posted on Tubi last Thursday. The BBL talk is available on my Github: Tyrchen/Rust-Training, or click “Read the original.” Interestingly, On Saturday (4/30), Jon Gjengset covered almost the same theme: Dispatch and Fat Pointers[6] in the latest installment of his Crust of Rust series, which you can check out. Jon’s live sessions are great and definitely worth the time to learn.

This is the end of the series. Originally I only wanted to write three or four, but I didn’t expect to write eight. There are many other directions to continue writing, such as DEBUG, distributed, state machine, etc. :

But Rust isn’t the only thing in life. My Flutter 2 has just started and has yet to be completed. Swift and iOS development still has a lot of work to do. There are some exciting projects coming up in the Elixir community recently, and some students want me to talk about them. So, I’ll spend some time talking about other things. If you are interested in any topics related to software development, please feel free to discuss them with me.

The resources

[1] Fundamentals of Generic Programming:

[2] Implementing Swift Generics: youtu.be/ctS8FzqcRug

[3] How Swift Achieved Dynamic Linking the Where Rust Couldn ‘t: gankra. Making. IO/blah/Swift -…

[4] Models of Generics and Metaprogramming: Go, Rust, Swift, D and More: thume.ca/2019/07/14/…

[5] async – prost: github.com/tyrchen/asy…

[6] Crust of Rust: Dispatch and Fat Pointers: www.youtube.com/watch?v=xcy…

Gist.github.com/Kimundi/839…

The design of a language often affects the way it is used.

Swift uses Witness Tables to handle generics, while Rust copies specialized code at compile time for each scenario used, which makes some significant differences:

  • Rust compiles more slowly, while Swift compiles much faster
  • If you use Rust Library made by someone else, you need to integrate the source code to compile it, otherwise generics won’t work (you can only integrate it in C FFI form); Swift allows you to use third-party binaries without compromising the utility of generics.

This also illustrates the significant difference between Rust and Swift ecosystems — Swift was designed with the needs of potential business partners in mind.