Arenas in Rust

Translator: MATRIXKOO/Post editor: Zhang Handong

Arenas in Rust


Arenas memory pool in Rust

There’s been some discussion lately about Arenas in Rust, and I thought I’d write an article about it.

Arenas are not “typical” problems in Rust, so few people know about them. You’ll only see them in applications with various use cases. Normally, you just need to swap, and there’s no need to use unsafe for it. So there’s no need to learn to write it, but that knowledge isn’t useless, especially for people who use Arenas.

In addition, MY implementation of self-referential Arenas involves a series of really cool Lifetime operations that I haven’t written about at all.

I wrote it mainly to write about some cool lifecycle effects, but I thought it was worth writing an introduction to all Rustaceans. If you already know what Arenas is and want to see some cool Lifetime tricks, you can jump right here to read it.

What is an arena?

Arenas is essentially a pattern for grouping memory with the same expected lifetime. For example, sometimes you need to allocate a bunch of objects during a certain life cycle, after which they will all be destroyed. It is inefficient to call the system allocator every time, and it is preferable to pre-allocate a bunch of memory for an object and clean it up as soon as it is processed.

That’s right, cache

Broadly speaking, Arenas is used for two reasons:

First, as mentioned above, the main goal of using it may be to reduce memory consumption. For example, in a game or application, there may be a large number of cases that need to be allocated frame by frame and destroyed immediately after use. Especially in game development, this is very common, and memory stress is a big concern for game developers. With Arenas, you can easily assign an Arena, fill it up every frame, and empty it out when the frame is over. Cache locality has other benefits: it ensures that most objects per frame are in the cache for the duration of the frame (which may be used more than other objects) because they are allocated next to each other.

Another reason might be to write self-referential data, such as complex graphs with rings, with which the data can be wiped all at once. For example, when writing a compiler, type information may need to reference other types or other data, resulting in complex looped graphs that may be of Type. Once the type is derived, it does not need to be destroyed specifically, so you can use an Arenas to store all the calculated type information, and when the type information is irrelevant, it can be cleaned up all at once. Using this pattern lets code not worry about whether the self-referencing bits will be released “early,” it guarantees that if there is a Ty, it will live as long as all the other TYs and can refer to them directly.

Translator’s note: This does not result in empty references

The two goals are not necessarily related: you can use an Arenas to achieve both goals simultaneously. However, it is also possible to have an Arenas that forbids the use of self-referential types (you gain some, you lose some). Later in this article, I’ll implement an Arenas that allows self-referential types but does little to ease memory allocation, mainly for ease of implementation. In general, if you want to write Arenas for self-referential types, you can reduce allocator stress at the same time, but there may be trade-offs.

How is arena used in Rust?

Generally speaking, to use an arena, you just need to switch. I did a quick search for the existing Arenas implementation, which is here. I’m going to introduce two libraries that I already know about, but I’m just saying “two”.

It should be noted that if all you need is a ring structure and you don’t have to use ARENas, then a good PetGraph is usually sufficient. Slotmap is also good; It is a map-like data structure that can be used for self-referential data based on generational indexes.

Bumpalo

Bumpalo is a fast Bump Allocator [1] that allows heterogeneous content and loops only if you don’t care about the destructor running.

See also: [1] blog.codingnow.com/2013/11/bum…

use bumpalo::Bump;

// (example slightly modified from `bumpalo` docs)

// Create a new arena to bump allocate into.
let bump = Bump::new();

// Allocate values into the arena.
let scooter = bump.alloc(Doggo {
    cuteness: u64::max_value(),
    age: 8,
    scritches_required: true});// Happy birthday, Scooter!
scooter.age += 1;
Copy the code

Each call to Bump:: Alloc () returns a mutable reference to the allocated object. This can allocate different objects, and they can even refer to each other (without rings, borrowing checks would enforce this). By default, it does not call the destructor on its content. However, you can use Bumpalo ::boxed (or a custom allocator at Nightly) to achieve this effect. You can similarly use Bumpalo :: Collections to get the vectors and strings that Bumpalo supports. Bumpalo ::boxed will not allow self-referrals. x

typed-arena

[typed – arena] (docs. Rs/typed – arena… Areana allocator, which can only store objects of a single type, but can be referenced in a loop:

// Example from typed-arena docs

use std::cell::Cell;
use typed_arena::Arena;

struct CycleParticipant<'a> {
    other: Cell<OptionThe < &'a CycleParticipant<'a> > >,}let arena = Arena::new();

let a = arena.alloc(CycleParticipant { other: Cell::new(None)});let b = arena.alloc(CycleParticipant { other: Cell::new(None)});// mutate them after the fact to set up a cycle
a.other.set(Some(b));
b.other.set(Some(a));
Copy the code

Unlike Bumpalo, the gite-arena destructor is used when the arena itself is out of scope

You may be wondering how safe the destructor is with reference data – after all, the destructor reads the suspended reference whenever a variable is destroyed a second time. We’ll cover this later in the article, but it has to do with drop checking, especially if self-referencing is attempted, and the only explicit destructors allowed by the arena element itself will be destructors with the appropriate tag type.

Implement a self-referential arena

Writing self-referential code is interesting because Rust is very wary of self-referential data. But Areana lets you clearly separate the “I don’t care about this object” and “you can delete this object” phases to allow self-referencing and looping types.

People rarely need to implement their own arenas, and Bumpalo and Typedarena cover most usage scenarios, so check out crates. IO first. However, if you really need a direct implementation, or are interested in specific lifecycle details, then this section is for you.

The key to implementing an Arena Arena with Entry Entry in the following rules:

  • ArenaEntryShould have life cycle parameters:Arena <'arena>Entry <'arena>
  • ArenaMethods should beArena <'arena>To receive for& 'arenaSelf, that is, its own type is< '&' arena arena arena >
  • EntryAlmost always< '&' arena Entry arena >(Creating aliases for this is useful)
  • Use internal variability;ArenaOn the& mut selfWill stop all code compilation. If you are usingunsafeThe variability of, please make sureRefCell <Entry <'arena >>withPhantomData 。

That’s basically it from a lifecycle point of view, all that’s left is to identify the required API. Knowing the rules above, you don’t need to know what the underlying life cycle is like as long as you ensure that the definition area is used with the required guarantees.

Let’s look at an implementation and then dissect how it works.

My library elsa implements an arena using 100% safe code in one of the examples. Since Elsa :: FrozenVec requires its content to be placed after indirect references, the arena is not economical with allocation, and it is not generic, but it is a reasonable way to illustrate how life cycles work without the hassle of using unsafe.

This example implements an arena of type Person <‘arena>, arena <‘arena>. The goal is to achieve some kind of directed social graph that may have loops.

use elsa::FrozenVec;

struct Arena<'arena> {
    people: FrozenVec<Box<Person<'arena> > >,}Copy the code

Elsa ::FrozenVec is an appending only abstraction similar to Vec that lets you call push without passing in mutable references, an implementation that uses only safe.

Each Person <‘arena> has a list of the people they follow, but also keeps track of the people they follow:

struct Person<'arena> {
    pub follows: FrozenVec<PersonRef<'arena> >,pub reverse_follows: FrozenVec<PersonRef<'arena> >,pub name: &'static str,}// following the rule above about references to entry types
type PersonRef<'arena> = &'arena Person<'arena>;
Copy the code

This life cycle arena is actually the “life cycle of the arena itself”. This is where things get weird: usually, if a life cycle parameter is available, the caller can choose what’s in it. Rather than just saying “this is the life cycle of the object itself,” callers can usually instantiate arena <‘static> or arena <‘a> for some ‘a ‘as needed. But here we declare that “an arena is the life cycle of an arena itself”; It was obvious that something was not quite right.

This is where we actually implement:


impl<'arena> Arena<'arena> {
    fn new() -> Arena<'arena> {
        Arena {
            people: FrozenVec::new(),
        }
    }
    
    fn add_person(&'arena self, name: &'static str,
                  follows: Vec<PersonRef<'arena>>) -> PersonRef<'arena> {
        let idx = self.people.len();
        self.people.push(Box::new(Person {
            name,
            follows: follows.into(),
            reverse_follows: Default::default(),
        }));
        let me = &self.people[idx];
        for friend in &me.follows {
            // We're mutating existing arena entries to add references,
            // potentially creating cycles!
            // Add a reference to each element, which may result in circular references
            friend.reverse_follows.push(me)
        }
        me
    }

    fn dump(&'arena self) {
        // code to print out every Person, their followers, and the people who follow them
        // Print out 'Person', their followers, and the people they follow}}Copy the code

Notice the &’arena self in add_person.

This is A good implementation of the “if A cares about B, then B cares about A” case that would normally be handled separately, but this is just an example.

Finally, we can use arena like this:

fn main() {
    let arena = Arena::new();
    let lonely = arena.add_person("lonely".vec![]);
    let best_friend = arena.add_person("best friend".vec![lonely]);
    let threes_a_crowd = arena.add_person("threes a crowd".vec![lonely, best_friend]);
    let rando = arena.add_person("rando".vec![]);
    let _everyone = arena.add_person("follows everyone".vec![rando, threes_a_crowd, lonely, best_friend]);
    arena.dump();
}
Copy the code

In this case, all of the “variability” happens in the implementation of arena itself, but this code might add elements directly to the FOLLOWS/Reverse_FOLLOWS list, or Person might have RefCells for other types of links.

How does the life cycle work

So how does this work? As mentioned earlier, with such abstractions in Rust, callers are usually free to set the lifetime depending on how they are handled. For example, if HashMap

, then ‘a ‘will be adjusted based on the lifetime of what you are trying to insert.
,>

When constructing an Arena, its life cycle is indeed still unlimited, which we can test by examining the code that enforces the life cycle below. (Still compilable)

let arena: Arena<'static> = Arena::new();
Copy the code

You stop working when you want to do something:

let arena: Arena<'static> = Arena::new();
let lonely = arena.add_person("lonely".vec![]);
Copy the code
error[E0597]: `arena` does not live long enough
  --> examples/mutable_arena.rs:5:18| 4 | let arena: Arena<'static> = Arena::new(); | -------------- type annotation requires that `arena` is borrowed for `'static` 5 | let lonely = arena.add_person("lonely", vec! []); | ^^^^^ borrowed value does not live long enough ... 11 | } | - `arena` dropped here while still borrowedCopy the code

The add_person method somehow forces the Arena parameter of an Arena to be set to its own life cycle, thereby constraining it (and cannot force it to be anything else with type annotations). This is a clever interaction with add_person’s &’arena self-signature (i.e. Self is &’arena arena <‘self>), and the fact that ‘arena in arena <‘arena> is the same life cycle.

Typically, in Rust programs, the life cycle is “scalable.” The following code can be compiled:

// ask for two strings *with the same lifetime*
// Require strings with the same life cycle
fn take_strings<'a>(x: &'a str, y: &'a str) {}

// string literal with lifetime 'static
// Require a 'string literal' with a 'static' life cycle
let lives_forever = "foo";
// owned string with shorter, local lifetime
// Requires a 'local' life cycle
let short_lived = String::from("bar");

// still works!
/ / run
take_strings(lives_forever, &*short_lived);
Copy the code

In this code, Rust is pleased to note that while live_forever and &* short_Lived have different lifecycles, it is perfectly acceptable to pretend life_forever has a shorter lifetime during the lifetime of the take_strings function. This is just a reference, and using a reference with a long lifetime is also good for short lifetime situations.

The truth is, this scalability is not the same for all life cycles used! The Nomicon Chapter on Subtyping and Variance details why this is the case, but the general rule of thumb is that most life cycles are “constricted” (or covariant, to be more technical), as in &a STR above, But if some form of variability is involved, they are immutable, also known as “invariants.” If a function type is used, it has an elastic life cycle (that is, resistant to change), but this is rare.

Our Arena <‘ Arena > uses internal variability (via FrozenVec) to keep the ‘Arena unchanged. Let’s look at two lines of code again. When the compiler sees the first line of code below, arena is built, and we call its life cycle “A”. The type of Arena is Arena <‘? Word-wrap: break-word! Important; “> Represented by a representation, but with an unlimited life cycle.

let arena = Arena::new(); 
let lonely = arena.add_person("lonely".vec![]);
Copy the code

Let’s make the life cycle a little bit clearer

let arena = Arena::new(); // type Arena<'? >, lives for 'a

// explicitly write the `self` that gets constructed when you call add_person
// Explicitly write out the constructor function when add_person is called
let ref_to_arena = &arena; // type &'a Arena<'? >
let lonely = Arena::add_person(ref_to_arena, "lonely".vec![]);
Copy the code

Remember the second rule I listed earlier?

  • Arena methods should accept Arena <‘ Arena > as &’ Arena itself, which is of type &’ Arena Arena <‘ Arena > we follow this rule;

The add_person signature is fn add_person(&’arena self). This means that the ref_to_arena lifetime must match the &’arena arena <‘arena> mode. Currently, its life cycle is &’a Arena <‘? >, for ‘? Force the same as ‘a, the lifetime of the arena variable itself. If the life cycle is mutable, the compiler can compress other lifetimes to fit it, but it is constant, and the unrestricted life is forced into an exact life cycle.

With this neat trick, we can force the compiler to set the lifetime parameter of Arena <‘ Arena > to the lifetime of the instance.

After that, the rest of the work is very simple. Arena <‘ Arena > has elements of type Person <‘ Arena >, that is: “Person is allowed to refer to elements that have an ‘Arena lifecycle, such as Arena.”

Type PersonRef <‘arena> =&’arena Person <‘arena> is a convenient shorthand for “referencing a reference in an arena from which the object Person is allowed to be referenced.

How does the destructor work

So far, I haven’t discussed how to ensure security in the presence of destructors. If Arena has circular references and writes a destructor to read those circular references, it will result in dangling references during destruction.

This is where Rust is very vague. There is nothing you need to know, except that “explicit destructors subtly alter borrowing checking behavior.” But understanding the mechanisms here will help build a better mental model.

Add the following code to the arena example:

impl<'arena> Drop for Person<'arena> {
    fn drop(&mut self) {
        println!("goodbye {:? }".self.name);
        for friend in &self.reverse_follows {
            // potentially dangling!
            println!("\t\t{}", friend.name); }}}Copy the code

Error:

error[E0597]: `arena` does not live long enough
  --> examples/mutable_arena.rs:5:18| 5 | let lonely = arena.add_person("lonely", vec! []); | ^^^^^ borrowed value does not live long enough ... 11 | } | - | | | `arena` dropped here while still borrowed | borrow might be used here, when `arena` is dropped and runs the destructor for type `Arena<'_>`Copy the code

The existence of a destructor subtly alters the behavior of the borrowing inspector over the lifetime of self-referential data. The exact rules are tricky and explained in Nomicon, but what actually happens is that after there is a custom destructor on Person <‘arena>, The ‘arena ‘of the’ Person Arena ‘becomes a’ life cycle observed at destruction ‘. It is then taken into account during borrowing checks – knowing that an implicit call to DROP () at the end of the scope can read ‘arena’s data, Rust concludes appropriately that since destruction itself is a mutable operation, it is feasible to call drop() to read the content after destruction.

Of course, a reasonable question to ask is, how do you store something like Box or FrozenVec in an arena if the destructor doesn’t allow you to “wrap” data in ‘arena ‘?

The reason is that Rust knows that Box::Drop cannot check the contents of Person.follows because it doesn’t know what Person is and won’t try to know.

Of course, there are exceptions to everything, and since the destructor can call the specified trait method (or specialized method) to tell you how to read the contents of Person, if a random generic type provides this method, you can subtly change the behavior of the borrowing inspector again. Stdlib types and other custom data structures do this by escaping # [may_dangle] (also known as “eyepatch” after all, the destructor “can’t see” the life cycle) and declaring that no life cycle or generic parameters are read from the definition destructor.

This also applies to libraries such as typed-arena; If you need to create a circular reference, you will not be able to write a custom destructor for a type placed on an arena. But as long as you avoid creating circular references, you can write custom destructors using the typed-arena; Therefore, internal variability cannot be used to make one arena point to another.

Thanks to Mark Cohen and Nika Layzell for reviewing the draft of this article.

Contents: Rust Chinese Collection (Rust_Magazine) march issue