Gain a deep understanding of Rust ownership, borrowing, and lifetimes

Original article: Understanding Rust: ownership, borrowing, lifetimes

Author: Bugaevc

Translation source: the Gold Miner Translation Project

Permalink: github.com/xitu/gold-m…

Translator: Daning’s Luo Zhu

Proofreader: Hoarfroster, Zz Zhaojin

My experience with these concepts is that once you’ve mastered them, they all look natural and elegant.

I won’t start from scratch, and I won’t mechanically repeat the official documentation (well, sometimes I will 🙈). If you don’t know these concepts yet, you should read the corresponding chapters of the book first, because this article supplements the book rather than replacing it.

Also, I suggest you read this excellent article. It’s actually about a similar topic, but with a different focus, and it’s worth reading.

Let’s first talk about what resources are. Resources are precious, “heavy” things that can be acquired and released (or destroyed): a socket, an open file, a semaphore, a lock, a region of heap memory. Traditionally, all of these are created by calling a function that returns some kind of reference to the resource (a memory pointer or a file descriptor), and the programmer must explicitly release the resource (for example, close the file) once the program has finished using it.

There are problems with this approach. To err is human: it is easy to forget to release a resource, which leads to a leak (for memory, a memory leak). Worse, people may try to access a resource that has already been released (a use-after-free). If they are lucky, they get an error message, which may or may not help them find and fix the bug. Otherwise, the reference they hold (though logically invalid) may still point to a “memory location” that is now occupied by another resource: memory that now stores something else, a file descriptor reused by another open file, and so on. Accessing the old resource through the invalid reference can corrupt other resources or crash the program outright.

These issues are not hypothetical; they are with us all the time. For example, the Google Chrome release blog has a long record of fixed bugs and crashes caused by use-after-free, and a huge amount of effort goes into finding and fixing them.

This is not to say that developers are stupid or forgetful. The approach itself is error-prone: it requires you to release resources explicitly, but it does not force you to do so. Moreover, we usually don’t notice when we forget to release a resource, because the problem rarely has any noticeable effect.

Sometimes achieving a simple goal requires inventing a complex solution, which leads to even more complex logic. It’s hard not to get lost in a huge code base, and bugs pop up here and there until we get used to them. Most bugs are easy to spot, but resource-related ones are harder, and they can be very dangerous if a dangling pointer to a released resource gets exploited.

Of course, a new language like Rust won’t solve bugs for you, but it can successfully influence the way you think, bringing some architecture into your mind that makes this kind of error much less likely.

Rust gives you a safe and clear way to manage resources, and it doesn’t let you manage them any other way. It’s very strict, but isn’t that exactly what we want?

These limits are great for several reasons:

They make you think in the right way. After some experience with Rust, you’ll often find yourself trying to apply similar concepts even when they aren’t built into the syntax of other languages.
They make the code you write safer. With the exception of a few very rare edge cases, Rust guarantees that almost all of your code is free of the errors we’ve been talking about.
Rust feels almost as pleasant as a high-level, garbage-collected language (I’m not saying JavaScript is pleasant!), yet it is as fast and as close to the hardware as other low-level compiled languages.

With that in mind, let’s look at some of Rust’s advantages.

Ownership

In Rust, there are clear rules about which block of code a resource belongs to. In the simplest case, a block of code creates an object that represents a resource, and at the end of the block the object is destroyed and the resource is released. The important difference here is that the object is not some easily forgotten “weak reference” (such as a raw pointer or a file descriptor). Internally, the object is just a wrapper around that very same reference, but from the outside, it appears to be the resource itself. When execution reaches the end of the block that owns the resource, the resource is released automatically and predictably.

You no longer have to worry about forgetting to release resources! Because the behavior is automatic and predictable, it does exactly what you expect.
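A minimal sketch of this behavior, assuming a made-up Resource type whose Drop implementation stands in for releasing a real resource (in real code, closing a file or freeing memory). The shared log exists only so the order of events can be observed; Drop is the standard trait whose drop method Rust calls when an owner goes out of scope.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical resource: records when it is acquired and released.
struct Resource {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Resource {
    fn acquire(name: &'static str, log: Rc<RefCell<Vec<String>>>) -> Resource {
        log.borrow_mut().push(format!("acquire {}", name));
        Resource { name, log }
    }
}

impl Drop for Resource {
    // Called automatically when the owning variable goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        // `_r` (unlike `_`) keeps the value alive until the end of the block.
        let _r = Resource::acquire("socket", Rc::clone(&log));
        log.borrow_mut().push(String::from("use socket"));
    } // `_r` is dropped here: no explicit release call is needed
    log.borrow_mut().push(String::from("after block"));
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in run() {
        println!("{}", event);
    }
}
```

The release is recorded before "after block", i.e. exactly at the closing brace of the owning block.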

Now, you might ask, why am I describing these trivial and obvious things, instead of just telling you what smart people call RAII? All right, let’s keep talking.

This concept works well for temporary objects. Take writing some text to a file: a dedicated block of code (for example, a function) opens the file and gets a file object (a wrapper around a file descriptor), does some processing, and at the end of the block the file object is dropped, which closes the file descriptor.
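The file scenario might look roughly like this in Rust. This is a sketch: the file name and the helpers write_text and read_text are hypothetical. std::fs::File closes its descriptor when it is dropped, so there is no explicit close call, and the early returns produced by ? drop the file as well.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

// Hypothetical helper: opens (or creates) the file, writes, and lets Drop close it.
fn write_text(path: &Path, text: &str) -> io::Result<()> {
    let mut file = File::create(path)?; // acquire: open the file
    file.write_all(text.as_bytes())?;   // use it (an early return here also drops `file`)
    Ok(())
} // `file` is dropped here, which closes the file descriptor

// Hypothetical helper: the temporary File is closed at the end of the statement.
fn read_text(path: &Path) -> io::Result<String> {
    let mut contents = String::new();
    File::open(path)?.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() -> io::Result<()> {
    // Hypothetical file name in the system's temporary directory.
    let path = std::env::temp_dir().join("ownership_example.txt");
    write_text(&path, "some text")?;
    println!("{}", read_text(&path)?);
    Ok(())
}
```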

But there are many scenarios in which this concept doesn’t work. You might want to pass the resource to others, share it between several “users” or even between threads.

Let’s look at these cases. First, you might want to hand a resource over to someone else (transfer ownership), so that the recipient owns the resource, can do anything with it and, more importantly, is responsible for releasing it. Rust supports this very well; in fact, it’s what happens by default when you pass a resource to someone else.

fn print_sum(v: Vec<i32>) {
    println!("{}", v[0] + v[1]);
    // v is dropped here and its memory is freed
}

fn main() {
    let mut v = Vec::new(); // The resource is created here
    for i in 1..1000 {
        v.push(i);
    }
    // At this point, v is in use and
    // occupies roughly 4000 bytes of memory
    // -------------------
    // Transfer ownership to print_sum
    print_sum(v);
    // We no longer own or control v in any way;
    // attempting to access v here is a compile-time error
    println!("We're done");
    // Nothing is freed here,
    // because print_sum is now responsible for v
}

The process of transferring ownership is also known as moving, because the resource is moved from an old location (for example, a local variable) to a new one (for example, a function parameter). Performance-wise, only the “weak reference” is moved, so the process is fast. But as far as the code is concerned, it looks as if the entire resource really moves to the new location.
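A small check of that claim, under the assumption that Vec’s handle (a pointer to the heap buffer plus a length and a capacity) plays the role of the “weak reference” here: after a move, the new variable points at exactly the same heap buffer, so none of the elements were copied. The function name is made up for illustration.

```rust
// Hypothetical check: does moving a Vec copy its heap buffer?
fn move_keeps_buffer() -> bool {
    let v: Vec<i32> = (1..1000).collect();
    let before = v.as_ptr(); // address of the heap buffer owned by `v`
    let w = v;               // move: only the small handle is copied, not the 999 elements
    let after = w.as_ptr();  // `v` can no longer be used, but `w` owns the same buffer
    before == after
}

fn main() {
    println!("buffer stayed in place: {}", move_keeps_buffer());
}
```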

Moving is different from copying. Under the hood, both mean copying data (here, that would be the “weak reference”, if Rust allowed copying resources), but after a move, the contents of the original variable are considered no longer valid or meaningful. Rust actually treats the variable as “logically uninitialized”, that is, as full of garbage, just like a freshly declared variable. Using such a variable is forbidden (unless you reinitialize it with a new value), and it does not release the resource: whoever owns the resource now is responsible for cleaning it up when they are done.
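A sketch of the “logically uninitialized” state, using a hypothetical consume function: once v has been moved, reading it is a compile error, but assigning a fresh value to it is allowed (the variable must be declared mut for that).

```rust
// Hypothetical consumer: takes ownership and returns the length.
fn consume(v: Vec<i32>) -> usize {
    v.len()
} // `v` is dropped here, because `consume` owns it

fn reuse_after_move() -> (usize, usize) {
    let mut v = vec![1, 2, 3];
    let first = consume(v); // ownership moves into `consume`
    // println!("{:?}", v); // compile error: `v` has been moved
    v = vec![4, 5];         // but reinitializing the moved-from variable is fine
    let second = v.len();
    (first, second)
}

fn main() {
    println!("{:?}", reuse_after_move());
}
```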

Moves are not limited to passing arguments. You can move into a variable, and you can move into a return value. Likewise, you can move out of return values, variables, and function arguments. In short, moves happen basically wherever there is an assignment, implicit or explicit.
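The different places a move can happen, in one illustrative chain (the helper names make, pass_through and chain are hypothetical):

```rust
// Moves out of the argument and into the return value.
fn pass_through(s: String) -> String {
    s
}

// Moves the local's buffer out into the return value.
fn make() -> String {
    let local = String::from("data");
    local
}

fn chain() -> String {
    let a = make();          // move out of a return value into a variable
    let b = a;               // move from one variable into another
    let c = pass_through(b); // move into an argument, then out of a return value
    c                        // and finally into this function's return value
}

fn main() {
    println!("{}", chain());
}
```

The single heap buffer allocated in make travels through every step without being copied.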

While move semantics are a perfectly reasonable way to handle resources, as I’ll show later, they would be a disaster for plain old primitive numeric variables (imagine being unable to copy the value of one int variable into another). Fortunately, Rust has the Copy trait. Types that implement it (all primitive types do) use copy semantics on assignment, while all other types use move semantics. This is easy to control: if you want your own type to be copyable, you simply opt in by implementing the Copy trait (which also requires implementing Clone).

fn print_sum(a: i32, b: i32) {
    println!("{}", a + b);
    // The copies a and b are dropped here
}

fn main() {
    let a = 35;
    let b = 42;
    // Copy the values and pass the copies:
    // print_sum owns the copies, not the originals
    print_sum(a, b);
    // We still have full control over the original a and b
    println!("We still have {} and {}", a, b);
    // The original a and b are dropped here
}
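Opting in to Copy for your own type might look like this. Point and Label are made-up types for illustration; Copy can only be derived when every field is itself Copy, which is why a type holding a String stays move-only.

```rust
// Copy is only allowed when every field is Copy; it also requires Clone.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Holds a String (not Copy), so this type can only be moved.
struct Label {
    text: String,
}

fn shift(p: Point) -> Point {
    Point { x: p.x + 1, y: p.y + 1 }
}

fn copy_demo() -> (Point, Point) {
    let p = Point { x: 1, y: 2 };
    let q = shift(p); // `p` is copied into `shift`...
    (p, q)            // ...so the original is still usable here
}

fn main() {
    let (p, q) = copy_demo();
    println!("{:?} {:?}", p, q);

    let l = Label { text: String::from("hi") };
    let m = l; // a move: using `l` after this line is a compile error
    println!("{}", m.text);
}
```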

Now, why are move semantics useful at all? It might seem that everything would be perfect without them. Well, not exactly. Sometimes moving is the most logical thing to do. For example, the String::with_capacity function allocates a string buffer and returns it to the caller. Ownership is transferred, and the function no longer cares whether the buffer lives or dies. The caller has full control over the buffer, including the responsibility for releasing it.
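A sketch of the same idea with a hypothetical make_greeting helper built on String::with_capacity: the allocation happens inside the function, but the caller ends up owning, modifying, and eventually freeing the buffer.

```rust
// Hypothetical helper: allocates a buffer and hands ownership to the caller.
fn make_greeting(name: &str) -> String {
    let mut buf = String::with_capacity("Hello ".len() + name.len()); // allocated here
    buf.push_str("Hello ");
    buf.push_str(name);
    buf // ownership moves to the caller; nothing is freed here
}

fn main() {
    let mut g = make_greeting("Ferris"); // the caller now owns the buffer
    g.push('!');                         // ...and may modify it
    println!("{}", g);
} // `g` is dropped here, so it is the caller who frees the buffer
```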

It’s the same in C. A function like strdup allocates memory, hands its management over to you, and expects you to manage it and eventually free it. The difference is that all you get is a pointer, and all the documentation can do is ask you to call free() when you’re done. In C, the move semantics described above are almost impossible to enforce; in Rust, they are an integral part of the language.

Another example is an iterator method such as count, which consumes the iterator: accessing the iterator afterwards would make no sense.
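For instance (count_even is a made-up helper): Iterator::count takes self by value, so after the call the iterator has been moved and can no longer be used.

```rust
// Hypothetical helper: counts even numbers.
fn count_even(v: &[i32]) -> usize {
    let it = v.iter().filter(|&&x| x % 2 == 0);
    let n = it.count(); // `count` takes `self` by value: the iterator is consumed
    // it.next();       // compile error: `it` was moved into `count`
    n
}

fn main() {
    println!("{}", count_even(&[1, 2, 3, 4, 6]));
}
```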

The opposite question is when we need multiple references to the same resource. The most obvious use case is multithreading. If everything is done sequentially, move semantics might seem to always suffice. Still, it’s inconvenient to keep moving things back and forth.

Sometimes, even though the code runs strictly in order, it still feels as if several things are happening at once. Imagine iterating over a vector. The iterator could hand ownership of the vector back to you when the loop is done, but you wouldn’t have any access to the vector inside the loop, unless you juggled ownership between your code and the iterator on every iteration, which would be a mess. There also seems to be no way to traverse a tree without taking it apart and then putting it back together so it can be used for something else later.

On top of that, we wouldn’t be able to do multithreading, which is inconvenient, if not downright annoying. Thankfully, there’s a cool Rust concept that helps here: borrowing!

Borrowing

When a function takes a reference rather than the value itself as an argument, we don’t need to return the value to give back ownership, because in this case we never took ownership in the first place. Passing arguments to a function by reference like this is called borrowing. (from The Rust Programming Language)

There are several ways to think about borrowing:

It allows us to have multiple references to a resource while still adhering to the concept of “single owner, single responsibility.”
References are similar to pointers in C.
A reference is also an object. Mutable references are moved, and immutable references are copied. When a reference is dropped, the borrow ends (subject to the lifetime rules; see the next section).
In the simplest case, a reference behaves like moving ownership into a function and back out again, without doing it explicitly.

Here’s what the last one means:

// No borrowing: take ownership and give it back
fn print_sum1(v: Vec<i32>) -> Vec<i32> {
    println!("{}", v[0] + v[1]);
    // Return v to give ownership back
    // By the way, since Rust is expression-based, there's no need for the return keyword here
    v
}

// Borrowing, with explicit dereferencing
fn print_sum2(vr: &Vec<i32>) {
    println!("{}", (*vr)[0] + (*vr)[1]);
    // vr (the reference) is dropped here, which ends the borrow
}

// This is what you should do
fn print_sum3(v: &Vec<i32>) {
    println!("{}", v[0] + v[1]);
    // Same as print_sum2
}

fn main() {
    let mut v = Vec::new(); // Create a mutable vector
    for i in 1..1000 {
        v.push(i);
    }
    // At this point, v is in use and
    // occupies roughly 4000 bytes of memory

    // Pass ownership of v to print_sum1 and get v back afterwards
    v = print_sum1(v);
    // Now we own v again
    println!("(1) We still have v: {}, {}, ...", v[0], v[1]);

    // Take a reference to v and pass it to print_sum2 (borrow it)
    print_sum2(&v);
    // v can still be used
    println!("(2) We still have v: {}, {}, ...", v[0], v[1]);

    // This is still fine
    print_sum3(&v);
    println!("(3) We still have v: {}, {}, ...", v[0], v[1]);

    // v is dropped and freed here
}

Let’s see what happens here. In the first function, we could always transfer ownership, but we’ve convinced ourselves that sometimes this is not what we want.

In the second function, we take a reference to the vector and pass it to the function. Much like in C, we get to the object by dereferencing the reference. Since there are no complex lifetimes involved, the borrow ends as soon as the reference is dropped. Although this looks like the first example, there is an important difference: main keeps ownership of the vector and is only restricted in what it can do with it while it is borrowed. In this example, main doesn’t even get a chance to look at the vector while it is borrowed, so the restriction is no big deal.
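To make the “a reference is also an object” point from the list above concrete, here is a small sketch (reference_kinds is an arbitrary name). Shared references are Copy, so copying one leaves the original usable; a mutable reference bound with a plain `let m2 = m1;` is moved. As far as I know, the compiler often inserts an implicit reborrow instead of a move when a &mut is passed where a &mut type is already expected, so the move is easiest to see with an untyped let like this.

```rust
fn reference_kinds() -> i32 {
    let mut v = vec![1, 2, 3];

    // Shared references are Copy: `r1` stays usable after being copied into `r2`.
    let r1 = &v;
    let r2 = r1;
    let sum = r1[0] + r2[1];

    // Mutable references are not Copy: this `let` moves `m1` into `m2`.
    let m1 = &mut v;
    let m2 = m1;
    m2.push(4);
    // m1.push(5); // compile error: `m1` has been moved

    // All borrows have ended (their references are no longer used), so `v` is usable again.
    sum + v.len() as i32
}

fn main() {
    println!("{}", reference_kinds());
}
```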

The third function combines the best of both: like the first one, it needs no explicit dereferencing, and like the second one, it doesn’t mess with ownership. This is possible thanks to Rust’s automatic dereferencing rules. They are a bit complicated, but in most cases they let you write code almost as if you were working with the referenced object itself, much like C++ references.
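A quick sketch of automatic dereferencing (describe is a hypothetical helper): explicit `(*v).len()` and plain `v.len()` are equivalent, and method calls even see through a reference to a reference.

```rust
// Hypothetical helper showing explicit vs automatic dereferencing.
fn describe(v: &Vec<i32>) -> (usize, i32) {
    let explicit = (*v).len(); // explicit dereference, as in print_sum2
    let implicit = v.len();    // auto-deref: the method is found through the reference
    assert_eq!(explicit, implicit);
    (implicit, v[0])           // indexing also works directly on the reference
}

fn main() {
    let v = vec![10, 20, 30];
    let r = &v;
    let rr = &r;              // a reference to a reference
    println!("{}", rr.len()); // method calls see through both layers
    println!("{:?}", describe(&v));
}
```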

Here’s another example:

// Takes v by immutable reference
fn count_occurences(v: &Vec<i32>, val: i32) -> usize {
    v.iter().filter(|&&x| x == val).count()
}

fn main() {
    let v = vec![2, 9, 3, 1, 3, 2, 5, 5, 2];
    // Borrow v for iteration
    for &item in &v {
        // The first borrow is still active;
        // here we borrow v a second time
        let res = count_occurences(&v, item);
        println!("{} is repeated {} times", item, res);
    }
}

You don’t need to care what happens inside count_occurences; just know that it borrows the vector (again, without moving it). The loop also borrows the vector, so we have two borrows active at the same time. The vector is dropped when main ends, right after the loop.
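A variation on the example above, as a sketch (totals is hypothetical): any number of shared borrows may be active at once, but mutating v inside the loop would need a mutable borrow while the shared ones are alive, so it is rejected; after the loop, mutation is allowed again.

```rust
// Same helper as in the example above.
fn count_occurences(v: &Vec<i32>, val: i32) -> usize {
    v.iter().filter(|&&x| x == val).count()
}

// Hypothetical helper: counts items that occur more than once, then mutates v.
fn totals() -> (usize, usize) {
    let mut v = vec![2, 9, 3, 1, 3, 2, 5, 5, 2];
    let mut repeated = 0;
    for &item in &v {                        // shared borrow #1 (the loop)
        if count_occurences(&v, item) > 1 { // shared borrow #2, while #1 is active
            repeated += 1;
        }
        // v.push(item); // compile error: cannot borrow `v` as mutable while it is borrowed
    }
    v.push(7); // the loop's borrow has ended, so a mutable borrow is allowed again
    (repeated, v.len())
}

fn main() {
    println!("{:?}", totals());
}
```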

I’m cheating a little here: I said earlier that multithreading is the main reason for having multiple references, yet all the examples I’ve shown are single-threaded. If you’re really interested, you can read up on the details of multithreading in Rust.
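For a taste of the multithreaded case, here is a sketch that assumes Rust 1.63 or later, where std::thread::scope lets spawned threads borrow local data. Two threads hold shared borrows of the same slice at the same time; parallel_sum is a hypothetical helper.

```rust
use std::thread;

// Hypothetical helper: sums the two halves of `v` on two threads that both borrow it.
fn parallel_sum(v: &[i32]) -> i32 {
    let (left, right) = v.split_at(v.len() / 2);
    thread::scope(|s| {
        let h1 = s.spawn(|| left.iter().sum::<i32>());  // shared borrow on thread 1
        let h2 = s.spawn(|| right.iter().sum::<i32>()); // shared borrow on thread 2
        h1.join().unwrap() + h2.join().unwrap()
    }) // all scoped threads are joined before `scope` returns, so the borrows end here
}

fn main() {
    let v: Vec<i32> = (1..=100).collect();
    println!("{}", parallel_sum(&v));
}
```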

Taking and dropping references seems to just work, as if a garbage collector were involved. But that’s not the case: all of this is checked at compile time. For that, Rust needs one more magical concept. Let’s look at the following example:

fn middle_name(full_name: &str) -> &str {
    full_name.split_whitespace().nth(1).unwrap()
}

fn main() {
    let name = String::from("Harry James Potter");
    let res = middle_name(&name);
    assert_eq!(res, "James");
}

This can be compiled successfully, but the following code cannot be compiled:

fn middle_name(full_name: &str) -> &str {
    full_name.split_whitespace().nth(1).unwrap()
}

fn main() {
    let res;
    {
        let name = String::from("Harry James Potter");
        res = middle_name(&name);
        // name is dropped here, and its buffer is freed
    }
    assert_eq!(res, "James");
}

First, a word about string types. A String owns a string buffer, and a &str (string slice) is a view into either a String or some other memory (which doesn’t matter here).
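A small illustration of the difference (middle_slice is hypothetical): slicing a String does not copy any characters; the &str simply points into the String’s buffer, which is why it cannot outlive that String.

```rust
// Hypothetical helper: returns the middle name and whether the slice points into `owned`.
fn middle_slice() -> (String, bool) {
    let owned: String = String::from("Harry James Potter"); // owns a heap buffer
    let middle: &str = &owned[6..11];                      // borrows bytes 6..11 of that buffer
    let points_inside = middle.as_ptr() == owned.as_ptr().wrapping_add(6);
    (middle.to_string(), points_inside)
}

fn main() {
    let literal: &str = "Potter"; // a slice of static memory, not of a String
    let (middle, inside) = middle_slice();
    println!("{} {} {}", middle, inside, literal);
}
```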

To make this more concrete, here is something similar written in C:

By the way: in C, you can’t take a substring from the middle of a string without modifying it, because marking the end of the substring requires writing a terminating NUL into the string. So here we look for the last name instead.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

const char *last_name(const char *full_name)
{
    /* Point just past the last space */
    return strrchr(full_name, ' ') + 1;
}

int main(void) {
    char *buffer = strcpy(malloc(80), "Harry Potter");
    const char *res = last_name(buffer);
    free(buffer);
    /* BUG (intentional): res points into the freed buffer (use-after-free) */
    printf("%s\n", res);

    return 0;
}

Do you see it now? The buffer is freed before the result is used. This is a particular case of use-after-free. The C code compiles, and it may even run fine, as long as the printf implementation doesn’t happen to reuse that memory right away. In a less contrived example, however, this is a source of crashes, bugs, and security holes. That’s exactly what we talked about before introducing ownership.

In Rust, such code won’t even compile (I mean the Rust code above). This static analysis is built into the language and works through lifetimes.

Lifetimes

In Rust, resources have lifetimes. A resource lives from the moment it is created to the moment it is dropped. A lifetime is often thought of as a scope or block of code, but that is not quite accurate, because, as we’ve seen, resources can be moved between blocks. We cannot refer to objects that have not been created yet or have already been dropped, and we will shortly see how this requirement is enforced. Apart from that, none of this seems very different from the concept of ownership.

Here is the harder part. References, like other objects, have lifetimes, and these can differ from the lifetime of the borrow they represent (the so-called associated lifetime).

Let me rephrase that. A borrow may last longer than the reference that controls it. This is usually because another reference depends on the borrow being active: it borrows the same object, or just part of it, like the string slice in the example above.
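A sketch of a borrow outliving its reference (first_word_len is made up): the reference r is gone after the inner block, but first was derived from the same borrow of name, so the borrow stays active until first is last used.

```rust
// Hypothetical helper: the borrow of `name` outlives the reference `r`.
fn first_word_len() -> usize {
    let name = String::from("Harry James Potter");
    let first: &str;
    {
        let r: &String = &name; // the reference `r` represents a borrow of `name`
        first = &r[..5];        // `first` is derived from that same borrow
    }                           // `r` itself is gone here...
    // drop(name);              // ...but this would not compile: `first` still needs the borrow
    first.len()
}

fn main() {
    println!("{}", first_word_len());
}
```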

In fact, every reference carries the lifetime of the borrow it represents, which means that every reference has a lifetime. Like everything related to borrow checking, this is done at compile time and incurs no runtime overhead. Unlike the rest of it, though, you sometimes have to spell out lifetimes explicitly.

With all that said, let’s dig deeper in code:

fn middle_name<'a>(full_name: &'a str) -> &'a str {
    full_name.split_whitespace().nth(1).unwrap()
}

fn main() {
    let name = String::from("Harry James Potter");
    let res = middle_name(&name);
    assert_eq!(res, "James");

    // Will not compile:

    // let res;
    // {
    //     let name = String::from("Harry James Potter");
    //     res = middle_name(&name);
    // }
    // assert_eq!(res, "James");
}

In the previous examples we didn’t have to specify lifetimes explicitly, because the case is simple enough for the Rust compiler to infer them automatically (see [Lifetime Elision](doc.rust-lang.org/book/lifeti…#lifetime-elision)). Here, we’ve written them out anyway to show how they work.

The <'a> means that the function is generic over a lifetime, which we call 'a. That is, for any reference passed in with some associated lifetime, the function returns another reference with the same associated lifetime. As a friendly reminder, the associated lifetime is the lifetime of the borrow, not of the reference.

In practice, what this implies may not be obvious, so let’s look at it the other way around. The returned reference is stored in the res variable, which lives for the whole of main(). That is the lifetime of the reference, so the borrow must last at least that long. This means that the argument passed to the function must have that same associated lifetime, so we can conclude that name must stay borrowed for the rest of the function.

In the use-after-free example (commented out here), res still lives for the entire function, while name doesn’t live long enough for the borrow to stay valid that long. If you try to compile this code, you will, unsurprisingly, get a compile error.

The Rust compiler tries to keep borrows as short as possible, ideally ending them as soon as the reference is dropped (this is what I called the “simplest case” at the beginning of the borrowing section). Constraints of the form “this borrow must last at least this long” work in the opposite direction, stretching the borrow from the result back to the original borrow. The process stops once all the constraints are satisfied; if they can’t be satisfied, you get an error.

You can’t fool Rust by returning a borrow with a completely unrelated lifetime: you would get the same “does not live long enough” error, this time inside the function, because the unrelated lifetime could be much longer than the one passed in.

Let’s take a look at this example:

fn search<'a, 'b>(needle: &'a str, haystack: &'b str) -> Option<&'b str> {
    // Imagine some clever algorithm here
    // that returns a slice of the original string
    let len = needle.len();
    if haystack.chars().nth(0) == needle.chars().nth(0) {
        Some(&haystack[..len])
    } else if haystack.chars().nth(1) == needle.chars().nth(0) {
        Some(&haystack[1..len + 1])
    } else {
        None
    }
}

fn main() {
    let haystack = "hello little girl";
    let res;
    {
        let needle = String::from("ello");
        res = search(&needle, haystack);
    }
    match res {
        Some(x) => println!("found {}", x),
        None => println!("nothing found"),
    }
    // Prints "found ello"
}

The search function accepts two references with completely unrelated lifetimes. While haystack is constrained, the only requirement on needle is that its borrow stays valid while the function itself runs. After that, the borrow ends immediately and we can safely free the associated memory, while the result of the function stays intact.

haystack is initialized with a string literal. String literals are string slices of type &'static str (a borrow that is always valid). Therefore, we can keep res valid for as long as we need. This is an exception to the “borrows are as short as possible” rule. You can think of it as one more constraint on the borrowed string: a string literal’s borrow must last for the entire execution of the program.

Finally, instead of returning a bare reference, we return a compound object (an Option) that contains one. This is fully supported and doesn’t affect the lifetime logic.
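A variant of the search example, as a sketch: find_from is a hypothetical simplified search built on str::find. Because the haystack is a &'static str literal, the result can even be returned from the function that owned the needle.

```rust
// Hypothetical simplified search: the part of `haystack` starting at `needle`.
fn find_from<'a, 'b>(needle: &'a str, haystack: &'b str) -> Option<&'b str> {
    haystack.find(needle).map(|i| &haystack[i..])
}

// The literal is a `&'static str`, so the result may even outlive this function.
fn lookup() -> Option<&'static str> {
    let needle = String::from("little"); // freed when `lookup` returns
    find_from(&needle, "hello little girl")
}

fn main() {
    println!("{:?}", lookup());
}
```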

So, in this example, the function takes two arguments and is generic over two lifetimes. Let’s see what happens if we force both lifetimes to be the same:

fn the_longest<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1.len() > s2.len() { s1 } else { s2 }
}

fn main() {
    let s1 = String::from("Python");
    // Borrow explicitly so that the borrow outlives s2
    let s1_b = &s1;
    {
        let s2 = String::from("C");
        let res = the_longest(s1_b, &s2);
        println!("{} is the longest if you judge by name", res);
    }
}

I borrowed s1 explicitly, outside the inner block, so the borrow lasts for the rest of main(). That is clearly not the same lifetime as that of &s2. If the function only accepts two arguments with the same lifetime, why can we call it here?

It turns out that associated lifetimes are subject to type coercion. Unlike in most languages (at least the ones I’m familiar with), primitive (integer) values in Rust are never coerced implicitly; you always have to cast them explicitly. Instead, you find coercions in less obvious places, such as these associated lifetimes and dynamic dispatch with type erasure.
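Both halves of that statement in a short sketch (widen and show are hypothetical helpers): an i32 will not silently become an i64, but a &i32 is implicitly coerced to a &dyn Display trait object, erasing its concrete type for dynamic dispatch.

```rust
use std::fmt::Display;

// Integer widening must be spelled out.
fn widen(x: i32) -> i64 {
    // let y: i64 = x; // compile error: mismatched types, no implicit conversion
    x as i64
}

// Accepts any Display value through a trait object (dynamic dispatch).
fn show(item: &dyn Display) -> String {
    item.to_string()
}

fn main() {
    let n: i32 = 42;
    println!("{}", widen(n));
    // `&i32` and `&&str` are coerced to `&dyn Display` implicitly, erasing the concrete type.
    println!("{} {}", show(&n), show(&"text"));
}
```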

Let’s compare this with C++ code:

struct A {
    int x;
};

struct B: A {
    int y;
};

struct C: B {
    int z;
};

B func(B arg)
{
    return arg;
}

int main() {
    A a{};
    B b{};
    // This works fine: a value of B is a valid value of A.
    // In other words, a B can be used wherever an A is expected.
    a = b;
    // The other way around, however, is an error:

    // b = a;

    // This works too
    C arg{};
    A res = func(arg);
    return 0;
}

Derived types are coerced to their base types. When we pass an instance of C, it is converted to B; the returned B is then converted to A and stored in the res variable.

Similarly, in Rust, a longer borrow can be coerced into a shorter one. This doesn’t affect the borrow itself; it only applies where a shorter borrow is needed. So you can pass a function a borrow that lives longer than it expects (it will be coerced), and you can coerce the borrow it returns into a shorter one.
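A sketch of a longer borrow being coerced down, reusing the_longest from earlier with a string literal: the &'static str is accepted alongside a much shorter borrow, and the result only lives as long as the shorter one. The coerce_demo function is hypothetical.

```rust
// Same signature as in the earlier example: both borrows share the lifetime 'a.
fn the_longest<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if s1.len() > s2.len() { s1 } else { s2 }
}

// Hypothetical demo of a long borrow being coerced to a shorter one.
fn coerce_demo() -> usize {
    let long_lived: &'static str = "Python"; // valid for the whole program
    let len;
    {
        let short_lived = String::from("C");
        // 'a is inferred as the shorter borrow of `short_lived`;
        // `long_lived` is coerced down to it, and so is the result.
        let res = the_longest(long_lived, &short_lived);
        len = res.len(); // fine: `res` is only used inside this block
    } // `res` could not escape this block, even though it actually points to "Python"
    len
}

fn main() {
    println!("{}", coerce_demo());
}
```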

Consider this example again:

fn middle_name<'a>(full_name: &'a str) -> &'a str {
    full_name.split_whitespace().nth(1).unwrap()
}

fn main() {
    let name = String::from("Harry James Potter");
    let res = middle_name(&name);
    assert_eq!(res, "James");

    // Will not compile:

    // let res;
    // {
    //     let name = String::from("Harry James Potter");
    //     res = middle_name(&name);
    // }
    // assert_eq!(res, "James");
}

People often wonder whether such a function declaration means that the associated lifetime of the argument must be (at least) as long as that of the return value, or the other way around.

The answer should be obvious by now. From the function’s point of view, the two lifetimes are exactly the same. But thanks to coercion, you can pass in a longer borrow, and you can even shorten the associated lifetime of the result after you get it. So the correct answer is that the argument’s borrow must last at least as long as the return value’s.

Also, if you write a function that takes several arguments by reference and declare that they must have equal associated lifetimes (as in our previous example), the actual arguments will be coerced to the shortest lifetime among them. This simply means that the result cannot outlive any of the borrowed arguments.

This fits nicely with the reverse-constraint rule we discussed earlier. The callee doesn’t care about any of this: it just receives and returns borrows with the same lifetime.

The caller, on the other hand, makes sure that the associated lifetimes of the arguments are never shorter than that of the result, which it can do by extending them.

Tips

You can’t move out of a borrowed value, because it must remain valid after the borrow ends. You can’t move out of it even if you move something back in on the very next line. But [mem::replace](https://doc.rust-lang.org/std/mem/fn.replace.html) lets you do both at once.
If you want an owning pointer like C++’s unique_ptr, use the [Box](https://doc.rust-lang.org/std/boxed/index.html) type.
If you want basic reference counting, like C++’s shared_ptr and weak_ptr, you can use the reference-counting types in the standard library, such as Rc and Weak.
If you really need to get around the restrictions Rust imposes, you can always turn to unsafe code.
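A compact sketch of the first three tips, using only standard-library items (mem::replace, Box, Rc, Weak); take_name and tips_demo are hypothetical helpers.

```rust
use std::mem;
use std::rc::{Rc, Weak};

// mem::replace: move a value out from behind `&mut` by putting another value in its place.
fn take_name(slot: &mut String) -> String {
    // let old = *slot; // compile error: cannot move out of a mutable reference
    mem::replace(slot, String::from("(none)"))
}

fn tips_demo() -> (String, String, i32, usize, bool) {
    let mut name = String::from("Harry");
    let old = take_name(&mut name);

    // Box: a single owner of a heap allocation, comparable to unique_ptr.
    let boxed: Box<i32> = Box::new(41);
    let value = *boxed + 1;

    // Rc / Weak: reference counting, comparable to shared_ptr / weak_ptr.
    let shared = Rc::new(String::from("shared"));
    let weak: Weak<String> = Rc::downgrade(&shared);
    let second = Rc::clone(&shared);
    let strong = Rc::strong_count(&shared); // two strong owners: `shared` and `second`
    drop(shared);
    drop(second);
    let freed = weak.upgrade().is_none(); // the last strong owner is gone, so the value is freed
    (old, name, value, strong, freed)
}

fn main() {
    println!("{:?}", tips_demo());
}
```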

If you find any errors in the translation or anything else that could be improved, you are welcome to revise the translation and open a PR in the Gold Miner Translation Project, for which you can also earn points. The permalink at the beginning of this article is the Markdown link to this article on GitHub.

