A closer look at Ownership in Rust

The original link: https://blog.thoughtram.io/ownership-in-rust/

Do you want to learn Rust and keep hearing about ownership and borrowing without fully understanding what they are? Ownership is a core concept. Understanding it is a great help for Rust beginners, and it lets you avoid many compilation errors as you write your programs. In our previous article, we discussed the ownership model from a JavaScript developer’s perspective. In this article, we’ll take a closer look at how Rust manages memory, how ownership shapes the way we write Rust code, and how it keeps memory safe. When you have finished reading, you may also want to check out two other articles: References in Rust, and the one on the difference between String and &str.

What is Memory Safety anyway?

To discuss what makes Rust stand out as a programming language, it’s best to first understand what memory safety means. Especially if you don’t have a systems programming background, or mainly use garbage-collected languages, this fundamental feature of Rust can be hard to appreciate. As Will Crichton puts it in his great article Memory Safety in Rust: A Case Study with C:

“Memory safety is the property of a program where memory pointers used always point to valid memory, i.e. allocated and of the correct type/size. Memory safety is a correctness issue: a memory-unsafe program may crash or produce nondeterministic output depending on the bug.”

In fact, many languages allow us to write “memory-unsafe” code, which makes it easier to introduce bugs. Some examples:

  • Dangling pointers: pointers that point to invalid data (this will make more sense once we look at how data is stored in memory).
  • Double frees: trying to free the same memory location twice, which results in “undefined behavior”.

To illustrate the concept of a dangling pointer, let’s look at the following C++ code and how it is represented in memory:

std::string s = "Have a nice day";


The initialized string is usually represented in memory using the heap and stack, as follows:

                     buffer
                   /   capacity
                 /   /    length
               /   /    /
            +---+----+----+
Stack frame │ • │ 16 │ 15 │ <-- s
            +-│-+----+----+
              │
            [-│------------------------- capacity --------------------------]
              │
            +-V-+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
       Heap │ H │ a │ v │ e │   │ a │   │ n │ i │ c │ e │   │ d │ a │ y │   │
            +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

            [------------------------- length --------------------------]


We’ll get to what the stack and the heap are in a minute, but for now the important point is that what is stored on the stack is the std::string object itself, which has a fixed size of three words. Its fields hold a pointer to a heap-allocated buffer where the data actually lives, the capacity of that buffer, and the length of the text. In other words, the std::string owns its buffer. When the program destroys the string, the string’s destructor frees the buffer as well.

However, nothing stops us from creating a pointer to a character inside that buffer. Once the string has been destroyed, that pointer refers to freed memory: it has become a dangling pointer.

If you’re wondering why you never run into this in languages like JavaScript or Python, it’s because those languages are garbage collected. Their runtime includes a program that walks through memory and frees whatever is no longer in use. Such a program is called a garbage collector. While having a garbage collector sounds great, it comes at a price: because it works while your program is running, it affects the program’s overall performance.

Rust does not have a garbage collector. Instead, it uses ownership and borrowing to achieve memory safety. When we say Rust is memory-safe, we mean that, by default, Rust’s compiler does not allow us to write memory-unsafe code at all. How cool is that?

Stack and Heap

Before we dive into how Rust handles ownership of data, let’s take a quick look at what the stack and the heap are and how they determine where data is stored. Both are parts of memory, but they are organized differently. The stack stores values in the order they arrive and removes them in the reverse order (last in, first out), which makes it very fast. The heap is less structured: space has to be requested from an allocator at run time, so storing and accessing data there takes a bit more work.

Which data lives on the stack and which on the heap depends on the kind of data we are dealing with. In Rust, any value of fixed size (a size known at compile time), such as machine integers, floating-point numbers, pointer types, and a few others, is stored on the stack. Dynamic and “unsized” data is stored on the heap, because such types often need to grow dynamically or need specific cleanup work when they are destroyed (more than just popping a value off the stack).

This is why, in the previous example, the string object itself, with its fixed size (a buffer pointer, a capacity, and a length), is stored on the stack, while its buffer (the raw data) is stored on the heap. As for Rust, the language generally avoids storing data on the heap, and the compiler never does so implicitly. To do it explicitly, Rust provides dedicated pointer types such as Box, which we’ll cover in another article. To learn more about the stack and the heap, I highly recommend the chapter on ownership in the official Rust book.
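To make the stack/heap split a little more tangible, here is a minimal sketch that measures the fixed-size, stack-resident parts of a few types. The helper name string_handle_words() is made up for this illustration, and the three-word size of a String handle is what current Rust releases produce rather than something the language promises.

```rust
use std::mem::size_of;

/// Hypothetical helper: size of a `String` handle, measured in machine words.
fn string_handle_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

fn main() {
    // A fixed-size value like this integer lives directly on the stack.
    let on_stack: i32 = 42;

    // `Box::new` explicitly places a value on the heap; `boxed` itself is
    // only a pointer-sized handle that stays on the stack.
    let boxed: Box<i32> = Box::new(on_stack);

    println!("i32 occupies {} bytes", size_of::<i32>());
    println!("Box<i32> handle occupies {} bytes", size_of::<Box<i32>>());
    println!("String handle occupies {} words", string_handle_words());
    println!("value behind the box: {}", *boxed);
}
```

The String handle covers only the pointer, capacity, and length. The text itself is not included in that size, because it lives in the heap buffer.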

Understanding Ownership

Now that we have a better idea of how data is stored, let’s look at ownership in Rust. In Rust, every value has a single owner that determines its lifetime. If we write the Rust equivalent of the C++ code above, the data is laid out in memory in almost exactly the same way:

let s = "Have a nice day".to_string();
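As a small sketch of that layout, we can ask the string for its length and capacity at run time. The function name describe() is only an assumption for this example. The exact capacity is up to the allocator and the standard library, so the only thing we rely on is that it is at least as large as the length.

```rust
/// Hypothetical helper: builds the example string and reports the
/// (length, capacity) of its heap buffer.
fn describe() -> (usize, usize) {
    let s = "Have a nice day".to_string();
    (s.len(), s.capacity())
}

fn main() {
    let (len, cap) = describe();
    // The length is the number of bytes of text; the capacity is the size
    // of the buffer that was allocated to hold it.
    println!("length = {}, capacity = {}", len, cap);
}
```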


Similarly, when the owner of a value is freed (in Rust’s terminology, dropped), the value it owns is dropped too. When exactly are values dropped? That’s where it gets interesting. When the program leaves the block in which a variable is declared, the variable is dropped, and its value is dropped with it. A block can be a function, an if statement, or pretty much any piece of code introduced with curly braces. Suppose we have the following function:

fn greeting() {
    let s = "Have a nice day".to_string();
    println!("{}", s);
} // `s` goes out of scope here and is dropped
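To actually observe when values are dropped, we can implement the Drop trait on a small type and record each drop. This is only a sketch: the Tracked type, its shared log, and the drop_order() function are made up for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper type: writes its name into a shared log when dropped.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns what the log contained just before the function's own scope ended.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let _outer = Tracked { name: "outer", log: Rc::clone(&log) };
    {
        let _inner = Tracked { name: "inner", log: Rc::clone(&log) };
        // `_inner` is dropped at the closing brace of this block.
    }
    log.borrow_mut().push("after inner block");
    let snapshot = log.borrow().clone();
    snapshot
    // `_outer` is dropped here, after the snapshot was taken.
}

fn main() {
    println!("{:?}", drop_order());
}
```

The inner value is dropped as soon as its block ends, before the code after the block runs. The outer value lives until the end of the function.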


In the greeting() function, we can tell the lifetime of s simply by looking at the code, because we know that Rust drops its value at the end of the function’s block. The same applies when dealing with more complex data structures. Let’s take a look at the following code:

let names = vec!["Pascal".to_string(), "Christoph".to_string()];


The code above creates a vector of names. A vector in Rust is like an array or a list, but it can grow dynamically: we can push() values into it at run time. Our memory looks like this:

            [-- names --]
            +---+---+---+
Stack frame │ • │ 3 │ 2 │
            +-│-+---+---+
              │
            [-│-- 0 ---] [---- 1 ----]
            +-V-+---+---+---+----+---+---+---+
       Heap │ • │ 8 │ 6 │ • │ 12 │ 9 │       │
            +-│-+---+---+-│-+----+---+---+---+
              │\   \   \  │
              │ \   \    length
              │  \    capacity
              │    buffer │
              │           │
            +-V-+---+---+---+---+---+---+---+
            │ P │ a │ s │ c │ a │ l │   │   │
            +---+---+---+---+---+---+---+---+
                          │
                          │
                        +-V-+---+---+---+---+---+---+---+---+---+---+---+
                        │ C │ h │ r │ i │ s │ t │ o │ p │ h │   │   │   │
                        +---+---+---+---+---+---+---+---+---+---+---+---+


Notice how the vector object itself, with its capacity and length, is stored on the stack, just like the string object earlier. It also holds a pointer to the place on the heap where the vector’s data is stored. The string objects of the vector live on the heap, and each of them in turn owns its own buffer. This creates a tree structure in which every value is owned by exactly one variable. When names goes out of scope, its value is dropped, which eventually causes the string buffers to be dropped as well.

This probably raises a couple of questions. How does Rust guarantee that only one variable owns a value? How can we have multiple variables pointing to the same data? Are we forced to copy everything to make sure that each value has a single owner?
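The following sketch pokes at that ownership tree from code. The build_names() function and the third name are assumptions added for this example. Exact capacities depend on the standard library’s growth strategy, so we only print them.

```rust
/// Hypothetical helper: builds the example vector and pushes one more name.
fn build_names() -> Vec<String> {
    let mut names = vec!["Pascal".to_string(), "Christoph".to_string()];
    names.push("Elvira".to_string()); // the vector grows at run time
    names
}

fn main() {
    let names = build_names();
    println!("len = {}, capacity = {}", names.len(), names.capacity());
    for name in &names {
        // Every String in the vector owns its own buffer with its own length.
        println!("{} ({} bytes)", name, name.len());
    }
    // `names` goes out of scope here: the vector is dropped, which in turn
    // drops each String and frees its buffer.
}
```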

Moves and Borrowing

Let’s start with the first question: how does Rust guarantee that only one variable owns a value? It does so by moving the value to its new owner whenever we do things like assign it to another variable or pass it to a function. This is a very important concept, because it affects the way we write code in Rust. Let’s look at the following code:

let name = "Pascal".to_string();
let a = name;
let b = name;


If you’re familiar with languages like Python or JavaScript, you might expect both a and b to hold a reference to name and therefore point to the same data. But when we try to compile this code, we quickly learn that this is not the case:

error[E0382]: use of moved value: `name`
 --> src/main.rs:4:11
  |
2 |   let name = "Pascal".to_string();
  |       ---- move occurs because `name` has type `std::string::String`, which does not implement the `Copy` trait
3 |   let a = name;
  |           ---- value moved here
4 |   let b = name;
  |           ^^^^ value used here after move


We get a compilation error with a lot of (useful) information. The compiler tells us that we are trying to assign name to b after it has already been moved to a. The problem is that by the time we try to assign name to b, name no longer owns a value. Why? Because at that point, ownership has been moved to a. Let’s look at what happens in memory to get a better understanding. When name is initialized, it looks very much like our previous example:

            +---+---+---+
Stack frame │ • │ 8 │ 6 │ <-- name
            +-│-+---+---+
              │
            +-V-+---+---+---+---+---+---+---+
       Heap │ P │ a │ s │ c │ a │ l │   │   │
            +---+---+---+---+---+---+---+---+


However, when we assign the value of name to a, we also hand over ownership to a, and name is left uninitialized:

            [-- name --] [--- a ----]
            +---+---+---+---+---+---+
Stack frame │   │   │   │ • │ 8 │ 6 │
            +---+---+---+-│-+---+---+
                          │
              +-----------+
              │
            +-V-+---+---+---+---+---+---+---+
       Heap │ P │ a │ s │ c │ a │ l │   │   │
            +---+---+---+---+---+---+---+---+
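A minimal sketch can confirm what the diagram shows: a move only transfers the three-word handle, while the heap buffer stays where it is. It also contrasts this with a type that implements the Copy trait mentioned in the error message. Both function names are made up for this illustration.

```rust
/// Hypothetical helper: moves a String into a new binding and reports
/// whether the heap buffer is still at the same address afterwards.
fn buffer_survives_move() -> bool {
    let name = "Pascal".to_string();
    let before = name.as_ptr(); // address of the heap buffer
    let a = name; // ownership moves from `name` to `a`
    // `name` can no longer be used from here on; only `a` is valid.
    before == a.as_ptr()
}

/// Integers implement `Copy`, so assigning them copies instead of moving.
fn copy_twice() -> (i32, i32, i32) {
    let n = 42;
    let a = n;
    let b = n; // fine: `n` is still usable after `let a = n`
    (n, a, b)
}

fn main() {
    println!("same buffer after move: {}", buffer_survives_move());
    println!("copied integers: {:?}", copy_twice());
}
```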


At this point, it’s no surprise that the expression let b = name produces an error. The important point is that all of this static analysis is done by the compiler, without ever running our code. Remember when we said Rust’s compiler doesn’t allow us to write memory-unsafe code?

So how do we deal with this? What if we really want multiple variables pointing to the same piece of data? There are two ways to do this, depending on the situation. Probably the simplest, but also the most expensive, is to copy or clone the value. Obviously, this means that we end up duplicating the data in memory:

let name = "Pascal".to_string();
let a = name;
let b = a.clone();


Note that we don’t need to clone the value when going from name to a, because we never read from name after its value has been moved to a. When we run this program, the data is represented in memory as follows before it is dropped:

            [-- name --] [--- a ----] [--- b ---]
            +---+---+---+---+---+---+---+---+---+
Stack frame │   │   │   │ • │ 8 │ 6 │ • │ 8 │ 6 │
            +---+---+---+-│-+---+---+-│-+---+---+
                          │           │
              +-----------+           +-------+
              │                               │
            +-V-+---+---+---+---+---+---+---+-V-+---+---+---+---+---+---+---+
       Heap │ P │ a │ s │ c │ a │ l │   │   │ P │ a │ s │ c │ a │ l │   │   │
            +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
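The two separate buffers in the diagram can be checked with a short sketch. The function names and the appended suffix are assumptions made for this example.

```rust
/// Hypothetical helper: reports whether a clone has equal contents but a
/// different heap buffer than the original.
fn clone_has_own_buffer() -> bool {
    let name = "Pascal".to_string();
    let a = name;
    let b = a.clone();
    a == b && a.as_ptr() != b.as_ptr()
}

/// Changing the clone leaves the original untouched.
fn clone_and_modify() -> (String, String) {
    let a = "Pascal".to_string();
    let mut b = a.clone();
    b.push_str(" P."); // only writes to `b`'s buffer
    (a, b)
}

fn main() {
    println!("independent buffers: {}", clone_has_own_buffer());
    println!("{:?}", clone_and_modify());
}
```

Because a and b each own their buffer, each of them is dropped independently and nothing is freed twice.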


Obviously, cloning data isn’t always an option. Depending on the data we are working with, cloning can be a very expensive operation. Often all we need is a reference to a value. This is especially useful when we write functions that don’t need ownership of a value. Suppose we have a function greet() that takes a name and simply prints it:

fn greet(name: String) {
    println!("Hello, {}!", name);
}


This function doesn’t need ownership of the value in order to print it. Worse, taking ownership prevents us from calling the function multiple times with the same variable:

let name = "Pascal".to_string();
greet(name);
greet(name); // Move happened earlier so this won't compile


To get a reference to a variable, we use the & symbol. The same symbol lets a function state explicitly that it expects a reference rather than the value itself:

fn greet(name: &String) {
    println!("Hello, {}!", name);
}


To be clear, for various reasons we would probably design this API to take a &str instead, but we don’t want to make things more complicated than necessary here, so we’ll just take a &String for now. greet() now expects a string reference, which also lets us call it multiple times:

let name = "Pascal".to_string();
greet(&name);
greet(&name);
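For completeness, here is a rough sketch of the &str variant mentioned earlier. The function greeting_for() is hypothetical and returns the text instead of printing it, which makes it easier to check. The caller keeps ownership in every case.

```rust
/// Hypothetical variant of `greet()` that borrows a string slice and
/// returns the greeting instead of printing it.
fn greeting_for(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let name = "Pascal".to_string();
    // `&name` is a `&String`; it is converted to `&str` automatically.
    println!("{}", greeting_for(&name));
    println!("{}", greeting_for(&name)); // still valid: `name` was only borrowed
    println!("{}", greeting_for("Christoph")); // string literals are already `&str`
}
```

Taking &str accepts both borrowed String values and string literals, which is one reason it is often preferred in function signatures.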


When a function expects a reference to a value, we say that it borrows the value. Note that it never takes ownership of the value passed in. We can solve the earlier variable assignment problem in a similar way:

let name = "Pascal".to_string();
let a = &name;
let b = &name;


With the code above, name never loses ownership, and a and b are just references to the same data. The same goes for the following:

let name = "Pascal".to_string();
let a = &name;
let b = a;


There is no problem calling greet() between these assignments:

let name = "Pascal".to_string();
let a = &name;
greet(a);
let b = a;
greet(a);
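The reason a stays usable after let b = a is that shared references implement Copy: assigning one copies the pointer and does not move anything. A small sketch, with made-up helper names, shows that both references point at the same String and that name keeps ownership throughout.

```rust
/// Hypothetical helper: true if both references point at the same `String`.
fn same_target(a: &String, b: &String) -> bool {
    std::ptr::eq(a, b)
}

/// Borrows `name` twice, then hands the still-owned value back to the caller.
fn shared_borrows() -> (bool, String) {
    let name = "Pascal".to_string();
    let a = &name;
    let b = a; // copies the reference, not the String
    let same = same_target(a, b); // `a` is still usable after `let b = a`
    (same, name) // the borrows are no longer used, so `name` can be moved out
}

fn main() {
    let (same, name) = shared_borrows();
    println!("a and b point to the same value: {}", same);
    println!("name still owns its data: {}", name);
}
```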


Conclusion

This is just the tip of the iceberg. There’s a lot more to say about ownership, borrowing, and moving data, but hopefully this article has given you a basic understanding of the principles behind how Rust keeps memory safe.