Explained: Ownership in Rust
If you’re someone who works mostly with either Java, Python or JavaScript, then you’re already familiar with garbage collected languages.
However, if you’ve been working with either C, C++ or Assembly (kudos to you btw!), then you must be familiar with manual memory management.
Memory management is the process of allocating memory and deallocating memory. In other words, it’s the process of finding unused memory and later returning that memory when it is no longer used.
- The Rust Book
But what is the difference between these two types of languages — garbage collected and non-garbage collected (manual memory management)?
Garbage Collection
In garbage collected languages, there’s a garbage collector program that identifies and reclaims memory that is no longer used.
It is a form of automatic memory management which helps prevent memory leaks and optimise the use of available memory.
The garbage collector automatically frees up memory allocated to objects that are no longer reachable or needed, allowing you, the programmer, to focus on other aspects of coding without worrying about manually managing memory deallocation.
However, garbage collection introduces performance overheads due to the additional work required to automatically manage memory.
Manual Memory Management
Non-garbage collected languages, on the other hand, are those where memory management is typically handled manually by the programmer.
This means that programmers must explicitly allocate memory, and deallocate it to prevent memory leaks and other issues.
Manual memory management gives developers direct control over when and how memory is allocated and deallocated, and it has no garbage collection overhead. However, it also comes with several potential issues, such as memory leaks (when allocated memory is not properly deallocated), dangling pointers (when a pointer still references a memory location after that memory has been freed), and double frees (when an attempt is made to free the same memory location more than once).
So, where does Rust fit in?
Rust does not use a garbage collector nor does it rely on manual memory management.
Instead, Rust employs a unique memory management system based on ownership with the help of the borrow checker.
It guarantees memory-safety, preventing common bugs associated with manual memory management and has no runtime overhead associated with garbage collection.
But before we talk about ownership in detail, first let’s understand a few concepts.
Stack vs Heap
If you’re coming from a traditional computer science background, then you might remember learning about the stack and the heap memory.
Stack memory holds data whose size is known at compile time, typically local variables with a known lifetime and scope.
Heap memory is for dynamic memory allocation, suitable for objects that need to persist beyond the scope of a single function call.
Let’s talk about memory within the context of Rust.
Rust’s Memory Model
Variables Live in the Stack
When you call a function in Rust, it allocates a stack frame for it. You can think of a stack frame as a mapping from variables to their values within a single scope, such as a function.
For example:
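A minimal sketch of the program that the following steps trace, reconstructed from the step labels. The `println!` call and the `describe` helper are hypothetical additions, included only so that the two locals are actually used:

```rust
fn main() {                          // L1: a stack frame for `main` is created
    let x = 1;                       // L2: the frame now holds x = 1
    let y = 1;                       // L3: the frame now holds x = 1 and y = 1
    println!("{}", describe(x, y));  // added for illustration, not part of the walkthrough
}                                    // L4: the frame for `main` is freed

// Hypothetical helper (an assumption, not from the original article):
// formats the two locals so they can be printed or checked.
fn describe(x: i32, y: i32) -> String {
    format!("x = {x}, y = {y}")
}
```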
In the given code, when main is called, Rust allocates a stack frame for it and after the function returns, Rust automatically deallocates the function’s frame.
L1 - Function Entry (fn main() {):
- The main function is called.
- The stack frame for main is created, but it's currently empty.
L2 - First Variable Declaration (let x = 1;):
- Variable x is declared and initialized to 1.
- The stack frame now contains x with a value of 1.
L3 - Second Variable Declaration (let y = 1;):
- Variable y is declared and initialized to 1.
- The stack frame now contains both x and y, each with a value of 1.
L4 - Function Exit (}):
- The main function is about to exit.
- The stack frame for main is emptied as the function scope ends, and local variables x and y go out of scope.
Consider another example:
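A minimal sketch of the two-function program that the following steps trace, reconstructed from the step labels. The `println!` calls are additions for illustration, so that `x` and `y` are used and the order of events is visible:

```rust
fn main() {                        // L1: a stack frame for `main` is created
    let x = 1;                     // L2: main's frame holds x = 1
    test();                        // L3: `test` is called from `main`
    println!("x is still {x}");    // L7 (added): main's frame still contains x
}                                  // L8: the frame for `main` is freed

fn test() {                        // L4: a second frame is pushed on top of main's
    let y = 1;                     // L5: test's frame holds y = 1
    println!("y = {y}");           // added for illustration
}                                  // L6: the frame for `test` is popped first (LIFO)
```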
L1 - Function Entry (fn main() {):
- The main function is called.
- The stack frame for main is created, but it's currently empty.
L2 - First Variable Declaration in main (let x = 1;):
- Variable x is declared and initialized to 1.
- The stack frame now contains x with a value of 1.
L3 - Function Call (test();):
- The test function is called.
- The stack frame for main now shows a pending call to test.
L4 - Entry into test Function (fn test() {):
- The test function's stack frame is created but currently empty.
- The stack now includes frames for both main and test.
L5 - Variable Declaration in test (let y = 1;):
- Variable y is declared and initialized to 1 within the test function.
- The stack frame for test now includes y with a value of 1.
L6 - Exit from test Function (}):
- The test function finishes execution and its stack frame is removed.
- Control returns to the main function with its stack frame intact.
L7 - Resuming main after test returns:
- The test function has returned, and its stack frame has been removed.
- The stack frame for main still contains x.
L8 - Exit from main Function (}):
- The main function finishes execution.
- The stack frame for main is cleared as the program ends.
Notice in the above diagram that these frames are organised into a stack of currently called functions, where the most recently added frame is always the next one to be freed (LIFO).
Copying Data
When an expression reads a variable (for example, in assignments and function calls), the variable’s value is copied from its slot in the stack frame.
However, copying data can take up a lot of memory.
Imagine if x was an array containing a million elements! Copying x into y would cause the main frame to contain 2 million elements.
This is not a very efficient use of the available memory. And this problem would only multiply if our program handled multiple such large datasets.
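A small sketch of this copying behaviour. The helper name `bytes_after_copy` is hypothetical, and the array holds 10,000 bytes rather than a million elements so that the sketch stays well within typical stack limits:

```rust
use std::mem::size_of_val;

// Hypothetical helper: copies a stack-allocated array and returns how many
// bytes the two copies occupy together.
fn bytes_after_copy() -> usize {
    let x = [0u8; 10_000]; // 10,000 bytes in this function's frame
    let y = x;             // arrays of `Copy` elements are `Copy`: a second 10,000 bytes
    // `x` is still usable here because it was copied, not moved.
    size_of_val(&x) + size_of_val(&y)
}

fn main() {
    println!("two copies occupy {} bytes", bytes_after_copy());
}
```

Conceptually the frame now holds both arrays, although an optimising compiler is free to remove a copy that is never observed.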
This is where pointers step in.
Pointing to Data in the Heap
One way to mitigate this problem is by allocating this data in the heap and pointing to it.
A pointer is a value that describes a location in memory.
Rust data structures like Vec, String, and HashMap store their contents on the heap.
For example:
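A minimal sketch of the program that the following steps trace, reconstructed from the step labels. The `println!` call and the `heap_bytes_used` helper are hypothetical additions for illustration:

```rust
fn main() {                                                // L1: a stack frame for `main` is created
    let name = String::from("John");                       // L2: `name` lives on the stack, "John" on the heap
    println!("{name}: {} bytes on the heap", name.len());  // added for illustration
}                                                          // L3: `name` goes out of scope

// Hypothetical helper (an assumption, not from the original article):
// reports how many bytes of string data the heap allocation holds.
fn heap_bytes_used() -> usize {
    let name = String::from("John");
    name.len()
}
```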
L1 - Function Entry (fn main() {):
- The main function is called.
- The stack frame for main is created, but it is currently empty.
L2 - String Allocation (let name = String::from("John");):
- A String is created from the literal "John".
- The variable name is declared and stored on the stack.
- The actual string data "John" is stored on the heap.
- The stack frame for main now contains a reference (pointer) to the heap-allocated string data.
L3 - Function Exit (}):
- The main function is about to exit.
- The stack frame for main is cleared, and the name variable goes out of scope.
Note that the variable itself still lives on the stack, but the string data it points to is stored on the heap.
Rust also provides the Box construct for putting data on the heap.
Now consider the previous example, where we had an array of a million elements. This time, instead of storing the elements on the stack, we store them on the heap using Box::new():
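A sketch of the boxed version that the following steps trace. It is an illustration under assumptions: the array has 10,000 elements instead of a million, because in unoptimised builds `Box::new` may construct the array on the stack before moving it to the heap, and the helper `move_keeps_heap_address` is hypothetical:

```rust
// Hypothetical helper: moves a box from `x` to `y` and reports whether the
// heap address stayed the same, i.e. whether only the pointer was copied.
fn move_keeps_heap_address() -> bool {
    let x = Box::new([0i32; 10_000]); // L2: the array lives on the heap, `x` holds a pointer
    let before = x.as_ptr();          // address of the heap data while `x` owns it
    let y = x;                        // L3: ownership moves to `y`; the elements are not copied
    let after = y.as_ptr();           // address of the heap data now that `y` owns it
    before == after
}

fn main() {
    println!("same heap address after move: {}", move_keeps_heap_address());
}
```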
L2 - Heap Allocation (let x = Box::new([0; 1_000_000]);):
- A Box is created, which allocates an array of 1,000,000 zeros on the heap.
- The variable x is declared and stored on the stack, and it points to the heap-allocated array.
L3 - Move of Box (let y = x;):
- Unlike previously, y does not create another copy of the (million) elements in the heap and point to it.
- Instead, the ownership of x is moved to y. (Notice how x is now greyed out)
This is what is meant by a move:
In Rust, all heap data must be owned by exactly one variable. When you write let y = x; Rust copies the pointer from x into y, but the pointed-to data is not copied.
The new owner is now y, so x becomes invalid and you can no longer use it to access the heap data:
fn main() {
    let x = Box::new([0; 1_000_000]);
    let y = x;
    println!("{:?}", x); // ERROR
}

error[E0382]: borrow of moved value: `x`
 --> src/main.rs:4:22
  |
2 |     let x = Box::new([0; 1_000_000]);
  |         - move occurs because `x` has type `Box<[i32; 1000000]>`, which does not implement the `Copy` trait
3 |     let y = x;
  |             - value moved here
4 |     println!("{:?}", x);
  |                      ^ value borrowed here after move

Heap data can only be accessed through its current owner y, not the previous owner:
fn main() {
    let x = Box::new([0; 1_000_000]);
    let y = x;
    println!("{:?}", y); // CORRECT: y is the new owner
}

Deallocation of Memory
Stack frames are automatically managed by Rust. When a function is called, Rust allocates a stack frame for the called function. When the call ends, Rust deallocates the stack frame.
But what about the heap data?
Well, Rust automatically frees a box’s heap memory when it deallocates the stack frame of the variable that owns the box.
So going back to the previous example:
L4 - Function Exit (}):
- The main function is about to exit.
- The stack frame for main is cleared, and the y variable goes out of scope.
- The heap-allocated array is deallocated when the Box owner goes out of scope.
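One way to observe this is with a type whose destructor records when it runs. The following is only a sketch: the `Tracked` type, the `DROPS` counter, and the helper function are hypothetical names introduced for illustration, built on the standard `Drop` trait:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been freed so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type whose destructor records that it ran.
struct Tracked {
    name: &'static str,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        println!("freeing {}", self.name);
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns the number of drops observed (inside the scope, after the scope).
fn drops_inside_and_after_scope() -> (usize, usize) {
    let start = DROPS.load(Ordering::SeqCst);
    let inside;
    {
        let x = Box::new(Tracked { name: "boxed value" });
        let _y = x; // ownership moves to `_y`; the move itself frees nothing
        inside = DROPS.load(Ordering::SeqCst) - start;
    } // `_y` goes out of scope here, so the box's heap memory is freed
    let after = DROPS.load(Ordering::SeqCst) - start;
    (inside, after)
}

fn main() {
    let (inside, after) = drops_inside_and_after_scope();
    println!("drops inside scope: {inside}, after scope: {after}");
}
```

The value is freed exactly once, at the point where its current owner goes out of scope, and not when ownership moves.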
Consider another example where ownership moves around different functions:
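A minimal sketch of the program that the following steps trace, reconstructed from the code fragments quoted in the step labels:

```rust
fn main() {                                  // L1: a stack frame for `main` is created
    let first = String::from("Ferris");      // L2: `first` owns the heap data "Ferris"
    let full = add_suffix(first);            // L3: ownership moves into `add_suffix`; L7: `full` owns the result
    println!("{full}");                      // L8: prints "Ferris Jr."
}                                            // L9: `full` goes out of scope and the string is freed

fn add_suffix(mut name: String) -> String {  // L4: `name` is the new owner of the heap data
    name.push_str(" Jr.");                   // L5: the string grows to "Ferris Jr."
    name                                     // L6: ownership moves back to the caller
}
```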
L1 - Function Entry (fn main() {):
- The main function is called.
- The stack frame for main is created.
L2 - String Creation (let first = String::from("Ferris");):
- A String named first is created with the value "Ferris".
- The variable first is allocated on the stack, and the actual string data is stored on the heap.
- first stores a pointer to the heap data.
L3 - Function Call (let full = add_suffix(first);):
- The add_suffix function is called with first as an argument.
- Ownership of the first string is moved to the add_suffix function and first becomes invalid.
L4 - Function Entry (fn add_suffix(mut name: String) -> String {):
- The add_suffix function is called with name initialized to "Ferris".
- The stack frame for add_suffix is created, and the name variable is allocated on the stack.
- name is now the new owner of the heap data.
L5 - Modify String (name.push_str(" Jr.");):
- The push_str method is called on name, appending " Jr." to it. This does three things. First, it creates a new larger allocation. Second, it writes “Ferris Jr.” into the new allocation. Third, it frees the original heap memory. first now points to deallocated memory (denoted by ⦻).
- The name variable now contains the value "Ferris Jr.".
L6 - Return Modified String (name):
- The modified name string is returned from the add_suffix function.
- Ownership of the string is moved back to the caller (main function).
- The stack frame for add_suffix is cleared.
L7 - Continue in main Function:
- After returning from add_suffix, control is back in the main function.
- The variable full now owns the string "Ferris Jr.".
L8 - Print the Result (println!("{full}");):
- The println! macro is called to print the value of full.
- The value "Ferris Jr." is printed.
L9 - Function Exit (}):
- The main function exits.
- The stack frame for main is cleared, and the full variable goes out of scope.
- The heap-allocated memory for the String is also deallocated since it was owned by full.
Notice how, in the above example, the heap data is not tied to just one stack frame. Instead, it belongs to whichever stack frame holds the variable that currently owns it.
Now you might be wondering how this concept of ownership guarantees memory safety in Rust.
In short, it prevents undefined behaviour, which in turn ensures memory safety.
Undefined behaviour can lead to various issues such as program crashes, security vulnerabilities, or data corruption.
But what do we mean by undefined behaviour?
Undefined Behaviour
Undefined behaviour in Rust refers to actions that the Rust compiler and runtime do not guarantee to behave predictably or correctly.
Consider this slightly different version of the previous code:
Here, the call name.push_str(" Doe") deallocates the original string data and allocates a new, larger buffer for the updated string. This leaves first pointing to freed memory.
Without ownership, you would have been able to access first, resulting in undefined behaviour.
However, Rust prevents this undefined behaviour at compile time:
fn main() {
    let first = String::from("John");
    let mut name = first;
    name.push_str(" Doe");
    println!("{first}");
}

   Compiling chapter-3 v0.1.0 (/Users/urvashi/work/rust_projects/chapter-3)
error[E0382]: borrow of moved value: `first`
 --> src/main.rs:5:15
  |
2 |     let first = String::from("John");
  |         ----- move occurs because `first` has type `String`, which does not implement the `Copy` trait
3 |     let mut name = first;
  |                    ----- value moved here
4 |     name.push_str(" Doe");
5 |     println!("{first}");
  |               ^^^^^^^ value borrowed here after move

Similarly, imagine if Rust allowed you to manually deallocate memory using a function like free:
let b = Box::new([0; 100]);
free(b);
assert!(b[0] == 0); // UNDEFINED BEHAVIOUR

Here, again, this would result in undefined behaviour, because you’re trying to read through the pointer b after its memory has been freed. That would attempt to access invalid memory, which could cause the program to crash. Or worse, it might not crash and instead return arbitrary data. Therefore this program is unsafe.
Instead, Rust automatically frees a box’s heap memory using the ownership model to prevent such undefined behaviour.
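For comparison, the standard library does offer an explicit `drop` function, but it works through ownership rather than around it. In this sketch (the helper name `read_then_drop` is hypothetical), the value is read before it is dropped, and the commented-out line shows the use-after-free that the compiler would reject:

```rust
// Hypothetical helper: reads from a box, frees it explicitly, and returns
// the value that was read while the box was still alive.
fn read_then_drop() -> i32 {
    let b = Box::new([0; 100]);
    let first = b[0]; // read while `b` still owns the heap data
    drop(b);          // `drop` takes ownership of `b`; the heap memory is freed here
    // assert!(b[0] == 0); // would not compile: `b` was moved into `drop` (error E0382)
    first
}

fn main() {
    println!("first element was {}", read_then_drop());
}
```

Because `drop` takes its argument by value, freeing memory early is just another move, so the same compile-time check applies.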
A foundational goal of Rust is to ensure that your programs never have undefined behavior. That is the meaning of “safety.”
A secondary goal of Rust is to prevent undefined behavior at compile-time instead of run-time.
Summary
Rust employs a unique memory management system based on ownership, borrowing, and lifetimes to ensure memory safety without the need for a garbage collector or manual memory management.
- Stack vs. Heap: Rust differentiates between stack and heap memory. Stack memory is used for local variables with a known, short lifetime, while heap memory is used for data that needs to persist beyond the current scope.
- Copy and Move Semantics: Rust minimises unnecessary data copying by allowing ownership to be moved rather than copied. This is particularly useful for large data structures, reducing memory usage and improving performance.
- Ownership: Each piece of data in Rust has a single owner, and the data is automatically deallocated when the owner goes out of scope. This helps prevent memory leaks and rules out dangling pointers and double frees in safe code.
- Undefined Behaviour Prevention: By ensuring that memory is always accessed safely and correctly, Rust prevents undefined behaviour, which can lead to program crashes, security vulnerabilities, and data corruption.
However, this was just one piece of the puzzle. Rust uses a combination of concepts like ownership, borrowing, and slices to ensure memory safety. You can learn more about Rust here.
