If you’re someone who works mostly with Java, Python or JavaScript, then you’re already familiar with garbage collected languages.
However, if you’ve been working with C, C++ or Assembly (kudos to you btw!), then you’re probably familiar with manual memory management.
Memory management is the process of allocating memory and deallocating memory. In other words, it’s the process of finding unused memory and later returning that memory when it is no longer used.
- The Rust Book
But what is the difference between these two types of languages — garbage collected and non-garbage collected (manual memory management)?
Garbage Collection
In garbage collected languages, there’s a garbage collector program that identifies and reclaims memory that is no longer used.
It is a form of automatic memory management which helps prevent memory leaks and optimise the use of available memory.
The garbage collector automatically frees up memory allocated to objects that are no longer reachable or needed, allowing you, the programmer, to focus on other aspects of coding without worrying about manually managing memory deallocation.
However, garbage collection introduces performance overheads due to the additional work required to automatically manage memory.
Manual Memory Management
Non-garbage collected languages, on the other hand, are those where memory management is typically handled manually by the programmer.
This means that programmers must explicitly allocate memory and later deallocate it to prevent memory leaks and other issues.
Manual memory management gives developers direct control over when and how memory is allocated and deallocated, and it has no garbage collection overhead. However, it also comes with several potential issues, such as memory leaks (when allocated memory is not properly deallocated), dangling pointers (when a pointer still references a memory location after that memory has been freed) and double frees (when an attempt is made to free the same memory location more than once).
So, where does Rust fit in?
Rust neither uses a garbage collector nor relies on manual memory management.
Instead, Rust employs a unique memory management system based on ownership with the help of the borrow checker.
In safe Rust, this system guarantees memory safety at compile time, preventing the common bugs associated with manual memory management without the runtime overhead of garbage collection.
But before we talk about ownership in detail, first let’s understand a few concepts.
Stack vs Heap
If you’re coming from a traditional computer science background, then you might remember learning about stack and heap memory.
Stack memory holds data whose size is known at compile time, typically local variables whose lifetime is tied to the scope of a function call.
Heap memory is for dynamic memory allocation, suitable for data whose size is only known at run time or that needs to persist beyond the scope of a single function call.
Let’s talk about memory within the context of Rust.
Rust’s Memory Model
Variables Live in the Stack
When you call a function in Rust, it allocates a stack frame for it. You can think of a stack frame as a mapping from variables to their values within a single scope, such as a function.
For example:
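This code is reconstructed from the step-by-step walkthrough, so it may differ slightly from the original figure. It is most likely the following. The comments map each line to the walkthrough's step labels.

```rust
fn main() {     // step L1: a stack frame for main is created
    let x = 1;  // step L2: x is stored in main's frame
    let y = 1;  // step L3: y is stored next to x in the same frame
}               // step L4: main returns and its frame (x and y) is deallocated
```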
In the given code, when main is called, Rust allocates a stack frame for it. After the function returns, Rust automatically deallocates the function’s frame.
L1 — Function Entry (fn main() {):
- The main function is called.
- The stack frame for main is created, but it's currently empty.

L2 — First Variable Declaration (let x = 1;):
- Variable x is declared and initialized to 1.
- The stack frame now contains x with a value of 1.

L3 — Second Variable Declaration (let y = 1;):
- Variable y is declared and initialized to 1.
- The stack frame now contains both x and y, each with a value of 1.

L4 — Function Exit (}):
- The main function is about to exit.
- The stack frame for main is emptied as the function scope ends, and local variables x and y go out of scope.
Consider another example:
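This code is reconstructed from the walkthrough, so treat it as a best guess at the original figure. The comments refer to the walkthrough's step labels, which follow execution order rather than source order.

```rust
fn main() {     // step L1: main's frame is created
    let x = 1;  // step L2: x lives in main's frame
    test();     // step L3: call test; main's frame stays on the stack
}               // step L8: main returns and its frame is freed

fn test() {     // step L4: a new frame for test is pushed on top of main's
    let y = 1;  // step L5: y lives in test's frame
}               // step L6: test returns and its frame is popped (step L7: back in main)
```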
L1 — Function Entry (fn main() {):
- The main function is called.
- The stack frame for main is created, but it's currently empty.

L2 — First Variable Declaration in main (let x = 1;):
- Variable x is declared and initialized to 1.
- The stack frame now contains x with a value of 1.

L3 — Function Call (test();):
- The test function is called.
- The stack frame for main now shows a pending call to test.

L4 — Entry into test Function (fn test() {):
- The test function's stack frame is created but is currently empty.
- The stack now includes frames for both main and test.

L5 — Variable Declaration in test (let y = 1;):
- Variable y is declared and initialized to 1 within the test function.
- The stack frame for test now includes y with a value of 1.

L6 — Exit from test Function (}):
- The test function finishes execution and its stack frame is removed.
- Control returns to the main function with its stack frame intact.

L7 — Resuming main after test returns:
- The test function has returned, and its stack frame is no longer on the stack.
- The stack frame for main still contains x.

L8 — Exit from main Function (}):
- The main function finishes execution.
- The stack frame for main is cleared as the program ends.
Notice in the above diagram that these frames are neatly organised into a stack of currently called functions, where the most recently added frame is always the next one freed (last in, first out, or LIFO).
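Rust's drop order mirrors this last-in, first-out discipline. The sketch below is an illustration, not part of the original example. Noisy and the shared log are hypothetical helpers that record when each local value is dropped, which happens when it goes out of scope as its function's frame is torn down. The functions test and run stand in for test and main from the example above.

```rust
use std::cell::RefCell;

// Hypothetical helper type: pushes its name into a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Plays the role of `test`: its local y lives in the callee's frame.
fn test(log: &RefCell<Vec<&'static str>>) {
    let _y = Noisy { name: "test: y", log };
} // test's frame is popped here, so y is dropped first

// Plays the role of `main`; returns the recorded drop order.
fn run() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _x = Noisy { name: "main: x", log: &log };
        test(&log);
        log.borrow_mut().push("main: back from test");
    } // x is dropped only when main's scope ends
    log.into_inner()
}

fn main() {
    // Prints ["test: y", "main: back from test", "main: x"]
    println!("{:?}", run());
}
```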
Copying Data
When an expression reads a variable (for example, in assignments and function calls), the variable’s value is copied from its slot in the stack frame.
However, copying data can take up a lot of memory.
Imagine if x was an array containing a million elements! Copying x into y would cause the main frame to contain 2 million elements.
This is not a very efficient use of the available memory. And this problem would only multiply if our program handled multiple such large datasets.
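Here is a small sketch of the cost, not taken from the original. Arrays of a Copy type such as i32 are themselves Copy, so let y = x; duplicates every element into a second stack slot, and both arrays remain usable afterwards. The array is scaled down to 1,000 elements on the assumption that two million-element arrays could overflow the default stack on some platforms.

```rust
fn copy_array() -> (i32, i32) {
    let x = [0_i32; 1_000]; // 1,000 i32s stored directly in this stack frame
    let mut y = x;          // copies all 1,000 elements into a second stack slot
    y[0] = 42;              // modifying the copy leaves the original untouched
    (x[0], y[0])
}

fn main() {
    let (x0, y0) = copy_array();
    println!("x[0] = {x0}, y[0] = {y0}"); // x[0] = 0, y[0] = 42
    // Each array occupies 4,000 bytes, so the frame held 8,000 bytes of array data.
    println!("bytes per array: {}", std::mem::size_of::<[i32; 1_000]>());
}
```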
This is where pointers step in.
Pointing to data in the heap
One way to mitigate this problem is to allocate the data on the heap and keep only a pointer to it on the stack.
A pointer is a value that describes a location in memory.
Rust data structures like Vec, String, and HashMap store their contents on the heap.
For example:
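This code is reconstructed from the walkthrough, so the original figure may differ slightly. It is most likely:

```rust
fn main() {                              // step L1: main's frame is created
    let name = String::from("John");     // step L2: name (on the stack) points to "John" on the heap
}                                        // step L3: name goes out of scope; its heap data is freed too
```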
L1 — Function Entry (fn main() {):
- The main function is called.
- The stack frame for main is created, but it is currently empty.

L2 — String Allocation (let name = String::from("John");):
- A String is created from the literal "John".
- The variable name is declared and stored on the stack.
- The actual string data "John" is stored on the heap.
- The stack frame for main now contains name, which holds a pointer to the heap-allocated string data.

L3 — Function Exit (}):
- The main function is about to exit.
- The stack frame for main is cleared, and the name variable goes out of scope.
Note that the variable itself still lives on the stack, but the string data it points to is stored on the heap.
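You can observe this split directly. The sketch below is illustrative and not from the original. It shows that the String value in the stack frame is a fixed-size handle (a pointer plus a length and a capacity), while the characters live in a separate heap buffer. The exact handle size is an implementation detail. The code only relies on it being the same for short and long strings.

```rust
use std::mem::size_of_val;

fn handle_sizes() -> (usize, usize) {
    let short = String::from("John");
    let long = "John".repeat(1_000); // 4,000 bytes of text on the heap
    // size_of_val on a String measures only the stack-side handle, not the heap text.
    (size_of_val(&short), size_of_val(&long))
}

fn main() {
    let name = String::from("John");
    // The handle records where the heap data starts, how many bytes are in use,
    // and how many bytes are reserved.
    println!("heap address: {:p}", name.as_ptr());
    println!("len: {}, capacity: {}", name.len(), name.capacity());

    let (short, long) = handle_sizes();
    println!("handle sizes: {short} and {long} bytes"); // the same value twice
}
```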
Rust also provides the Box type for putting data on the heap.
Now consider the earlier example with an array of a million elements. This time, instead of storing the elements on the stack, we’re going to store them on the heap using Box::new():
L2 — Heap Allocation (let x = Box::new([0; 1_000_000]);):
- A Box is created, which allocates an array of 1,000,000 zeros on the heap.
- The variable x is declared and stored on the stack, and it points to the heap-allocated array.

L3 — Move of Box (let y = x;):
- Unlike before, y does not get another copy of the million elements on the heap.
- Instead, ownership of the array is moved from x to y. (Notice how x is now greyed out.)
This is what is meant by a move: in Rust, heap data such as the contents of a Box is owned by exactly one variable. When you do let y = x;, Rust copies the pointer from x into y, but the pointed-to data is not copied.
Now the new owner is y, so x becomes invalid and you can no longer use it to access the heap data:
fn main() {
    let x = Box::new([0; 1_000_000]);
    let y = x;
    println!("{:?}", x); // ERROR: x was moved into y
}
error[E0382]: borrow of moved value: `x`
--> src/main.rs:4:22
|
2 | let x = Box::new([0; 1_000_000]);
| - move occurs because `x` has type `Box<[i32; 1000000]>`, which does not implement the `Copy` trait
3 | let y = x;
| - value moved here
4 | println!("{:?}", x);
| ^ value borrowed here after move
Heap data can only be accessed through its current owner y, not the previous owner:
fn main() {
    let x = Box::new([0; 1_000_000]);
    let y = x;
    println!("{:?}", y); // CORRECT: y is the new owner
}
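The sketch below is illustrative and not from the original. It checks that a move copies only the pointer. It records the heap address of the boxed array before and after let y = x; and also measures y itself. The array is scaled down to 1,000 elements as a precaution, because Box::new may build its argument on the stack first. The recorded raw pointers are only compared, never dereferenced.

```rust
use std::mem::size_of_val;

fn move_keeps_heap_address() -> (bool, usize) {
    let x = Box::new([0_i32; 1_000]);
    // Record where the array lives on the heap.
    let before: *const [i32; 1_000] = &*x;
    let y = x; // move: only the Box's pointer is copied into y
    let after: *const [i32; 1_000] = &*y;
    // Same heap address, and y itself is a single pointer on the stack.
    (std::ptr::eq(before, after), size_of_val(&y))
}

fn main() {
    let (same_address, box_size) = move_keeps_heap_address();
    println!("same heap address after move: {same_address}"); // true
    println!("size of y on the stack: {box_size} bytes");     // the size of one pointer
}
```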
Deallocation of Memory
Stack frames are automatically managed by Rust. When a function is called, Rust allocates a stack frame for the called function. When the call ends, Rust deallocates the stack frame.
But what about the heap data?
Well, Rust automatically frees a box’s heap memory when it deallocates the stack frame containing the variable that owns the box.
So going back to the previous example:
L4 — Function Exit (}):
- The main function is about to exit.
- The stack frame for main is cleared, and the y variable goes out of scope.
- The heap-allocated array is deallocated because its owner, the Box in y, goes out of scope.
Consider another example where ownership moves around different functions:
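This code is reconstructed from the walkthrough, so treat it as a best guess at the original figure. The comments point to the matching step labels.

```rust
fn main() {
    let first = String::from("Ferris"); // step L2: first owns "Ferris" on the heap
    let full = add_suffix(first);       // step L3: ownership moves into add_suffix; first is invalid
    println!("{full}");                 // step L8: prints "Ferris Jr."
}                                       // step L9: full goes out of scope and the heap string is freed

fn add_suffix(mut name: String) -> String { // step L4: name is the new owner
    name.push_str(" Jr.");                  // step L5: the string grows to "Ferris Jr."
    name                                    // step L6: ownership moves back to the caller
}
```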
L1 — Function Entry (fn main() {):
- The main function is called.
- The stack frame for main is created.

L2 — String Creation (let first = String::from("Ferris");):
- A String named first is created with the value "Ferris".
- The variable first is allocated on the stack, and the actual string data is stored on the heap. first stores a pointer to the heap data.

L3 — Function Call (let full = add_suffix(first);):
- The add_suffix function is called with first as an argument.
- Ownership of the first string is moved to the add_suffix function, and first becomes invalid.

L4 — Function Entry (fn add_suffix(mut name: String) -> String {):
- The add_suffix function is called with name initialized to "Ferris".
- The stack frame for add_suffix is created, and the name variable is allocated on the stack. name is now the new owner of the heap data.

L5 — Modify String (name.push_str(" Jr.");):
- The push_str method is called on name, appending " Jr." to it. Assuming the original allocation has no spare room, this does three things. First, it creates a new, larger allocation. Second, it writes “Ferris Jr.” into the new allocation. Third, it frees the original heap memory. first now points to deallocated memory (denoted by ⦻).
- The name variable now contains the value "Ferris Jr.".

L6 — Return Modified String (name):
- The modified name string is returned from the add_suffix function.
- Ownership of the string is moved back to the caller (the main function).
- The stack frame for add_suffix is cleared.

L7 — Continue in main Function:
- After returning from add_suffix, control is back in the main function.
- The variable full now owns the string "Ferris Jr.".

L8 — Print the Result (println!("{full}");):
- The println! macro is called to print the value of full.
- The value "Ferris Jr." is printed.

L9 — Function Exit (}):
- The main function exits.
- The stack frame for main is cleared, and the full variable goes out of scope.
- The heap-allocated memory for the String is also deallocated, since it was owned by full.
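push_str only needs a new allocation when the existing buffer is too small. The sketch below is illustrative and not from the original. It uses the standard String::with_capacity constructor to show that when enough room is reserved up front, appending " Jr." never moves the buffer. The helper buffer_moved is hypothetical, and the old pointer is only compared, never dereferenced.

```rust
// Hypothetical helper: returns true if the heap buffer moved while appending " Jr.".
fn buffer_moved(initial_capacity: usize) -> bool {
    let mut name = String::with_capacity(initial_capacity);
    name.push_str("Ferris");
    let before = name.as_ptr();
    name.push_str(" Jr."); // "Ferris Jr." needs 10 bytes in total
    before != name.as_ptr()
}

fn main() {
    // At least 16 bytes reserved: " Jr." fits, so no reallocation happens.
    println!("moved with room to spare: {}", buffer_moved(16)); // false
    // With only 6 bytes requested, the buffer typically has to grow; whether the
    // allocator extends it in place or returns a new address is not guaranteed.
    println!("moved when full: {}", buffer_moved(6));
}
```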
Notice how in the above example the heap data is not tied to just one stack frame. Instead, it moves between stack frames depending on which variable owns it.
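The sketch below is illustrative and not from the original. It shows that moving a Box into a function makes the callee's frame the owner, so the heap memory is freed when the callee returns rather than when the caller does. Tracked is a hypothetical helper type whose Drop implementation writes to a shared log.

```rust
use std::cell::RefCell;

// Hypothetical helper type: records a message in a shared log when dropped.
struct Tracked<'a> {
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("box dropped and freed");
    }
}

// Takes ownership of the box, so this function's frame becomes its owner.
fn consume(_b: Box<Tracked<'_>>, log: &RefCell<Vec<&'static str>>) {
    log.borrow_mut().push("inside consume");
} // _b goes out of scope here: Tracked::drop runs, then the heap allocation is freed

fn run() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let b = Box::new(Tracked { log: &log });
        log.borrow_mut().push("box created in run");
        consume(b, &log); // ownership moves into consume's frame
        log.borrow_mut().push("back in run");
    }
    log.into_inner()
}

fn main() {
    // Prints ["box created in run", "inside consume", "box dropped and freed", "back in run"]
    println!("{:?}", run());
}
```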
Now you might be wondering: how does this concept of ownership guarantee memory safety in Rust?
In short, it prevents undefined behaviour, and that is what ensures memory safety.
Undefined behaviour can lead to various issues such as program crashes, security vulnerabilities, or data corruption.
But what do we mean by undefined behaviour?
Undefined Behaviour
Undefined behaviour in Rust refers to operations whose outcome the language does not define, so neither the compiler nor the runtime guarantees that the program will behave predictably or correctly.
Consider this slightly different version of the previous code:
Here, when you do name.push_str(" Doe");, the string may need more room, so its contents can be moved into a new, larger allocation while the old one is freed. That would leave first pointing to freed memory.
Without ownership, you would still be able to access first, resulting in undefined behaviour.
However, Rust prevents this undefined behaviour during compilation itself:
fn main() {
    let first = String::from("John");
    let mut name = first;
    name.push_str(" Doe");
    println!("{first}");
}
Compiling chapter-3 v0.1.0 (/Users/urvashi/work/rust_projects/chapter-3)
error[E0382]: borrow of moved value: `first`
--> src/main.rs:5:15
|
2 | let first = String::from("John");
| ----- move occurs because `first` has type `String`, which does not implement the `Copy` trait
3 | let mut name = first;
| ----- value moved here
4 | name.push_str(" Doe");
5 | println!("{first}");
| ^^^^^^^ value borrowed here after move
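If you really do need both strings, one option is to give name its own heap copy with clone, so that first remains a valid owner. This is a suggested fix, not part of the original example. The helper both_strings is hypothetical.

```rust
// Hypothetical helper: returns both strings so they can be inspected.
fn both_strings() -> (String, String) {
    let first = String::from("John");
    let mut name = first.clone(); // deep copy: name gets its own heap buffer
    name.push_str(" Doe");        // only name's buffer changes (and may be reallocated)
    (first, name)
}

fn main() {
    let (first, name) = both_strings();
    println!("{first}"); // John
    println!("{name}");  // John Doe
}
```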
Similarly, imagine if Rust allowed you to manually deallocate memory using a function like free:
let b = Box::new([0; 100]);
free(b); // hypothetical: Rust provides no such function
assert!(b[0] == 0); // UNDEFINED BEHAVIOUR
Here again, the result would be undefined behaviour, because you’re trying to read through the pointer b after its memory has been freed. That would attempt to access invalid memory, which could cause the program to crash. Or worse, it might not crash and instead return arbitrary data. Therefore this program is unsafe.
Instead, Rust automatically frees a box’s heap memory using the ownership model to prevent such undefined behaviour.
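Rust's closest real counterpart to the hypothetical free is std::mem::drop, which takes ownership of its argument. Because the box is moved into drop, any later use of b is rejected at compile time with error E0382 instead of reading freed memory. The minimal sketch below uses read_then_drop, a hypothetical helper.

```rust
// Hypothetical helper: reads from the box, then frees it early with drop.
fn read_then_drop() -> i32 {
    let b = Box::new([0; 100]);
    let first = b[0]; // read while b still owns the heap array
    drop(b);          // b is moved into drop, which frees the heap memory
    // let again = b[0]; // would not compile: error[E0382], b was moved
    first
}

fn main() {
    assert!(read_then_drop() == 0);
    println!("read before drop succeeded");
}
```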
A foundational goal of Rust is to ensure that your programs never have undefined behavior. That is the meaning of “safety.”
A secondary goal of Rust is to prevent undefined behavior at compile-time instead of run-time.
Summary
Rust employs a unique memory management system based on ownership, borrowing, and lifetimes to ensure memory safety without the need for a garbage collector or manual memory management.
- Stack vs. Heap: Rust differentiates between stack and heap memory. Stack memory is used for local variables with a known, short lifetime, while heap memory is used for data that needs to persist beyond the current scope.
- Copy and Move Semantics: Rust minimises unnecessary data copying by allowing ownership to be moved rather than copied. This is particularly useful for large data structures, reducing memory usage and improving performance.
- Ownership: Each piece of data in Rust has a single owner, and the data is automatically deallocated when the owner goes out of scope. This prevents dangling pointers and double frees, and makes memory leaks much less likely (though safe Rust does not rule them out entirely).
- Undefined Behaviour Prevention: By ensuring that memory is always accessed safely and correctly, Rust prevents undefined behaviour, which can lead to program crashes, security vulnerabilities, and data corruption.
However, this was just one piece of the puzzle. Rust uses a combination of concepts like ownership, borrowing, and slices to ensure memory safety.