Rust Re-borrowing and Memory Safety

Introduction

Rust is only 8 years old, yet it has been increasingly adopted by the software development community. According to a GitHub article from last year:

“Rust has topped the chart as ‘the most desired programming language’ in Stack Overflow’s annual developer survey. And with more than 80% of developers reporting that they’d like to use the language again next year”

One of the biggest reasons for this achievement is Rust's ability to prevent common memory management problems, which is the subject of today's article.

We will talk about a mechanism related to Rust ownership called re-borrowing, a subject that is rarely documented.

Why Memory Safety?

When we run a program, it consumes RAM to store runtime data, and how that memory is managed is crucial for the application's health and performance. Every programming language allocates memory as the program needs it, but each one handles this differently. Before talking about those differences, we need to talk about the stack and the heap.

Stack memory is a region used to store local variables and function call frames in LIFO order. Heap memory, on the other hand, is dynamically allocated and is used for objects and data structures that require a longer lifespan than stack memory can offer.

The stack is much faster than the heap, and its memory is reclaimed very efficiently when the function exits, which is convenient for the programmer when the data is no longer required.

On the heap, however, allocated space is only freed when you state that the allocation is no longer needed. In languages like C, a developer who allocates space on the heap must also deallocate it once the data is not needed anymore. If they don't, the program will leak memory, so allocating and deallocating memory manually has been a cumbersome task for developers. That is why many languages, like Java, Python and JavaScript, come with a garbage collector, a mechanism that looks for allocated memory that is no longer being used and frees it.

Since then, many programming languages have adopted garbage collectors built on many different strategies. But a garbage collector is additional work running alongside our program, so at some level it will impact performance.

Nowadays there are many different GC algorithms, but as stated, each of them costs some performance, and the more heap allocations we make, the more GC cycles we will need.

One type of GC is called stop-the-world, which is used in JavaScript and essentially pauses the whole program while the collector does its work. This might be imperceptible for applications where real-time actions and data processing are not a requirement, but where they are, these pauses directly affect our results.

Now our hero enters the scene. Rust doesn’t have a GC, so does that mean we need to manage dynamic memory manually? Not at all. Rust solves that problem with a mechanism called Ownership, a set of strict rules enforced by the compiler that determine when memory is freed and rule out bugs such as use-after-free and double free. With no collector running in the background, Rust is fast and a strong candidate for real-time tasks.

Before talking about Ownership, we need to clarify that Rust isn’t perfect: these rules improve memory management a lot, but they don't cover 100% of cases. Memory leaks, for instance, are still possible in safe code, for example through reference cycles. Additionally, developers can use the unsafe keyword, which allows operations that the compiler cannot verify, such as dereferencing raw pointers, and misusing them can cause memory safety problems. You can use unsafe code to tell the compiler, “Trust me, I know what I’m doing.” Be warned, however, that you use unsafe Rust at your own risk. If you use unsafe code incorrectly, problems can occur due to memory unsafety, such as null pointer dereferencing.

Ownership

Now that we have all the context, we know that in some languages the developer has to manage how memory is allocated and freed, others use a garbage collector, and Rust uses the ownership approach: a set of rules that the compiler checks. If any of these rules is violated, the program won’t compile, and the compiler shows an error to the developer.

According to the description of ownership in the official Rust documentation (https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html):

Because ownership is a new concept for many programmers, it does take some time to get used to it. The good news is that the more experienced you become with Rust and the rules of the ownership system, the easier you’ll find it to naturally develop code that is safe and efficient. Keep at it!

So, since ownership is a new concept, you may find Rust a little difficult to learn at first. Once you get used to it, though, you will have a clear view of how the language works. It also “forces” you to pay attention to memory management, something many developers largely disregard.

Here are some basic rules:

  • Each value in Rust has an owner.

  • There can only be one owner at a time.

  • When the owner goes out of scope, the value will be dropped.
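To make the third rule visible, here is a minimal sketch that records when values are dropped. The Noisy type and the drop_order function are hypothetical names invented for this illustration, and the RefCell is only there so that both values can write to the same log.

```rust
use std::cell::RefCell;

// Hypothetical type used only for illustration: it records a message when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Returns the events in the order in which they happened.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
            log.borrow_mut().push("end of inner scope".to_string());
        } // _inner goes out of scope here, so its value is dropped
        log.borrow_mut().push("end of outer scope".to_string());
    } // _outer goes out of scope here
    log.into_inner()
}

fn main() {
    for event in drop_order() {
        println!("{event}");
    }
}
```

Each value is dropped exactly when its owner's scope closes, with no garbage collector involved.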

In Rust, variables are immutable by default, unless you specify that something should be mutable with the mut keyword when declaring it. Let’s take a look at this piece of code:

fn my_function() {
    // here no variables are defined yet
    let my_variable = "hello"; // here we declare my_variable,
                               // which is now the owner of the string value
} // when the scope of the function ends, the owner of that value is gone

In this last piece of code, the hello string value has been assigned to my_variable, which means that this variable is the owner of that value.

But if we do the following:

fn my_function() {
    let str1: String = String::from("World");
    let str2: String = str1;

    // If you uncomment the following print statement, the code won't compile, because you moved
    // the value from str1 to str2: str2 is now the owner of the value, so str1 can no longer be used.
    // println!("Hello {}", str1)
}

Here, we are moving the ownership of the value from str1 to str2, which means that if we try to use str1 after the move, we will get a compiler error. Because each value has exactly one owner, Rust knows exactly when its memory can be freed, and after the move it considers str1 no longer valid.

The Borrow operator

Here, things start to cause confusion in the developer’s mind. Let's take a look at this code:

fn main() {
    let s = String::from("hello");
    let new_string = &s;
    println!("{s}");
    println!("{new_string}");
}

This code will compile, and you might say that new_string is a pointer to s. Although pointers exist in Rust, and a reference is represented as one under the hood, the & operator doesn’t simply mean we are creating a pointer: it means we are borrowing the value from a variable.

When you borrow a value from a variable, you get a reference to it, meaning that you can use that value but you can’t change it. When the scope ends, all the borrows and ownerships are gone. This is different from languages like C or Golang, where you can create a pointer and modify the value through it.
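As a small sketch of borrowing in practice, the function below only reads a string through a shared borrow, so the caller keeps ownership. The name count_chars is a hypothetical helper made up for this example.

```rust
// Hypothetical helper: it only reads the text through a shared borrow.
// Passing &s where s is a String works because &String coerces to &str.
fn count_chars(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let s = String::from("hello");

    // Several shared borrows can exist at the same time.
    let first = &s;
    let second = &s;
    let n = count_chars(&s);

    // s was never moved, so it is still usable here.
    println!("{first} and {second} have {n} chars, and s is still {s}");
}
```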

You can have as many borrows as you want within the function scope. But let’s suppose you want to change a mutable variable through a borrow. For this, we have the concept of the mutable borrow, using the syntax &mut.

fn main() {
    let mut s = String::from("hello");
    let new_string = &mut s;
    new_string.push_str(" world!");

    println!("{s}");
}

This code works perfectly, and I know you are going to say, “This documentation is lying to me, this is a pointer in action”. In fact, it works like a pointer and smells like a pointer, but for Rust this is a mutable borrow: we borrowed a mutable reference to a variable, through which we can change the value it points to.

Let’s take a look at this Golang code, for instance:

package main

import "fmt"

func main() {
    myString := "hello"

    firstPointer := &myString
    secondPointer := &myString
    fmt.Println(*firstPointer)
    fmt.Println(*secondPointer)
}

We created two pointers, and we are able to dereference both for printing their values. Now if we try to do the same in Rust:

fn main() {
    let mut str3: String = String::from("Hello");
    let str4 = &mut str3; // str4 takes the mutable borrow of str3
    let str5 = &mut str3; // a second mutable borrow while str4 is still in use

    println!("{}", str4);
}

If you try to compile this code, you will get a compiler error saying that it cannot borrow str3 as mutable more than once at a time. This happens because of a borrowing rule stating that a value can have only one mutable borrow at a time: str4 is still in use when str5 tries to borrow str3 again.

Why do we have this? Restricting mutable borrows to one at a time lets the compiler prove, before the program even runs, that nobody else can read or write a value while it is being modified. This rules out data races and references that point to data that was changed or freed behind their back. With the immutable borrow &, we can have as many as we want, because none of them can change the value.
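The rule is about borrows that are in use at the same time, not about how many borrows appear in a function. Here is a minimal sketch (append_twice is a hypothetical name) where two mutable borrows of the same String are accepted, because the first one is no longer used when the second one starts:

```rust
fn append_twice() -> String {
    let mut s = String::from("Hello");

    let first = &mut s;
    first.push_str(", ");
    // first is not used after this point, so its borrow ends here.

    let second = &mut s; // allowed: no other borrow of s is still in use
    second.push_str("world");

    // Once the mutable borrows are done, any number of shared borrows can coexist.
    let a = &s;
    let b = &s;
    assert_eq!(a, b);

    s
}

fn main() {
    println!("{}", append_twice());
}
```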

At the end of the day, pointers exist in Rust, as do stack and heap allocations, and we can get a reference to a variable allocated on the heap using &. However, in Rust we call this a borrow or a mutable borrow, and special rules apply to it to ensure memory safety.

Additionally, Rust lets you step outside some of these rules: with the unsafe keyword you can use raw pointers, for which the compiler does not enforce these memory safety guarantees, but you do so at your own risk.

Functions and Ownership

Let’s consider the following code:

fn main() {
    let n = 0;
    print_number(n);

    println!("{n}")
}

fn print_number(number: i32) {
    println!("{number}")
}

We are just calling a function to print the variable's value and then printing it again, which is a bit silly on our part, but let’s stick to the academic goals of this article.

Considering ownership in Rust, and that we are dealing with an integer, when we call the function and pass the number we aren’t moving the ownership: we are copying the value into the number parameter of print_number. So both variables have the same value, but each one owns its own copy. In terms of memory management and allocation on the stack:

  • main gets executed, and placed at the bottom of the stack

  • n is declared as i32, having the value of 0, so n has ownership of this value

  • print_number function gets added to the stack

  • variable number is declared with the same value of 0

  • we print the value, then print_number and number are popped from the stack

  • n still has the ownership of the value 0
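The steps above can be checked with a tiny sketch. The add_one function is a hypothetical variation of print_number that modifies its parameter, which shows that the callee works on its own copy:

```rust
// number is a copy of the caller's value, living in add_one's stack frame.
fn add_one(mut number: i32) -> i32 {
    number += 1; // changes only the local copy
    number
}

fn main() {
    let n = 0;
    let result = add_one(n);

    // n was copied, not moved, and the callee could not change it.
    println!("n = {n}, result = {result}");
}
```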

Why are we talking only about the stack? Because Rust also has rules for allocations:

Rust stack allocates by default. Box is used to allocate to the heap. Primitive types and local variables of a function are allocated on the stack. Data types that are dynamic in size, such as String, Vector, Box, etc., are allocated on the heap.
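A short sketch of these allocation rules, using only standard types (the function name stack_and_heap is made up for the example):

```rust
// Returns one value read from the stack and two read from the heap.
fn stack_and_heap() -> (i32, i32, usize) {
    // Stored directly in this function's stack frame.
    let on_stack: i32 = 7;
    // The pointer lives on the stack, the i32 it points to lives on the heap.
    let boxed: Box<i32> = Box::new(35);
    // Pointer, length and capacity live on the stack, the bytes live on the heap.
    let text = String::from("hello");

    (on_stack, *boxed, text.len())
} // boxed and text go out of scope here, and their heap memory is freed

fn main() {
    let (a, b, len) = stack_and_heap();
    println!("{a} {b} {len}");
}
```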

Hence, as we were using i32, which is a primitive type, it’s all allocated on the stack. Now let’s try this code:

fn print_string(str4: String) {
    println!("{str4}")
}
fn main() {
    let str3  =  String::from("test");
    print_string(str3);

    println!("{str3}") // compile error: str3 was moved into print_string
}

It looks almost exactly the same as the last one, but now we are using a String instead of an i32, and I must disappoint you: this code won’t compile. As we have just seen in the allocation rules, a String is dynamic in size, so its content is always placed on the heap.

So, when we declare str3, we are storing the String content on the heap, and a pointer to it on the stack. Rust adds another rule here: values that own a heap allocation are not implicitly copied, because that would simply duplicate the pointer and result in a double free later. Instead, passing str3 to print_string moves the ownership into the function, and str3 can no longer be used in main.

On the other hand, Rust allows us to clone the String, which in the background allocates more memory on the heap and copies the entire String content. If we execute this next code, it should work:

fn print_string(str4: String) {
    println!("{str4}")
}
fn main() {
    let str3  =  String::from("test");
    print_string(str3.clone());

    println!("{str3}")
}

Lastly, copying and cloning are different things:

  • copy: performs a bitwise copy of the value. Primitive types such as integers, floats, bool and char can be copied, so when you pass a number to a function, Rust performs a copy under the hood.

  • clone: performs an explicit copy that may be expensive; for a String, it copies the whole content on the heap. You can call clone on primitive types as well as on the types that are necessarily allocated on the heap.

More details on copy and clone: https://oswalt.dev/2023/12/copy-and-clone-in-rust/.
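Here is a minimal sketch of the difference. Point and Label are hypothetical types invented for this example: Point contains only integers, so it can derive Copy, while Label owns a String and can only be cloned explicitly.

```rust
// Only plain integers inside, so the type can be Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Owns a heap allocation through String, so it can be Clone but not Copy.
#[derive(Debug, Clone, PartialEq)]
struct Label {
    text: String,
}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // implicit bitwise copy, p1 is still usable
    println!("{} {} {:?}", p1.x, p1.y, p2);

    let l1 = Label { text: String::from("origin") };
    let l2 = l1.clone(); // explicit clone, allocates a second String on the heap
    println!("{} {:?}", l1.text, l2);
}
```

Writing let l2 = l1; instead of l1.clone() would move the value, and using l1 afterwards would be a compile error.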

Be aware that in this section we are only talking about ownership and passing values to a function, without using the borrow and mutable borrow operators. Combining those with function calls is where we reach a very special case, called re-borrow.

Finally! Re-borrowing!

There is one important topic in Rust that most documentation doesn't cover: re-borrowing, which looks like an exception to the mutable borrow rule. As we know, in Rust we can only have one mutable borrow of a value at a time. For example, the following code would generate an error:

let mut a = 0;
let b = &mut a;
let c = &mut a; // second mutable borrow of a
println!("{b}"); // b is still in use here, so the borrow taken by c above is rejected by the compiler

This next example is also going to generate an error:

let mut a = 0;
let b = &mut a;
let c = b;
println!("{b}"); // Similar case: we are using b, but the mutable reference was moved from b to c

Now let's look at the reborrow() function, defined below. At first glance we might think it will generate an error, but it works:

fn reborrow() {
    let mut a = 7;
    let b = &mut a;
    let c = &mut *b; //  this is called re-borrow
    *c = 1;

    println!("{b}");
    // println!("{c}"); if we try to print c, it will generate an error, because the re-borrow already returned to b
}

Why does it work? When we use the syntax &mut *b, we are doing a re-borrow: we temporarily lend the mutable reference to another variable, in this case c. When we use b again, the re-borrow ends, and we can't use c anymore.

Now the question is: why do we have the re-borrow mechanism, considering that Rust must be memory-safe? One of the reasons is that we might want to pass a mutable reference to a function call, and then use that mutable reference again when the function returns. Whenever we pass a mutable reference to a function whose parameter is declared as &mut, Rust performs a re-borrow in the background instead of moving the reference. Let's check out this example:

// Here, r is receiving a re-borrow of the mutable reference, and when the function ends, the re-borrow ends, and we still can use b variable.
fn foo(r: &mut i32) {
    *r += 1;
}

fn second_reborrow() {
    let mut a = 7;
    let b = &mut a; // first mutable reference borrow

    foo(b); // here we are sending the mutable reference to the function, but what Rust does behind the scenes is foo(&mut *b)
    // this means we are re-borrowing the mutable reference that belongs to b. We can't re-borrow from the owner, we can only re-borrow from the borrower.

    println!("{b}"); // This works: the re-borrow ended when foo returned, so b is usable again.
}

This might sound really confusing, but let's think about it from the stack's point of view. We know that every function call gets a frame on the stack, and that the stack is LIFO. Let's suppose that b borrowed a mutable reference from a; b lives in the caller's frame. On the next line, we re-borrow b for a function call, whose frame is placed at the top of the stack. Rust knows that there won't be any use of b until the called function returns, because b is defined in the parent function's scope. Therefore the compiler knows that b can only be used again once the called function has returned, so the two references are never used at the same time, which keeps the program memory-safe.
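A minimal sketch of that reasoning, assuming a hypothetical add_one helper similar to foo: the same mutable reference is handed to a function twice, once implicitly and once with the re-borrow written out, and it is usable again after each call returns.

```rust
fn add_one(r: &mut i32) {
    *r += 1;
}

fn call_twice() -> i32 {
    let mut a = 7;
    let b = &mut a;

    add_one(b);       // implicit re-borrow: Rust treats this as add_one(&mut *b)
    add_one(&mut *b); // the same re-borrow, written explicitly

    // Both re-borrows ended when the calls returned, so b can be used here.
    *b
}

fn main() {
    println!("{}", call_twice());
}
```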

DISCLAIMER AT THIS POINT! We can only re-borrow references, not plain values, so the following code won’t work:

fn main() {
    let mut a = 2;
    let b = &mut a;
    let c = &mut *a; // error: a is an integer, not a reference, so it cannot be dereferenced

    println!("{b}")
}

Here, we are trying to re-borrow a variable that is not a mutable reference; it’s a simple integer. We could re-borrow b instead, which is a mutable reference to a.

Furthermore, now that this point is clear, does it still stand in a multi-threaded environment? Let's take a look at this example:

use std::thread;

fn start_thread(r: &mut i32) {
    thread::spawn( || {
        println!("{}", r);
    });
}

fn reborrow_thread() {
    let mut a = 7;
    let b = &mut a;

    start_thread(b); // here we are re-borrowing, Rust is doing &mut *b in the background.
}

Here we will get the error argument requires that '1 must outlive 'static. This means that although Rust has the re-borrow mechanism, it doesn't allow the reference to escape into a spawned thread: the thread could keep running after start_thread returns and the re-borrow has ended, so two usable references to the same value could exist at once. Since that is not memory-safe, it is not allowed.
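If a thread only needs the reference temporarily, one option is the standard library's std::thread::scope (available since Rust 1.63), which joins every thread it spawned before returning. The sketch below is only an illustration of that alternative; increment_in_thread is a hypothetical name, not part of the original example.

```rust
use std::thread;

// The spawned thread may use r because thread::scope guarantees
// that the thread has finished before the scope returns.
fn increment_in_thread(r: &mut i32) {
    thread::scope(|s| {
        s.spawn(|| {
            *r += 1;
        });
    }); // all threads spawned inside the scope are joined here
}

fn main() {
    let mut a = 7;
    let b = &mut a;

    increment_in_thread(b); // implicit re-borrow, it ends when the function returns

    println!("{b}");
}
```

Because the thread cannot outlive the call, the re-borrow still ends when the function returns, and b is usable afterwards.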

Let's make things a little more complicated. As with the mutable borrow, we can only have one re-borrow of a reference at a time. For example, this next function would cause an error:

fn reborrow_error() {
    let mut a = 7;
    let b = &mut a; // b takes the mutable borrow of a
    let c = &mut *b; // c is a re-borrow of b
    let d = &mut *b; // d is a second re-borrow of b, which invalidates c
    *c = 2; // c is used after d took the re-borrow, so this code doesn't compile
}

But, let's take a deeper look at the next function, which will work:

fn reborrow_chain() {
    let mut a = 7;
    let b = &mut a; // b receives the mutable borrow from a
    let c = &mut *b; // c receives the re-borrow from b
    let d = &mut *c; // d receives the re-borrow from c
    *d = 2;
    println!("{c}");
    println!("{b}");
}

This might blow your mind. Why does this work, considering that we can only have one mutable borrow or re-borrow at a time? If you compare this example with the previous one that doesn't work, you will see that this one is not taking a second re-borrow from the same variable: c re-borrows b, and d re-borrows c. We may call this a re-borrowing chain.

Remember that we said that once you use the variable you re-borrowed from, the re-borrow ends? To illustrate this, let's imagine that re-borrows work like a LIFO stack. At the second re-borrow (let d = &mut *c), we will have a stack like this (don't confuse it with the stack in memory, it's a metaphor):

//        | d re-borrow c      |    when we stop using d                             when we stop using c
//        | c re-borrow b      |     or mention c again     | c re-borrow b      |    or mention b again
//        | b mutable borrow a |          ====>             | b mutable borrow a |           ====>             | b mutable borrow a |
//        | a definition       |                            | a definition       |                             | a definition       |

If we take a look at the example that generates an error:

//         | d re-borrow b      |     -> here d takes over the re-borrow of b that c was holding, so if we try to use c again, it will generate an error
//         | c re-borrow b      |
//         | b mutable borrow a |
//         | a definition       |

With this stack analogy, we can imagine how the compiler thinks: re-borrows can be chained, because every time we use the variable we re-borrowed from, the re-borrows taken from it end. No memory is allocated or freed in the process; only the permission to use each reference changes. So in the working example, if we use c again, the re-borrow that d took ends, and finally when we use b, the re-borrow from c ends. Since this is a chain, if we use b straight after creating all the re-borrows, all of them end at once.
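To see the chain unwinding step by step, here is a sketch that makes each re-borrow's lifetime explicit with nested blocks and records the value seen through each reference. The function name chain_steps is hypothetical, and the blocks are only there for readability; the compiler would accept the same code without them.

```rust
fn chain_steps() -> Vec<i32> {
    let mut seen = Vec::new();
    let mut a = 7;
    let b = &mut a; // b borrows a mutably
    {
        let c = &mut *b; // c re-borrows b
        {
            let d = &mut *c; // d re-borrows c
            *d = 2;
            seen.push(*d);
        } // d's re-borrow ends, c is usable again
        *c += 1;
        seen.push(*c);
    } // c's re-borrow ends, b is usable again
    *b += 1;
    seen.push(*b);

    // b is not used anymore, so the original variable is usable again.
    seen.push(a);
    seen
}

fn main() {
    println!("{:?}", chain_steps());
}
```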

Conclusion

Rust is a little bit different from other programming languages, especially when it comes to memory safety. The ownership concept takes some time to get used to, but it gives us memory safety without the runtime cost of a garbage collector, which is why Rust is a strong option for use cases like:

  • Cryptocurrency and Blockchain Environment

  • IoT

  • Embedded Systems

  • Gaming

  • Machine Learning