Open Access Article

SafeMD: Ownership-Based Safe Memory Deallocation for C Programs

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Department of Computer Science, TU Kaiserslautern, 67653 Kaiserslautern, Germany

Author to whom correspondence should be addressed.

Electronics 2024, 13(21), 4307; https://doi.org/10.3390/electronics13214307

Submission received: 27 September 2024 / Revised: 29 October 2024 / Accepted: 31 October 2024 / Published: 1 November 2024

(This article belongs to the Special Issue Advances in Data-Driven Artificial Intelligence)


Abstract

Rust is a relatively new programming language that aims to provide memory safety at compile time. It introduces a novel ownership system that enforces the automatic deallocation of unused resources without a garbage collector. In light of Rust’s promise of safety, a natural question is whether ownership can also be exploited to ensure the memory safety of C programs. In our previous work, we developed a formal ownership checker to verify whether a C program satisfies exclusive ownership constraints. In this paper, we propose an ownership-based safe memory deallocation approach, named SafeMD, that fixes memory leaks in C programs satisfying the exclusive ownership defined by this checker. Because its input programs satisfy exclusive ownership, SafeMD needs neither alias analysis nor inter-procedural analysis. Moreover, the C programs patched by SafeMD still satisfy exclusive ownership, and a C program that satisfies the exclusive ownership constraints is usually safer than one that does not. Our evaluation shows that SafeMD is effective in fixing memory leaks in C programs that satisfy exclusive ownership.

Keywords:

C; memory leaks; memory deallocation; Rust; ownership

1. Introduction

C is widely used to implement system and embedded software, which is often safety-critical [1,2]. However, C’s manual memory management makes it easy to introduce memory leaks (MLs). A memory leak typically occurs when a programmer allocates an object but forgets to deallocate it. If not carefully detected and fixed, memory leaks can severely degrade software systems; in fact, they are a direct source of security vulnerabilities. Several memory-leak vulnerabilities have been disclosed in Linux kernels (e.g., CVE-2022-27819 [3], CVE-2017-10810 [4]).

Recently, Rust [5], an emerging programming language designed for building highly safe systems, has received increasing attention. Unlike C/C++, Rust introduces an ownership system (OwS) that provides memory safety at compile time and avoids many memory errors, such as dangling pointers, data races and memory leaks. The basic idea of OwS is exclusive ownership, i.e., at any time, each resource has a unique owner. When the unique owner of a resource goes out of scope, the resource is dropped automatically without a garbage collector. Because the owner is unique, this automatic drop scheme is safe, i.e., it does not introduce new errors such as use-after-free (UAF) and double-free (DF).

In light of Rust’s promise of safety, OwS offers a new way to guarantee the memory safety of C programs. In our previous work [6], as shown in Figure 1, we developed a formal ownership checker, named SafeOSL, to verify whether a C program satisfies the exclusive ownership constraint. A C program that passes SafeOSL’s check satisfies exclusive ownership. For such programs, in this paper, we propose an ownership-based memory deallocation approach, named SafeMD, to fix memory leaks. The output of SafeMD is a C program that is free of memory leaks and, most importantly, still satisfies exclusive ownership. A C program that satisfies exclusive ownership is usually safer than one that does not.

Many static techniques for fixing memory leaks have been proposed in the program repair community [7,8,9,10,11]. When the input C programs satisfy exclusive ownership, these techniques suffer from several drawbacks. First, some techniques can fix memory leaks but may introduce new errors, such as UAF and DF. Second, some techniques fix memory leaks safely but are complex, because they rely on alias and inter-procedural analysis. Third, some techniques fix memory leaks safely, but the patches they generate do not guarantee that the C programs still satisfy exclusive ownership.

In this paper, we propose SafeMD, an ownership-based memory deallocation approach that fixes memory leaks in C programs satisfying exclusive ownership. SafeMD generates a set of free statements that safely deallocate all allocated memory objects without introducing UAF or DF. SafeMD consists of two steps. (1) A static analysis collects patch candidates for each allocated object; the main idea is to track the ownership of the object and free the object where its owner last uses it. Because the input C programs satisfy exclusive ownership, this step needs no alias or inter-procedural analysis, and the collected patch candidates satisfy exclusive ownership. (2) Correct patches are found by solving an exact cover problem. Because ownership designates which function is responsible for deallocating each memory object, SafeMD reduces inter-procedural analysis to intra-procedural analysis: each function is analyzed separately, and each analysis performs the two steps above.

The experimental results demonstrate that SafeMD is able to fix the memory leaks in C programs. We evaluated SafeMD with two different benchmark sets: Juliet Test Suite (JTS) for C [12] and 26 open-source C repositories [10]. We compared SafeMD with MemFix. When the input C programs satisfy exclusive ownership, SafeMD can fix more memory-leak patterns than MemFix.

We summarize the contributions of this paper as follows.

We present SafeMD, an ownership-based safe memory deallocation technique for C programs that satisfy exclusive ownership. Compared with the existing techniques, SafeMD obviates alias and inter-procedural analysis, and the patches generated satisfy exclusive ownership.
We implement SafeMD and compare it with MemFix.
We explore the benefit of Rust’s novel ownership-based memory management in C.

2. Related Work

2.1. Approaches for Memory-Leak Fixing

Several prominent techniques have been proposed to statically fix memory leaks. LeakFix [7] performs pointer analysis on the whole C program to identify and safely fix memory leaks. Each procedure is classified into one of three types: those that allocate, deallocate or use a given memory allocation. LeakFix first abstracts the program into an abstract control flow graph (CFG) in which each node is a procedure of one of these three types; finding correct patches then amounts to finding edges in this graph that satisfy a set of conditions. AutoFix [8] combines static analysis with runtime checking to prevent memory leaks. Its static analysis uses Andersen’s pointer analysis to build the value-flow graph (VFG) of the program. Based on the VFG, AutoFix performs a graph reachability analysis to identify leaky paths and then a liveness analysis to locate the program points at which patches are inserted on those paths. FootPatch [9] fixes memory leaks by applying local reasoning based on separation logic. However, it may introduce new errors, as it checks patch correctness only against the given error report. MemFix [10] safely repairs ML, DF and UAF in a unified fashion. The key insight behind MemFix is that finding a correct patch for memory leaks corresponds to solving an exact cover problem. Before its analysis, MemFix performs standard pointer [13] and alias analyses [14]. SAVER [11] safely fixes memory errors such as ML, UAF and DF. It performs pointer analysis to construct object flow graphs (OFGs) that capture the program’s heap-related behavior, and formulates the fixing of memory errors as a graph-labeling problem over the OFG.

For C programs that satisfy exclusive ownership, most of the work above still performs alias and inter-procedural analysis to fix memory leaks, which makes the repair complex and inefficient. Yet the exclusive ownership satisfied by the input programs can ease memory-leak fixing, since exclusive ownership rules out aliases entirely. Moreover, the patches generated by existing work may violate exclusive ownership. This paper therefore proposes an approach that exploits the particularities of ownership to fix memory leaks in C programs that satisfy exclusive ownership.

Several dynamic techniques have been proposed [15,16,17]. DEF_LEAK [15] performs dynamic symbolic execution to expose memory leaks on all execution paths. The program to be analyzed is instrumented before execution; during execution, the information about each allocated memory object is updated whenever the corresponding statements are executed. Based on this information, DEF_LEAK records changes to the variables pointing to each memory object, detects memory leaks and fixes the leaked memory. LeakPoint [16] is a dynamic analysis framework that performs taint propagation on pointers to detect memory leaks; it identifies the last-use sites of leaked objects and suggests patches to fix them. AddressWatcher [17] is a dynamic tool for fixing memory leaks that tracks the semantics of a memory object across multiple execution paths. It does so with a leak database that stores and compares different execution paths of a leak over several test cases.

2.2. Ownership for Memory Safety

Ownership has been used in OO programming to enable controlled aliasing [18,19] and to prevent data races [20,21]. Most of these works construct an ownership-type system for Java and require programmers to provide various annotations. A small number of works have applied ownership to detect memory errors in C programs. Heine et al. [22] present an ownership-type system to detect ML and DF. Their ownerships range over the integer values $\{0, 1\}$. In their model, every object is pointed to by one and only one owning pointer (i.e., a pointer whose ownership value equals 1), which holds the exclusive right and obligation either to delete the object or to transfer that right to another owning pointer. However, the rules of their ownership model are not very strict; for example, the model makes ownership transfer in assignments optional and thus allows arbitrary aliases. Swamy et al. [23] develop a language, Cyclone, that introduces a simple concept similar to ownership to detect dangling pointers. Unlike C, Cyclone requires programmers to provide various annotations (such as whether a pointer is aliased). Suenaga et al. [24] propose a fractional-ownership-type system to detect ML, DF and UAF in C. Their model augments a pointer type with a fractional ownership, a rational number $x \in [0, 1]$. A non-zero ownership expresses permission to dereference the pointer, and an ownership of 1 expresses permission to update and deallocate the memory cell referenced by the pointer. Therefore, a holder of a non-zero ownership less than 1 must eventually combine it with other ownerships to obtain an ownership of 1 before deallocating the pointer. Sonobe et al. [25] extend the fractional-ownership-type system of [24] to fix memory leaks; their technique performs type inference for the extended type system to determine where to insert deallocation statements.

In recent years, ownership in Rust has received much attention. Most existing work on Rust focuses on the formal verification (including ownership) of Rust programs [26,27,28] and on empirical studies of how effective Rust ownership is against memory bugs [29,30]. Recently, some work on (semi-)automatically translating C code to Rust has been proposed [31,32,33]. Compared with the ownership models proposed in earlier literature (before Rust was released), Rust ownership has three potential advantages: (1) its rules are stricter; (2) its implementation is simpler and more efficient than fractional ownership; (3) it has been shown to be effective in preventing memory errors [30,34]. Therefore, in our previous work [6], we exploited Rust ownership to check C programs for memory errors, and, in this paper, we further exploit Rust ownership to fix memory leaks in C programs that satisfy exclusive ownership.

3. Ownership System in Rust

Rust guarantees memory safety at compile time by introducing an ownership system and consequently avoids many memory errors, such as dangling pointers and memory leaks. Ownership in Rust is a set of rules that govern how the Rust compiler manages memory. The idea of OwS is exclusive ownership: at any time, each resource has a unique variable as its owner. Ownership can be transferred between owners. When the owner of a resource goes out of scope, the resource is dropped automatically without a garbage collector. Below, we describe how ownership is transferred.

Ownership and Assignments. In Rust, ownership can be transferred in assignments. Consider the code in Listing 1, where line 2 creates a String object o on the heap, and let $s_1$ be the owner of o. At line 4, the assignment transfers the ownership of o from $s_1$ to $s_2$. To maintain a unique owner, the assignment performs move semantics: $s_1$ becomes an old owner and is no longer valid until it is assigned a new value. Therefore, the Rust compiler issues an error at line 5. This differs from pointer assignment in C, where both $s_1$ and $s_2$ remain valid and can be used. Because $s_2$ is the unique owner of o, when $s_2$ goes out of scope, the Rust compiler automatically inserts a drop destructor to free o at line 6, which avoids memory leaks. Therefore, the code at line 7 is rejected, as o has already been destroyed. Now, consider line 8 more closely. When $s_1$ goes out of scope and would free o, the Rust compiler does not insert drop, since it finds that the ownership of $s_1$ has been moved. In other words, an object cannot be freed by any of its old owners (such as $s_1$), which ensures that memory deallocation does not introduce DF.

Listing 1. Transferring ownership in assignments.
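The move-then-drop behaviour described above can be observed in a small, self-contained Rust sketch. It is not Listing 1 itself: the `Resource` type, its drop counter and the function `move_in_assignment` are hypothetical stand-ins for the heap-allocated String o, added only to make the single drop through the new owner visible.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a heap resource such as a `String`:
/// it counts how many times it has been dropped.
struct Resource {
    drops: Rc<Cell<u32>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Moves a heap object from `s1` to `s2`. Returns (drop count while `s2` is
/// alive, drop count after both bindings have gone out of scope).
fn move_in_assignment() -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let while_alive;
    {
        let s1 = Box::new(Resource { drops: Rc::clone(&drops) });
        let s2 = s1; // move: `s2` is now the unique owner and `s1` is invalid
        // s1.drops.get(); // would be rejected: error[E0382], use of moved value `s1`
        while_alive = s2.drops.get();
    } // exactly one drop, through `s2`; none is emitted for the moved-from `s1`
    (while_alive, drops.get())
}
```

Calling `move_in_assignment()` yields `(0, 1)`: the object is alive while $s_2$ owns it and is dropped exactly once, through $s_2$.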

Ownership and Functions. Ownership can also be transferred in function calls. When the ownership of an object is moved to a callee via a parameter, the object is no longer available in the caller. For example, in Listing 2, the function call at line 3 moves the ownership of the object o created at line 2 to takes_ownership, so the Rust compiler issues an error at line 4, where s has become an old owner and cannot be accessed in the main function. To insert drop destructors, the Rust compiler performs an intra-procedural analysis that compiles each function individually, relying on ownership to determine whether the current function is responsible for freeing an object. For example, the Rust compiler first compiles the main function. It finds that s is moved to takes_ownership; therefore, main has no responsibility to free o, and the compiler does not insert drop at line 5. Next, the Rust compiler compiles takes_ownership. Because ss is of type String, whose ownership can be moved, the compiler automatically inserts drop when ss goes out of scope at line 9.

Listing 2. Transferring ownership via parameters.

Besides parameter passing, return values can also transfer ownership. For example, in Listing 3, gives_back moves ownership out of itself via the return value ss to its caller main, which means that gives_back has no responsibility to free the object. Exclusive ownership ensures that automatic memory deallocation is safe. When the Rust compiler compiles the main function, the object o is dropped automatically once $s_1$ goes out of scope at line 6, but nothing happens for s, because s has been moved. This avoids the DF that would occur if both s and $s_1$ tried to free o when they go out of scope at line 6. When compiling gives_back, the compiler does not insert drop to free the object pointed to by ss, since ss is moved out of gives_back. This avoids the UAF that would occur if ss were freed while $s_1$ is still used in the main function.
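Parameter passing and return values can be sketched the same way. The names takes_ownership, gives_back, s, ss and s1 mirror the prose above, but the function bodies, the counting `Resource` type and the two driver functions are hypothetical illustrations, not the code of Listings 2 and 3.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical heap resource that counts how many times it has been dropped.
struct Resource {
    drops: Rc<Cell<u32>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// The callee owns `_ss` and drops it when `_ss` goes out of scope here.
fn takes_ownership(_ss: Resource) {}

/// The callee moves ownership back out, so it must not drop `ss`.
fn gives_back(ss: Resource) -> Resource {
    ss
}

/// Drop count right after `takes_ownership` returns.
fn after_takes_ownership() -> u32 {
    let drops = Rc::new(Cell::new(0));
    let s = Resource { drops: Rc::clone(&drops) };
    takes_ownership(s); // `s` is moved; any later use of `s` is rejected (E0382)
    drops.get()
}

/// (drop count right after `gives_back`, drop count once `s1` leaves scope).
fn around_gives_back() -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let after_call;
    {
        let s = Resource { drops: Rc::clone(&drops) };
        let s1 = gives_back(s); // ownership goes into the callee and comes back out
        after_call = s1.drops.get(); // gives_back has not freed the object
    } // dropped exactly once, via `s1`; nothing is dropped for the moved-from `s`
    (after_call, drops.get())
}
```

Here `after_takes_ownership()` returns 1 (the callee freed the object), while `around_gives_back()` returns `(0, 1)` (the callee freed nothing, and the caller freed it once).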

In conclusion, the main ideas of OwS are summarized as follows.

R1: Each resource has a unique owner at any time.

R2: When the owner of a resource goes out of its scope, it deallocates the resource that it owned. Any old owners of the resource cannot deallocate the resource (this can avoid DF and UAF).

In our previous work, SafeOSL ensures that a C program satisfies R1. For such C programs, SafeMD, proposed in this paper, borrows the idea of R2 to fix memory leaks.

Listing 3. Transferring ownership via return values.

4. Approach Overview

We illustrate the algorithm of SafeMD using the simple example in Listing 4. This code satisfies exclusive ownership. For example, t = foo1(p) at line 24 moves the ownership of $o_1$ into foo1, and p is no longer used after this point. Before the analysis, we remove all free statements from the program. SafeMD then generates patches that safely deallocate all allocated objects without introducing UAF and DF, as shown in Listing 5. In addition, the C programs fixed by SafeMD still satisfy exclusive ownership.

SafeMD consists of two steps: (1) collect patch candidates for each object by tracking its ownership, the main idea being to free the object where its owner last uses it; (2) find a correct patch among the candidates by solving an exact cover problem over the allocated objects. SafeMD analyzes each function individually, and each analysis performs both steps. Figure 2 and Figure 3 show the analyses of the main and foo1 functions, respectively; below, we explain only the analysis of the main function.

Listing 4. Code with memory leaks.

Listing 5. SafeMD-generated patches.

Figure 2. SafeMD for main function.

Figure 3. SafeMD for foo1 function.

Step 1: Collecting Patch Candidates by Ownership Tracking. This analysis step is based on a control flow graph (CFG). Figure 2 presents the CFG of the main function and the analysis results at each node. The analysis maintains owner and patch information for each allocated object as a state of the following form:

\langle o, \mathit{newOwner}, \mathit{oldOwners}, \mathit{patch}, \mathit{patchNot} \rangle

where o is a heap object represented by its allocation site, newOwner is the pointer that is the unique owner of o, oldOwners is the set of pointers that are old owners of o, patch is a set of patches that can safely deallocate the object, and patchNot is a set of unsafe patches that may introduce UAF or DF. Both patch and patchNot consist of pairs $(n, e)$, meaning that the object can be deallocated by inserting the statement free(e) right after line n, where n is a program point and e is a pointer expression.

For the main function, the analysis of SafeMD starts with the function signature. At line 20, because main has no parameters, the initial state set is empty. The allocation statement at line 22 creates a new tuple $\{\langle o_1, p, \emptyset, \{(22, p)\}, \emptyset \rangle\}$: the allocation site is $o_1$, its owner is p and the safe patch is $(22, p)$, which indicates that $o_1$ can be safely freed via its owner p after line 22. At this point, oldOwners and patchNot are empty.

We first consider the false branch at line 29. The analysis updates the states as follows:

\left\{ \begin{matrix} \langle o_{1}, p, \emptyset, \{(22, p), (29, p)\}, \emptyset \rangle \\ \langle o_{2}, z, \emptyset, \{(29, z)\}, \emptyset \rangle \end{matrix} \right\} \qquad (1)

A new tuple is created for the object $o_2$ allocated at line 29. For the state of $o_1$, a new safe patch $(29, p)$ is added to patch.

At line 31, the function call updates the states as follows:

\left\{ \begin{matrix} \tau_{1} = \langle o_{1}, p, \emptyset, \{(22, p), (29, p), (31, p)\}, \emptyset \rangle \\ \tau_{2} = \langle o_{2}, \bot, \{z\}, \emptyset, \{(29, z), (31, z)\} \rangle \end{matrix} \right\} \qquad (2)

Because the object $o_2$ is passed as the argument z in foo2(z), to avoid UAF, we remove the safe patch $(29, z)$ from patch in state $\tau_2$ and add it to patchNot. The call foo2(z) moves the ownership of $o_2$ into foo2 via parameter passing, so we make three changes: (1) we set newOwner to $\bot$ to indicate that the ownership of $o_2$ has been moved, so z becomes an old owner; (2) we reset patch to $\emptyset$ to denote that the main function has no responsibility to free $o_2$, since it has lost ownership of $o_2$; (3) because an old owner cannot free $o_2$, we generate a new unsafe patch $(31, z)$, where z is now an old owner of $o_2$.

Next, we consider the true branch at line 24, where the function call updates the state $\{\langle o_1, p, \emptyset, \{(22, p)\}, \emptyset \rangle\}$ as follows:

\left\{ \begin{matrix} \langle o_{1}, \bot, \{p\}, \emptyset, \{(22, p), (24, p)\} \rangle \\ \langle o_{3}, t, \emptyset, \{(24, t)\}, \emptyset \rangle \end{matrix} \right\} \qquad (3)

The ownership of $o_1$ is moved into foo1, so the state of $o_1$ is updated in the same way as the state of $o_2$ in (2). Note that foo1 returns a pointer to a valid object, so we create a new state: the allocation site is $o_3$, its owner is the receiver t and the safe patch is $(24, t)$.

At line 25, the states are updated as follows:

\left\{ \begin{matrix} \tau_{3} = \langle o_{1}, \bot, \{p\}, \emptyset, \{(22, p), (24, p), (25, p)\} \rangle \\ \tau_{4} = \langle o_{3}, t, \emptyset, \{(25, t)\}, \{(24, t)\} \rangle \end{matrix} \right\} \qquad (4)

In $\tau_4$, because $o_3$ is used via *t at line 25, we remove the safe patch $(24, t)$ from patch and declare it unsafe to avoid UAF; the only safe patch for $o_3$ is now $(25, t)$. In $\tau_3$, a new unsafe patch $(25, p)$ is generated, because p is an old owner of $o_1$ and thus cannot free $o_1$ after line 25.
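The three kinds of state updates used in this walkthrough can be sketched in Rust as follows. This is a simplification read off the example rather than SafeMD’s implementation: `untouched`, `used_by_owner` and `moved_into_callee` are hypothetical names, $\bot$ is encoded as `None`, and line numbers stand for program points. The general transfer functions are defined in Section 5.2.2.

```rust
use std::collections::BTreeSet;

/// A patch (n, e): insert `free(e)` right after line n.
type Patch = (u32, &'static str);

/// Abstract state <o, newOwner, oldOwners, patch, patchNot>; `None` encodes ⊥.
#[derive(Debug, PartialEq, Eq)]
struct State {
    site: &'static str,
    new_owner: Option<&'static str>,
    old_owners: BTreeSet<&'static str>,
    patch: BTreeSet<Patch>,
    patch_not: BTreeSet<Patch>,
}

impl State {
    /// Fresh state for an object allocated (or returned into `owner`) at line `c`.
    fn new(site: &'static str, owner: &'static str, c: u32) -> Self {
        State {
            site,
            new_owner: Some(owner),
            old_owners: BTreeSet::new(),
            patch: BTreeSet::from([(c, owner)]),
            patch_not: BTreeSet::new(),
        }
    }

    /// Line `c` does not touch the object: the owner may still free it right
    /// after `c`, whereas old owners never may.
    fn untouched(&mut self, c: u32) {
        match self.new_owner {
            Some(owner) => {
                self.patch.insert((c, owner));
            }
            None => {
                for &old in &self.old_owners {
                    self.patch_not.insert((c, old));
                }
            }
        }
    }

    /// The owner uses the object at line `c` (e.g. `*t`): freeing it earlier
    /// would cause UAF, so all earlier safe patches become unsafe.
    fn used_by_owner(&mut self, c: u32) {
        if let Some(owner) = self.new_owner {
            let earlier = std::mem::take(&mut self.patch);
            self.patch_not.extend(earlier);
            self.patch.insert((c, owner));
        }
    }

    /// Line `c` moves ownership into a callee: the owner becomes an old owner,
    /// this function is no longer responsible for the object, and no free of
    /// it here is safe any more.
    fn moved_into_callee(&mut self, c: u32) {
        if let Some(owner) = self.new_owner.take() {
            self.old_owners.insert(owner);
            let earlier = std::mem::take(&mut self.patch);
            self.patch_not.extend(earlier);
            self.patch_not.insert((c, owner));
        }
    }
}
```

Replaying the example, a state for $o_2$ created at line 29 and then passed to `moved_into_callee(31)` ends up with owner `None`, an empty `patch` and `patch_not` equal to {(29, z), (31, z)}, which matches $\tau_2$ in (2); the other calls reproduce $\tau_1$, $\tau_3$ and $\tau_4$.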

At the join point (line 33), our analysis keeps the states of each branch separately. With the states in (2) and (4) as input, the analysis produces the following states:

\left\{ \begin{matrix} \tau_{1}' = \langle o_{1}, p, \emptyset, \{(22, p), (29, p), (31, p)\}, \emptyset \rangle \\ \tau_{2}' = \langle o_{2}, \bot, \{z\}, \emptyset, \{(29, z), (31, z)\} \rangle \\ \tau_{3}' = \langle o_{1}, \bot, \{p\}, \emptyset, \{(22, p), (24, p), (25, p)\} \rangle \\ \tau_{4}' = \langle o_{3}, t, \emptyset, \{(25, t)\}, \{(24, t)\} \rangle \end{matrix} \right\} \qquad (5)

The return statement returns the value 0 and does not move the ownership of any object, so the states $\tau_1', \dots, \tau_4'$ are identical to $\tau_1, \dots, \tau_4$. The analysis finishes with the states in (5).

Step 2: Finding Correct Patches by Solving an Exact Cover Problem. After the patch candidates have been collected in step 1, step 2 finds correct patches that safely deallocate all objects (i.e., no memory leaks) without introducing UAF and DF. Finding correct patches can be reduced to solving an exact cover problem.

First, because ownership determines which function must free an object, we collect the valid states, i.e., those whose newOwner is not $\bot$. A valid state denotes an object that the current procedure is responsible for freeing. From the owner information of the states in (5), we obtain the valid states

\mathit{ValidStates} = \{\tau_{1}', \tau_{4}'\} .

Then, from the patch information of the states in (5), we collect the safe patches from patch and the unsafe patches from patchNot over all states:

\begin{matrix} \mathit{Safe} & = \{(22, p), (29, p), (31, p), (25, t)\} \\ \mathit{UnSafe} & = \{(29, z), (31, z), (22, p), (24, p), (25, p), (24, t)\} . \end{matrix}

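Thus, the candidate patches are those in Safe but not in UnSafe:

\mathit{CandPatch}_{R} = \mathit{Safe} \setminus \mathit{UnSafe} = \{(29, p), (31, p), (25, t)\}

For instance, $(22, p)$ is excluded: although it is safe in $\tau_1'$ (false branch), it is unsafe in $\tau_3'$, because on the true branch $o_1$ is moved into foo1 at line 24, and freeing it right after line 22 would cause UAF.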

The patches in CandPatch_R cannot cause UAF, because all patches that may cause UAF are collected in UnSafe. However, applying all patches in CandPatch_R may cause DF; for example, using both $(29, p)$ and $(31, p)$ would free $o_1$ twice on the false branch. We therefore have to find a subset of the candidate patches that deallocates all memory objects without introducing DF. This is an exact cover problem over the valid states, represented by the following incidence matrix:

\begin{array}{c|cc} & \tau_{1}' & \tau_{4}' \\ \hline (29, p) & 1 & 0 \\ (31, p) & 1 & 0 \\ (25, t) & 0 & 1 \end{array}

Each row r of the matrix represents a patch in CandPatch_R, and each column $\tau$ represents a valid state in ValidStates. The entry in row r and column $\tau$ is 1 if patch r is included in the patch set of state $\tau$, and 0 otherwise. For example, $\tau_1'$ contains $(29, p)$ in its patch set, so the entry in row $(29, p)$ and column $\tau_1'$ is 1. Solving the exact cover problem represented by this matrix means selecting rows such that each column contains exactly one 1 among the selected rows. In this example, there are two solutions: $\{(29, p), (25, t)\}$ and $\{(31, p), (25, t)\}$. Both patch sets cover every valid state (i.e., no memory leaks), and no state is covered by more than one patch (i.e., no DF). In addition, both patch sets satisfy the exclusive ownership constraints. Since objects should be deallocated as early as possible, we choose $\{(29, p), (25, t)\}$ to free the objects in the main function, as shown in Listing 5.
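Step 2 can be sketched in Rust on the data of this example. The function `exact_covers` is a hypothetical brute-force search over all subsets of CandPatch_R, adequate only for a handful of candidates; a practical implementation would more likely use a dedicated exact-cover algorithm such as Knuth’s Algorithm X. Each valid state is represented by its patch set, i.e., by one column of the incidence matrix.

```rust
use std::collections::BTreeSet;

/// A patch (n, e): insert `free(e)` right after line n.
type Patch = (u32, &'static str);

/// Returns every subset of CandPatch_R = safe \ unsafe_ that covers each
/// valid state exactly once. Each element of `valid` is the `patch` set of
/// one valid state, i.e., one column of the incidence matrix.
fn exact_covers(
    valid: &[BTreeSet<Patch>],
    safe: &BTreeSet<Patch>,
    unsafe_: &BTreeSet<Patch>,
) -> Vec<Vec<Patch>> {
    // Candidate rows: safe patches that are not unsafe in any state.
    let cand: Vec<Patch> = safe.difference(unsafe_).copied().collect();
    assert!(cand.len() < 64, "brute force only supports a few candidates");
    let mut covers = Vec::new();
    // Enumerate all 2^k row selections; bit i of `mask` selects cand[i].
    for mask in 0u64..(1u64 << cand.len()) {
        let chosen: Vec<Patch> = (0..cand.len())
            .filter(|&i| mask & (1u64 << i) != 0)
            .map(|i| cand[i])
            .collect();
        // Exactly one selected row per column: no leak (>= 1) and no DF (<= 1).
        let exact = valid
            .iter()
            .all(|col| chosen.iter().filter(|r| col.contains(*r)).count() == 1);
        if exact {
            covers.push(chosen);
        }
    }
    covers
}
```

On the states in (5), it returns exactly the two solutions {(25, t), (29, p)} and {(25, t), (31, p)}; choosing the one that frees objects earliest is left to the caller.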

SafeMD generates the correct patch $(4, m)$ for foo1. The four patches generated by SafeMD for the code in Listing 4 safely fix the memory leaks across functions. MemFix fails to fix this code safely: it generates free(t) and free(p) at line 26 in the main function, which may introduce DF. Other fixing techniques can fix this code safely, but the patches they generate may violate exclusive ownership.

5. Approach Details

This section presents the algorithm of SafeMD, which adapts the algorithm of MemFix to C programs that satisfy exclusive ownership constraints. Section 5.1 defines a core language, and Section 5.2 and Section 5.3 present the two steps of SafeMD, respectively.

5.1. Language

For simplicity, we formalize SafeMD on top of a simple pointer language. Let P be an input program for SafeMD. P is represented by a CFG $(C, \hookrightarrow, c_e, c_x)$, where C denotes the set of program points, $(\hookrightarrow) \subseteq C \times C$ is the set of flow edges ($c_1 \hookrightarrow c_2$ indicates a possible flow of execution from $c_1$ to $c_2$), and $c_e$ and $c_x$ are the unique entry and exit nodes of P. Each program point $c \in C$ is associated with a command, denoted $cmd(c)$, defined by the following grammar:

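\begin{array}{rcl} \mathit{cmd} & \to & \mathrm{alloc}(p) \mid \mathrm{assign}(p, e) \mid \mathrm{return}(p) \mid \mathrm{funDef}(\mathit{fname}, \mathit{par\_list}, \mathit{par\_t\_list}, \mathit{ret\_t}) \\ & \mid & \mathrm{call}(\mathit{fname}, \mathit{arg\_list}, \mathit{arg\_t\_list}, \mathit{receiver}) \\ p & \to & x \mid {*x} \mid \mathrm{null} \\ e,\ \mathit{par\_list},\ \mathit{arg\_list},\ \mathit{receiver} & \to & p \mid \mathrm{none} \\ \mathit{par\_t\_list},\ \mathit{ret\_t},\ \mathit{arg\_t\_list} & \to & \mathit{type} \\ \mathit{type} & \to & \mathrm{void} \mid \mathrm{int} \mid \dots \mid \mathrm{struct} \mid \dots \mid \mathit{type}\,{*} \mid \mathit{type}\,[\,] \end{array}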

A pointer expression p can be a variable x, a pointer dereference *x or a NULL pointer. The allocation command alloc(p) creates a new memory object pointed to by p. The assignment command assign(p, e) assigns the expression e to p. The command funDef(fname, par_list, par_t_list, ret_t) describes a function signature, where fname, par_list, par_t_list and ret_t denote the function name, parameter list, parameter type list and return type, respectively. The command call(fname, arg_list, arg_t_list, receiver) describes a call to function fname, where arg_list, arg_t_list and receiver denote the argument list, the argument type list and the variable that receives the return value, respectively. “none” denotes an empty component; for example, if arg_list, arg_t_list and receiver are all none, the command is a function call with no arguments and no return value. Deallocation statements free(p) are ignored because we remove them from P before the analysis. In addition, P must satisfy exclusive ownership.
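The core language can be encoded as a Rust data type. The sketch below is our own assumption, not part of SafeMD: `Ptr`, `Cmd`, `strip_frees` and `is_nullary_void_call` are hypothetical names, lists are modelled as vectors (an empty vector plays the role of “none” for argument lists, and `None` for a missing receiver or expression), and a `Free` variant exists only so that the removal of free(p) before the analysis can be shown.

```rust
/// Pointer expressions: p -> x | *x | null.
#[derive(Clone, Debug, PartialEq)]
enum Ptr {
    Var(String),
    Deref(String),
    Null,
}

/// Commands of the core language; `Free` only exists in the raw input.
#[derive(Clone, Debug, PartialEq)]
enum Cmd {
    Alloc(Ptr),
    Assign(Ptr, Option<Ptr>), // `None` stands for "none"
    Return(Ptr),
    FunDef { name: String, params: Vec<Ptr>, param_types: Vec<String>, ret_type: String },
    Call { name: String, args: Vec<Ptr>, arg_types: Vec<String>, receiver: Option<Ptr> },
    Free(Ptr),
}

/// Removes every free(p) command, as is done to P before the analysis.
fn strip_frees(program: Vec<Cmd>) -> Vec<Cmd> {
    program.into_iter().filter(|cmd| !matches!(cmd, Cmd::Free(_))).collect()
}

/// True for a call whose argument list, argument type list and receiver are all "none".
fn is_nullary_void_call(cmd: &Cmd) -> bool {
    matches!(
        cmd,
        Cmd::Call { args, arg_types, receiver: None, .. } if args.is_empty() && arg_types.is_empty()
    )
}
```

For example, stripping a four-command program that contains one `Free` leaves three commands, and a call with no arguments and no receiver is recognized by `is_nullary_void_call`.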

5.2. Step 1: Collecting Patch Candidates by Ownership Tracking

The first step of SafeMD is to statically analyze the procedure to collect patch candidates. The idea of this step is to track ownership of the object and free the object where the owner last used it.

5.2.1. Abstract Domain

The abstract domain of the analysis is defined as follows:

\begin{array}{rcl} A \in D & = & C \to \mathcal{P}(\mathit{State}) \\ s \in \mathit{State} & = & \mathit{AllocSite} \times \mathit{NewOwner} \times \mathit{OldOwner} \times \mathit{Patch} \times \mathit{PatchNot} \\ o \in \mathit{AllocSite} & \subseteq & C \\ \mathit{newOwner} \in \mathit{NewOwner} & = & \mathit{AP} \\ \mathit{oldOwners} \in \mathit{OldOwner} & = & \mathcal{P}(\mathit{AP}) \\ \mathit{patch} \in \mathit{Patch} & = & \mathcal{P}(C \times \mathit{AP}) \\ \mathit{patchNot} \in \mathit{PatchNot} & = & \mathcal{P}(C \times \mathit{AP}) \\ p \in \mathit{AP} & = & \{x, {*x}, \mathrm{null} \mid x \in \mathit{Var}\} \cup \{\bot\} \end{array}

Var is the finite set of program variables in P. $\mathit{AllocSite} \subseteq C$ is the finite set of allocation sites in P, i.e., the nodes whose associated commands are alloc(p). A domain element $A \in D$ is a finite map from each program point to a set of reachable states. A state $s = \langle o, \mathit{newOwner}, \mathit{oldOwners}, \mathit{patch}, \mathit{patchNot} \rangle$ describes an abstract object with its owner and patch information, where $o \in \mathit{AllocSite}$ is the allocation site of the object, $\mathit{newOwner} \in \mathit{AP}$ is an access path that points to the object via its unique owner, $\mathit{oldOwners} \subseteq \mathit{AP}$ is a set of access paths that point to the object via old owners, $\mathit{patch} \subseteq C \times \mathit{AP}$ is a set of patches that can safely deallocate the object, and $\mathit{patchNot} \subseteq C \times \mathit{AP}$ is a set of unsafe patches.

AP denotes the set of access paths that can be generated for the given program P. For the language in Section 5.1, AP equals the set of pointer expressions p together with $\bot$, i.e., an access path can be a variable x, a pointer dereference *x, a NULL pointer or the invalid access path $\bot$. Each element of patch and patchNot is a pair $(c, p) \in C \times \mathit{AP}$ consisting of a program point c and an access path p. A patch $(c, p)$ represents a free(p) statement that can be inserted right after the program point c.

5.2.2. Abstract Semantics

SafeMD collects patch candidates for each function individually. For each function, the analysis starts from the program point c where the function is defined, i.e., the node whose command is

fun Def (f n a m e, p a r_l i s t, p a r_t_l i s t, r e t_t)

, and computes an initial set

S_{0}

that consists of initial states.

S_{0}

is defined as follows:

S_{0} = \{\begin{matrix} ⋃_{p a r \in p a r_l i s t} γ_{c} (p a r) & if p a r_l i s t \neq Ø \\ Ø & otherwise . \end{matrix}

γ_{c} (p a r) = \{\begin{matrix} 〈 c, p a r, Ø, {(c, p a r)}, Ø 〉 & if i s P o i n t e r (p a r) = t r u e \\ Ø & otherwise . \end{matrix}

If the function $\mathit{fname}$ has no parameters, $S_0$ is empty; otherwise, it is computed by $\gamma_c : \mathit{Var} \to \mathcal{P}(\mathit{State})$, which generates an initial state for a parameter $par$ of $\mathit{fname}$. The initial states for all parameters form $S_0$. When computing $\gamma_c$, if a parameter $par$ points to a heap object (i.e., $\mathit{isPointer}(par)$ is true), a new state is created, since the ownership of objects can be transferred by parameter passing: the allocation site of the object is the program point $c$, the owner of the object is $par$, and the safe patch is $(c, par)$. For example, in Figure 3, $S_0$ for the definition of foo1 at line 1 contains only one state, i.e., $\langle o_1, m, \emptyset, \{(1, m)\}, \emptyset \rangle$.
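The definition of $S_0$ can be sketched in C as below. This is an illustrative sketch under our own assumptions, not SafeMD's code: parameters are given as integer access paths with a separate is-pointer flag, sets are bitmasks, and the hypothetical `initial_states` writes the states of $S_0$ into a caller-provided array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVARS 4
typedef uint64_t PatchSet;          /* one bit per patch (c, p)      */
typedef struct {
    int obj, new_owner;             /* allocation site o, newOwner   */
    uint32_t old_owners;            /* bitmask over access paths     */
    PatchSet patch, patch_not;
} State;

static PatchSet patch_bit(int c, int p) {
    assert(c >= 0 && c < 16 && p >= 0 && p < NVARS);
    return (PatchSet)1 << (c * NVARS + p);
}

/* gamma_c(par): {<c, par, ∅, {(c, par)}, ∅>} for a pointer parameter,
   ∅ otherwise. Returns how many states were written to out (0 or 1). */
static size_t gamma_c(int c, int par, bool is_pointer, State *out) {
    if (!is_pointer) return 0;
    State s = { c, par, 0, patch_bit(c, par), 0 };
    *out = s;
    return 1;
}

/* S_0: the union of gamma_c(par) over par_list, empty when there are no
   parameters. Returns |S_0|; s0 must have room for npar states. */
static size_t initial_states(int c, size_t npar, const int pars[],
                             const bool is_ptr[], State s0[]) {
    size_t n = 0;
    for (size_t i = 0; i < npar; i++)
        n += gamma_c(c, pars[i], is_ptr[i], &s0[n]);
    return n;
}
```

With m encoded as 0 and a hypothetical second, non-pointer parameter, `initial_states` returns exactly one state, mirroring the single foo1 state above.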

Starting with $S_0$, SafeMD updates the states at each node according to the command associated with that node until it reaches the exit node. In other words, the analysis computes the least fixed point $\mathrm{lfp}\,F \in D$ of the semantic function $F \in D \to D$:

$$F(X) = \lambda c.\; f_c\Big(\bigsqcup_{c' \hookrightarrow c} X(c')\Big)$$

where $X \in D$ and $f_c : \mathcal{P}(\mathit{State}) \to \mathcal{P}(\mathit{State})$ is the transfer function at a program point $c$:

$$f_c(S) = \begin{cases} S' \cup \{s_{new}\} & \text{if } cmd(c) = \mathrm{alloc}(p) \\ S' & \text{if } cmd(c) = \mathrm{assign}(p, e) \\ S' & \text{if } cmd(c) = \mathrm{return}(p) \\ S' \cup \{s_{new}\} & \text{if } cmd(c) = \mathrm{call}(\_, \_, \_, p) \land \mathit{isPointer}(p) = \mathit{true} \\ S' & \text{if } cmd(c) = \mathrm{call}(\_, \_, \_, \mathit{none}) \end{cases}$$

where $s_{new} = \langle c, p, \emptyset, \{(c, p)\}, \emptyset \rangle$.

$f_c$ updates the states in $S$ according to the command. For alloc(p), $f_c$ not only updates the existing states in $S$ to $S'$ but also creates a new state $s_{new}$. Similarly, for the command call(_, _, _, p), where $p$ is a pointer, a new state is created for the object pointed to by $p$.
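The least fixed point $\mathrm{lfp}\,F$ can be computed by iterating $F$ from the bottom element until nothing changes. The C sketch below does this by round-robin iteration over a small CFG. It is a generic illustration under our own assumptions (facts form a bitmask lattice joined by bitwise OR, node 0 is the entry, predecessors are given as bitmasks, and `gen_node` is a toy transfer function), not SafeMD's traversal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXN 32
typedef uint32_t Facts;                     /* powerset lattice, join = | */
typedef Facts (*Transfer)(int c, Facts in); /* a monotone f_c             */

/* Toy transfer function for testing: node c adds fact c. */
static Facts gen_node(int c, Facts in) { return in | (1u << c); }

/* Computes X = lfp F with F(X) = λc. f_c(⊔_{c' ↪ c} X(c')); node 0 (the
   entry) additionally receives init. Bit c' of pred[c] is set iff c' ↪ c.
   Starting from bottom, every X[c] only grows, so for monotone f_c on this
   finite lattice the loop terminates. */
static void least_fixpoint(int n, const uint32_t pred[], Transfer f,
                           Facts init, Facts X[]) {
    assert(n > 0 && n <= MAXN);
    for (int c = 0; c < n; c++) X[c] = 0;   /* bottom */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int c = 0; c < n; c++) {
            Facts in = (c == 0) ? init : 0;
            for (int p = 0; p < n; p++)
                if ((pred[c] >> p) & 1u) in |= X[p];   /* join over c' ↪ c */
            Facts out = f(c, in);
            if (out != X[c]) { X[c] = out; changed = true; }
        }
    }
}
```

A worklist that revisits only the successors of changed nodes would be faster; round-robin is used here only to keep the sketch short.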

The set $S'$ is computed by two transfer functions, $\phi_c$ and $\varphi_c$:

$$S' = \{(\varphi_c \circ \phi_c)(s) \mid s \in S\}$$

For a state $s$, we first update the owner information by $\phi_c : \mathit{State} \to \mathit{State}$ and then the patch information by $\varphi_c : \mathit{State} \to \mathit{State}$. Next, we define $\phi_c$ and $\varphi_c$ for the different commands.

Given a state $s$ at the program point $c$,

$$s = \langle o, \mathit{newOwner}, \mathit{oldOwners}, \mathit{patch}, \mathit{patchNot} \rangle$$

(1): When $cmd(c)$ = alloc(p), $\phi_c$ leaves $\mathit{newOwner}$ and $\mathit{oldOwners}$ unchanged:

$$\phi_c(s) = \langle o, \mathit{newOwner}', \mathit{oldOwners}', \mathit{patch}, \mathit{patchNot} \rangle, \qquad \mathit{newOwner}' = \mathit{newOwner}, \qquad \mathit{oldOwners}' = \mathit{oldOwners}$$

Then, $\varphi_c$ updates $\mathit{patch}$ and $\mathit{patchNot}$ as follows:

$$\begin{aligned} \varphi_c(s) &= \langle o, \mathit{newOwner}, \mathit{oldOwners}, \mathit{patch}', \mathit{patchNot}' \rangle \\ \mathit{patchNot}' &= \mathit{patchNot} \cup GO, \qquad \mathit{patch}' = (\mathit{patch} \cup GN) \setminus \mathit{patchNot}' \end{aligned}$$

where

$$GN = \{(c, q) \mid q = \mathit{newOwner}\}, \qquad GO = \{(c, q) \mid q \in \mathit{oldOwners}\}$$

$GN$ contains the safe patch newly generated at $c$ via the unique owner. $GO$ is the set of unsafe patches newly generated at $c$ via old owners, because an object cannot be deallocated via its old owners. We also exclude $\mathit{patchNot}'$ from $\mathit{patch}'$ to ensure that $\mathit{patch}'$ and $\mathit{patchNot}'$ are disjoint.
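The alloc case of $\varphi_c$ can be sketched with bitmask sets as follows. The encoding (access paths as indices 0..3, program points 0..15, one bit per patch) is our own assumption. We also treat $GN$ as empty when $\mathit{newOwner} = \bot$, since free(⊥) is not a meaningful statement; that choice is our reading and is not stated in the text. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NVARS 4
#define BOTTOM (-1)
typedef uint64_t PatchSet;
typedef struct {
    int obj, new_owner;     /* allocation site o, newOwner (or BOTTOM) */
    uint32_t old_owners;    /* oldOwners as a bitmask over access paths */
    PatchSet patch, patch_not;
} State;

static PatchSet patch_bit(int c, int p) {
    assert(c >= 0 && c < 16 && p >= 0 && p < NVARS);
    return (PatchSet)1 << (c * NVARS + p);
}

/* GN = {(c, newOwner)}; assumed empty when newOwner = ⊥. */
static PatchSet gen_new(int c, const State *s) {
    return s->new_owner == BOTTOM ? 0 : patch_bit(c, s->new_owner);
}

/* GO = {(c, q) | q ∈ oldOwners}. */
static PatchSet gen_old(int c, const State *s) {
    PatchSet go = 0;
    for (int q = 0; q < NVARS; q++)
        if ((s->old_owners >> q) & 1u) go |= patch_bit(c, q);
    return go;
}

/* φ_c for alloc(p) (ϕ_c is the identity in this case):
   patchNot' = patchNot ∪ GO,  patch' = (patch ∪ GN) \ patchNot'. */
static State phi_alloc(int c, State s) {
    s.patch_not |= gen_old(c, &s);
    s.patch = (s.patch | gen_new(c, &s)) & ~s.patch_not;
    return s;
}
```

For an existing object owned by q (encoded as 1) with old owner m (encoded as 0), an allocation at point 9 adds (9, q) to the safe patches and (9, m) to the unsafe ones.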

(2): When $cmd(c)$ = assign(p, e), $\phi_c$ updates $\mathit{newOwner}$ and $\mathit{oldOwners}$ as follows:

$$\begin{aligned} \phi_c(s) &= \langle o, \mathit{newOwner}', \mathit{oldOwners}', \mathit{patch}, \mathit{patchNot} \rangle \\ \mathit{newOwner}' &= \begin{cases} p & \text{if } e = \mathit{newOwner} \\ \mathit{newOwner} & \text{otherwise} \end{cases} \\ \mathit{oldOwners}' &= \begin{cases} (\mathit{oldOwners} \cup \{e\}) \setminus \{\mathit{newOwner}'\} & \text{if } e = \mathit{newOwner} \\ \mathit{oldOwners} & \text{otherwise} \end{cases} \end{aligned}$$

An assignment p = e can transfer the ownership of object $o$ from $e$ to $p$, making $p$ the new owner and $e$ an old owner. We also exclude $\mathit{newOwner}'$ from $\mathit{oldOwners}'$ to ensure that $\mathit{newOwner}'$ and $\mathit{oldOwners}'$ are disjoint.
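The assign case of $\phi_c$ is sketched below under the same kind of assumed encoding (access paths as small integers, $\mathit{oldOwners}$ as a bitmask); `owner_assign` is a hypothetical name, and the patch components are carried along untouched because $\phi_c$ does not change them.

```c
#include <assert.h>
#include <stdint.h>

#define NVARS 4
typedef struct {
    int obj, new_owner;         /* allocation site o and newOwner      */
    uint32_t old_owners;        /* oldOwners as a bitmask              */
    uint64_t patch, patch_not;  /* not modified by ϕ_c                 */
} State;

/* ϕ_c for assign(p, e): if e is the current owner, ownership moves to p
   and e joins the old owners; newOwner' is removed from oldOwners' so the
   two stay disjoint. Otherwise the owner information is unchanged. */
static State owner_assign(State s, int p, int e) {
    assert(p >= 0 && p < NVARS && e >= 0 && e < NVARS);
    if (e == s.new_owner) {
        s.new_owner = p;
        s.old_owners |= 1u << e;
        s.old_owners &= ~(1u << p);
    }
    return s;
}
```

With m encoded as 0 and q as 1, `owner_assign(s, 1, 0)` models q = m; a later m = q moves ownership back, and m is then removed from the old owners.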

Then, $\varphi_c$ updates $\mathit{patch}$ and $\mathit{patchNot}$ as follows:

$$\begin{aligned} \varphi_c(s) &= \langle o, \mathit{newOwner}, \mathit{oldOwners}, \mathit{patch}', \mathit{patchNot}' \rangle \\ \mathit{patchNot}' &= \begin{cases} \mathit{patchNot} \cup \mathit{patch} \cup GO & \text{if } o \text{ is used at } c \\ \mathit{patchNot} \cup GO & \text{otherwise} \end{cases} \\ \mathit{patch}' &= \begin{cases} GN \setminus \mathit{patchNot}' & \text{if } o \text{ is used at } c \\ (\mathit{patch} \cup GN) \setminus \mathit{patchNot}' & \text{otherwise} \end{cases} \end{aligned}$$

The condition “$o$ is used at $c$” covers two cases. In the first case, p = e uses $o$ through the expression $e$ and transfers the ownership of $o$ from $e$ to $p$. The only safe patch is then $(c, p)$, generated by $GN$, while $\mathit{patchNot}'$ gains the unsafe patches related to old owners, namely those in $\mathit{patch}$ and $GO$. The pointer assignment q = m at line 7 in Figure 3 is such an example.

In the second case, p = e uses $o$ through the expression $e$ but does not transfer the ownership of $o$. To prevent UAF, the existing safe patches are removed from $\mathit{patch}$ and added to $\mathit{patchNot}$, and the only safe patch is the one generated by $GN$; $GO$ is also included in $\mathit{patchNot}'$. For example, the assignment ... = *q in Figure 3 uses the object pointed to by q via the dereference expression $*q$, but it does not transfer the ownership of the object.

In the otherwise case, where $o$ is not used at $c$, we simply merge the new safe patches generated by $GN$ into $\mathit{patch}$ and the unsafe patches generated by $GO$ into $\mathit{patchNot}$.
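The patch update for assignments can be sketched as follows, again assuming our own bitmask encoding and treating $GN$ as empty when $\mathit{newOwner} = \bot$. Whether $o$ is used at $c$ is passed in as a boolean, an assumption of the sketch; the analysis itself derives it from the command. The program points in the test are illustrative and do not reproduce Figure 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVARS 4
#define BOTTOM (-1)
typedef uint64_t PatchSet;
typedef struct {
    int obj, new_owner;     /* allocation site o, newOwner (or BOTTOM) */
    uint32_t old_owners;    /* oldOwners as a bitmask                   */
    PatchSet patch, patch_not;
} State;

static PatchSet patch_bit(int c, int p) {
    assert(c >= 0 && c < 16 && p >= 0 && p < NVARS);
    return (PatchSet)1 << (c * NVARS + p);
}
static PatchSet gen_new(int c, const State *s) {   /* GN */
    return s->new_owner == BOTTOM ? 0 : patch_bit(c, s->new_owner);
}
static PatchSet gen_old(int c, const State *s) {   /* GO */
    PatchSet go = 0;
    for (int q = 0; q < NVARS; q++)
        if ((s->old_owners >> q) & 1u) go |= patch_bit(c, q);
    return go;
}

/* φ_c for assign(p, e), applied to the state produced by ϕ_c. */
static State patch_assign(int c, State s, bool used) {
    PatchSet gn = gen_new(c, &s), go = gen_old(c, &s);
    if (used) {
        /* earlier safe patches would free o before this use: mark unsafe */
        s.patch_not |= s.patch | go;
        s.patch = gn & ~s.patch_not;
    } else {
        s.patch_not |= go;
        s.patch = (s.patch | gn) & ~s.patch_not;
    }
    return s;
}
```

In the test, the first call models an ownership-transferring q = m (m as 0, q as 1) and the second an unrelated assignment that does not touch the object.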

(3): When $cmd(c)$ = return(p), $\phi_c$ updates $\mathit{newOwner}$ and $\mathit{oldOwners}$ as follows:

$$\begin{aligned} \phi_c(s) &= \langle o, \mathit{newOwner}', \mathit{oldOwners}', \mathit{patch}, \mathit{patchNot} \rangle \\ \mathit{oldOwners}' &= \begin{cases} (\mathit{oldOwners} \cup \{p\}) \setminus \{\mathit{newOwner}'\} & \text{if } p = \mathit{newOwner} \\ \mathit{oldOwners} & \text{otherwise} \end{cases} \\ \mathit{newOwner}' &= \begin{cases} \bot & \text{if } p = \mathit{newOwner} \\ \mathit{newOwner} & \text{otherwise} \end{cases} \end{aligned}$$

The ownership of objects can also be transferred by return values. If the return value $p$ equals the owner of object $o$ in state $s$ (i.e., $p = \mathit{newOwner}$), then the ownership of $o$ is moved out of the callee and $p$ becomes an old owner. We use the symbol $\bot$ to indicate that the ownership of $o$ has been moved.

Then, $\varphi_c$ updates $\mathit{patch}$ and $\mathit{patchNot}$ as follows:

$$\begin{aligned} \varphi_c(s) &= \langle o, \mathit{newOwner}, \mathit{oldOwners}, \mathit{patch}', \mathit{patchNot}' \rangle \\ \mathit{patchNot}' &= \begin{cases} \mathit{patchNot} \cup \mathit{patch} & \text{if } \mathit{newOwner} = \bot \\ \mathit{patchNot} & \text{otherwise} \end{cases} \\ \mathit{patch}' &= \begin{cases} \emptyset & \text{if } \mathit{newOwner} = \bot \\ \mathit{patch} & \text{otherwise} \end{cases} \end{aligned}$$

Once $\phi_c$ has updated the $\mathit{newOwner}$ of object $o$ to $\bot$, the safe patches in $\mathit{patch}$ become unsafe because $o$ is returned to the caller. We reset $\mathit{patch}$ to $\emptyset$ to indicate that the callee cannot free $o$, since it has lost the ownership of $o$; the responsibility for deallocating $o$ falls on the caller. Consider foo1 in Figure 3: SafeMD merges the states computed from the if-else branches as follows:

$$\left\{ \begin{aligned} \tau_1 &= \langle o_1, m, \emptyset, \{(1, m), (4, m), (10, m)\}, \emptyset \rangle \\ \tau_2 &= \langle o_2, q, \emptyset, \{(10, q)\}, \{(4, q)\} \rangle \\ \tau_3 &= \langle o_1, q, \{m\}, \{(10, q)\}, \{(1, m), (8, m), (8, q), (10, m)\} \rangle \end{aligned} \right\}$$

For the objects in states $\tau_2$ and $\tau_3$, ownership is moved out of foo1 by the return statement “return q”, so their $\mathit{newOwner}$ becomes $\bot$ and q becomes an old owner. Their $\mathit{patch}$ is also reset to $\emptyset$, and the former safe patch $(10, q)$ moves to $\mathit{patchNot}$. State $\tau_1$ remains unchanged. The updated states are shown below.

$$\left\{ \begin{aligned} \tau_1 &= \langle o_1, m, \emptyset, \{(1, m), (4, m), (10, m)\}, \emptyset \rangle \\ \tau_2' &= \langle o_2, \bot, \{q\}, \emptyset, \{(4, q), (10, q)\} \rangle \\ \tau_3' &= \langle o_1, \bot, \{m, q\}, \emptyset, \{(8, q), (1, m), (8, m), (10, m), (10, q)\} \rangle \end{aligned} \right\}$$
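The return case can be replayed on the foo1 example. The C sketch below combines $\phi_c$ and $\varphi_c$ for return(p) under our own bitmask encoding (m encoded as 0, q as 1, objects named by small integers; `transfer_return` is a hypothetical name). Applying it with “return q” to $\tau_1$, $\tau_2$, and $\tau_3$ should reproduce $\tau_1$, $\tau_2'$, and $\tau_3'$, which the test checks.

```c
#include <assert.h>
#include <stdint.h>

#define NVARS 4
#define BOTTOM (-1)
typedef uint64_t PatchSet;
typedef struct {
    int obj, new_owner;     /* object o, newOwner (or BOTTOM) */
    uint32_t old_owners;    /* oldOwners as a bitmask         */
    PatchSet patch, patch_not;
} State;

static PatchSet patch_bit(int c, int p) {
    assert(c >= 0 && c < 16 && p >= 0 && p < NVARS);
    return (PatchSet)1 << (c * NVARS + p);
}

/* (φ_c ∘ ϕ_c) for return(p). */
static State transfer_return(State s, int p) {
    assert(p >= 0 && p < NVARS);
    if (p == s.new_owner) {       /* ϕ_c: ownership leaves the callee     */
        s.old_owners |= 1u << p;  /* (oldOwners ∪ {p}) \ {⊥}              */
        s.new_owner = BOTTOM;
    }
    if (s.new_owner == BOTTOM) {  /* φ_c: the callee may no longer free o */
        s.patch_not |= s.patch;
        s.patch = 0;
    }
    return s;
}
```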

(4): When $cmd(c)$ = call($\mathit{fname}$, $\mathit{arg\_list}$, $\mathit{arg\_t\_list}$, $\mathit{receiver}$), $\phi_c$ updates $\mathit{newOwner}$ and $\mathit{oldOwners}$ as follows:

$$\begin{aligned} \phi_c(s) &= \langle o, \mathit{newOwner}', \mathit{oldOwners}', \mathit{patch}, \mathit{patchNot} \rangle \\ \mathit{oldOwners}' &= \begin{cases} (\mathit{oldOwners} \cup \{\mathit{newOwner}\}) \setminus \{\mathit{newOwner}'\} & \text{if } \mathit{newOwner} \in \mathit{arg\_list} \\ \mathit{oldOwners} & \text{otherwise} \end{cases} \\ \mathit{newOwner}' &= \begin{cases} \bot & \text{if } \mathit{newOwner} \in \mathit{arg\_list} \\ \mathit{newOwner} & \text{otherwise} \end{cases} \end{aligned}$$

The ownership of objects can be transferred to the callee by parameter passing. If the $\mathit{newOwner}$ of $o$ is passed to $\mathit{fname}$ as an argument, i.e., $\mathit{newOwner} \in \mathit{arg\_list}$, then $\mathit{newOwner}$ becomes an old owner, and we set $\mathit{newOwner}$ to $\bot$ to indicate that the ownership of $o$ has been moved.

Then, $\varphi_c$ updates $\mathit{patch}$ and $\mathit{patchNot}$ as follows:

$$\begin{aligned} \varphi_c(s) &= \langle o, \mathit{newOwner}, \mathit{oldOwners}, \mathit{patch}', \mathit{patchNot}' \rangle \\ \mathit{patchNot}' &= \begin{cases} \mathit{patchNot} \cup \mathit{patch} \cup GO & \text{if } \mathit{newOwner} = \bot \\ \mathit{patchNot} \cup \mathit{patch} \cup GO & \text{if } \mathit{newOwner} \neq \bot \text{ but } o \text{ is used in } \mathit{arg\_list} \\ \mathit{patchNot} \cup GO & \text{otherwise} \end{cases} \\ \mathit{patch}' &= \begin{cases} \emptyset & \text{if } \mathit{newOwner} = \bot \\ GN \setminus \mathit{patchNot}' & \text{if } \mathit{newOwner} \neq \bot \text{ but } o \text{ is used in } \mathit{arg\_list} \\ (\mathit{patch} \cup GN) \setminus \mathit{patchNot}' & \text{otherwise} \end{cases} \end{aligned}$$

We discuss three cases. The first case is $\mathit{newOwner} = \bot$, which indicates that the ownership of the object has been moved. Here, $\mathit{patchNot}'$ gains the unsafe patches related to old owners, namely those in $\mathit{patch}$ and $GO$, and $\mathit{patch}$ is reset to $\emptyset$ to denote that the caller is not responsible for deallocating the object, since its ownership has moved to the callee.

The second case is “$\mathit{newOwner} \neq \bot$ but $o$ is used in $\mathit{arg\_list}$”, which indicates that the ownership of object $o$ is not transferred but $o$ is used (e.g., by a pointer dereference) in the arguments. Because $o$ is used, the safe patches in $\mathit{patch}$ are added to $\mathit{patchNot}$ to avoid UAF, and the only safe patch is the one generated by $GN$. For example, the function call foo(*p), where p is a pointer to a valid object, belongs to this case.

In the otherwise case, where the ownership of object $o$ is not transferred and $o$ is not used, we merge the new safe patches generated by $GN$ into $\mathit{patch}$ and the unsafe patches generated by $GO$ into $\mathit{patchNot}$.
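The call case, with $\phi_c$ and $\varphi_c$ combined, can be sketched as below. As before, this is an illustration under assumptions of our own: $\mathit{arg\_list}$ is a bitmask of access paths, whether $o$ is used in the arguments is a boolean input, $GN$ is taken as empty once $\mathit{newOwner} = \bot$, and `transfer_call` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVARS 4
#define BOTTOM (-1)
typedef uint64_t PatchSet;
typedef struct {
    int obj, new_owner;     /* object o, newOwner (or BOTTOM) */
    uint32_t old_owners;    /* oldOwners as a bitmask         */
    PatchSet patch, patch_not;
} State;

static PatchSet patch_bit(int c, int p) {
    assert(c >= 0 && c < 16 && p >= 0 && p < NVARS);
    return (PatchSet)1 << (c * NVARS + p);
}
static PatchSet gen_new(int c, const State *s) {   /* GN */
    return s->new_owner == BOTTOM ? 0 : patch_bit(c, s->new_owner);
}
static PatchSet gen_old(int c, const State *s) {   /* GO */
    PatchSet go = 0;
    for (int q = 0; q < NVARS; q++)
        if ((s->old_owners >> q) & 1u) go |= patch_bit(c, q);
    return go;
}

/* (φ_c ∘ ϕ_c) for call(fname, arg_list, arg_t_list, receiver). */
static State transfer_call(int c, State s, uint32_t args, bool used_in_args) {
    /* ϕ_c: passing the owner as an argument moves ownership to the callee */
    if (s.new_owner != BOTTOM && ((args >> s.new_owner) & 1u)) {
        s.old_owners |= 1u << s.new_owner;
        s.new_owner = BOTTOM;
    }
    /* φ_c, computed on the owner information just updated */
    PatchSet gn = gen_new(c, &s), go = gen_old(c, &s);
    if (s.new_owner == BOTTOM) {        /* ownership moved away       */
        s.patch_not |= s.patch | go;
        s.patch = 0;
    } else if (used_in_args) {          /* o used but not moved       */
        s.patch_not |= s.patch | go;
        s.patch = gn & ~s.patch_not;
    } else {                            /* o neither moved nor used   */
        s.patch_not |= go;
        s.patch = (s.patch | gn) & ~s.patch_not;
    }
    return s;
}
```

With p encoded as 0, the test exercises foo(p), foo(*p), and a call whose arguments do not involve the object.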

5.3. Step 2: Finding Correct Patches by Solving Exact Cover Problem

Once the owner and patch information for each object has been collected in step 1, the second step of SafeMD finds the correct patches that safely deallocate all allocated objects (no ML) without introducing UAF or DF. MemFix finds correct patches by solving an exact cover problem; we adopt this method but modify it to take ownership into account.

Let $R = (\mathrm{lfp}\,F)(c_x) \subseteq \mathit{State}$ be the set of reachable states computed by step 1 at the exit node of the program. We first define the candidate correct patches and the valid states of $R$.

Definition 1

(Candidate Correct Patches). The set of candidate correct patches collects the possible safe patches for each object and is defined as follows:

$$\begin{aligned} \mathit{Safe} &= \bigcup \{\mathit{patch} \mid \langle \_, \_, \_, \mathit{patch}, \_ \rangle \in R\} \\ \mathit{UnSafe} &= \bigcup \{\mathit{patchNot} \mid \langle \_, \_, \_, \_, \mathit{patchNot} \rangle \in R\} \\ \mathit{CandPatch}_R &= \mathit{Safe} \setminus \mathit{UnSafe} \end{aligned}$$

$\mathit{Safe}$ collects the patches that can safely deallocate an object, and $\mathit{UnSafe}$ collects the unsafe patches that can cause UAF and DF (caused by pointer aliasing). We exclude $\mathit{UnSafe}$ from $\mathit{Safe}$ to obtain the set $\mathit{CandPatch}_R$, which is used to find the correct patches.

The patches in $\mathit{CandPatch}_R$ cannot cause UAF, since all such unsafe patches are collected in $\mathit{UnSafe}$ and thus excluded from $\mathit{CandPatch}_R$. In addition, the patches in $\mathit{CandPatch}_R$ satisfy the exclusive ownership constraints; that is, old owners are never used to deallocate objects, and C programs that satisfy exclusive ownership are generally safer than unconstrained C programs. Thanks to ownership, many unsafe patches caused by pointer aliasing are excluded from $\mathit{CandPatch}_R$, because SafeMD collects the unsafe patches related to old owners (aliases) in $\mathit{UnSafe}$. This noticeably reduces the search space for finding the correct patches, as we avoid verifying patch combinations that involve pointer aliasing. However, the patches in $\mathit{CandPatch}_R$ may still cause DF: $\mathit{CandPatch}_R$ cannot exclude a DF caused by freeing the same memory multiple times through the same pointer. For example, if an object can be safely deallocated at line 4 and at line 5 (i.e., right after program points 3 and 4), then both $(3, p)$ and $(4, p)$ may be in $\mathit{CandPatch}_R$, and using both of them incurs a DF. Therefore, this step aims to find correct patches that do not cause this kind of DF.
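Definition 1 amounts to two unions and a set difference, which is a few lines of C in the bitmask encoding we assume for illustration (`cand_patches` is a hypothetical name). As a hypothetical input, the test treats the three updated foo1 states $\tau_1$, $\tau_2'$, $\tau_3'$ as if they were $R$. Only $(4, m)$ survives: $(1, m)$ and $(10, m)$ are safe in $\tau_1$ but excluded because $\tau_3'$, where m is an old owner, marks them unsafe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVARS 4
#define BOTTOM (-1)
typedef uint64_t PatchSet;
typedef struct {
    int obj, new_owner;     /* object o, newOwner (or BOTTOM) */
    uint32_t old_owners;    /* oldOwners as a bitmask         */
    PatchSet patch, patch_not;
} State;

static PatchSet patch_bit(int c, int p) {
    assert(c >= 0 && c < 16 && p >= 0 && p < NVARS);
    return (PatchSet)1 << (c * NVARS + p);
}

/* CandPatch_R = Safe \ UnSafe, where Safe and UnSafe are the unions of the
   patch and patchNot components over all states in R. */
static PatchSet cand_patches(const State R[], size_t n) {
    PatchSet safe = 0, unsafe = 0;
    for (size_t i = 0; i < n; i++) {
        safe |= R[i].patch;
        unsafe |= R[i].patch_not;
    }
    return safe & ~unsafe;
}
```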

Definition 2

(Valid States). The valid states are those in which the ownership of the object has not been moved, so the object should be deallocated in the current function:

$$\mathit{ValidStates} = \{\langle \_, \mathit{newOwner}, \_, \_, \_ \rangle \in R \mid \mathit{newOwner} \neq \bot\}$$

Next, we define the problem of finding the correct patches, which can be reduced to an exact cover problem over the valid states.

Definition 3

(The Problem of Finding Correct Patches). Let $M : \mathit{CandPatch}_R \to \mathcal{P}(\mathit{ValidStates})$ be the function that maps each candidate correct patch to the valid states that it can safely deallocate:

$$M(r) = \{\langle \_, \_, \_, \mathit{patch}, \_ \rangle \in \mathit{ValidStates} \mid r \in \mathit{patch}\}$$

Using $M$, find a subset $\mathit{Correct} \subseteq \mathit{CandPatch}_R$ such that

$\mathit{ValidStates} = \bigcup_{r \in \mathit{Correct}} M(r)$, which means that $\mathit{Correct}$ covers all valid states;
$M(r_1) \cap M(r_2) = \emptyset$ for all distinct $r_1, r_2 \in \mathit{Correct}$, which means that the chosen subsets $M(r)$ (where $r \in \mathit{Correct}$) are pairwise disjoint.

$M$ describes the incidence matrix in Figure 2 and Figure 3. The first condition means that every allocated object is deallocated, which guarantees the absence of memory leaks. The second condition means that every allocated object is deallocated at most once, which guarantees the absence of DF. Recall that UAF is already avoided in $\mathit{CandPatch}_R$.
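The two conditions of Definition 3 can be checked directly. The sketch below uses our own bitmask encoding (at most 32 states in $R$; `valid_states`, `M_of`, and `is_exact_cover` are hypothetical names) and builds $M(r)$ on the fly. It mirrors the DF example given earlier in this section: an object whose state lists both $(3, p)$ and $(4, p)$ as safe patches is covered exactly once by $\{(3, p)\}$ but twice by $\{(3, p), (4, p)\}$.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVARS 4
#define BOTTOM (-1)
typedef uint64_t PatchSet;
typedef struct {
    int obj, new_owner;     /* object o, newOwner (or BOTTOM) */
    uint32_t old_owners;    /* oldOwners as a bitmask         */
    PatchSet patch, patch_not;
} State;

static PatchSet patch_bit(int c, int p) {
    assert(c >= 0 && c < 16 && p >= 0 && p < NVARS);
    return (PatchSet)1 << (c * NVARS + p);
}

/* ValidStates as a bitmask over the indices of R: newOwner ≠ ⊥. */
static uint32_t valid_states(const State R[], size_t n) {
    assert(n <= 32);
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        if (R[i].new_owner != BOTTOM) v |= 1u << i;
    return v;
}

/* M(r) for a single patch r: the valid states whose patch set contains r. */
static uint32_t M_of(PatchSet r, const State R[], size_t n) {
    uint32_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (R[i].new_owner != BOTTOM && (R[i].patch & r)) m |= 1u << i;
    return m;
}

/* Both conditions of Definition 3; `correct` is assumed ⊆ CandPatch_R. */
static bool is_exact_cover(PatchSet correct, const State R[], size_t n) {
    uint32_t covered = 0;
    for (int b = 0; b < 64; b++) {
        if (!((correct >> b) & 1u)) continue;
        uint32_t m = M_of((PatchSet)1 << b, R, n);
        if (m & covered) return false;    /* a state freed twice: DF */
        covered |= m;
    }
    return covered == valid_states(R, n); /* all valid states: no ML */
}
```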

6. Evaluation

We evaluated SafeMD on two different benchmark sets and compared it with MemFix, a static analysis-based approach for fixing memory leaks in C/C++ programs. Our experiments were performed on a PC with an Intel Core i7-7700 CPU (3.60 GHz) and 8 GB RAM running 64-bit Ubuntu 18.04.3 LTS.

6.1. Implementation

We implemented SafeMD as a stand-alone tool [35]. The framework of SafeMD is shown in Figure 4. We first use Joern [36], an open-source code analysis platform for C/C++ based on code property graphs, to extract the CFGs of all functions in our benchmarks. Then, all CFGs (dot files) are loaded into NetworkX for graph traversal to collect patch candidates (Section 5.2) and compute correct patches (Section 5.3). Our implementation supports the C standard memory allocators malloc and calloc but not realloc, since fixing realloc safely may require adding conditional statements, which is beyond the scope of the current algorithm of SafeMD.
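To see why realloc calls for a conditional fix, consider the sketch below (a hypothetical helper `grow`, not taken from the benchmarks). When realloc fails, it returns NULL and leaves the original block allocated, so whether the old pointer must be freed depends on a run-time check; a single unconditional free(p) patch cannot express this.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: grows buf to n ints. The common idiom
   buf = realloc(buf, ...) would leak the old block on failure. */
static int *grow(int *buf, size_t n) {
    assert(n > 0);   /* realloc with size 0 is implementation-defined */
    int *tmp = realloc(buf, n * sizeof *tmp);
    if (tmp == NULL) {
        free(buf);   /* failure path: buf still owns the block, free it */
        return NULL;
    }
    /* success path: the block now belongs to tmp; freeing buf here would
       release memory that tmp uses or that realloc already released */
    return tmp;
}
```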

In step 1, recall that, when generating the initial states $S_0$ (Section 5.2.2), a new state is created if a parameter points to a heap object. Although SafeMD analyzes each function individually, to improve the accuracy of $S_0$, SafeMD starts from the main function, proceeds along the call graph, and sets a flag that guides the generation of $S_0$ for each callee function. In step 2, the exact cover problem is NP-complete. Our implementation uses an existing DFS-based search algorithm, which applies optimization strategies to speed up the search. Moreover, our algorithm for finding correct patches does not enumerate all solutions; it returns the first solution it finds.
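The DFS-based algorithm that SafeMD reuses is not named here. As a stand-in, the sketch below is a plain backtracking search in the style of Knuth's Algorithm X, which branches on the uncovered element with the fewest candidate rows, written over bitmasks. Like SafeMD's search, it stops at the first solution. Rows are the sets $M(r)$ over at most 32 valid states, with at most 64 candidate patches; both limits, the requirement that every row be a subset of the universe, and the name `exact_cover` are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* rows[i] is M(r_i) over the valid states (bit k = k-th valid state) and
   must be a subset of universe (= ValidStates). On success, *chosen has
   bit i set for every selected patch r_i; the first solution is kept. */
static bool exact_cover(const uint32_t rows[], int nrows, uint32_t universe,
                        uint32_t covered, uint64_t *chosen) {
    assert(nrows >= 0 && nrows <= 64);
    if ((universe & ~covered) == 0) return true;   /* everything covered */
    /* Algorithm X heuristic: branch on the uncovered state that the
       fewest still-compatible rows can cover. */
    int best = -1, best_count = nrows + 1;
    for (int e = 0; e < 32; e++) {
        if (!((universe >> e) & 1u) || ((covered >> e) & 1u)) continue;
        int count = 0;
        for (int i = 0; i < nrows; i++)
            if (((rows[i] >> e) & 1u) && !(rows[i] & covered)) count++;
        if (count < best_count) { best = e; best_count = count; }
    }
    if (best_count == 0) return false;             /* dead end */
    for (int i = 0; i < nrows; i++) {
        if (!((rows[i] >> best) & 1u) || (rows[i] & covered)) continue;
        *chosen |= (uint64_t)1 << i;
        if (exact_cover(rows, nrows, universe, covered | rows[i], chosen))
            return true;
        *chosen &= ~((uint64_t)1 << i);            /* undo, try next row */
    }
    return false;
}
```

Branching on the most constrained state first prunes dead branches early; a dancing-links implementation would make removing and restoring rows cheaper, but bitmasks suffice at this scale.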

6.2. Benchmark

We use two benchmarks to evaluate SafeMD (Table 1). The first benchmark consists of the memory leak (CWE-401) test cases of the Juliet Test Suite (JTS) for C; “int_malloc” and “twoIntsStruct_malloc” denote leaks of a memory object pointed to by an integer pointer and by a struct pointer, respectively. The second benchmark has 26 model programs with memory leaks, selected from the 50 test programs provided by [10], whose authors constructed them from five open-source C repositories on GitHub.

To evaluate SafeMD, the programs in these benchmarks were first modified to satisfy the exclusive ownership constraints. Programs that were difficult to modify were removed from the benchmarks, leaving a total of 102 programs (76 from CWE-401 and 26 from open-source C repositories). Note that these modifications do not guarantee semantics preservation: in this experiment, we aim only to provide test programs that satisfy the exclusive ownership constraints, and how to rewrite C programs to satisfy exclusive ownership is beyond the scope of this paper.

In our modifications, we focus only on pointers returned by dynamic memory allocation functions and modify their uses to satisfy the ownership constraints. To aid understanding of the experimental results, we briefly list the main code features of C programs that satisfy exclusive ownership below.

Assignments. For a pointer assignment such as q = p;, if variable p points to a heap object, then p cannot be used after this assignment until it is reassigned. In particular, for an assignment involving a compound data type (e.g., a struct), such as q = p.f;, the struct p as a whole cannot be used after the assignment, but its other members remain available.
Function calls. For a call to a user-defined function, such as foo(p), if variable p points to a heap object, then p cannot be used in the caller after the call site. For library functions (other than the standard allocation and deallocation functions), such as memcpy(p,...), we cannot modify their code, so we assume that the ownership of p does not move into the library function, and p thus remains available in the caller.
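The two rules can be illustrated with a small hypothetical function (our own example, not benchmark code; `Pair`, `consume`, and `ownership_demo` are made-up names). Comments mark where ownership moves under the rules above; note that C itself does not enforce any of this, so the constraints are conventions that the ownership checker verifies.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int *f; int *g; } Pair;

/* Hypothetical user-defined function: it takes ownership of m and frees it. */
static int consume(int *m) {
    int v = *m;
    free(m);
    return v;
}

static int ownership_demo(void) {
    int *p = malloc(sizeof *p);
    if (p == NULL) return -1;
    *p = 42;
    int *q = p;          /* ownership moves from p to q: p is not used     */
                         /* again until it is reassigned                   */
    int copy;
    memcpy(&copy, q, sizeof copy); /* library call: q remains the owner    */
    int v = consume(q);  /* user-defined call: ownership moves into        */
                         /* consume, so q is not used after this call site */

    Pair s = { malloc(sizeof(int)), malloc(sizeof(int)) };
    if (s.f == NULL || s.g == NULL) { free(s.f); free(s.g); return -1; }
    int *x = s.f;        /* member move: x owns s.f's object; s as a whole */
                         /* is no longer used, but s.g is still available  */
    *x = 1;
    *s.g = 2;
    int sum = *x + *s.g;
    free(x);             /* freed through the new owner                    */
    free(s.g);
    return (v == copy) ? v + sum : -1;
}
```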

6.3. Results

Table 1 shows the results on JTS and the open-source C repositories. #Loc is the average number of lines of code (after modification). Since SafeMD analyzes each function individually, we count the user-defined functions in all test programs, and #Function gives the average number of functions per program. SafeMD/MemFix/#Pgm gives the number of test cases fixed by SafeMD, the number fixed by MemFix, and the total number of test programs. #Time reports the maximum execution time of SafeMD and MemFix. The CWE-401 test cases have relatively simple structures and data types, so they can easily be modified to satisfy exclusive ownership while preserving semantics. SafeMD and MemFix each fix 72 of the 76 programs (an accuracy of 95%). The four remaining test programs contain function pointers, which neither SafeMD nor MemFix supports. The maximum execution time for fixing these test cases is less than 1.0 s.

For the open-source C repositories, SafeMD fixes 18 of the 26 programs (an accuracy of 69%) and MemFix fixes 13 of the 26 programs (an accuracy of 50%). We manually inspected these programs to investigate the shortcomings of SafeMD and consider the following four scenarios (S1–S4):

S1. Memory leaks fixed by both SafeMD and MemFix. For these test cases (e.g., Binutils), we find that the allocated and leaked object is largely confined to a single procedure and has a limited number of leaking paths. The code snippet from Binutils in Listing 6 shows a leak pattern that both SafeMD and MemFix can fix.

Listing 6. Memory leak pattern fixed by both SafeMD and MemFix.

In the above code snippet, which satisfies exclusive ownership, the allocated and leaked object $o_1$ is confined to the current procedure. An error path is a program path on which an abrupt return happens under abnormal conditions. The object $o_1$ allocated at line 1 is leaked on both the error path (when goto FAIL on line 3 is executed) and the non-error path, and both SafeMD and MemFix can safely fix such leaks by generating patches for the error path and the non-error path. However, the patches generated by MemFix may violate exclusive ownership.
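The listing's code is not reproduced here. The following hypothetical function (our own, not the Binutils code; `copy_name` is a made-up name) has the same shape: one allocation that must be released on both the goto FAIL error path and the normal path, with the two free(buf) calls standing in for the patches described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example of the S1 shape: buf is allocated once and would
   leak on either path without the two patches marked below. */
static int copy_name(const char *name, char *out, size_t outlen) {
    char *buf = malloc(strlen(name) + 1);
    if (buf == NULL) return -1;
    if (strlen(name) >= outlen) goto FAIL;   /* error path */
    strcpy(buf, name);
    memcpy(out, buf, strlen(buf) + 1);
    free(buf);          /* patch on the non-error path */
    return 0;
FAIL:
    free(buf);          /* patch on the error path */
    return -1;
}
```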

S2. Memory leaks fixed by SafeMD only. For these test cases, we find that the leaked object is transferred to another procedure. The code snippet in Listing 7 shows a leak pattern that only SafeMD can fix.

Listing 7. Memory leak pattern fixed by SafeMD only.

In the above code snippet, which satisfies exclusive ownership, MemFix performs inter-procedural analysis and inserts free(q) and free(p) at lines 18 and 19 to free the memory objects, but this may introduce a DF for object $o_2$ (at line 19) when the else branch is taken in function f. In this case, safely fixing the memory leaks requires the conditional patch if(flag == 1) free(q); at line 18, but MemFix fails because it cannot generate patches with conditional statements. In contrast, SafeMD can safely fix these memory leaks without conditional expressions, since it tracks the owner of the leaked object and frees the object through its owner. Specifically, SafeMD analyzes functions f and g individually and generates free(m) and free(q) at lines 5 and 18, respectively. The function call at line 17 transfers the ownership of $o_2$ from p to m; therefore, SafeMD concludes that f is responsible for freeing $o_2$ whenever f does not return the ownership of $o_2$. That is, SafeMD generates free(m) in the true branch of f, since this execution path does not return the ownership of $o_2$ to g. Because the code satisfies exclusive ownership, p is no longer used after line 17, so the patch free(m) is safe. Conversely, the execution path in f that takes the false branch returns the ownership of $o_2$ to g; therefore, SafeMD concludes that g is responsible for freeing $o_2$ and generates free(q) at line 18. MemFix, by contrast, tries to free $o_2$ in function g but fails because it cannot generate conditional patches.
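A runnable approximation of this pattern is sketched below. It is not the code of Listing 7 and assumes a detail the listing may not share: here f returns NULL on the path where it keeps ownership. f frees m on the true branch, and g frees whatever it receives through q unconditionally, which is safe because free(NULL) is a no-op. Adding free(p) in g as well would cause a double free, since p is an old owner of an object that is freed through another pointer on both paths.

```c
#include <stdlib.h>

/* Hypothetical S2 shape: f receives ownership of m and returns it to the
   caller only on the false branch. */
static int *f(int *m, int flag) {
    if (flag == 1) {
        free(m);        /* ownership is not returned on this path: f frees m */
        return NULL;
    }
    return m;           /* ownership moves back to the caller */
}

static int g(int flag) {
    int *p = malloc(sizeof *p);
    if (p == NULL) return -1;
    *p = 7;
    int *q = f(p, flag);    /* ownership moves from p to m; p is not used again */
    int got = (q != NULL) ? *q : 0;
    free(q);                /* q owns whatever f returned (possibly NULL) */
    return got;
}
```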

S3. Memory leaks fixed by MemFix only. These cases cannot be fixed by SafeMD, primarily because of its ownership limitations: the strict ownership enforced on C code makes it unsafe for SafeMD to free objects in some situations.

One situation that MemFix can fix but SafeMD cannot arises when ownership constraints are enforced on operations over arrays and linked lists. Consider the code pattern in Listing 8, found in Git. Line 3 creates an object $o_1$. This code satisfies the ownership constraints, since the old owner msg2 is no longer used after the assignment ptr2 = msg2. SafeMD generates free(ptr2) at line 10, which leads to an invalid free because ptr2 does not point to the start address of $o_1$. MemFix, however, does not require its patches to satisfy the ownership constraints, so it generates free(msg2) at line 10, which is a safe patch. One way to address this problem for SafeMD is to modify the code (which is beyond the scope of this paper); for example, we can change the assignment ptr2 = msg2 to ptr2 = &msg2, so that the ownership is not transferred to ptr2; SafeMD can then generate the safe patch free(msg2) at line 10.

Listing 8. Memory leak pattern fixed by MemFix only.
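The listing's code is not reproduced here; the hypothetical function below (our own, with the made-up name `prefix_len`) only shows why the ownership-following patch is invalid in this shape. Under exclusive ownership, ptr2 becomes the owner after ptr2 = msg2, but ptr2 is then advanced through the buffer, and free requires the exact address that malloc returned.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical S3 shape: returns the length of text up to the first ','. */
static size_t prefix_len(const char *text) {
    char *msg2 = malloc(strlen(text) + 1);
    if (msg2 == NULL) return (size_t)-1;
    strcpy(msg2, text);
    char *ptr2 = msg2;        /* under exclusive ownership, ptr2 is the owner */
    size_t offset = 0;
    while (*ptr2 != '\0' && *ptr2 != ',') {
        ptr2++;               /* ptr2 moves away from the start of the block */
        offset++;
    }
    /* free(ptr2) would be an invalid free whenever offset > 0. free(msg2)
       is safe, but it frees through the old owner, which violates
       exclusive ownership. */
    free(msg2);
    return offset;
}
```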

S4. Memory leaks fixed by neither SafeMD nor MemFix. These test cases involve reallocated memory and function pointers, which neither SafeMD nor MemFix can handle. For example, the test cases from Git often store memory objects using reallocation, which makes the proportion of successful fixes relatively low.

In summary, our comparison shows that SafeMD can fix more memory-leak patterns than MemFix (see S2) when the input C programs satisfy exclusive ownership. In addition, the patches generated by SafeMD preserve the exclusive ownership of the input C programs (see S1). However, because the input C programs must satisfy exclusive ownership, SafeMD can produce invalid frees on arrays (see S3).

7. Conclusions

We propose SafeMD, an ownership-based memory deallocation approach for C programs that satisfy exclusive ownership. Benefiting from ownership, SafeMD obviates alias and inter-procedural analysis when collecting patch candidates. In addition, the patches generated by SafeMD preserve the exclusive ownership of the input C programs. Our experiments show the effectiveness of SafeMD in fixing memory leaks in real-world C programs. They also indicate that the ownership system of Rust can be applied to improve the memory safety of C programs. As future work, we plan to support more memory allocators and to relax the ownership constraints according to the semantics of C (e.g., array traversal) to make SafeMD more practical (easing the problem of S3 described in Section 6.3).

Author Contributions

Methodology, X.Y.; validation, Z.H. and G.S.; data curation, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in this study are openly available in https://bitbucket.org/yxhnuaa/safemd/src/master/ (accessed on 10 October 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Chen, Z.; Gu, Y.; Huang, Z.Q.; Zheng, J.; Liu, C.; Liu, Z.Y. Model checking aircraft controller software: A case study. Softw. Pract. Exp. 2015, 45, 989–1017. [Google Scholar] [CrossRef]
Wang, W.W. MLEE:Effective Detection of Memory Leaks on Early-Exit Paths in OS Kernels. In Proceedings of the 2021 USENIX Annual Technical Conference, USENIX ATC, Vancouver, BC, Canada, 14–16 July 2021. [Google Scholar]
CVE-CVE-2022–27819. MITRE Corporation. Available online: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2022-27819 (accessed on 10 October 2024).
CVE-CVE-2017–1081. MITRE Corporation. Available online: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-1081 (accessed on 10 October 2024).
Rust. Rust Team. Available online: https://www.rust-lang.org/ (accessed on 10 October 2024).
Yin, X.H.; Huang, Z.Q.; Kan, S.L.; Shen, G.H.; Liu, Y.; Wang, F. SafeOSL: Ensuring memory safety of C via ownership-based intermediate language. Softw. Pract. Exp. 2022, 52, 1114–1142. [Google Scholar] [CrossRef]
Gao, Q.; Xiong, Y.F.; Mi, Y.H.; Zhang, L.; Yang, W.K.; Zhou, A.P.; Xie, B.; Mei, H. Safe Memory-Leak Fixing for C Programs. In Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, Florence, Italy, 16–24 May 2015; Volume 1. [Google Scholar]
Yan, H.; Sui, Y.L.; Chen, S.P.; Xue, J.L. Automated memory leak fixing on value-flow slices for C programs. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, 4–8 April 2016. [Google Scholar]
Tonder, R.J.; Goues, C.L. Static automated program repair for heap properties. In Proceedings of the 40th International Conference on Software Engineering, Gothenburg, Sweden, 27 May–3 June 2018. [Google Scholar]
Lee, J.; Hong, S.H.; Oh, H.J. MemFix: Static analysis-based repair of memory deallocation errors for C. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Lake Buena Vista, FL, USA, 4–9 November 2018. [Google Scholar]
Hong, S.H.; Lee, J.; Lee, J.S.; Oh, H.J. SAVER: Scalable, precise, and safe memory-error repair. In Proceedings of the 42nd International Conference on Software Engineering, Seoul, Republic of Korea, 27 June–19 July 2020. [Google Scholar]
Juliet Test Suite 1.2. NSA Center for Assured Software. Available online: https://samate.nist.gov/SRD/view.php?tsID=86 (accessed on 10 October 2024).
Smaragdakis, Y.; Kastrinis, G. Pointer Analysis. Found. Trends Program. Lang. 2015, 2, 1–69. [Google Scholar] [CrossRef]
Balatsouras, G.; Ferles, K.; Kastrinis, G.; Smaragdakis, Y. A Datalog model of must-alias analysis. In Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis, Barcelona, Spain, 18 June 2017. [Google Scholar]
Yu, B.; Tian, C.; Zhang, N.; Duan, Z.; Du, H. A dynamic approach to detecting, eliminating and fixing memory leaks. J. Comb. Optim. 2021, 42, 409–426. [Google Scholar] [CrossRef]
Clause, J.; Orso, A. LEAKPOINT: Pinpointing the causes of memory leaks. In Proceedings of the 32nd International Conference on Software Engineering, Cape Town, South Africa, 1–8 May 2010. [Google Scholar]
Murali, A.; Alfadel, M.; Nagappan, M.; Xu, M.; Sun, C.N. AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes. IEEE Trans. Softw. Eng. 2024, 50, 2398–2411. [Google Scholar] [CrossRef]
Clarke, D.G.; Potter, J.; Noble, J. Ownership Types for Flexible Alias Protection. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, Vancouver, BC, Canada, 18–22 October 1998. [Google Scholar]
Clarke, D.G.; Ostlund, J.; Sergey, I.; Wrigstad, T. Ownership types: A survey. Aliasing in object-oriented programming. Types Anal. Verif. 2013, 7850, 15–58. [Google Scholar]
Boyapati, C.; Lee, R.; Rinard, M.C. Ownership types for safe programming: Preventing data races and deadlocks. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, Seattle, WA, USA, 4–8 November 2002. [Google Scholar]
Kloos, J.; Majumdar, R.; Vafeiadis, V. Asynchronous Liquid Separation Types. In Proceedings of the 29th European Conference on Object-Oriented Programming, Prague, Czech Republic, 5–10 July 2015; Volume 37. [Google Scholar]
Heine, D.L.; Lam, M.S. A practical flow-sensitive and context-sensitive C and C++ memory leak detector. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, USA, 9–11 June 2003. [Google Scholar]
Swamy, N.; Hicks, M.W.; Morrisett, G.; Grossman, D.; Jim, T. Safe manual memory management in Cyclone. Sci. Comput. Program. 2006, 62, 122–144. [Google Scholar] [CrossRef]
Suenaga, K.; Kobayashi, N. Fractional Ownerships for Safe Memory Deallocation. In Proceedings of the 7th Asian Symposium on Programming Languages and Systems, Seoul, Republic of Korea, 14–16 December 2009; Volume 5904. [Google Scholar]
Sonobe, T.; Suenaga, K.; Igarashi, A. Automatic Memory Management Based on Program Transformation Using Ownership. In Proceedings of the 12th Asian Symposium on Programming Languages and Systems, Singapore, 17–19 November 2014; Volume 8858. [Google Scholar]
Toman, J.; Pernsteiner, S.; Torlak, E. Crust: A Bounded Verifier for Rust (N). In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, Lincoln, NE, USA, 9–13 November 2015. [Google Scholar]
Astrauskas, V.; Bílý, A.; Fiala, J.; Grannan, Z.; Zhang, M.; Matheja, C.; Müller, P.; Poli, F.; Summers, A.J. The Prusti Project: Formal Verification for Rust. In Proceedings of the 14th International Symposium on NASA Formal Methods, Pasadena, CA, USA, 24–27 May 2022; Volume 13260. [Google Scholar]
Lattuada, A.; Hance, T.; Cho, C.; Brun, M.; Subasinghe, I.; Zhou, Y.; Howell, J.; Parno, B.; Hawblitzel, C. Verus: Verifying Rust Programs using Linear Ghost Types. Proc. ACM Program. Lang. 2023, 7, 286–315. [Google Scholar] [CrossRef]
Qin, B.Q.; Chen, Y.L.; Yu, Z.M.; Song, L.H.; Zhang, Y.Y. Understanding memory and thread safety practices and issues in real-world Rust programs. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, London, UK, 15–20 June 2020. [Google Scholar]
Xu, H.; Chen, Z.B.; Sun, M.S.; Zhou, Y.F.; Lyu, M.R. Memory-Safety Challenge Considered Solved? An In-Depth Study with All Rust CVEs. ACM Trans. Softw. Eng. Methodol. 2022, 31, 1–25.
Emre, M.; Schroeder, R.; Dewey, K.; Hardekopf, B. Translating C to safer Rust. Proc. ACM Program. Lang. 2021, 5, 1–29.
Emre, M.; Boyland, P.; Parekh, A.; Schroeder, R.; Dewey, K.; Hardekopf, B. Aliasing Limits on Translating C to Safe Rust. Proc. ACM Program. Lang. 2023, 7, 551–579.
Zhang, H.L.; David, C.; Yu, Y.J.; Wang, M. Ownership Guided C to Rust Translation. In Proceedings of the 35th International Conference on Computer Aided Verification, Paris, France, 17–22 July 2023; Volume 13966.
Jung, R.; Jourdan, J.H.; Krebbers, R.; Dreyer, D. RustBelt: Securing the foundations of the Rust programming language. Proc. ACM Program. Lang. 2018, 2, 1–34.
SafeMD. Xiaohua Yin. Available online: https://bitbucket.org/yxhnuaa/safemd/src/master/ (accessed on 10 October 2024).
joern. ShiftLeft Corporation. Available online: https://joern.io/ (accessed on 10 October 2024).

Figure 1. The ownership-based framework for ensuring memory safety of C programs.

Figure 4. The framework of SafeMD.

Table 1. Evaluation results on CWE-401 and open-source C repositories.

Benchmark	Program	#LoC	#Functions	Fixed by SafeMD / Fixed by MemFix / #Programs	Time (s)
CWE-401	int_malloc	82	10	36/36/38	<1.0
CWE-401	twoIntsStruct_malloc	92	9	36/36/38	<1.0
CWE-401	Total			72/72/76
Open-Source C Repo.	Binutils	127	6	4/4/5	<1.0
Open-Source C Repo.	Git	150	5	2/3/5	<1.0
Open-Source C Repo.	OpenSSH	150	4	5/2/6	<1.0
Open-Source C Repo.	OpenSSL	134	3	3/1/4	<1.0
Open-Source C Repo.	Tmux	154	6	4/3/6	<1.0
Open-Source C Repo.	Total			18/13/26

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yin, X.; Huang, Z.; Kan, S.; Shen, G. SafeMD: Ownership-Based Safe Memory Deallocation for C Programs. Electronics 2024, 13, 4307. https://doi.org/10.3390/electronics13214307

AMA Style

Yin X, Huang Z, Kan S, Shen G. SafeMD: Ownership-Based Safe Memory Deallocation for C Programs. Electronics. 2024; 13(21):4307. https://doi.org/10.3390/electronics13214307

Chicago/Turabian Style

Yin, Xiaohua, Zhiqiu Huang, Shuanglong Kan, and Guohua Shen. 2024. "SafeMD: Ownership-Based Safe Memory Deallocation for C Programs" Electronics 13, no. 21: 4307. https://doi.org/10.3390/electronics13214307

