1. Introduction
C is widely used for implementing system and embedded software, which are usually safety-critical systems [
1,
2]. However, their manual memory management can easily produce memory leaks (MLs) in C programs. Memory leaks mainly occur when a programmer allocates an object but forgets to deallocate it. Memory leaks may have a large negative impact on software systems if not carefully examined and fixed. In fact, memory leaks are direct sources of security vulnerabilities. Some memory leak vulnerabilities have been disclosed in Linux kernels (e.g., CVE-2022-27819 [
3], CVE-2017-10810 [
4]).
Recently, an emerging programming language designed for highly safe systems, i.e., Rust [
5], has received an increasing amount of attention. Compared with C/C++, Rust introduces an ownership system (OwS) to provide memory safety at compile time, which can avoid many memory errors, such as dangling pointers, data races and memory leaks. The basic idea of OwS is exclusive ownership, i.e., at any time, each resource has a unique owner. When the unique owner of a resource goes out of its scope, the resource can be automatically dropped without using the garbage collector. Because of the unique owner, this automatic drop scheme is safe, i.e., it does not incur new errors like use-after-free (UAF) and double-free (DF).
In light of Rust’s promise of safety, the emergence of OwS in Rust provides a new insight to guarantee the memory safety of C programs. Therefore, in our previous work [
6], as shown in
Figure 1, we developed a formal ownership checker, named SafeOSL, to verify whether a C program satisfies the exclusive ownership constraint. If a C program passes the checking of SafeOSL, it means that this C program satisfies exclusive ownership. For such a special C program, in this paper, we further propose an ownership-based memory deallocation, named SafeMD, to fix memory leaks. The output of SafeMD is a C program that is free of memory leaks, and, most importantly, the C program, after the repair of SafeMD, still satisfies exclusive ownership. Usually, a C program that satisfies the exclusive ownership is safer than its normal version.
Many static techniques on memory-leak fixing have been proposed in the program repair community [
7,
8,
9,
10,
11]. If the input C programs satisfy exclusive ownership, these techniques suffer from some drawbacks when fixing memory leaks. Firstly, some techniques can fix memory leaks but may introduce new errors, like UAF and DF. Secondly, some techniques can safely fix memory leaks but are complex as they often rely on alias and inter-procedural analysis. Thirdly, some techniques can safely fix memory leaks, but the patches that they generate cannot guarantee that the C programs satisfy the exclusive ownership. Usually, a C program that satisfies the exclusive ownership is safer than its normal version.
In this paper, we propose an ownership-based memory deallocation, named SafeMD, to fix memory leaks of C programs that satisfy exclusive ownership. SafeMD can generate a set of free statements to safely deallocate all allocated memory objects without introducing UAF and DF. SafeMD includes two steps: (1) using a static analysis that collects patch candidates for each allocated object. The main idea of collecting patch candidates is to track ownership of the object and free the object where the owner last used it. Benefiting from the input C programs satisfying exclusive ownership, this step obviates alias and inter-procedural analysis. Also, the patch candidates collected satisfy exclusive ownership. (2) Finding correct patches by solving an exact cover problem. Because ownership designates which function is responsible for deallocating memory objects, SafeMD simplifies inter-procedural analysis to intra-procedural analysis, and each analysis performs the above two steps.
The experimental results demonstrate that SafeMD is able to fix the memory leaks in C programs. We evaluated SafeMD with two different benchmark sets: Juliet Test Suite (JTS) for C [
12] and 26 open-source C repositories [
10]. We compared SafeMD with MemFix. When the input C programs satisfy exclusive ownership, SafeMD can fix more memory-leak patterns than MemFix.
We summarize the contributions of this paper as follows.
We present SafeMD, an ownership-based safe memory deallocation technique for C programs that satisfy exclusive ownership. Compared with the existing techniques, SafeMD obviates alias and inter-procedural analysis, and the patches generated satisfy exclusive ownership.
We implement SafeMD and compare it with MemFix.
We explore the benefit of Rust’s novel ownership-based memory management in C.
3. Ownership System in Rust
Rust guarantees memory safety at compile time by introducing an ownership system and consequently avoids many memory errors, such as dangling pointers and memory leaks. Ownership in Rust denotes a set of rules that govern how the Rust compiler manages memory. The idea of OwS is exclusive ownership, which means that each resource has a unique variable as its owner at any time. Ownership can be transferred among owners. When the owner of a resource goes out of its scope, the resource can be automatically dropped without using the garbage collector. Below, we introduce ownership transferring.
Ownership and Assignments. In Rust, ownership can be transferred in assignments. Consider the code in Listing 1, where line 2 creates a String object o on the heap, and let be the owner of o. At line 4, the assignment transfers the ownership of o from to . To maintain the unique owner, the assignment performs move semantics, which makes become the old owner and no longer valid until it is re-assigned a value again. Therefore, the Rust compiler will issue an error at line 5. This is different with pointer assignments in C, where both and are valid and can be used. Because is the unique owner of o, when the owner goes out of its scope, the Rust compiler automatically inserts a drop destructor to free o at line 6, which can avoid memory leaks. Therefore, the code at line 7 is rejected as o has already been destroyed. Now, we take a closer look at line 8. When goes out of its scope and tries to free o, the Rust compiler does not insert drop since it finds that the ownership of has been moved. This means that an object cannot be freed by any of its old owners (like ), which can ensure that memory deallocation does not introduce DF.
Listing 1. Transferring ownership in assignments. |
|
Ownership and Functions. Ownership can also be transferred in function calls. When ownership of an object is moved to a callee via parameters, this object is no longer available in the caller. For example, in Listing 2, the function call at line 3 moves the ownership of the object o created at line 2 to takes_ownership, so the Rust compiler will issue an error at line 4, where s becomes the old owner and cannot be accessed in the main function. The Rust compiler performs intra-procedural analysis to insert a drop destructor, which compiles each function individually. It relies on ownership to determine whether the current function has a responsibility to free objects. For example, the Rust compiler first compiles the main function. It finds that s is moved to takes_ownership; therefore, the main function has no responsibility to free the object o. The Rust compiler does not insert drop at line 5. Next, the Rust compiler compiles the takes_ownership function. Because is a String type that can move ownership, the Rust compiler will automatically insert drop once goes out of its scope at line 9.
Listing 2. Transferring ownership via parameters. |
|
Besides parameter passing, return values can also transfer ownership. For example, in Listing 3, gives_back moves ownership out from gives_back via return value to its caller main, which means that gives_back has no responsibility to free the object. Exclusive ownership can ensure that automatic memory deallocation is safe. When the Rust compiler compiles the main function, the object o is dropped automatically once goes out of scope at line 6 but nothing happens for s because s is moved. This avoids DF when s and go out of their scope (line 6) and both try to free the object o. When compiling the gives_back function, it fails to insert drop to free the object pointed to by since is moved out from gives_back. This can avoid UAF if is freed while is used in the main function.
In conclusion, the main ideas of OwS are summarized as follows.
R1: Each resource has a unique owner at any time.
R2: When the owner of a resource goes out of its scope, it deallocates the resource that it owned. Any old owners of the resource cannot deallocate the resource (this can avoid DF and UAF).
In our previous work, SafeOSL ensures that a C program satisfies R1. For such a special C program, SafeMD proposed by this paper borrows the idea of R2 to fix its memory leaks.
Listing 3. Transferring ownership via return values. |
|
4. Approach Overview
We illustrate the algorithm of SafeMD using a simple example in Listing 4. This code satisfies the exclusive ownership. For example, t = foo(p) at line 24 moves ownership of into foo1 and, after here, p is no longer used. Before analysis, we remove all free statements from programs. SafeMD will generate the patches that can safely deallocate all allocated objects without introducing UAF and DF, as shown in Listing 5. In addition, the C programs fixed by SafeMD still satisfy the exclusive ownership.
SafeMD includes two steps: (1) collect patch candidates for each object by tracking ownership. The main idea of collecting patch candidates is to track ownership of the object and free the object where the owner last used it. (2) Find a correct patch from patch candidates by solving an exact cover problem over the allocated objects. SafeMD analyzes each function individually and each analysis contains the above two steps.
Figure 2 and
Figure 3 show the analysis of the main and foo1 function, respectively. We only explain the analysis for the
main function below.
Listing 4. code with memory-leaks. |
|
Listing 5. SafeMD-generated patches. |
|
Figure 2.
SafeMD for main function.
Figure 2.
SafeMD for main function.
Figure 3.
SafeMD for foo1 function.
Figure 3.
SafeMD for foo1 function.
Step 1: Collecting Patch Candidates by Ownership Tracking. This analysis step is based on a control flow graph (CGF). The CFG of the main function and analysis results at each node is presented in
Figure 2. This analysis maintains owner and patch information for each allocated object as a state of the following form:
where
o is a heap object represented by its allocation site,
is a pointer who is the unique owner of
o,
is a set of pointers that are the old owners of
o,
is a set of patches that can safely deallocate the object and
is a set of unsafe patches that may introduce UAF and DF. Both
and
are denoted by a pair
, which means that an object can be deallocated by inserting a deallocation statement
free(
e) right after line
n, where
n is a program point and
e is a pointer expression.
For the main function, the analysis of SafeMD starts with the function signature. At line 20, because the main function has no parameters, the initial state is marked as empty. The allocation statement at line 22 creates a new tuple : the allocation site is , its owner is p and the safe patch is , which indicates that we can safely free via owner p after line 22. Now, and are empty.
We first consider the false branch at line 29. The analysis updates the states as follows:
A new tuple for the new object allocated at line 29 is created. For the state of , a new safe patch is added into .
At line 31, the function call updates the states as follows:
Because the object is used as an argument z in foo2(z), to avoid UAF, in state , we remove the safe patch from the and add it into . The call foo2(z) moves ownership of into foo2 via parameter passing; for this, we carry out three changes: (1) we mark with ⊥ to indicate that the ownership of is moved. Thus, z becomes the old owner. (2) We reset to Ø to denote that the main function has no responsibility to free since it has lost ownership of . (3) Because the old owner cannot free , a new unsafe patch , where z is now an old owner of , is generated.
Next, we consider the true branch at line 24, where the function call updates the state
as follows:
The ownership of object is moved into foo1, so the update for the state of is the same as the state of in (2). Note that foo1 returns a pointer that points to a valid object, so we create a new state: the allocation site is , its owner is the receiver t and the safe patch is .
At line 25, the states are updated as follows:
In , because is used by at line 25, we remove the safe patch from the state and declare it as unsafe to avoid UAF. Now, the only safe patch for is . In , a new unsafe patch is generated. This is because p is an old owner of and thus cannot free after line 25.
At the join point line 33, our analysis maintains each state separately for each different branch. With the states in (2) and (4) as input, the analysis produces the following states as output:
The return statement returns a value of 0 instead of moving any object’s ownership, so the states are the same as . The analysis finishes with the states in (5).
Step 2: Finding Correct Patches by Solving Exact Cover Problem. After we collect patch candidates in step 1, step 2 is to find correct patches that safely deallocate all objects (i.e., no memory leaks) while not introducing UAF and DF. Finding a correct patch can be reduced to solve an exact cover problem.
First, due to ownership, we collect the valid states whose is not ⊥. The valid states denote that the current procedure has responsibility to free the objects. From the owner information of states in (5), we obtain the valid states
Then, from the patch information of states in (5), we collect the safe patches from
and unsafe patches from
from all states:
Thus, candidate patches are those in
but not in
:
The patches in
cannot incur UAF because the patches that may cause UAF are collected in
. However, using all patches in
may cause DF. For example, using both
and
will incur double-free for
in the false branch. So, we have to find a subset of the candidate patches that does not introduce DF while deallocating all memory objects. This can be solved by an exact cover problem over valid states, which is represented by the following incidence matrix:
| | |
| 1 | 0 |
| 1 | 0 |
| 0 | 1 |
Each row r in matrix represents a patch in and each column represents a valid state in . The entry in row r and column is 1 if patch r is included in of state and 0 otherwise. For example, contains in , so the entry in row and column is 1. Solving an exact cover problem represented by the above incidence matrix is achieved by the selection of rows such that each column contains only a single 1 among selected rows. In this example, the correct patches are computed as . Both the patches and cover all states (i.e., no memory leaks) and each state is covered by at most one patch (i.e., no DF). In addition, these two patches satisfy exclusive ownership constraints. Considering that the objects are deallocated as early as possible, we choose to free the objects in the main function, as shown in Listing 5.
SafeMD will generate the correct patch for foo1. The four patches generated by SafeMD for the code in Listing 4 can safely fix the memory leaks across functions. MemFix fails to safely fix this code: it will generate free(t) and free(p) at line 26 in the main function, which may introduce DF. Other fixing techniques can safely fix this code, but the patches generated by them may violate exclusive ownership.
5. Approach Details
This section presents the algorithm of SafeMD, which modifies the algorithm of MemFix to serve only C programs that satisfy exclusive ownership constraints.
Section 5.1 defines a core language. The two steps of SafeMD are presented in
Section 5.2 and
Section 5.3, respectively.
5.1. Language
For simplicity, we formalize SafeMD on top of a simple pointer language. Let
P be an input program for SafeMD. A program
P is represented by a CFG (
,↪,
,
), where
denotes the set of program points, (↪) ⊆
×
is the set of flow edges,
↪
indicates that there is a possible flow of execution from
to
and
and
are the unique entry and exit nodes of
P. A program point
is associated with a command, denoted
, as defined by the following grammar:
A pointer expression p can be a variable x, a pointer dereference or a NULL pointer. Allocation command creates a new memory object pointed to by p. Assignment command assigns the expression e to p. The command describes a function signature, where , , and denote the function name, parameter list, parameter type list and return type, respectively. The command describes the call on function , where , and represent the argument list, argument type list and a variable who receives the return value, respectively. “none” means empty; for example, if , and equal none, it represents a function call with no arguments and return values. The deallocation statements are ignored because we remove them from the program P before analysis. In addition, the program P must also satisfy exclusive ownership.
5.2. Step 1: Collecting Patch Candidates by Ownership Tracking
The first step of SafeMD is to statically analyze the procedure to collect patch candidates. The idea of this step is to track ownership of the object and free the object where the owner last used it.
5.2.1. Abstract Domain
The abstract domain of the analysis is defined as follows:
A | ∈ = |
s | ∈ = |
o | ∈ ⊆ |
| ∈ = |
| ∈ = |
| ∈ = |
| ∈ = |
p | ∈ = |
is the finite set of program variables in
P.
is the finite set of allocation sites in
P, i.e., the nodes whose associated commands are alloc(
p). A domain element
is a finite map that maps each program point to a set of reachable states. A state
describes an abstract object with owner and patch information, where
is the allocation site of the object,
is an access path that points to the object via a unique owner,
is a set of access paths that points to the object via old owners,
is a set of patches that can safely deallocate the object and
is a set of unsafe patches.
denotes the set of access paths that can be generated for the given program
P. For the language in
Section 5.1,
equals the set of pointer expression
p, i.e.,
can be a variable x, a pointer dereference
, a NULL pointer or an invalid access path ⊥. Each element in
and
is a pair
that consists of a program point
c and an access path
p. A patch
represents a
free(
p) statement that can be inserted right after the program point
c.
5.2.2. Abstract Semantics
SafeMD collects patch candidates for each function individually. For each function, the analysis starts from the program point
c where the function is defined, i.e., the node whose command is
, and computes an initial set
that consists of initial states.
is defined as follows:
If the function
has no parameter,
is empty; otherwise, it is computed by
which generates an initial state for a parameter
of
. The initial states for all parameters form
. For the computation of
, if a parameter
points to a heap object (i.e., the function
equals true), a new state is created since the ownership of objects can be transferred by parameter passing: the allocation site of the object is the program point
c, the owner of the object is
and the safe patch is
. For example, in
Figure 3,
for
foo1 definition at line 1 only contains a state, i.e.,
.
Start with
: SafeMD updates the states at each node based on the command associated with that node until reaching the exit node. In other words, the analysis computes a least fixed point lf
p of the semantics function
:
where
and
is the transfer function at a program point
c:
where
.
updates the states in
S according to different commands. For alloc(
p),
not only updates the existing states in
S to
but also creates a new state
. Similarly, for the command call
, where
p is a pointer, a new state for the object pointed by
p is created.
The set is updated by two transfer functions and : For a state s, we first update owner information by and then patch information by . Next, we define and for different commands.
Given a state
s at the program point
c,
- (1)
When
= alloc(
p),
makes
and
unchanged:
Then,
updates
and
as follows:
where
,
.
contains a safe patch that is newly generated at
c via a unique owner.
is the set of unsafe patches that are newly generated at
c via old owners, because an object cannot be deallocated via old owners. Also, we exclude
from
to ensure that
and
are disjoint.
- (2)
When
= assign(
p,
e),
updates
and
as follows:
An assignment p = e can transfer the ownership of object o from e to p, making p and e become the new owner and old owner, respectively. We also exclude from to ensure that and are disjoint.
Then,
updates
and
as follows:
The condition “
o is used at
c” contains two cases. In the first case
p = e, where
o is used by expression
e and transfers the ownership of
o from
e to
p, the only safe patch is
generated by
. For
, it adds a set of new unsafe patches related to old owners that contains
and
. The pointer assignment
q = m at line 7 in
Figure 3 is such an example.
Consider the second case
p = e, where
o is used by expression
e but does not transfer the ownership of
o: to prevent UAF, the safe patches are removed from
and added to
, and the only safe patch is generated by
. Also,
is included in
. For example, the assignment
... = *q in
Figure 3 uses the object pointed to by
q via dereference expression
, but it does not transfer the ownership of the object.
For the otherwise case, where o is not used at c, we only merge new safe patches generated by with and unsafe patches generated by with .
- (3)
When
= return(
p),
updates
and
as follows:
The ownership of objects can be transferred by return values. If return value p equals the owner of object o in state s (i.e., ), then the ownership of o is moved out from the callee and p becomes the old owner. We use the symbol ⊥ to indicate that the ownership of o is moved.
Then,
updates
and
as follows:
If
of object
o is updated to ⊥ by
, the safe patches in
become unsafe because
o is returned to the caller. We reset
to Ø to indicate that the callee cannot free
o since it has lost ownership of
o. The responsibility for deallocating
o falls on the caller. Consider the
foo1 in
Figure 3: SafeMD will merge the states computed from the if-else branch as follows:
For the objects in state
and
, because their ownership is moved out from
foo1 by return statement “
return q”, their
equals ⊥ and
q becomes the old owner. Also, their
is reset to Ø. For state
, it remains unchanged. The update result is shown below.
- (4)
When
= call(
,
,
,
updates
and
as follows:
The ownership of objects can be transferred to the callee by parameter passing. If of o is passed to as an argument, i.e., , then becomes the old owner. We mark with ⊥ to indicate that the ownership of o is moved.
Then,
updates
and
as follows:
We discuss three cases. The first case is , which indicates that the ownership of object is moved. For the , it adds a set of new unsafe patches related to old owners that contains and . We reset the to Ø to denote that the caller is not responsible for deallocating the object since the ownership of the object is moved to the callee.
The second case is “”, which indicates that the ownership of object o is not transferred but o is used (e.g., by pointer dereference) in arguments. Because o is used, the safe patches in are added to to avoid UAF, and the only safe patch is generated by . For example, function call foo(*p), where p is a pointer that points to a valid object, belongs to this case.
For the case, where the ownership of object o is not transferred and o is also not used, we merge new safe patches generated by with and unsafe patches generated by with .
5.3. Step 2: Finding Correct Patches by Solving Exact Cover Problem
Once the owner and patch information for each object are collected in step 1, the second step of SafeMD is to find the correct patches that can safely deallocate all allocated objects (no ML) while not introducing UAF and DF. MemFix found correct patches by solving an exact cover problem. We use this method but modify it because ownership is considered.
Let be the set of reachable states computed by step 1 at the exit node of the program. We first give the definition of candidate correct patches and valid states from R.
Definition 1 (Candidate Correct Patches)
. The set of candidate correct patches collects the possible safe patches for each object, which are defined as follows: collects the patches that can safely deallocate an object, and collects the unsafe patches that can cause UAF and DF (caused by pointer aliasing). We exclude from to obtain the set , which is used to find correct patches.
The patches in cannot cause UAF since these unsafe patches are all collected in and thus already excluded from . In addition, the patches in also satisfy exclusive ownership constraints; that is, old owners cannot be used to deallocate objects. After all, the C programs satisfying exclusive ownership are more safe than normal C programs. However, the patches in may cause DF. Thanks to ownership, plenty of unsafe patches caused by pointer aliasing are excluded from , since SafeMD collects the unsafe patches related to old owners (aliases) in . This can noticeably reduce the search space for finding the correct patches as we avoid verifying patch combinations containing pointer aliasing. However, cannot exclude DF caused by freeing the memory multiple times using the same pointer. For example, if an object can be safely deallocated at line 4 and line 5, then and may be in , and using both of them will incur DF. Therefore, this step aims to find the correct patches that do not cause such a kind of DF.
Definition 2 (Valid States)
. The valid states indicate that the ownership of objects has not been moved and thus the objects should be deallocated in the current function, which is defined as follows: Next, we present the definition of the problem for finding the correct patches, which can be reduced into solving an exact cover problem over valid states.
Definition 3 (The Problem of Finding Correct Patches)
. Let be the function from candidate correct patches to the valid states that can be safely deallocated by the corresponding patches:From M, find a subset such that
, which means that covers all valid states;
for all ,, which means that the chosen subsets in (where ) are pairwise disjoint.
M describes the incidence matrix in
Figure 2 and
Figure 3. The first condition means that all allocated objects must be deallocated, which guarantees the absence of memory leaks. The second condition means that every allocated object is deallocated no more than once, which guarantees the absence of DF. Recall that UAF is avoided in
.
6. Evaluation
We evaluated SafeMD with two different benchmark sets and compared it with MemFix, a static-based approach for fixing memory leaks in C/C++ programs. Our experiments were performed on a PC with Intel Core i7-7700 CPU (3.60 GHZ) and 8 GB RAM running 64-bit Ubuntu 18.04.3 LTS.
6.1. Implementation
We implemented SafeMD as a stand-alone tool [
35]. The framework of SafeMD is shown in
Figure 4. We first make use of the open-source code analysis platform for C/C++ based on code property graphs, Joern [
36], to extract CFGs for all functions in our benchmarks. Then, all CFGs (dot files) are loaded into NetworkX for graph traversal to collect patch candidates (
Section 5.2) and calculate correct patches (
Section 5.3). Our implementation supports the C standard memory allocators
malloc and
calloc except for
realloc, since
realloc may be fixed safely by adding conditional statements, which is beyond the scope of the current algorithm of SafeMD.
In step 1, recall that, in the generation of initial states
mentioned in
Section 5.2.2, a new state is created if the parameter points to a heap object. Although SafeMD analyzes each function individually, to improve the accuracy of
, SafeMD starts from the
main function and proceeds according to the call graph, and sets a flag to guide the generation of
for each callee function. In step 2, the exact cover problem is NP-complete. Our implementation uses existing DFS-based search algorithm to solve the exact cover problem. This algorithm takes optimization strategies to improve the search speed for the exact cover problem. Also, our algorithm of finding correct patches will not find all solutions; instead, it will return the first solution that it finds.
6.2. Benchmark
We use two benchmarks to evaluate SafeMD. In
Table 1, the first benchmark is relevant to memory leak (CWE-401) in Juliet Test Suite (JTS) for C, and “int_malloc” and “twoIntsStruct_malloc” mean a memory object pointed to by an integer pointer and struct pointer leaks, respectively. The second benchmark has 26 model programs with memory leaks. These programs are selected from 50 test programs that are provided by [
10], who constructed them from five GitHub open-source C repositories.
To evaluate SafeMD, the programs in these benchmarks are first modified to satisfy exclusive ownership constraints. For the C programs that are difficult to modify, they have been removed from the benchmarks, leaving a total of 102 programs (76 in CWE-401 and 26 in open-source C repositories). Note that the modifications in benchmarks do not guarantee semantics preservation since, in this experiment, we aim to provide the test programs that satisfy exclusive ownership constraints, and how to rewrite C programs to satisfy exclusive ownership is beyond the scope of this paper.
In our modifications for C programs in these benchmarks, we only focus on pointers returned by dynamic memory allocation functions and modify these pointers to make their use satisfy ownership constraints. For a better understanding of the experimental results, we briefly list the main code features of C programs that satisfy exclusive ownership below.
Assignments. For the pointer assignments, such as q = p;, if variable p points to a heap object, then p cannot be used after this assignment until it is re-assigned. Particularly, for the assignment of compound data type (e.g., struct), such as q = p.f;, the struct itself p cannot be used after assignment, but other members of struct p are still available.
Function calls. For the calls that call user-defined functions, such as foo(p), if variable p points to a heap object, then, in the caller, p cannot be used after the call site. For the library functions (except for standard allocation and deallocation functions), such as memcpy(p,...), because we cannot modify the code of library functions, we assume that the ownership of p does not move into library functions, and thus p is still available in the caller.
6.3. Results
Table 1 shows the results on JTS and open-source C repositories. #Loc represents the average number of lines of code (after modification). SafeMD analyzes each function individually, so we count the number of user-defined functions in all test programs and use #Function to list the average number of functions that the programs have. SafeMD/MemFix/#Pgm represents the number of test cases that can be fixed by SafeMD and MemFix. #Time reports the maximum execution time performed by SafeMD or MemFix. For test cases in CWE-401, because these programs have relatively simple structures and data types, they can easily be modified to satisfy exclusive ownership while preserving semantics. Both SafeMD and MemFix can fix a total of 72 programs (out of 76 programs) with an accuracy of 95%. Four test programs including function pointers are not supported by SafeMD and MemFix. The maximum execution time for fixing these test cases is smaller than 1.0 s.
For open-source C repositories, SafeMD and MemFix can fix a total of 18 programs (out of 26 programs) with an accuracy of 69% and 13 programs (out of 26 programs) with an accuracy of 50%, respectively. We manually look at these programs to investigate the shortcomings of SafeMD. We consider the following four scenarios (S1–S4):
S1. Memory leak fixed by both SafeMD and Memfix. For these test cases (e.g., Binutils), we find that the allocated and leaked object is largely limited in scope within a procedure and has limited leaked paths. The code snippet from binutils in Listing 6 shows the leak pattern that can be fixed by both SafeMD and Memfix.
Listing 6. Memory leak pattern fixed by both SafeMD and Memfix. |
|
In the above code snippet that satisfies exclusive ownership, the allocated and leaked object is limited in the current procedure. An error path refers to a program path where an abrupt return happens under abnormal situations. The object allocated at line 1 is leaked on both the error path (when goto FAIL is executed on line 3) and the non-error path, and both SafeMD and Memfix can safely fix such memory leaks by generating patches for the error path and non-error path. However, the patches generated by Memfix may violate exclusive ownership.
S2. Memory leak fixed by SafeMD only. For these test cases, we find that the leaked object is transferred to another procedure. The code snippet in Listing 7 shows the leak pattern that can be fixed only by SafeMD.
Listing 7. Memory leak pattern fixed by SafeMD only. |
|
In the above code snippet that satisfies exclusive ownership, MemFix will perform interprocedural analysis and insert free(q) and free(p) at line 18 and 19 to free memory objects, but it may introduce DF for object (at line 19) when the else branch is true in the f function. In this case, to safely fix memory leaks, the correct patch if(flag == 1) free(q); is required at line 18. But, MemFix fails to fix because it is unable to generate patches with conditional statements. However, SafeMD can safely fix these memory leaks without combining conditional expressions, since it tracks the owner of the leaked object and frees the leaked object from its owner. Specifically, SafeMD analyzes function f and g individually, and generates free(m) and free(q) at line 5 and 18, respectively. The function call at line 17 transfers the ownership of from p to m; therefore, SafeMD concludes that f is responsible for freeing the object if the ownership of is not returned by f; that is, SafeMD will generate free(m) in the true branch of f since this execution path does not return the ownership of to g. Because this code satisfies exclusive ownership, where p is no longer used after line 17, the patch free(m) is safe. On the contrary, in f, the execution path where the false branch is taken returns the ownership of to g; therefore, SafeMD concludes that g is responsible for freeing the object and thus generates free(q) at line 18. Compared with SafeMD, MemFix tries to free in the g function, but it fails since it cannot generate conditional patches.
S3. Memory leak fixed by Memfix only. These are cases that cannot be fixed by SafeMD primarily due to the ownership limitations of SafeMD, which means that the strictness of ownership enforced in C makes it unsafe for SafeMD to free objects in some situations.
One situation that can be fixed by Memfix but not by SafeMD is when ownership constraints are enforced on operations related to arrays and linked lists. Consider the Listing 8’s code pattern found in Git. Line 3 creates an object . This code satisfies the ownership constraints since the old owner is no longer used after assignment ptr2 = msg2. SafeMD will generate free(ptr2) at line 10, leading to an invalid free because does not point to the start address of . However, Memfix does not enforce that the patches satisfy the ownership constraints, so it will generate free(msg2) at line 10, which is a safe patch. One way of addressing this problem for SafeMD is to modify this code (which is beyond the scope of this paper); for example, we can modify the assignment ptr2 = msg2 to ptr2 = &msg2, where the ownership is not transferred to ; thus, SafeMD can generate the safe patch free(msg2) at line 10.
Listing 8. Memory leak pattern fixed by Memfix only. |
|
S4. Memory leak not fixed by both SafeMD and Memfix. For these test cases, they are related to reallocated memory and function pointers, and SafeMD and Memfix cannot deal with these features. For example, the test cases in Git often store memory objects using reallocation, which makes the portion of successful fixing relatively low.
To sum up, our comparison demonstrates that SafeMD can fix more memory-leak patterns than Memfix (see S2) when the input C programs satisfy exclusive ownership. Also, the patches generated by SafeMD make the input C programs still satisfy exclusive ownership (see S1). However, because of the exclusive ownership satisfied by C programs, SafeMD can lead to invalid frees on the array (see S3).