To appear in proceedings of IFL 2011, Springer LNCS, Volume 7267
Property-Based Testing and Verification:
a Catalog of Classroom Examples
Rex Page⋆
University of Oklahoma, Norman OK 73019
Abstract. For the past a decade, John Hughes of Chalmers University
and Quviq has pioneered property-based testing in debugging industrial
software. Three undergraduate courses at the University of Oklahoma
have paralleled this work, but on the pedagogical side and with the additional goal of verification of properties by mathematical proof, using
both paper-and-pencil methods and mechanized logic. An essential part
of both efforts, and probably the most important part, is the formalization in predicate logic of the expectations of software developers. This is
central to engineering software quality in the defect dimension. This report discusses property-based testing in course work and provides some
examples from the classroom. Hughes has observed that software properties useful for testing functional software often (1) compare the results
of different ways of computing the same thing or (2) check that forward
and inverse transformations restore the original data. What percentage
of useful software properties fall into these categories? Our collection of
examples from course records may help shed some light on how closely
pedagogy on property-based testing matches observations of relative frequencies of various categories of properties in industry and may also serve
as an educational aid in learning to use property-based testing effectively
in software development.
1
Property-Based Testing in the Classroom
Students in the computer science program at the University of Oklahoma are
required to take three courses in which expected software behaviors are expressed
as formulas in predicate logic. In one of the courses (applied logic), the focus
is on paper-and-pencil proofs that the properties hold. In the others (a twocourse sequence in software engineering), the focus is on software development,
including testing and verification. For the past three years, all three courses have
included project experience with automated testing on random data generated
by the DoubleCheck tool in the Dracula programming environment [1], a facility
similar to QuickCheck [2], but for software expressed in ACL2 [3], which includes
a programming language based on Common Lisp.
Used in its native form, the ACL2 mechanized logic requires properties to
be expressed as theorems, and programmers can sometimes succeed in getting
⋆
Partially supported by National Science Foundation Award Number 1016532.
2
Rex Page
ACL2 to verify those theorems by mathematical proof. In some cases ACL2 fully
automates the proof process. For more complex properties, ACL2 needs help in
the form of lemmas or hints about proof strategy.
The Dracula environment acts as an interface to the ACL2 programming language and logic, adding to it a testing environment with generators for random
data. On request, Dracula converts DoubleCheck properties to ACL2 theorems
and submits them to the mechanized logic for verification. Students can carry out
testing based on formal predicates about behaviors they expect in software they
are developing. When testing convinces them that their software works properly,
they have the option to proceed to full, mechanized verification of the theorems
expressed in those predicates. Property-based testing with random data becomes
part of the software development process, and full verification also plays a role.
For example consider the following DoubleCheck test for the associativity of
list concatenation.
(defproperty append-assoc :repeat 1000
(xs :value (random-list-of (random-symbol))
ys :value (random-list-of (random-symbol))
zs :value (random-list-of (random-symbol)))
(equal (append xs (append ys zs))
(append (append xs ys) zs)))
Dracula checks the property specified by the equation at the end of the definition against a thousand random choices for the lists involved in the equation.
(The :repeat specification controls the number of tests, which defaults to fifty
if unspecified.) Dracula selects both the lengths of the lists and the symbols
comprising their elements from independent, random distributions. Programmers can control those distributions through special directives or by writing
their own generators. The random data generators provided by DoubleCheck
are adequate for most of the testing students do in their course work, but some
software engineering course projects call for the design of generators.
When asked to prove that the associative property of concatenation holds for
all choices of the lists, Dracula converts the property definition to the following
theorem and submits it to the ACL2 mechanized logic.
(defthm append-assoc-thm
(equal (append xs (append ys zs))
(append (append xs ys) zs)))
The theorem is a copy of the predicate specified in the DoubleCheck property, without consideration of the kinds of data it specifies. However, the specified property can restrict the test in certain ways, and when it does this, the
restrictions are incorporated in the theorem as hypotheses in an implication. For
example, nthcdr is an intrinsic ACL2 operator that delivers a list like its second
argument, but without an initial segment of elements, the length of which is
specified by the first argument. Predicates about nthcdr, to be proven as ACL2
theorems, must restrict the first argument to the natural numbers. A property
Property-Based Testing: A Catalog of Classroom Examples
3
definition can specify such a restriction through a :where clause, as in the following definition.
(defproperty nthcdr-len
(n :value (random-integer)
:where (natp n)
xs :value (random-list-of (random-symbol)))
(<= (len (nthcdr n xs)) n))
When random test data is restricted by a :where clause, Dracula includes the
restriction as a hypothesis in an implication that has the specified property as
its conclusion. When there are multiple :where clauses, their logical conjunction
forms the hypothesis. If asked to call for a proof that the length property of
nthcdr holds for all data, Dracula would submit the following theorem to the
ACL2 mechanized logic.
(defthm nthcdr-len-thm
(implies (natp n)
(<= (len (nthcdr n xs)) n)))
Through :where restrictions and other annotations, DoubleCheck provides
full access to the mechanized logic of ACL2. Dracula also allows direct definition of ACL2 theorems, which are submitted to the mechanized logic without
modification, but students generally find it convenient to stay within the DoubleCheck property-definition syntax because they usually go through the testing
stage before attempting full verification of properties as theorems.
We find that as students proceed through courses using Dracula and DoubleCheck, they gradually evolve from the point of view that testing software
components is a matter of running a few tests on selected, specific cases to the
conclusion that, while running thousands of random tests based on formal predicates may not provide full guarantees, it goes much further in that direction
than the one-off tests they previously relied on. They also gain an appreciation
of the difference between testing a program and knowing for certain some of the
properties of a program.
We are now in the process of cataloging property-based testing examples from
lectures, homework projects, and examinations, expressing them in the common
framework provided by Dracula, and, in many cases, using the ACL2 mechanized
logic to carry out proofs of the properties. This report describes some of the
examples and categories, specifies a few in detail, presents some preliminary
data about categories of property-based tests appearing in the examples, and
comments on the distribution of those examples across the observed categories.
2
Property Categories and Examples
John Hughes has for years used property-based testing in industrial environments
[4] [5] [6] [7] [8]. He has found that, while industrial software testing usually calls
for checking output against post-conditions on models of state, tests of functional
4
Rex Page
programs most often fall into one of two categories: (1) comparing the results of
computing a value in two different ways (function equality tests)1 (2) checking
that an inverse function reproduces the original input when applied to the results
of the forward computation (round-trip tests) [9].
We have categorized so far about half of over 300 properties defined in logic
and software engineering courses over the past decade. About a quarter of these
properties are either not characterized (miscellaneous) or can be viewed as type
specifications, a category that would not arise for properties expressed in statically typed languages. About one in eight fall into a special kind of function
equality that we refer to as property preservation, such as preservation of length
by functions that rearrange lists into increasing order.
2.1
Function Equality Properties
The property expressing the associativity of append (Section 1) is an example
of a function equality test. Figure 1 samples typical function equality tests from
lectures in my software engineering courses over the past decade.
The first three tests in Figure 1 involve only intrinsic functions in ACL2:
(defproperty cdr-reduces-len-by-1
(xs :value (random-list-of (random-symbol))
:where (consp xs))
(= (len(cdr xs))
(- (len xs) 1)))
(defproperty cons-as-append
(x :value (random-symbol)
xs :value (random-list-of (random-symbol)))
(equal (append (list x) xs)
(cons x xs)))
(defproperty law-of-added-exponents
(m :value (random-integer)
:where (integerp m)
n :value (random-integer)
:where (integerp n)
x :value (random-integer)
:where (and (acl2-numberp x) (/= x 0)))
(= (expt x (+ m n)) (* (expt x m) (expt x n))))
Other tests express relationships among functions defined by the programmer.
1
Hughes uses the term “commutative diagram” to describe this type of test because in
his experience such a test usually involves an abstract data type with corresponding
transformations between concrete and abstract versions of the same data. One route
on the diagram transforms concrete data and then moves the result to the abstract
domain. The other route moves the data to the abstract domain, then transforms it.
Our term, “function equality”, connotes a somewhat broader category of tests.
Property-Based Testing: A Catalog of Classroom Examples
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
5
removing the first element of a non-empty list reduces its length by 1
inserting an element at the beginning of a list is a special case of concatenation
law of added exponents
conventional exponentiation and the Russian peasant algorithm
the nth element of the product of a scalar and a vector is the same as the product
of the scalar and the nth element of the vector
evaluating a polynomial conventionally and by Horner’s rule
the first element of a list is the last element of its reverse
various tail recursions match a corresponding nested recursion (list reversal or
Fibonacci numbers, for example)
the number of elements in a flattened tree is the number of nodes in the tree
the sum of the numbers in a list constructed from n copies of a number x is the
product of n and x
DeMorgan’s laws for lists of Booleans
various Boolean identities, such as the absorption laws
associative law of concatenation
demultiplexing a list splits it into its even-indexed and odd-indexed elements
multiplexing two lists puts the elements of one of them in even-indexed positions
in the result list and the elements of the other in the odd-indexed positions
the low-order bit in the binary representation of an even number is zero
inserting n zeros at the low-order end of a binary numeral produces the numeral
of a number 2n times larger
the length of a binary numeral with no leading zeros is the ceiling of the base-2
logarithm of the next integer after the one the numeral denotes
a ripple-carry binary adder delivers the sum of the input numerals
a shift-and-add binary multiplier delivers the product of the input numerals
Fig. 1. Function Equality Properties (43%): (f x) = (g x)
6
Rex Page
(defun s*v (s v) ; (s*v_1 s*v_2 s*v_3 ... )
(if (consp v)
(cons (* s (first v))
(s*v s (rest v)))
v))
(defproperty s*v-elements
(s :value (random-integer)
v :value (random-list-of (random-integer))
:where (consp v)
n :value (random-between 0 (1- (len v)))
:where (and (natp n) (< n (len v))))
(= (nth n (s*v s v)) (* s (nth n v))))
The ACL2 mechanized logic fully automates all but a few the proofs of all
of the function equality properties in Figure 1. However, some of the proofs (the
law of exponents for example) depend on theorems of numeric algebra imported
from a standard ACL2 library.
2.2
Round-Trip Properties
Many property-based tests involve first encoding some information, then decoding it with an inverse operator. A classic example of such a property applies
the list-reversal operator twice. Since reverse is its own inverse, two applications
in succession reproduce the original list. Figure 2 samples typical round-trip
properties from course work.
The following linear encryption and decryption functions set the stage for
an examination of the second example from the list of round-trip properties in
Figure 2. This particular encryption method simply adds adjacent numbers in
a list of integers in the range 0 through m − 1 using arithmetic modulo m. It
inserts a fixed number at the end of the list to encode the last element.
If the numbers were, for example, ASCII character codes, the encoding would
handle text messages. More complex linear encodings that transform adjacent
pairs with a two-by-two matrix of coefficients could be handled similarly.
(defun encrypt-pair (m x x-nxt)
(mod (+ x x-nxt) m))
(defun decrypt-pair (m x-encrypted y-decrypted)
(mod (- x-encrypted y-decrypted) m))
(defun encrypt (m xs) ; linear encryption
(if (consp (cdr xs))
(cons (encrypt-pair m (car xs) (cadr xs))
(encrypt m (cdr xs)))
(list (encrypt-pair m (car xs) (1- m)))))
(defun decrypt (m ys)
(if (consp (cdr ys))
Property-Based Testing: A Catalog of Classroom Examples
7
1. double reverse
2. use linear encryption to encrypt a message, then decrypt it
3. compute the quotient and remainder, given a divisor and dividend, then reconstruct
the dividend from the quotient, remainder, and divisor
4. convert a number to a binary numeral, then back to a number
5. convert a binary numeral with no leading zeros to a number, then back to a binary
numeral
6. convert a number to a binary numeral, insert some leading zeros, then convert back
to a number
7. a list is the concatenation of its first n elements and its remaining elements
8. demultiplex a list of signals into the even-numbered and odd-numbers ones, then
multiplex them to reproduce the original list
9. multiplex two lists that contain the same number of signals, then demultiplex to
reproduce the original lists
10. log2 (2n ) = n
11. pad a list to a specified length, then compute its length
12. chop a list to a specified length, then compute its length
13. concatenate two lists, then reproduce the second list by dropping the number of
elements in the first from the beginning of the concatenation
14. concatenate two lists, then reproduce the first list by extracting the number of
elements it contains from the beginning of the concatenation
15. split a list into contiguous sublists of a specified length, then append all the sublists
together
16. extract the packets in a list that are separated by a given delimiter, then reconstruct
the list from the packets
17. insert a key into a search tree, then extract the key
Fig. 2. Round-Trip Properties (17%): (f (g x)) = x
8
Rex Page
(let* ((decrypted-cdr (decrypt m (cdr ys))))
(cons (decrypt-pair m (car ys) (car decrypted-cdr))
decrypted-cdr))
(list (decrypt-pair m (car ys) (1- m)))))
With these definitions decrypt inverts encrypt. The following property implements a round-trip test of this fact.
(defun
(and
(defun
(and
codep (m x)
(natp x) (< x m)))
code-listp (m xs)
(natp m) (> m 1)
(if (consp xs)
(and (codep m (car xs)) (code-listp m (cdr xs)))
(null xs))))
(defproperty decrypt-inverts-encrypt
(m :value (+ (random-natural) 2)
n :value (random-natural)
xs :value (random-list-of (random-between 0 (- m 1)))
:where (and (natp m) (> m 1)
(consp xs) (code-listp m xs)))
(equal (decrypt m (encrypt m xs)) xs))
This example reinforces the importance of running many tests and the value
of verification by mathematical proof. If a bug is inserted in the test, allowing
the values in the message to range up to m + 1 instead of m − 1, DoubleCheck
runs almost 700 tests before uncovering the error. Serious testing usually requires
thousands of tests, and careful consideration of the distribution of random values
generated by the tests may also be necessary.
The ACL2 mechanized logic succeeds in fully automating the proof of the
inversion property for this linear encryption/decryption implementation when
given access to some algebraic properties of modular arithmetic that are supplied
in a standard library. Again, the :where clause states the hypothesis in the
implication being verified, and the equation at the end of the property definition
specifies the conclusion. An error in the value specification for random test data
does not affect the theorem, since Dracula uses only :where clauses and the
Boolean formula specifying the property to generate the theorem it submits to
the mechanized logic.
The mechanized logic fully automates the proofs of all the round-trip properties in Figure 2 except the last three. It needs a little help to work its way
through the properties of the list-splitting and packets functions and substantial
help to verify search tree properties.
2.3
Other Categories of Properties
Function equality, round-trip, and the properties we think of as preserving some
characteristic of the input are all characterized by equations: (f x) = (g x), (f
Property-Based Testing: A Catalog of Classroom Examples
9
(g x)) = x, and (p (f x)) = (p x), respectively. All three categories could be
viewed as versions of function equality, in a sense. We view them as distinct
categories because we think they provide useful guidelines to students looking
for properties that express testable expectations of the software they develop.
Two examples of preservation properties involve list concatenation. Concatenation preserves total length of the input lists and also preserves the elements
that occur in the input.
(defproperty append-preserves-len
(xs :value (random-list-of (random-symbol))
ys :value (random-list-of (random-symbol)))
(= (len (append xs ys)) (+ (len xs) (len ys))))
(defproperty append-conserves-elements
(x :value (random-symbol)
xs :value (random-list-of (random-symbol))
ys :value (random-list-of (random-symbol)))
(iff (member-equal x (append xs ys))
(or (member-equal x xs) (member-equal x ys)))
Sorting functions, reverse, various forms of list merge, whether order-preserving
or simple, two-way multiplexing (lock-step selection from one list or the other)
are other operations that preserve length and list elements. Figure 3 summarizes
the idea of preservation properties with such examples.
1. various “conservation of list-length” properties (for example, length of the list
delivered by a sorting function is the same as the length of the input list; similar statements about concatenation, reverse, multiplexing, demultiplexing, and
ordered merge)
2. various “conservation of values” properties (for example, a value is in the list
delivered by a sorting function if and only if it is in the input list; similar properties
for concatenation, reverse, multiplexing, demultiplexing, and ordered merge)
3. the cosine of the angle between two vectors is not affected by multiplying all the
elements of one of the vectors by the same scalar
Fig. 3. Preservation Properties (14%): (p (f x)) = (p x)
Since formulas in ACL2 are untyped, it is often useful to define properties to
test expectations about type. Figure 4 displays some properties of this kind.
Finally, some properties that we have found useful in the classroom do not
seem to fall into any of the categories we have described. Figure 5 describes some
of these miscellaneous properties.
10
1.
2.
3.
4.
5.
6.
7.
Rex Page
the length of a list is a natural number
the square of a rational number is non-negative
the sum (or product) of two positive numbers is positive
the inner product of vectors of rational numbers is a rational number
the maximum (or minimum) of a list of rational numbers is a rational number
a permutation generator delivers a list of lists
quicksort delivers an ordered list (same goes for insertion sort and merge-sort)
Fig. 4. Type Properties (14%)
1.
2.
3.
4.
5.
6.
the maximum (or minimum) of a list is an element of the list
no element of a list is greater than (less than) its maximum (minimum)
demultiplexing a list with more than one element produces two shorter lists
a suffix (or prefix) of a list cannot be longer than the list
removing a positive number of elements from a non-empty list makes it shorter
the square of the inner product of two vectors does not exceed the square of the
product of their norms
7. the cosine of the angle between two different vectors is strictly less than 1
Fig. 5. Miscellaneous Properties (12%)
3
Tips for Converting Properties to Theorems
Property-based tests are Boolean formulas with value specifications for random
data generation. Expressing software expectations as Boolean formulas focuses
attention on correctness issues, and that, by itself, has a positive affect on software reliability. A mechanized logic like ACL2 offers an opportunity to go beyond
testing to verification of properties by mathematical proof. Guiding the mechanized logic through this extra step often requires substantial effort, but there
are some things that can be done to facilitate the verification process and help
students succeed with mechanized proofs.
Many of our observations about facilitating verification are specific to the
ACL2 theorem prover. Consequently, they will be of greatest value to people
using the ACL2 system, especially within the Dracula environment. Other theorem provers will require other tricks. However, part of the following discussion
includes a conjecture about the potential usefulness of some of these ideas in
other systems of mechanized logic.
3.1
Dealing with Data Types
The ACL2 logic requires all functions to be total. That is, they must specify
values for all kinds of input. Value specifications in tests usually restrict themselves to data within the expected domain of operations. For example, suppose
add is a function designed to compute the sum of two binary numerals, and nat
is a function that converts binary numerals to numbers. If binary numerals are
Property-Based Testing: A Catalog of Classroom Examples
11
represented by lists of zeros and ones, the following function equality test would
make sense.
(defproperty add-computes-sum-of-binary-numerals
(xs :value (random-list-of (random-between 0 1))
ys :value (random-list-of (random-between 0 1)))
(= (nat (add xs ys))
(+ (nat xs) (nat ys))))
One then might want to prove the corresponding theorem.
(defthm add-computes-sum-thm
(= (nat (add xs ys))
(+ (nat xs) (nat ys))))
This might or might not be a theorem. It depends on details in the representation of binary numerals. If nat and add require their arguments to be lists of
zeros and ones, as they are in the value specifications of the add-computes-sum
property, then the theorem will need a hypothesis constraining the arguments
to lists of zeros and ones.
(defthm add-computes-sum-thm-with-type-constraints
(implies (and (bit-listp xs) (bit-listp ys))
(= (nat (add xs ys))
(+ (nat xs) (nat ys)))))
That seems reasonable, especially to people accustomed to statically typed
languages, but it means that every theorem whose proof cites this one (theorems about a a shift-and-add multiplier for example) will have to confirm the
hypotheses. That can lead to a tangle of complications as the system grows.
One way to reduce complexity, making it easier to build a theory, is to reduce the number of required hypotheses by designing functions that interpret
unexpected data in ways that are consistent with the desired theory.2 For example, the following definition of nat interprets lists of zeros and ones as binary
numerals. It assumes that the low-order bit comes first, then the 2s bit, 4s bit,
and so on up to the high-order bit.
(defun nat (xs) ; number from little-endian binary numeral
(if (consp xs)
(if (equal (car xs) 1)
(+ 1 (* 2 (nat (cdr xs))))
(* 2 (nat (cdr xs))))
0))
In addition to dealing with the expected data (lists of zeros and ones), this
definition interprets any ACL2 data structure as a binary numeral, and it does so
in a way that is consistent with its interpretation of the expected form of input.
2
I learned this trick from Ruben Gamboa. He says he learned it from Robert Boyer.
12
Rex Page
The trick this example employs is to denote one-bits by the number 1. Anything
other than the number 1 denotes a zero-bit. Furthermore, the arguments may
take any form, not just lists. This definition of nat interprets any atom in the
same way that it interprets the empty list (namely, as a numeral for the number
zero).
This typeless approach makes the theorem hold without hypotheses. When
Dracula submits the add-computes-sum property to ACL2 in the form of the
corresponding theorem (add-computes-sum-thm, above), the mechanized logic
proves it, given the definition of nat that interprets 1’s as one-bits and everything
else as zero-bits.
If we go on to define a function mul that computes the product of two binary
numerals by a shift-and-add algorithm, a proof that the multiplication function
works properly avoids a complicated nest of hypothesis verification. The proof
goes through, unfettered. It’s not so easy when the definitions place restrictions
on the form of the arguments.
(defthm mul-computes-product-of-binary-numerals
(= (nat (mul xs ys))
(* (nat xs) (nat ys))))
Many modern programming languages incorporate type consistency theorems in the compilation process. The mechanized logic of ACL2 can verify type
consistency along with more advanced properties, but the programmer must attend to all of these issues in the development process. The ACL2 logic requires
total functions to improve its success in finding proofs without always relying on
detailed assistance from the programmer ([3], page 89). Reasoning about types
in the development process is part of the price paid for the effectiveness of ACL2.
However, I suspect that reducing the network of hypotheses in a collection
of related properties would simplify verification, even in typeful systems. For
example, in a statically typed system, one might constrain numerals to be lists
of integers, but not to lists of zeros and ones. The integer zero could represent
a zero-bit, and any other integer could represent a one-bit. The type system
would implicitly verify the list-of-integers constraint automatically, so a theorem
prover could ignore that part and the chosen bit-representation semantics would
avoid unnecessary hypotheses. Of course, in this case, using lists of Booleans
to represent numerals might be a better choice. That would relegate the entire
network of hypotheses to the domain of type consistency and leave the theorem
prover free to operate without verifying additional constraints.
3.2
Verifying Termination
Since ACL2 requires all functions to be total, it must verify termination before
admitting a function to its logic. Its ability to do this is impressive, but making
use of certain idioms can simplify the process.
Count Down. For example, ACL2 is always able to admit functions whose
termination depends on an inductive parameter that is a natural number counting down to a particular value. It is often not successful if the counting goes the
Property-Based Testing: A Catalog of Classroom Examples
13
other direction. As a result, most ACL2 programmers define such functions in
ways that take advantage of the strengths of ACL2 in this area. For example the
following code defines two functions that deliver a list of integers between given
lower and upper limits. However, one of the functions (between-partial) fails
when its arguments are complex numbers or non-numeric values.
(defun between-partial (m n) ; (m m+1 ... n)
(if (<= m n)
(cons m (between-partial (+ m 1) n))
nil))
(defun between-append (m n xs) ; (append (m m+1 ... n) xs)
(if (posp n)
(between-append m (- n 1) (cons (+ m n) xs))
(cons m xs)))
(defun between (m n) ; (m m+1 ... n)
(between-append m (- n m) nil))
Even if the definition of between-partial is converted to a total function by
choosing the non-recursive formula when one or both of its arguments are not
integers, ACL2 fails to admit it because ACL2 is not good at finding a workable
induction measure when the count goes up instead of down. Rearranging the
definition so that the count goes down plays to the strengths of ACL2.
In addition, the function must be made total by dealing with non-numeric
values for the parameters. This example uses the predicate posp, which is true
for integers exceeding zero and false for all other inputs, including non-numeric
values, to accomplish the goal of defining a function that terminates, regardless
of what values are supplied as arguments.
Induction on Suffixes and Proof Guidance. Another pattern that facilitates proofs of termination arises when a list in an inductive reference is
shortened by applying the cdr function. When the shortening operation has,
instead, the form (nthcdr n xs), where n is a positive integer, ACL2 needs a
little help. The following theorem, which ACL2 succeeds in proving when an induction strategy is suggested, is often helpful when nthcdr is the operator that
makes the induction work.
(defthm nthcdr-shortens-lists
(implies (and (posp n) (consp xs))
(< (len (nthcdr n xs)) (len xs)))
:hints (("Goal" :induct (len xs))))
ACL2 applies this theorem to prove termination of the following function,
which constructs a list from every nth element of its second argument. In this
case ACL2 automatically uses the theorem about nthcdr in the termination
proof even without the :use hint in the definition. The hint is included here
primarily to illustrate its form. ACL2 occasionally needs a directive like this to
find its way to a proof.
(defun every-nth (n xs) ; (x_1 x_n+1 x_2n+1 ...)
14
Rex Page
(declare (xargs
:measure (len xs)
:hints (("Goal" :use (:instance nthcdr-shortens-lists)))))
(if (and (consp xs) (posp n))
(cons (car xs) (every-nth n (nthcdr n xs)))
xs))
To succeed, students need to follow a few guidelines in designing functions.
In addition to the guidelines already discussed, clean code helps a lot, but that is
always a worthwhile goal, so the fact that ACL2 encourages it is a plus. Nested
recursion simplifies reasoning more often then tail recursion, so that is another
useful heuristic. When tail recursion is needed for efficient computation, it is
sometimes helpful to write two versions of a function, one with nested recursion
and one with tail recursion, prove them to be equal in the extensional sense,
then use one for computation and the other for reasoning.
When the ACL2 logic needs guidance, hints suggesting an induction strategy
and/or theorems that might be useful in the proof are often sufficient. The ACL2
system includes directives for dozens of other kinds of guidance, but students
can go a long way with nothing beyond these two.
When numbers are involved, importing prepackaged bundles of theorems
about numeric algebra often gets the theorem prover around road blocks. ACL2
refers to such bundles as “books”. The “arithmetic-3/top” book provides many
theorems about the associative, commutative, and distributive laws of addition and multiplication, linear inequalities, and the like, and “arithmetic-3/floormod/floor-mod” provides theorems useful when reasoning about formulas that
employ modular arithmetic.
Surprisingly little specialized knowledge is needed to use ACL2 productively.
One reason for choosing ACL2 in a course where both property-based testing
and full verification play a role, but are not the only topics of study, is that ACL2
so often succeeds in verifying properties with little or no help. That makes it
possible to introduce verification without spending the entire course bringing
students up to speed on guiding the mechanized logic.
4
Summary and Future Work
Properties for testing that correspond to neither function equality nor round-trip
tests may express expectations that bear on correctness or consistency with the
environments in which the software components are used.
We have categorized 146 of over 300 properties defined in courses over the
past decade. Sixty-three fell into the function equality category, 25 were roundtrip properties, 21 preservation properties, 20 type properties, and 17 miscellaneous properties. The Hughes categories (function equality and round-trip)
comprise 60% of the classroom examples in our current database. This expands
to 74% when preservation properties are recognized as a special case of function
equality properties and not placed in a separate category. In this view the promi-
Property-Based Testing: A Catalog of Classroom Examples
15
nence of function equality and round-trip properties in software development is
to some extent confirmed by our experience in the classroom.
A catalog of course projects that has many descriptions of software properties
can be found in the “20 Projects” web site [10]. We have an ongoing effort to
extend this to a database of properties from course work, including property
definitions from lectures, projects, and examinations. The database will include
function definitions and formal property statements in Dracula/ACL2 logic. We
anticipate over 300 property specifications and over 200 function definitions,
indexed by category and web accessible.
We also plan to apply the ACL2 mechanized logic to these properties and to
develop the lemmas and hints required for mechanized proof of the properties.
We hope that this database will be useful in future educational efforts and will
help students learn to formulate useful properties.
Acknowledgments. The author thanks Marco T. Morazán for suggesting a
section on tips for full verification of prpperties and John Hughes for helpful
discussions of property categories. Caleb Eggensperger and Allen Smith provided
valuable assistance in developing and categorizing example properties.
References
1. Eastlund, C.: Doublecheck your theorems. In: Proceedings of the 8th International
Workshop on the ACL2 Theorem Prover and its Applications. (2009) 42–46
2. Classen, K., Hughes, J.: QuickCheck: A lightweight tool for random testing of
Haskell programs. In: Proceedings of the 5th ACM SIGPLAN International Conference on Functional Programming. (2000) 268–279
3. Kaufmann, M., Manolios, P., Moore, J.S.: Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers (2000)
4. Arts, T., Hughes, J., Johansson, J., Wiger, U.: Testing telecoms software with QuviQ
QuickCheck. In: Proceedings of the 2006 ACM SIGPLAN Workshop on Erlang.
(2006) 2–10
5. Derrick, J., Walkinshaw, N., Arts, T., Earle, C.B., Cesarini, F., Fredlund, L., Guilas,
V., Hughes, J., Thompson, S.: Property-based testing—the ProTest project. In:
Lecture Notes in Computer Science 6286: Formal Methods for Components and
Objects, Springer (2009) 250–271
6. Claessen, K., Palka, M., Smallbone, N., Hughes, J.: Finding race conditions in Erlang with QuickCheck and PULSE. In: Proceedings of the 14th ACM SIGPLAN
International Conference on Functional Programming. (2009) 149–160
7. Claessen, K., Smallbone, N., Hughes, J.: QuickSpec: Guessing formal specifications
using testing. In: Proceedings of the 4th International Conference on Tests and
Proofs. (2010) 6–21
8. Hughes, J., Norell, U., Sautret, J.: Using temporal relations to specify and test an
instant messaging server. In: Proceedings of the 32nd International Conference on
Software Engineering. (2010) 95–102
9. Hughes, J., Bolinder, H.: Testing a database for race conditions with QuickCheck.
In: Proceedings of the 10th ACM SIGPLAN Erlang Workshop. (2011) 72–77
10. Page, R.: 20 projects: http://www.cs.ou.edu/~rlpage/SEcollab/20projects/
(2008)