Property-Based Testing and Verification: A Catalog of Classroom Examples

2012, Lecture Notes in Computer Science

To appear in proceedings of IFL 2011, Springer LNCS, Volume 7267 Property-Based Testing and Verification: a Catalog of Classroom Examples Rex Page⋆ University of Oklahoma, Norman OK 73019 Abstract. For the past a decade, John Hughes of Chalmers University and Quviq has pioneered property-based testing in debugging industrial software. Three undergraduate courses at the University of Oklahoma have paralleled this work, but on the pedagogical side and with the additional goal of verification of properties by mathematical proof, using both paper-and-pencil methods and mechanized logic. An essential part of both efforts, and probably the most important part, is the formalization in predicate logic of the expectations of software developers. This is central to engineering software quality in the defect dimension. This report discusses property-based testing in course work and provides some examples from the classroom. Hughes has observed that software properties useful for testing functional software often (1) compare the results of different ways of computing the same thing or (2) check that forward and inverse transformations restore the original data. What percentage of useful software properties fall into these categories? Our collection of examples from course records may help shed some light on how closely pedagogy on property-based testing matches observations of relative frequencies of various categories of properties in industry and may also serve as an educational aid in learning to use property-based testing effectively in software development. 1 Property-Based Testing in the Classroom Students in the computer science program at the University of Oklahoma are required to take three courses in which expected software behaviors are expressed as formulas in predicate logic. In one of the courses (applied logic), the focus is on paper-and-pencil proofs that the properties hold. In the others (a twocourse sequence in software engineering), the focus is on software development, including testing and verification. For the past three years, all three courses have included project experience with automated testing on random data generated by the DoubleCheck tool in the Dracula programming environment [1], a facility similar to QuickCheck [2], but for software expressed in ACL2 [3], which includes a programming language based on Common Lisp. Used in its native form, the ACL2 mechanized logic requires properties to be expressed as theorems, and programmers can sometimes succeed in getting ⋆ Partially supported by National Science Foundation Award Number 1016532. 2 Rex Page ACL2 to verify those theorems by mathematical proof. In some cases ACL2 fully automates the proof process. For more complex properties, ACL2 needs help in the form of lemmas or hints about proof strategy. The Dracula environment acts as an interface to the ACL2 programming language and logic, adding to it a testing environment with generators for random data. On request, Dracula converts DoubleCheck properties to ACL2 theorems and submits them to the mechanized logic for verification. Students can carry out testing based on formal predicates about behaviors they expect in software they are developing. When testing convinces them that their software works properly, they have the option to proceed to full, mechanized verification of the theorems expressed in those predicates. Property-based testing with random data becomes part of the software development process, and full verification also plays a role. For example consider the following DoubleCheck test for the associativity of list concatenation. (defproperty append-assoc :repeat 1000 (xs :value (random-list-of (random-symbol)) ys :value (random-list-of (random-symbol)) zs :value (random-list-of (random-symbol))) (equal (append xs (append ys zs)) (append (append xs ys) zs))) Dracula checks the property specified by the equation at the end of the definition against a thousand random choices for the lists involved in the equation. (The :repeat specification controls the number of tests, which defaults to fifty if unspecified.) Dracula selects both the lengths of the lists and the symbols comprising their elements from independent, random distributions. Programmers can control those distributions through special directives or by writing their own generators. The random data generators provided by DoubleCheck are adequate for most of the testing students do in their course work, but some software engineering course projects call for the design of generators. When asked to prove that the associative property of concatenation holds for all choices of the lists, Dracula converts the property definition to the following theorem and submits it to the ACL2 mechanized logic. (defthm append-assoc-thm (equal (append xs (append ys zs)) (append (append xs ys) zs))) The theorem is a copy of the predicate specified in the DoubleCheck property, without consideration of the kinds of data it specifies. However, the specified property can restrict the test in certain ways, and when it does this, the restrictions are incorporated in the theorem as hypotheses in an implication. For example, nthcdr is an intrinsic ACL2 operator that delivers a list like its second argument, but without an initial segment of elements, the length of which is specified by the first argument. Predicates about nthcdr, to be proven as ACL2 theorems, must restrict the first argument to the natural numbers. A property Property-Based Testing: A Catalog of Classroom Examples 3 definition can specify such a restriction through a :where clause, as in the following definition. (defproperty nthcdr-len (n :value (random-integer) :where (natp n) xs :value (random-list-of (random-symbol))) (<= (len (nthcdr n xs)) n)) When random test data is restricted by a :where clause, Dracula includes the restriction as a hypothesis in an implication that has the specified property as its conclusion. When there are multiple :where clauses, their logical conjunction forms the hypothesis. If asked to call for a proof that the length property of nthcdr holds for all data, Dracula would submit the following theorem to the ACL2 mechanized logic. (defthm nthcdr-len-thm (implies (natp n) (<= (len (nthcdr n xs)) n))) Through :where restrictions and other annotations, DoubleCheck provides full access to the mechanized logic of ACL2. Dracula also allows direct definition of ACL2 theorems, which are submitted to the mechanized logic without modification, but students generally find it convenient to stay within the DoubleCheck property-definition syntax because they usually go through the testing stage before attempting full verification of properties as theorems. We find that as students proceed through courses using Dracula and DoubleCheck, they gradually evolve from the point of view that testing software components is a matter of running a few tests on selected, specific cases to the conclusion that, while running thousands of random tests based on formal predicates may not provide full guarantees, it goes much further in that direction than the one-off tests they previously relied on. They also gain an appreciation of the difference between testing a program and knowing for certain some of the properties of a program. We are now in the process of cataloging property-based testing examples from lectures, homework projects, and examinations, expressing them in the common framework provided by Dracula, and, in many cases, using the ACL2 mechanized logic to carry out proofs of the properties. This report describes some of the examples and categories, specifies a few in detail, presents some preliminary data about categories of property-based tests appearing in the examples, and comments on the distribution of those examples across the observed categories. 2 Property Categories and Examples John Hughes has for years used property-based testing in industrial environments [4] [5] [6] [7] [8]. He has found that, while industrial software testing usually calls for checking output against post-conditions on models of state, tests of functional 4 Rex Page programs most often fall into one of two categories: (1) comparing the results of computing a value in two different ways (function equality tests)1 (2) checking that an inverse function reproduces the original input when applied to the results of the forward computation (round-trip tests) [9]. We have categorized so far about half of over 300 properties defined in logic and software engineering courses over the past decade. About a quarter of these properties are either not characterized (miscellaneous) or can be viewed as type specifications, a category that would not arise for properties expressed in statically typed languages. About one in eight fall into a special kind of function equality that we refer to as property preservation, such as preservation of length by functions that rearrange lists into increasing order. 2.1 Function Equality Properties The property expressing the associativity of append (Section 1) is an example of a function equality test. Figure 1 samples typical function equality tests from lectures in my software engineering courses over the past decade. The first three tests in Figure 1 involve only intrinsic functions in ACL2: (defproperty cdr-reduces-len-by-1 (xs :value (random-list-of (random-symbol)) :where (consp xs)) (= (len(cdr xs)) (- (len xs) 1))) (defproperty cons-as-append (x :value (random-symbol) xs :value (random-list-of (random-symbol))) (equal (append (list x) xs) (cons x xs))) (defproperty law-of-added-exponents (m :value (random-integer) :where (integerp m) n :value (random-integer) :where (integerp n) x :value (random-integer) :where (and (acl2-numberp x) (/= x 0))) (= (expt x (+ m n)) (* (expt x m) (expt x n)))) Other tests express relationships among functions defined by the programmer. 1 Hughes uses the term “commutative diagram” to describe this type of test because in his experience such a test usually involves an abstract data type with corresponding transformations between concrete and abstract versions of the same data. One route on the diagram transforms concrete data and then moves the result to the abstract domain. The other route moves the data to the abstract domain, then transforms it. Our term, “function equality”, connotes a somewhat broader category of tests. Property-Based Testing: A Catalog of Classroom Examples 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 5 removing the first element of a non-empty list reduces its length by 1 inserting an element at the beginning of a list is a special case of concatenation law of added exponents conventional exponentiation and the Russian peasant algorithm the nth element of the product of a scalar and a vector is the same as the product of the scalar and the nth element of the vector evaluating a polynomial conventionally and by Horner’s rule the first element of a list is the last element of its reverse various tail recursions match a corresponding nested recursion (list reversal or Fibonacci numbers, for example) the number of elements in a flattened tree is the number of nodes in the tree the sum of the numbers in a list constructed from n copies of a number x is the product of n and x DeMorgan’s laws for lists of Booleans various Boolean identities, such as the absorption laws associative law of concatenation demultiplexing a list splits it into its even-indexed and odd-indexed elements multiplexing two lists puts the elements of one of them in even-indexed positions in the result list and the elements of the other in the odd-indexed positions the low-order bit in the binary representation of an even number is zero inserting n zeros at the low-order end of a binary numeral produces the numeral of a number 2n times larger the length of a binary numeral with no leading zeros is the ceiling of the base-2 logarithm of the next integer after the one the numeral denotes a ripple-carry binary adder delivers the sum of the input numerals a shift-and-add binary multiplier delivers the product of the input numerals Fig. 1. Function Equality Properties (43%): (f x) = (g x) 6 Rex Page (defun s*v (s v) ; (s*v_1 s*v_2 s*v_3 ... ) (if (consp v) (cons (* s (first v)) (s*v s (rest v))) v)) (defproperty s*v-elements (s :value (random-integer) v :value (random-list-of (random-integer)) :where (consp v) n :value (random-between 0 (1- (len v))) :where (and (natp n) (< n (len v)))) (= (nth n (s*v s v)) (* s (nth n v)))) The ACL2 mechanized logic fully automates all but a few the proofs of all of the function equality properties in Figure 1. However, some of the proofs (the law of exponents for example) depend on theorems of numeric algebra imported from a standard ACL2 library. 2.2 Round-Trip Properties Many property-based tests involve first encoding some information, then decoding it with an inverse operator. A classic example of such a property applies the list-reversal operator twice. Since reverse is its own inverse, two applications in succession reproduce the original list. Figure 2 samples typical round-trip properties from course work. The following linear encryption and decryption functions set the stage for an examination of the second example from the list of round-trip properties in Figure 2. This particular encryption method simply adds adjacent numbers in a list of integers in the range 0 through m − 1 using arithmetic modulo m. It inserts a fixed number at the end of the list to encode the last element. If the numbers were, for example, ASCII character codes, the encoding would handle text messages. More complex linear encodings that transform adjacent pairs with a two-by-two matrix of coefficients could be handled similarly. (defun encrypt-pair (m x x-nxt) (mod (+ x x-nxt) m)) (defun decrypt-pair (m x-encrypted y-decrypted) (mod (- x-encrypted y-decrypted) m)) (defun encrypt (m xs) ; linear encryption (if (consp (cdr xs)) (cons (encrypt-pair m (car xs) (cadr xs)) (encrypt m (cdr xs))) (list (encrypt-pair m (car xs) (1- m))))) (defun decrypt (m ys) (if (consp (cdr ys)) Property-Based Testing: A Catalog of Classroom Examples 7 1. double reverse 2. use linear encryption to encrypt a message, then decrypt it 3. compute the quotient and remainder, given a divisor and dividend, then reconstruct the dividend from the quotient, remainder, and divisor 4. convert a number to a binary numeral, then back to a number 5. convert a binary numeral with no leading zeros to a number, then back to a binary numeral 6. convert a number to a binary numeral, insert some leading zeros, then convert back to a number 7. a list is the concatenation of its first n elements and its remaining elements 8. demultiplex a list of signals into the even-numbered and odd-numbers ones, then multiplex them to reproduce the original list 9. multiplex two lists that contain the same number of signals, then demultiplex to reproduce the original lists 10. log2 (2n ) = n 11. pad a list to a specified length, then compute its length 12. chop a list to a specified length, then compute its length 13. concatenate two lists, then reproduce the second list by dropping the number of elements in the first from the beginning of the concatenation 14. concatenate two lists, then reproduce the first list by extracting the number of elements it contains from the beginning of the concatenation 15. split a list into contiguous sublists of a specified length, then append all the sublists together 16. extract the packets in a list that are separated by a given delimiter, then reconstruct the list from the packets 17. insert a key into a search tree, then extract the key Fig. 2. Round-Trip Properties (17%): (f (g x)) = x 8 Rex Page (let* ((decrypted-cdr (decrypt m (cdr ys)))) (cons (decrypt-pair m (car ys) (car decrypted-cdr)) decrypted-cdr)) (list (decrypt-pair m (car ys) (1- m))))) With these definitions decrypt inverts encrypt. The following property implements a round-trip test of this fact. (defun (and (defun (and codep (m x) (natp x) (< x m))) code-listp (m xs) (natp m) (> m 1) (if (consp xs) (and (codep m (car xs)) (code-listp m (cdr xs))) (null xs)))) (defproperty decrypt-inverts-encrypt (m :value (+ (random-natural) 2) n :value (random-natural) xs :value (random-list-of (random-between 0 (- m 1))) :where (and (natp m) (> m 1) (consp xs) (code-listp m xs))) (equal (decrypt m (encrypt m xs)) xs)) This example reinforces the importance of running many tests and the value of verification by mathematical proof. If a bug is inserted in the test, allowing the values in the message to range up to m + 1 instead of m − 1, DoubleCheck runs almost 700 tests before uncovering the error. Serious testing usually requires thousands of tests, and careful consideration of the distribution of random values generated by the tests may also be necessary. The ACL2 mechanized logic succeeds in fully automating the proof of the inversion property for this linear encryption/decryption implementation when given access to some algebraic properties of modular arithmetic that are supplied in a standard library. Again, the :where clause states the hypothesis in the implication being verified, and the equation at the end of the property definition specifies the conclusion. An error in the value specification for random test data does not affect the theorem, since Dracula uses only :where clauses and the Boolean formula specifying the property to generate the theorem it submits to the mechanized logic. The mechanized logic fully automates the proofs of all the round-trip properties in Figure 2 except the last three. It needs a little help to work its way through the properties of the list-splitting and packets functions and substantial help to verify search tree properties. 2.3 Other Categories of Properties Function equality, round-trip, and the properties we think of as preserving some characteristic of the input are all characterized by equations: (f x) = (g x), (f Property-Based Testing: A Catalog of Classroom Examples 9 (g x)) = x, and (p (f x)) = (p x), respectively. All three categories could be viewed as versions of function equality, in a sense. We view them as distinct categories because we think they provide useful guidelines to students looking for properties that express testable expectations of the software they develop. Two examples of preservation properties involve list concatenation. Concatenation preserves total length of the input lists and also preserves the elements that occur in the input. (defproperty append-preserves-len (xs :value (random-list-of (random-symbol)) ys :value (random-list-of (random-symbol))) (= (len (append xs ys)) (+ (len xs) (len ys)))) (defproperty append-conserves-elements (x :value (random-symbol) xs :value (random-list-of (random-symbol)) ys :value (random-list-of (random-symbol))) (iff (member-equal x (append xs ys)) (or (member-equal x xs) (member-equal x ys))) Sorting functions, reverse, various forms of list merge, whether order-preserving or simple, two-way multiplexing (lock-step selection from one list or the other) are other operations that preserve length and list elements. Figure 3 summarizes the idea of preservation properties with such examples. 1. various “conservation of list-length” properties (for example, length of the list delivered by a sorting function is the same as the length of the input list; similar statements about concatenation, reverse, multiplexing, demultiplexing, and ordered merge) 2. various “conservation of values” properties (for example, a value is in the list delivered by a sorting function if and only if it is in the input list; similar properties for concatenation, reverse, multiplexing, demultiplexing, and ordered merge) 3. the cosine of the angle between two vectors is not affected by multiplying all the elements of one of the vectors by the same scalar Fig. 3. Preservation Properties (14%): (p (f x)) = (p x) Since formulas in ACL2 are untyped, it is often useful to define properties to test expectations about type. Figure 4 displays some properties of this kind. Finally, some properties that we have found useful in the classroom do not seem to fall into any of the categories we have described. Figure 5 describes some of these miscellaneous properties. 10 1. 2. 3. 4. 5. 6. 7. Rex Page the length of a list is a natural number the square of a rational number is non-negative the sum (or product) of two positive numbers is positive the inner product of vectors of rational numbers is a rational number the maximum (or minimum) of a list of rational numbers is a rational number a permutation generator delivers a list of lists quicksort delivers an ordered list (same goes for insertion sort and merge-sort) Fig. 4. Type Properties (14%) 1. 2. 3. 4. 5. 6. the maximum (or minimum) of a list is an element of the list no element of a list is greater than (less than) its maximum (minimum) demultiplexing a list with more than one element produces two shorter lists a suffix (or prefix) of a list cannot be longer than the list removing a positive number of elements from a non-empty list makes it shorter the square of the inner product of two vectors does not exceed the square of the product of their norms 7. the cosine of the angle between two different vectors is strictly less than 1 Fig. 5. Miscellaneous Properties (12%) 3 Tips for Converting Properties to Theorems Property-based tests are Boolean formulas with value specifications for random data generation. Expressing software expectations as Boolean formulas focuses attention on correctness issues, and that, by itself, has a positive affect on software reliability. A mechanized logic like ACL2 offers an opportunity to go beyond testing to verification of properties by mathematical proof. Guiding the mechanized logic through this extra step often requires substantial effort, but there are some things that can be done to facilitate the verification process and help students succeed with mechanized proofs. Many of our observations about facilitating verification are specific to the ACL2 theorem prover. Consequently, they will be of greatest value to people using the ACL2 system, especially within the Dracula environment. Other theorem provers will require other tricks. However, part of the following discussion includes a conjecture about the potential usefulness of some of these ideas in other systems of mechanized logic. 3.1 Dealing with Data Types The ACL2 logic requires all functions to be total. That is, they must specify values for all kinds of input. Value specifications in tests usually restrict themselves to data within the expected domain of operations. For example, suppose add is a function designed to compute the sum of two binary numerals, and nat is a function that converts binary numerals to numbers. If binary numerals are Property-Based Testing: A Catalog of Classroom Examples 11 represented by lists of zeros and ones, the following function equality test would make sense. (defproperty add-computes-sum-of-binary-numerals (xs :value (random-list-of (random-between 0 1)) ys :value (random-list-of (random-between 0 1))) (= (nat (add xs ys)) (+ (nat xs) (nat ys)))) One then might want to prove the corresponding theorem. (defthm add-computes-sum-thm (= (nat (add xs ys)) (+ (nat xs) (nat ys)))) This might or might not be a theorem. It depends on details in the representation of binary numerals. If nat and add require their arguments to be lists of zeros and ones, as they are in the value specifications of the add-computes-sum property, then the theorem will need a hypothesis constraining the arguments to lists of zeros and ones. (defthm add-computes-sum-thm-with-type-constraints (implies (and (bit-listp xs) (bit-listp ys)) (= (nat (add xs ys)) (+ (nat xs) (nat ys))))) That seems reasonable, especially to people accustomed to statically typed languages, but it means that every theorem whose proof cites this one (theorems about a a shift-and-add multiplier for example) will have to confirm the hypotheses. That can lead to a tangle of complications as the system grows. One way to reduce complexity, making it easier to build a theory, is to reduce the number of required hypotheses by designing functions that interpret unexpected data in ways that are consistent with the desired theory.2 For example, the following definition of nat interprets lists of zeros and ones as binary numerals. It assumes that the low-order bit comes first, then the 2s bit, 4s bit, and so on up to the high-order bit. (defun nat (xs) ; number from little-endian binary numeral (if (consp xs) (if (equal (car xs) 1) (+ 1 (* 2 (nat (cdr xs)))) (* 2 (nat (cdr xs)))) 0)) In addition to dealing with the expected data (lists of zeros and ones), this definition interprets any ACL2 data structure as a binary numeral, and it does so in a way that is consistent with its interpretation of the expected form of input. 2 I learned this trick from Ruben Gamboa. He says he learned it from Robert Boyer. 12 Rex Page The trick this example employs is to denote one-bits by the number 1. Anything other than the number 1 denotes a zero-bit. Furthermore, the arguments may take any form, not just lists. This definition of nat interprets any atom in the same way that it interprets the empty list (namely, as a numeral for the number zero). This typeless approach makes the theorem hold without hypotheses. When Dracula submits the add-computes-sum property to ACL2 in the form of the corresponding theorem (add-computes-sum-thm, above), the mechanized logic proves it, given the definition of nat that interprets 1’s as one-bits and everything else as zero-bits. If we go on to define a function mul that computes the product of two binary numerals by a shift-and-add algorithm, a proof that the multiplication function works properly avoids a complicated nest of hypothesis verification. The proof goes through, unfettered. It’s not so easy when the definitions place restrictions on the form of the arguments. (defthm mul-computes-product-of-binary-numerals (= (nat (mul xs ys)) (* (nat xs) (nat ys)))) Many modern programming languages incorporate type consistency theorems in the compilation process. The mechanized logic of ACL2 can verify type consistency along with more advanced properties, but the programmer must attend to all of these issues in the development process. The ACL2 logic requires total functions to improve its success in finding proofs without always relying on detailed assistance from the programmer ([3], page 89). Reasoning about types in the development process is part of the price paid for the effectiveness of ACL2. However, I suspect that reducing the network of hypotheses in a collection of related properties would simplify verification, even in typeful systems. For example, in a statically typed system, one might constrain numerals to be lists of integers, but not to lists of zeros and ones. The integer zero could represent a zero-bit, and any other integer could represent a one-bit. The type system would implicitly verify the list-of-integers constraint automatically, so a theorem prover could ignore that part and the chosen bit-representation semantics would avoid unnecessary hypotheses. Of course, in this case, using lists of Booleans to represent numerals might be a better choice. That would relegate the entire network of hypotheses to the domain of type consistency and leave the theorem prover free to operate without verifying additional constraints. 3.2 Verifying Termination Since ACL2 requires all functions to be total, it must verify termination before admitting a function to its logic. Its ability to do this is impressive, but making use of certain idioms can simplify the process. Count Down. For example, ACL2 is always able to admit functions whose termination depends on an inductive parameter that is a natural number counting down to a particular value. It is often not successful if the counting goes the Property-Based Testing: A Catalog of Classroom Examples 13 other direction. As a result, most ACL2 programmers define such functions in ways that take advantage of the strengths of ACL2 in this area. For example the following code defines two functions that deliver a list of integers between given lower and upper limits. However, one of the functions (between-partial) fails when its arguments are complex numbers or non-numeric values. (defun between-partial (m n) ; (m m+1 ... n) (if (<= m n) (cons m (between-partial (+ m 1) n)) nil)) (defun between-append (m n xs) ; (append (m m+1 ... n) xs) (if (posp n) (between-append m (- n 1) (cons (+ m n) xs)) (cons m xs))) (defun between (m n) ; (m m+1 ... n) (between-append m (- n m) nil)) Even if the definition of between-partial is converted to a total function by choosing the non-recursive formula when one or both of its arguments are not integers, ACL2 fails to admit it because ACL2 is not good at finding a workable induction measure when the count goes up instead of down. Rearranging the definition so that the count goes down plays to the strengths of ACL2. In addition, the function must be made total by dealing with non-numeric values for the parameters. This example uses the predicate posp, which is true for integers exceeding zero and false for all other inputs, including non-numeric values, to accomplish the goal of defining a function that terminates, regardless of what values are supplied as arguments. Induction on Suffixes and Proof Guidance. Another pattern that facilitates proofs of termination arises when a list in an inductive reference is shortened by applying the cdr function. When the shortening operation has, instead, the form (nthcdr n xs), where n is a positive integer, ACL2 needs a little help. The following theorem, which ACL2 succeeds in proving when an induction strategy is suggested, is often helpful when nthcdr is the operator that makes the induction work. (defthm nthcdr-shortens-lists (implies (and (posp n) (consp xs)) (< (len (nthcdr n xs)) (len xs))) :hints (("Goal" :induct (len xs)))) ACL2 applies this theorem to prove termination of the following function, which constructs a list from every nth element of its second argument. In this case ACL2 automatically uses the theorem about nthcdr in the termination proof even without the :use hint in the definition. The hint is included here primarily to illustrate its form. ACL2 occasionally needs a directive like this to find its way to a proof. (defun every-nth (n xs) ; (x_1 x_n+1 x_2n+1 ...) 14 Rex Page (declare (xargs :measure (len xs) :hints (("Goal" :use (:instance nthcdr-shortens-lists))))) (if (and (consp xs) (posp n)) (cons (car xs) (every-nth n (nthcdr n xs))) xs)) To succeed, students need to follow a few guidelines in designing functions. In addition to the guidelines already discussed, clean code helps a lot, but that is always a worthwhile goal, so the fact that ACL2 encourages it is a plus. Nested recursion simplifies reasoning more often then tail recursion, so that is another useful heuristic. When tail recursion is needed for efficient computation, it is sometimes helpful to write two versions of a function, one with nested recursion and one with tail recursion, prove them to be equal in the extensional sense, then use one for computation and the other for reasoning. When the ACL2 logic needs guidance, hints suggesting an induction strategy and/or theorems that might be useful in the proof are often sufficient. The ACL2 system includes directives for dozens of other kinds of guidance, but students can go a long way with nothing beyond these two. When numbers are involved, importing prepackaged bundles of theorems about numeric algebra often gets the theorem prover around road blocks. ACL2 refers to such bundles as “books”. The “arithmetic-3/top” book provides many theorems about the associative, commutative, and distributive laws of addition and multiplication, linear inequalities, and the like, and “arithmetic-3/floormod/floor-mod” provides theorems useful when reasoning about formulas that employ modular arithmetic. Surprisingly little specialized knowledge is needed to use ACL2 productively. One reason for choosing ACL2 in a course where both property-based testing and full verification play a role, but are not the only topics of study, is that ACL2 so often succeeds in verifying properties with little or no help. That makes it possible to introduce verification without spending the entire course bringing students up to speed on guiding the mechanized logic. 4 Summary and Future Work Properties for testing that correspond to neither function equality nor round-trip tests may express expectations that bear on correctness or consistency with the environments in which the software components are used. We have categorized 146 of over 300 properties defined in courses over the past decade. Sixty-three fell into the function equality category, 25 were roundtrip properties, 21 preservation properties, 20 type properties, and 17 miscellaneous properties. The Hughes categories (function equality and round-trip) comprise 60% of the classroom examples in our current database. This expands to 74% when preservation properties are recognized as a special case of function equality properties and not placed in a separate category. In this view the promi- Property-Based Testing: A Catalog of Classroom Examples 15 nence of function equality and round-trip properties in software development is to some extent confirmed by our experience in the classroom. A catalog of course projects that has many descriptions of software properties can be found in the “20 Projects” web site [10]. We have an ongoing effort to extend this to a database of properties from course work, including property definitions from lectures, projects, and examinations. The database will include function definitions and formal property statements in Dracula/ACL2 logic. We anticipate over 300 property specifications and over 200 function definitions, indexed by category and web accessible. We also plan to apply the ACL2 mechanized logic to these properties and to develop the lemmas and hints required for mechanized proof of the properties. We hope that this database will be useful in future educational efforts and will help students learn to formulate useful properties. Acknowledgments. The author thanks Marco T. Morazán for suggesting a section on tips for full verification of prpperties and John Hughes for helpful discussions of property categories. Caleb Eggensperger and Allen Smith provided valuable assistance in developing and categorizing example properties. References 1. Eastlund, C.: Doublecheck your theorems. In: Proceedings of the 8th International Workshop on the ACL2 Theorem Prover and its Applications. (2009) 42–46 2. Classen, K., Hughes, J.: QuickCheck: A lightweight tool for random testing of Haskell programs. In: Proceedings of the 5th ACM SIGPLAN International Conference on Functional Programming. (2000) 268–279 3. Kaufmann, M., Manolios, P., Moore, J.S.: Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers (2000) 4. Arts, T., Hughes, J., Johansson, J., Wiger, U.: Testing telecoms software with QuviQ QuickCheck. In: Proceedings of the 2006 ACM SIGPLAN Workshop on Erlang. (2006) 2–10 5. Derrick, J., Walkinshaw, N., Arts, T., Earle, C.B., Cesarini, F., Fredlund, L., Guilas, V., Hughes, J., Thompson, S.: Property-based testing—the ProTest project. In: Lecture Notes in Computer Science 6286: Formal Methods for Components and Objects, Springer (2009) 250–271 6. Claessen, K., Palka, M., Smallbone, N., Hughes, J.: Finding race conditions in Erlang with QuickCheck and PULSE. In: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming. (2009) 149–160 7. Claessen, K., Smallbone, N., Hughes, J.: QuickSpec: Guessing formal specifications using testing. In: Proceedings of the 4th International Conference on Tests and Proofs. (2010) 6–21 8. Hughes, J., Norell, U., Sautret, J.: Using temporal relations to specify and test an instant messaging server. In: Proceedings of the 32nd International Conference on Software Engineering. (2010) 95–102 9. Hughes, J., Bolinder, H.: Testing a database for race conditions with QuickCheck. In: Proceedings of the 10th ACM SIGPLAN Erlang Workshop. (2011) 72–77 10. Page, R.: 20 projects: http://www.cs.ou.edu/~rlpage/SEcollab/20projects/ (2008)

Log In

Property-Based Testing and Verification: A Catalog of Classroom Examples

Related papers

Related papers