Many programmers, especially in introductory classes, might think of programming as a task largely involving debugging. So it's worthwhile to take a step back and think about everything that comes before debugging.
The first defense against bugs is to make them impossible. Entire classes of bugs can be eradicated by choosing to program in languages that guarantee memory safety (no part of memory can be accessed except through a pointer or reference that is valid for that region of memory) and type safety (no value can be used in a way inconsistent with its type). The OCaml type system, for example, prevents buffer overflows and meaningless operations (like adding a boolean to a float), whereas the C type system does not.
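For example, if we try such a meaningless operation in the OCaml toplevel, the compiler rejects it before any code runs (the exact wording of the message varies by compiler version):

# true +. 1.0;;
Error: This expression has type bool but an expression was expected of type float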
The second defense is to use tools that find bugs. There are automated source-code analysis tools, like FindBugs, which can find many common kinds of bugs in Java programs, and SLAM, which is used to find bugs in device drivers. The subfield of CS known as formal methods studies how to use mathematics to specify and verify programs, that is, how to prove that programs have no bugs. We'll study verification later in this course. Social methods such as code reviews and pair programming are also useful for finding bugs. Studies at IBM in the 1970s-1990s suggested that code reviews were the most effective tool for finding bugs; see the Appendix below.
The third defense is to make bugs fail fast. The earlier a bug appears, the easier it is to diagnose and fix. If computation instead proceeds past the point of the bug, then that further computation might obscure where the failure really occurred. Assertions in the source code make programs "fail fast" and "fail loudly", so that bugs appear immediately, and the programmer knows exactly where in the source code to look.
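For example, OCaml's built-in assert (a simpler relative of the Assertions module introduced below) lets a function reject a bad input at the point of the call; a minimal sketch:

let divide (x:int) (y:int) : int =
  assert (y <> 0);  (* fail fast and loudly: the exception points right here *)
  x / y

If the assertion is violated, an Assert_failure exception is raised immediately, carrying the file name and location of the failed check, instead of letting a division by zero surface somewhere downstream.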
The fourth defense is testing. How can you know whether a piece of code has a particular bug? Write tests that would expose the bug, then confirm that your code doesn't fail those tests. Unit tests for a relatively small piece of code, such as an individual function or module, are especially important to write at the same time as you develop that code. Running those tests should be automated, so that if you ever break the code, you find out as soon as possible. (That's really Defense 3 again.)
The fifth defense is debugging, which is really the last resort of a programmer.
In the rest of this recitation, we'll look at defenses 3 through 5.
The Assertions module contains a function assert_true : bool -> unit, which returns unit if passed true as an argument, and raises an exception if passed false. For example:
open Assertions

let t = assert_true true

let inc x = x+1
let u = assert_true ((inc 0) = 0)  (* raises an exception *)

(The open Assertions line makes it possible to refer to any function f declared inside Assertions simply as f, rather than Assertions.f.)
If we save this code in a file test.ml and then run

cs3110 compile test.ml
cs3110 run test.ml

we see the exception being raised:
Fatal error: exception Assertions.Assert_true("false is not true")
Raised at file "assertions/assertions.ml", line 4, characters 18-49
Called from file "test.ml", line 4, characters 8-33
Error (exit code 2)

We'll see later in the course how to declare, raise, and handle exceptions ourselves. For now, it's enough to know that Assertions uses them.
The Assertions module contains many other functions we can use to write assertions. You can read the documentation at the GitHub repo for the module.
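Among them is assert_false, which we'll use in the test file below; as its name suggests, it is the dual of assert_true, returning unit if passed false and raising an exception if passed true. A quick sanity check, reusing the inc function from above:

open Assertions

let inc x = x+1
let v = assert_false ((inc 0) = 0)  (* passes: inc 0 evaluates to 1, not 0 *)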
This course has developed its own unit-testing framework, which is part of the cs3110 tool distributed on the VM. Our framework is quite similar to (and in fact is built on) the open-source library Pa_ounit.
Suppose you're writing a function to determine whether a number is prime:

let is_prime : int -> bool =
  fun x -> x=3 || x=5 || x=7

(Obviously, you're not yet done writing the function...) And suppose you've saved this function in a file named prime.ml.
To write unit tests, create a new file prime_test.ml. (The choice of filename is a convention, not a strict requirement.) Inside prime_test.ml, you can write unit tests such as the following:
open Prime
open Assertions

TEST_UNIT "test_prime_one" = assert_true (is_prime 3)
TEST_UNIT "test_prime_two" = assert_true (is_prime 7)
TEST_UNIT "test_not_prime_eight" = assert_false (is_prime 8)

let () = Pa_ounit_lib.Runtime.summarize()

The first two lines open the modules for the code being tested and for the assertions module. Note that the first letter of the module name is capitalized, regardless of whether or not it was capitalized in the actual filename. The strings associated with each unit test are simply descriptive names. The last line (the call to summarize) should remain the final line of the file if you add any new tests. If any tests fail, this function call exits with an error and a summary of the failed tests; if all tests pass, it exits normally.
To run your unit tests, do the following:

cs3110 compile prime_test.ml
cs3110 test prime_test.ml

As output, you will see:

3 tests ran, 0 test_modules ran
The course page embeds a file containing some buggy code, along with a unit test file for it; the two programs are not reproduced here.
Do the above test cases pass? If not, which tests fail? If so, does this guarantee correctness of the above functions? Can you think of some more cases that we did not test? Add some cases of your own to the test file, and fix the implementation of the functions so that your tests pass.
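Although those two files aren't shown here, the same exercise applies to the is_prime stub from earlier: its three tests all pass, yet the function is far from correct. Hypothetical additional cases like these expose the gaps (the first fails against the stub):

TEST_UNIT "test_prime_smallest"  = assert_true (is_prime 2)    (* fails: the stub forgot 2 *)
TEST_UNIT "test_not_prime_one"   = assert_false (is_prime 1)   (* 1 is not prime *)
TEST_UNIT "test_not_prime_nine"  = assert_false (is_prime 9)   (* 9 = 3 * 3 is composite *)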
Testing doesn't have to happen strictly after you write code. In test-driven development, testing comes first!
Suppose we are working with a datatype for days:

type day = Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday

And we want to write a function next_weekday : day -> day that returns the next weekday after a given day. We start by writing the most basic, broken version of that function we can:

let next_weekday (d:day) = Sunday

Then we begin writing tests. For example, we know that the next weekday after Monday is Tuesday. So we add a test:
TEST_UNIT "next_weekday_test1" = assert_true (next_weekday(Monday) = Tuesday)We compile and run the test; it fails. That's good! Now we know that we have some functionality to add. Let's go back and make this test pass:
let next_weekday (d:day) =
  match d with
    Monday -> Tuesday
  | _ -> Sunday

We compile and run the test; it passes. Time to add some more tests:
TEST_UNIT "next_weekday_test2" = assert_true (next_weekday(Tuesday) = Wednesday) TEST_UNIT "next_weekday_test3" = assert_true (next_weekday(Wednesday) = Thursday) TEST_UNIT "next_weekday_test4" = assert_true (next_weekday(Thursday) = Friday)We compile and run the tests; many fail. That's good! We add new functionality:
let next_weekday (d:day) =
  match d with
    Monday -> Tuesday
  | Tuesday -> Wednesday
  | Wednesday -> Thursday
  | Thursday -> Friday
  | _ -> Sunday

We compile and run the tests; they pass. Perhaps overly emboldened by our success in adding functionality, we notice the pattern that has evolved, and we decide to finish writing the function:
let next_weekday (d:day) =
  match d with
    Monday -> Tuesday
  | Tuesday -> Wednesday
  | Wednesday -> Thursday
  | Thursday -> Friday
  | Friday -> Saturday
  | Saturday -> Sunday
  | Sunday -> Monday

(Of course we got it wrong.) We add some new tests to finish testing all the possible cases:
TEST_UNIT "next_weekday_test5" = assert_true (next_weekday(Friday) = Monday) TEST_UNIT "next_weekday_test6" = assert_true (next_weekday(Saturday) = Monday) TEST_UNIT "next_weekday_test7" = assert_true (next_weekday(Sunday) = Monday)We compile and run the tests; tests 5 and 6 fail! Surprised, we look at test 5, notice that the code for
next_weekday(Friday)
must be wrong, and fix the bug. After fixing the same bug for test 6,
we end up with the following complete piece of code and unit tests:
let next_weekday (d:day) =
  match d with
    Monday -> Tuesday
  | Tuesday -> Wednesday
  | Wednesday -> Thursday
  | Thursday -> Friday
  | Friday -> Monday
  | Saturday -> Monday
  | Sunday -> Monday

TEST_UNIT "next_weekday_test1" = assert_true (next_weekday(Monday) = Tuesday)
TEST_UNIT "next_weekday_test2" = assert_true (next_weekday(Tuesday) = Wednesday)
TEST_UNIT "next_weekday_test3" = assert_true (next_weekday(Wednesday) = Thursday)
TEST_UNIT "next_weekday_test4" = assert_true (next_weekday(Thursday) = Friday)
TEST_UNIT "next_weekday_test5" = assert_true (next_weekday(Friday) = Monday)
TEST_UNIT "next_weekday_test6" = assert_true (next_weekday(Saturday) = Monday)
TEST_UNIT "next_weekday_test7" = assert_true (next_weekday(Sunday) = Monday)
Test-driven development is related to extreme programming (XP), a collection of ideas about how to structure team programming projects. It emphasizes incremental development of code: there is always something that can be tested. Testing is not something that happens after implementation; instead, continuous testing is used to catch errors early. Thus, it is important to develop unit tests immediately when the code is written. Automating test suites is crucial so that continuous testing requires essentially no effort.
Extreme programming emphasizes bottom-up development of code modules for which unit tests can be developed. Extreme programming also emphasizes social process on programming teams. Programmers work closely, communicating more verbally than with formal design documents. Specifications are short, and programmers rely on common knowledge among the development team. Extreme programming is a good way to develop software quickly, but it has been criticized for not producing high-level design documents. In the long run a project developed this way might be hard to manage, because knowledge of the code is wedded to the engineers who wrote it.
When writing tests, keep in mind the acronym FIRST: tests should be Fast, Independent, Repeatable, Self-checking, and Timely. In addition, give each test a short, descriptive name; a convention like funcname_testi works fairly well.

In the next_weekday example, it's relatively obvious which unit tests to write. But for more complicated functions, it can be difficult to decide what tests and how many to write. A good strategy for developing unit tests is to write them by looking only at the specification for the function being tested. The function's specification actually gives us a lot of information about what to test: the precondition tells us which inputs are legal, and the postcondition tells us what to check about the output on those inputs.
Testing based solely on the function's specification, as opposed to its implementation, is called black-box testing. The idea is that the function is a box that we cannot see into.
As an example of black-box testing, consider writing a function to compute the amount of income tax owed under the US's progressive income tax brackets. The postcondition for this function might stipulate that the tax rate is determined by the following table:

Income          | Tax rate
$0–$8,925       | 10%
$8,926–$35,250  | 15%
$35,251–$87,850 | 25%
...             | ...
There is structure in this postcondition that suggests values to test. Rather than test every dollar amount from $0 to $87,850, we could test the partitions suggested by the postcondition: one test for the partition $0–$8,925, another test for $8,926–$35,250, etc. This testing strategy is called equivalence-class partitioning. The values that fall at the extreme end of each partition are often particularly fruitful test cases. For example, we might write a test for $0, for $8,925, for $8,926, for $35,250, etc. This testing strategy is called boundary-value analysis.
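A sketch of what such boundary-value tests might look like, assuming a hypothetical function tax_rate : int -> float that maps an income in whole dollars to the rate from the table:

(* Hypothetical helper: tax_rate returns the rate for an income
 * in whole dollars, per the table above. *)
TEST_UNIT "tax_rate_bottom_bracket1" = assert_true (tax_rate 0     = 0.10)
TEST_UNIT "tax_rate_top_bracket1"    = assert_true (tax_rate 8925  = 0.10)
TEST_UNIT "tax_rate_bottom_bracket2" = assert_true (tax_rate 8926  = 0.15)
TEST_UNIT "tax_rate_top_bracket2"    = assert_true (tax_rate 35250 = 0.15)
TEST_UNIT "tax_rate_bottom_bracket3" = assert_true (tax_rate 35251 = 0.25)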
The specification for a function does not necessarily give us all the information we need to write a good test suite. By looking at the function's implementation, we might be inspired to write additional tests. Testing based on the function's implementation is called white-box testing (or perhaps better, glass-box testing). The idea is that the function is a box that we can see into.
As an example of white-box testing, consider writing a function to determine the real roots (if any) of a quadratic equation ax^2 + bx + c = 0. The specification for the function might read as follows:
(* requires: a <> 0.0
 * returns: Some (f1,f2) if there exist f1 and f2 such that
 *   each fi is a root of ax^2 + bx + c. Or None if no real
 *   roots exist. *)
let qroots ((a:float),(b:float),(c:float)) : (float*float) option = ...

In which case, we might be inspired by the precondition and equivalence-class partitioning to generate 27 test cases for when each of the coefficients is negative, zero, or positive. (And that's not a bad idea!)
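For illustration, two of those sign-partition cases might look like this; the coefficients are hypothetical values chosen by hand so that the expected roots are exact floats:

TEST_UNIT "qroots_pos_pos_pos"  = assert_true (qroots(1.0,5.0,6.0) = Some (-2.0,-3.0))
TEST_UNIT "qroots_pos_zero_neg" = assert_true (qroots(1.0,0.0,-4.0) = Some (2.0,-2.0))

But if we go on to read the function's implementation, we get new inspiration: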
let qroots ((a:float),(b:float),(c:float)) : (float*float) option =
  let d = (b *. b) -. (4.0 *. a *. c) in
  if d < 0.0 then None
  else Some ((~-. b +. (sqrt d)) /. (2.0 *. a),
             (~-. b -. (sqrt d)) /. (2.0 *. a))

In the body, we see that the value of d determines whether the then branch or the else branch of the if expression is evaluated. So it makes sense to write test cases that explore both branches:
TEST_UNIT "qroots_test1" = assert_true ((qroots(1.0,3.0,-4.0) = Some (1.0,-4.0))) TEST_UNIT "qroots_test2" = assert_true (qroots(3.0,4.0,2.0) = None)
Code coverage is a metric for how much of the code is being tested by a test suite. There are various ways to measure coverage: by the number of functions being tested, by whether every branch of an if or match expression is tested, by whether enough tests exist to cause every Boolean subexpression to take on both true and false, etc. By writing both tests above, we are increasing code coverage as measured by branches.
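Boundary-value analysis suggests one more test at the edge between those two branches: the case where d is exactly 0.0, which takes the else branch and returns a repeated root. With these coefficients the arithmetic is exact, so the float equality check is safe:

TEST_UNIT "qroots_test3" = assert_true (qroots(1.0,2.0,1.0) = Some (-1.0,-1.0))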
So you've discovered a bug. What next?
Here are a couple of tips on how to debug in OCaml, if you are forced into it.
Suppose you want to see what the value of x is in the following function:

let inc x = x+1

Just add the line below to print that value:

let inc x =
  let () = print_int x in  (* added *)
  x+1

The Pervasives module contains many other printing functions you can use.
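For instance, print_endline together with string_of_int (both in Pervasives) gives a labeled trace message; a small variation on the sketch above:

let inc x =
  let () = print_endline ("inc called with x = " ^ string_of_int x) in
  x+1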
In the toplevel, you can trace all the calls to and returns from a function with the #trace directive:

let rec fib x = if x<=1 then 1 else fib(x-1) + fib(x-2)
#trace fib;;

If you evaluate fib 2, you will now see the following output:

fib <-- 2
fib <-- 0
fib --> 1
fib <-- 1
fib --> 1
fib --> 2

To stop tracing, use the #untrace directive.
Some of the material in these notes is based on slides on debugging by Rob Miller from MIT 6.005.
An extremely effective way to find faults in software systems is code review, in which the programmer goes through the code and documentation to explain how it works and why it is correct. It is the programmer's job to convince an expert review team (usually 1–3 people) who act as skeptical devil's advocates.
There are different ways to use code review effectively, ranging from informal walkthroughs, in which the programmer simply talks the reviewers through the code, to formal inspections, in which reviewers study the code beforehand and follow a structured process.
Code review can be remarkably effective. In one study conducted at IBM (Jones, 1991), code inspection found 65% of the known coding errors and 25% of the known documentation errors, whereas testing found only 20% of the coding errors and none of the documentation errors. The more formal inspection process may be more effective than walkthroughs. One study (Fagan, 1976) found that code inspections resulted in code with 38% fewer failures, compared to code walkthroughs.
Thorough code review can be expensive, however. Jones found that preparing for code inspection took one hour per 150 lines of code, and the actual inspection covered 75 lines of code per hour. Having up to three people on the inspection team improves the quality of inspection; beyond that, additional inspectors don't seem to help. Spending a lot of time preparing for inspection did not seem to be useful, either. Perhaps this is because much of the value of inspection lies in the interaction with the coders.