We have identified two pieces of implementation-side specification: the abstraction function and the representation invariant. They need to be provided in every module implementation so that implementers can use local reasoning to figure out whether the code they are reading or writing is correct. Ordinary function specifications that appear in the module interface (signature) are a contract between the implementer and the user of the module. By contrast, the abstraction function and rep invariant are a contract between the implementer and other implementers or maintainers of the code.
Suppose that we added a nondeterministic operation to our NATSET signature:
(* choose(s) is an element of s.
 * Checks: s non-empty *)
val choose: set -> int
Here is a possible implementation for the NatSet structure:
fun choose(s: set) =
  case s of
    [] => raise Fail "empty"
  | h::t => h
Is this a correct implementation? We said that an implementation is correct if it satisfies a commutation diagram. However, this specification is nondeterministic, which complicates our thinking about the diagram. Rather than mapping a set to a natural number, this specification maps a set to a set of natural numbers: all the possible natural numbers that might be returned according to the spec. The implementation of course returns the very first number in the list that represents the set.
Let's write choose_spec for the abstract operation described by the specification, and choose_impl for the actual operation implemented above. If choose_spec is applied to the set {1,2,3}, the possible results are 1, 2, and 3. If choose_impl is applied to a representation of {1,2,3}, the possible results are also 1, 2, and 3, since we don't know which representation of {1,2,3} we got. If the representation were [1,2,3], then choose_impl would return 1. If the representation were [2,1,3,2,3], which also represents {1,2,3}, then choose_impl would return 2. Regardless of the representation of {1,2,3}, choose_impl always returns one of the values (1, 2, or 3) that choose_spec permits. That is why we say that choose_impl is a correct implementation of its specification, choose_spec.
Let AF be the abstraction function that maps concrete values to abstract values. Let f_spec be the specification function that maps an abstract value to a set of abstract outputs. Let f_impl be the actual implementation that maps a single concrete value to an output. Let x be a concrete input. Then the implementation is correct as long as AF(f_impl(x)), viewed as a one-element set, is a subset of f_spec(AF(x)):

AF(f_impl(x)) ⊆ f_spec(AF(x))

That is, the commutation holds with the proviso that the specification may permit more behaviors than the implementation actually exhibits, as illustrated in this figure:
In the case of our example function choose applied to some concrete list h::t, the abstract view is some set AF(h::t) = {h} ∪ AF(t). The set of values permitted by the specification is this entire set; the set of values produced by the implementation is just {h}, which is clearly a subset.
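As a quick sanity check of this argument, the following sketch applies a list-based choose to two representations of {1,2,3} and confirms that each result is a member of the abstract set. It is an illustration under stated assumptions: choose is written directly over int list rather than through the NATSET signature, and the helper member and the test values are hypothetical names introduced here.

```sml
(* Sketch only: choose over the concrete type int list, so that it
 * can be run on its own without the NATSET signature. *)
fun choose (s: int list): int =
  case s of
    [] => raise Fail "empty"
  | h::_ => h

(* Hypothetical helper: is x among the elements of the list? *)
fun member (x: int, s: int list): bool = List.exists (fn y => y = x) s

val abstractSet = [1, 2, 3]     (* stands in for the abstract set {1,2,3} *)
val rep1 = [1, 2, 3]            (* one representation of {1,2,3} *)
val rep2 = [2, 1, 3, 2, 3]      (* another representation, with duplicates *)

val r1 = choose rep1            (* first element of rep1 *)
val r2 = choose rep2            (* first element of rep2 *)

(* Each result must be one of the values the specification permits. *)
val ok = member (r1, abstractSet) andalso member (r2, abstractSet)
```

The two calls return different numbers for the same abstract set, yet both lie inside the set of results the specification allows, which is exactly the subset condition.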
A rep invariant is a condition that is intended to hold for all values of an abstract type. The abstraction barrier ensures that the module is the only place that the rep invariant can be broken; it is the only place that the concrete type of the values is known.
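Therefore, in implementing one of the operations of the abstract data type, it can be assumed that any arguments of the abstract type satisfy the rep invariant. This assumption restores local reasoning about correctness, because we can use the rep invariant and abstraction function to judge whether the implementation of a single operation is correct in isolation from the rest of the module. It is correct if, assuming that:
1. the function's preconditions hold for the argument values, and
2. the concrete representations of the arguments satisfy the rep invariant,
we can show that
1. all new representation values created satisfy the rep invariant, and
2. the commutation diagram holds.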
The rep invariant makes it easier to write code that is provably correct, because it means that we don't have to write code that works for all possible incoming concrete representations, only those that satisfy the rep invariant. This is why NatSetNoDups.union doesn't have to work on lists that contain duplicate elements, for example. On return from each operation there is a corresponding responsibility to produce only values that satisfy the rep invariant, ensuring that the rep invariant is in fact an invariant.
Let us consider the rep invariant for the vector implementation of NATSET. There is some question about what we should write. One possibility is to write the strongest possible specification of the possible values that can be created by the implementation. It happens that the vector representing the set never has trailing false values:
structure NatSetVec :> NATSET = struct
  type set = bool vector
  (* Abstraction function: the vector v represents the set of all
   *   natural numbers i such that sub(v,i) = true
   * Representation invariant: the last element of v is true *)
  val empty:set = Vector.fromList []
This representation invariant describes an interesting property of the implementation that may be useful in judging its performance. However, we don't need this rep invariant in order to show that the implementation is correct. If there were no rep invariant, we could still argue that the implementation works properly. All of the operations of NatSetVec will work even if sets are somehow introduced that violate the no-trailing-false property. It is not necessary to have the rep invariant in order to argue that the operations of NatSetVec are correct according to the 4-point plan above.
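To make this concrete, here is a small sketch of a membership test on the vector representation. The function contains below is an assumed implementation written for illustration (the actual body of NatSetVec is not shown in these notes), and tight and padded are hypothetical test values.

```sml
(* Hypothetical contains for the vector representation: i is in the
 * set when i is a valid index and the entry at i is true. *)
fun contains (i: int, v: bool vector): bool =
  i >= 0 andalso i < Vector.length v andalso Vector.sub (v, i)

(* Two representations of the set {1}. *)
val tight  = Vector.fromList [false, true]                (* satisfies the RI *)
val padded = Vector.fromList [false, true, false, false]  (* trailing false: violates the RI *)

(* The two vectors answer every membership query identically. *)
val same =
  List.all (fn i => contains (i, tight) = contains (i, padded)) [0, 1, 2, 3, 4]
```

Because the bounds check treats out-of-range indices the same way as false entries, this operation never relies on the no-trailing-false property.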
Further, a strong rep invariant is not always the best choice, because it restricts future changes to the module. We described interface specifications as a contract between the implementer of a module and the user. A rep invariant is a contract between the implementer and herself, or among the various implementers of the module, present and future. According to assumption 2, above, operations may be implemented assuming that the rep invariant holds. If the rep invariant is ever weakened (made more permissive), some parts of the implementation may break. It makes sense to avoid unnecessarily strengthening the invariant, because as the code evolves, it might later be necessary to weaken it -- and in that case, the entire module might have to be re-examined for correctness.
One of the most important purposes of the rep invariant is to document exactly what may and what may not be safely changed about a module implementation. A weak rep invariant forces the implementer to work harder to produce a correct, efficient implementation, because less can be assumed about concrete representation values, but conversely it gives maximum flexibility for future changes to the code.
When implementing a complex abstract data type, it is often helpful to write a function internal to the module that checks that the rep invariant holds. This function can provide an additional level of assurance about your reasoning about the correctness of the code. By convention we will call this function repOK; given an abstract type (say, set) implemented as a concrete type (say, int list) it always has the same specification:
(* Returns whether x satisfies the representation invariant *)
fun repOK(x: int list): bool = ...
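For instance, a boolean repOK might be filled in as follows. This is only a sketch, assuming the invariant of a no-duplicates list representation of sets of natural numbers (no negative elements, no duplicates); other representations would need a different body.

```sml
(* Sketch: returns whether x satisfies the assumed invariant
 * "no negative elements and no duplicates". *)
fun repOK (x: int list): bool =
  case x of
    [] => true
  | h::t =>
      h >= 0                                        (* head is a natural number *)
      andalso not (List.exists (fn y => y = h) t)   (* head does not reappear later *)
      andalso repOK t                               (* the tail is itself well formed *)
```

With this definition, repOK [3, 1, 2] is true, while repOK [1, 2, 1] and repOK [~1] are both false.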
The repOK function can be used to help us implement a module and be sure that each function is independently correct. The trick is to bulletproof each function in the module against all other functions by having it apply repOK to any values of the abstract type that come from outside. In addition, if it creates any new values of the abstract type, it applies repOK to them to ensure that it isn't breaking the rep invariant itself. With this approach, a bug in one function is less likely to create the appearance of a bug in another.
A more convenient way to write repOK is to make it an identity function that raises an exception if the rep invariant doesn't hold. Making it an identity function lets us conveniently test the rep invariant in various ways, as shown below.
(* The identity function.
 * Checks: whether x satisfies the rep. invariant. *)
fun repOK(x: int list): int list = ...
Here is an example of how we might use repOK for the NatSetNoDups implementation of sets given in lecture:
structure NatSetNoDups :> NATSET = struct
  type set = int list
  (* AF: the list [a1,...,an] represents the set {a1,...,an}.
   * RI: list contains no negative elements or duplicates. *)
  fun repOK(s: int list): int list =
    case s of
      [] => s
    | h::t => if h < 0 orelse contains'(h, repOK t)
              then raise Fail "RI failed"
              else s
  and contains'(x:int, s:int list) = List.exists (fn y => y=x) s
  val empty :set = repOK []
  fun single(x:int):set = repOK [x]
  fun contains(x:int, s:set):bool = contains'(x, repOK s)
  fun union(s1:set, s2:set):set =
    repOK (foldl (fn (x,s) => if contains'(x,s) then s else x::s)
                 (repOK s1) (repOK s2))
  fun size(s:set):int = length(repOK s)
end
Here, repOK is implemented using contains' rather than the function contains, because using contains would result in a lot of extra repOK checks. This is a common pattern when implementing a repOK check.
Calling repOK on every argument can be too expensive for the production version of a program. The repOK above is quite expensive (though it could be implemented more cheaply). For production code it may be more appropriate to use a version of repOK that only checks the parts of the rep invariant that are cheap to check. When there is a requirement that there be no run-time cost, repOK can be changed to an identity function (or macro) so the compiler optimizes away the calls to it. It is a good idea to keep around the full code of repOK (perhaps in a comment) so it can be easily reinstated during future debugging.
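One way to arrange this, sketched below under the assumption of the no-negatives, no-duplicates invariant on int list, is to keep several variants side by side and bind repOK to whichever one suits the build. The names repOKFull, repOKCheap, and repOKOff are hypothetical and introduced only for this illustration.

```sml
(* Full check: no negative elements and no duplicates (quadratic time). *)
fun repOKFull (s: int list): int list =
  let
    fun ok [] = true
      | ok (h::t) =
          h >= 0 andalso not (List.exists (fn y => y = h) t) andalso ok t
  in
    if ok s then s else raise Fail "RI failed"
  end

(* Cheap check: only the non-negativity part of the invariant (linear time). *)
fun repOKCheap (s: int list): int list =
  if List.all (fn x => x >= 0) s then s else raise Fail "RI failed"

(* No check at all: a plain identity function with no run-time test. *)
fun repOKOff (s: int list): int list = s

(* Choose the variant for this build; the full version stays available
 * and can be reinstated by changing this one line. *)
val repOK = repOKCheap
```

The cheap variant accepts a list such as [1, 1] that the full variant rejects, so it trades some assurance for speed.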
For those who really want to be careful about enforcing and checking the rep invariant, there is a way to use the SML type system to impose more discipline. The trick is to define the abstract type as a singleton datatype, e.g.
datatype set = Rep of int list
This definition means that you cannot treat a list accidentally as a set; you have to apply Rep explicitly to turn a list into a set, and write case...of Rep(lst) to get the list back out. If the Rep constructor is only used within a special function up that performs the mapping of the abstraction function, then you cannot construct a set without checking that it satisfies the rep invariant:
fun up(l: int list): set = Rep(repOK l)
fun down(s: set): int list = case s of Rep(l) => repOK l
The down function is used to map the other way and obtain the concrete representation for an abstract value while checking that the representation satisfies the invariant.
The previous implementation can now be rewritten to use up and down in the various places that repOK was used formerly:
structure NatSetNoDups2 :> NATSET = struct
  datatype set = Rep of int list
  (* AF: Rep [a1,...,an] represents the set {a1,...,an}.
   * RI: list contains no negative elements or duplicates. *)
  fun repOK(s: int list): int list =
    case s of
      [] => s
    | h::t => if h < 0 orelse contains'(h, repOK t)
              then raise Fail "RI failed"
              else s
  and contains'(x:int, s:int list) = List.exists (fn y => y=x) s
  fun up(l: int list): set = Rep(repOK l)
  fun down(s: set): int list = case s of Rep(l) => repOK l
  val empty :set = up []
  fun single(x:int):set = up [x]
  fun contains(x:int, s:set):bool = contains'(x, down s)
  fun union(s1:set, s2:set):set =
    up (foldl (fn (x,s) => if contains'(x,s) then s else x::s)
              (down s1) (down s2))
  fun size(s:set):int = length(down s)
end
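The effect of up and down can be seen in isolation with a small sketch. It repeats the relevant definitions at top level, outside any structure, purely so that it runs on its own; the values roundTrip and rejected are hypothetical names used for the demonstration.

```sml
datatype set = Rep of int list

(* Identity on lists that satisfy the invariant (no negative elements,
 * no duplicates); raises Fail otherwise. *)
fun repOK (s: int list): int list =
  case s of
    [] => s
  | h::t => if h < 0 orelse List.exists (fn y => y = h) (repOK t)
            then raise Fail "RI failed"
            else s

fun up (l: int list): set = Rep (repOK l)
fun down (s: set): int list = case s of Rep l => repOK l

(* A well-formed list passes through up and down unchanged. *)
val roundTrip = down (up [3, 1, 2])

(* A list with a duplicate is rejected before a set can be built from it. *)
val rejected = (up [1, 2, 1]; false) handle Fail _ => true
```

As long as the rest of the module builds sets only through up, every set that exists has passed the check at least once.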