Outline:
 * PS 2
    * how to encode pairs using lambda
    * how to write a proof -- see Brandon's notes for induction
    * (sections discussed how to use the substitution model)
    * pay attention to the style guide that's online
 * PS 3
    * it's on the web
    * manipulate logical formulae (a lot like what happens in
      hardware design tools)
 * Data Abstraction
    * contracts/specifications vs. implementations
    * why abstract?
    * example of data abstraction

------------------------------------------------------------------------
PS2: how to write a proof by induction

It's a formula:

1. Write down the property you are to prove as P(n). Your goal is to
   show P(n) is true (usually an equation) for all n >= 0.

2. Write down the base case -- P(0). Construct a proof that P(0) is
   true.

3. Now you must show that whenever P(n) is true, P(n+1) is true. That
   is, P(n) => P(n+1) (read: P(n) implies P(n+1)). When you're proving
   an implication, you write down what you're assuming and label it as
   the induction hypothesis:

      (I.H.) Assume P(n)

   Now you must show P(n+1). You are allowed to use the induction
   hypothesis as a fact.

An example:

------------------------------------------------------------------
Q: Show that sum(i=1,n,i^3) = (sum(i=1,n,i))^2 is true for all n >= 0.

Let P(n) =def= sum(i=1,n,i^3) = (sum(i=1,n,i))^2

We will prove by induction on the natural numbers that for all n >= 0,
P(n) is true.

Base case: n=0

We must show that P(0) =def= sum(i=1,0,i^3) = (sum(i=1,0,i))^2 is true.

(1) sum(i=1,0,i^3) = 0    (by definition of sum)
(2) sum(i=1,0,i)   = 0    (by definition of sum)
(3) 0^2 = 0
(4) Thus, by (1), (2), and (3), P(0) is true.

Inductive case:

We must show that whenever P(n) is true, P(n+1) is true.
Assume P(n) is true. That is, assume:

(I.H.) sum(i=1,n,i^3) = (sum(i=1,n,i))^2

We must show P(n+1) is true, that is:

   sum(i=1,n+1,i^3) = (sum(i=1,n+1,i))^2

(1) sum(i=1,n+1,i^3) = sum(i=1,n,i^3) + (n+1)^3     (by definition of sum)
(2)                  = (sum(i=1,n,i))^2 + (n+1)^3   (by I.H.)
(3)                  = (n*(n+1)/2)^2 + (n+1)^3          (by the lemma sum(i=1,n,i) = n*(n+1)/2)
(4)                  = (n+1)^2*(n/2)^2 + (n+1)^2*(n+1)
(5)                  = (n+1)^2*((n/2)^2 + (n+1))        (factoring out (n+1)^2)
(6)                  = (n+1)^2*(n^2/4 + n + 1)          (squaring n/2)
(7)                  = (n+1)^2*((n^2 + 4n + 4)/4)       (common denominator: n = 4n/4, 1 = 4/4)
(8)                  = (n+1)^2*((n+2)^2/4)              ((n+2)^2 = n^2+4n+4)
(9)                  = ((n+1)*(n+2)/2)^2                (x^2*y^2/4 = (x*y/2)^2)
(10)                 = (sum(i=1,n+1,i))^2               (by the lemma again)

Therefore, P(n+1) is true.

------------------------------------------------------------------------
Notice that I work from the left-hand side, using only facts that I
know from math, the induction hypothesis, or previously proved lemmas,
to reach the right-hand side. Any non-trivial step should be justified
to help the reader understand the proof.

There are much uglier proofs of this (we saw most of them in your
homework). The first proof you find is not necessarily the proof that
you should turn in, any more than the first code you write should be
the code that you turn in. Often, you construct a proof backwards,
starting from the goal and working back to known truths. Do that on
scratch paper and turn in the "forward" proof. It should be a
beautiful proof that makes it easy for the reader to understand your
reasoning. Otherwise, the reader will tell you you're full of bull and
that you don't have a proof (even if, in your mind, the proof is
correct).

------------------------------------------------------------------------
Why does induction work?

If you have a proof of P(0), and you have a proof that P(n)=>P(n+1),
then you've proven P(n) for all n>=0. Why? Well, suppose not. In
particular, suppose P(i) is false for some numbers. Pick the least
number j such that P(j) is false. Well, j cannot be 0, because you
have a proof that P(0) is true. So j must be greater than 0. In
particular, j=k+1 for some k. Now P(k) has to be true, because j is
the _least_ number such that P(j) is false. But you have a proof that
whenever P(n) is true, P(n+1) is also true.
Since P(k) is true, that implies that P(k+1) = P(j) is true. This is a
contradiction, since P(j) was assumed to be false! Therefore, if P(0)
is true, and P(n)=>P(n+1), then P(n) is true for all n >= 0.

------------------------------------------------------------------------
PS2: how to encode pairs using just lambda

What's the contract? Three operations:

   pair:   creates a new pair out of two expressions
   first:  returns the first value of a pair
   second: returns the second value of a pair

In summary:

   (first  (pair e1 e2)) = e1
   (second (pair e1 e2)) = e2

This is the _contract_ or specification that a user of pairs expects
to hold true, regardless of how pair, first, and second are
implemented. As an implementor of an abstraction, it pays to _not_
tell a user how you implemented something; rather, just describe the
contract.

 * development of code that uses the abstraction can happen in parallel
 * the implementation can be changed later, when a new algorithm or
   better data structure is discovered, as long as the contract is
   maintained

Good programming languages provide a means to enforce abstraction.
Bad ones do not. Examples:

 * Year 2K bug: if we had had an abstract type of years, with
   operations on them, then we could have changed the implementation
   from 2 digits to 4 digits.

 * In C, strings are not abstract: a string is just an array of
   characters. And characters are not abstract: they're just bytes.
   So string = array of char = array of byte, and much code took
   advantage of this. For instance, to calculate "the length" of a
   string, you could call a function:

      int length(char *buf) {
        int i = 0;
        while (buf[i] != 0)
          i = i + 1;
        return i;
      }

      length("foo")

   We could think of length as returning either the number of
   characters or the number of bytes. But as we move from ASCII to
   Unicode (8-bit to 16-bit), there are two different kinds of sizes:
   the size in the number of characters and the size in the number of
   bytes (2 * # of chars).
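The two kinds of size are easy to see in a language whose strings
*are* abstract. A quick sketch in Python (an illustration, not part of
the course's Scheme): you get the byte count only by explicitly
choosing an encoding.

```python
s = "hello"

# Length in characters: the abstract string interface.
chars = len(s)

# Length in bytes under a 16-bit encoding: each character takes 2 bytes.
bytes16 = len(s.encode("utf-16-le"))

print(chars)    # 5
print(bytes16)  # 10
```

Code that conflates the two numbers is exactly the code that breaks
when the representation moves from 8-bit to 16-bit characters.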
Last semester, we heard a talk about how a certain Microsoft product
had on the order of 700 bugs related to just this. Consider for
example:

   void copy(string x) {
     int len = length(x);             // # of bytes? # of characters?
     string y = (string)malloc(len);  // malloc counts bytes
     int i;
     for (i = 0; i < len; i++)
       y[i] = x[i];
     ...
   }

If length counts characters but malloc counts bytes, then with 16-bit
characters y is half the size it needs to be, and the copy runs off
the end of the buffer.

Here's one implementation of the pair contract using just lambda:

   (define pair   (lambda (x y) (lambda (f) (f x y))))
   (define first  (lambda (p) (p (lambda (x y) x))))
   (define second (lambda (p) (p (lambda (x y) y))))

Let's check (first (pair e1 e2)) = e1 using the substitution model:

   (first (pair e1 e2))
=> (first ((lambda (x y) (lambda (f) (f x y))) e1 e2))
=> (first (lambda (f) (f e1 e2)))
=> ((lambda (p) (p (lambda (x y) x))) (lambda (f) (f e1 e2)))
=> ((lambda (f) (f e1 e2)) (lambda (x y) x))
=> ((lambda (x y) x) e1 e2)
=> e1

What about (second (pair e1 e2)) = e2?

   (second (pair e1 e2))
=> (second ((lambda (x y) (lambda (f) (f x y))) e1 e2))
=> (second (lambda (f) (f e1 e2)))
=> ((lambda (p) (p (lambda (x y) y))) (lambda (f) (f e1 e2)))
=> ((lambda (f) (f e1 e2)) (lambda (x y) y))
=> ((lambda (x y) y) e1 e2)
=> e2

So the contract is satisfied.

Here's another solution, involving "if", "#t", and "#f" (which I
showed how to eliminate earlier):

   (define pair   (lambda (x y) (lambda (b) (if b x y))))
   (define first  (lambda (p) (p #t)))
   (define second (lambda (p) (p #f)))

You should check for yourself that this satisfies the contract. Can
you think of other solutions?

----------------------------------------------------------------------
DATA ABSTRACTION:

 * Contracts and Implementations
 * WHAT versus HOW

This is probably the most important single programming technique
you'll learn. Ever.

 * Good data abstractions can save you time writing code.
 * For debugging, maintaining, and changing code, data abstraction is
   absolutely critical. Without it, programs are very hard to
   understand/modify (even by the original author!).
 * You can do it in any halfway-decent language.
 * Critical for writing any LARGE program.

So far we've used only built-in primitive types of objects:
 - <number>, <boolean>, <symbol>, <procedure>, etc.

But suppose you want some other data structure? e.g., stack or queue
 - Most good algorithms use some other kinds of data.
 - No language can have *all* the built-in types you could ever want.
 - Data Abstraction: building new data types suitable for an
   application
 - a way of "extending" the language
 - for instance, we showed how, even if you only had lambdas, you
   could extend Scheme with if, multi-argument functions, pairs,
   numbers, etc.

Scheme is particularly good at allowing you to define
application-specific abstractions or datatypes.

What's important about a type?

 * There are some *operations* on it which do the right thing. That
   is, there's a *specification* or *contract* about how the type
   behaves.
 * Anything meeting that contract is OK as an implementation of the
   data type.

We have already seen the concepts of abstraction and specification
for procedures:

 - e.g., different multiplication procedures times-1, times-2,
   fast-times, etc. meet the same contract (have the same
   INPUT/OUTPUT behavior):

      a ---> +-------+
             | TIMES |----> ab
      b ---> +-------+

That's WHAT they do. HOW they do it is totally different, but that
doesn't matter as long as they meet the specification.

   Contract/Specification = WHAT the program does
      * Black box description
   Implementation = HOW the program does it

We're going to do the same for data:
 * Give a specification.
 * Hide the implementation.

This gives us two BIG advantages:
 * We can think about the data clearly.
 * We can change the implementation if we ever need to.

This is a real win:
 * We can throw together a nice simple (and inefficient)
   implementation of a datatype
    - Fast programming
    - Get the rest of the program working
    - Find out where the slow spots are
 * When we need to, we can replace it with a more complicated but
   faster one.

It's called an ABSTRACTION BARRIER:
 * A few things are visible outside
    - You (and others) can use them freely.
 * The rest is hidden
    - Nobody depends on it (just the external stuff), so you can
      change it freely.
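The lambda encoding of pairs from PS2 is a tiny abstraction barrier in
action. Here is a sketch in Python (an illustrative translation, not
the course's Scheme): two different implementations satisfy the same
contract, and client code cannot tell them apart.

```python
# Implementation 1: a pair is a closure waiting for a selector function.
def pair(x, y):
    return lambda f: f(x, y)

def first(p):
    return p(lambda x, y: x)

def second(p):
    return p(lambda x, y: y)

# Implementation 2: a pair is a closure over a boolean "which half?" flag.
def pair2(x, y):
    return lambda b: x if b else y

def first2(p):
    return p(True)

def second2(p):
    return p(False)

# Both meet the contract: first(pair(e1, e2)) = e1, second(pair(e1, e2)) = e2.
print(first(pair(3, 4)), second(pair(3, 4)))      # 3 4
print(first2(pair2(3, 4)), second2(pair2(3, 4)))  # 3 4
```

A user who writes only against pair/first/second can be switched from
one implementation to the other without changing a line of their code.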
----------------------------------------------------------------------
We're going to start with a simple abstract data type, rational
numbers:

   1/2 + 3/4 = 5/4
   2/3 * 3/4 = 1/2

Note: 5/4 is NOT the same as 1.25
 * Different types.
 * 1/3 is very different from 0.33333333.
    - Multiply by 3: (* 1/3 3) is 1, but (* 0.33333333 3) is
      0.99999999, which isn't quite 1.

The rules for adding and multiplying rationals are familiar:

   >>> Keep these on the board <<<

   a/x + b/y = (ay + bx)/xy
   a/x * b/y = ab/xy

We will define an abstract data type called <rat> which represents
rational numbers and supports some operations and tests.

CONSTRUCTOR
   (make-rat n d)  given n, d <integer>s, d not = 0, returns a <rat>

ACCESSORS
   (numer r)  takes a <rat>, returns an <integer>
   (denom r)  takes a <rat>, returns an <integer>

with the following specification:

   (numer (make-rat n d))      n
   ----------------------  =  ---
   (denom (make-rat n d))      d

with the usual rule for equality of rational numbers:
n1/d1 = n2/d2 if n1*d2 = n2*d1.

Note the specification does NOT say (numer (make-rat n d)) = n or
(denom (make-rat n d)) = d.

What operations and tests do we typically want to do with rational
numbers?
ADDITION
   (rat-add r1 r2)  given two <rat>s, returns a <rat>
MULTIPLICATION
   (rat-mul r1 r2)  given two <rat>s, returns a <rat>
EQUALITY TEST
   (rat-eq r1 r2)   given two <rat>s, returns a boolean
INEQUALITY TEST
   (rat-leq r1 r2)  given two <rat>s, returns a boolean

with specifications

   (rat-eq (make-rat n1 d1) (make-rat n2 d2))
   => #t if the rational numbers n1/d1 and n2/d2 are equal, that is,
         if n1*d2 = n2*d1,
   => #f otherwise

   (rat-leq (make-rat n1 d1) (make-rat n2 d2))
   => #t if n1/d1 <= n2/d2 as rational numbers,
   => #f otherwise

   (rat-eq (rat-add (make-rat n1 d1) (make-rat n2 d2))
           (make-rat n3 d3))
   => #t if n1/d1 + n2/d2 = n3/d3 as rational numbers
         (Note: this does NOT say that d3 = d1*d2 and
          n3 = n1*d2 + n2*d1 !),
   => #f otherwise

   (rat-eq (rat-mul (make-rat n1 d1) (make-rat n2 d2))
           (make-rat n3 d3))
   => #t if n1/d1 * n2/d2 = n3/d3 as rational numbers,
   => #f otherwise

This specification gives us some flexibility in the implementation.
We'll see a few different implementations. But we don't need to know
the implementation to work with the data type -- it's enough to know
the specification.

It makes perfectly good sense to write a SPECIFICATION or CONTRACT
that you don't know how to implement.
 * Get used to it.
 * We'll do it repeatedly.
 * And eventually you'll write large programs using that method.

Let's get back to earth and actually implement the abstract data type.

----------------------------------------------------------------------
We'll implement <rat>s using cons cells (<pair>s), which in turn we
could implement using just lambdas, etc.
 * Basically an ordered pair a la mathematics.

CONSTRUCTOR: cons
ACCESSORS:   head, tail

The specification is:

   (head (cons v1 v2)) => v1
   (tail (cons v1 v2)) => v2

----------------------------------------------------------------------
We can represent rationals as pairs of integers.

(define (make-rat n d)
  (if (and (number? n) (number?
d) (not (= d 0)))
      (cons n d)
      (error "make-rat expects numbers with denom not zero")))

The error function here terminates the program because of some
exceptional situation -- in this case, calling make-rat with a zero
denominator or with non-numbers. (In practice, we should really check
that n and d are integers, not just numbers.)

You could use cons directly, i.e.

   (define make-rat cons)

but this has serious disadvantages, as we'll see.

Similarly, we can define numerator and denominator:

   (define (numer r) (head r))
   (define (denom r) (tail r))

or just

   (define numer head)
   (define denom tail)

It's easy to see that this meets the spec:

   (numer (make-rat x y))
=> (numer (if (and (number? x) (number? y) (not (= y 0)))
              (cons x y)
              (error ..)))
=> (numer (if (and #t (number? y) (not (= y 0))) (cons x y) (error ..)))
=> (numer (if (and #t #t (not (= y 0))) (cons x y) (error ..)))
=> (numer (if (and #t #t #t) (cons x y) (error ..)))
=> (numer (if #t (cons x y) (error ..)))
=> (numer (cons x y))
=> (head (cons x y))
=> x

Similarly, (denom (make-rat x y)) evaluates to y. So,

   (numer (make-rat x y))      x
   ----------------------  =  ---
   (denom (make-rat x y))      y

as the specification demanded.

Implementing things this way, our rationals are actually of type
cons-cell rather than of their own type, <rat>. In general it is
better to use DEFCLASS when defining abstract data types. That way
there is a distinct type, and we can check that the thing we're
passing to numer or denom is a <rat> instead of any old pair. Then we
can _assume_, since make-rat is the only way to make a <rat>, that
the elements are numbers and the denominator is non-zero. Some
languages don't support this; Scheme does, and we'll cover it later.

----------------------------------------------------------------------
Here's how we might implement the arithmetic operations and tests.
(define (rat-add r1 r2)
  (let ((n1 (numer r1)) (d1 (denom r1))
        (n2 (numer r2)) (d2 (denom r2)))
    (make-rat (+ (* n1 d2) (* n2 d1))
              (* d1 d2))))

(define (rat-mul r1 r2)
  (let ((n1 (numer r1)) (d1 (denom r1))
        (n2 (numer r2)) (d2 (denom r2)))
    (make-rat (* n1 n2) (* d1 d2))))

(define (rat-eq r1 r2)
  (let ((n1 (numer r1)) (d1 (denom r1))
        (n2 (numer r2)) (d2 (denom r2)))
    (= (* n1 d2) (* n2 d1))))

(define (rat-leq r1 r2)
  (let ((n1 (numer r1)) (d1 (denom r1))
        (n2 (numer r2)) (d2 (denom r2)))
    (if (>= (* d1 d2) 0)   ;; if d1 and d2 have the same sign
        (<= (* n1 d2) (* n2 d1))
        (>= (* n1 d2) (* n2 d1)))))

Note how rat-add and rat-mul tear down their arguments using the
ACCESSORS, then build up their result using the CONSTRUCTOR. Also
note how rat-eq and rat-leq tear down their arguments using the
ACCESSORS, then apply the appropriate tests to the constituent parts.
These implementation details are hidden in the definitions, and users
do not have to know how they work.

----------------------------------------------------------------------
Now

   (rat-eq (make-rat 10 8) (make-rat 5 4)) => #t

and this is correct, since 10/8 = 5/4. But

   (numer (make-rat 10 8)) => 10
   (denom (make-rat 10 8)) => 8

Suppose we don't like this and want to represent rationals in lowest
terms. Doing this would allow us to save time in the equality test;
we could just compare numerators and denominators.

We could have rat-eq reduce to lowest terms. But that would be just
as inefficient. We could have rat-add, rat-mul, etc. do it after
every operation.
 * That's a lot of work.
 * What if we forget somewhere?

If we want to reduce to lowest terms, the right place to do it is in
make-rat, since that's the only place <rat>s get created. We only do
it once for each <rat> we create.

(define (make-rat n d)
  (if (and (number? n) (number?
d) (not (= d 0)))
      (let ((g (gcd n d)))
        (cons (/ n g) (/ d g)))
      (error "...")))

Note that this still satisfies the spec:

   (numer (make-rat x y))      x/g      x
   ----------------------  =  -----  =  ---
   (denom (make-rat x y))      y/g      y

Now

   (numer (make-rat 10 8)) => 5
   (denom (make-rat 10 8)) => 4

With the new definition of make-rat, we can always depend on <rat>s
being in lowest terms. You can even make it part of the specification
if you like. Then equality testing becomes simpler:

(define (rat-eq r1 r2)
  (and (= (numer r1) (numer r2))
       (= (denom r1) (denom r2))))

Or, we can just leave rat-eq as is for now and change it later if we
like, since the spec is still satisfied. The point is: ABSTRACTION
ALLOWED US TO MAKE THIS CHANGE EASILY.

----------------------------------------------------------------------
We could simplify rat-leq if we knew the denominator would always be
positive. Let's make it so. Again, the best place to make the change
is in make-rat:

(define (make-rat n d)
  (if (and (number? n) (number? d) (not (= d 0)))
      (let* ((n2 (if (> d 0) n (- n)))
             (d2 (if (> d 0) d (- d)))
             (g (gcd n2 d2)))
        (cons (/ n2 g) (/ d2 g)))
      (error "...")))

Now, since denominators are always guaranteed to be positive, we can
simplify rat-leq:

(define (rat-leq r1 r2)
  (let ((n1 (numer r1)) (d1 (denom r1))
        (n2 (numer r2)) (d2 (denom r2)))
    (<= (* n1 d2) (* n2 d1))))

We don't need to change anything else, because the spec is still
satisfied.

----------------------------------------------------------------------
Suppose that we had explicitly used `cons' instead of `make-rat'.
 * We would also have used `cons' for everything else Scheme uses it
   for -- which is a lot.
 * Then we'd have to go look at every single use of cons in the
   program,
    - see if it looks like a make-rat,
    - add a gcd computation.
 * We'd surely miss some, or get some that aren't make-rats, and it
   would be a COSMIC HORROR.

But ABSTRACTION made it easy to make these changes. The trick:
 * Build your program with layers of abstraction.
 * HIDE the implementation.
 * Then your life will be a LOT easier later when you need to change
   it. You just change it in one place.

Think of rat-add and rat-mul as if they were Scheme primitives --
they don't look any different, anyways -- and use 'em freely.

----------------------------------------------------------------------
Today's words and concepts:
 * Data Abstraction
 * contract/specification (WHAT) vs. implementation (HOW)
 * <pair>
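As a take-home exercise, the whole <rat> datatype above translates
almost line for line into Python. This is a sketch for illustration
(names follow the notes, with hyphens turned into underscores; the
representation is a plain tuple standing in for a cons cell), folding
in both refinements: lowest terms and a positive denominator, both
done once, in the constructor.

```python
from math import gcd

def make_rat(n, d):
    """Constructor: reduce to lowest terms; keep the denominator positive."""
    if not (isinstance(n, int) and isinstance(d, int) and d != 0):
        raise ValueError("make_rat expects integers with denom not zero")
    if d < 0:
        n, d = -n, -d
    g = gcd(n, d)
    return (n // g, d // g)

def numer(r): return r[0]
def denom(r): return r[1]

def rat_add(r1, r2):
    # Tear down with the accessors, build up with the constructor.
    return make_rat(numer(r1) * denom(r2) + numer(r2) * denom(r1),
                    denom(r1) * denom(r2))

def rat_mul(r1, r2):
    return make_rat(numer(r1) * numer(r2), denom(r1) * denom(r2))

def rat_eq(r1, r2):
    # Lowest terms + positive denominator make equality a direct comparison.
    return numer(r1) == numer(r2) and denom(r1) == denom(r2)

def rat_leq(r1, r2):
    # Denominators are positive, so cross-multiplying preserves <=.
    return numer(r1) * denom(r2) <= numer(r2) * denom(r1)

print(rat_add(make_rat(1, 2), make_rat(3, 4)))   # 1/2 + 3/4 = 5/4
print(rat_eq(make_rat(10, 8), make_rat(5, 4)))   # True
```

Only the constructor knows the normalization policy; rat_eq and
rat_leq get to be simple precisely because make_rat is the one place
<rat>s are created.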