Recitation 4
Datatype pitfalls, polymorphism, lists

Datatype Constructors

(* Note that by convention, data constructors (Penny, Nickel, etc.)
 * start with Capital letters.  (There are a few exceptions such as
 * true and false.) *)

type coin = Penny | Nickel | Dime | Quarter;;

(* Note that by convention, variables (e.g., value, c) start
 * with lower-match letters.
 *)

let value(c:coin):float =
    match  c with
        Penny -> 0.01
      | Nickel -> 0.05
      | Dime -> 0.10
      | Quarter -> 0.25;;

(* What happens when we call this function?  Why does
 * OCaml complain that there are too many patterns?
 *)

let bad_value(c:coin):float =
    let  penny = Penny in
      match  c with
            penny -> 0.01
          | Nickel -> 0.05
          | Dime -> 0.10
          | Quarter -> 0.25;;

(* After all, isn't this equivalent to the following? *)
let bad_value2(c:coin):float =
    let penny = Penny in
	if  (c = penny) then 0.01
	else if  (c = Nickel) then  0.05
	else if  (c = Dime) then  0.10
	else if  (c = Quarter) then  0.25
       else raise (Failure("impossible!"));;

(* NO!!!!!!!!
 *
 * It's more like: *)

let bad_value2(c:coin):float =
    let penny = Penny in
      match  c with
	    random_variable_name -> 0.01
          | Nickel -> 0.05
          | Dime -> 0.10
          | Quarter -> 0.25;;


(* or even *)
let bad_value3(c:coin):float =
    let penny = Penny in
       match  c with
            _ -> 0.01
          | Nickel -> 0.05
          | Dime -> 0.10
          | Quarter -> 0.25;;


(* WHY?  In an expression of the form:
      match e with
        ...
      | id -> e2
        ...

   We have to be careful:  If id is a data constructor (e.g.,
   Penny), then we're comparing e to see if it has the same value
   as the data constructor.  But if id is not a data constructor
   (e.g., penny -- even if it's value is a data constructor),
   then we're declaring a NEW, FRESH, variable and binding it to
   the value of e so that e2 can refer to it via id.  So any
   patterns below the identifier are redundant, because this
   match will never fail.

   Moral:  pattern matching sometimes USES identifiers (namely
   those that are data constructors) and sometimes DECLARES
   or BINDS identifiers (namely those that are not data
   constructors.) This is why it's a good idea when writing
   variables in a pattern to add ": type" to establish clearly
   that this is intended to be a binding occurrence.

   Similarly, id is bound (i.e, declared) in the declaration:

       let id = e    (i.e., a value declaration)

   or
       let id(...):t = e    (i.e., a function declaration)

   or
       let f(...,id:t,...):t = e   (i.e., a function argument)

   In contrast, for an expression of the form:
       if (e1 = id) then ...

   or
       id + id

   or just about anything else, the occurrence of the identifier
   id is a USE or FREE occurrence that evaluates to the value bound
   to the nearest binding of id, be it from a function argument,
   function declaration, value declaration, or pattern match.

   That is, OCaml does not look to see that penny was previously
   declared and bound to Penny in the pattern match.  So why
   not?  After all, it can test to see if "c" is equal to
   the data constructors Nickel, Dime, Quarter, etc.  Why
   can't it check to see if c is equal to penny or
   random_variable_name?

   Short answer:  the designers could've done things this
   way (Prolog tries to), but there are good reasons why they
   didn't. 

   Longer answer:

   (1) We'd need some way to distinguish in patterns when we
       want to USE the value of a variable instead of BINDING
       the variable to a value.
   (2) By limiting the expressive power of tests, we can check
       for exhaustiveness or overlapping patterns.  If we allow
       arbitrary tests, then we can't do that (remember the if-example?)
   (3) It's not clear what equals should do on some types.
       Testing the (mathematical) equality of two functions
       is impossible, as we'll see later in the course.
   (4) By limiting the tests, we enable certain compiler
       optimizations.  In particular, an ML compiler will
       avoid testing something more than once for a given
       set of patterns.
*)

(******************************************)
(* Scope, definitions, and uses           *)
(******************************************)

(* Q:  what value does the following function yield when
       we pass it two arguments, say (2,3)?
*)

(*1*) let f((x:int),(y:int)):int =
(*2*)       let x = y in
(*3*)       let y = x in
(*4*)       let (y,x) = (x,y*y) in
(*5*)               match (y,x) with
(*6*)                   (x,1) -> 0
(*7*)                 | (x,y) -> x;;

(* Figuring this out isn't easy:  Let's refer to the different
 * occurrences of x and y by their line numbers.  So x.1 means
 * the variable x occurring on line 1.
 *
 * (1) introduces three new variables:  f.1,x.1, and y.1. 
 *     Initially, x.1 = 2, and y.1 = 3.
 * (2) introduces a new variable x.2 (different from x.1)
 *     and binds it to the value that was in y.1 (namely 3).
 * (3) introduces a new variable y.3 and binds it to the value that was
 *     in x.2 (namely 3).
 * (4) introduces two new variables y.4 and x.4 and binds them
 *     to the values x.2 (= 3) and y.3 * y.3 (= 3 * 3 = 9).
 * (5) does a pattern match on the tuple (y.4,x.4) = (3,9).
 * (6) introduces a new variable x.6 within the pattern (x,~1) and
 *     attempts to match this pattern against (3,9).  The x.7 matches
 *     the 3, but the 9 fails to match ~1.  So we proceed to the next
 *     match.
 *)

Using Polymorphism

The list type

Because lists are so useful, ML provides some a builtin parameterized list type called list. It acts just like the List type that we defined in lecture except that the names of the constructors are changed. The constructor nil makes an empty list (compare to Nil) and the constructor :: builds a new list by prepending a first element to another list (compare to Cons). Thus, list could be declared as:

type 'a list = [] | :: of 'a * 'a list;;

The constructor :: is an infix operator, which is notationally convenient. The OCaml interpreter knows how to print out lists nicely as well. The empty list is printed as [], and non-empty lists are printed using brackets with comma-separated items. In fact, these forms may be used to write lists as well. Note that nil is a polymorphic value; it is the empty list for all types T list. In fact, it is given the polymorphic type 'a list. Here are some examples that show how lists work:

[];;
- : 'a list = []
let it=2::[];;
val it : int list = [2]
let both = 1::it;;
val both : int list = [1; 2]
let both2=
match both with
  x :: lst -> lst 
| [] -> [];;
val both2 : int list = [2]
let both3=
match both2 with
  x :: lst -> lst 
| [] -> [];;
(* we don't "recover polymorphism" here; it would be unsafe in general *)
val both3 : int list = []
both = 1::2::[];; 
(* we can test lists for equality if we can test their elements *)
- : bool = true
match both with
  [x; y] -> x + y (* we can use bracket notation for patterns too. *)
| _ -> 0;;
- : int = 3
[[]];;
- : 'a list list = [[]]

Just like with types, we have to make sure that we write exhaustive patterns when using match:

match ["hello"; "goodbye"] with
s :: _ -> s ^ " hello";;
Warning P: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
[]

Built-in lists come with lots of useful predefined OCaml Library functions, such as the following and many more:

val List.length : 'a list -> int
val @ : ('a list * 'a list) -> 'a list		(* append two lists *)
val List.hd : 'a list -> 'a
val List.tl : 'a list -> 'a list
val List.nth : ('a list * int) -> 'a

Of course, all of these functions could also be easily implemented for the List type that we defined ourselves!

Multiple type parameters

We saw two related features of OCaml in class: the ability to produce polymorphic values whose type mentions a type variable and the ability to parameterize types with respect to an arbitrary type variable. As we have seen, polymorphic values are typically function values but other polymorphic values exist, such as nil (and also Nil, as we defined it). Datatypes can actually be parameterized with respect to multiple type parameters; for example the following type, ortype, is a type-level function that accepts a pair of types and yields a new type:

type ('a, 'b) ortype = Left of 'a | Right of 'b | Both of 'a * 'b;;
Left(2);;
- : (int, 'a) ortype = Left 2
Right("hi");;
- : ('a, string) ortype = Right "hi"
Both(true, "a");;
- : (bool, string) ortype = Both (true, "a")

Note that the values Left(2) and Right("hi") are still polymorphic with respect to one type! Note also that ML always starts counting type variables from 'a, hence the - : (int, 'a) ortype rather than - : (int, 'b) ortype in the match for Left(2) above.

The option parameterized type

Another important standard parameterized type is option, which represents the possible presence of a value. It is defined as follows:

type 'a option = Some of 'a | None;;