Problem Set 2: Even More SML

Issued: September 10, 2002
Due: September 19, 1:00AM

Instructions: you will do this problem set by modifying the source files in ps2.zip and submitting the program that results. The program that you submit must compile without any warnings. Programs that do not compile or compile with warnings will receive an automatic zero.

Beginning with this project, you'll be using the SML Compilation Manager (CM) to compile all of your files together. It will also let you get at some of the libraries that were previously inaccessible. It's actually very easy, but the directions are a bit complicated at first. There are several things to do:

Make sure SML is running in the same directory as your code (in particular, file sources.cm);
If you're using SML inside Emacs, this should occur by default. Simply launch Emacs, open any of the SML files in the right directory (say, part1.sml, for example), and then start SML.
If you need to change the directory inside SML, do the following: OS.FileSys.chDir "C:\\Courses\\CS312\\PS2"; or whatever the path is to your PS 2 files. Note that on Windows you have to use double backslashes (or single forward slashes instead).
On some versions of Windows (95 or 98, probably ME too), for each project, you will need to open up the sources.cm file (in Emacs) and change the paths from smlnj-lib.cm and ml-yacc-lib.cm to "C:\\sml\\lib\\smlnj-lib.cm" and "C:\\sml\\lib\\ml-yacc-lib.cm", respectively. Do not forget to customize these path so that they correspond to your installation.
Although you can still copy/paste or use the Emacs menu's Send Region (or even Send Buffer) commands to test small functions, to fully compile your code you'll want to get in the habit of issuing the command: CM.make();. There are at least two reasons for this:
- We will test your code in this fashion, so you want to be sure it works (since failing to compile, or compiling with warnings, is cause for getting a zero under course policy).
- When you use CM, the compiler can give you better (sometimes much better!) error messages.
- To make this process efficient, CM often hides old Warnings when you repeatedly do CM.make(). To be sure your code has no warnings, remove the CM directory that will be created under ps2, and then re-issue CM.make()

Update 9/16

Clarifications on the order of arguments:

Part 1/Question 1: The first argument of insert is the key, while the second one is the value.
Part 3/Question 8: The first argument of match is the regular expression, the second one is the tested string.

Update 9/15

Clarification for insert and lookup: however you do this problem, your solution must satisfy the following invariants:

lookup "foo" (insert "foo" "bar" x) returns "bar" where x is a (string * string) list
lookup "foo" x returns raise Fail (exception) iff the key "foo" is not in x and x is a (string * string) list

Part 1: Dictionaries, Lists, and Polymorphic Fun

We often need a data structure to group related pairs of values. Such a data structure is called a dictionary. Operations on a dictionary include inserting a key/value pair and looking up the value that corresponds to a given key.

We can implement dictionaries in ML by using a list of pairs, i.e. an associative list. For the moment, assume that the keys and values are both strings.

To insert into a associative list, we can simply add a key/value pair to the head of the list. By appending to the head and by searching the list from its head toward its tail, we have an easy way to "overwrite" old entries with new ones. Write function insert with type string -> string -> (string * string) list -> (string * string) list that implements the insertion method described above.
Write function lookup with type string -> (string * string) list -> string. Given a key and an associative list, this function returns the value corresponding to that key. For example, given a phonebook, we can retrieve Garfield's phone number with expression lookup "Garfield" phonebook. If the key is not present in the list, your code should raise a Fail exception. (Yes, our phone numbers can contain dashes and special characters like * and #, so they can't be simple integers.)
Why restrict ourselves to strings? We can think of other kinds of key/value pairs. For example, we may want a data structure that maps Cornell ids to student names. Instead of rewriting the lookup function for each possible type of key/value pair, we will create its polymorphic version, polymorphicLookup, whose type will be 'a -> ('a * 'a -> bool) -> ('a * 'b) list -> 'b.

Here the function of type 'a * 'a -> bool is a comparison function that returns true if and only if both arguments are equal.
Use your polymorphicLookup to write function lookupCUID : int -> (int*string) list -> string that takes an id number and an associative list of (int * string) key/value pairs and returns the corresponding string.

Part 2: More Recursive Types

Consider the following declaration for nested lists:

datatype 'a llist = LList of 'a llist list | Elem of 'a

A nested list thus consists of an element of unspecified type, or of a list of nested lists. Consider some examples:

- List([]);
- Elem(3);
- LList([Elem(1), LList([Elem(2), LList([]), Elem(3)]), Elem(4)]);

Write function flatten that takes a nested list as its unique argument and returns a "flat" list of all elements in the nested list. For example: flatten(LList([Elem(1), LList([Elem(2), LList([]), Elem(3)]), Elem(4)])) returns [1, 2, 3, 4]. Note that the elements in the resulting list are in the same order as in the nested list. Your solution must have the same property.
Write function depth that takes a nested list as its unique argument and return the highest level of nesting in the list. For example: depth(Elem(5))= 0, depth(LList([])) = 1, depth(LList([Elem(1), LList([Elem(2), LList([]), Elem(3)]), Elem(4)])) = 3.
Write function cutOff that takes a non-negative integer n, and a nested list, and returns only those parts of the nested list that are at a level equal to or less than n in the original nested list. Thus we must have depth(cutOff(n, lst)) = Int.min(n, depth(lst)). Some examples:
```
cutOff(0, Elem(3)) = Elem(3)
cutOff(1, LList([Elem(1), LList([Elem(2)])])) = LList([Elem(1)])
cutOff(100, LList([Elem(1), LList([Elem(2)])])) = LList([Elem(1), LList([Elem(2)])])
```
Given the above, what should cutOff(0, List([])) return? Well, a nested list of depth 0, i.e. an Elem. But there is no obvious value we could use to build an Elem, and we can't easily pick one (say, 0), as the nested list is a polymorphic type. Rather, this case and all the analogous ones should end with the function indicating a failure. This can be achieved by evaluating the expression raise Fail "message" for all the cases identified as impossible.

Part 3: Pattern Matching & Regular Expressions

A regular expression is a string containing a mixture of ordinary characters and operators. There are only three operators: +, *, and ?. All the other characters are ordinary. Except for ?, operators must immediately follow an ordinary character. Operators do no appear in ordinary strings (i.e. in strings that are not regular expressions).

Regular expressions encode sets of strings; a given string matches the regular expression if and only if the respective string is in the set of strings denoted by the regular expression.

We define the meaning of regular expressions recursively:

An ordinary character matches only itself.
Operator ? matches any single character.
If operator + immediately follows character c, then regular expression c+ matches one or more occurences of character c.
If operator * immediately follows character c, then regular expression c* matches zero or more occurences of character c.
Given regular expressions r₁ and r₂, the set of strings that match regular expression r₁r₂ consists of all strings s such that string s₁ matches r₁, string s₂ matches r₂, and s = s₁ ^ s₂.

You can assume that all regular expressions are correct, and that ordinary strings do not contain any regular expression operators.

Consider the following examples:

Regular Expression	String	Matches?
"abc"	"abc"	yes
"a?c"	"abc"	yes
"ab"	"abc"	no
"a+b?c*"	"aaaabx"	yes
"a+b?c*"	"abqcccc"	yes
"a+b?c*"	"bx"	no

Write a function match that takes a regular expression and an ordinary string, and returns a boolean indicating whether the string matches the regular expression or not. Make sure that you do not forget to check degenerate cases!

Prepared by Tibor and Abhishek for CS312 Fall 2002.

Problem Set 2: Even More SML

Issued: September 10, 2002 Due: September 19, 1:00AM

Part 1: Dictionaries, Lists, and Polymorphic Fun

Part 2: More Recursive Types

Part 3: Pattern Matching & Regular Expressions

Issued: September 10, 2002
Due: September 19, 1:00AM