A0: Warmup
Welcome to the first assignment in CS 3110!
This assignment is a warmup: it’s some gentle practice that’s meant to prepare you for the rest of the assignments, which will be more intense. It will also give you the opportunity to verify that your OCaml environment is set up correctly, and to experience the 3110 assignment workflow.
Please track the time that you spend working on this assignment. I ask you to report it in your submission, and I will make a statistical summary available afterwards. On last year’s warmup assignment, the mean was 6.5 hours, and the standard deviation was 2.5 hours. Although the problems are different this year, I estimate that the difficulty is similar.
Collaboration policy: This assignment is to be completed as an individual. You may have only limited collaboration with other students as described in the syllabus.
What you’ll do: Implement a few functions; document them; and test them with OUnit.
Objectives:
- Use OCaml operators;
let
,if
, andmatch
expressions; functions; recursion; and lists. - Write OUnit test cases.
- Write specification comments in OCaml.
- Work on good OCaml style.
Lectures covered: This assignment relies on material from weeks 1 and 2.
FAQs
-
Q: May I change the names, types, or specifications of required functions?
A: No. -
Q: May I add or remove the
rec
keyword from a function definition without changing the function’s type?
A: Yes. -
Q: May I add helper functions?
A: Yes. -
Q: May I use standard library functions—for example, the
List
module?
A: Yes. -
Q: May I use some additional library that would require changing the Makefile or tags file?
A: No. -
Q: May I use loops?
A: No. -
Q: May I use imperative features?
A: No. See the Imperative Features Prohibitions page linked from the Canvas front page. -
Q: Why am I getting “Unbound module” errors in VS Code?
A: Make sure you have runmake build
and you have the.merlin
file from the release code in your source directory.
Getting Started
Download the release code from CMS. Watch the A0 GIST in Canvas for a demo of how to get the code into Ugclinux by dragging and dropping it into the Explorer pane in VS Code. Or, if you’re using a local installation, just put the code somewhere on your own local drive.
Run make check
from the folder containing the release code. You should get
output that includes the following (other lines might be interspersed):
===========================================================
Your OCaml environment looks good to me. Congratulations!
===========================================================
Your function names and types look good to me.
Congratulations!
===========================================================
If instead you get warnings about your OCaml environment, seek help immediately
from a consultant. Assuming your environment is good, you should not get
warnings about about your function names and types. After completing each
function, you should re-run make check
to ensure that you haven’t changed any
required names or types.
There are a few other targets provided in the release code:
- Running
make
will open the toplevel and automatically load your code. That should be useful for any interactive testing you want to do. - Running
make test
will build and non-interactively run your OUnit test suite. - Running
make clean
will delete files generated from your source code. - Running
make finalcheck
will do the same asmake check
as well as two other things, which are described under the submission instructions at the end of this handout.
Academic Integrity
Please review and complete the Academic Integrity statement provided in a
comment at the top of the release code file warmup.ml
.
Problem 1: I No Longer Know What Day It Is
Define a function day_of_week : int -> int -> int -> int
that takes integers
year
, month
, and day
, in that order, as input; and outputs the day of the
week on which that date falls.
-
Use the ISO 8601 weekday number standard to represent the day of the week: Monday is 1, Tuesday is 2, …, Sunday is 7.
-
Also per ISO 8601, assume that the proleptic Gregorian calendar is in use. That is, don’t bother thinking about whether the Gregorian calendar was actually in use on the input date—which anyway would depend in part on the location in the world. Just assume that the Gregorian calendar always has and always will be in use, everywhere.
-
The behavior of the function is unspecified if the input date is not a valid date in the proleptic Gregorian calendar. Valid dates have a year that is strictly greater than 0, a month in the range 1..12, representing Jan..Dec; and a day in the range 1..{28,29,30,31}, depending on what the number of days is in the year and month. Since the behavior is unspecified, your implementation is free to do anything you like on an invalid input—for example, raise an exception with a helpful error message, return a bogus output, or even set the computer on fire. 🔥🔥🔥 Though, truth be told, the graders would prefer you didn’t set their laptops on fire. Note that “asserting the precondition” is helpful but not required. See section 2.4.2 of the textbook for details.
-
Use Zeller’s congruence to solve this problem. Be careful about coding the computions correctly, especially the modulo operations.
Specification: [UPDATED 09/09/20] Before writing any code, read the provided specification comment for the function. Please do not change it. Note the precondition: it requires validity. Also, Python programmers take note: there is no precondition about the types of values; section 2.3.8 of the textbook explains why such a precondition would be unidiomatic in statically-typed languages (e.g., Java and OCaml).
Testing: File warmup_test.ml
has two OUnit test cases to get you started.
Make sure your solution passes those, without any modification to them. Add
eight more test cases, for a total of ten.
-
Section 3.1.3 of the textbook covers the mechanics of unit testing with OUnit.
-
If you want to test for an exception being raised, section 3.2.5 covers that, but it’s getting a little ahead of where we are so far in the course. There’s no real need to test for exceptions on this assignment if you require validity of the date as a precondition.
-
Later in the semester we’ll learn about more principled approaches to testing; for now, this “big round number of 10” is just to get practice with writing OUnit test cases.
-
If you’re having trouble inventing test cases for dates far away from today, consider cross-validating against another language and library you already know. For example, both Python and Java have libraries for computing this same function. With a little research, you should be able to find them.
One of the big differences between CS 2110 and 3110 is that, starting now, development of test suites becomes your responsibility. It was a luxury that they were provided for you in 2110. So please take this part of the assignment seriously.
Code quality: After you finish testing, take extra time to make your code as
readable and elegant as you can. See the code quality rubric at the end of this
handout for suggestions. For this problem, concentrate especially on (i) using
whitespace and indentation well, (ii) using let
and if
expressions to
organize the computation, and (iii) using parentheses only where necessary. If
you’re not sure whether parentheses are necessary, try removing them and
re-running your test suite.
Problem 2: How Do I Live Without Loops?
Sigma notation, credited to Leonhard Euler, compactly represents long arithmetic expressions. For example,
\[
\sum_{n=1}^{10} n^2
\]
represents
\[ 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 8^2 + 9^2 + 10^2. \]
To an imperative programmer, sigma notation immediately suggests a loop. For example, a Java programmer could implement
\[ \sum_{n=a}^{b} f(n) \]
as
int sum= 0;
int n= a;
while (n <= b) {
sum+= f(n);
n++;
}
But to a functional programmer, sigma notation suggests a recursive definition:
\[
\begin{align}
\sum_{n=a}^{b} f(n) & = f(a) + \sum_{n=a+1}^{b} f(n) & & \text{if}~a \leq b \\
& = 0 & & \text{otherwise}
\end{align}
\]
Problem 2.1: Implement a function
sigma : (int -> int) -> int -> int -> int
that reduces an integer summation to
its final numeric value, such that sigma f a b
is \(\sum_{n=a}^{b} f(n)\).
\(\blacksquare\)
Pi notation is like sigma notation, but it represents a product rather than a sum. For example, a Java programmer could implement
\[ \prod_{n=a}^{b} f(n) \]
as
int prod= 1;
int n= a;
while (n <= b) {
prod*= f(n);
n++;
}
Note that the product over an empty range is 1, as suggested by the Java implementation above. The range is the set of values \(a..b\). In mathematics, any repeated operation like \(\Sigma\) or \(\Pi\) based on a binary operator is defined to be the identity element of that operator when taken over an empty range. The identity element is the value that causes the operator to leave its other operand unchanged. So the identity element of \(+\) is 0, and the identity element of \(\times\) is 1.
Problem 2.2: Implement a function pi : (int -> int) -> int -> int -> int
that reduces an integer product to its final numeric value, such that pi f a b
is \(\prod_{n=a}^{b} f(n)\).
\(\blacksquare\)
Your implementations of sigma
and pi
should be nearly identical. Let’s
factor out their commonality into a single function reduce
:
let rec reduce inc test op base f x =
if test x then base
else op (f x) (reduce inc test op base f (inc x))
You may not change the provided implementation of reduce
.
Problem 2.3: Write a specification comment for reduce
. Imagine that you
were trying to explain the function to a friend. What would you say? Write that
down in the comment. Then, re-implement sigma
and pi
as sigma_r
and
pi_r
, with the same types, using just a single application of reduce
in
each. You should not use the rec
keyword in the definitions of sigma_r
and
pi_r
; all the recursion should be in reduce
. Your two new function bodies
should contain just a single line of code each. Hint: make use of the OCaml
syntax for treating infix operators as functions, e.g., ( + )
.
\(\blacksquare\)
The implementation of reduce
that we gave above is not tail recursive. That
means your implementation of sigma_r
is going to struggle with sums over
large ranges:
# sigma_r (fun x -> x) 0 1_000_000;;
Stack overflow during evaluation (looping recursion?).
Problem 2.4: Re-implement reduce
as a tail-recursive function reduce_tr
,
with the same type. Re-implement sigma_r
and pi_r
as sigma_tr
and
pi_tr
; the only change should be replacing identifier reduce
with
reduce_tr
.
The tail-recursive version might end up applying op
in a different order. For
example, computing
\[ \sum_{n=1}^{2} n \]
could cause sigma_r
to evaluate 1 + (2 + 0)
, whereas sigma_tr
might
instead evaluate 2 + (1 + 0)
. This is fine. You may assume that op
is commutative and associative, and that base
is an identity element w.r.t.
op
.
[UPDATE 09/14/20] You might decide to use a helper function that takes an
additional accumulator argument to implement reduce_tr
. If so, please nest the
definition of that helper inside the definition of reduce_tr
. It will be good
practice, and more importantly it will enable you to write an elegant helper
function that does not take any unnecessary arguments. Hint: it needs only
two. Note that in the future we generally won’t care in this course whether you
use let
expressions to nest the definitions of helper functions; this
situation is special in that we’re trying to give a little more guidance as you
get used to the language. If you can find a way to code this without a helper
function, that’s fine too.
\(\blacksquare\)
Now you will be able to compute sums over larger ranges:
# sigma_tr (fun x -> x) 0 1_000_000;;
- : int = 500000500000
Of course, sums and products of large numbers will still be problematic because of the limited range of integers, but that’s not your fault.
Specification: The only specification comment we will grade in this problem
is the one you wrote for reduce
. Your comment needs to explain each input, as
well as the output.
Testing: Include at least two test cases each for sigma
and pi
, testing
the base case and the recursive case for each. Include at least one test case
each for sigma_r
, pi_r
, sigma_tr
, and pi_tr
. For the tail-recursive
functions, those test cases need to test on large ranges that would overflow the
stack on the non-tail-recursive implementations—e.g., a million elements.
In total, that’s at least eight test cases.
Code quality: Again, after you finish testing, take extra time to make your code as readable and elegant as you can. Focus on the same aspects that are listed above in the first problem.
Problem 3: So Simple A Child Could Do It
Martha and Mike (M&M) are picking teams of Managing Vice Provosts (MVPs) for the annual Administrivia Tournament. MVPs are rated by their Bureaucratic Aptitude for Endowment Raising (BEAR). Because M&M are so fair-minded, they want to form teams whose total BEAR scores are as equal as possible. How to do it?
Martha, being a computer scientist, recognizes this task as the partition problem. (You might too, if you took CS 2110 in Spring 2020. It was covered as an example of a recursive backtracking algorithm.) Solving the problem perfectly is computationally expensive, seemingly requiring an exponential-time algorithm. But there is a simple polynomial-time algorithm that is easy to describe and implement, even though it does not produce perfect solutions. Children around the world know this algorithm. For that, among other even more compelling reasons, the partition problem has been called “the easiest hard problem.”
We’ll state the partition problem more carefully as follows. You are given a bag \(b\) of strictly positive integers.
A bag (aka multiset) is like a set, in that order does not matter. But unlike a set, bags may contain multiple copies of an element. So, {1, 1, 2, 3} is a bag but not a set, and it is considered to be the same bag as {3, 2, 1, 1}.
You want to partition \(b\) into \(n\) non-empty bags \(b_1, \ldots, b_n\) whose sums are close to one another. In particular, let \(s_i\) be the sum of the integers in \(b_i\), let \(M\) be largest element of \(\{s_i \mathrel{|} i \in 1..n \}\), and let \(m\) be the smallest. You want to make \(M-m\), which is called the discrepancy, as small as possible. For example, if \(n\) is 2, you could partition {8, 7, 6, 4} into
-
two bags {8, 7} {6, 4} with a discrepancy of 5, or
-
two bags {8, 4} {7, 6} with a discrepancy of 1 (which is the best achievable for the given bag),
-
and so forth.
Or if \(n\) is 3, you could partition it into three bags {8} {7} {6, 4} with a discrepancy of 3, and so forth.
The polynomial-time algorithm is simple. Start with \(n\) empty bags. Sort the input bag \(b\) in descending order. Remove the largest element and place it in the bag with the smallest sum (or any such bag, if there is a tie for the smallest sum). Keep doing that until the input bag is empty. If \(n\) is 2 and the input bag is {4, 8, 7, 6}, the algorithm would:
-
initialize two empty bags
-
sort {4, 8, 7, 6} into the sequence 8, 7, 6, 4.
-
put 8 in the first bag
-
put 7 in the second bag
-
put 6 in the second bag
-
put 4 in the first bag
That yields output bags {4, 8} and {6, 7}, with a discrepancy of 1. A discrepancy of 0 isn’t achievable, because the sum of the input bag is odd.
This algorithm is an example of a greedy algorithm, which is a class of algorithms that make locally-optimal choices at the risk of not finding a globally-optimal solution. Dijkstra’s algorithm, which you might recall from CS 2110, is a greedy algorithm that always finds a globally-optimal solution. But the algorithm we just gave does not necessarily find globally-optimal solutions. For example, when asked to partition input {8, 7, 6, 5, 4} into two bags, the algorithm outputs {8, 5, 4} {7, 6} or {8, 5} {7, 6, 4} with a discrepancy of 4. But there is a perfect solution with a discrepancy of 0, which is {8, 7} {6, 5, 4}.
Imperfect though it may be, the greedy algorithm for the partition problem is nonetheless easier to understand than more optimal algorithms. And on a slight variant of the problem, the greedy algorithm is provably not far away from optimal [Graham 1969, Theorem 2].
Problem 3.1: Implement a function
partition : int list -> int list * int list
that partitions its input bag into
two output bags using the greedy algorithm above. Despite the simplicity of the
algorithm, implementing it is not child’s play.
-
We use lists here to represent bags, and a pair to represent the two output bags. Of course, since order matters in a list and in a pair, that complicates testing. In grading of your solution, we will ignore the order of elements in lists and the order of components in the pair. That makes the course staff’s job harder; but yours, easier.
-
If there’s a tie in which bag you could put an element into, we’ll accept either as a correct output.
-
As a precondition, you may require that the input bag has at least 2 elements.
-
Use
List.sort
to handle the sorting of the input bag. Code your own comparison function to sort in descending order. -
A few points in the Excellent Scope will be allocated to testing that your solution can handle inputs of up to 1 million elements in fewer than 10 seconds without a stack overflow. We’ve provided an operator
(--)
for generating large lists. Concrete execution time bounds are admittedly somewhat meaningless, because some people have slower or faster machines. But 10 seconds is more than generous here; 2 seconds is probably more accurate. If in doubt about your own machine’s speed, try running your code on ugclinux. If your code isn’t fast enough, think carefully about whether it is needlessly re-computing any values (e.g.,the sums of the output bags) at each recursive call. -
If it’s obvious to you how to solve the next subproblem,
multi_partition
, you could implementpartition
completely in terms of it by reversing the order of their definitions inwarmup.ml
. But in general we recommend solvingpartition
first to get some insight and practice, then moving on tomulti_partition
afterwards. It’s fine for your solutions to the two problems be separate pieces of code: you don’t need to re-implementpartition
in terms ofmulti_partition
.
\(\blacksquare\)
Problem 3.2: Implement a function
multi_partition : int -> int list -> int list list
, such that
multi_partition n b
partitions input bag b
into n
output bags. The length
of the output list should therefore be n
. Each element of that output list is
itself a bag.
-
Really we should think of the output list as itself being a bag, because order does not matter in it. So in grading your solution we will again ignore order there.
-
As preconditions, you may require that
n
is at least 1, and that the input bag contains at leastn
elements. -
For performance-testing purposes, you may assume that
n
will be 1000 or less. We could use a heap data structure (or something similar) to eliminate that assumption, but we haven’t covered enough OCaml yet. The input list could still be up to a million elements.
\(\blacksquare\)
Specification: We will not grade any specification comments in this problem.
Testing: Include at least five test cases each for partition
and
multi_partition
. Make sure to test multi_partition
on some n
that are
not 2. The “Easiest Hard Problem” paper mentioned above has
some examples you could use as test cases if you like.
We won’t grade whether you test on some really large input bags, though our grading suite will include a few such tests for the Excellent Scope. But if you’re interested… Testing on really large input bags could be tedious, because you would seemingly need to hard-code the correct outputs. Instead, try testing whether the output is a partition of the input. That can be done with much less code. This is an example of what’s called property-based testing: checking whether the output has some property, rather than what the exact output is.
Code quality: As always, take extra time to make your code as readable and
elegant as you can. Focus on the same aspects that are listed above in the first
problem. Now you will have match
expressions to beautify as well. Although the
compiler will sometimes permit the omission of parentheses around pairs, you
should leave them in: stylistically, they are not considered unnecessary except
in a couple very specific circumstances. See the
OCaml Community guidelines under the heading “How to write pairs” for
details.
Rubric
- 25 points: submitted and compiles
- 25 points: satisfactory scope
- 25 points: good scope
- 5 points: excellent scope
- 10 points: testing
- 10 points: code quality (including specifications)
Scope Rubric. Your solution earns points by passing the grading suite’s OUnit test cases on the following functions:
- Satisfactory:
day_of_week
,sigma
,pi
. - Good:
sigma_r
,pi_r
,reduce_tr
,sigma_tr
,pi_tr
,partition
- Excellent:
multi_partition
, performance ofpartition
andmulti_partition
on large inputs
Each test case is worth some number of points, so it’s definitely possible to get partial credit on individual functions that way. Sorry, but we can’t provide a more detailed breakdown in advance.
Testing Rubric. Later in the semester we will focus much more on developing
test suites. For this first assignment, though, you just need to provide the
test cases required above. We will grade your test cases independently of
whether you’ve finished a level of scope, which is to say you can get (or lose)
points for testing even if you don’t finish the functions they test. For
example, if you don’t write multi_partition
or test cases for it, you will
lose points both from the excellent scope autograder and from the manual
grading of your test cases. But you could provide those test cases and get
points for them even if you don’t write the function itself. Part of the reason
we grade this way is to incentivize you to write test cases before writing
(or finishing) a function.
Code Quality Rubric. There are many aspects of quality that make code easier to read. We will focus on just a few for this first assignment. The graders will assess the following aspects of your source code:
- The programmer’s name and NetID are supplied at top of file.
- The specification comments described above are provided.
- White space (the space character and empty lines) is used effectively.
- Indentation of
let
,if
, andmatch
follows the OCaml Community Guidelines. - Identifier names are clear and consistent. They use
snake_case
, notcamelCase
. See “How to choose identifiers” and following paragraphs in the Community Guidelines. - Parentheses are used only where necessary. See “when to use parentheses within an expression” in the Community Guidelines for some hints.
- No line is wider than 80 columns. Not only does the OCaml community agree on that, so does Google.
- No tabs characters (ASCII character 9) are used, only spaces. Note that the tab character and the tab key are not the same thing. The configuration instructions we gave for VS Code set up the tab key to enter two spaces.
Submission
Set the hours_worked
variable at the end of warmup.ml
to the number
of hours you spent working on this assignment, rounded to the nearest
integer. Please don’t include the time that you spent installing
OCaml.
Ensure that your solution passes make finalcheck
. That will do the
same as make check
, as well as two other things. First, it will check
to make sure you’ve set hours_worked
. Second, it will compute a short
“fingerprint” of your submission called an MD5 sum.
If you’re not working on a local install, you will need to get your code back on your own machine to upload it from your browser. Right click on the file in the VS Code Explorer pane, choose Download, and save it on your local drive. Don’t drag and drop; that can result in an empty file (which you will be able to detect by looking in it).
Submit your warmup.ml
and warmup_test.ml
files on CMS. Double-check
that the MD5 sums are what you expected. Re-download your submission from CMS
and double-check before the deadline that the contents of the files are what you
intended.
Congratulations! Your warmup is complete!