A0: Warmup

Olympic Hockey Pictogram

Welcome to the first assignment in CS 3110!

This assignment is a warmup: it’s some gentle practice that’s meant to prepare you for the rest of the assignments, which will be more intense. It will also give you the opportunity to verify that your OCaml environment is set up correctly, and to experience the 3110 assignment workflow.

Please track the time that you spend working on this assignment. I ask you to report it in your submission, and I will make a statistical summary available afterwards. On last year’s warmup assignment, the mean was 6.5 hours, and the standard deviation was 2.5 hours. Although the problems are different this year, I estimate that the difficulty is similar.

Collaboration policy: This assignment is to be completed as an individual. You may have only limited collaboration with other students as described in the syllabus.

What you’ll do: Implement a few functions; document them; and test them with OUnit.

Objectives:

Lectures covered: This assignment relies on material from weeks 1 and 2.

FAQs

Getting Started

Download the release code from CMS. Watch the A0 GIST in Canvas for a demo of how to get the code into Ugclinux by dragging and dropping it into the Explorer pane in VS Code. Or, if you’re using a local installation, just put the code somewhere on your own local drive.

Run make check from the folder containing the release code. You should get output that includes the following (other lines might be interspersed):

===========================================================
Your OCaml environment looks good to me.  Congratulations!
===========================================================
Your function names and types look good to me.
Congratulations!
===========================================================

If instead you get warnings about your OCaml environment, seek help immediately from a consultant. Assuming your environment is good, you should not get warnings about about your function names and types. After completing each function, you should re-run make check to ensure that you haven’t changed any required names or types.

There are a few other targets provided in the release code:

Academic Integrity

Please review and complete the Academic Integrity statement provided in a comment at the top of the release code file warmup.ml.

Problem 1: I No Longer Know What Day It Is

Define a function day_of_week : int -> int -> int -> int that takes integers year, month, and day, in that order, as input; and outputs the day of the week on which that date falls.

Specification: [UPDATED 09/09/20] Before writing any code, read the provided specification comment for the function. Please do not change it. Note the precondition: it requires validity. Also, Python programmers take note: there is no precondition about the types of values; section 2.3.8 of the textbook explains why such a precondition would be unidiomatic in statically-typed languages (e.g., Java and OCaml).

Testing: File warmup_test.ml has two OUnit test cases to get you started. Make sure your solution passes those, without any modification to them. Add eight more test cases, for a total of ten.

One of the big differences between CS 2110 and 3110 is that, starting now, development of test suites becomes your responsibility. It was a luxury that they were provided for you in 2110. So please take this part of the assignment seriously.

Code quality: After you finish testing, take extra time to make your code as readable and elegant as you can. See the code quality rubric at the end of this handout for suggestions. For this problem, concentrate especially on (i) using whitespace and indentation well, (ii) using let and if expressions to organize the computation, and (iii) using parentheses only where necessary. If you’re not sure whether parentheses are necessary, try removing them and re-running your test suite.

Problem 2: How Do I Live Without Loops?

Sigma notation, credited to Leonhard Euler, compactly represents long arithmetic expressions. For example,

\[ \sum_{n=1}^{10} n^2
\]

represents

\[ 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 8^2 + 9^2 + 10^2. \]

To an imperative programmer, sigma notation immediately suggests a loop. For example, a Java programmer could implement

\[ \sum_{n=a}^{b} f(n) \]

as

int sum= 0;
int n= a;
while (n <= b) {
  sum+= f(n);
  n++;
}

But to a functional programmer, sigma notation suggests a recursive definition:

\[ \begin{align} \sum_{n=a}^{b} f(n) & = f(a) + \sum_{n=a+1}^{b} f(n) & & \text{if}~a \leq b \\
& = 0 & & \text{otherwise} \end{align} \]

Problem 2.1: Implement a function sigma : (int -> int) -> int -> int -> int that reduces an integer summation to its final numeric value, such that sigma f a b is \(\sum_{n=a}^{b} f(n)\).

\(\blacksquare\)

Pi notation is like sigma notation, but it represents a product rather than a sum. For example, a Java programmer could implement

\[ \prod_{n=a}^{b} f(n) \]

as

int prod= 1;
int n= a;
while (n <= b) {
  prod*= f(n);
  n++;
}

Note that the product over an empty range is 1, as suggested by the Java implementation above. The range is the set of values \(a..b\). In mathematics, any repeated operation like \(\Sigma\) or \(\Pi\) based on a binary operator is defined to be the identity element of that operator when taken over an empty range. The identity element is the value that causes the operator to leave its other operand unchanged. So the identity element of \(+\) is 0, and the identity element of \(\times\) is 1.

Problem 2.2: Implement a function pi : (int -> int) -> int -> int -> int that reduces an integer product to its final numeric value, such that pi f a b is \(\prod_{n=a}^{b} f(n)\).

\(\blacksquare\)

Your implementations of sigma and pi should be nearly identical. Let’s factor out their commonality into a single function reduce:

let rec reduce inc test op base f x =
  if test x then base
  else op (f x) (reduce inc test op base f (inc x))

You may not change the provided implementation of reduce.

Problem 2.3: Write a specification comment for reduce. Imagine that you were trying to explain the function to a friend. What would you say? Write that down in the comment. Then, re-implement sigma and pi as sigma_r and pi_r, with the same types, using just a single application of reduce in each. You should not use the rec keyword in the definitions of sigma_r and pi_r; all the recursion should be in reduce. Your two new function bodies should contain just a single line of code each. Hint: make use of the OCaml syntax for treating infix operators as functions, e.g., ( + ).

\(\blacksquare\)

The implementation of reduce that we gave above is not tail recursive. That means your implementation of sigma_r is going to struggle with sums over large ranges:

# sigma_r (fun x -> x) 0 1_000_000;;
Stack overflow during evaluation (looping recursion?).

Problem 2.4: Re-implement reduce as a tail-recursive function reduce_tr, with the same type. Re-implement sigma_r and pi_r as sigma_tr and pi_tr; the only change should be replacing identifier reduce with reduce_tr.

The tail-recursive version might end up applying op in a different order. For example, computing

\[ \sum_{n=1}^{2} n \]

could cause sigma_r to evaluate 1 + (2 + 0), whereas sigma_tr might instead evaluate 2 + (1 + 0). This is fine. You may assume that op is commutative and associative, and that base is an identity element w.r.t. op.

[UPDATE 09/14/20] You might decide to use a helper function that takes an additional accumulator argument to implement reduce_tr. If so, please nest the definition of that helper inside the definition of reduce_tr. It will be good practice, and more importantly it will enable you to write an elegant helper function that does not take any unnecessary arguments. Hint: it needs only two. Note that in the future we generally won’t care in this course whether you use let expressions to nest the definitions of helper functions; this situation is special in that we’re trying to give a little more guidance as you get used to the language. If you can find a way to code this without a helper function, that’s fine too.

\(\blacksquare\)

Now you will be able to compute sums over larger ranges:

# sigma_tr (fun x -> x) 0 1_000_000;;
- : int = 500000500000

Of course, sums and products of large numbers will still be problematic because of the limited range of integers, but that’s not your fault.

Specification: The only specification comment we will grade in this problem is the one you wrote for reduce. Your comment needs to explain each input, as well as the output.

Testing: Include at least two test cases each for sigma and pi, testing the base case and the recursive case for each. Include at least one test case each for sigma_r, pi_r, sigma_tr, and pi_tr. For the tail-recursive functions, those test cases need to test on large ranges that would overflow the stack on the non-tail-recursive implementations—e.g., a million elements. In total, that’s at least eight test cases.

Code quality: Again, after you finish testing, take extra time to make your code as readable and elegant as you can. Focus on the same aspects that are listed above in the first problem.

Problem 3: So Simple A Child Could Do It

Martha and Mike (M&M) are picking teams of Managing Vice Provosts (MVPs) for the annual Administrivia Tournament. MVPs are rated by their Bureaucratic Aptitude for Endowment Raising (BEAR). Because M&M are so fair-minded, they want to form teams whose total BEAR scores are as equal as possible. How to do it?

Martha, being a computer scientist, recognizes this task as the partition problem. (You might too, if you took CS 2110 in Spring 2020. It was covered as an example of a recursive backtracking algorithm.) Solving the problem perfectly is computationally expensive, seemingly requiring an exponential-time algorithm. But there is a simple polynomial-time algorithm that is easy to describe and implement, even though it does not produce perfect solutions. Children around the world know this algorithm. For that, among other even more compelling reasons, the partition problem has been called “the easiest hard problem.”

We’ll state the partition problem more carefully as follows. You are given a bag \(b\) of strictly positive integers.

A bag (aka multiset) is like a set, in that order does not matter. But unlike a set, bags may contain multiple copies of an element. So, {1, 1, 2, 3} is a bag but not a set, and it is considered to be the same bag as {3, 2, 1, 1}.

You want to partition \(b\) into \(n\) non-empty bags \(b_1, \ldots, b_n\) whose sums are close to one another. In particular, let \(s_i\) be the sum of the integers in \(b_i\), let \(M\) be largest element of \(\{s_i \mathrel{|} i \in 1..n \}\), and let \(m\) be the smallest. You want to make \(M-m\), which is called the discrepancy, as small as possible. For example, if \(n\) is 2, you could partition {8, 7, 6, 4} into

Or if \(n\) is 3, you could partition it into three bags {8} {7} {6, 4} with a discrepancy of 3, and so forth.

The polynomial-time algorithm is simple. Start with \(n\) empty bags. Sort the input bag \(b\) in descending order. Remove the largest element and place it in the bag with the smallest sum (or any such bag, if there is a tie for the smallest sum). Keep doing that until the input bag is empty. If \(n\) is 2 and the input bag is {4, 8, 7, 6}, the algorithm would:

That yields output bags {4, 8} and {6, 7}, with a discrepancy of 1. A discrepancy of 0 isn’t achievable, because the sum of the input bag is odd.

This algorithm is an example of a greedy algorithm, which is a class of algorithms that make locally-optimal choices at the risk of not finding a globally-optimal solution. Dijkstra’s algorithm, which you might recall from CS 2110, is a greedy algorithm that always finds a globally-optimal solution. But the algorithm we just gave does not necessarily find globally-optimal solutions. For example, when asked to partition input {8, 7, 6, 5, 4} into two bags, the algorithm outputs {8, 5, 4} {7, 6} or {8, 5} {7, 6, 4} with a discrepancy of 4. But there is a perfect solution with a discrepancy of 0, which is {8, 7} {6, 5, 4}.

Imperfect though it may be, the greedy algorithm for the partition problem is nonetheless easier to understand than more optimal algorithms. And on a slight variant of the problem, the greedy algorithm is provably not far away from optimal [Graham 1969, Theorem 2].

Problem 3.1: Implement a function partition : int list -> int list * int list that partitions its input bag into two output bags using the greedy algorithm above. Despite the simplicity of the algorithm, implementing it is not child’s play.

\(\blacksquare\)

Problem 3.2: Implement a function multi_partition : int -> int list -> int list list, such that multi_partition n b partitions input bag b into n output bags. The length of the output list should therefore be n. Each element of that output list is itself a bag.

\(\blacksquare\)

Specification: We will not grade any specification comments in this problem.

Testing: Include at least five test cases each for partition and multi_partition. Make sure to test multi_partition on some n that are not 2. The “Easiest Hard Problem” paper mentioned above has some examples you could use as test cases if you like.

We won’t grade whether you test on some really large input bags, though our grading suite will include a few such tests for the Excellent Scope. But if you’re interested… Testing on really large input bags could be tedious, because you would seemingly need to hard-code the correct outputs. Instead, try testing whether the output is a partition of the input. That can be done with much less code. This is an example of what’s called property-based testing: checking whether the output has some property, rather than what the exact output is.

Code quality: As always, take extra time to make your code as readable and elegant as you can. Focus on the same aspects that are listed above in the first problem. Now you will have match expressions to beautify as well. Although the compiler will sometimes permit the omission of parentheses around pairs, you should leave them in: stylistically, they are not considered unnecessary except in a couple very specific circumstances. See the OCaml Community guidelines under the heading “How to write pairs” for details.

Rubric

Scope Rubric. Your solution earns points by passing the grading suite’s OUnit test cases on the following functions:

Each test case is worth some number of points, so it’s definitely possible to get partial credit on individual functions that way. Sorry, but we can’t provide a more detailed breakdown in advance.

Testing Rubric. Later in the semester we will focus much more on developing test suites. For this first assignment, though, you just need to provide the test cases required above. We will grade your test cases independently of whether you’ve finished a level of scope, which is to say you can get (or lose) points for testing even if you don’t finish the functions they test. For example, if you don’t write multi_partition or test cases for it, you will lose points both from the excellent scope autograder and from the manual grading of your test cases. But you could provide those test cases and get points for them even if you don’t write the function itself. Part of the reason we grade this way is to incentivize you to write test cases before writing (or finishing) a function.

Code Quality Rubric. There are many aspects of quality that make code easier to read. We will focus on just a few for this first assignment. The graders will assess the following aspects of your source code:

Submission

Set the hours_worked variable at the end of warmup.ml to the number of hours you spent working on this assignment, rounded to the nearest integer. Please don’t include the time that you spent installing OCaml.

Ensure that your solution passes make finalcheck. That will do the same as make check, as well as two other things. First, it will check to make sure you’ve set hours_worked. Second, it will compute a short “fingerprint” of your submission called an MD5 sum.

If you’re not working on a local install, you will need to get your code back on your own machine to upload it from your browser. Right click on the file in the VS Code Explorer pane, choose Download, and save it on your local drive. Don’t drag and drop; that can result in an empty file (which you will be able to detect by looking in it).

Submit your warmup.ml and warmup_test.ml files on CMS. Double-check that the MD5 sums are what you expected. Re-download your submission from CMS and double-check before the deadline that the contents of the files are what you intended.

Congratulations! Your warmup is complete!