CS3110 OCaml Style Guide

You have spent many years in secondary school learning English style and usage.  Programming languages are no different.  Every programming language has its own idioms and idiosyncrasies, and forcing one language's style upon another is like trying to speak French using the rules of English grammar.  Of course there are some elements that are shared between all languages, which you will discover in the course of learning OCaml and comparing it with languages you already know.  But there are some fundamental differences that make programming in OCaml quite different from programming in Python or Java.

One of our main goals in this class is for you to develop an appreciation for concise, precise, and elegant code.  This is in part reflected in your programming style.  As you will soon realize, this class takes style very seriously.  Listed below are some stylistic rules for OCaml which we would like you to follow.  We will be quite strict and will deduct points for stylistic infractions, even if your program is functionally correct.


Readability is Priority #1

The most important goal is for your code to be readable by other humans. All the guidelines below are in service of that goal. The course staff has a great deal of experience reading code, and we are happy to consult with you about style issues.


File Submission

Code Must Compile. Any code you submit must compile under OCaml without errors.  Never submit anything that you have changed, no matter how small the change, without checking that it still compiles.  There is no excuse for code that does not compile. Harsh penalties will be applied to submissions that do not compile.

File Format

Your code needs to be readable in someone else's editor, which might be configured differently than yours.

80 Column Limit. A good rule to follow is to keep nearly all lines of code under 80 columns. This rule can sometimes be bent for unit tests or for really long strings. But bear in mind that the graders will be reading your unit tests, so you should strive to make them as readable as possible.

No Tab Characters. Do not use the tab character (0x09).  Instead, use spaces to control indenting; we generally recommend 2 spaces per indent, but 4 is also fine.  The width of a tab is not uniform across all editors, and what looks good on your machine might be unreadable on a grader's, especially if there are mixed spaces and tabs.

To make Emacs indent with spaces instead of tabs, add the following to your .emacs file:

(setq-default indent-tabs-mode nil)

If you are using Vim, add the following to your .vimrc:

set expandtab

If you are using Sublime, add the following to your user settings:

{
    "tab_size": 2,
    "translate_tabs_to_spaces": true,
    "trim_trailing_white_space_on_save": true,
    "rulers": [80]
}

Comments

Avoid Useless Comments. Avoid comments that merely repeat the code they reference or state the obvious.  Comments should state the invariants, the non-obvious, or any references that have more information about the code.  A comment such as "add one to x" next to x + 1 is obvious and should be omitted.
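
As a rough illustration of the difference, the sketch below contrasts a comment that restates the code with one that records something the code does not show.  The function names are hypothetical and chosen only for this example.

```ocaml
(* The inner comment only restates the code and should be omitted. *)
let add_one x =
  (* add one to x *)
  x + 1

(* This comment records a precondition that the code alone does not reveal.
   Requires: n >= 0.  A negative n would never reach the base case. *)
let rec factorial n = if n = 0 then 1 else n * factorial (n - 1)
```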

Avoid Over-commenting. Very many or very long comments in the code body are often more distracting than helpful.  Long comments may appear at the top of a file if you wish to explain the overall design of the code or refer to any sources that have more information about the algorithms or data structures.  All other comments in the file should be as short as possible.  A good place for a comment is just before a function declaration.  Descriptive variable names and type annotations can help minimize the need for comments.

Line Breaks. Empty lines should be used to separate definitions inside a file or a module. Within a function, empty lines should be avoided.

Multi-line Commenting. Multi-line comments can be distinguished from code by preceding each line of the comment with a *, as in the following:

(* Here's a comment that doesn't quite fit on
 * one line. It could probably be shortened. *)
let rec factorial n = ...

(**
 * This is one of those long comments
 * that need to span multiple lines because
 * the code is unusually complex and requires
 * extra explanation. 
 * Note that the double asterisk at the beginning
 * flags this comment as a docstring.
 *)
let reverse_entropy () = ...

Naming and Declarations

Naming Conventions. Most importantly, be consistent within your own code base, and to the extent possible, with any libraries that you use. The following are the guidelines that are followed by the standard OCaml libraries:
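
As a sketch only, the following shows the casing commonly seen in the standard library (compare List.fold_left and Map.OrderedType).  Every name in it is hypothetical, and it is not meant as the complete or official list of conventions.

```ocaml
(* All names here are hypothetical and only illustrate casing. *)

(* Types: lower case, with underscores between words. *)
type int_tree =
  | Leaf                              (* constructors: initial upper case *)
  | Node of int_tree * int * int_tree

(* Values and functions: initial lower case, underscores between words. *)
let rec tree_size (t : int_tree) : int =
  match t with
  | Leaf -> 0
  | Node (left, _, right) -> tree_size left + 1 + tree_size right

(* Module types and modules: initial upper case. *)
module type Sized = sig
  type t
  val size : t -> int
end

module Tree : Sized with type t = int_tree = struct
  type t = int_tree
  let size = tree_size
end
```

The compiler itself requires constructors and module names to start with an upper-case letter, and values and type names to start with a lower-case letter; the use of underscores is a matter of convention.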

Use Meaningful Names. Another way of conveying information is to use meaningful variable names that reflect their intended use.  Choose words or combinations of words describing the value.  Variable names may be one letter in short let blocks.  Functions used ephemerally in a fold, filter, or map are often bound to the name f.  Here is an example for short variable names:

let d = Unix.localtime(Unix.time()) in
let m = d.Unix.tm_min in
let s = d.Unix.tm_sec in
let f n = (n mod 3) = 0 in
  List.filter f [m;s]

Type Synonyms. You are encouraged to use type synonyms to convey intent. If you are using an (int * int) tuple to represent a point in space, consider using the type declaration and annotation:

type point = int * int

let absolute_value (a : point) : point = ...

Mutable Variables. Mutable variables are at odds with the philosophy of functional programming and should be used sparingly.  They are used primarily to maintain local state in a closure.  Other uses should be avoided.  In particular, global mutable variables cause many problems.  First, it is difficult to ensure that a global mutable variable is in the proper state, since it might have been modified outside the function or by a previous execution of the algorithm.  This is especially problematic with concurrent threads.  Second, and more importantly, having global mutable variables makes it more likely that your code is non-reentrant.  Without proper knowledge of the ramifications, declaring global mutable variables can extend beyond bad style to incorrect code.
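
One acceptable use, local state in a closure, might look like the sketch below.  The helper make_counter is hypothetical; the point is that the ref is visible only to the function that owns it, so no other code can put it in an unexpected state.

```ocaml
(* make_counter is a hypothetical helper: each call creates a fresh ref that
   only the returned closure can read or update. *)
let make_counter () : unit -> int =
  let count = ref 0 in
  fun () ->
    incr count;
    !count

let next_id = make_counter ()
let first = next_id ()    (* 1 *)
let second = next_id ()   (* 2 *)
```

Two counters created by separate calls to make_counter do not share state, which is exactly what a global ref would fail to guarantee.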

Renaming. You should rarely need to rename values; in fact, this is a sure way to obfuscate code.  Renaming should be backed up with a very good reason. One instance where renaming is common and even encouraged is to alias external structures referenced by the current structure.  Here the external structure is aliased to a one- or two-letter name at the top of the struct block. This serves two purposes: it shortens the name of the structure and it documents the structures you use. Here is an example:

struct
  module H = Hashtbl
  module A = Array
  ...
end

Indenting

Most importantly, use indentation to make your code readable. Use horizontal and vertical alignment to help the reader's eye understand the structure of your code.

Indent by two or four spaces.  Be consistent.

Long expressions can be broken up and the parts aligned, as in the second example:

let x = "Long line..."^
  "Another long line."

let x = "Long line..."^
        "Another long line."

Match expressions should be indented as follows:

match expr with
| pat1 -> ...
| pat2 -> ...

If the code for each case is long or requires multiple lines, it should be indented as follows, using either parentheses or the begin/end keywords:

match expr with
| pat1 -> begin ... 
          end
| pat2 -> ( ...
               ... )

The right-hand side of each branch could also be indented as follows if it's really long, though perhaps it's worth factoring out a function at that point:

match expr with
| pat1 -> 
    begin 
      ... 
    end
| pat2 -> 
    ( ...
      ... )

If expressions can be indented according to one of the following schemes, among several others not shown here:

if exp1 then exp2              if exp1 then
else if exp3 then exp4           exp2
else if exp5 then exp6         else 
else exp7                        exp3

if exp1 then exp2 else exp3    if exp1 then exp2
                               else exp3

Comments should be indented to the level of the line of code that follows the comment.

Parentheses

Over-Parenthesizing.  Parentheses have many semantic purposes in OCaml, including constructing tuples, grouping sequences of side-effect expressions, forcing a non-default parse of an expression, and grouping structures for functor arguments.  Their usage is very different from C or Java.  Avoid unnecessary parentheses when their presence makes your code harder to understand.

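Match expressions.  Wrap inner nested match expressions with parentheses or begin/end.  This avoids a common error in which the branches that follow the inner match are parsed as part of it rather than as branches of the outer match. If the inner match expression is already wrapped by a let...in block, you can drop the parentheses.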
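
A minimal sketch of the situation, using a hypothetical function describe:

```ocaml
(* Without the parentheses, the final "| None" branch would be parsed as a
   third case of the inner match on n rather than as a case of the outer
   match on opt, and the compiler would reject it because None is not an
   int pattern. *)
let describe (opt : int option) : string =
  match opt with
  | Some n ->
      (match n with
       | 0 -> "zero"
       | _ -> "nonzero")
  | None -> "nothing"
```

When the misplaced branch happens to have a compatible type, the code compiles but behaves differently from what was intended, which is harder to spot than a type error.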

Block Styles. Blocks of code such as let...in should be indented as follows.

let foo bar =
  let p n =
    4 + n in
  let q = 38 in
  bar * (p q)

Blocks of code such as struct...end and sig...end should be indented as follows.

module type S = sig
  type t
  type u
  val x : t
end

Pipeline. A deeply nested chain of function applications such as f4 (f3 (f2 (f1 x))) should be rewritten with the pipeline operator as x |> f1 |> f2 |> f3 |> f4.
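
For a concrete comparison, here is a small sketch in which the helper names (is_even, square, sum) are hypothetical.  Both definitions compute the same value; only the reading order differs.

```ocaml
let is_even x = x mod 2 = 0
let square x = x * x
let sum xs = List.fold_left (+) 0 xs

(* Nested form: read from the inside out. *)
let nested xs = sum (List.map square (List.filter is_even xs))

(* Pipelined form: read left to right, in the order the steps happen. *)
let pipelined xs = xs |> List.filter is_even |> List.map square |> sum
```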

Pattern Matching

Inexhaustive Pattern Matches.  Inexhaustive pattern matches are flagged with compiler warnings, which you should take seriously. The compiler is telling you there's probably an error in your code, and you should think about how to fix it, rather than disregarding the warning.
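
As a small sketch with a hypothetical type color, a match that covers every constructor compiles without the warning:

```ocaml
type color = Red | Green | Blue

(* Omitting the Blue case here would trigger the non-exhaustive match
   warning, and the function would raise Match_failure at run time when
   given Blue. *)
let color_name (c : color) : string =
  match c with
  | Red -> "red"
  | Green -> "green"
  | Blue -> "blue"
```

Listing the constructors explicitly, rather than ending with a wildcard, also means the compiler will point out this function if a new constructor is added to the type later.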

Use Pattern Matching in Function Arguments.  Tuples, records and datatypes can be deconstructed using pattern matching.  If you simply deconstruct the function argument before you do anything useful, it is better to pattern match in the function argument. Consider these examples:

Bad
let f arg1 arg2 =
  let x = fst arg1 in
  let y = snd arg1 in
  let z = fst arg2 in
    ...

Good
let f (x,y) (z,_) = ...

Bad
let f arg1 =
  let x = arg1.foo in
  let y = arg1.bar in
  let baz = arg1.baz in
    ...

Good
let f {foo=x; bar=y; baz} = ...

Avoid Unnecessary Projections.  Prefer pattern matching to projections with function arguments or value declarations.  Using projections is okay as long as it is infrequent and the meaning is clearly understood from context.  The above rule shows how to pattern-match in the function arguments.  Here is an example for pattern matching with value declarations.

Bad
let v = someFunction() in
let x = fst v in
let y = snd v in
  x+y

Good
let (x,y) = someFunction() in
  x+y


Combine nested match Expressions.  Rather than nest match expressions, you can combine them by pattern matching against a tuple, provided the tests in the match expressions are independent.  Here is an example:

Bad
(* tm_mon counts months from 0, so 6 is July. *)
let d = Unix.localtime(Unix.time()) in
  match d.Unix.tm_mon with
  | 0 -> (match d.Unix.tm_mday with
          | 1 -> print_string "Happy New Year"
          | _ -> ())
  | 6 -> (match d.Unix.tm_mday with
          | 4 -> print_string "Happy Independence Day"
          | _ -> ())
  | 9 -> (match d.Unix.tm_mday with
          | 10 -> print_string "Happy Metric Day"
          | _ -> ())
  | _ -> ()

Good
(* tm_mon counts months from 0, so 6 is July. *)
let d = Unix.localtime(Unix.time()) in
  match (d.Unix.tm_mon, d.Unix.tm_mday) with
  | (0, 1) -> print_string "Happy New Year"
  | (6, 4) -> print_string "Happy Independence Day"
  | (9, 10) -> print_string "Happy Metric Day"
  | _ -> ()

Avoid the use of List.hd, List.tl, and List.nth.  These functions raise exceptions and are the cause of many coding errors. It is better to avoid them altogether.
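
Pattern matching is the usual alternative.  In the sketch below, first_two is a hypothetical function; the match makes the short-list cases explicit, where List.hd and List.tl would raise Failure.

```ocaml
(* Returns the first two elements of a list, if there are at least two. *)
let first_two (lst : 'a list) : ('a * 'a) option =
  match lst with
  | x :: y :: _ -> Some (x, y)
  | _ -> None
```

Returning an option forces the caller to decide what to do with a list that is too short, instead of discovering the problem through an exception at run time.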

Factoring

Avoid breaking expressions over multiple lines.  If a tuple consists of more than two or three elements, you should consider using a record instead of a tuple.  Records have the advantage of placing each name on a separate line and still looking good.  Constructing a tuple over multiple lines makes for ugly code.  Other expressions that take up multiple lines should be done with care.  The best way to transform code that constructs expressions over multiple lines to something that has good style is to factor the code using a let expression.  Consider the following:

Bad
     let first (x,_,_) = x
     let second (_,y,_) = y
     let third (_,_,z) = z
     let rec euclid (m,n) : (int * int * int) =
       if n=0 then (1, 0, m)
       else (second (euclid (n, m mod n)), first (euclid (n, m mod n)) -
            (m/n) * second (euclid (n, m mod n)), third (euclid (n, m mod n)))
Better
     let first (x,_,_) = x
     let second (_,y,_) = y
     let third (_,_,z) = z
     let rec euclid (m,n) : (int * int * int) =
       if n=0 then (1, 0, m)
       else (second (euclid (n, m mod n)),
            first (euclid (n, m mod n)) - (m/n) * second (euclid (n, m mod n)),
            third (euclid (n, m mod n)))
Best
     (* Returns (a, b, g) such that a*m + b*n = g, the gcd of m and n. *)
     let rec euclid (m,n) : (int * int * int) =
       if n=0 then
         (1, 0, m)
       else
         let q = m / n in
         let r = m mod n in
         let (u, v, g) = euclid (n, r) in
         (v, u - q*v, g)

Verbosity

Don't Rewrite Library Functions. The OCaml library has a great number of functions and data structures -- use them!  Make use of the tools available to you.  Often students will recode verbatim List.filter, List.map, and similar functions.  Rewriting standard functions is only acceptable when you seek to provide a specific feature absent from the library.
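
To illustrate, the first definition in this sketch re-implements List.filter by hand, while the second simply calls it.  Both function names are hypothetical.

```ocaml
(* Hand-rolled version, shown only for contrast. *)
let rec keep_positive_by_hand (lst : int list) : int list =
  match lst with
  | [] -> []
  | x :: rest ->
      if x > 0 then x :: keep_positive_by_hand rest
      else keep_positive_by_hand rest

(* Same behavior using the library function. *)
let keep_positive (lst : int list) : int list =
  List.filter (fun x -> x > 0) lst
```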

Misusing if Expressions.  Remember that the type of the condition in an if expression is bool. In general, the type of an if expression is 'a, but in the case that the type is bool, you should not be using if at all. Consider the following:


Bad                             Good
if e then true else false       e
if e then false else true       not e
if beta then beta else false    beta
if not e then x else y          if e then y else x
if x then true else y           x || y
if x then y else false          x && y
if x then false else y          not x && y
if x then y else true           not x || y


Misusing match Expressions.  The match expression is misused in two common situations.  First, match should never be used in place of an if expression (that's why if exists).  Note the following:

match e with
  true -> x
| false -> y

if e then x else y

The latter is much better.  The other misuse is using match when pattern matching with a let declaration is enough. Consider the following:

let x = match expr with (y,z) -> y

let (x,_) = expr

The latter is better.

Other Common Misuses.  Here are some other common mistakes to watch out for:

Bad                               Good
l::[]                             [l]
length + 0                        length
length * 1                        length
E * E  (E is a big expression)    let x = E in x*x
if x then f a b c1                f a b (if x then c1 else c2)
else f a b c2


Don't Rewrap Functions.  When passing a function as an argument to another function, don't rewrap the function unnecessarily.  Here's an example:

List.map (fun x -> sqrt x) [1.0; 4.0; 9.0; 16.0]

List.map sqrt [1.0; 4.0; 9.0; 16.0]

The latter is better. Another case for rewrapping a function is often associated with infix binary operators. To prevent rewrapping the binary operator, put parentheses around the operator to refer to the function form, as in the following example:

List.fold_left (fun x y -> x + y) 0

List.fold_left (+) 0

The latter is better. Note that the prefix form of the multiplication operator, *, must be written with spaces as ( * ), because (*) without spaces is read as the start of a comment.
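
A short sketch of both operators in prefix form, with hypothetical helper names sum and product:

```ocaml
(* The spaces inside ( * ) keep the operator from being read as a comment
   delimiter.  Writing ( + ) with spaces as well is optional but consistent. *)
let sum (xs : int list) : int = List.fold_left ( + ) 0 xs
let product (xs : int list) : int = List.fold_left ( * ) 1 xs
```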

Avoid Computing Values Twice.  If you compute a value twice, you're wasting CPU time and making your program ugly. The best way to avoid computing values twice is to create a let expression and bind the computed value to a variable name. This has the added benefit of letting you document the purpose of the value with a name.
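
As a minimal sketch, assuming a hypothetical helper norm, the first version below computes the same value twice and the second binds it once with a descriptive name:

```ocaml
(* norm is a hypothetical helper: the length of a 2D vector. *)
let norm (x, y) = sqrt (x *. x +. y *. y)

(* Computes norm v twice. *)
let normalize_twice ((x, y) as v) = (x /. norm v, y /. norm v)

(* Computes it once and names the result. *)
let normalize ((x, y) as v) =
  let len = norm v in
  (x /. len, y /. len)
```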

Do not introduce extra variable bindings, unless they convey useful information.

Bad

     let x = input_line stdin in
     match x with
         ...

Good

     match input_line stdin with
       ...

Bad (provided y is not a large expression)

let x = y*y in x+z 

Good

y*y + z