
Functions


Topics:


Learning a language

A secondary goal of this course is for you not just to learn a new programming language, but to improve your skill at learning languages in general; that is, to learn how to learn new languages.

There are five essential components to learning a language: syntax, semantics, idioms, libraries, and tools.

When it comes to learning OCaml in this class, our focus is primarily on semantics and idioms. We'll have to learn syntax along the way, of course, but it's not the interesting part of our studies. We'll get some exposure to the OCaml standard library and a couple of other libraries, notably OUnit (a unit testing framework similar to JUnit, HUnit, etc.). Besides the OCaml compiler and build system, the main tool we'll use is the toplevel, which provides the ability to interactively experiment with code.

Expressions

The primary piece of OCaml syntax is the expression. Just like programs in imperative languages are primarily built out of commands, programs in functional languages are primarily built out of expressions. Examples of the kinds of expressions that you saw in the first lab include 2+2, if 3+5 > 2 then "yay!" else "boo!", and increment 21.

The OCaml manual has a complete definition of all the expressions in the language. Though that page starts with a rather cryptic overview, if you scroll down, you'll come to some English explanations. Don't worry about studying that page now; just know that it's available for reference.

The primary task of computation in a functional language is to evaluate an expression to a value. A value is an expression for which there is no computation remaining to be performed. So, all values are expressions, but not all expressions are values. Examples of values include 2, true, and "yay!".

The OCaml manual also has a definition of all the values, though again, that page is mostly useful for reference rather than study.

Sometimes an expression might fail to evaluate to a value. There are two reasons that might happen:

  1. Evaluation of the expression raises an exception.
  2. Evaluation of the expression never terminates (e.g., it enters an "infinite loop").
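As a minimal sketch of those two cases (the names caught and loop are hypothetical, and the try ... with syntax is used here only to observe the exception):

```ocaml
(* Case 1: integer division by zero raises the exception Division_by_zero,
   so [1 / 0] never produces a value. Here the exception is caught and
   replaced by -1 so that the program can continue. *)
let caught =
  try 1 / 0 with Division_by_zero -> -1

(* Case 2: applying loop to () would never terminate, because the body
   immediately calls loop again. It is only defined here, never applied. *)
let rec loop () : int = loop ()
```

Entering 1 / 0 directly at the toplevel reports the exception instead of printing a value.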

If expressions

You learned about if expressions in the previous lab. Now let's study their syntax and semantics.

Syntax. The syntax of an if expression:

if e1 then e2 else e3

The letter e is used here to represent any other OCaml expression; it's an example of a syntactic variable, also called a metavariable, which is not actually a variable in the OCaml language itself, but instead a name for a certain syntactic construct. The numbers after the letter e distinguish the three different occurrences of it.

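Dynamic semantics. The dynamic semantics of an if expression:

  1. If e1 evaluates to true, and if e2 evaluates to a value v, then if e1 then e2 else e3 evaluates to v.
  2. If e1 evaluates to false, and if e3 evaluates to a value v, then if e1 then e2 else e3 evaluates to v.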

We call these evaluation rules: they define how to evaluate expressions. Note how it takes two rules to describe the evaluation of an if expression, one for when the guard is true, and one for when the guard is false. The letter v is used here to represent any OCaml value; it's another example of a metavariable. Later in the semester we will develop a more mathematical way of expressing dynamic semantics, but for now we'll stick with this more informal style of explanation.
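A small sketch of the two cases in action (the names a and b are hypothetical). It relies on the fact that only the branch selected by the guard is evaluated:

```ocaml
(* The guard is true, so only the then-branch is evaluated. The else-branch
   [1 / 0] would raise Division_by_zero if it were ever evaluated. *)
let a = if true then 1 else 1 / 0

(* The guard [3 + 5 < 2] evaluates to false, so only the else-branch is
   evaluated. *)
let b = if 3 + 5 < 2 then "yay!" else "boo!"
```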

Static semantics. The static semantics of an if expression:

If e1 has type bool and e2 has type t and e3 has type t, then if e1 then e2 else e3 has type t.

We call this a typing rule: it describes how to type check an expression. Note how it only takes one rule to describe the type checking of an if expression. At compile time, when type checking is done, it makes no difference whether the guard is true or false; in fact, there's no way for the compiler to know what value the guard will have at run time. The letter t here is used to represent any OCaml type; the OCaml manual also has a definition of all types (which curiously does not name the base types of the language like int and bool).
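To illustrate, here is a sketch of one if expression that type checks and two that do not (the names s and bad are hypothetical; the rejected definitions are left inside comments so that the file still compiles):

```ocaml
(* Well typed: the guard has type bool and both branches have type string,
   so the whole if expression has type string. *)
let s = if 3 + 5 > 2 then "yay!" else "boo!"

(* Ill typed: the branches have different types (int and string). The
   compiler rejects this even though the guard is the constant true.
   let bad = if true then 1 else "boo!" *)

(* Ill typed: the guard has type int, not bool.
   let bad = if 1 then "yay!" else "boo!" *)
```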

We're going to write "has type" a lot, so let's introduce a more compact notation for it. Whenever we would write "e has type t", let's instead write e : t. The colon is pronounced "has type". This usage of colon is consistent with how the toplevel responds after it evaluates an expression that you enter:

# let x = 42;;
val x : int = 42

In the above example, variable x has type int, which is what the colon indicates.

Function definitions

The last example above, let x = 42, has an expression in it (42) but is not itself an expression. Rather, it is a definition. Definitions bind values to names, in this case the value 42 being bound to the name x. The OCaml manual has a definition of all definitions (see the third major grouping titled "definition" on that page), but again that manual page is primarily for reference, not for study. Definitions are not expressions, nor are expressions definitions; they are distinct syntactic classes. But definitions can have expressions nested inside them, and vice versa.

We will return to the topic of definitions in general in the next lecture. For now, let's focus on one particular kind of definition, a function definition. You got some practice with these in recitation last time. Now let's study their syntax and semantics.

First, here's an example of a function definition:

(* requires: y>=0 *)
(* returns: x to the power of y *)
let rec pow x y = 
  if y=0 then 1 
  else x * pow x (y-1)

We provided a specification comment above the function to document the precondition (requires) and postcondition (returns) of the function. Note how we didn't have to write any types: the OCaml compiler infers them for us automatically. The compiler solves this type inference problem algorithmically, but we could do it ourselves, too. It's like a mystery that can be solved by our mental power of deduction:
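One way to carry out that deduction by hand is sketched below as comments on the same function. The numbered steps are this sketch's own reading of the code, not compiler output, and the name check is hypothetical:

```ocaml
(* 1. [y = 0] compares y with the int 0, so y must be an int.
   2. The then-branch is 1, an int, so the if expression, and therefore the
      result of pow, must be an int.
   3. The else-branch multiplies x by an int using the integer operator,
      so x must be an int.
   Conclusion: pow takes two ints and returns an int. *)
let rec pow x y =
  if y = 0 then 1
  else x * pow x (y - 1)

(* This annotated binding only type checks if the inferred type of pow
   agrees with the conclusion above. *)
let check : int -> int -> int = pow
```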

If we did want to write down the types for some reason, we could do that:

(* requires: y>=0 *)
(* returns: x to the power of y *)
let rec pow (x:int) (y:int) : int = 
  if y=0 then 1 
  else x * pow x (y-1)

When we write the type annotations for x and y, the parentheses are mandatory. We will generally leave out these annotations, because it's simpler to let the compiler infer them. There are other times, though, when you'll want to write down types explicitly. One particularly useful time is when you get a type error from the compiler that doesn't make sense: explicitly annotating the types can help you debug it.

Syntax. The syntax for function definitions:

let rec f x1 x2 ... xn = e

The f is a metavariable indicating an identifier being used as a function name. These identifiers must begin with a lowercase letter. The remaining rules for lowercase identifiers can be found in the manual. The names x1 through xn are metavariables indicating argument identifiers. These follow the same rules as function identifiers. The keyword rec is required if f is to be a recursive function; otherwise it may be omitted.
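A brief sketch of why rec matters, using a hypothetical factorial function named fact:

```ocaml
(* With rec, the name fact is in scope inside its own body. *)
let rec fact n =
  if n = 0 then 1
  else n * fact (n - 1)

(* Without rec, the body cannot refer to the function being defined. If no
   earlier definition named fact2 exists, the compiler rejects the following
   with an "Unbound value" error, so it is left commented out:
   let fact2 n = if n = 0 then 1 else n * fact2 (n - 1) *)
```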

Note that this syntax for function definitions is actually simplified compared to what OCaml really allows. We will learn more about some augmented syntax for function definitions in the next couple of weeks. But for now, this simplified version will help us focus.

Mutually recursive functions can be defined with the and keyword:

let rec f x1 ... xn = e1
and g y1 ... yn = e2

For example:

(* [even n] is whether [n] is even.
 * requires: [n >= 0] *)
let rec even n = 
  n=0 || odd (n-1) 

(* [odd n] is whether [n] is odd.
 * requires: [n >= 0] *)
and odd n = 
  n<>0 && even (n-1);;

The syntax for function types:

t -> u
t1 -> t2 -> u
t1 -> ... -> tn -> u

The t and u are metavariables indicating types. Type t -> u is the type of a function that takes an input of type t and returns an output of type u. We can think of t1 -> t2 -> u as the type of a function that takes two inputs, the first of type t1 and the second of type t2, and returns an output of type u. Likewise for a function that takes n arguments.
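As a sketch, here are three functions whose types follow those shapes. The functions add and describe are hypothetical examples, and the annotated bindings at the end exist only to make the compiler confirm the stated types:

```ocaml
(* One input: int -> int *)
let inc x = x + 1

(* Two inputs of the same type: int -> int -> int *)
let add x y = x + y

(* Two inputs of different types: int -> string -> string *)
let describe n label = label ^ ": " ^ string_of_int n

(* Each binding below type checks only if the function has the given type. *)
let (_ : int -> int) = inc
let (_ : int -> int -> int) = add
let (_ : int -> string -> string) = describe
```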

Dynamic semantics. There is no dynamic semantics of function definitions: there is nothing to be evaluated. OCaml just records that the name f is bound to a function with the given arguments x1 through xn and the given body e. Only later, when the function is applied, will there be some evaluation to do.

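Static semantics. The static semantics of function definitions:

  1. For a non-recursive function: if, by assuming that x1 : t1 and x2 : t2 and ... and xn : tn, we can conclude that e : u, then f : t1 -> t2 -> ... -> tn -> u.
  2. For a recursive function: if, by assuming that x1 : t1 and ... and xn : tn and that f : t1 -> ... -> tn -> u, we can conclude that e : u, then f : t1 -> ... -> tn -> u.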

Note how the type checking rule for recursive functions assumes that the function identifier f has a particular type, then checks to see whether the body of the function is well-typed under that assumption. This is because f is in scope inside the function body itself (just like the arguments are in scope).

Anonymous functions

We already know that we can have values that are not bound to names. The integer 42, for example, can be entered at the toplevel without giving it a name:

# 42;;
- : int = 42

Or we can bind it to a name:

# let x = 42;;
val x : int = 42

Similarly, OCaml functions do not have to have names; they may be anonymous. For example, here is an anonymous function that increments its input: fun x -> x+1. Here, fun is a keyword indicating an anonymous function, x is the argument, and -> separates the argument from the body.

We now have two ways we could write an increment function:

let inc x = x + 1
let inc = fun x -> x+1

They are syntactically different but semantically equivalent. That is, even though they involve different keywords and put some identifiers in different places, they mean the same thing.

Anonymous functions are also called lambda expressions, a term that comes out of the lambda calculus, which is a mathematical model of computation in the same sense that Turing machines are a model of computation. In the lambda calculus, fun x -> e would be written λx.e. The λ denotes an anonymous function.

It might seem a little mysterious right now why we would want functions that have no names. Don't worry; we'll see good uses for them later in the course. In particular, we will often create anonymous functions and pass them as input to other functions.
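As a small preview, here is a sketch that passes an anonymous function to the standard library function List.map, which applies a function to every element of a list (the names incremented and seven are hypothetical):

```ocaml
(* The first argument to List.map is an anonymous function; it is applied to
   each element of the list [1; 2; 3]. *)
let incremented = List.map (fun x -> x + 1) [1; 2; 3]

(* An anonymous function can also be applied directly, without ever being
   given a name. *)
let seven = (fun x -> x + 1) 6
```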

Syntax.

fun x1 ... xn -> e

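Static semantics.

If, by assuming that x1 : t1 and x2 : t2 and ... and xn : tn, we can conclude that e : u, then fun x1 ... xn -> e : t1 -> t2 -> ... -> tn -> u.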

Dynamic semantics.

An anonymous function is already a value. There is no computation to be performed.

Function application

Today we cover a somewhat simplified syntax of function application compared to what OCaml actually allows.

Syntax.

e0 e1 e2 ... en

The first expression e0 is the function, and it is applied to arguments e1 through en. Note that parentheses are not required around the arguments to indicate function application, as they are in languages in the C family, including Java.

Static semantics.

If e0 : t1 -> ... -> tn -> u and e1 : t1 and ... and en : tn, then e0 e1 ... en : u.

Dynamic semantics.

To evaluate e0 e1 ... en:

  1. Evaluate e0 to a function. Also evaluate the argument expressions e1 through en to values v1 through vn.

    For e0, the result might be an anonymous function fun x1 ... xn -> e. Or it might be a name f, and we have to find the definition of f, in which case let's assume that definition is let rec f x1 ... xn = e. Either way, we now know the argument names x1 through xn and the body e.

  2. Substitute each value vi for the corresponding argument name xi in the body e of the function. That results in a new expression e'.

  3. Evaluate e' to a value v, which is the result of evaluating e0 e1 ... en.
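The three steps can be traced by hand on a small application. The following sketch reuses the inc function from earlier; the name result is hypothetical, and the trace in the comment is an informal reading of the rules above:

```ocaml
let inc = fun x -> x + 1

(* Evaluating the application of inc to [2 * 3]:
   1. inc evaluates to the function [fun x -> x + 1], and the argument
      [2 * 3] evaluates to the value 6.
   2. Substituting 6 for x in the body [x + 1] gives [6 + 1].
   3. [6 + 1] evaluates to 7, which is the result of the application. *)
let result = inc (2 * 3)
```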

Pipeline

There is a built-in infix operator in OCaml for function application that is written |>. Imagine that as depicting a triangle pointing to the right. It's called the pipeline operator, and the metaphor is that values are sent through the pipeline from left to right. For example, suppose we have the increment function inc from above as well as a function square that squares its input. Here are two equivalent ways of writing the same computation:

square (inc 5)
5 |> inc |> square
(* both yield 36 *)

The latter way of writing the computation uses the pipeline operator to send 5 through the inc function, then send the result of that through the square function. This is a nice, idiomatic way of expressing the computation in OCaml. The former way is OK but arguably not as elegant, because it involves writing extra parentheses and requires the reader's eyes to jump around, rather than move linearly from left to right. The latter way scales up nicely when the number of functions being applied grows, whereas the former way requires more and more parentheses:

5 |> inc |> square |> inc |> inc |> square  
square (inc (inc (square (inc 5))))
(* both yield 1444 *)

It might feel weird at first, but try using the pipeline operator in your own code the next time you find yourself writing a big chain of function applications.

Since e1 |> e2 is just another way of writing e2 e1, we don't need to state the semantics for |>: it's just the same as function application. These two programs are another example of expressions that are syntactically different but semantically equivalent.
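To make that concrete, here is a sketch that defines a hypothetical operator |>> in exactly that way and compares it with the built-in |>. The definition of square is an assumption, since the text only describes it as squaring its input:

```ocaml
(* A hand-written pipeline operator: [x |>> f] is just [f x]. *)
let ( |>> ) x f = f x

let inc x = x + 1
let square x = x * x

let a = 5 |> inc |> square     (* built-in pipeline operator *)
let b = 5 |>> inc |>> square   (* hand-written version *)
let c = square (inc 5)         (* ordinary function application *)
```

All three names are bound to 36, which is consistent with the claim that the pipeline adds no new semantics beyond function application.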

Summary

Syntax and semantics are a powerful paradigm for learning a programming language. As we learn the features of OCaml, we're being careful to write down their syntax and semantics. We've seen that there can be multiple syntaxes for expressing the same semantic idea, that is, the same computation. The semantics of function application is the very heart of OCaml and of functional programming, and it's something we will come back to several times throughout the course to deepen our understanding.

Terms and concepts

Further reading