Functions
Topics:
- five essential components to learning a language
- if expressions
- function definitions
- anonymous functions
- function application
- the pipeline operator
Learning a language
One of the secondary goals of this course is not just for you to learn a new programming language, but to improve your skills at learning languages in general—that is, to learn how to learn new languages.
There are five essential components to learning a language:
Syntax: By syntax, we mean the rules that define what constitutes a textually well-formed program in the language, including the keywords, restrictions on whitespace and formatting, punctuation, operators, etc. One of the more annoying aspects of learning a new language can be that the syntax feels odd compared to languages you already know. But the more languages you learn, the more you'll become used to accepting the syntax of the language for what it is, rather than wishing it were different. (If you want to see some languages with really unusual syntax, take a look at APL, which needs its own extended keyboard, and Whitespace, in which programs consist entirely of spaces, tabs, and newlines.) You need to understand syntax just to be able to speak to the computer at all.
Semantics: By semantics, we mean the rules that define the behavior of programs. In other words, semantics is about the meaning of a program—what computation a particular piece of syntax represents. There are two pieces to semantics, the dynamic semantics of a language and the static semantics of a language. The dynamic semantics define the run-time behavior of a program as it is executed or evaluated. The static semantics define the compile-time checking that is done to ensure that a program is legal, beyond any syntactic requirements. The most important kind of static semantics is probably type checking: the rules that define whether a program is well typed or not. Learning the semantics of a new language is usually the real challenge, even though the syntax might be the first hurdle you have to overcome. You need to understand semantics to say what you mean to the computer, and you need to say what you mean so that your program performs the right computation.
Idioms: By idioms, we mean the common approaches to using language features to express computations. Given that you might express one computation in many ways inside a language, which one do you choose? Some will be more natural than others. Programmers who are fluent in the language will prefer certain modes of expression over others. We could think of this in terms of using the dominant paradigms, whether they are imperative, functional, object oriented, etc., in the language effectively. You need to understand idioms to say what you mean not just to the computer, but to other programmers. When you write code idiomatically, other programmers will understand your code better.
Libraries: Libraries are bundles of code that have already been written for you and can make you a more productive programmer, since you won't have to write the code yourself. (It's been said that laziness is a virtue for a programmer.) Part of learning a new language is discovering what libraries are available and how to make use of them. A language usually provides a standard library that gives you access to a core set of functionality, much of which you would be unable to code up in the language yourself, such as file I/O.
Tools: At the very least, any language implementation provides either a compiler or interpreter as a tool for interacting with the computer using the language. But there are other kinds of tools: debuggers; integrated development environments (IDEs); and analysis tools for things like performance, memory usage, and correctness. Learning to use tools that are associated with a language can also make you a more productive programmer. Sometimes it's easy to confuse the tool itself for the language; if you've only ever used Eclipse and Java together, for example, it might not be apparent that Eclipse is an IDE that works with many languages, and that Java can be used without Eclipse.
When it comes to learning OCaml in this class, our focus is primarily on semantics and idioms. We'll have to learn syntax along the way, of course, but it's not the interesting part of our studies. We'll get some exposure to the OCaml standard library and a couple of other libraries, notably OUnit (a unit testing framework similar to JUnit, HUnit, etc.). Besides the OCaml compiler and build system, the main tool we'll use is the toplevel, which provides the ability to interactively experiment with code.
Expressions
The primary piece of OCaml syntax is the expression. Just like programs in imperative languages are primarily built out of commands, programs in functional languages are primarily built out of expressions. Examples of the kinds of expressions that you saw in the first lab include 2+2, if 3+5 > 2 then "yay!" else "boo!", and increment 21.
The OCaml manual has a complete definition of all the expressions in the language. Though that page starts with a rather cryptic overview, if you scroll down, you'll come to some English explanations. Don't worry about studying that page now; just know that it's available for reference.
The primary task of computation in a functional language is to evaluate an expression to a value. A value is an expression for which there is no computation remaining to be performed. So, all values are expressions, but not all expressions are values. Examples of values include 2, true, and "yay!".
The OCaml manual also has a definition of all the values, though again, that page is mostly useful for reference rather than study.
Sometimes an expression might fail to evaluate to a value. There are two reasons that might happen:
- Evaluation of the expression raises an exception.
- Evaluation of the expression never terminates (e.g., it enters an "infinite loop").
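Here is a minimal sketch of these ideas. You can compile it as an OCaml file or paste it into the toplevel. The names four, values, raised_exception and loop are illustrative only. The sketch relies on the standard behavior that integer division by zero raises the built-in Division_by_zero exception.

```ocaml
(* 2 + 2 is an expression but not a value; evaluating it yields the value 4. *)
let four = 2 + 2

(* 2, true and "yay!" are already values: nothing remains to compute. *)
let values = (2, true, "yay!")

(* 1 / 0 type-checks, but evaluating it raises Division_by_zero,
   so it never produces a value. The handler lets us observe that. *)
let raised_exception =
  try ignore (1 / 0); false
  with Division_by_zero -> true

(* Applying this function would never terminate, so we define it but never call it. *)
let rec loop () : int = loop ()

let () = Printf.printf "four = %d, raised_exception = %b\n" four raised_exception
```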
If expressions
You learned about if expressions in the previous lab. Now let's study their syntax and semantics.
Syntax. The syntax of an if expression:
if e1 then e2 else e3
The letter e is used here to represent any other OCaml expression; it's an example of a syntactic variable aka metavariable, which is not actually a variable in the OCaml language itself, but instead a name for a certain syntactic construct. The numbers after the letter e are being used to distinguish the three different occurrences of it.
Dynamic semantics. The dynamic semantics of an if expression:
- if e1 evaluates to true, and if e2 evaluates to a value v, then if e1 then e2 else e3 evaluates to v
- if e1 evaluates to false, and if e3 evaluates to a value v, then if e1 then e2 else e3 evaluates to v.
We call these evaluation rules: they define how to evaluate expressions. Note how it takes two rules to describe the evaluation of an if expression, one for when the guard is true, and one for when the guard is false. The letter v is used here to represent any OCaml value; it's another example of a metavariable. Later in the semester we will develop a more mathematical way of expressing dynamic semantics, but for now we'll stick with this more informal style of explanation.
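The small sketch below illustrates the two rules. Its names are illustrative. It shows that only the branch selected by the guard is evaluated. The else branch of safe would raise Division_by_zero if it were ever evaluated. The program still runs without error because the guard is true, so the first rule applies and e3 is never evaluated.

```ocaml
(* The guard 3 + 5 > 2 evaluates to true, so the then-branch is evaluated. *)
let result = if 3 + 5 > 2 then "yay!" else "boo!"

(* The guard is true, so only e2 is evaluated; e3 (1 / 0) is never touched. *)
let safe = if true then 1 else 1 / 0

(* The guard is false, so the second rule applies: e3 is evaluated and e2 is skipped. *)
let other = if false then 1 / 0 else 2

let () = Printf.printf "%s %d %d\n" result safe other
```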
Static semantics. The static semantics of an if expression:
- if e1 has type bool and e2 has type t and e3 has type t, then if e1 then e2 else e3 has type t
We call this a typing rule: it describes how to type check an expression. Note how it takes only one rule to describe the type checking of an if expression. At compile time, when type checking is done, it makes no difference whether the guard is true or false; in fact, there's no way for the compiler to know what value the guard will have at run time. The letter t here is used to represent any OCaml type; the OCaml manual also has a definition of all types (which curiously does not name the base types of the language like int and bool).
We're going to be writing "has type" a lot, so let's introduce a more compact notation for it. Whenever we would write "e has type t", let's instead write e : t. The colon is pronounced "has type". This usage of colon is consistent with how the toplevel responds after it evaluates an expression that you enter:
# let x = 42;;
val x : int = 42
In the above example, variable x has type int, which is what the colon indicates.
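The sketch below applies the if typing rule and uses the colon notation inside the code itself. OCaml accepts a type annotation of the form (e : t), which asks the compiler to check that e has type t. The names answer and pick are illustrative.

```ocaml
(* 3 + 5 > 2 : bool, and both branches : string, so the whole if expression : string. *)
let answer = (if 3 + 5 > 2 then "yay!" else "boo!" : string)

(* The guard b is only known at run time, yet the if expression
   still has type int, because both branches have type int. *)
let pick (b : bool) = (if b then 1 else 2 : int)

(* By contrast, the following is rejected at compile time, since the
   branches have different types (int and string), so no single t exists:
     if true then 1 else "one"
*)
```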
Function definitions
The last example above, let x = 42, has an expression in it (42) but is not itself an expression. Rather, it is a definition. Definitions bind values to names, in this case the value 42 being bound to the name x. The OCaml manual has a definition of all definitions (see the third major grouping titled "definition" on that page), but again that manual page is primarily for reference, not for study.
Definitions are not expressions, nor are expressions definitions; they are distinct syntactic classes. But definitions can have expressions nested inside them, and vice versa.
We will return to the topic of definitions in general in the next lecture. For now, let's focus on one particular kind of definition, a function definition. You got some practice with these in recitation last time. Now let's study their syntax and semantics.
First, here's an example of a function definition:
(* requires: y>=0 *)
(* returns: x to the power of y *)
let rec pow x y =
  if y=0 then 1
  else x * pow x (y-1)
We provided a specification comment above the function to document the precondition (requires) and postcondition (returns) of the function.
Note how we didn't have to write any types: the OCaml compiler infers them for us automatically. The compiler solves this type inference problem algorithmically, but we could do it ourselves, too. It's like a mystery that can be solved by our mental power of deduction:
- Since the if expression can return 1 in the then branch, we know by the typing rule for if that the entire if expression has type int.
- Since the if expression has type int, the function's return type must be int.
- Since y is compared to 0 with the equality operator, y must be an int.
- Since x is multiplied with another expression using the * operator, x must be an int.
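You can also ask the compiler to confirm the deduction. The sketch below repeats pow and then binds it with an explicit type annotation. That binding type-checks because the inferred type of pow is int -> int -> int. The wildcard binding let (_ : ...) = ... is simply a convenient idiom for this check. The course does not require it.

```ocaml
(* requires: y >= 0 *)
(* returns: x to the power of y *)
let rec pow x y =
  if y = 0 then 1
  else x * pow x (y - 1)

(* Type-checks because the inferred type of pow is int -> int -> int,
   matching the deduction: x : int, y : int, and the result : int. *)
let (_ : int -> int -> int) = pow

let () = Printf.printf "pow 2 10 = %d\n" (pow 2 10)
```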
If we did want to write down the types for some reason, we could do that:
(* requires: y>=0 *)
(* returns: x to the power of y *)
let rec pow (x:int) (y:int) : int =
  if y=0 then 1
  else x * pow x (y-1)
When we write the type annotations for x and y, the parentheses are mandatory. We will generally leave out these annotations, because it's simpler to let the compiler infer them. There are other times, though, when you'll want to write down types explicitly. One particularly useful time is when you get a type error from the compiler that doesn't make sense. Explicitly annotating the types can help with debugging such an error message.
Syntax. The syntax for function definitions:
let rec f x1 x2 ... xn = e
The f is a metavariable indicating an identifier being used as a function name. These identifiers must begin with a lowercase letter (or an underscore). The remaining rules for lowercase identifiers can be found in the manual. The names x1 through xn are metavariables indicating argument identifiers. These follow the same rules as function identifiers. The keyword rec is required if f is to be a recursive function; otherwise it may be omitted.
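This small sketch contrasts the two cases. The names double and fact are illustrative. Suppose rec were dropped from fact and no earlier fact were in scope. The compiler would then reject the body with an unbound-value error, because without rec the name fact is not in scope inside its own body.

```ocaml
(* Non-recursive: the body does not mention [double], so [rec] is omitted. *)
let double x = 2 * x

(* Recursive: [fact] calls itself, so [rec] is required.
   requires: n >= 0 *)
let rec fact n =
  if n = 0 then 1
  else n * fact (n - 1)

let () = Printf.printf "double 21 = %d, fact 5 = %d\n" (double 21) (fact 5)
```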
Note that the syntax for function definitions is actually simplified compared to what OCaml really allows. We will learn more about some augmented syntax for function definitions in the next couple of weeks. But for now, this simplified version will help us focus.
Mutually recursive functions can be defined with the and keyword:
let rec f x1 ... xn = e1
and g y1 ... yn = e2
For example:
(* [even n] is whether [n] is even.
 * requires: [n >= 0] *)
let rec even n =
  n=0 || odd (n-1)
(* [odd n] is whether [n] is odd.
 * requires: [n >= 0] *)
and odd n =
  n<>0 && even (n-1);;
The syntax for function types:
t -> u
t1 -> t2 -> u
t1 -> ... -> tn -> u
The t and u are metavariables indicating types. Type t -> u is the type of a function that takes an input of type t and returns an output of type u. We can think of t1 -> t2 -> u as the type of a function that takes two inputs, the first of type t1 and the second of type t2, and returns an output of type u. Likewise for a function that takes n arguments.
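To connect the arrow notation to concrete code, here is a sketch with two illustrative functions, describe and square. The annotations make the types explicit. The last two bindings compile only if those are indeed the types OCaml assigns.

```ocaml
(* Two inputs, a string then an int, and a string output: string -> int -> string. *)
let describe (name : string) (age : int) : string =
  name ^ " is " ^ string_of_int age

(* One input and one output, both int: int -> int. *)
let square x = x * x

(* Compile-time checks of the types described above. *)
let (_ : string -> int -> string) = describe
let (_ : int -> int) = square
```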
Dynamic semantics.
There is no dynamic semantics of function definitions. There is nothing to be evaluated. OCaml just records that the name f is bound to a function with the given arguments x1..xn and the given body e. Only later, when the function is applied, will there be some evaluation to do.
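One way to see that defining a function evaluates nothing is to define a function whose body would fail. In this sketch, the definition of boom (an illustrative name) succeeds. Division_by_zero is raised only once the function is applied.

```ocaml
(* Defining [boom] evaluates nothing: OCaml only records that [boom]
   is bound to a function with argument x and body x / 0. *)
let boom x = x / 0

(* Applying [boom] substitutes 1 for x and evaluates 1 / 0, which raises. *)
let raised =
  try ignore (boom 1); false
  with Division_by_zero -> true

let () = Printf.printf "applying boom raised Division_by_zero: %b\n" raised
```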
Static semantics. The static semantics of function definitions:
- For non-recursive functions: if by assuming that x1:t1 and x2:t2 and ... and xn:tn, we can conclude that e:u, then f : t1 -> t2 -> ... -> tn -> u.
- For recursive functions: if by assuming that x1:t1 and x2:t2 and ... and xn:tn and f : t1 -> t2 -> ... -> tn -> u, we can conclude that e:u, then f : t1 -> t2 -> ... -> tn -> u.
Note how the type checking rule for recursive functions assumes that the function identifier f has a particular type, then checks to see whether the body of the function is well-typed under that assumption. This is because f is in scope inside the function body itself (just like the arguments are in scope).
Anonymous functions
We already know that we can have values that are not bound to names. The integer 42, for example, can be entered at the toplevel without giving it a name:
# 42;;
- : int = 42
Or we can bind it to a name:
# let x = 42;;
val x : int = 42
Similarly, OCaml functions do not have to have names; they may be anonymous. For example, here is an anonymous function that increments its input: fun x -> x+1. Here, fun is a keyword indicating an anonymous function, x is the argument, and -> separates the argument from the body.
We now have two ways we could write an increment function:
let inc x = x + 1
let inc = fun x -> x+1
They are syntactically different but semantically equivalent. That is, even though they involve different keywords and put some identifiers in different places, they mean the same thing.
Anonymous functions are also called lambda expressions, a term that comes out of the lambda calculus, which is a mathematical model of computation in the same sense that Turing machines are a model of computation. In the lambda calculus, fun x -> e would be written λx.e. The λ denotes an anonymous function.
It might seem a little mysterious right now why we would want functions that have no names. Don't worry; we'll see good uses for them later in the course. In particular, we will often create anonymous functions and pass them as input to other functions.
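As a preview, here is a sketch that passes anonymous functions to two standard library functions, List.map and List.filter. The course has not introduced either one yet. List.map applies a function to every element of a list. List.filter keeps the elements for which the function returns true.

```ocaml
(* The anonymous function (fun x -> x + 1) is passed directly to List.map. *)
let incremented = List.map (fun x -> x + 1) [1; 2; 3]

(* The anonymous function decides which elements List.filter keeps. *)
let evens = List.filter (fun x -> x mod 2 = 0) [1; 2; 3; 4; 5; 6]
```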
Syntax.
fun x1 ... xn -> e
Static semantics.
- If by assuming that x1:t1 and x2:t2 and ... and xn:tn, we can conclude that e:u, then fun x1 ... xn -> e : t1 -> t2 -> ... -> tn -> u.
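This sketch applies the typing rule to a two-argument anonymous function. The names add, add' and seven are illustrative. Because + works on ints, assuming x : int and y : int lets us conclude x + y : int. So the function has type int -> int -> int, the same type as the equivalent named definition.

```ocaml
(* Assuming x : int and y : int, we conclude x + y : int,
   so this anonymous function has type int -> int -> int. *)
let add = fun x y -> x + y

(* The equivalent function-definition syntax yields the same type. *)
let add' x y = x + y

let (_ : int -> int -> int) = add
let (_ : int -> int -> int) = add'

(* An anonymous function can be applied directly, without a name. *)
let seven = (fun x y -> x + y) 3 4
```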
Dynamic semantics.
An anonymous function is already a value. There is no computation to be performed.
Function application
Today we cover a somewhat simplified syntax of function application compared to what OCaml actually allows.
Syntax.
e0 e1 e2 ... en
The first expression e0 is the function, and it is applied to arguments e1 through en. Note that parentheses are not required around the arguments to indicate function application, as they are in languages in the C family, including Java.
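This sketch of the application syntax reuses the pow function from earlier, repeated so the block stands alone. It also shows that function application binds more tightly than infix operators such as +. As a result, parentheses are needed when an argument is itself a compound expression.

```ocaml
let rec pow x y =
  if y = 0 then 1
  else x * pow x (y - 1)

(* Arguments are separated by spaces, with no parentheses or commas. *)
let a = pow 2 10

(* Parentheses group an argument that is itself a compound expression. *)
let b = pow 2 (3 + 1)

(* Application binds more tightly than +, so this means (pow 2 3) + 1. *)
let c = pow 2 3 + 1
```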
Static semantics.
- If e0 : t1 -> ... -> tn -> u and e1:t1 and ... and en:tn then e0 e1 ... en : u.
Dynamic semantics.
To evaluate e0 e1 ... en:
- Evaluate e0 to a function. Also evaluate the argument expressions e1 through en to values v1 through vn.
- For e0, the result might be an anonymous function fun x1 ... xn -> e. Or it might be a name f, and we have to find the definition of f, in which case let's assume that definition is let rec f x1 ... xn = e. Either way, we now know the argument names x1 through xn and the body e.
- Substitute each value vi for the corresponding argument name xi in the body e of the function. That results in a new expression e'.
- Evaluate e' to a value v, which is the result of evaluating e0 e1 ... en.
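This sketch walks through those steps for two small applications, with the substitution written out in comments. The first applies an anonymous function. The second applies the named function pow, whose definition must be looked up first. The intermediate forms in the comments are an informal hand trace. OCaml does not print them.

```ocaml
(* (fun x y -> x * y + 1) 3 4
   e0 is already a function value, and 3 and 4 are already values.
   Substituting 3 for x and 4 for y in the body gives e' = 3 * 4 + 1,
   which evaluates to 13. *)
let direct = (fun x y -> x * y + 1) 3 4
let substituted = 3 * 4 + 1

(* For a named function, first find its definition, then substitute:
     pow 2 1  -->  if 1 = 0 then 1 else 2 * pow 2 (1 - 1)
              -->  2 * pow 2 0
              -->  2 * (if 0 = 0 then 1 else 2 * pow 2 (0 - 1))
              -->  2 * 1
              -->  2 *)
let rec pow x y =
  if y = 0 then 1
  else x * pow x (y - 1)

let named = pow 2 1
```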
Pipeline
There is a built-in infix operator in OCaml for function application that is written |>. Imagine that as depicting a triangle pointing to the right. It's called the pipeline operator, and the metaphor is that values are sent through the pipeline from left to right. For example, suppose we have the increment function inc from above as well as a function square that squares its input. Here are two equivalent ways of writing the same computation:
square (inc 5)
5 |> inc |> square
(* both yield 36 *)
The latter way of writing the computation uses the pipeline operator to send 5 through the inc function, then send the result of that through the square function. This is a nice, idiomatic way of expressing the computation in OCaml. The former way is OK but arguably not as elegant, because it involves writing extra parentheses and requires the reader's eyes to jump around, rather than move linearly from left to right. The latter way scales up nicely when the number of functions being applied grows, whereas the former way requires more and more parentheses:
5 |> inc |> square |> inc |> inc |> square
square (inc (inc (square (inc 5))))
(* both yield 1444 *)
It might feel weird at first, but try using the pipeline operator in your own code the next time you find yourself writing a big chain of function applications.
Since e1 |> e2 is just another way of writing e2 e1, we don't need to state the semantics for |>: it's just the same as function application. These two programs are another example of expressions that are syntactically different but semantically equivalent.
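The sketch below shows that |> is nothing more than application with the argument written first. It defines inc and square as described above, plus a hypothetical helper pipe that applies a function to a value by hand. Because |> is left associative, 5 |> inc |> square means (5 |> inc) |> square, which matches pipe (pipe 5 inc) square.

```ocaml
let inc x = x + 1
let square x = x * x

(* Hypothetical helper: [pipe v f] applies [f] to [v], which is all that v |> f does. *)
let pipe v f = f v

let with_pipeline = 5 |> inc |> square
let with_parens = square (inc 5)
let with_pipe = pipe (pipe 5 inc) square

(* The longer chain from the text; both forms compute the same value. *)
let long_pipeline = 5 |> inc |> square |> inc |> inc |> square
let long_parens = square (inc (inc (square (inc 5))))

let () =
  Printf.printf "%d %d %d %d %d\n"
    with_pipeline with_parens with_pipe long_pipeline long_parens
```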
Summary
Syntax and semantics are a powerful paradigm for learning a programming language. As we learn the features of OCaml, we're being careful to write down their syntax and semantics. We've seen that there can be multiple syntaxes for expressing the same semantic idea, that is, the same computation. The semantics of function application is the very heart of OCaml and of functional programming, and it's something we will come back to several times throughout the course to deepen our understanding.
Terms and concepts
- anonymous functions
- definitions
- dynamic semantics
- evaluation
- expressions
- function application
- function definitions
- identifiers
- idioms
- if expressions
- lambda expressions
- libraries
- metavariables
- mutual recursion
- pipeline operator
- recursion
- semantics
- static semantics
- syntax
- tools
- type checking
- type inference
- values
Further reading
- Introduction to Objective Caml, chapter 3
- OCaml from the Very Beginning, chapter 2
- Real World OCaml, chapter 2