We started off this course using the substitution model to understand how OCaml programs evaluate. The basic idea is simple: evaluate subexpressions to values, and when you have a function call, substitute the argument value for the formal parameter within the body of the function, and then evaluate the resulting expression.
But the substitution model is not without its shortcomings. It's not a very efficient or realistic model of OCaml programs are really evaluated. It's also difficult to extend the substitution model to support side effects, which we will soon be covering. So last week we introduced a more realistic semantics called the environment model that is much closer to how OCaml is really implemented.
Let's review the environment model. We start with a core sublanguage of OCaml:
e ::= c | (op) | x | (e1, ..., en) | C e | e1 e2 | fun x -> e | let x = e1 in e2 | match e0 with p1 -> e1 | ... | pn -> en | let rec f x = e1 in e2
where e
represents an expression, c
a constant,
(op)
a
built-in operator, x
and f
an identifier,
C
a constructor, and p
a pattern.
Expressions are evaluated with respect to an environment,
which is a mapping from identifiers to values.
We can think of an environment as a set of substitutions that we
have delayed making.
Values are, except for closures, a syntactic subset of expressions:
v ::= c | (op) | x | (v1, ..., vn) | C v | << f, fun x -> e, env >>
where env
is an environment and
<< f, fun x -> e, env >>
is a recursive closure. The closure records the name f
of a recursive function, its implementation fun x -> e
,
and the dynamic environment env
that was current
at the time the function was defined. If the function is anonymous,
rather then recursive, then we just leave the name out of the closure,
so that it looks like << fun x -> e, env >>
We write env :: e --> v
to mean that e
evaluates to v
in environment env
.
We write env(x)
to look up the value of x
in env
.
We write {}
to mean the empty environment (which is never truly
empty, because OCaml opens Pervasives
automatically).
We write env+{x=v}
to mean the same environment as env
, except that
x
is now bound to v
instead of whatever value (if any) it had before.
Constants and operators:
env :: c --> c env :: (op) --> (op)
Variables:
if env(x) = v then env :: x --> v
Constructors:
if env :: e --> v then env :: C e --> C v
Tuples:
if env :: en --> vn and ... and env :: e2 --> v2 and env :: e1 --> v1 then env :: (e1,...,en) --> (v1,...,vn)
Let expressions:
if env :: e1 --> v1 and env + {x=v1} :: e2 --> v2 then env :: (let x = e1 in e2) --> v2
Let rec expressions:
If env + {f=<< f, fun x -> e1, env >>} :: e2 --> v2 then env :: (let rec f x = e1 in e2) --> v2
Match expressions:
if env :: e0 --> v0 and pi is the first pattern to match v0 and that match produces bindings b and env+b :: ei --> vi then env :: (match e0 with p1 -> e1 | ... | pn -> en) --> vi
Function expressions:
env :: (fun x -> e) --> << fun x -> e, env >>
Anonymous function application:
if env :: e2 --> v2 and env :: e1 --> << fun x-> e, env' >> and env' + {x=v2} :: e --> v then env :: (e1 e2) --> v
Recursive function application:
if env :: e2 --> v2 and env :: e1 --> << f, fun x-> e, env' >> and env' + {x=v2, f=<< f, fun x-> e, env' >>} :: e --> v then env :: (e1 e2) --> v
Primitive operator application:
and env :: e1 --> v1 and env :: e --> (op) then env :: ((op) e1) --> "op" e1 (* where "op" means the primitive operator built-in to OCaml *)
The evaluation rules above use lexical scope, in which the meaning of a variable name is determined by where the name appears lexically in the source code, rather than when the variable name is dynamically evaluated. The latter would be dynamic scope. We saw rules for dynamic scope in lecture. OCaml (and nearly all modern languages) use lexical scope.
You should practice evaluating some expressions using the environment model to see how it works.
let d = 2 in let f = fun x -> x + d in let d = 1 in f 2 --> 4
because
let
binds d
to 2
,
creating environment {d=2}
.let
binds f
to
<< fun x->x+d, {d=2} >>
,
creating environment {d=2, f=<< fun x->x+d, {d=2} >>}
.let
binds d
to 1
,
shadowing the previous binding, and
creating environment {d=1, f=<< fun x->x+d, {d=2} >>}
.
Note how the original binding of x
persists in the closure.
That's the key insight to take away from this example.f
evaluates to the closure
<< fun x->x+d, {d=2} >>
.2
creates an environment
{d=2,x=2}
. The function body x+d
is then
evaluated in that environment.d
evaluates to 2
.
So does x
.2+2
, which is really syntactic sugar for
(+) 2 2
.2
and 2
yields 4
.4
.let rec fact n = if n = 0 then 1 else n * (fact (n-1)) in fact 1 --> 1
because
let rec
binds fact
to
a closure cl = << fact, fun n -> if n = 0 then 1 else n * (fact (n-1)) , {} >>
.fact
evaluates to that closure.1
creates
an environment env={n=1, fact=<< fact, fun n -> if n = 0 then 1 else n * (fact (n-1)) , {} >>}
.if n = 0 then 1 else n * (fact (n-1))
is evaluated in that environment.false
(here we're skipping a couple steps,
including looking up n
in the environment, and applying
the primitive equality operator).if
expression therefore proceeds by evaluating the else
branch, which is n * (fact (n-1))
,
in the environment env
fact (n-1)
is evaluated first.n-1
evaluates to 0
. (Again we're skipping a couple steps.)fact
evaluates to the closure cl
.0
creates
an environment {n=0, fact=<< fact, fun n -> if n = 0 then 1 else n * (fact (n-1)) , {} >>}
.
Note how fact
stays around in the environment for recursive calls.
That's the key insight to take away from this example.if n = 0 then 1 else n * (fact (n-1))
is evaluated in that environment.true
(here we're skipping a couple steps,
including looking up n
in the environment, and applying
the primitive equality operator).if
expression therefore proceeds by evaluating the then
branch, which is 1
.n
is evaluated second. It evaluates
to 1
in env
.
1*1
, which is a primitive operation and evaluates to 1
.
let rec filter = fun f -> fun xs -> (match xs with [] -> [] | x::xs' -> if f x then x::(filter f xs') else filter f xs') in let all_gt = fun n -> fun xs -> filter (fun x -> x > n) xs in all_gt 1 [1;2] --> [2]
because
let rec
binds filter
to
a closure << filter, fun f -> F, {} >>
,
where F
is the body of filter
as given above.let
binds all_gt
to
a closure << fun n -> A, {filter=<< filter, fun f -> F, {} >>} >>
,
where A
is the body of all_gt
as given above.all_gt
evaluates to the closure just created.1
creates an environment {filter=<< filter, fun f -> F, {} >>, n=1}
.fun xs -> filter (fun x -> x > n) xs
is evaluated in that environment.
It evaluates to a new closure << fun xs -> filter (fun x -> x > n) xs, {filter=<< filter, fun f -> F, {} >>, n=1} >>
.
[1;2]
.
That creates an environment {filter=<< filter, fun f -> F, {} >>, n=1, xs=[1;2]}
.
filter (fun x -> x > n) xs
is evaluated
in that latest environment. If we fully parenthesize, that body is
(filter (fun x -> x > n)) xs
. So the next step is
to evaluate the function application filter (fun x -> x > n)
.
That requires first evaluating the argument.(fun x -> x > n)
is evaluated.
It produces a closure << fun x -> x > n, {filter=<< filter, fun f -> F, {} >>, n=1, xs=[1;2]} >>
.
Note how the closure saves the value of n
. That's the key insight to
take away from this example.[2]
.Those examples get tedious. It's no wonder we have computers to compute them for us!