JoCalf Language Manual
JoCalf is an language that is most similar to OCaml, but also inherits features from JavaScript and Racket. From OCaml, JoCalf gets its basic syntax, higher-order functions, and references. From Racket, an untyped functional language in the Scheme family, JoCalf gets functions that truly take multiple arguments (unlike OCaml functions, which take just a single argument, despite syntactic sugar that makes it appear otherwise). From JavaScript, JoCalf gets operators that convert their arguments' run-time types, exceptions that can be throw and caught, and objects with fields that can be mutated.
The semantics described in English in this document assumes that desugaring has already been performed. This informal semantics describes how evaluation occurs, but it mostly elides the details of how exceptions propagate and how changes to the state are threaded throughout evaluation. Those details can be found in the formal semantics.
Table of Contents:
Tutorial
This brief tutorial shows off some features of JoCalf using its toplevel (aka REPL). Like the OCaml toplevel, JoCalf's toplevel accepts phrases, which are either expressions or let definitions. Terminating a phrase with double semicolon is optional.
Basics
JoCalf's operators are more flexible than OCaml's, allowing
arguments of various types to be combined. But sometimes
the result is a special value undefined
.
# 1 + 1
2
# "1" + "1"
"11"
# 31 + "10"
"3110"
# 1 * "zzz"
undefined
Let expressions and definitions are much the same as OCaml.
# let x = 1+1 in x+x
4
# let x = 1
1
# x
1
# y
Exception: "Unbound variable"
If expressions are more flexible, allowing non-Boolean guard expressions, missing else branches, and branches of differing types. The short-circuit conjunction and disjunction operators return values that might not be Boolean.
# if true then 42 else "forty two"
42
# if 3110 then "yay" else "boo"
"yay"
# if 0 then "yay"
undefined
# true && 1
1
# 1 && true
true
# "cool cool" || false
"cool cool"
Functions and external functions
Functions are always anonymous and permit multiple arguments. Moreover, functions may not be partially applied; all arguments must be supplied when the function is applied.
# let add = fun (x y) -> x + y
<closure>
# add 2 3
5
# add 1
Exception: "Application: wrong number of arguments"
The parentheses around the arguments in a function expression are mandatory.
# let add = fun x y -> x + y
Syntax error, line 1, characters 14-15: x
Let bindings may be used to create recursive functions.
# let rec fact = fun (n) -> if n = 0 then 1 else n * (fact (n-1))
<closure>
# fact 5
120
External functions are functions that can be used in JoCalf but are actually implemented in OCaml—similarly to how OCaml provides some functions in its standard library that are actually implemented in C. Externals are needed when the function would be impossible to implement directly in JoCalf.
# length "hello"
5
# is_int 42
42
# is_int "42"
false
Imperative features
References, sequencing, and while loops are similar to OCaml. Assignment returns the value of its right hand side.
# let inc = fun (r) -> r := !r + 1
<closure>
# let x = ref 0
<location>
# x := 10
10
# inc x; inc x; inc x
13
# !x
13
# while !x > 0 do x := !x-1 done
undefined
# !x
0
Exceptions are an amalgam of JavaScript and OCaml:
# throw 42
Exception: 42
# try throw "oops" catch exc handle exc + " caught"
"oops caught"
# try throw 1 catch x handle throw 3 finally throw 2
Exception: 2
Objects
Finally, JoCalf has objects. The object system is relatively immature (as befits a young language): it has mutable fields, but not methods nor inheritance nor a this keyword (implicit or explicit). Hence objects are mostly just mutable records. Their fields can be accessed with a couple different syntactic forms. Field names are always strings, but the field to be accessed can be computed at run time. Objects are heap-allocated, so they appear to be references.
# let o = {"x": 1, "1": 42, "dbl": fun (z) -> 2*z}
<location>
# o["x"]
1
# o.x
1
# o["1"]
42
# o[3-2]
42
# o["d"+"bl"] 10
20
# let o' = {"x": 1, "f" : fun (y) -> x+y}
<location>
# o'.g
undefined (* because g is not a field of o' *)
# o'.f 2
Exception: "Unbound variable" (* because x is not bound in the function *)
Reference
Next, we give a reference for each syntactic form of JoCalf, explaining how each form evaluates.
Syntax
The abstract syntax of JoCalf is as follows:
(* expressions *)
e ::=
| i | b | s | undefined (* constants *)
| (e) | begin e end (* nesting of expressions *)
| x (* variables *)
| let [rec] x = e1 in e2 (* let expressions *)
| fun (x1 ... xn) -> e (* functions *)
| e0 ... en (* application *)
| if e1 then e2 [else e3] (* control flow: if, sequence, while *)
| e1; ...; en
| while e1 do e2 done
| uop e | e1 bop e2 (* unary and binary operators *)
| e1 && e2 | e1 || e2
| ref e | !e | e1 := e2 (* references *)
| throw e (* exceptions *)
| try e1 catch x
handle e2 [finally e3]
| {s1:e1, ..., sn:en} (* objects *)
| e1[e2] | e1.e2
| e1[e2] := e3
| delete e1[e2]
(* definitions *)
d ::=
| let [rec] x = e (* let definition *)
(* unary operators *)
uop ::=
| not | - | typeof
(* binary operators *)
bop ::=
| + | - | * | / | mod
| < | <= | > | >=
| = | != | == | !==
i ::= integers
b ::= booleans
s ::= strings
x ::= identifiers
Dynamic semantics
Values. JoCalf values are defined as follows:
(* values *)
v ::=
| i | s | b | undefined (* constant values *)
| {s1:v1, ..., sn:vn} (* object values *)
| loc | cl | extern (* unexpressible values *)
loc ::= memory locations
cl ::= closures
extern ::= external functions
The final three kinds of values—locations, closures, and external functions—cannot be directly expressed in JoCalf, but result from evaluation. External functions are functions that are available for use by JoCalf programs but are not implemented in JoCalf. Instead, they are implemented in OCaml as part of the interpreter itself.
Conversion of values. JoCalf allows run-time conversion between different types of values.
To convert a value to a Boolean:
The values
true
, any integer except0
, any string except the empty string""
, and any object, location, extern, or closure all convert to the Booleantrue
. We call these values truthy.The values
false
, the integer0
, the empty string""
, and theundefined
value all convert to the Booleanfalse
. We call these values falsy.
To convert a value to an integer:
An integer
i
converts to itself.The Boolean
true
converts to1
, andfalse
converts to0
.A string
s
converts toPervasives.int_of_string s
if that OCaml function returns an integer. But if that OCaml function raisesFailure
, then the string converts toundefined
.A location, object, extern, or closure converts to
undefined
.The value
undefined
remainsundefined
; it does not become an integer.
To convert a value to a string:
Note that this algorithm is not how the REPL nor how the
interp
function inMain
convert values to strings to display them to the user.A string
s
converts to itself.An integer
i
converts toPervasives.string_of_int i
.The Boolean
true
converts to"true"
, andfalse
converts to"false"
.A location, object, extern, or closure converts to
"undefined"
.The value
undefined
converts to"undefined"
.
To convert a value to a primitive value:
Integer, Boolean, string, and
undefined
are already primitive values. They convert to themselves.Locations, objects, externs, and closures all convert to the primitive value
undefined
.
Expressions. Evaluation of a JoCalf expression requires two additional pieces of information: the environment, giving a mapping from free variables to their values, and the state, giving a mapping from memory locations to their values. Evaluation produces a result, which is either a value or an exception, as well as an updated state.
(* results *)
r ::=
| v (* value *)
| exn v (* exception carrying a value *)
Definitions. Evaluation of a let definition in the toplevel is much like evaluation of an expression, but it additionally introduces new bindings into the environment.
Phrases. A phrase is either a definition or an expression, and evaluates accordingly.
Let definitions
The let definition
let x = e
evaluates e
to a value v
, and binds x
to v
in the
environment.
Constants
Integers. JoCalf integers are the same as OCaml's. On a 64-bit implementation, that means they range from −262 to 262−1. Integer literals may be written in decimal, hexadecimal, octal, and binary:
# 42
42
# 0x2a
42
# 0o52
42
# 0b101010
42
Implementation note: the JoCalf parser produces an OCaml string
,
not an int
, for integers. It is up to your interpreter to instead
produce an integer. There are three problems to solve here.
The parser needs your help to recognize negative integer literals. For example, given the JoCalf expression
-1
, the parser produces a unary negation applied to"1"
. A minus sign-
followed in the JoCalf source code by an integer literal should itself be an integer literal, not a unary negation operation. So your AST factory should detect this situation, eliminate the unary negation, and tranform the string to"-1"
.Sometime before evaluation, those strings need to become an OCaml
int
. A good place to handle this would be during desugaring. The surface AST node for integers could carry astring
, and the core AST node for integers could carry anint
. Your desugaring pass could simpy use OCaml'sint_of_string
to transform thestring
to anint
. Note that since evaluation has not yet begun at that point, no run-time conversion is yet being performed.The parser needs your help to recognize whether integer literals are between the bounds of
min_int
andmax_int
, inclusive. Assuming you have solved the two problems above correctly,int_of_string
will take care of this for you. If it produces an exception, your desugaring pass should produce an error and refuse to evaluate the program. For example:# 10000000000000000000000000000000000000000 Integer literal exceeds representable range
Strings. JoCalf strings are the same as OCaml's, but do not support Unicode. There is some support for escaping provided by the parser:
# "\052" + "\050"
"42"
# "\n"
"\n"
Booleans.
JoCalf's Booleans are true
and false
.
Undefined.
JoCalf has a special value written undefined
that is similar to
OCaml's unit value ()
, in that it is the only value of its
type. JoCalf uses undefined
to represent the result of
an operation or language construct that does not produce a meaningful
value.
Parenthesized expressions
Expressions can be parenthesized, as in OCaml, with ( ... )
or begin ... end
.
Implementation note: You don't have to do anything to deal with parenthesized expressions. The parser automatically handles them for you. Remember that parentheses are not explicitly represented in an AST; rather, they are implicitly represented by the shape of the tree.
Variables and let expressions
Let expressions bind variables to values. JoCalf variable identifiers are a subset of OCaml identifiers: the first character must be a lowercase letter or an underscore, and any remaining letters must be a letter (of any case), a digit, an underscore, or a single quote.
Evaluation of a variable produces the value to which that variable
is bound in the environment. If the variable is unbound, evaluation
of it produces the exception "Unbound variable"
.
The let expression
let x = e1 in e2
evaluates e1
to a value v
, then evaluates e2
in an environment
in which x
is bound to v
.
Functions and application
JoCalf functions are values, just as OCaml functions are. But unlike OCaml, JoCalf functions are truly multi-argument, and all arguments must be supplied at the point where the function is applied; partial application is not permitted in JoCalf.
The anonymous function
fun (x1 ... xn) -> e
is a function value that takes n
arguments named x1
..xn
and
has a body e
in which those arguments are bound. An anonymous
function evaluates to a closure, which captures the environment
where the function was defined. It is a syntax error for a function
to have zero arguments, or to have arguments with duplicate names.
Implementation note: The parser already detects those syntax errors for you.
Application is written, as in OCaml, by juxtaposing the function and its arguments:
e0 e1 ... en
It is evaluated left-to-right. First e0
is evaluated, and that
must produce either a closure or an external function. If it does not,
evaluation of the function application is done, and the exception
"Application: not a function"
is produced.
If e0
does produce a function (either a closure or an external),
then e1
... en
are evaluated left-to-right.
If all of e1
... en
evaluate to values v1
... vn
,
and if the function takes exactly n
arguments, then the function
is applied to those arguments, producing a result.
If the function takes a different number of arguments than were
provided, then the exception "Application: wrong number of arguments"
is produced. The check for the number of arguments occurs before
the arguments are themselves evaluated.
Control flow: if, sequence, while
The if expression
if e1 then e2 else e3
evaluates the guard e1
, then conditionally evaluates a branch e2
or e3
based on the result of the guard.
First, the guard e1
is evaluated. If the guard is truthy,
then e2
is evaluated. If the guard is falsy, then e3
is evaluated.
It is also possible to omit the else branch:
if e1 then e2
That syntactic form evaluates equivalently to
if e1 then e2 else undefined
The sequence of expressions
e1; e2
evaluates left to right. First e1
is evaluated, then e2
.
We then have two values, v1
and v2
. The result
of the sequence is v2
. The other value, v1
, is ignored.
Likewise, a longer sequence of expressions
e1; e2; ...; en
is evaluated left to right, resulting in the final value vn
corresponding
to en
.
The while loop
while e1 do e2 done
is evaluated by instead evaluating the longer expression
if e1 then (e2; while e1 do e2 done)
and returning whatever result that longer expression produces. Note how that recursively unrolls the while loop during interpretation for as many iterations as are necessary.
Operators
If any of this section seems unnecessarily complicated, reflect on the fact that it is designed to be as faithful as possible to JavaScript, which is an immensely popular language. Also, for five minutes of PL-nerdy entertainment, watch the Wat talk on even weider definitions in dynamically typed languages like Ruby and JavaScript.
Unary operators. Application of a unary operator
uop e
first evaluates e
to a value v
. The result depends on the operator:
not
: Ifv
is truthy, thennot v
isfalse
. Ifv
is falsy, thennot v
istrue
.-
: Application of unary negation- v
first convertsv
to an integer. That might result in an integer, or theundefined
value. In the former case, the result of the operation is the result of applying OCaml's unary negation operator to the integer. The result of applying unary negation toundefined
isundefined
.Implementation note: A minus sign
-
followed in the JoCalf source code by an integer literal is itself an integer literal, not a unary negation operation. See the implementation note above on integers.typeof
: The typeof operator always produces a string. The type ofundefined
is"undefined"
. The type of a Boolean is"bool"
. The type of an integer is"int"
. The type of a string is"string"
. The type of an object is"object"
. The type of a location is"location"
. The type of a closure or an extern is"closure"
—that is, typeof cannot distinguish between functions coded in JoCalf and functions coded in OCaml.
Binary operators. Application of a binary operator
e1 bop e2
proceeds left to right. First, it evaluates e1
, then e2
.
We now have two values v1
and v2
, respectively,
and the result depends on the operator:
+
: First convertv1
andv2
to primitive values. If either one is a string, then convert the other to a string too, and the result is the application of OCaml's string append operator to those strings. Otherwise, further convert the primitives to integers, and the result is the application of OCaml's integer addition operator to those integers. But if conversion to integers causes either operand to convert toundefined
, then the result of the+
operation is likewiseundefined
.-
,*
,/
,mod
: Convertv1
andv2
to integers. The result of the binary operation is the application of OCaml's same binary operator to those integers, with the following deviations:If conversion to integers causes either operand to convert to
undefined
, then the result of the binary operation is likewise theundefined
value.If the binary operation is
/
ormod
andv2
is the integer0
, then the result is the JoCalf exception"Division by zero"
.
<
,<=
,>
,>=
: Convertv1
andv2
to primitives. If they are both strings, then the result is the application of OCaml's same comparison operator to those strings. (Recall, OCaml uses lexicographic comparison on strings.) Otherwise, further convert the primitives to integers, and the result is the application of OCaml's same comparison operator to those integers. But if conversion to integers causes either operand to convert toundefined
, then the result of the comparison operation isfalse
. Note that this is different behavior than+
on undefined operands.=
: Follow these steps, in order.If
v1
andv2
bothundefined
, returntrue
.If
v1
andv2
are both Booleans, or are both integers, or are both strings, then return the result of OCaml's=
operator on those values.If
v1
andv2
are both locations, then dereference both locations, and recurse to determine whether the resulting values are equal.If
v1
andv2
are both objects, then returntrue
if both objects have the same set of fields, and those fields are recursively equal in both objects; otherwise, returnfalse
.If
v1
andv2
are both closures or both externs, then returnfalse
.If either of the two values is an integer, and the other value is either a string or a Boolean, then convert the other value to an integer (which might actually produce
undefined
), and recurse to compare them.Otherwise, return
false
.
!=
: Comparev1
tov2
using=
, then apply JoCalf's unarynot
to that result.==
: Follow these steps. No type conversions are performed at any point during this operation.If
v1
andv2
are bothundefined
, returntrue
.If
v1
andv2
are both an integer, or are both strings, or are both Booleans, then return whether they are equal according OCaml's=
operator.If
v1
andv2
are both locations, then return whether they are the same location. Do not dereference the locations to check their contents. Since objects are desugared to locations, comparison of two objects will always use this rule.If
v1
andv2
are both closures, returnfalse
. Do not check their code or environment; it doesn't matter whether they are equal or not.Otherwise, return
false
.
!==
: Comparev1
tov2
using==
, then apply JoCalf's unarynot
to that result.
Short-circuit operators. Application of a short-circuit operator
e1 && e2
e1 || e2
works differently than application of a binary operator, because the right-hand operand might never be evaluated, depending on the result of the left-hand operand.
Start by evaluating e1
to a value v1
. Evaluation then proceeds
as follows:
&&
: Ifv1
is falsy, returnv1
. Do not evaluatee2
. Ifv1
is truthy, continue to evaluatee2
.||
: Ifv1
is truthy, returnv1
. Do not evaluatee2
. Ifv1
is falsy, continue to evaluatee2
.
If evaluation did not already return, now evaluate e2
to a value v2
.
Return v2
as the result of the short-circuit operation.
Note that in all cases, the expression that finally determined whether the operator as a whole is truthy or falsy is what gets returned. Unlike OCaml's short-circuit operators, that expression need not itself be a Boolean.
References
Evaluation of a reference creation
ref e
first evaluates e
to a value v
. Then it allocates a new location
loc
in the store, and initializes that location to have value v
.
Evaluation of a dereference
!e
first evaluates e
to a value v
. If that value is not a location, the
result of the dereference is the undefined
value. But if v
is a
location, return the value stored at that location. If the location is
not actually allocated in the store, then the result is unspecified (and
in fact this case should be impossible.)
Evaluation of an assignment
e1 := e2
first evaluates e1
, then evaluates e2
. We now have two values, v1
and v2
. If v1
is not a location, the result of the assignment is
the exception "Assignment to non-location"
. If v1
is a location but
is currently unbound in the store, then the result is unspecified (and
in fact this case should be impossible). Otherwise, store v2
at
location v1
, and return v2
as the result of the assignment.
Exceptions
The throw expression
throw e
first evaluates e
. If that evaluation produces an exception exn v'
,
then exn v'
is the result of evaluating throw e
. That is, the inner
exception thrown by e
itself is what actually propagates. But if
e
evaluates to a value v
, then throw e
evaluates to the
exception exn v
.
The try expression
try e1 catch x handle e2
first evaluates e1
. If that produces a value v
, then v
is
the result of evaluating the try expression. But if evaluation of
e1
produces an exception exn v
, then evaluation continues
by binding x
to v
in the environment, evaluating e2
in
that extended environment, and returning the result of that evaluation.
The try-finally expression
try e1 catch x handle e2 finally e3
first evaluates try e1 catch x handle e2
, obtaining a result r
. It
then evaluates e3
. If evaluation of e3
produces an exception, then
that exception is the result of the try-finally and r
is ignored.
Otherwise, r
is the result and the value of e3
is ignored.
Objects
The object expression
{s1:e1, ... sn:en}
evaluates its field expressions e1
...en
left to right,
producing values v1
...vn
. The result of the object
expression is the object value
{s1:v1, ..., sn:vn}
The get-field expression
e1[e2]
first evaluates e1
and e2
to values v1
and v2
. It
then converts v2
to a primitive, then converts that primitive
to a string s
. It then checks whether v1
is an object; if not,
the result of the get-field expression is the undefined
value.
But if v1
is an object, then the field s
is found in that
object. If there is no such field, then the result of the
get-field expression is the undefined
value. If there is
a field s
and it is bound to a value v
, then v
is
the result of the get-field expression.
Implementation note: The alternative form e.x
of the get-field expression is already handled for you by the
parser, which transforms it into e["x"]
for you. So you don't
need to do anything extra to interpret this alternative form.
The update-field expression
e1[e2] := e3
is a special case of the assignment expression e := e'
, where the
left-hand side e
is itself a get-field expression. It first evaluates
e1
, e2
, and e3
to values v1
, v2
, and v3
. It then converts
v2
to a primitive, then converts that primitive to a string s
. It
then checks whether v1
is an object; if not, the result of the
update-field expression is v3
. But if v1
is an object, then its s
field is bound to v3
, and the updated object (not v3
) is the result
of the update-field expression. It doesn't matter whether v1
already
had an s
field or not: if it didn't, the field is created; if it did,
the field is updated.
The delete-field expression
delete e1[e2]
first evaluates e1
and e2
to values v1
and v2
. It then converts
v2
to a primitive, then converts that primitive to a string s
. It
then checks whether v1
is an object; if not, the result of the
delete-field expression is v1
. But if v1
is an object, then its s
field is removed, and the updated object is the result of the
delete-field expression. It doesn't matter whether v1
had an s
field or not: if it didn't, the object doesn't change; if it did, the
field is now deleted.
External function library
There are seven external functions that must be bound in the initial environment of every JoCalf program. They comprise the "pervasives" of JoCalf.
is_int
takes one argument. If that argument is an integer, the function returns that integer. Otherwise, the function returns false.is_bool
takes one argument. If that argument is a Boolean, the function returns that Boolean. Otherwise, the function returns false.is_string
takes one argument. If that argument is a string, the function returns that string. Otherwise, the function returns false.is_defined
takes one argument. If that argument is theundefined
value, the function returns false. Otherwise, the function returns the argument.is_prim
takes one argument. If that argument is an integer, string, Boolean, or theundefined
value, the function returns that value. Otherwise, the function returns false.length
takes one argument. If that argument is a string, the function returns the length of that string, which is the number of characters in it. If the argument is not a string, the function returnsundefined
.has_field
takes two arguments. The first must be an object, and the second must be a string; if either of those fails to hold, the function returnsundefined
. Otherwise, the function returns true if the object has a field with that string as its name, and false if it does not.
None of these external functions modifies the JoCalf state, though of course evaluation of their arguments could do so.
Desugaring
Desugaring transforms surface syntax into core syntax. The core
syntax of a language is usually simpler, to a greater or lesser degree,
than its surface syntax. JoCalf involves very little desugaring, so its
surface syntax is nearly the same as its core syntax. The only
syntactic form that is completely eliminated by desugaring is let rec
,
which becomes simply let
. Desugaring is also a good time to eliminate
illegal integer constants. Other than those, desugaring involves only
objects.
Let rec. A recursive let expression let rec x = e1 in e2
is desugared
to let x = z (fun (x) -> desugar(e1)) in desugar(e2)
. And a recursive
let definition let rec x = e
is likewise desugared to
let x = z (fun (x) -> desugar (e))
. In both, z
is the
function
fun (f) -> (fun (x) -> f (fun (v) -> (x x) v)) (fun (x) -> f (fun (v) -> (x x) v))
which is the strict fixed point combinator Z. This function has the amazing property of implementing recursion in untyped functional languages!
Object construction. An object construction {s1:e1, ..., sn:en}
is desugared to ref {s1:desugar(e1), ..., sn:desugar(en)}
. This
causes all objects to be heap allocated. The rest of the
object desugaring, below, is to account for that heap allocation.
Get field. A get-field expression e1[e2]
is desugared to
let obj = desugar(e1) in let field = desugar(e2) in (!obj)[field]
.
Update field. An update-field expression e1[e2] := e3
is
desugared to let obj = desugar(e1) in let field = desugar(e2) in let update = desugar(e3) in obj := ((!obj)[field] := update); update
.
Delete field. A delete-field expression delete e1[e2]
is
desugared to let obj = desugar(e1) in let field = desugar(e2) in obj := (delete (!obj)[field]); true
.