JoCalf Language Manual

JoCalf is an language that is most similar to OCaml, but also inherits features from JavaScript and Racket. From OCaml, JoCalf gets its basic syntax, higher-order functions, and references. From Racket, an untyped functional language in the Scheme family, JoCalf gets functions that truly take multiple arguments (unlike OCaml functions, which take just a single argument, despite syntactic sugar that makes it appear otherwise). From JavaScript, JoCalf gets operators that convert their arguments' run-time types, exceptions that can be throw and caught, and objects with fields that can be mutated.

The semantics described in English in this document assumes that desugaring has already been performed. This informal semantics describes how evaluation occurs, but it mostly elides the details of how exceptions propagate and how changes to the state are threaded throughout evaluation. Those details can be found in the formal semantics.

Table of Contents:

Tutorial
Reference
External function library
Desugaring

Tutorial

This brief tutorial shows off some features of JoCalf using its toplevel (aka REPL). Like the OCaml toplevel, JoCalf's toplevel accepts phrases, which are either expressions or let definitions. Terminating a phrase with double semicolon is optional.

Basics

JoCalf's operators are more flexible than OCaml's, allowing arguments of various types to be combined. But sometimes the result is a special value undefined.

# 1 + 1
2

# "1" + "1"
"11"

# 31 + "10"
"3110"

# 1 * "zzz"
undefined

Let expressions and definitions are much the same as OCaml.

# let x = 1+1 in x+x
4

# let x = 1
1

# x
1

# y
Exception: "Unbound variable"

If expressions are more flexible, allowing non-Boolean guard expressions, missing else branches, and branches of differing types. The short-circuit conjunction and disjunction operators return values that might not be Boolean.

# if true then 42 else "forty two"
42

# if 3110 then "yay" else "boo"
"yay"

# if 0 then "yay"
undefined

# true && 1
1

# 1 && true
true

# "cool cool" || false
"cool cool"

Functions and external functions

Functions are always anonymous and permit multiple arguments. Moreover, functions may not be partially applied; all arguments must be supplied when the function is applied.

# let add = fun (x y) -> x + y
<closure>

# add 2 3
5

# add 1
Exception: "Application: wrong number of arguments"

The parentheses around the arguments in a function expression are mandatory.

# let add = fun x y -> x + y
Syntax error, line 1, characters 14-15: x

Let bindings may be used to create recursive functions.

# let rec fact = fun (n) -> if n = 0 then 1 else n * (fact (n-1))
<closure>

# fact 5
120

External functions are functions that can be used in JoCalf but are actually implemented in OCaml—similarly to how OCaml provides some functions in its standard library that are actually implemented in C. Externals are needed when the function would be impossible to implement directly in JoCalf.

# length "hello"
5

# is_int 42
42

# is_int "42"
false

Imperative features

References, sequencing, and while loops are similar to OCaml. Assignment returns the value of its right hand side.

# let inc = fun (r) -> r := !r + 1 
<closure>

# let x = ref 0
<location>

# x := 10
10

# inc x; inc x; inc x
13

# !x
13

# while !x > 0 do x := !x-1 done
undefined

# !x
0

Exceptions are an amalgam of JavaScript and OCaml:

# throw 42
Exception: 42

# try throw "oops" catch exc handle exc + " caught"
"oops caught"

# try throw 1 catch x handle throw 3 finally throw 2
Exception: 2

Objects

Finally, JoCalf has objects. The object system is relatively immature (as befits a young language): it has mutable fields, but not methods nor inheritance nor a this keyword (implicit or explicit). Hence objects are mostly just mutable records. Their fields can be accessed with a couple different syntactic forms. Field names are always strings, but the field to be accessed can be computed at run time. Objects are heap-allocated, so they appear to be references.

# let o = {"x": 1, "1": 42, "dbl": fun (z) -> 2*z}
<location>

# o["x"]
1

# o.x
1

# o["1"]
42

# o[3-2]
42

# o["d"+"bl"] 10
20

# let o' = {"x": 1, "f" : fun (y) -> x+y}
<location>

# o'.g
undefined  (* because g is not a field of o' *)

# o'.f 2
Exception: "Unbound variable"  (* because x is not bound in the function *)

Reference

Next, we give a reference for each syntactic form of JoCalf, explaining how each form evaluates.

Syntax

The abstract syntax of JoCalf is as follows:

(* expressions *)
e ::=  
    | i | b | s | undefined   (* constants *) 
    | (e) | begin e end       (* nesting of expressions *) 
    | x                       (* variables *)
    | let [rec] x = e1 in e2  (* let expressions *)
    | fun (x1 ... xn) -> e    (* functions *)
    | e0 ... en               (* application *)
    | if e1 then e2 [else e3] (* control flow: if, sequence, while *)
    | e1; ...; en             
    | while e1 do e2 done     
    | uop e | e1 bop e2       (* unary and binary operators *)
    | e1 && e2 | e1 || e2     
    | ref e | !e | e1 := e2   (* references *)
    | throw e                 (* exceptions *)
    | try e1 catch x 
      handle e2 [finally e3]  
    | {s1:e1, ..., sn:en}     (* objects *)
    | e1[e2] | e1.e2 
    | e1[e2] := e3
    | delete e1[e2]

(* definitions *)
d ::=
    | let [rec] x = e         (* let definition *)

(* unary operators *)
uop ::=  
    | not | - | typeof

(* binary operators *)
bop ::=  
    | + | - | * | / | mod 
    | < | <= | > | >= 
    | = | != | == | !==

i ::= integers
b ::= booleans
s ::= strings
x ::= identifiers

Dynamic semantics

Values. JoCalf values are defined as follows:

(* values *)
v ::= 
    | i | s | b | undefined   (* constant values *)
    | {s1:v1, ..., sn:vn}     (* object values *)
    | loc | cl | extern       (* unexpressible values *)

loc    ::= memory locations
cl     ::= closures
extern ::= external functions

The final three kinds of values—locations, closures, and external functions—cannot be directly expressed in JoCalf, but result from evaluation. External functions are functions that are available for use by JoCalf programs but are not implemented in JoCalf. Instead, they are implemented in OCaml as part of the interpreter itself.

Conversion of values. JoCalf allows run-time conversion between different types of values.

To convert a value to a Boolean:
- The values true, any integer except 0, any string except the empty string "", and any object, location, extern, or closure all convert to the Boolean true. We call these values truthy.
- The values false, the integer 0, the empty string "", and the undefined value all convert to the Boolean false. We call these values falsy.
To convert a value to an integer:
- An integer i converts to itself.
- The Boolean true converts to 1, and false converts to 0.
- A string s converts to Pervasives.int_of_string s if that OCaml function returns an integer. But if that OCaml function raises Failure, then the string converts to undefined.
- A location, object, extern, or closure converts to undefined.
- The value undefined remains undefined; it does not become an integer.
To convert a value to a string:
- Note that this algorithm is not how the REPL nor how the interp function in Main convert values to strings to display them to the user.
- A string s converts to itself.
- An integer i converts to Pervasives.string_of_int i.
- The Boolean true converts to "true", and false converts to "false".
- A location, object, extern, or closure converts to "undefined".
- The value undefined converts to "undefined".
To convert a value to a primitive value:
- Integer, Boolean, string, and undefined are already primitive values. They convert to themselves.
- Locations, objects, externs, and closures all convert to the primitive value undefined.

Expressions. Evaluation of a JoCalf expression requires two additional pieces of information: the environment, giving a mapping from free variables to their values, and the state, giving a mapping from memory locations to their values. Evaluation produces a result, which is either a value or an exception, as well as an updated state.

(* results *)
r ::= 
    | v      (* value *)
    | exn v  (* exception carrying a value *)

Definitions. Evaluation of a let definition in the toplevel is much like evaluation of an expression, but it additionally introduces new bindings into the environment.

Phrases. A phrase is either a definition or an expression, and evaluates accordingly.

Let definitions

The let definition

let x = e

evaluates e to a value v, and binds x to v in the environment.

Constants

Integers. JoCalf integers are the same as OCaml's. On a 64-bit implementation, that means they range from $−2^{62}$ to $2^{62}−1$ . Integer literals may be written in decimal, hexadecimal, octal, and binary:

Implementation note: the JoCalf parser produces an OCaml string, not an int, for integers. It is up to your interpreter to instead produce an integer. There are three problems to solve here.

The parser needs your help to recognize negative integer literals. For example, given the JoCalf expression -1, the parser produces a unary negation applied to "1". A minus sign - followed in the JoCalf source code by an integer literal should itself be an integer literal, not a unary negation operation. So your AST factory should detect this situation, eliminate the unary negation, and tranform the string to "-1".
Sometime before evaluation, those strings need to become an OCaml int. A good place to handle this would be during desugaring. The surface AST node for integers could carry a string, and the core AST node for integers could carry an int. Your desugaring pass could simpy use OCaml's int_of_string to transform the string to an int. Note that since evaluation has not yet begun at that point, no run-time conversion is yet being performed.
The parser needs your help to recognize whether integer literals are between the bounds of min_int and max_int, inclusive. Assuming you have solved the two problems above correctly, int_of_string will take care of this for you. If it produces an exception, your desugaring pass should produce an error and refuse to evaluate the program. For example:
```
# 10000000000000000000000000000000000000000
Integer literal exceeds representable range
```

Strings. JoCalf strings are the same as OCaml's, but do not support Unicode. There is some support for escaping provided by the parser:

# "\052" + "\050"
"42"

# "\n"
"\n"

Booleans. JoCalf's Booleans are true and false.

Undefined. JoCalf has a special value written undefined that is similar to OCaml's unit value (), in that it is the only value of its type. JoCalf uses undefined to represent the result of an operation or language construct that does not produce a meaningful value.

Parenthesized expressions

Expressions can be parenthesized, as in OCaml, with ( ... ) or begin ... end.

Implementation note: You don't have to do anything to deal with parenthesized expressions. The parser automatically handles them for you. Remember that parentheses are not explicitly represented in an AST; rather, they are implicitly represented by the shape of the tree.

Variables and let expressions

Let expressions bind variables to values. JoCalf variable identifiers are a subset of OCaml identifiers: the first character must be a lowercase letter or an underscore, and any remaining letters must be a letter (of any case), a digit, an underscore, or a single quote.

Evaluation of a variable produces the value to which that variable is bound in the environment. If the variable is unbound, evaluation of it produces the exception "Unbound variable".

The let expression

let x = e1 in e2

evaluates e1 to a value v, then evaluates e2 in an environment in which x is bound to v.

Functions and application

JoCalf functions are values, just as OCaml functions are. But unlike OCaml, JoCalf functions are truly multi-argument, and all arguments must be supplied at the point where the function is applied; partial application is not permitted in JoCalf.

The anonymous function

fun (x1 ... xn) -> e

is a function value that takes n arguments named x1..xn and has a body e in which those arguments are bound. An anonymous function evaluates to a closure, which captures the environment where the function was defined. It is a syntax error for a function to have zero arguments, or to have arguments with duplicate names.

Implementation note: The parser already detects those syntax errors for you.

Application is written, as in OCaml, by juxtaposing the function and its arguments:

e0 e1 ... en

It is evaluated left-to-right. First e0 is evaluated, and that must produce either a closure or an external function. If it does not, evaluation of the function application is done, and the exception "Application: not a function" is produced.

If e0 does produce a function (either a closure or an external), then e1 ... en are evaluated left-to-right. If all of e1 ... en evaluate to values v1 ... vn, and if the function takes exactly n arguments, then the function is applied to those arguments, producing a result.

If the function takes a different number of arguments than were provided, then the exception "Application: wrong number of arguments" is produced. The check for the number of arguments occurs before the arguments are themselves evaluated.

Control flow: if, sequence, while

The if expression

if e1 then e2 else e3

evaluates the guard e1, then conditionally evaluates a branch e2 or e3 based on the result of the guard.

First, the guard e1 is evaluated. If the guard is truthy, then e2 is evaluated. If the guard is falsy, then e3 is evaluated.

It is also possible to omit the else branch:

if e1 then e2

That syntactic form evaluates equivalently to

if e1 then e2 else undefined

The sequence of expressions

e1; e2

evaluates left to right. First e1 is evaluated, then e2. We then have two values, v1 and v2. The result of the sequence is v2. The other value, v1, is ignored.

Likewise, a longer sequence of expressions

e1; e2; ...; en

is evaluated left to right, resulting in the final value vn corresponding to en.

The while loop

while e1 do e2 done

is evaluated by instead evaluating the longer expression

if e1 then (e2; while e1 do e2 done)

and returning whatever result that longer expression produces. Note how that recursively unrolls the while loop during interpretation for as many iterations as are necessary.

Operators

If any of this section seems unnecessarily complicated, reflect on the fact that it is designed to be as faithful as possible to JavaScript, which is an immensely popular language. Also, for five minutes of PL-nerdy entertainment, watch the Wat talk on even weider definitions in dynamically typed languages like Ruby and JavaScript.

Unary operators. Application of a unary operator

uop e

first evaluates e to a value v. The result depends on the operator:

not: If v is truthy, then not v is false. If v is falsy, then not v is true.
-: Application of unary negation - v first converts v to an integer. That might result in an integer, or the undefined value. In the former case, the result of the operation is the result of applying OCaml's unary negation operator to the integer. The result of applying unary negation to undefined is undefined.

Implementation note: A minus sign - followed in the JoCalf source code by an integer literal is itself an integer literal, not a unary negation operation. See the implementation note above on integers.
typeof: The typeof operator always produces a string. The type of undefined is "undefined". The type of a Boolean is "bool". The type of an integer is "int". The type of a string is "string". The type of an object is "object". The type of a location is "location". The type of a closure or an extern is "closure"—that is, typeof cannot distinguish between functions coded in JoCalf and functions coded in OCaml.

Binary operators. Application of a binary operator

e1 bop e2

proceeds left to right. First, it evaluates e1, then e2. We now have two values v1 and v2, respectively, and the result depends on the operator:

+: First convert v1 and v2 to primitive values. If either one is a string, then convert the other to a string too, and the result is the application of OCaml's string append operator to those strings. Otherwise, further convert the primitives to integers, and the result is the application of OCaml's integer addition operator to those integers. But if conversion to integers causes either operand to convert to undefined, then the result of the + operation is likewise undefined.
-, *, /, mod: Convert v1 and v2 to integers. The result of the binary operation is the application of OCaml's same binary operator to those integers, with the following deviations:
- If conversion to integers causes either operand to convert to undefined, then the result of the binary operation is likewise the undefined value.
- If the binary operation is / or mod and v2 is the integer 0, then the result is the JoCalf exception "Division by zero".
<, <=, >, >=: Convert v1 and v2 to primitives. If they are both strings, then the result is the application of OCaml's same comparison operator to those strings. (Recall, OCaml uses lexicographic comparison on strings.) Otherwise, further convert the primitives to integers, and the result is the application of OCaml's same comparison operator to those integers. But if conversion to integers causes either operand to convert to undefined, then the result of the comparison operation is false. Note that this is different behavior than + on undefined operands.
=: Follow these steps, in order.
- If v1 and v2 both undefined, return true.
- If v1 and v2 are both Booleans, or are both integers, or are both strings, then return the result of OCaml's = operator on those values.
- If v1 and v2 are both locations, then dereference both locations, and recurse to determine whether the resulting values are equal.
- If v1 and v2 are both objects, then return true if both objects have the same set of fields, and those fields are recursively equal in both objects; otherwise, return false.
- If v1 and v2 are both closures or both externs, then return false.
- If either of the two values is an integer, and the other value is either a string or a Boolean, then convert the other value to an integer (which might actually produce undefined), and recurse to compare them.
- Otherwise, return false.
!=: Compare v1 to v2 using =, then apply JoCalf's unary not to that result.
==: Follow these steps. No type conversions are performed at any point during this operation.
- If v1 and v2 are both undefined, return true.
- If v1 and v2 are both an integer, or are both strings, or are both Booleans, then return whether they are equal according OCaml's = operator.
- If v1 and v2 are both locations, then return whether they are the same location. Do not dereference the locations to check their contents. Since objects are desugared to locations, comparison of two objects will always use this rule.
- If v1 and v2 are both closures, return false. Do not check their code or environment; it doesn't matter whether they are equal or not.
- Otherwise, return false.
!==: Compare v1 to v2 using ==, then apply JoCalf's unary not to that result.

Short-circuit operators. Application of a short-circuit operator

e1 && e2
e1 || e2

works differently than application of a binary operator, because the right-hand operand might never be evaluated, depending on the result of the left-hand operand.

Start by evaluating e1 to a value v1. Evaluation then proceeds as follows:

&&: If v1 is falsy, return v1. Do not evaluate e2. If v1 is truthy, continue to evaluate e2.
||: If v1 is truthy, return v1. Do not evaluate e2. If v1 is falsy, continue to evaluate e2.

If evaluation did not already return, now evaluate e2 to a value v2. Return v2 as the result of the short-circuit operation.

Note that in all cases, the expression that finally determined whether the operator as a whole is truthy or falsy is what gets returned. Unlike OCaml's short-circuit operators, that expression need not itself be a Boolean.

References

Evaluation of a reference creation

ref e

first evaluates e to a value v. Then it allocates a new location loc in the store, and initializes that location to have value v.

Evaluation of a dereference

!e

first evaluates e to a value v. If that value is not a location, the result of the dereference is the undefined value. But if v is a location, return the value stored at that location. If the location is not actually allocated in the store, then the result is unspecified (and in fact this case should be impossible.)

Evaluation of an assignment

e1 := e2

first evaluates e1, then evaluates e2. We now have two values, v1 and v2. If v1 is not a location, the result of the assignment is the exception "Assignment to non-location". If v1 is a location but is currently unbound in the store, then the result is unspecified (and in fact this case should be impossible). Otherwise, store v2 at location v1, and return v2 as the result of the assignment.

Exceptions

The throw expression

throw e

first evaluates e. If that evaluation produces an exception exn v', then exn v' is the result of evaluating throw e. That is, the inner exception thrown by e itself is what actually propagates. But if e evaluates to a value v, then throw e evaluates to the exception exn v.

The try expression

try e1 catch x handle e2

first evaluates e1. If that produces a value v, then v is the result of evaluating the try expression. But if evaluation of e1 produces an exception exn v, then evaluation continues by binding x to v in the environment, evaluating e2 in that extended environment, and returning the result of that evaluation.

The try-finally expression

try e1 catch x handle e2 finally e3

first evaluates try e1 catch x handle e2, obtaining a result r. It then evaluates e3. If evaluation of e3 produces an exception, then that exception is the result of the try-finally and r is ignored. Otherwise, r is the result and the value of e3 is ignored.

Objects

The object expression

{s1:e1, ... sn:en}

evaluates its field expressions e1...en left to right, producing values v1...vn. The result of the object expression is the object value

{s1:v1, ..., sn:vn}

The get-field expression

e1[e2]

first evaluates e1 and e2 to values v1 and v2. It then converts v2 to a primitive, then converts that primitive to a string s. It then checks whether v1 is an object; if not, the result of the get-field expression is the undefined value. But if v1 is an object, then the field s is found in that object. If there is no such field, then the result of the get-field expression is the undefined value. If there is a field s and it is bound to a value v, then v is the result of the get-field expression.

Implementation note: The alternative form e.x of the get-field expression is already handled for you by the parser, which transforms it into e["x"] for you. So you don't need to do anything extra to interpret this alternative form.

The update-field expression

e1[e2] := e3

is a special case of the assignment expression e := e', where the left-hand side e is itself a get-field expression. It first evaluates e1, e2, and e3 to values v1, v2, and v3. It then converts v2 to a primitive, then converts that primitive to a string s. It then checks whether v1 is an object; if not, the result of the update-field expression is v3. But if v1 is an object, then its s field is bound to v3, and the updated object (not v3) is the result of the update-field expression. It doesn't matter whether v1 already had an s field or not: if it didn't, the field is created; if it did, the field is updated.

The delete-field expression

delete e1[e2]

first evaluates e1 and e2 to values v1 and v2. It then converts v2 to a primitive, then converts that primitive to a string s. It then checks whether v1 is an object; if not, the result of the delete-field expression is v1. But if v1 is an object, then its s field is removed, and the updated object is the result of the delete-field expression. It doesn't matter whether v1 had an s field or not: if it didn't, the object doesn't change; if it did, the field is now deleted.

External function library

There are seven external functions that must be bound in the initial environment of every JoCalf program. They comprise the "pervasives" of JoCalf.

is_int takes one argument. If that argument is an integer, the function returns that integer. Otherwise, the function returns false.
is_bool takes one argument. If that argument is a Boolean, the function returns that Boolean. Otherwise, the function returns false.
is_string takes one argument. If that argument is a string, the function returns that string. Otherwise, the function returns false.
is_defined takes one argument. If that argument is the undefined value, the function returns false. Otherwise, the function returns the argument.
is_prim takes one argument. If that argument is an integer, string, Boolean, or the undefined value, the function returns that value. Otherwise, the function returns false.
length takes one argument. If that argument is a string, the function returns the length of that string, which is the number of characters in it. If the argument is not a string, the function returns undefined.
has_field takes two arguments. The first must be an object, and the second must be a string; if either of those fails to hold, the function returns undefined. Otherwise, the function returns true if the object has a field with that string as its name, and false if it does not.

None of these external functions modifies the JoCalf state, though of course evaluation of their arguments could do so.

Desugaring

Desugaring transforms surface syntax into core syntax. The core syntax of a language is usually simpler, to a greater or lesser degree, than its surface syntax. JoCalf involves very little desugaring, so its surface syntax is nearly the same as its core syntax. The only syntactic form that is completely eliminated by desugaring is let rec, which becomes simply let. Desugaring is also a good time to eliminate illegal integer constants. Other than those, desugaring involves only objects.

Let rec. A recursive let expression let rec x = e1 in e2 is desugared to let x = z (fun (x) -> desugar(e1)) in desugar(e2). And a recursive let definition let rec x = e is likewise desugared to let x = z (fun (x) -> desugar (e)). In both, z is the function

fun (f) -> (fun (x) -> f (fun (v) -> (x x) v)) (fun (x) -> f (fun (v) -> (x x) v))

which is the strict fixed point combinator Z. This function has the amazing property of implementing recursion in untyped functional languages!

Object construction. An object construction {s1:e1, ..., sn:en} is desugared to ref {s1:desugar(e1), ..., sn:desugar(en)}. This causes all objects to be heap allocated. The rest of the object desugaring, below, is to account for that heap allocation.

Get field. A get-field expression e1[e2] is desugared to let obj = desugar(e1) in let field = desugar(e2) in (!obj)[field].

Update field. An update-field expression e1[e2] := e3 is desugared to let obj = desugar(e1) in let field = desugar(e2) in let update = desugar(e3) in obj := ((!obj)[field] := update); update.

Delete field. A delete-field expression delete e1[e2] is desugared to let obj = desugar(e1) in let field = desugar(e2) in obj := (delete (!obj)[field]); true.