Setting up utop and cs3110 for async
------------------------------------
When working with async, it is useful to configure both utop and the cs3110 tool
to automatically load async.
You can cause `utop` to automatically load async by creating a file named `.ocamlinit`
in your current directory, and including the lines
```
#require "async";;
open Async.Std
```
`utop` automatically executes `.ocamlinit` every time it starts; this will
automatically load the async library and open the Async.Std module.
You can cause `cs3110 compile` to automatically include async by creating a file
called `.cs3110` in your current directory, and including the lines
```
compile.opam_packages=async
compile.thread=true
```
This lets you use `cs3110` to compile async programs without passing the `-t`
and `-p async` flags every time.
Note that files whose names start with `.` are hidden on Unix; to include them
when listing files at the command line, type `ls -A` instead of `ls`.
Async documentation
-------------------
When working with Async, there are several important references that you should
familiarize yourself with.
- **Official async documentation** The official Async API documentation can be
found [here](https://blogs.janestreet.com/ocaml-core/111.28.00/doc/). This
is the authoritative documentation, and covers the full Async API.
- **CS 3110 async documentation** Async is a large, complex API. To help you
focus on the parts of Async that are relevant to the course and the
projects, we have written CS3110 specific documentation that covers a subset
of the API. We have omitted many modules, functions, and optional
parameters from the documentation. Last semester's version of the
documentation is available
[here](http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/18/async/Async.html).
We will release a new version of this documentation with A5, and will update
these notes with a link when we do.
- **Utop** As discussed in a [previous recitation](../03-var/rec.html), you can
print the contents of a module `M` in utop by typing `module type T = module
type of M;;`. This can be a valuable method for quickly finding the
function you are looking for, if you can guess the module it is in.
- **Real world OCaml** [Chapter 18](https://realworldocaml.org/v1/en/html/concurrent-programming-with-async.html)
of Real World OCaml covers the basics of Async. It would be a good chapter
to read as you're familiarizing yourself with the library.
A quick note: it is standard practice to `open Async.Std` whenever using async.
All of these references assume that you have done so. For example, when the
documentation discusses `Deferred.t`, it is really referring to
`Async.Std.Deferred.t`. Make sure you open Std!
**Exercise**: Find and read the documentation for `Writer.write` in the
official async documentation and in the 3110 async documentation. Compare the
two.
Programming with >>=
--------------------
As you may have gathered, programming with `bind` and `upon` can lead to code
that is difficult to read. Conceptually, a program might want to first read a
string from a file, then convert it to an integer n, then wait for n seconds,
then read a message from the network, and then print "done".
In an imperative language, this code would look something like
```
program () {
s = read_file ();
n = parse_int (s);
wait (n);
read_from_network ();
print ("done");
}
```
In OCaml, without deferreds, this code might look like
```
let program () =
let s = read_file () in
let n = int_of_string s in
let _ = wait n in
let p = read_from_network () in
print "done"
```
This simple structure becomes obscured when using `bind`, because each step
requires a new function, and that function has to then call bind to schedule
the next step. You might end up writing code like:
```
let program () =
  let do_last_step p =
    print "done"; return ()
  in
  let do_third_step () =
    bind (read_from_network ()) do_last_step
  in
  let do_second_step s =
    bind (wait (int_of_string s)) do_third_step
  in
  bind (read_file ()) do_second_step
```
This awkward style of writing code is often called "inversion of control", and
different asynchronous programming environments take different approaches to
avoid it.
In OCaml, we can simplify the code using bind by using anonymous
functions:
```
bind (read_file ()) (fun s ->
  let n = int_of_string s in
  bind (wait n) (fun _ ->
    bind (read_from_network ()) (fun p ->
      print "done"; return ()
    )
  )
)
```
This allows the code to be written "in the right order", but it still lacks the
clarity of the non-asynchronous OCaml version.
The infix bind operator `(>>=)`, combined with good indentation, solves this
problem.
> **The secret to writing and reading async programs**:
> Think of a function of type `'a -> 'b Deferred.t` as being just like a
> function of type `'a -> 'b`, except that you might have to wait to get the
> result.
>
> Think of
> ```
> e >>= fun x ->
> ```
> as being just like
> ```
> let x = e in
> ```
> except that `>>=` waits for the result of `e` to become available.
>
> Both expressions first execute `e`, and when `e`'s value becomes available,
> that value is bound to `x` and then evaluation continues from the next line.
> The only difference is that the `(>>=)` version allows other parts of the
> program to run in between the execution of `e` and the time when `e`'s value
> becomes available.
>
> Finally, where a synchronous function ends with the value to return, the
> asynchronous function should instead call `return` to wrap that value in a
> `Deferred.t`.
Let's apply this rule to the above example:
```
(* synchronous function *)               (* asynchronous function *)
(* let program () : unit =           *)  let program () : unit Deferred.t =
(*   let s = read_file () in         *)    read_file () >>= fun s ->
(*   let n = int_of_string s in      *)    let n = int_of_string s in
(*   let _ = wait n in               *)    wait n >>= fun _ ->
(*   let p = read_from_network () in *)    read_from_network () >>= fun p ->
(*   print "done";                   *)    print "done";
(*   ()                              *)    return ()
```
The way OCaml parses the asynchronous expression is
```
let program () : unit Deferred.t =
  read_file () >>= (fun s ->
    let n = int_of_string s in
    wait n >>= (fun _ ->
      read_from_network () >>= (fun p ->
        print "done";
        return ()
      )
    )
  )
```
which is the same as our `bind` version above. However, by omitting the
parentheses and indentation, we can think of the code as a sequence of `let`
expressions, and we can forget that there's a complex scheduling process going
on as this code executes.
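To make the "wait for the result" intuition concrete, here is a minimal sketch of a toy deferred written with only the standard library. It is not how Async is implemented: the names `state`, `fill`, `pending_read`, `say`, and `events` are invented for illustration, and callbacks run immediately when a value arrives, whereas Async hands them to its scheduler.

```ocaml
(* Toy model, NOT Async's implementation: a deferred is a mutable cell that is
   either empty (holding the callbacks waiting on it) or full. *)
type 'a state = Empty of ('a -> unit) list | Full of 'a
type 'a deferred = 'a state ref

let return (x : 'a) : 'a deferred = ref (Full x)

(* Determine [d] with value [x], then run the waiting callbacks in the order
   they were registered. *)
let fill (d : 'a deferred) (x : 'a) : unit =
  match !d with
  | Full _ -> failwith "already determined"
  | Empty callbacks ->
    d := Full x;
    List.iter (fun k -> k x) (List.rev callbacks)

(* Run [k] on the value of [d] once that value is available. *)
let upon (d : 'a deferred) (k : 'a -> unit) : unit =
  match !d with
  | Full x -> k x
  | Empty ks -> d := Empty (k :: ks)

(* [bind d f] is determined once [d], and then the result of [f], are. *)
let bind (d : 'a deferred) (f : 'a -> 'b deferred) : 'b deferred =
  let result = ref (Empty []) in
  upon d (fun x -> upon (f x) (fill result));
  result

let ( >>= ) = bind

(* Hypothetical pending file read; we determine it by hand further down. *)
let pending_read : string deferred = ref (Empty [])
let read_file () = pending_read

(* Record events so we can inspect the order in which things happened. *)
let log : string list ref = ref []
let say (s : string) : unit = log := s :: !log

let program () : unit deferred =
  read_file () >>= fun s ->
  let n = int_of_string s in
  say (Printf.sprintf "parsed %d" n);
  return ()

let result = program ()          (* returns at once; nothing is parsed yet *)
let () = say "program returned"
let () = fill pending_read "42"  (* the "file contents" arrive *)
let events = List.rev !log
```

In this sketch `events` ends up as `["program returned"; "parsed 42"]`: the code after `>>=` does not run until the value it is waiting for is supplied, and the rest of the program keeps going in the meantime.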
**Exercise**: The file [sequence.ml](rec_code/sequence.ml) contains a
comment with a hypothetical synchronous function that prompts the user to enter
some input, then reads a line of input, then waits 3 seconds, then prints
"done", and finally exits the program. Convert the hypothetical synchronous
version of the code to a real asynchronous version.
Note that the functions called in the hypothetical code are the correct async
functions. That is, you should use `printf` to print, `Reader.read_line stdin`
to get input, `after` to wait, and `exit` to exit.
Just as you can use recursive functions to repeatedly process input in a
synchronous program, you can write recursive functions to repeatedly process
input in an asynchronous program.
**Exercise**: The file [loop.ml](rec_code/loop.ml) contains a
hypothetical recursive function that repeatedly prompts for input, and then
reads the input, waits for three seconds, and then prints the input.
If the end of the file is reached, then the program instead prints "done" and
exits. Complete the asynchronous implementation of this pseudocode.
Note: while typing at the console, you can send an "end of file" by pressing
control+d.
**Exercise**: Another way to interpret the idea contained in `loop.ml`
is to schedule the output to be printed after three seconds, but to immediately
prompt for the next input. Complete this implementation in the function
`loop_prompt_immediately`. Compile and test your code. See what happens if
you type many lines in rapid succession.
**Exercise**: The file [input.txt](rec_code/input.txt) contains several lines;
each line is either blank or is a filename. In the file `createFiles.ml`,
write a program that creates a new blank file for each filename in `input.txt`.
Specifically, your program should
- include a helper function `create_file : string -> unit Deferred.t` that uses
`Writer.open_file` and `Writer.close` to create a new empty file with the
given filename.
- include a recursive helper function `create_all_files : Reader.t -> unit
Deferred.t` that repeatedly reads a line from the file (using
`Reader.read_line`), checks to see if the line is blank, and if not, calls
`create_file` to create the file.
- use `Reader.open_file` to open the file and then call `create_all_files` to
create the files. After `create_all_files` completes, your program should
call `exit 0` to cause the program to terminate.
Compile and run your program to ensure that it works properly. Note:
`create_file` will raise an exception if the files already exist, so you should
delete them if you run `createFiles` multiple times.
As you've learned, many recursive functions can be replaced by good uses of
higher order functions like `map`, `fold`, and `filter`. The `Deferred.List`
module contains many versions of these functions adapted to work with functions
that return deferred values. For example, without async, I might write a
function that takes a list of line numbers and returns the corresponding lines
as follows:
```
let read_lines (f : file) (line_numbers : int list) : string list =
  List.map (get_line_of_file f) line_numbers
```
The analogous asynchronous program would be:
```
(* assumes get_line_of_file : file -> int -> string Deferred.t *)
let read_lines (f : file) (line_numbers : int list) : string list Deferred.t =
  Deferred.List.map line_numbers ~f:(get_line_of_file f)
```
Unfortunately, the order of the arguments to `Deferred.List.map` is the
opposite of the order for `List.map`: the list comes first, and the function is
passed as the labeled argument `~f`. But other than this small discrepancy,
the asynchronous version of the code is extremely similar to the synchronous
version.
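The difference in argument order is easiest to see side by side. The sketch below assumes only `Async.Std` is opened (so `List` is still the standard library's module); `double_later` is a hypothetical stand-in for a function that would normally wait on I/O. Newer Async releases use `open Async` instead of `open Async.Std`.

```ocaml
open Async.Std

(* Hypothetical stand-in for a slow operation such as reading a line; a real
   version would wait on I/O before returning. *)
let double_later (x : int) : int Deferred.t =
  return (2 * x)

(* Standard library: the function comes first, then the list. *)
let doubled_sync : int list =
  List.map (fun x -> 2 * x) [1; 2; 3]

(* Async: the list comes first, then the function as the labeled argument ~f.
   The result is a single deferred holding the whole output list. *)
let doubled_async : int list Deferred.t =
  Deferred.List.map [1; 2; 3] ~f:double_later
```

To use the resulting list, bind on `doubled_async` with `>>=` just as you would with any other deferred.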
**Exercise**: Create a second version of the `createFiles` program that
uses `Reader.file_lines` and `Deferred.List.map` instead of a recursive helper
function.
Ivar introduction
-----------------
So far, the deferreds we've seen are all automatically determined when a given
event happens (e.g. time passes, or the bytes from a file become available, or
the deferred returned by a bound function becomes determined).
Often, you will want to create a deferred that you decide when to determine.
An `Ivar.t` contains a deferred value, which you can determine by calling
`Ivar.fill`. See the
[3110 Ivar documentation](http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/18/async/Async.Std.Ivar.html)
for more details.
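As a small illustration of the basic life cycle of an ivar, the sketch below creates one, reads its deferred, and fills it. It uses `Deferred.peek` from the official API only to inspect the state; the names `ivar`, `d`, `before_fill`, and `after_fill` are chosen for this example. Newer Async releases use `open Async` instead of `open Async.Std`.

```ocaml
open Async.Std

(* A fresh ivar is empty, so the deferred read from it is undetermined. *)
let ivar : int Ivar.t = Ivar.create ()
let d : int Deferred.t = Ivar.read ivar
let before_fill : int option = Deferred.peek d

(* Filling the ivar determines the deferred. Calling Ivar.fill a second time
   on the same ivar raises an exception. *)
let () = Ivar.fill ivar 42
let after_fill : int option = Deferred.peek d
```

Here `before_fill` should be `None` and `after_fill` should be `Some 42`. In real programs you would normally wait on `d` with `>>=` or `upon` rather than peeking at it.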
**Exercise**: Use `Ivar` to implement a function
```
either : 'a Deferred.t -> 'b Deferred.t -> [`Left of 'a | `Right of 'b] Deferred.t
```
The deferred returned from `either` should become determined as soon as either
of the input deferreds becomes determined. Its value should be `` `Left a `` if
the first input was determined with value `a`, or `` `Right b `` if the second
input was determined with value `b`.
Hint: first create a new `Ivar.t` and then use `upon` to schedule a function on
each of the two input deferreds.