So far in this class we've been talking about sequential programs. Execution of a sequential program proceeds one step at a time, with no choice about which step to take next. Sequential programs are somewhat limited, both because they are not very good at dealing with multiple sources of simultaneous input and because they are limited by the execution resources of a single processor. For this reason, many modern applications are written using parallel programming techniques. There are many different approaches to parallel programming, but they all share the fact that a program is split into multiple different processes that run at the same time. Each process runs a sequential program, but the collection of processes no longer results in a single overall predictable sequence of steps. Rather, steps execute concurrently with one another, resulting in potentially unpredictable order of execution for certain steps with respect to other steps.
The granularity of parallel programming can vary widely, from coarse-grained techniques that loosely coordinate the execution of separate programs, such as pipes in Unix (or even HTTP between a Web server and its clients), to fine-grained techniques where concurrent code shares the same memory, such as lightweight threads. In both cases it is necessary to coordinate the execution of multiple sequential programs. Two important types of coordination are commonly used: synchronization, where processes wait for one another before proceeding, and communication, where processes exchange data.
In this lecture we will consider the lightweight thread mechanism in OCaml. The threads library provides concurrent programming primitives for multiple threads of control that execute concurrently in the same memory space. Threads communicate by modifying shared data structures or by sending and receiving data on communication channels. The threads library is not enabled by default. Compilation using threads is described in the threads library documentation. You can create a top level loop that has system threads enabled using:
ocamlmktop -thread unix.cma threads.cma -o ocaml_threads
This executable can then be run as follows:
./ocaml_threads -I +threads
Note that the OCaml threads library is implemented by time-sharing on a single processor and does not take advantage of multi-processor machines. Thus the library will not make programs run faster; however, programs are often easier to write when structured as multiple communicating threads.
For instance, most user interfaces concurrently handle user input and the processing necessary to respond to that input. A user interface that does not have a separate execution thread for user interaction may be frustrating to use because it does not respond to the user in any way until a current action is completed. For example, a web browser must be simultaneously handling input from the user interface, reading and rendering web pages incrementally as new data comes in, and running programs embedded in web pages. All these activities must happen at once, so separate threads are used to handle each of them. Another example of a naturally concurrent application is a web crawler, which traverses the web collecting information about its structure and content. It doesn't make sense for the web crawler to access sites sequentially, because most of the time would be spent waiting for the remote server and network to respond to each request. Therefore, a typical web crawler is highly concurrent, simultaneously accessing thousands of different web sites. This design uses the processor and network efficiently.
Concurrency is a powerful language feature that enables new kinds of applications, but it also makes writing correct programs more difficult, because execution of a concurrent program is nondeterministic: the order in which things happen is not known ahead of time. The programmer must think about all possible orders in which the different threads might execute, and make sure that the program works correctly in all of them. If the program is purely functional, nondeterminism is easier to handle because evaluation of an expression always returns the same value no matter what. For example, the expression (2*4)+(3*5) could be executed concurrently, with the left and right products evaluated at the same time. The answer would not change. Imperative programming is much more problematic. For example, the expressions (!x) and (x := !x+1), if executed by two different threads, could give different results depending on which thread executed first.
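The difference can be seen without any threads at all by writing out the two possible orders by hand. The following is only a sketch: the names read_then_incr and incr_then_read are illustrative, and each function simulates sequentially one order in which two threads might run.

```ocaml
(* Pure expression: the two products can be evaluated in either order. *)
let left_first = let a = 2 * 4 in let b = 3 * 5 in a + b
let right_first = let b = 3 * 5 in let a = 2 * 4 in a + b

(* Imperative code: simulate the two orders in which one thread could
   evaluate (!x) while another evaluates (x := !x + 1). *)
let read_then_incr () =
  let x = ref 0 in
  let seen = !x in      (* "thread A" reads first *)
  x := !x + 1;          (* "thread B" increments afterwards *)
  seen

let incr_then_read () =
  let x = ref 0 in
  x := !x + 1;          (* "thread B" increments first *)
  !x                    (* "thread A" reads afterwards *)

let () =
  Printf.printf "pure: %d and %d\n" left_first right_first;
  Printf.printf "imperative: %d and %d\n"
    (read_then_incr ()) (incr_then_read ())
```

The pure expression yields 23 either way, while the read sees 0 in one order and 1 in the other.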
Let's consider a simple example using multiple threads and a shared variable. This example illustrates that a straightforward sequential program, when implemented as a concurrent program, may produce quite unexpected results.
A partial signature for the Thread module is
module type Thread = sig
  type t
  val create : ('a -> 'b) -> 'a -> t
  val self : unit -> t
  val id : t -> int
  val delay : float -> unit
end
Thread.create f a creates a new thread in which the function f is applied to the argument a, returning the handle for the new thread as soon as it is created (not waiting for f to be run). The new thread runs concurrently with the other threads of the program. The thread exits when f exits (either normally or due to an uncaught exception). Thread.self() returns the handle for the current thread, and Thread.id t returns the identifier for the given thread handle t. Thread.delay d causes the current thread to sleep (stop execution) for d seconds. There are a number of other functions in the Thread module, but note that some of them are not implemented on all platforms.
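As a rough illustration of these functions, the sketch below starts a few threads and has each record its own identifier. The function run_workers and its argument are hypothetical names chosen for this example. It also uses Thread.join, which is part of the standard Thread module but not of the partial signature shown here, to wait for each thread to finish. The code must be built with the threads library.

```ocaml
(* Each worker stores the identifier of the thread it runs in. *)
let run_workers count =
  let ids = Array.make count (-1) in
  let worker slot =
    (* Thread.self () is the handle of the thread running this function. *)
    ids.(slot) <- Thread.id (Thread.self ())
  in
  (* Thread.create returns at once with a handle for the new thread. *)
  let handles = List.init count (fun slot -> Thread.create worker slot) in
  (* Thread.join blocks until the given thread has exited. *)
  List.iter Thread.join handles;
  (* Pair the id obtained from each handle with the id the worker saw. *)
  List.mapi (fun slot h -> (Thread.id h, ids.(slot))) handles

let () =
  List.iter
    (fun (from_handle, from_self) ->
      Printf.printf "handle id %d, self id %d\n" from_handle from_self)
    (run_workers 3)
```

Each worker writes to its own array slot, so no two threads touch the same location, and the results are read only after every thread has been joined. The identifiers themselves depend on how many threads the program has already created.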
Now consider the following function, which defines an internal function f that simply loops n times, and on each loop increments the shared variable result by the specified amount, i, sleeping for a random amount of time up to one second in between reading result and incrementing it. The function f is invoked in two separate threads, one of which increments result by 1 on each iteration and the other of which increments it by 2.
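let prog1 n =
  let result = ref 0 in
  let f i =
    for j = 1 to n do
      (* Read the shared variable, then sleep before writing it back. *)
      let v = !result in
      Thread.delay (Random.float 1.0);
      (* This write discards any update made while this thread slept. *)
      result := v + i;
      print_endline ("Value " ^ string_of_int !result);
      flush stdout
    done
  in
  ignore (Thread.create f 1);
  ignore (Thread.create f 2)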
Viewed as a sequential program, this function could never result in the value of result decreasing from one iteration to the next, as the values passed in to f are positive, and are added to result. However, with multiple threads, it is easy for the value of result to actually decrease. If one thread reads the value of result, and then while it is sleeping that value is incremented by another thread, that increment will be overwritten, resulting in the value decreasing. For instance:
# prog1 10;;
- : unit = ()
# Value 2
Value 1
Value 4
Value 6
Value 8
Value 2
Value 10
Value 3
Value 4
Value 12
Value 14
Value 5
Value 16
Value 6
Value 7
Value 8
Value 18
Value 20
Value 9
Value 10
It is important to note that this same issue exists even without the thread sleeping between the time that it reads and updates the variable result. The sleep increases the chance that we will see the code execute in an unexpected manner, but the simple act of incrementing a mutable variable inherently needs to first read that variable, do a calculation and then write the variable. If a process is interrupted between the read and write steps by some other process that also modifies the variable, the results will be unexpected.
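To make the read, calculate, and write steps concrete, the following sketch writes out one unlucky interleaving of two increments by hand and compares it with the uninterrupted case. It runs sequentially and only simulates what two threads, here called A and B, might do; the function names are illustrative.

```ocaml
let counter = ref 0

(* Thread B reads the variable between thread A's read and write. *)
let interleaved_increments () =
  counter := 0;
  let a = !counter in     (* thread A reads 0 *)
  let b = !counter in     (* thread B reads 0 before A has written *)
  counter := a + 1;       (* thread A writes 1 *)
  counter := b + 2;       (* thread B writes 2, overwriting A's update *)
  !counter

(* Each increment runs to completion before the next one starts. *)
let sequential_increments () =
  counter := 0;
  counter := !counter + 1;   (* A completes its whole increment *)
  counter := !counter + 2;   (* then B completes its own *)
  !counter

let () =
  let lost = interleaved_increments () in
  let kept = sequential_increments () in
  Printf.printf "interleaved: %d, sequential: %d\n" lost kept
```

The interleaved version ends at 2 rather than 3, because A's increment is lost.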
A basic principle of concurrent programming is that reading and writing of mutable shared variables must be synchronized so that shared data is used and modified in a predictable sequential manner by a single process, rather than in an unpredictable interleaved manner by multiple processes at once. The term critical section is commonly used to refer to code which accesses a shared variable or data structure that must be protected against simultaneous access. The simplest means of protecting a critical section is to block any other process from running until the current process has finished with the critical section of code. This is commonly done using a mutual exclusion lock or mutex.
A mutex is an object that only one party at a time has control over. In OCaml, mutexes are provided by the Mutex module. The signature for this module is:
module type Mutex = sig
  type t
  val create : unit -> t
  val lock : t -> unit
  val try_lock : t -> bool
  val unlock : t -> unit
end
Mutex.create() creates a new mutex and returns a handle to it. Mutex.lock m returns once the specified mutex has been successfully locked by the calling thread. If the mutex is already locked by some other thread, then the current thread is suspended until the mutex becomes available. Mutex.try_lock m returns true if the specified mutex has been successfully locked by the current thread, and false if it is already locked by some other thread. Mutex.unlock m unlocks the specified mutex, provided the thread issuing this instruction owns the lock. The unlocking of a mutex causes other threads that are suspended trying to lock m to restart and try again to obtain the lock. Only one of those threads will succeed. Mutex.unlock raises an exception if the current thread does not have the specified mutex locked.
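A small sketch of how lock, try_lock, and unlock interact is shown below. The function demo is a hypothetical name, and Thread.join (from the standard Thread module) is used to wait for the helper thread. The code assumes it is built with the threads library.

```ocaml
let m = Mutex.create ()

let demo () =
  Mutex.lock m;
  (* A second thread finds the mutex held, so its try_lock returns
     false immediately instead of suspending the thread. *)
  let seen_while_held = ref true in
  let t =
    Thread.create (fun () -> seen_while_held := Mutex.try_lock m) ()
  in
  Thread.join t;
  Mutex.unlock m;
  (* The mutex is now free, so try_lock acquires it and returns true. *)
  let seen_when_free = Mutex.try_lock m in
  if seen_when_free then Mutex.unlock m;
  (!seen_while_held, seen_when_free)

let () =
  let held, free = demo () in
  Printf.printf "while held: %b, when free: %b\n" held free
```

This prints "while held: false, when free: true". Note that a successful try_lock must be paired with an unlock, just like lock.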
If all the code that accesses some shared data structure acquires a given mutex before such access and releases it afterwards, then this guarantees access by only one process at a time. This is called mutual exclusion.
Mutex.lock m;
foo d; (* Critical section operating on some shared data structure *)
Mutex.unlock m
We commonly refer to the mutex m as protecting the data structure d. Note that this protection is only guaranteed if all code that accesses d correctly obtains and releases the mutex.
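One way to make it harder to forget the release is to wrap the lock and unlock pair in a helper. The sketch below is one possible approach, not part of the Mutex module: with_lock is a hypothetical helper, and it relies on Fun.protect from the standard library (available in OCaml 4.08 and later) so that the mutex is released even if the critical section raises an exception.

```ocaml
(* Hypothetical helper: run f with m held, releasing m even if f raises. *)
let with_lock m f =
  Mutex.lock m;
  Fun.protect ~finally:(fun () -> Mutex.unlock m) f

let m = Mutex.create ()
let d = ref []          (* the shared data structure protected by m *)

let add x = with_lock m (fun () -> d := x :: !d)

let () =
  add 1;
  (* An exception inside the critical section still releases the mutex, *)
  (try with_lock m (fun () -> failwith "oops") with Failure _ -> ());
  (* so this later call can acquire it rather than blocking forever. *)
  add 2;
  List.iter (Printf.printf "%d ") !d;
  print_newline ()
```

Without the finally clause, the failing call would leave m locked and every later attempt to lock it would block.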
Now we can rewrite the function prog1 above to use a mutex to protect the critical section that reads and modifies the shared variable result:
let prog2 n =
  let result = ref 0 in
  let m = Mutex.create () in
  let f i =
    for j = 1 to n do
      (* Critical section: the read and the write happen under the lock. *)
      Mutex.lock m;
      let v = !result in
      Thread.delay (Random.float 1.0);
      result := v + i;
      print_endline ("Value " ^ string_of_int !result);
      flush stdout;
      Mutex.unlock m;
      (* Sleep outside the lock so the other thread can take its turn. *)
      Thread.delay (Random.float 1.0)
    done
  in
  ignore (Thread.create f 1);
  ignore (Thread.create f 2)
This function has the expected behavior of always incrementing the value of result.
# prog2 10;;
- : unit = ()
# Value 1
Value 3
Value 4
Value 6
Value 7
Value 9
Value 10
Value 12
Value 14
Value 15
Value 17
Value 18
Value 20
Value 21
Value 23
Value 25
Value 26
Value 28
Value 29
Value 30
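The effect of the mutex can also be checked programmatically rather than by reading the printed values. The following variant is a sketch only: prog3 is a hypothetical name, the delays are replaced by Thread.yield so that it runs quickly, and Thread.join (from the standard Thread module) is used to wait for both threads before reading the total.

```ocaml
(* Variant that waits for both threads and returns the final total. *)
let prog3 n =
  let result = ref 0 in
  let m = Mutex.create () in
  let f i =
    for _j = 1 to n do
      Mutex.lock m;
      (* Critical section: read, invite a thread switch, then write. *)
      let v = !result in
      Thread.yield ();
      result := v + i;
      Mutex.unlock m
    done
  in
  let t1 = Thread.create f 1 in
  let t2 = Thread.create f 2 in
  Thread.join t1;
  Thread.join t2;
  !result

let () = Printf.printf "final value: %d\n" (prog3 1000)
```

With the lock in place no increment can be lost, so the result is n + 2n; for n = 1000 this prints "final value: 3000".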
Unfortunately, too much locking with mutexes defeats the advantages of concurrency. In fact, excessive locking can result in code that is slower than a single-threaded version. That said, sharing variables across threads without proper synchronization will yield unpredictable behavior, and sometimes that behavior will occur only very rarely. Concurrent programming is hard. Often a good approach is to write code in as functional a style as possible, as this minimizes the need to synchronize threads.
A more insidious hazard is the potential for deadlock, where multiple threads have permanently prevented one another from running because they are waiting for conditions that can never become true. A simple example of a deadlock occurs with two threads and two mutexes m and n. Suppose one thread tries to obtain the locks in the order m and then n, while at the same time the other thread tries to obtain the locks in the order n and then m. If the first thread succeeds in locking m and the second thread succeeds in locking n, then no forward progress can ever be made, because each thread is waiting for the lock held by the other. This situation is sometimes referred to as a deadly embrace.
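A common way to avoid this particular deadlock is to have every thread acquire the mutexes in the same order. The sketch below illustrates that idea under simple assumptions: the names worker and total are hypothetical, both threads lock m before n, and Thread.join (from the standard Thread module) waits for them to finish.

```ocaml
let m = Mutex.create ()
let n = Mutex.create ()
let total = ref 0

(* Both threads take the locks in the same order, m then n, so neither
   can hold one lock while waiting for a thread that holds the other. *)
let worker amount =
  for _j = 1 to 100 do
    Mutex.lock m;
    Mutex.lock n;
    total := !total + amount;
    Mutex.unlock n;
    Mutex.unlock m
  done

let () =
  let t1 = Thread.create worker 1 in
  let t2 = Thread.create worker 2 in
  Thread.join t1;
  Thread.join t2;
  Printf.printf "total: %d\n" !total
```

This prints "total: 300". If one of the workers instead locked n before m, the program could stop making progress in exactly the way described above.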