So far in this class we've been talking about sequential programs. Execution of a sequential program proceeds one step at a time according to the evaluation rules, with no choice about which step to take next. We saw this in the various SML semantics that we explored earlier. Sequential programs are somewhat limited because they are not very good at dealing with multiple sources of simultaneous input. For this reason, many modern applications are concurrent (or multi-threaded, or parallel): they have multiple threads of execution running at the same time.
For example, a web browser must be simultaneously handling input from the user interface, reading and rendering web pages incrementally as new data comes in, and running embedded programs written in Java, Javascript and other languages. All these activities must happen at the same time, so separate threads are used to handle each of them. Another example of a naturally concurrent application is a web crawler, which traverses the web collecting information about its structure and content. It doesn't make sense for the web crawler to access sites sequentially, because most of the time would be spent waiting for the remote server and network to respond to each request. Therefore, a typical web crawler is highly concurrent, simultaneously accessing thousands of different web sites. This design uses the processor and network efficiently.
Concurrency is a powerful language feature that enables new kinds of applications, but it also makes writing correct programs more difficult, because execution of a concurrent program is nondeterministic: the order in which things happen is not known ahead of time. The programmer must think about all possible orders in which the different threads might execute, and make sure that in all of them the program works correctly. If the program is purely functional, nondeterminism is not a problem because evaluation of an expression always returns the same value no matter what. For example, the expression (2*4)+(3*5) could be executed concurrently, with the left and right products evaluated at the same time. The answer would not change. Imperative programming is much more problematic. For example, the expressions (!x) and (a := !a+1), if executed by two different threads, could give different results depending on which thread executed first, if it happened that x and a were the same ref.
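To make the dependence on ordering concrete, here is a small sketch that replays the two expressions sequentially in both possible orders on one shared ref. No threads are involved; the helper names readFirst and writeFirst are made up for illustration, and each one simply fixes one of the two orders a scheduler might pick.

```sml
(* Hypothetical helpers: each fixes one order for the expressions
 * (!x) and (a := !a+1) when x and a are the same ref. *)
fun readFirst () =
  let val a = ref 0
      val x = a              (* x and a alias the same ref *)
      val seen = !x          (* the reading "thread" goes first *)
  in
    a := !a + 1;             (* the updating "thread" goes second *)
    seen
  end

fun writeFirst () =
  let val a = ref 0
      val x = a
  in
    a := !a + 1;             (* the updating "thread" goes first *)
    !x                       (* the reading "thread" goes second *)
  end

val r1 = readFirst ()        (* the read sees the old contents, 0 *)
val r2 = writeFirst ()       (* the read sees the new contents, 1 *)
```

The two results differ only because of the order of the read and the write, which is exactly the choice a concurrent program leaves to the scheduler.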
A few modern languages directly support concurrent programming. Java is one. Languages like C and C++ traditionally did not support concurrency directly (standard thread support arrived only with C11 and C++11), though most operating systems allow concurrent programs to be written in these languages, somewhat awkwardly. It turns out that the SML distribution includes Concurrent ML (CML), an extension to SML that supports a relatively clean model of concurrent programming. Concurrent ML is found in the sml/src/cml directory of the distribution. It's actually written in SML, and you can compile it by just running CM.make("cml.cm") at the SML prompt.
structure RunCML = struct
  (* doit(f, t) evaluates the expression f() with thread quantum t.
   * It returns the return status of the program. *)
  val doit: (unit->unit)*(Time.time option)->Word32.word
  ...
end
The thread quantum is the amount of time that a processor will work on executing any one thread before switching to another thread. Although we think of the machine as running all the threads at once, it is much more efficient for a processor to execute one at a time; for one thing, the various caches work better. As long as the quantum is sufficiently small (usually, a few milliseconds), it isn't noticeable. The machine may have multiple processors that can each work on running a separate thread, but the semantics of running a program don't change depending on the number of processors. A concurrent system has a scheduler that decides what thread to run on a given processor. When the current thread's quantum expires, the scheduler is invoked.
CML provides a special operation that creates a new thread:
structure CML = struct
  (* spawn(f) creates a new thread that evaluates the expression f()
   * concurrently with the current thread. It returns the thread
   * identifier of the new thread. *)
  val spawn : (unit -> unit) -> thread_id
  ...
end
For example, we can write a program that spawns two threads that generate output:
- fun prog() = (CML.spawn (fn() => print "hello!"); print "goodbye!");
- val q = SOME(Time.fromMilliseconds(1));
- RunCML.doit(prog, q);
There are two possible executions of this code: it might print "hello!goodbye!" or "goodbye!hello!", depending on whether the spawned thread gets to run first or its parent thread does. If we care which one we get, this code won't do.
You've probably noticed that the computation of a thread is given type unit->unit, which doesn't give a lot of opportunity for a thread to send a result back to its parent thread. For example, if the web browser spawns a thread to read an image embedded in a web page, it needs to get the actual image data back from that thread. One obvious way to accomplish this is using refs. Here is a circuitous way to add two numbers:
fun prog() =
  let val result = ref 0
  in
    CML.spawn (fn() => result := 2+2);
    print(Int.toString(!result))
  end
If we're lucky this will work, but what if the parent tries to access the contents of result before it is updated? In that case we'll read the original 0. Assuming that we know the result isn't zero, we could try to wait until it gets updated. That is, we want to synchronize the actions of one thread with those of another:
fun prog() =
  let val result = ref 0
      fun wait() = if !result = 0 then wait() else ()
  in
    CML.spawn (fn() => result := 2+2);
    wait();
    print(Int.toString(!result))
  end
This is an example of a primitive synchronization technique known as spinning. In this case we don't want the printing thread to print until the computing thread is done. On a single-processor system, this is probably an unsatisfactory synchronization technique because the parent thread might waste processor time waiting for the result to arrive. It can make sense in a multiprocessor system if the expected spinning duration is small. (CML provides a function yield() that allows a thread to give up its quantum, which can be helpful.)
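As a rough sketch of how yield() might be used here, the spinning loop can give up its quantum each time the test fails instead of burning the rest of it. This assumes yield is exposed as CML.yield with type unit -> unit; it is still spinning, just a more polite form of it.

```sml
(* Sketch only: the same spinning loop as before, but the waiting
 * thread gives up its quantum whenever the result is not ready.
 * Assumes yield is available as CML.yield : unit -> unit. *)
fun prog () =
  let val result = ref 0
      fun wait () =
        if !result = 0
        then (CML.yield (); wait ())   (* let another thread run *)
        else ()
  in
    CML.spawn (fn () => result := 2 + 2);
    wait ();
    print (Int.toString (!result))
  end
```

As before, this relies on knowing that the real result is never 0, and it would be run with something like RunCML.doit (prog, SOME (Time.fromMilliseconds 1)).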
For real programs we need more powerful synchronization techniques. Consider what happens if we write a simple web server that allows money transfers between two accounts (represented as refs). A web server typically spawns threads to handle each incoming request. We could easily end up with code with an effect like the following:
fun prog() = let val savings = ref 1000
val checking = ref 1000
fun transfer(n: int) = (savings := !savings - n; checking := !checking + n)
in
CML.spawn(fn() => transfer(100)); (* thread 1 *)
CML.spawn(fn() => transfer(100)); (* thread 2 *)
print(Int.toString(!savings)^" "^Int.toString(!checking))
end
Clearly, we would expect this to print out "800 1200" (assuming both transfers have finished by the time the main thread prints). But it might not, because the threads can be scheduled in other ways. Each thread does a read and a write from each of checking and savings. Consider some possible orders of execution on a single-processor machine:
thread 1                  thread 2
read savings (1000)
write savings (900)
read checking (1000)
write checking (1100)
                          read savings (900)
                          write savings (800)
                          read checking (1100)
                          write checking (1200)
Result: 800 1200

thread 1                  thread 2
read savings (1000)
                          read savings (1000)
write savings (900)
                          write savings (900)
read checking (1000)
write checking (1100)
                          read checking (1100)
                          write checking (1200)
Result: 900 1200
With the second, entirely possible schedule of execution, $100 is manufactured from thin air. Worse yet, we could test this code quite a bit and have it return the right result every time. Yet when deployed as a product, it will occasionally create or consume money. The problem is that we really cannot allow two threads to execute the transfer code at the same time; it is an example of a critical section that only one thread should be able to run at a time.
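The bad schedule can be replayed by hand in ordinary sequential SML, which makes the lost update easy to see. This is only a simulation: the names t1Savings, t2Savings, t1Checking and t2Checking are hypothetical and stand for the value each thread is holding between its read and its write.

```sml
(* Sequential replay of the second schedule; no real threads. *)
val savings = ref 1000
val checking = ref 1000
val n = 100

val t1Savings = !savings              (* thread 1 reads savings: 1000 *)
val t2Savings = !savings              (* thread 2 reads savings: 1000 *)
val () = savings := t1Savings - n     (* thread 1 writes 900 *)
val () = savings := t2Savings - n     (* thread 2 also writes 900 *)

val t1Checking = !checking            (* thread 1 reads checking: 1000 *)
val () = checking := t1Checking + n   (* thread 1 writes 1100 *)
val t2Checking = !checking            (* thread 2 reads checking: 1100 *)
val () = checking := t2Checking + n   (* thread 2 writes 1200 *)

val total = !savings + !checking      (* 2100, though we started with 2000 *)
```

Thread 2's write to savings is based on a stale read, so one of the two withdrawals is lost while both deposits go through.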
This kind of problem is the reason for the synchronized statement and attribute in Java. In Java we could wrap synchronized around the whole transfer function, and prevent the interleaved executions shown above. Another language feature that can be used to prevent interleaved access is locks. One thread acquires a lock, does the transfer, and releases the lock. If a thread tries to acquire a lock that is currently held by another thread, it blocks waiting until the first thread releases the lock. This kind of simple lock is known as a mutex, for "mutual exclusion". Locks are difficult to program with if there is more than one lock, because of the possibility of deadlock, which arises when two or more threads each try to acquire a lock that another one holds, e.g.
thread 1          thread 2
acquire(L1)       acquire(L2)
...               ...
acquire(L2)       acquire(L1)
In this example both threads will block forever, each waiting for a lock that the other holds, and the program will stop making progress. Debugging programs to eliminate deadlocks can be very difficult.
These mutual exclusion features (such as synchronized and mutexes) can be implemented using just refs, but it turns out to be amazingly difficult to get right; for this reason they are usually provided as primitives.
What we have just been describing is known as a shared-memory approach to thread communication, because the state of refs is shared among the various threads. Shared-memory communication does not work in all concurrent programming models; for example, the standard programming model of Unix (Linux, etc.) is based on processes rather than threads. The major difference is that processes do not share any state; a spawned process gets a copy of the state of its parent process.
For the reasons we've just seen, CML discourages communication through refs; instead, it takes the other major approach to managing thread communication and synchronization, called message-passing. Message passing has the benefit of being easier to reason about, and also easier to implement in a distributed system. In CML, threads communicate and synchronize using channels, mailboxes, and events. (These are terms specific to CML.) Channels and mailboxes provide the ability to deliver values from one thread to another. Events give a thread the ability to synchronize on activity by multiple other threads.
structure CML = struct
  ...
  type 'a chan
  val channel: unit -> 'a chan
  val send: 'a chan * 'a -> unit
  val recv: 'a chan -> 'a
  ...
end
A value of type T chan is a channel that transmits values of type T. A new channel is created using channel. The channel allows two threads to synchronize: a sending thread and a receiving thread. When a thread evaluates send(c,x) for some channel c and message value x, it then blocks waiting for some thread to receive the value by calling recv(c). Once one thread is waiting on send and another on recv, the value x is transferred and becomes the result of the recv. The two threads then both resume execution. Similarly, if a thread performs a recv(c) but there is no other thread doing a send already, the receiving thread blocks waiting for a sender. This is known as synchronous message-passing because the sender and receiver synchronize at the moment that the message is delivered.
Here is a simple example of using channels:
open CML
fun prog() =
  let val c1: int chan = channel()
  in
    spawn (fn() => send(c1,2));
    spawn (fn() => print(Int.toString(recv(c1))));
    ()
  end
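Channels also give a cleaner answer to the earlier problem of getting a result back from a child thread. The following sketch redoes the 2+2 example with a channel in place of the shared ref; the channel name c is arbitrary. Because recv blocks until the child has sent, no spinning is needed and there is no way to read a stale value.

```sml
(* Sketch: the 2+2 example with a channel replacing the shared ref.
 * recv blocks the parent until the child has sent its result. *)
fun prog () =
  let val c : int CML.chan = CML.channel ()
  in
    CML.spawn (fn () => CML.send (c, 2 + 2));
    print (Int.toString (CML.recv c))
  end
```

Run with something like RunCML.doit (prog, SOME (Time.fromMilliseconds 1)), this prints 4 regardless of which thread the scheduler runs first.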
structure Mailbox = struct
  type 'a mbox
  val mailbox : unit -> 'a mbox
  val send : ('a mbox * 'a) -> unit
  val recv : 'a mbox -> 'a
  ...
end
Mailboxes provide asynchronous messages: the sender does not wait for the receiver before going on. Otherwise they act like channels. A mailbox provides a FIFO message queue: messages are delivered in the order they were sent. This is important because a mailbox can contain a large number of messages. Mailboxes can be implemented using channels and threads; it's a good exercise to think about how to do this.
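A small sketch of the difference from channels: here a single thread sends three messages to a mailbox and only afterwards receives them. With a channel the first send would block forever, since no other thread exists to receive; with a mailbox the sends return immediately and the messages queue up.

```sml
(* Sketch: asynchronous sends to a mailbox, followed by receives
 * in the same thread. *)
fun prog () =
  let val mb : int Mailbox.mbox = Mailbox.mailbox ()
  in
    (* None of these sends blocks, even though nobody is receiving. *)
    Mailbox.send (mb, 1);
    Mailbox.send (mb, 2);
    Mailbox.send (mb, 3);
    (* The messages come back out in the order they were sent. *)
    print (Int.toString (Mailbox.recv mb));
    print (Int.toString (Mailbox.recv mb));
    print (Int.toString (Mailbox.recv mb))
  end
```

Because delivery is FIFO, this prints 123.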
Concurrent applications need the ability to select from several different
possible input sources. CML provides this ability through the event
abstraction:
structure CML = struct
  ...
  val recvEvt: 'a chan -> 'a event
  val select: 'a event list -> 'a
  ...
end

structure Mailbox = struct
  val recvEvt: 'a mbox -> 'a event
  ...
end
Given a channel or a mailbox, we can generate a corresponding event to synchronize on. Given a list of events, the select function blocks until one of the events arrives, then reads from the corresponding channel or mailbox. Without select the program can only test for incoming data on one channel at a time, blocking if there is no data. In Unix there is a system call select that provides similar functionality.
Using events we can write an extended version of the banking example from earlier. Since we want only one thread to be able to do the update at a time, we invent a thread whose job that is. This thread also processes requests to read the balance, because otherwise a read might be interleaved with an update, resulting in inconsistent account balances. Other threads communicate with it via channels:
open CML
fun prog() =
  let val c1: int chan = channel()
      val e1 = recvEvt(c1)
      val c2: int chan = channel()
      val e2 = recvEvt(c2)
  in
    spawn(fn() => send(c1,100));
    spawn(fn() => send(c2,100));
    spawn(fn() =>
      let val savings = ref 1000
          val checking = ref 1000
          fun server() = (
            let val amount = select([e1,e2])
            in
              savings := !savings - amount;
              checking := !checking + amount
            end;
            server())
      in
        server()
      end);
    print "main thread done"
  end
(What if we wanted the server to send back results? What kind of channel could we use then?)
A thread may also want to select from a number of different channels to send output on. In this case it might want to choose the channel on which there is already a receiver waiting. Send events provide this functionality. A send event is created by using the sendEvt function:
val sendEvt: 'a chan * 'a -> unit event
Selection on a send event created with sendEvt(c,v) enables it to send the value v when there is a receiver waiting on the channel c. The select call then returns a unit value to indicate that the send has occurred.
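As an illustrative sketch, the thread below offers a message on two channels at once and lets select commit to whichever one has a receiver. The channel names and message strings are invented for the example.

```sml
(* Sketch: offer a send on two channels; select commits to the one
 * that has a receiver waiting. *)
fun prog () =
  let val c1 : string CML.chan = CML.channel ()
      val c2 : string CML.chan = CML.channel ()
  in
    (* Only c2 ever gets a receiver, so only the second send event
     * can complete. *)
    CML.spawn (fn () => print (CML.recv c2));
    CML.select [CML.sendEvt (c1, "via c1"), CML.sendEvt (c2, "via c2")]
  end
```

Since nothing ever receives on c1, the select can only complete through c2, and the receiving thread prints "via c2". A plain send(c1, ...) in the same position would have blocked forever.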
In general a CML thread may want to wait on various different events, with different associated types. The events cannot be put onto a common event list because the types are not equal. Events can be wrapped to give them a different type:
val wrap: 'a event * ('a -> 'b) -> 'b event
This allows simultaneous selection on receive and send events, for example. It also helps keep track of which of several channels delivered an event. In the server example above, we might want to know which client thread sent a value, which can be accomplished by tagging the request:
let val (client: int, amount:int) =
      select([wrap(e1, fn(a) => (1,a)),
              wrap(e2, fn(a) => (2,a))])
When a value arrives on the channel, the function wrapped around the event is automatically applied to that value.
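To illustrate simultaneous selection on a receive event and a send event, here is a sketch that wraps both into a common type. The datatype outcome and the channel names inCh and outCh are hypothetical, introduced only for this example.

```sml
(* Hypothetical result type shared by both wrapped events. *)
datatype outcome = Got of int | Sent

fun prog () =
  let val inCh : int CML.chan = CML.channel ()
      val outCh : string CML.chan = CML.channel ()
      (* recvEvt inCh is an int event and sendEvt (...) is a unit
       * event; wrapping both to outcome event lets them share a list. *)
      val events =
        [CML.wrap (CML.recvEvt inCh, fn n => Got n),
         CML.wrap (CML.sendEvt (outCh, "ping"), fn () => Sent)]
  in
    CML.spawn (fn () => CML.send (inCh, 42));
    case CML.select events of
        Got n => print ("received " ^ Int.toString n)
      | Sent  => print "sent ping"
  end
```

Here only inCh has a partner thread, so the select returns Got 42 and the program prints "received 42"; if some thread were receiving on outCh instead, the Sent branch would run.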
Concurrent ML home page
John H. Reppy, Concurrent Programming in ML, Cambridge University Press, 1999.