Lecture 18: Concurrency—Producer/Consumer Pattern and Thread Pools

A classic concurrent programming design pattern is producer-consumer, where processes are designated as either producers or consumers. The producers are responsible for adding to some shared data structure and the consumers are responsible for removing from that structure. Only one party, either a single producer or a single consumer, can access the structure at any given time.

Here we consider an example with a shared queue, using a mutex (introduced previously) to protect the queue:

let f = Queue.create() and m = Mutex.create()

We divide the work of a producer into two parts: produce, which simulates the work of creating a product, and store, which adds the product to the shared queue. The produce operation increments the counter p that it is passed, sleeps for d seconds to simulate the time taken to produce, and outputs a status message:

let produce i p d =
  incr p;
  Thread.delay d;
  Printf.printf "Producer %d has produced %d" i !p;
  print_newline()

The operation store acquires the mutex m, adds to the shared queue, outputs a status message and releases the mutex:

let store i p =
  Mutex.lock m;
  Queue.add (i, !p) f;
  Printf.printf "Producer %d has added its %d-th product" i !p;
  print_newline();
  Mutex.unlock m

The producer loops n times. Each time through the loop, it calls produce and then store, then sleeps for a random interval of time up to 2.5 seconds. When it is done looping, it outputs a status message.

let producer (n, i) =
  let p = ref 0
  and d = Random.float 2. in
    for j = 1 to n do
      produce i p d;
      store i p;
      Thread.delay (Random.float 2.5)
    done;
    Printf.printf "Producer %d is exiting." i;
    print_newline()

The consumer loops n times, acquiring the mutex m, then attempting to take an item from the shared queue. If it succeeds, it prints out the item. If not, it prints out that it failed to get an item. In either event, it unlocks the mutex and then waits a random interval of time up to 2.5 seconds. When it is done looping, it outputs a status message.

let consumer (n, i) =
  for j = 1 to n do
    Mutex.lock m;
    (try
    let ip, p = Queue.take f
    in
      Printf.printf "Consumer %d has taken product (%d,%d)" i ip p;
      print_newline()
    with Queue.Empty ->
      Printf.printf "Consumer %d has returned empty-handed" i;
      print_newline());
    Mutex.unlock m;
    Thread.delay (Random.float 2.5)
  done;
  Printf.printf "Consumer %d is exiting." i;
  print_newline()

This use of mutual exclusion is very coarse grained. It would be better to be able to have a consumer wait until something is in the queue rather than returning empty handed. For this we can make use of condition variables, which were introduced previously. Now the store function store2 signals the condition c to indicate that something is in the queue.

let c = Condition.create()

let store2 i p =
  Mutex.lock m;
  Queue.add (i, !p) f;
  Printf.printf "Producer %d has added its %d-th product" i !p;
  print_newline();
  Condition.signal c;
  Mutex.unlock m

Now we split the consumer function consumer2 into two parts, wait and take.

let wait i =
  Mutex.lock m;
  while Queue.length f = 0 do
    Printf.printf "Consumer %d is waiting" i;
    print_newline();
    Condition.wait c m
  done

let take i =
  let ip, p = Queue.take f in
    Printf.printf "Consumer %d has taken product (%d,%d)" i ip p;
    print_newline();
    Mutex.unlock m

let consumer2 (n, i) =
  for j = 1 to n do
    wait i;
    take i;
    Thread.delay (Random.float 2.5)
  done;
  Printf.printf "Consumer %d is exiting." i;
  print_newline()

Note that wait locks the mutex, so no work should be done after wait returns unless it is intended to be in the critical section. Generally, a function that has the effect of locking or unlocking a mutex should be used with caution, and should be clearly documented as doing so.

Thread Pools

A thread pool consists of a collection of threads, called workers, that are used to process work. Each worker looks for new work to be done. When it finds work to do, it does it, and when finished, it goes back to get more work. The workers play the role of consumers in the producer-consumer model that we just considered above. In fact, thread pool implementations often use a shared queue to store the work, thus building quite directly on the previous example. Before considering implementation of thread pool, let's get a better idea of what it does and where it is useful.

The basic operations on a thread pool are:

create a new thread pool with some specified number of workers,
add work to an existing thread pool, which will subsequently be performed by one of the workers, and
destroy an existing thread pool, shutting it down once all previously added work is complete.

Here is the signature for a basic thread pool:

module type SIMPLE_THREAD_POOL = sig
  type pool
  (* A No_workers exception is thrown if addwork is called when the
     threadpool is being shut down.  The work is not added. *)
  exception No_workers
  (* create a thread pool with the specified number of worker threads *)
  val create: int -> pool
  (* add work to the pool, where work is any unit -> unit function *)
  val addwork: (unit -> unit) -> pool -> unit
  (* destroy a thread pool, stopping all the threads once all work
   * in the pool has been completed. *)
  val destroy: pool -> unit
end

Thread pools are particularly useful in settings where work arrives asynchronously, such as occurs with a server where many network requests may need to be handled promptly. In such settings, a thread receives an event such as a network request, adds the corresponding work to a thread pool (which will be run at some point in the future), and then quickly returns indicating to the caller that the request will be handled. Sometimes it is also useful to have a handle associated with each unit of work to which some value is sent. The simple abstraction that we presented here does not have any means of returning a result, as the functions representing work are of type unit -> unit.

Here we consider an implementation of the SIMPLE_THREAD_POOL interface in terms of a 4-tuple: a mutable counter, a mutable queue of functions that are the work remaining to be done, a mutex, and a condition variable. The mutex is used to protect the counter and the queue. The condition variable is used to signal when a worker should wake up to get new work.

module Tpool : SIMPLE_THREAD_POOL = struct
  type pool = int ref * (unit -> unit) Queue.t * Mutex.t * Condition.t
  exception No_workers
  let dowork tp = ...
  let create size = ...
  let addwork f tp = ...
  let rec done_wait tp n = ...
  let destroy tp = ...
end

If the counter is positive, it indicates the number of worker threads that the thread pool was created with. If the counter is non-positive, it indicates that the thread pool is being destroyed, and the absolute value of the counter is the number of threads that have properly exited. This allows the destroy function to wait for the threads to finish their work and exit before returning.

Each worker thread runs the function dowork. This function is not exposed in the interface, so it can only be called from inside the implementation of Tpool. The function dowork loops as long as the thread pool has not finished its work. When the work is finished, it exits. A thread pool is finished when it is being destroyed and there is no work remaining to do. We use the counter in the 4-tuple, here called nworkers, to indicate that the thread pool is being destroyed by setting its value to something less than 1. In that case, if the queue of work is also empty, then the thread exits as the pool is finished. Otherwise, on each loop the worker waits for work to do, and then takes that work from the queue, executing it inside a try to ensure that unhandled exceptions in the work do not cause the worker to exit.

  let dowork (tp : pool) =
    match tp with (nworkers, q, m, c) ->
      Mutex.lock m;
      (* When nworkers <= 0, the thread pool is being
       * destroyed.  If that is true and there is also no work left to do
       * then stop looping and drop through to exit processing. *)
      while (!nworkers > 0) || (Queue.length q > 0) do
        (* If nworkers > 0, wait for stuff in the queue. *)
        while (!nworkers > 0) && (Queue.length q = 0) do
          dbgprint "waiting";
          Condition.wait c m
        done;
      (* Verify something in the queue rather than we are now being
       * shut down, then get the work from the queue, unlock the
       * mutex, do the work and relock the mutex before looping
       * back. *)
      if Queue.length q > 0
      then
        let f = Queue.take q in
          dbgprint "starting work";
          Mutex.unlock m;
          (* Don't let an exception in the work, f, kill the thread,
           * just catch it and go on. *)
          (try ignore (f()) with _ -> ());
          Mutex.lock m
      done;
    (* A worker thread exits when the pool is being shut down.  It
     * decrements the worker count, which should be -n when all
     * threads are finished, where n was the number of threads in
     * the pool (counts down from 0). *)
    decr nworkers;
    dbgprint "exiting";
    Mutex.unlock m

Note the use of the mutex in the 4-tuple, called m here, to protect accesses to the queue and the counter. Before entering the while loop, and at the end of the while loop (thus before the next iteration of the loop), the mutex must be locked because both the counter and the queue are accessed. The nested while loop checking for work to do uses Condition.wait to release the mutex and sleep until it receives the condition of work being ready. Recall that this reacquires the mutex before returning. It is important that the mutex is then unlocked before calling f, the work to be done. This allows other workers to safely run concurrently, as f cannot access the queue or counter. Then the mutex is reacquired before the end of the while loop. When exiting the while loop, the mutex is already locked, so the counter is simply decremented and then the mutex is released before the worker exits.

The create function makes a 4-tuple and starts up the specified number of threads, each of which runs the dowork function. The create function simply acquires the mutex at the beginning and releases it at the end. There are other possible ways of writing this code in which the mutex is not held during the entire process of creating the thread pool.

  let create size =
    if size < 1
    then failwith "Tpool create needs at least one thread"
    else let tp =
      (ref 0, Queue.create(), Mutex.create(), Condition.create()) in
    match tp with (nworkers, _, m, _) ->
      Mutex.lock m;
      while !nworkers < size do
        ignore (Thread.create dowork tp);
        incr nworkers
      done;
      Mutex.unlock m;
      tp

The add_work function adds the given function to the queue. To do so, it first locks the mutex and checks whether the pool is being destroyed. If the pools is being destroyed, instead of adding the work it raises the No_workers exception after first releasing the mutex. If the pool is not being destroyed, it adds the work to the queue, signals that there is work to be done, and unlocks the mutex.

  let addwork (f : unit -> unit) (tp : pool) =
    match tp with (nworkers, q, m, c) ->
      Mutex.lock m;
      if !nworkers < 1
      then (Mutex.unlock m; raise No_workers)
      else
        (Queue.add f q;
        Condition.signal c;
        Mutex.unlock m)

The destroy function acquires the mutex, then sets the number of workers to zero to indicate that the thread pool is being shut down. It then broadcasts to all the workers that there is something to be done to ensure that all the workers exit, including ones that were currently sleeping while awaiting work. Finally, it releases the mutex, then waits for all the workers to exit using the helper function done_wait.

  let destroy (tp : pool) =
    match tp with (nworkers, _, m, c) ->
      Mutex.lock m;
      let n = !nworkers in
    nworkers := 0;
    Condition.broadcast c;
    Mutex.unlock m;
    done_wait tp n