The following solution is safe, live, and fair. It can be generalized to more than two threads, although it is not obvious how to do so.
Shared state:
    has_milk = False
    working_1 = False
    working_2 = False
    turn = 0
Thread one code:
     1: working_1 = True
     2: turn = 2
     3: while working_2 and turn == 2:
     4:     do nothing
     5: if not has_milk:
     6:     buy milk
     7:     has_milk = True
     8: working_1 = False
Thread two code: (symmetric)
    11: working_2 = True
    12: turn = 1
    13: while working_1 and turn == 1:
    14:     do nothing
    15: if not has_milk:
    16:     buy milk
    17:     has_milk = True
    18: working_2 = False
The idea behind this code is that neither thread can take control from the other; each can only yield control to the other.
This code is safe, live, and fair, although the argument is rather complicated:
safety: clearly, by the time either thread finishes, milk will have been bought at least once. However, we must show that it is bought at most once.
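Suppose otherwise, that is, that both lines 6 and 16 are executed. This implies that thread one must have been on lines 5-7 at the same time that thread two was on lines 15-17. One of the two threads must have exited the while loop first. Without loss of generality, assume it was thread one. When it exited the loop on line 3, one of two things was true:
- working_2 was False. Thread two had therefore not yet executed line 11. When it later executes lines 11 and 12, turn becomes 1 and stays 1 (thread one is already past line 2), and working_1 remains True until thread one reaches line 8. Thread two therefore spins on line 13 until thread one has left lines 5-7.
- turn was 1. Only line 12 sets turn to 1, so thread two executed line 12 after thread one executed line 2. Thread two is then on line 13 with working_1 True and turn equal to 1, and neither changes until thread one executes line 8.
In both cases thread two cannot reach lines 15-17 while thread one is on lines 5-7, which contradicts the assumption.
liveness: the only place that the threads can get stuck is in the spin loops on lines 3 and 13. However, both threads cannot be stuck simultaneously, because turn cannot be both 1 and 2. Once one of the threads proceeds past its spin loop, it will eventually set its working variable to False, which will allow the other thread to exit from its spin loop.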
fairness: the code is completely symmetric, and thus fair.
Although this solution is correct, it is difficult to write, and even harder to reason about. Concurrent code is inherently harder to write than sequential code, because instead of considering a single path of execution, we must consider an exponential number of paths (exponential in the length of the code: roughly speaking, for each instruction, either of the two threads could execute next, so there are 2^length possible sequences of operations).
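As a rough check on that estimate, the interleavings can be counted exactly in the simplest case. The sketch below assumes each thread runs its steps straight through with no loops, in which case the number of interleavings is a binomial coefficient; the function name interleavings is hypothetical. Spin loops only add more paths.

```python
from math import comb


def interleavings(n: int, m: int) -> int:
    """Ways to interleave n steps of one thread with m steps of another."""
    # Choose which n of the n + m slots in the combined schedule
    # belong to the first thread; the rest belong to the second.
    return comb(n + m, n)


# Each thread in the milk example has 8 lines.
print(interleavings(8, 8))   # 12870
print(2 ** (8 + 8))          # 65536, the rough 2^length bound
```

Even for two eight-line threads there are thousands of schedules, which is why checking them by hand is impractical.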
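A small amount of hardware support can help considerably. If the hardware can atomically read and write an address in memory (without any other processor changing the state in between), we can write fairly simple locking code.
We discussed two common hardware primitives for this task: test and set (TAS) and compare and swap (CAS). The test_and_set instruction atomically sets a memory location to True and returns the value that the location held before. The following spin lock is built on it: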
Shared state:
    lock = False
Thread one code:
    1: while test_and_set(lock):
    2:     do nothing
    3: # critical section
    4: lock = False
Thread two code: (same)
    5: while test_and_set(lock):
    6:     do nothing
    7: # critical section
    8: lock = False
If a thread starts executing this code while no other thread holds the lock, the lock will be False. The test_and_set instruction will simultaneously set the lock to True and return the value False. Since it returns False, the body of the while loop does not execute, and the thread enters the critical section.
If another thread tries to enter the critical section while the first thread is still inside, the lock will already be True, so test_and_set will still set the lock to True, but will return True as well (since it was True before the TAS). The second thread will continue to execute the while loop until the first thread executes line 4. After that, the second thread's next call will return False, allowing it to enter the critical section.
The process of continually monitoring a variable to wait for it to change is referred to as spinning; locks that wait in this way, such as the one above built on an atomic operation, are called spin locks.
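Python has no test_and_set instruction, so the spin lock can only be simulated. The sketch below is one way to do that, assuming a threading.Lock is an acceptable stand-in for the hardware's atomicity guarantee; the class AtomicFlag and the functions worker and run are hypothetical names introduced for illustration.

```python
import threading


class AtomicFlag:
    """A boolean cell with a simulated atomic test_and_set.

    Real hardware does this in one instruction; here an internal
    threading.Lock stands in for that atomicity.
    """

    def __init__(self) -> None:
        self.value = False
        self._atomic = threading.Lock()

    def test_and_set(self) -> bool:
        with self._atomic:
            old = self.value
            self.value = True
            return old           # value held before the call

    def clear(self) -> None:
        self.value = False       # corresponds to "lock = False"


lock = AtomicFlag()
counter = 0


def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        while lock.test_and_set():
            pass                 # spin: another thread holds the lock
        counter += 1             # critical section
        lock.clear()             # release the lock


def run(num_threads: int = 4, iterations: int = 1000) -> int:
    """Run the workers and return the final value of the shared counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

If the lock works, run(4, 1000) returns 4000, because no increment of the counter is lost.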
The compare and swap instruction (CAS) is similar to, but more complicated than, the test_and_set instruction. The CAS instruction takes three parameters: a location, an "expected value" for that location, and a new value for the location.
It checks that the contents of the location match the expected value. If so, it replaces them with the new value, but if not it has no effect. In any case, the previous value of the variable is returned.
This can be used to implement a more sophisticated spin lock that stores the thread identifier in the lock (instead of just true or false). The following code ensures that at most one thread can be in the critical section, and if there is a thread in the critical section, then the value of the lock variable is the thread's identifier (or 0 if there is no thread in the CS):
Shared state:
    owner = 0
Thread one code:
    while compare_and_swap(owner, 0, 1) != 0:
        do nothing
    # critical section
    # invariant: owner == 1
    owner = 0
Thread two code: (symmetric)
    while compare_and_swap(owner, 0, 2) != 0:
        do nothing
    # critical section
    # invariant: owner == 2
    owner = 0
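The CAS semantics and the owner lock can be simulated in the same spirit. This is a sketch only, again assuming a threading.Lock may stand in for hardware atomicity; AtomicCell, worker, run, entries, and violations are hypothetical names used to check the invariant.

```python
import threading


class AtomicCell:
    """A memory location with a simulated atomic compare_and_swap."""

    def __init__(self, value):
        self.value = value
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        with self._atomic:
            old = self.value
            if old == expected:
                self.value = new
            return old                    # previous value, swapped or not


owner = AtomicCell(0)    # 0 means no thread is in the critical section
entries = 0              # number of times the critical section was entered
violations = 0           # number of times the invariant failed


def worker(thread_id: int, iterations: int) -> None:
    global entries, violations
    for _ in range(iterations):
        while owner.compare_and_swap(0, thread_id) != 0:
            pass                          # lock is held by another thread
        if owner.value != thread_id:      # invariant: owner == thread_id
            violations += 1
        entries += 1                      # critical section
        owner.value = 0                   # release the lock


def run(num_threads: int = 3, iterations: int = 500):
    """Run the workers; return (entries, violations)."""
    global entries, violations
    entries = violations = 0
    threads = [threading.Thread(target=worker, args=(i + 1, iterations))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return entries, violations
```

Thread identifiers start at 1 so that 0 can keep its meaning of "unlocked".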
CAS is useful because it can be used to implement optimistic transactional data structures. The idea behind an optimistic data structure is that all updates are performed on a copy of the data structure; when the operations are finished, a compare and swap is used to replace the data structure in one fell swoop. For example, we may want to write code for a concurrent balanced binary search tree. Operations that modify the tree (such as insertion and balancing) will create a new tree and update the root pointer.
Shared state:
    root = pointer to the root of the tree
Insert code:
    do
        old_root = root
        new_root = new Tree
        # copy old_root into new_root
        # do insertion into new_root
    until compare_and_swap(root, old_root, new_root) == old_root
Balance code:
    do
        old_root = root
        new_root = balanced_copy_of(old_root)
    until compare_and_swap(root, old_root, new_root) == old_root
If an insertion completes while a balance is in progress, the insertion will update root to point to its own new tree. When the balancing thread completes, its compare_and_swap will fail, because root will point to the tree that the insertion produced and not to the original root. The loop will then be repeated, and the new tree will be balanced instead.
Similarly, if the balance finishes before the insertion, then the CAS in the insertion code will fail (again, because root points to a different node than old_root), and the insertion will be retried on the new (balanced) root.
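The copy-then-swap retry loop can be sketched in runnable form. To keep the example short, an immutable sorted tuple stands in for the balanced tree, and the CAS is simulated with a threading.Lock and compares pointers by identity; AtomicRef, insert, and run are hypothetical names, not part of any library.

```python
import bisect
import threading


class AtomicRef:
    """A pointer cell with a simulated atomic compare_and_swap (by identity)."""

    def __init__(self, target):
        self.target = target
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        with self._atomic:
            old = self.target
            if old is expected:
                self.target = new
            return old


root = AtomicRef(())   # an immutable sorted tuple stands in for the tree


def insert(key: int) -> None:
    while True:
        old_root = root.target
        pos = bisect.bisect_left(old_root, key)
        # Build a modified copy; old_root itself is never changed.
        new_root = old_root[:pos] + (key,) + old_root[pos:]
        if root.compare_and_swap(old_root, new_root) is old_root:
            return     # root was unchanged since we read it; update published
        # Otherwise another update won the race: retry on the new root.


def run(num_threads: int = 4, per_thread: int = 50) -> tuple:
    """Insert num_threads * per_thread distinct keys concurrently."""
    root.target = ()
    total = num_threads * per_thread

    def work(start: int) -> None:
        for key in range(start, total, num_threads):
            insert(key)

    threads = [threading.Thread(target=work, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return root.target
```

Because a failed CAS discards the copy and starts over, no insertion is lost, at the cost of repeated work under contention.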
A semaphore is a data structure that encapsulates an integer. From the user's perspective, the integer is never allowed to become negative; attempting to decrement a semaphore whose value is zero will block the running thread until another thread increments the count.
Semaphores support the following interface:
- initialize the semaphore to an initial value.
- V: increment the semaphore. Also called release or signal.
- P: block until the semaphore has a positive value, then decrement it. Also called acquire or wait.
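In Python's standard library, threading.Semaphore provides this interface, with acquire() playing the role of P and release() the role of V. The sketch below uses only those two operations to let at most two threads use a resource at once; the names slots, mutex, worker, and run are hypothetical, and the sleep merely simulates work.

```python
import threading
import time

slots = threading.Semaphore(2)   # initialize: two threads may proceed
mutex = threading.Semaphore(1)   # a semaphore with value 1 used as a lock
active = 0                       # threads currently using the resource
max_active = 0                   # highest value of `active` observed


def worker() -> None:
    global active, max_active
    slots.acquire()              # P: block until positive, then decrement
    mutex.acquire()              # P: protect the two counters
    active += 1
    max_active = max(max_active, active)
    mutex.release()              # V
    time.sleep(0.01)             # pretend to use the limited resource
    mutex.acquire()              # P
    active -= 1
    mutex.release()              # V
    slots.release()              # V: increment, waking a blocked thread


def run(num_threads: int = 6) -> int:
    """Run the workers; return the largest number active at once."""
    global active, max_active
    active = max_active = 0
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active
```

The value returned by run() never exceeds the semaphore's initial value of 2, however many threads are started.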
Some semaphore implementations allow you to perform other operations. You should avoid using anything other than P and V. For example, Python provides the ability to acquire without blocking; other libraries provide the ability to read the internal value of a semaphore. Using these operations can easily lead you to write buggy code. Stick to P and V.