Concurrent Hash Table

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Submission Requirements
Restrictions
Provided Files
Overview
Task 0: Introduction to Inline Assembly
- Structure
- Exercises
Task 1: Spinlock
Task 2: Condition Variable (Wait & Broadcast)
Task 3: Barrier
Task 4: Readers-Writer Lock
Task 5: Concurrent Hash Table
- Test Function
Running and Testing
Submission

Submission Requirements

Please submit these files via Gradescope:

barrier.c, which is your implementation of a thread barrier.
hash_table.c, which is your implementation of a concurrent hash table.
lab.c, which is the lab activity.
readers-writer.c, which is your implementation of a readers-writer lock.
spinlock.c, which is your implementation of a spinlock.
wait-broadcast.c, which is your implementation of wait and broadcast for a condition variable.

Restrictions

You may not use any additional #include directives. Including libraries such as pthread.h (excluding its use in the test files) or syscall.h will result in a score of zero on this assignment. (The point of this assignment is to implement some of the functionality of these libraries yourself, so it is critical that you do not use them in your solution.)
As always, do not change other files (outside of the ones you turn in, listed above). We will use the original versions of these files for grading, so our grading results will not reflect any changes.

Provided Files

The following files are provided in the release:

lab.c, the lab activity.
{barrier, hash_table, readers-writer, spinlock, wait-broadcast}.c, which includes the necessary function signatures and #include directives.
{barrier, hash_table, readers-writer, spinlock, wait-broadcast}.h, the above’s header file.
test_{barrier, hash_table, readers-writer, spinlock, wait-broadcast}.c, which provides the structure for testing the above.
Makefile, which will appropriately compile and link the above to produce an executable, test_<construct>.

You can find these files in your personalized repository on GitHub.

Overview

In this assignment, you will implement synchronization primitives: the fundamental building blocks of all parallel programming. Some of these will require writing inline assembly to use RISC-V’s LR and SC atomic instructions. Then, you will use the synchronization primitives to create a concurrent hash table.

The purpose of this assignment is to learn how threads and synchronization actually work by implementing the basic building blocks yourself. For that reason, you may not use existing libraries such as the POSIX threads library.

Task 0: Introduction to Inline Assembly

As stated in the overview, inline assembly will be necessary for implementing certain synchronization primitives. As part 0 of this assignment, in lab, you will work through some exercises on writing inline RISC-V assembly.

Inline assembly lets you mix some assembly code into your C programs. You can even exchange data between C variables and registers in assembly. It can be useful for situations where you know exactly what assembly code you want, and C doesn’t have a construct that generates that code. That’s the case for the synchronization primitives in this assignment, which require careful use of RISC-V’s atomic instructions.

Structure

Use this syntax to add inline assembly to your C code:


__asm__ volatile(
  // Assembly instructions
  : // Output operands
  : // Input operands
  : // Clobber list
  : // Goto list
);

The volatile keyword instructs the compiler to avoid “optimizing your code away,” so the instructions appear verbatim in the compiled program. The first thing in the parentheses is the assembly code itself. Then, the lists after each colon describe how the assembly code interacts with the rest of the program:

The first two rows are for inputs and outputs. These specify the variables that the inline assembly will interact with.
The third row is for the clobber list. This list describes to the compiler what the assembly code (might) overwrite. For RISC-V, this list can contain register names and the special name "memory" to indicate that the assembly writes to memory.
Finally, the fourth row is to inform the compiler of the list of goto labels used in the assembly.

Here’s an example that calculates and returns $a + 3b$ :


int a_plus_3b(int a, int b) {
  int result;
  __asm__ volatile(
    "slliw t0, %2, 1\n"
    "addw t0, t0, %2\n"
    "addw %0, t0, %1\n"
    : "=r" (result)
    : "r" (a), "r" (b)
    : "t0");
  return result;
}

Notice that the assembly code uses placeholders like %0 and %1 in places where register names (like x17 or a1) would usually appear. These placeholders let the assembly code refer to C variables:

%0 refers to the first operand that appears in the :-delimited lists below the assembly code. In this case, result.
%1 is the second operand, a.
%2 is the third operand, b.

Then, the three lines after the assembly code describe how it uses registers. The r in these lines indicate a variable that should be placed in a general-purpose register. The = means that the assembly will write to that register. (These are called constraints and constraint modifiers.) Here’s what the three lines mean:

The first line is the output operands. "=r" (result) says that the C variable result should be placed in a register so the assembly code can write to it.
For the input operands, "r" (a), "r" (b) makes the arguments a and b available in registers.
The third line is the clobber list. We include t0 here to indicate that the assembly code overwrites register t0. When you write inline assembly, remember to list all the registers that the assembly writes to.
We omit the goto list because our assembly does not use any labels.

Beyond r and =, some other basic constraints and constraint modifiers are:

m: The operand lives in memory.
f: The operand lives in a floating point register.
i: The operand is a constant integer (immediate).
F: The operand is a constant floating point number.
+: The operand is both read from and written to.
&: The operand is written to before all (note: not any) operands have been read.

Exercises

Complete the functions in lab.c by writing inline assembly.

These two exercises are independent from the rest of the assignment, but they will be useful. They should be submitted together in a file lab.c. Do not change the function signatures.

You can compile lab.c with this command:


$ gcc lab.c -pthread -o lab

Atomic Increment

This is a function that atomically adds 1 to an integer variable in memory, var, and returns its original value. “Atomically” means that other threads cannot interfere with the increment: they cannot change the variable between the load and store. For example, it should be impossible to have two threads simultaneously increment the variable and both read the same original value, leading to lost updates. This kind of lost update is possible with a normal, non-atomic implementation (load, add 1, store), so you must use RISC-V’s atomic instructions (lr and sc). Atomic increments ensure that each thread sees a consistent and up-to-date value of the variable in concurrent environments.

Compare-and-Swap

This function implements the CAS operation, which atomically compares the current value of an integer var with an expected value old. If they are equal, it updates var to a new value new. The CAS operation ensures thread safety by preventing race conditions, as it guarantees that the update occurs only if no other thread has modified var in the meantime. The function returns true upon a successful swap and false if the current value did not match the expected value. A correct implementation will utilize the lr and sc instructions.

Task 1: Spinlock

A spinlock is an implementation of mutual exclusion where a waiting thread repeatedly checks if the associated lock is available. This is called “spinning” or “busy waiting.” Here you will implement the two functions of a spinlock:

spin_lock() to obtain the lock: spin until the lock becomes free, and then acquire it. (The lock may be free already, in which case your spin loop should exit immediately.)
spin_unlock() to release the lock.

Each lock is represented as an int*. Use the int in memory to store a value that indicates whether the lock is free or held by some thread. The purpose of the keyword volatile is to tell the compiler that other threads may be concurrently modifying the variable.

Hint: Any correct solution will use the RISC-V atomic instructions lr and sc. Using “ordinary” loads and stores cannot guarantee that memory updates will be visible to other threads in order.

Task 2: Condition Variable (Wait & Broadcast)

Monitors or condition variables are a concurrency mechanism where threads can wait for a condition to become true. Other threads can wake up the waiting threads by broadcasting a signal when the condition changes.

To use a condition variable, a program always pairs it with a lock. For example, imagine a program that uses a queue data structure to keep track of work to do. The program would use a lock to protect the queue, so any thread that pushes or pops the queue must hold the lock while it does so. Now, imagine that some “worker” threads want to wait for the queue to become nonempty: when some work becomes available to do. The program could use a condition variable that indicates when the queue is nonempty. When any thread pushes new work into an empty queue, it would broadcast a notification to all the waiting threads that the condition has changed.

There are two functions for you to implement here:

wait(lock, condition). The first argument, lock, is a spinlock as you implemented it in the previous task. The second argument, condition, is the condition variable (a pointer). The function should immediately release lock and then wait for another thread to call broadcast(condition) on the same condition variable. When that happens, acquire lock again and return.
broadcast(condition). Calling this should wake all threads waiting on the associated condition.

One possibly counterintuitive aspect of this API is that, while condition must be a pointer to valid memory, the value it points to isn’t important. It would be OK for this value to always be zero, for example. The actual logical condition (e.g., “the queue is nonempty”) is generally stored in some other application state.

Use your spin_unlock and spin_lock functions to release and acquire the lock in wait. Then, to sleep until another thread calls broadcast, you must put the thread to sleep using a system call instead of spinning. While you usually make system calls via functions in the C standard library, you are not allowed to do so this time (recall that you may not import any additional headers). Instead, you must use inline RISC-V assembly to perform the system call. Refer to the lecture notes on using the ecall instruction to perform system calls.

You must also determine the appropriate Linux system calls to make. See the syscalls manual page for a complete list. Then, look at the unistd.h header from Linux or this searchable list (under the “riscv64” column) to find the system call number for the call you want to use. A good place to start would be the futex syscall, which provides a wide variety of sleeping/waiting functionality.

Task 3: Barrier

When a thread encounters an $n$ -thread barrier, it must wait for $n$ threads to reach the barrier to continue. Barriers are especially useful for bulk synchronous parallelism, where many threads coordinate to work on a problem in coarse-grained steps.

Aside from initializing the barrier, there is one function to implement:

barrier_wait(barrier): If the thread that calls this is the $n$ th to reach the barrier, all threads waiting at this barrier should be awoken. Otherwise, this thread should be put to sleep.

We have provided a barrier struct (see barrier.h) that holds information like $n$ and the current number of threads waiting for the barrier. You can (and probably should) use your spinlock and condition variable (wait and broadcast) functions to implement your barrier.

Ensure that waiting threads go to sleep instead of spinning.

Hint: If you use your functions from previous tasks, it is possible to implement the barrier correctly in pure C, without any inline assembly.

Task 4: Readers-Writer Lock

In parallel programs, we should be able to distinguish between critical and non-critical actions: those that need synchronization and those that don’t. For example, in the case of reading from and writing to a data structure, parallel writes can lead to race conditions, but parallel reads can be safe. If many threads just need to read the same data concurrently, it is needlessly slow to serialize them.

A readers–writer lock embodies this distinction. Like your basic spinlock, threads will acquire and release the lock to synchronize. Unlike the spinlock, threads must distinguish between acquiring the lock as a reader vs. acquiring the lock as a writer. Multiple threads should be able to read from the lock-protected data in parallel, but only one thread should be able to write to it at a time. And while one thread is writing, no other threads may be reading or writing.

You will implement a write-preferring policy. This means that, when there is a writer waiting to acquire the lock, no new readers can acquire it.

Aside from initializing the readers-writer lock, there are four functions to implement:

start_read() to acquire the lock as a reader, requesting permission to read the protected data structure. If there is an active writer, then sleep until the writer releases the lock.
end_read() to release the lock after a start_read(), indicating that its read operation is completed.
start_write() to acquire the lock as a writer. If there are are active readers or writers, then sleep until there are none.
end_write() to release the lock after a start_write().

We have again provided a rw_lock struct in readers-writer.h with all the data you need to construct a readers–writer lock.

As in the previous two tasks, put waiting threads to sleep instead of spinning. You will again want to use spin_lock and spin_unlock to protect the data within the rw_lock struct itself, but be sure that (for example) threads that call start_read while there is an active writer go to sleep.

Use any of your implementations from the previous tasks to implement the readers–writer lock. It is again possible to do this in pure C, without inline assembly.

Task 5: Concurrent Hash Table

Finally, you will implement a hash table that allows for insertion, deletion, and parallel accesses. This hash table handles collisions via separate chaining (i.e., each bucket is a linked list, and a node is added to the linked list upon insertion of a new key).

The idea with a concurrent hash table is that it is safe to use in parallel threads. That is, multiple threads are allowed to concurrently insert, look up, and delete values without holding any locks themselves. The concurrent hash table performs all necessary synchronization internally. It guarantees that all of the operations happen atomically: for example, while one thread is inserting a value into the table, no other thread can observe an inconsistent intermediate state.

Your task is to use any of your synchronization primitives from the previous tasks to implement a concurrent hash table. Aside from initializing the hash table, there are four functions to implement:

cht_insert() to insert a key/value pair into the hash table.
cht_delete() to remove the node with the specified key from the hash table.
cht_get() to return the value in the hash table associated with a specified key. If there is no such key, return INT_MIN.
thread_cht_requests(), for testing (explained below).

The focus of this assignment is the synchronization primitives, so we have kept this hash table simple: the keys and values are both of type int. The hash table does not resize; it has a constant number of buckets.

Test Function

The thread_cht_requests() function is for testing your hash table. It is used in test_hash_table.c. That program launches several concurrent threads that all run thread_cht_requests().

The idea is that thread_cht_requests() receives a queue of operations to perform. It should repeatedly dequeue operations (cht_request values) and perform them: i.e., look at request.op, which is one of CHT_INSERT, CHT_DELETE, or CHT_GET, and call one of your cht_* functions accordingly. Our test program works by reading a list of requests from a file and then adding them to this queue for processing.

The test program, test_hash_table.c, creates several threads that all run your thread_cht_requests() function. Because of the way the pthreads library works, the argument and return value to this function have type void*, but the argument will be a pointer to a cht_thread_arg struct (and the return value is ignored). This struct has a pointer to a cht_request_queue that contains the operations that the threads should perform. Your thread function should repeatedly call dequeue_cht_request to obtain a request and then perform the indicated hash table operation.

To understand more about how thread_cht_requests() should behave, you can see how it is used during testing in test_hash_table.c.

Running and Testing

For each synchronization primitive in this assignment, we provide a file named test_<primitive>.c. It contains an empty function thread_function(), where you can write some code to test your primitive. The program launches some number of threads (given on the command line), each of which runs thread_function(). You should add code there that calls your synchronization function repeatedly and ensures that the threads interact in the way you want.

You can compile these programs by running make <primitive>. The executable then takes one command-line argument for the number of threads. Do not submit these files; they are only for your own testing.

The test program for Task 5 is a little different; it calls the thread_cht_requests() function in your hash_table.c. This function is a required part of this assignment. Notice also that test_hash_table.c uses your implementation of barriers to synchronize threads.

Submission

On Gradescope, submit barrier.c, hash_table.c, lab.c, readers-writer.c, spinlock.c, and wait-broadcast.c.