Locking granularities


Assignment outline:

For this assignment you will be experimenting with different granularities of locking on a data structure that is shared among many threads.

The data structure you will be experimenting with is a simple n x n array of numbers. The invariant is that the sum of all entries in the grid remains the same.

n will range from 1 to 10.

You will have many threads executing in your program.

The life cycle of each thread is:

	for i = 1 to NO_SWAPS
		cell1 = random cell
		cell2 = random cell
		swap(cell1, cell2)
	end for

In order to allow more than one thread to execute this code concurrently on the shared data structure, there must be some locking to prevent data corruption.
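
For example, a minimal sketch of the swap under the coarsest granularity (a single lock for the whole grid) might look like the following. The names grid, gridsize, and grid_lock are placeholders here; the provided gridapp.c skeleton uses its own names and helpers.

    /* Hedged sketch: grid-level locking around the swap.
     * grid, gridsize, and grid_lock are illustrative names only. */
    #include <pthread.h>

    static int gridsize;
    static int **grid;                               /* allocated elsewhere */
    static pthread_mutex_t grid_lock = PTHREAD_MUTEX_INITIALIZER;

    static void swap_cells(int r1, int c1, int r2, int c2)
    {
        pthread_mutex_lock(&grid_lock);              /* one lock covers the entire grid */
        int tmp = grid[r1][c1];
        grid[r1][c1] = grid[r2][c2];
        grid[r2][c2] = tmp;
        pthread_mutex_unlock(&grid_lock);
    }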

Since this is a gridded data structure, it is natural to think of four different granularities of locking: no locking at all (-none), one lock for the entire grid (-grid), one lock per row (-row), and one lock per cell (-cell).
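
The lock state you need differs per granularity. As a rough illustration (the declarations and names below are assumptions, not the skeleton's):

    /* Hedged sketch of per-granularity lock state; names are illustrative. */
    #define MAX_GRID 10

    pthread_mutex_t grid_lock;                        /* -grid: one lock for everything */
    pthread_mutex_t row_locks[MAX_GRID];              /* -row:  one lock per row        */
    pthread_mutex_t cell_locks[MAX_GRID][MAX_GRID];   /* -cell: one lock per cell       */
    /* -none: no locks at all */

    void init_locks(int n)
    {
        pthread_mutex_init(&grid_lock, NULL);
        for (int i = 0; i < n; i++) {
            pthread_mutex_init(&row_locks[i], NULL);
            for (int j = 0; j < n; j++)
                pthread_mutex_init(&cell_locks[i][j], NULL);
        }
    }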

Program usage: gridapp gridsize numthreads locking_granularity

Examples:

gridapp 8 3 -cell
- executes gridapp with an 8 x 8 array, 3 threads running, with cell level locking

gridapp 10 100 -grid
- 10 x 10 array, 100 threads running, grid level locking

gridapp 10 100 -row
- 10 x 10 array, 100 threads running, row level locking

gridapp 10 100 -none
- 10 x 10 array, 100 threads running, *** NO LOCKING ***; this is for testing, and the data should be corrupted at the end.


Program output:

The initial and final sums should match if there is no data corruption (i.e., the synchronization code is correct).


Steps for your assignment

  1. Examine the skeleton code we have provided: gridapp.c in gridapp.tar.gz
    Note that grid initialization, printing, and summing functions are provided.
    We have also provided the thread starting code and the thread's function body.
    Note the sleep(1) hidden in the middle of the swap operation. You will investigate this later in the assignment.
  2. Compile (note the -lpthread) and execute this code, then examine the results:
    gcc -o gridapp gridapp.c -lpthread
    ./gridapp 10 1 -none
  3. We have provided everything you need except for the locking data structures and locking code. (Don't forget threads_left!)
    You should be able to fill in the necessary code using either pthread synchronization primitives like pthread_mutex_lock/pthread_mutex_unlock or pthread_cond_wait/pthread_cond_signal. You will need to decide how many locks to use and what each one protects.

    NOTE THAT YOU WILL NEED A DEADLOCK PREVENTION SCHEME (see the lock-ordering sketch after the step list). You can add code elsewhere if necessary, such as global variable declarations.

  4. Verify that the concurrency is working correctly:
      Run
      
    	    ./gridapp 10 10 -cell
    	    ./gridapp 10 10 -row
    	    ./gridapp 10 10 -grid
      
      
    All of these should result in the same initial and final sum.
    There should also be significant differences in time elapsed. Note these results.
  5. Port your code to Windows using the Win32 API, not pthreads for Windows :-) (Less portable, I know, but the point is to give you a different experience with Windows primitives like CreateThread, WaitForSingleObject, WaitForMultipleObjects, etc.) You should produce a single source file that will compile under Linux using pthreads or under Windows using the Win32 API. You should use #defines to include the appropriate thread and locking calls in each environment (the #defines should bracket only these calls and not separate the code into two separate versions). A sketch of this #define approach appears after the step list.

  6. OPTIONAL: For none, implement deadlock detection and spawn off a thread to run the deadlock detection algorithm.
  7. Now do a series of performance comparisons in either Windows or Linux (not both). Experiment with different grid sizes, thread counts, and granularities of locking.
    1. First, try running without locking again
      Run ./gridapp 10 10 -none
      The initial and final sums should be different.
      Also note how the elapsed time compares to the row, grid and cell cases.
    2. At grid size 10, graph the time elapsed on the y axis and thread count from 1 to 10 on the x axis. Make this graph for all four types of granularity (-none, -cell, -row, -grid).

      (Optional: What is the maximum number of threads you can create without pthread_create failing?)

    3. Now leave the thread count at 10 and graph the time elapsed on the y axis and the grid size from 2 to 10 on the x axis. Make this graph for all four types of granularity.
    Now, you could run all 76 tests one at a time, and copy and paste the results into MS Excel spreadsheets to graph the results.
    Or:

    OPTIONAL:
    Write scripts and/or small C programs to automate this process of generating data into CSV files that are easily imported and graphed by Excel or another spreadsheet program.

    OPTIONAL:
    When running performance tests, you usually want to execute multiple runs and take the average so as to get more accurate performance figures. So, extend your scripts to perform multiple executions (say 10) of each data point and calculate means and variances of the data, outputting these into the CSV file instead of the raw data.

  8. What can you conclude about how grid size, number of threads, and locking granularity affect performance?
    What general results can you conclude from this exercise? When does fine granularity locking have the biggest effect? What costs does it impose?
  9. Experiment with no locking. Without any modifications, you shouldn't get any integrity violation with 1 thread. Why? With a big enough grid and few enough threads, you might avoid data integrity violations. Why?
  10. Note the sleep(1) call in the swapping code.
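
As referenced in step 3, one common deadlock prevention scheme for the finer granularities is to always acquire the two locks in a fixed global order, so that no two threads can wait on each other in a cycle. Below is a minimal sketch for cell-level locking, assuming the illustrative cell_locks and gridsize names from earlier (your names in gridapp.c may differ); row-level locking can follow the same idea using row indices, taking only one lock when both cells share a row.

    /* Hedged sketch: acquire the two cell locks in a fixed global order
     * (lower linear index first) so that circular wait is impossible. */
    static void lock_two_cells(int r1, int c1, int r2, int c2)
    {
        int a = r1 * gridsize + c1;              /* linear index of first cell  */
        int b = r2 * gridsize + c2;              /* linear index of second cell */

        if (a == b) {                            /* same cell: take its lock once */
            pthread_mutex_lock(&cell_locks[r1][c1]);
        } else if (a < b) {                      /* always lock the smaller index first */
            pthread_mutex_lock(&cell_locks[r1][c1]);
            pthread_mutex_lock(&cell_locks[r2][c2]);
        } else {
            pthread_mutex_lock(&cell_locks[r2][c2]);
            pthread_mutex_lock(&cell_locks[r1][c1]);
        }
    }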
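
For the porting in step 5, one way to keep a single source file is to hide the platform-specific locking calls behind a small set of #defines. The macro and type names below are illustrative assumptions, not a required interface; only the bracketed calls differ per platform, not the program logic.

    /* Hedged sketch of bracketing only the locking calls with #defines;
     * the rest of the program stays identical on both platforms. */
    #ifdef _WIN32
    #include <windows.h>
    typedef HANDLE lock_t;
    #define LOCK_INIT(l)    (*(l) = CreateMutex(NULL, FALSE, NULL))
    #define LOCK_ACQUIRE(l) WaitForSingleObject(*(l), INFINITE)
    #define LOCK_RELEASE(l) ReleaseMutex(*(l))
    #else
    #include <pthread.h>
    typedef pthread_mutex_t lock_t;
    #define LOCK_INIT(l)    pthread_mutex_init((l), NULL)
    #define LOCK_ACQUIRE(l) pthread_mutex_lock(l)
    #define LOCK_RELEASE(l) pthread_mutex_unlock(l)
    #endif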

What to submit:

A single gzipped tar file (file.tar.gz) containing one directory with the following files:

Optional: source files for scripts/C programs that generate the automated CSV data