CS432 Assignment 6:
Transaction Simulation & Crash Recovery
- Deadline: December 9, 11:59pm. No late submissions will
be accepted. The course management system will not accept any
submissions after the deadline, and you will receive 0% of the grade for this
assignment if you do not turn it in by the deadline.
- This is a group assignment. The group size can be at most 2 persons. You will
have to create groups using the course management system.
- You can download any necessary files for the assignment using the course
management system.
- This assignment is worth 10% of your overall grade.
Goal
In this assignment you will complete certain parts of a database engine. The
database is modeled very simply, and runs by executing a series of transaction
operations given to it by the user. Each transaction performs a series of reads
and writes on the pages of the database, and can commit or abort at any time.
The simulator includes a recovery manager module that keeps a log of the
database's activity. At any point in time, the database may crash
(specifically, when it encounters a special command to induce a crash). The
recovery manager then performs a restart to restore the database to a correct
state, using the ARIES recovery algorithm.
Note: You will not have to write
a lot of code for this assignment, but there are many small details to take
care of when writing an ARIES-style recovery manager. We strongly suggest that
you start soon. In addition, it is worthwhile to write pseudo-code before you
start coding, and to make sure that you understand all the different components
of the assignment.
Background - The Transaction Simulator
The transaction simulator works just like a mini-database. There is a series
of pages stored in a file on disk, accessed through a buffer. Transactions are
modeled as a sequence of reads and writes to the pages of the database. A log
is kept of all changes to database pages, with write-ahead logging (WAL) used
to ensure that committed transactions are fully represented by the log as
written to stable storage. Checkpoints can be inserted at will.
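The WAL rule mentioned above can be illustrated with a minimal sketch. All of the names here (SketchLog, SketchPage, writePageToDisk, commit) are hypothetical and are not the simulator's classes; the sketch only assumes that each page remembers the LSN of the last update applied to it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; not the simulator's classes.
struct SketchLog {
    std::vector<int> records;      // in-memory log tail (record payloads)
    std::int64_t flushedLsn = -1;  // highest LSN known to be on stable storage

    // Append a record and return its LSN (here, simply its index).
    std::int64_t append(int payload) {
        records.push_back(payload);
        return static_cast<std::int64_t>(records.size()) - 1;
    }

    // Force every record up to and including lsn to stable storage.
    void flushTo(std::int64_t lsn) {
        assert(lsn < static_cast<std::int64_t>(records.size()));
        if (lsn > flushedLsn) flushedLsn = lsn;
    }
};

struct SketchPage {
    int value = 0;
    std::int64_t pageLsn = -1;  // LSN of the last update applied to this page
    bool dirty = false;
};

// WAL rule: the log record describing a change must reach stable storage
// before the changed page itself is written out.
void writePageToDisk(SketchLog& log, SketchPage& page) {
    if (page.pageLsn > log.flushedLsn) log.flushTo(page.pageLsn);
    // ... the actual page write would happen here ...
    page.dirty = false;
}

// Commit: force the log through the commit record before acknowledging.
void commit(SketchLog& log, std::int64_t commitLsn) {
    log.flushTo(commitLsn);
}
```

In this sketch a page can never reach disk ahead of the log record that describes it, which is what lets recovery rely on the log alone.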
This assignment uses an application called MARS, which is a GUI for testing
the simulator. The input comes from .mars files that can be found in the folder
Tests. These files can be edited using any simple text editor. The syntax for
commands to the simulator is discussed in Getting Started with the Simulator.
A breakdown of the major components follows:
- BufMgr (buf.h/buf.cpp) - This module operates much like the buffer you
implemented for project 1. When a page is to be read or written, a pinPage()
request is made and the page is loaded into the buffer pool if it is not
already present. On request, one or all pages can be flushed. Pages are
released with an unpinPage() call.
The buffer for this project is deliberately small, to
illustrate the role of the buffer in crash recovery.
- Xaction_table (xaction_table.h/xaction_table.cpp) - This keeps track of all
the transactions currently active in the database. It also keeps information
on each active transaction, namely its oldest and most recent log sequence
numbers (LSNs).
- The log subsystem (log.h/log.cpp):
  - log - This is the log that is used in WAL while the transactions are
  executing. It provides simple sequential access to the log files for
  reading purposes, and can append new records to the end of the log. The
  log records are of uniform size, and the log does not concern itself
  with the type of log record being written.
  - logrecord - This represents a single log entry. A special field is kept in
  each record to identify the variety of log record (commit, update, etc.)
  that it is. It contains a generic data buffer, which holds the information
  specific to each type of log record.
  - masterlog - This keeps track of the checkpointing, so that the recovery
  process will not need to go back too far to reconstruct the database at the
  time of the crash.
- LogData structures (logrec.h/logrec.cpp) - These classes represent each type
of update and the information associated with it, such as prevLSN, xaction_id,
and the page affected. Each LogData structure is stored in the data field
of a log record, and should be accessed by typecasting that data to the
appropriate LogData type (UPD, CLR, ABORT).
- Recovery Manager - This consists of several related modules for logging and
recovery procedures. While the database is running, each transaction has its
own recovery manager that is responsible for logging its actions. Only one
recovery module, though, is necessary for performing crash recovery. The
pieces of the manager are:
  - The logging functionality (logfunc.cpp) - This translates write requests by
  transactions into Update records that are written to the log, and creates
  CLR records whenever a transaction aborts on its own.
  - rollback.cpp - This performs a rollback on a transaction that has just
  aborted on its own, undoing the changes and making sure that none of them
  persist even after a crash.
  - restart.cpp - This is the part responsible for bringing the database up to
  a consistent state following a crash. It performs the three phases of the
  ARIES recovery algorithm based on the information written to the log.
  - checkpoint.cpp - This generates checkpoints and writes them to the log,
  extracting the information from the Xaction table and getting the Dirty
  Page Table from the Buffer.
- RecDirtyPgTbl (recovery_mgr.h/misc.cpp) - This is the list of possibly dirty
pages that is built during the analysis phase of recovery.
- RecXactTable (recovery_mgr.h/misc.cpp) - This is the list of active
transactions that is built during the analysis phase of recovery.
- The recmgr_tab.c module is responsible for parsing the input. It is computer
generated, and not well suited to modification. Don't worry too much about
what it does; just know that it takes one command at a time from the
standard input (which we might have redirected to point to a file) and
converts it into a database operation.
- The handle.cpp module contains the standalone functions for performing the
commands. There is one function for each operation, and these are the
"top-level" functions that are called first when an operation is performed.
The functions in handle.cpp call the Recovery Manager functions that you will
be finishing, and pass in the relevant data about the operations.
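To make the relationship between a log record's generic data buffer and the LogData structures more concrete, here is a minimal sketch. The struct names, field names, and buffer size are hypothetical; the real definitions are in log.h and logrec.h. The sketch copies the bytes with memcpy instead of casting a pointer, which is one portable way to reinterpret a raw buffer as a typed structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical layout for illustration; see log.h and logrec.h for the real one.
enum class SketchRecType { Update, Clr, Abort };

const std::size_t kSketchDataSize = 64;  // assumed fixed buffer size

struct SketchUpdateData {  // payload carried by an update record
    int xactionId;
    long prevLsn;
    int pageId;
    int oldValue;
    int newValue;
};

struct SketchLogRecord {
    SketchRecType type;                   // says which payload the buffer holds
    unsigned char data[kSketchDataSize];  // generic, fixed-size data buffer
};

// Store a typed payload in the record's generic buffer.
template <typename T>
void packData(SketchLogRecord& rec, SketchRecType type, const T& payload) {
    static_assert(std::is_trivially_copyable<T>::value, "payload must be plain data");
    static_assert(sizeof(T) <= kSketchDataSize, "payload must fit in the buffer");
    rec.type = type;
    std::memcpy(rec.data, &payload, sizeof(T));
}

// Read the payload back as type T. The caller is expected to check rec.type
// first, so that the buffer is only interpreted as the type it was written as.
template <typename T>
T unpackData(const SketchLogRecord& rec) {
    static_assert(std::is_trivially_copyable<T>::value, "payload must be plain data");
    static_assert(sizeof(T) <= kSketchDataSize, "payload must fit in the buffer");
    T payload;
    std::memcpy(&payload, rec.data, sizeof(T));
    return payload;
}
```

Keeping the record type next to the buffer is what allows the log itself to stay unaware of what each record contains.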
When you have made your modifications and are ready to test your code,
you can run the Mars.exe file generated by Visual Studio when you compile the
project. When generating the Mars.exe file, it is recommended that you
build the release version of the project.
Assignment Details
Your Task
You will implement various pieces of the recovery module. Specifically, you
will implement the functionality to handle the most basic and common of
operations, the write (WriteUpdateLog()). You must add code to the logging
mechanism so that writes (also called updates) to the pages of the DB are
reflected in the log, and then implement those parts of the restart mechanism
that deal specifically with those log entries to restore the database to a
consistent state.
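As a rough picture of what logging a write involves, the sketch below appends an update record and then brings the related bookkeeping up to date. Every type and name in it (SketchDb, SketchUpdateRec, logUpdate, the one-integer pages) is hypothetical and much simpler than the project's classes; it follows the textbook ARIES scheme, not the signature of WriteUpdateLog().

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified types; the project's own classes differ.
const long kNullLsn = -1;  // marks "no earlier record"

struct SketchUpdateRec {
    int xactionId;
    long prevLsn;  // previous record written by the same transaction
    int pageId;
    int oldValue;  // before-image, needed for undo
    int newValue;  // after-image, needed for redo
};

struct SketchDb {
    std::vector<SketchUpdateRec> log;  // LSN = index into this vector
    std::map<int, long> lastLsn;       // xaction table: xaction id -> last LSN
    std::map<int, long> recLsn;        // dirty page table: page id -> recLSN
    std::map<int, long> pageLsn;       // per page: LSN of its latest update
    std::map<int, int> pageValue;      // page contents (one int per page)
};

// Log a write, then apply it. Returns the LSN of the new update record.
long logUpdate(SketchDb& db, int xactionId, int pageId, int newValue) {
    SketchUpdateRec rec;
    rec.xactionId = xactionId;
    std::map<int, long>::const_iterator it = db.lastLsn.find(xactionId);
    rec.prevLsn = (it == db.lastLsn.end()) ? kNullLsn : it->second;
    rec.pageId = pageId;
    rec.oldValue = db.pageValue[pageId];  // an untouched page reads as 0
    rec.newValue = newValue;

    db.log.push_back(rec);
    long lsn = static_cast<long>(db.log.size()) - 1;

    db.lastLsn[xactionId] = lsn;                    // extend the undo chain
    db.recLsn.insert(std::make_pair(pageId, lsn));  // first dirtier sets recLSN
    db.pageValue[pageId] = newValue;                // apply the change
    db.pageLsn[pageId] = lsn;                       // stamp the page
    return lsn;
}
```

Note that the record is appended before the page is changed, and that recLSN is only set the first time a page becomes dirty, while pageLSN moves forward on every update.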
The code you will write belongs in these modules:
- logfunc.cpp - You should complete this function:
  - WriteUpdateLog() - Given the information about an update, generate an
  update log record, and update all affected information in the rest of the
  database.
- restart.cpp - You should complete these four main functions:
  - _restart_analysis() - This scans the log forward from the last checkpoint,
  building up information about the database at the time of the crash. Fill in
  the code executed when the record being looked at is an update record,
  updating the Recovery Xaction table and Recovery Dirty Page Table being
  rebuilt.
  - _redo_update() - This is the function for redoing a single action. Add the
  code for redoing an update (UPD) record, extracting the necessary
  information from the log record data and performing the update.
  - _restart_redo() - This is the second phase of recovery, and restores the
  database to its pre-crash state. You should implement the code that handles
  each update record, calling _redo_update() if the action needs to be
  retaken. You should consider the pageLSN stored with each page to determine
  the necessity of repeating the action, and update the Recovery Dirty Page
  Table where necessary.
  - restart_undo() - This third phase of recovery aborts all transactions
  active at the time of the crash, scanning the log backwards and undoing the
  actions of those transactions. Implement the code that handles the case for
  undoing an update record. You'll need to work with the Recovery Xaction
  table, and you should generate a CLR to be written to the log.
  Hint: This is essentially a rollback of the transaction, so look at
  the code that normally handles a rollback when a transaction aborts.
- You should also finish three small methods concerning the control of the
recovery process:
  - findRedoLsn() - identifying the earliest LSN from which the redo process
  should start.
  - keepPerformingUndo() - determining when to stop the undo process.
  - findNextUndoLsn() - identifying the next log record to be undone in the
  undo process.
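The decisions described in the list above can be sketched with two simple maps. This is an illustration of the textbook ARIES rules under stated assumptions, not the project's code: the table types, the kNoLsn marker, and the helpers analyzeUpdate(), needsRedo(), and afterUndo() are hypothetical, and the three control functions only borrow their names from the assignment, so their real signatures will differ.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>

// Hypothetical, simplified tables; not the project's RecDirtyPgTbl/RecXactTable.
const long kNoLsn = -1;                      // marks "no record"
typedef std::map<int, long> DirtyPageTable;  // page id -> recLSN
typedef std::map<int, long> LoserTable;      // xaction id -> next LSN to undo

// Analysis, update record: remember the transaction's latest record and the
// first LSN that might have dirtied the page.
void analyzeUpdate(LoserTable& xt, DirtyPageTable& dpt,
                   int xactionId, int pageId, long lsn) {
    assert(lsn != kNoLsn);
    xt[xactionId] = lsn;
    dpt.insert(std::make_pair(pageId, lsn));  // keeps an existing recLSN
}

// Redo starts at the smallest recLSN found in the dirty page table.
long findRedoLsn(const DirtyPageTable& dpt) {
    long redoLsn = kNoLsn;
    for (DirtyPageTable::const_iterator it = dpt.begin(); it != dpt.end(); ++it)
        if (redoLsn == kNoLsn || it->second < redoLsn) redoLsn = it->second;
    return redoLsn;
}

// An update is redone only if the page is in the table, the record is not
// older than the page's recLSN, and the page on disk has not seen it yet.
bool needsRedo(const DirtyPageTable& dpt, int pageId, long lsn, long pageLsnOnDisk) {
    DirtyPageTable::const_iterator it = dpt.find(pageId);
    if (it == dpt.end()) return false;
    if (lsn < it->second) return false;
    return pageLsnOnDisk < lsn;
}

// Undo continues while some loser transaction still has a record to undo.
bool keepPerformingUndo(const LoserTable& losers) {
    for (LoserTable::const_iterator it = losers.begin(); it != losers.end(); ++it)
        if (it->second != kNoLsn) return true;
    return false;
}

// Undo walks the log backwards, so pick the largest pending LSN of any loser.
long findNextUndoLsn(const LoserTable& losers) {
    long next = kNoLsn;
    for (LoserTable::const_iterator it = losers.begin(); it != losers.end(); ++it)
        next = std::max(next, it->second);
    return next;
}

// After an update is undone and its CLR written, the transaction's next
// record to undo is the prevLSN of the record just undone.
void afterUndo(LoserTable& losers, int xactionId, long prevLsn) {
    losers[xactionId] = prevLsn;
}
```

Choosing the largest pending LSN each time means the undo phase visits the log in a single backward sweep, even when several transactions were active at the crash.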
Submission procedure
Create a zip file that contains the following files, and upload the zip file
into the course management system by the deadline.
- logfunc.cpp
- restart.cpp
- documentation.doc
- The output, with debugging enabled, of your code when run on tests that
demonstrate the full range of functionality. The output from Tests 7, 8,
and 9 would be suitable. Be sure to include enough detailed output so that
the grader can see the steps taken by the recovery manager to restore the
database after a crash. The debugging code already written (mainly
PrintLogRec) and included in the code should be sufficient for this
purpose.
Keep a copy of the project in your own account just in case.
Grading
Your assignments will be graded according to the following criteria:
- Correctness (70%): You will get full marks for a correct implementation.
Partial credit will be given for partially correct answers.
- Documentation (20%): An explanation of your code, including any assumptions
made, and any deviations from the standard ARIES recovery scheme given in
the text. Include some comments on the design of the recovery manager,
including problems you saw and ways to improve the code. Call your
document documentation.doc.
- Coding Style (10%): You are expected to write neat code. Code should be
properly indented and commented. You should follow the coding conventions.
Hints
- If you want to print out something other than a log record (which would use
PrintLogRec()), use the function WriteLogOutput(char *). In some
files, it may have to be extern'ed before it can be used.
As an example:
extern void WriteLogOutput(char *);
...
WriteLogOutput( "now entering function WriteUpdateLog( )" );
...
char s[64];  // large enough for the message text plus the formatted LSN
sprintf( s, "the LSN of the record just written was %d", lsn.GetOffset( ) );
WriteLogOutput( s );
- But for more efficient debugging, use the VC++ debugger. Make sure you are in
the Debug version and not the Release version. Go under the menu Build
-> Set Active Configuration... and select Mars -> Win32 Debug.
Reference
The following pages provide more detailed explanations about the classes and
types that will be useful for this assignment.
Minor Bugs
- When you run Mars with a new test, it may not produce values for the
read commands (e.g., read 1 4; will display 'read returned 0'). This is
usually characterized by ALL the read commands returning 0. Close Mars and
run it again, choosing the test you want to load from the list of recent test
files in the Transactions menu (as in Word or Excel), and it should work. This
bug should not hamper your work in any way, but we're working to fix it.