CS522 - Assignment 2
Deadline extended to March 8, 2005, 10:00 am
You will each work independently. Feel free to discuss the text of the assignment itself, as well as the behavior and characteristics of the various predefined Matlab functions that you consider relevant. Do not show, make available, discuss, or otherwise share your programs, your algorithms, and/or your conclusions with anybody else.
Should the need arise, we will post clarifications, explanations, or corrections by updating this document from time to time. These changes will also be posted on the course web site. Make sure you keep up with these developments.
Important note: This assignment involves the handling of information from various financial databases. Due to legal restrictions on the use of this data, you are not permitted to distribute this information to any third parties. If you handle this data on a public or shared computer, you must take precautions to prevent individuals not enrolled in CS522 from accessing your files. You must delete your data files (but not your programs or graphs) as soon as you submit your completed homework.
You should think of this assignment as a small research project. You will be given real trading data, and the problems you will have to solve will not be fully specified. For example, we will not tell you precisely which conditions justify discarding certain bonds. You will have to make informed decisions, which you will justify in your writeup. The course staff will be available to answer questions and offer guidance.
You will probably spend a lot of time becoming acquainted with the data and experimenting with it. Do not underestimate the importance of this step; you will confront similar issues in every real finance project you take on in the future. The success of your data clean-up will not be measured by the number of bad data items you find, but by the thoroughness and care with which you test the data. You might be lucky and get a good data set, but how would you know if you did not look into it?
All materials must be submitted by the deadline. You can change your submission any time before the deadline; however, we will only consider the last version when grading. If at all possible, do not wait until the very last minute to submit; despite our best efforts, CMS might get overloaded and you might not be able to complete your submission. Unless extraordinary circumstances warrant it, we will not accept late submissions. If all else fails (but then, and only then), send email to Radu and attach your work. Make sure you send the email in time, so that he receives it before the deadline.
A. Files
We made available Treasury trading data for all business days between June 1 and June 9, 1999. The files listed below are all accessible through a restricted link on the course web page.
- File "crsp data filed names.pdf" describes the meaning and the interpretation of the field names in the CRSP data files.
- File "govpx.txt" provides the field names (but very little detail) for the GovPX data file.
- File "crsp master.csv" contains end-of-month information on the Treasuries outstanding at the respective moment in time. We bracketed the first nine days of June by providing you with the information at the end of May 1999 and at the end of June 1999. In principle, the time-independent characteristics of a Treasury outstanding on both of these dates should not change; for example, the maturity date of the instrument with CUSIP 9128272V is listed as July 8, 1999 in both May and June 1999. Time-dependent information, such as the date of the quote itself, can (and in general will) change. Note that this file contains both a CRSP id and a CUSIP id for each Treasury; each of these identifiers uniquely identifies the respective instrument.
- File "crsp-cross-sectional.csv" contains price quotes for each Treasury outstanding on the first nine days of June, 1999.
- File "govpx.csv" contains transaction records for the first nine days of June, 1999. We use the term "transaction" in the sense of "significant event," not in the sense of "trade." A significant event can be, for example, a new bid or ask price, or a step in a workup process. This file is almost 62 MB in size, so you will need to be careful when handling it: a badly written algorithm might make processing this file excessively slow, or even impossible on a computer with limited resources.
Important: Remember the price quoting conventions that we have discussed: bills are quoted on a discount basis, while the prices of notes and bonds are all "clean" (or "naked"), i.e., they exclude accrued interest. You will have to convert the quotes of the respective Treasuries to total (full) dollar prices per $100 face value.
You should neither extract nor use "derived" information from these files. Prices and bond characteristics (CUSIP, maturity date, dated date, coupon, and a few others) are all you need. Yields can always be inferred from prices, so you should not use the yields given in the database files: these have also been derived from prices, and you do not know the precise definition of the yield formula that was used anyway.
You will note that the format of the same type of information (e.g. the format of dates) varies even within the same file. For example, field qdate (the day the quote was recorded) in the CRSP master file represents a date in the format yyyymmdd, while the same file represents the value of field matdt (maturity date) in the format yymmdd. Also, note that many fields are empty (there is no value, and in fact not even a space, between two successive commas). You will have to identify and accommodate such situations by studying the data files and the available documentation.
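One way to act on the remark that time-independent characteristics should not change is to compare the May and June snapshots of the master file for every instrument present in both. The sketch below is written in Python only to illustrate the logic (the assignment asks for Matlab for non-trivial computations). It assumes each snapshot has already been loaded into a dictionary keyed by an identifier such as the CRSP id; the field name "coupon" is a hypothetical placeholder, and only matdt comes from the text.

```python
def inconsistent_static_fields(may, june, fields=("matdt", "coupon")):
    """Compare two master-file snapshots.

    may, june: dicts mapping an instrument identifier to a dict of field -> value.
    fields: names of fields expected to stay constant over time ("coupon" is a
            hypothetical placeholder for the actual CRSP coupon field name).
    Returns {identifier: [fields whose values differ]} for instruments in both.
    """
    problems = {}
    for ident in sorted(may.keys() & june.keys()):
        diffs = [f for f in fields if may[ident].get(f) != june[ident].get(f)]
        if diffs:
            problems[ident] = diffs
    return problems
```

Instruments flagged this way are candidates for closer inspection rather than automatic removal; which field value to trust is a decision to justify in the writeup.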
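The two conventions can be written out as follows. For a bill quoted at a bank discount rate d with n days to maturity, the dollar price per $100 face is 100(1 - d n/360); for a note or bond, the full price is the quoted clean price plus accrued interest, which for semiannual Treasuries accrues on an actual/actual basis within the current coupon period. The Python sketch below is illustrative only; it assumes the quote date is used as the settlement date and that the previous and next coupon dates are already known, both of which you should check against your own treatment of the data.

```python
from datetime import date

def bill_price(discount_rate, settle, maturity):
    """Price per $100 face of a T-bill quoted on a bank-discount basis (actual/360).

    discount_rate is a decimal, e.g. 0.045 for a 4.50% discount quote.
    """
    days = (maturity - settle).days
    return 100.0 * (1.0 - discount_rate * days / 360.0)

def accrued_interest(coupon_rate, settle, last_coupon, next_coupon):
    """Accrued interest per $100 face for a semiannual note or bond.

    coupon_rate is the annual coupon as a decimal; interest accrues on an
    actual/actual basis within the current coupon period.
    """
    period_days = (next_coupon - last_coupon).days
    accrued_days = (settle - last_coupon).days
    return 100.0 * coupon_rate / 2.0 * accrued_days / period_days

def full_price(clean_price, coupon_rate, settle, last_coupon, next_coupon):
    """Full ("dirty") price per $100 face = quoted clean price + accrued interest."""
    return clean_price + accrued_interest(coupon_rate, settle, last_coupon, next_coupon)
```

For example, a 4.50% discount quote on a bill with 91 days to maturity (June 1 to August 31, 1999) corresponds to 100(1 - 0.045 x 91/360) = 98.8625 per $100 face.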
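A minimal parsing sketch for the two date formats and the empty fields mentioned above, written in Python for illustration. It assumes fields are separated by plain commas with no quoted commas inside fields, and it maps a two-digit year below 50 to 20yy and otherwise to 19yy. That pivot is an assumption; it works for the maturity dates of Treasuries outstanding in 1999 (all of which fall before 2030), but it should be checked against the data for any other field.

```python
from datetime import date

def parse_yyyymmdd(s):
    """Parse an 8-digit date such as '19990601'; return None for an empty field."""
    s = s.strip()
    if not s:
        return None
    return date(int(s[0:4]), int(s[4:6]), int(s[6:8]))

def parse_yymmdd(s, pivot=50):
    """Parse a 6-digit date such as '990708'; return None for an empty field.

    Two-digit years below `pivot` are read as 20yy, the rest as 19yy
    (an assumed convention, to be checked against the data).
    """
    s = s.strip()
    if not s:
        return None
    yy = int(s[0:2])
    year = 2000 + yy if yy < pivot else 1900 + yy
    return date(year, int(s[2:4]), int(s[4:6]))

def split_csv_line(line):
    """Split a simple comma-separated line, keeping empty fields as ''."""
    return line.rstrip("\r\n").split(",")
```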
B. Tools You Can Use
You can solve all the problems below relying exclusively on Matlab, and we ask that you program all non-trivial computations (e.g. the determination of the forward rates) in Matlab. If, however, you find that it would be simpler or faster for you to use other programs (e.g. awk/gawk or Perl) to pre-process your text files, for example to eliminate Treasuries with negative prices, feel free to do so. The tools that you use should be widely available and have publicly accessible documentation. Restrict your use of non-Matlab tools to pre-processing your data and/or generating graphs that illustrate various aspects of the input data.
C. What to Submit
You should submit a single pdf document that contains the written answers to all the questions we pose below. Your answer should include, if applicable, a description of the decisions that you had to make (e.g. the characteristics you used to identify and eliminate bad bonds), a description of the results (including graphs), an explanation of how you approached and solved the problem (i.e. describe your program), and any other comments you find relevant. Your answers must be complete, unambiguous, and as concise as possible.
In addition to the pdf file, you will submit one zipped file containing all the source files that you used to develop your programs. In your writeup you should indicate clearly which functions have been used to produce a certain result, and how these functions interact. Be precise and concise in your explanations. Your writeup should provide enough information for us to be able to reproduce your results.
D. Matlab Functions You Might Find Useful
You are free to use any predefined Matlab functions to answer our questions, except for functions that directly compute the forward rate curve. You might wish to read the available documentation on functions like textread, fopen, fclose, fgets, fprintf, str2num, cfdates, datenum, numdate, yearfrac, optimset, optimget, lsqnonlin, lsqcurvefit, fmincon, fminunc, warning, error. You do not have to use any of these functions; they are listed only as possible starting points for your exploration of the Matlab documentation. You might also find it useful to become familiar with the Matlab debugger before you start working.
Depending on the version of Matlab you are using, and depending on whether you have certain toolboxes installed or not, you might not have access to some functions that we suggested above. You can always re-implement their functionality, in case of simple functions like yearfrac, but you certainly do not want to implement your own non-linear optimization algorithm (unless you happen to have done extensive previous research on the topic). Remember that you can always get an account in the CSUGLAB.
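If yearfrac is unavailable, a replacement for the year fractions you need is short. The Python sketch below shows two common conventions, actual/365 fixed and actual/actual (ISDA), purely as an illustration; it does not claim to reproduce Matlab's yearfrac for any particular basis, so decide which convention you want and document it in your writeup.

```python
import calendar
from datetime import date

def year_frac_act365(start, end):
    """Actual/365 fixed: calendar days between the dates divided by 365."""
    return (end - start).days / 365.0

def year_frac_act_act_isda(start, end):
    """Actual/actual (ISDA): days falling in each calendar year divided by that year's length."""
    if end < start:
        return -year_frac_act_act_isda(end, start)
    total, cur = 0.0, start
    while cur.year < end.year:
        jan1 = date(cur.year + 1, 1, 1)
        total += (jan1 - cur).days / (366.0 if calendar.isleap(cur.year) else 365.0)
        cur = jan1
    total += (end - cur).days / (366.0 if calendar.isleap(end.year) else 365.0)
    return total
```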
E. Questions
When addressing the questions below, try to think like a researcher. When the problem has not been defined completely, make assumptions and choose criteria for evaluating and interpreting results that you can defend based on your knowledge. Because of the multitude of choices you face, there is no unique solution. This, however, does not imply that all solutions are acceptable. For example, a solution generating forward rate curves that price all bonds with an absolute error of less than one cent per $100 face value is clearly acceptable, while a solution that prices all bonds with errors of, say, $5 per $100 face value is clearly bad. State and defend your decisions and choices in your writeup!
Whenever possible, create your graphs using programs. When answering question (5) below, for example, you should write a program that could generate an analogous plot for any Treasury in the relevant set, given its respective CUSIP. The less automated the process that generates your graphs, the less credit you will get. Minor adjustments done manually are acceptable.
Unless the nature of the underlying graph justifies it, plot points, not continuous curves. For example, in question (2) below you should plot the yield of the individual Treasuries, not a continuous yield curve. When plotting forward rate curves, or yield curves inferred from forward rate curves, however, you should represent a continuous curve or surface, as the case may be.
(1) Using the CRSP master file and the cross-sectional file, examine the bid and ask prices for each of the first nine days of June 1999. Eliminate all bonds that have special features (are callable, are "flower bonds" or "when issued," are STRIPS, are TIPS, and so on). Eliminate all quotes that contain obviously bad prices. For example, a bond price exceeding $1000 per $100 face value would probably indicate a bad price. Do not forget to explain and justify your "badness" criteria.
After you have cleaned up the data, create summary tables for June 1, 1999, showing the initial number of Treasuries for which CRSP provides quotes (treat bid and ask prices separately), the number of Treasuries retained after clean-up, and the number of Treasuries eliminated under each of the criteria you have established (e.g. "3 bonds have been eliminated because they were callable"). Break down your data into the following intervals of leftover maturity: [0, 1], (1, 3], (3, 10], and (10, 30] years.
(2) Using the cleaned-up data from (1) above, plot the continuously compounded yield (as computed by you, not as taken from the database) for all acceptable instruments on June 1, 1999. Create two graphs: one in which the horizontal axis is not to scale, and the i-th unit on the axis corresponds to the i-th Treasury in order of increasing leftover maturity; and a second one in which the time axis is to scale (proportional to leftover maturity). You can find a pair of such graphs in figures (1.1) and (1.2) in the lecture notes on Treasuries. Show both the bid yield and the ask yield for each Treasury on each of the two graphs.
(3) Assume that you hold a portfolio consisting of one unit of each Treasury included in the set obtained in (1) above for June 1, 1999. This portfolio will have coupon payments at various future moments in time. Create a plot showing the total number (not the value) of coupon payments due on each date in the future. For example, if i Treasuries from your portfolio will have a coupon payment y years from today, then the value shown on the graph for time y should be i.
(4) Using the GovPX data we have provided, use the set of transactions (records) and the CRSP master file to eliminate records that refer to Treasuries with special features. Also, eliminate records that are incorrect or inconsistent with the previous trading history (but do not eliminate the Treasuries to which these records refer). Keep in mind that late in the day GovPX typically adds records that refer to transactions earlier in the day but were somehow missed at the time. Ignore bid and ask prices, and focus on establishing when trades occur; retain only information on these trades. When computing forward rate curves, use the last intra-day trade price for each Treasury. Carefully explain the criteria you employ for cleaning up your data. Summarize your results for June 1, 1999 using tables analogous to those in (1).
(5) Using the cleaned-up data from (4), create a graph (or a series of graphs) analogous to figure (1.5) in the lecture notes on Treasuries, describing the trading history of the bond with CUSIP 912795CA7 on June 1, 1999. Note that figure (1.5) does not show values for the aggregate trading volume (the continuous red line); you should, and for this you will likely need to represent the aggregate trading volume on a separate graph.
(6) Using the data in (1) above, compute the forward rate curves based on the Svensson model for each business day between June 1 and June 9, 1999. Explain which options you set for the nonlinear regression, and explain your implementation in detail. Summarize your results (e.g. show the maximum and average pricing error for each day, the values of the Svensson parameters, etc.).
Plot (a) the resulting forward rate; (b) the resulting yield curve; (c) the resulting discount factor curve for June 1, 1999. Plot the analogous surfaces for all business days between June 1 and June 9, 1999.
(7) Solve the problem analogous to (6) above using the data obtained in (4).
(8) Solve the problem analogous to (6) above, but compute the smoothest forward rate curve. When discussing the parameter choices you made, do not forget to address the choice of knot points.
(9) Solve the problem analogous to (8) above, but use the data obtained in (4).
(10) Comment on the results obtained in (6), (7), (8), and (9) above.
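For question (1), the summary tables amount to classifying each quote into the leftover-maturity intervals given in the question and counting what was kept and what was dropped. A Python sketch, for illustration only: the price test (missing, non-positive, or above $1000 per $100 face) merely encodes the example in the question and is not a complete set of badness criteria, and leftover maturity is assumed to be already expressed in years.

```python
from collections import Counter

def maturity_bucket(years):
    """Map leftover maturity in years to one of [0, 1], (1, 3], (3, 10], (10, 30]."""
    if years < 0 or years > 30:
        raise ValueError(f"leftover maturity out of range: {years}")
    if years <= 1:
        return "[0, 1]"
    if years <= 3:
        return "(1, 3]"
    if years <= 10:
        return "(3, 10]"
    return "(10, 30]"

def bad_price(price, upper=1000.0):
    """Illustrative test only: missing, non-positive, or implausibly large price per $100 face."""
    return price is None or price <= 0 or price > upper

def summarize(quotes):
    """quotes: iterable of (leftover_years, price) for one side (bid or ask).

    Returns two Counters keyed by maturity interval: quotes kept and quotes dropped.
    """
    kept, dropped = Counter(), Counter()
    for years, price in quotes:
        bucket = maturity_bucket(years)
        if bad_price(price):
            dropped[bucket] += 1
        else:
            kept[bucket] += 1
    return kept, dropped
```

Running it once on bid prices and once on ask prices keeps the two sides separate, as question (1) requests; a real version would keep one counter per elimination criterion.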
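For question (2), the continuously compounded yield y of a Treasury solves P = sum over i of c_i exp(-y t_i), where P is the full price per $100 face and c_i is the cash flow paid t_i years after the quote date. Because the right-hand side decreases monotonically in y when all cash flows are positive, bisection is a simple and robust root finder. The Python sketch below is illustrative; the starting bracket [-0.5, 1.0] is an assumption, and the year fractions t_i depend on the day-count convention you choose.

```python
import math

def cc_yield(full_price, times, cashflows, lo=-0.5, hi=1.0, tol=1e-12):
    """Continuously compounded yield by bisection.

    full_price: full (dirty) price per $100 face.
    times, cashflows: payment times in years and the amounts paid at those times.
    """
    def pv(y):
        return sum(c * math.exp(-y * t) for t, c in zip(times, cashflows))

    f_lo, f_hi = pv(lo) - full_price, pv(hi) - full_price
    if f_lo * f_hi > 0:
        raise ValueError("yield not bracketed by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = pv(mid) - full_price
        if f_mid == 0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:      # root lies in [lo, mid]
            hi, f_hi = mid, f_mid
        else:                     # root lies in [mid, hi]
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

Applying it to the bid and to the ask full price of each retained Treasury gives the two yield series to plot.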
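For question (3), one approach is to generate each security's future coupon dates by stepping back from its maturity date in six-month steps, then count, for every date, how many holdings pay a coupon. The Python sketch below assumes semiannual coupons and clamps the day of month when it does not exist (e.g., August 31 minus six months gives February 28). It does not implement the end-of-month rule, under which a security maturing on the last day of February typically pays its other coupon on August 31, and it should only be applied to coupon-bearing notes and bonds, not to bills.

```python
import calendar
from collections import Counter
from datetime import date

def add_months(d, months):
    """Shift d by a whole number of months, clamping the day to the month's length."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def future_coupon_dates(settle, maturity):
    """Semiannual coupon dates strictly after settle, generated backwards from maturity."""
    dates, k = [], 0
    while True:
        d = add_months(maturity, -6 * k)
        if d <= settle:
            break
        dates.append(d)
        k += 1
    return sorted(dates)

def coupon_counts(settle, maturities):
    """Number of holdings (one unit each) paying a coupon on each future date."""
    counts = Counter()
    for m in maturities:
        counts.update(future_coupon_dates(settle, m))
    return counts
```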
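For question (4), once records have been cleaned and classified as trades, the last intra-day trade price can be found in a single pass, which also keeps memory use low for a file the size of govpx.csv. The point of the sketch is that "last" must be decided by the trade time recorded in each record, not by position in the file, because GovPX appends late records for earlier transactions. The Python below is illustrative; the (cusip, trade_time, price) tuple layout is a hypothetical representation rather than the actual GovPX field layout, and times are assumed to be comparable numbers such as seconds after midnight.

```python
def last_trade_prices(trades):
    """trades: iterable of (cusip, trade_time, price) for records already classified as trades.

    Returns {cusip: price of the latest trade by trade_time}. File order is ignored,
    so a record appended late in the day for an earlier trade cannot override a later
    trade. On equal trade times, the record that appears later in the input wins.
    """
    latest = {}
    for cusip, t, price in trades:
        if cusip not in latest or t >= latest[cusip][0]:
            latest[cusip] = (t, price)
    return {cusip: price for cusip, (t, price) in latest.items()}
```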
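For question (6), the Svensson model specifies the instantaneous forward rate at maturity m as f(m) = b0 + b1 exp(-m/tau1) + b2 (m/tau1) exp(-m/tau1) + b3 (m/tau2) exp(-m/tau2). Integrating gives the continuously compounded yield y(m) = (1/m) times the integral of f from 0 to m, and the discount factor d(m) = exp(-m y(m)); a bond's model price is the sum of its cash flows times the corresponding discount factors. The differences between model and observed full prices are the residuals that a routine such as lsqnonlin would minimize over (b0, b1, b2, b3, tau1, tau2). The Python sketch below only evaluates these curves and residuals; the function names are illustrative, and the fitting step, its starting values, and its bounds are left to your Matlab implementation.

```python
import math

def _g(x):
    """(1 - exp(-x)) / x, with its limiting value 1 at x = 0."""
    return 1.0 if x == 0 else -math.expm1(-x) / x

def svensson_forward(m, b0, b1, b2, b3, tau1, tau2):
    """Instantaneous forward rate at maturity m (years), continuously compounded."""
    x1, x2 = m / tau1, m / tau2
    return (b0 + b1 * math.exp(-x1) + b2 * x1 * math.exp(-x1)
            + b3 * x2 * math.exp(-x2))

def svensson_yield(m, b0, b1, b2, b3, tau1, tau2):
    """Zero-coupon yield y(m): the average of the forward rate over [0, m]."""
    x1, x2 = m / tau1, m / tau2
    g1, g2 = _g(x1), _g(x2)
    return (b0 + b1 * g1 + b2 * (g1 - math.exp(-x1))
            + b3 * (g2 - math.exp(-x2)))

def svensson_discount(m, *params):
    """Discount factor d(m) = exp(-m * y(m))."""
    return math.exp(-m * svensson_yield(m, *params))

def model_price(times, cashflows, params):
    """Model full price: cash flows discounted with the Svensson curve."""
    return sum(c * svensson_discount(t, *params) for t, c in zip(times, cashflows))

def pricing_residuals(bonds, params):
    """bonds: list of (times, cashflows, observed_full_price). Returns model minus observed."""
    return [model_price(t, c, params) - p for t, c, p in bonds]
```

The maximum and average absolute values of these residuals, in dollars per $100 face, are one natural way to report the daily pricing errors requested in (6).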