In class we have discussed protocols for logging the messages that are sent between processors. In the event of a processor failure, the messages in these logs can be "replayed" to reconstruct the state of the recovering processors. One point that we did not discuss much in class is how large these message logs can grow in real applications.
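To make the replay idea concrete, here is a minimal sketch in Python. It is not one of the protocols from class. It assumes a toy process whose state depends only on the messages delivered to it, and it assumes the log survives the failure. The names `Process`, `deliver`, `recover`, and `log_size_bytes` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Toy process: its state is a deterministic function of delivered messages."""
    state: int = 0
    log: list = field(default_factory=list)  # assumed to survive a failure

    def deliver(self, payload: bytes, record: bool = True) -> None:
        # Log the message before applying it, so it can be replayed later.
        if record:
            self.log.append(payload)
        # Hypothetical deterministic state update driven by the message.
        self.state = (self.state * 31 + sum(payload)) % 1_000_003


def recover(log: list) -> Process:
    """Rebuild a failed process by replaying its logged messages in order."""
    fresh = Process()
    for payload in log:
        fresh.deliver(payload, record=False)
    return fresh


def log_size_bytes(log: list) -> int:
    """Total payload bytes held in the log (the quantity this project cares about)."""
    return sum(len(payload) for payload in log)
```

Because every delivered payload is retained, the log grows with the total message volume, which is the quantity of interest here.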
Preliminary results suggest that they can grow quite large. The table above shows the volume of messages that individual processors send when running some of the Class B NAS Parallel Benchmarks on 16 processors of the CTC Velocity cluster. These results suggest that message logging may require too much memory to be practical, unless techniques can be developed to reduce the size of the message logs.
We have already investigated two approaches: compression and reversible computation. We found that the "gzip" compression program reduced the message logs by only a factor of 2, whereas we think we need to reduce them by at least a factor of 10. In the case of reversible computation, we found that the size of the state log needed to reverse the computation was at least as big as, and usually bigger than, the original message logs. We will be happy to share these results with you.
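As a starting point for evaluating a candidate technique, the sketch below measures a reduction factor in the sense used above (original size divided by reduced size). It uses Python's standard `gzip` module as the reducer. It is not the harness we used for our own experiments, and the helper names `reduction_factor` and `meets_target` are hypothetical. The default target of 10 is the factor stated above.

```python
import gzip


def reduction_factor(messages: list) -> float:
    """Return original_size / compressed_size for a log of byte payloads."""
    raw = b"".join(messages)
    if not raw:
        raise ValueError("message log is empty")
    compressed = gzip.compress(raw)
    return len(raw) / len(compressed)


def meets_target(factor: float, target: float = 10.0) -> bool:
    """Check a measured factor against the desired reduction (assumed default: 10)."""
    return factor >= target
```

For example, `meets_target(reduction_factor(log))` can be evaluated on a log captured from a benchmark run. Another technique can be compared by replacing `gzip.compress` with its own size-reducing function.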
The primary goal of this project is to improve upon our previous attempts or to develop a novel technique for reducing message log sizes. As a secondary goal, we would like you to show the effects of your techniques on some benchmark applications.