Professor Danny Dolev is teaching "Self-Stabilization and Fault-Tolerance of Distributed Protocols," a graduate-level course at Cornell University this spring. A professor in the School of Engineering and Computer Science at the Hebrew University of Jerusalem since 1982, Dolev is completing a six-month sabbatical at Cornell, during which he expects to devote significant attention to fault tolerance research.
"My work is at the intersection of the work done by several of Cornell's professors, namely Ken Birman, Joe Halpern, and Fred Schneider," said Dolev. "Our research interests overlap and expand on each other's work, adding new angles and different dimensions. There are fault tolerance experts elsewhere, but I enjoy the collaborative spirit that I find in Cornell's computer science department."
"Danny is a long-time friend of the department," said Charlie Van Loan, chair of computer science. "His research complements the work done by several of our faculty members, and we all look forward to learning more about his exciting work."
Dolev, who earned his Ph.D. from the Weizmann Institute of Science in Israel, also serves as a consultant to various companies, including IBM, where he was part of the team that designed the protocols for the air traffic control system used in the United States. He has more than 100 journal articles and refereed conference papers to his credit, as well as a variety of dissertations, book chapters, and patents. Dolev's research spans distributed algorithms, computer networks, reliability of distributed systems, parallel processing, synchronization primitives, and protocols and security, as well as the resilience of distributed systems to faults and the development of an absolute taxonomy for distributed algorithms.
Dolev's interest in fault tolerance was sparked in the early 1980s at Stanford, where he was a postdoc and participated (at SRI) in one of the world's first fault tolerance projects to address the worst-case scenario. Sponsored by NASA, the project considered the consequences and options when machinery in space fails to function.
"What if all the memory is erased?" Dolev said. "What can be done to recover and stabilize the system so that it will begin functioning again? Twenty years ago, we were dependent on electricity. If you went to the bank and the electricity was out, you couldn't complete a transaction. Today, we are dependent on computers. If you go to the bank, and the computers are down, you leave empty-handed. Computers are not that reliable yet, and hackers are continually trying to control networks. Because our dependence on computers extends to many necessary functions, including driving cars and flying airplanes, it's critical that we understand how to improve their reliability and robustness."