The Byzantine Generals Problem
L. Lamport, R. Shostak and M. Pease. ACM Transactions on
Programming Languages and Systems, 4(3):382-401, July 1982
Notes by Indranil Gupta, March 08, 1999.
Adapted from original notes by Xun Wilson Huang and original notes
by Lawrence Kesteloot.
Thanks to: Xun Wilson Huang, Lawrence Kesteloot.
Problem Statement: Byzantine Generals Problem (BGP)
The setting: There are n generals, one of them the
commanding general. Each general can send messages to, and
receive messages from, the other generals.
The problem: Develop a communication protocol for the
commanding general to send an order to the n-1 lieutenant
generals so that
- All loyal lieutenants obey the same order
- If the commanding general is loyal, every loyal
lieutenant obeys the order he sends.
Adversary: Any of the generals, the commander included, could
be a traitor, i.e., could send inconsistent messages regarding
the order to other generals.
Impossibility Results
- For n = 3 generals and 1 traitor, there is no solution
(protocol). This is because a loyal lieutenant cannot
tell who the traitor is when he gets conflicting
information from the commander and the other lieutenant.
Let's call this the 3-Generals Problem.
- A solution to BGP for n < 3m+1 generals and m traitors
would yield a solution to the 3-Generals Problem: each of
the three generals simulates at most m of the n Byzantine
generals and takes the same decision as the loyal
lieutenants he simulates. Since the 3-Generals Problem has
no solution, BGP for n < 3m+1 and m traitors is not
solvable.
- Reaching approximate agreement is as hard as reaching exact
agreement.
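The n >= 3m+1 bound from the second result above can be turned into simple arithmetic. The following is a minimal sketch; the function names are illustrative and do not come from the paper.

```python
def max_traitors_oral(n: int) -> int:
    """Largest m such that n >= 3m + 1, i.e. the most traitors that
    n generals can tolerate when only oral messages are available."""
    return max((n - 1) // 3, 0)


def min_generals_oral(m: int) -> int:
    """Smallest number of generals that can tolerate m traitors
    with oral messages."""
    return 3 * m + 1


if __name__ == "__main__":
    for n in (3, 4, 6, 7, 10):
        print(n, "generals tolerate", max_traitors_oral(n), "traitor(s)")
```

For example, 3 generals tolerate no traitor at all, 4 to 6 generals tolerate one, and 7 generals are needed before two traitors can be tolerated.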
I. A solution with oral messages for n > 3m
A solution for BGP with n > 3m and up to m traitors is
given.
Oral message system properties:
- A1. Every message that is sent is delivered
correctly. -> No message loss.
- A2. The receiver of a message knows who sent it.
-> Completely connected network with reliable
links (due to A1).
- A3. The absence of a message can be detected. ->
Synchronous system only.
Every general can send a message to every other general.
Solution in brief:
- uses a function majority which takes in a
set of values and returns the value held by the majority
of them (a possible implementation for ordered values: the
median of the values).
- uses 'rounds' in each of which a general
broadcasts the value he received in the earlier round
to all the other generals through whom the value has not
already passed.
- when the rounds are over, for each j, any two loyal
lieutenants hold the same value vj for lieutenant j, so
they hold the same vector of values {v1, ... v(n-1)}.
Every loyal lieutenant then applies the majority function
to this common vector, so all loyal lieutenants obtain the
same result and obey the same order. If the commander is
loyal, the loyal lieutenants' values form a majority of the
vector and all equal his order, so the result is the order
he sent. This ensures that BGP is solved.
Note: If the commander
is not a traitor, we can be done in 2 rounds. If the commander is
a traitor, you may need up to m+1 rounds.
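The recursive structure described above can be sketched as a small single-process simulation. This is an illustration under stated assumptions, not the paper's own code: generals are numbered 0..n-1 with 0 as the commander, the default order "retreat" is used when no strict majority exists, and the traitor behaviour in send (tell even-numbered generals "attack" and odd-numbered ones "retreat") is a made-up rule chosen only to make traitors inconsistent.

```python
from collections import Counter

DEFAULT = "retreat"  # assumed fallback when no strict majority exists


def majority(values):
    """Return the strict-majority value, or DEFAULT if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else DEFAULT


def send(sender, receiver, value, traitors):
    """What `receiver` hears when `sender` passes on `value`.

    Hypothetical traitor rule: ignore `value` and tell even-numbered
    generals "attack" and odd-numbered generals "retreat".
    """
    if sender in traitors:
        return "attack" if receiver % 2 == 0 else "retreat"
    return value


def om(m, commander, lieutenants, value, traitors):
    """Simulate OM(m); return {lieutenant: value that lieutenant adopts}."""
    # Step 1: the commander sends his value to every lieutenant.
    received = {i: send(commander, i, value, traitors) for i in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j acts as commander in OM(m-1) and passes
    # the value he received on to the other lieutenants.
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    # Step 3: each lieutenant takes the majority of his own value and
    # the values obtained for every other lieutenant.
    decisions = {}
    for i in lieutenants:
        values = [received[i]]
        values += [relayed[j][i] for j in lieutenants if j != i]
        decisions[i] = majority(values)
    return decisions


if __name__ == "__main__":
    print(om(1, 0, [1, 2, 3], "attack", traitors={3}))  # loyal commander
    print(om(1, 0, [1, 2, 3], "attack", traitors={0}))  # traitor commander
```

With n = 4 and m = 1, the loyal lieutenants 1 and 2 adopt "attack" when lieutenant 3 is the traitor, and all three lieutenants agree on "retreat" when the commander is the traitor. Running the same sketch with n = 3 and a traitorous lieutenant 2 leaves loyal lieutenant 1 with "retreat" even though the loyal commander ordered "attack", which illustrates why n > 3m is needed. Entries returned for traitorous lieutenants are meaningless and should be ignored.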
II. A solution with (unforgeable) signed messages
The difficulty of BGP is in the ability of a traitor
lieutenant to lie about the commander's order. If we can restrict
this ability by making the following assumptions, BGP is solvable
with any number of traitors as long as their maximum number
is known.
Signed messages:
- A4. In addition to the 3 assumptions made in the solution
with oral messages, we add the following assumption.
- A loyal general's signature cannot be forged, and any
alteration of his signed message can be detected. -> a
traitor can drop a message, but can't change it
- Anyone can verify the authenticity of a
signature. -> no one can fool a general
Again, assume a fully connected message graph among the
generals.
Solution in brief:
Uses a majority-like function called choice.
The solution:
- the commander sends a signed order to every
lieutenant
- if a lieutenant receives an order from
someone (either from the commander directly, or relayed by
other lieutenants), he verifies the signatures and then puts
the order in a set V if it's not already there. If the order
was new and carries fewer than m lieutenant signatures (not
counting the commander's), he adds his own signature and
relays it to every lieutenant who has not yet signed it.
- Everyone halts once round m+1 is over, and uses
choice(V) as the desired action
The aim of the algorithm is for all loyal lieutenants to end up
with the same set V, so that choice(V) is the same for all of
them. If the commander is loyal, the algorithm works because all
loyal lieutenants have the correct order by round 1 and, by
unforgeability, no other order carrying his signature can be
produced. If the commander is not loyal, he is one of the m
traitors, so at most m-1 lieutenants are traitors. An order that
reaches a loyal lieutenant before round m+1 is relayed by him to
all others. An order that arrives in round m+1 carries m
lieutenant signatures, at least one of which belongs to a loyal
lieutenant, who already sent it to all others. In short, if
one loyal lieutenant gets an order, all loyal lieutenants
get it by the end of round m+1.
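The signed-message protocol can be sketched as a round-by-round simulation. This is an illustration under stated assumptions rather than the paper's own formulation: a message is modelled as (receiver, order, chain), where chain is the tuple of signers starting with the commander 0; unforgeability is modelled simply by never letting a traitor fabricate a chain; choice returns the single order in V and otherwise the assumed default "retreat"; and the traitor behaviour (a traitorous commander sends "attack" to even-numbered and "retreat" to odd-numbered lieutenants, a traitorous lieutenant stays silent) is hypothetical. The sketch assumes m >= 1.

```python
def choice(orders):
    """Deterministic choice: the only order in the set, else the default."""
    return next(iter(orders)) if len(orders) == 1 else "retreat"


def sm(m, n, order, traitors):
    """Simulate the signed-message protocol for n generals, general 0
    being the commander. Returns {loyal lieutenant: chosen order}."""
    lieutenants = range(1, n)
    V = {i: set() for i in lieutenants}
    # Round 1: the commander signs an order and sends it to everyone.
    if 0 in traitors:
        # Hypothetical traitor commander: conflicting signed orders.
        inbox = [(i, "attack" if i % 2 == 0 else "retreat", (0,))
                 for i in lieutenants]
    else:
        inbox = [(i, order, (0,)) for i in lieutenants]
    for _ in range(m + 1):  # messages carry at most m+1 signatures
        outbox = []
        for receiver, value, chain in inbox:
            if receiver in traitors:
                continue  # hypothetical traitor lieutenant: drops messages
            if value in V[receiver]:
                continue  # order already known, nothing to do
            V[receiver].add(value)
            if len(chain) - 1 < m:  # fewer than m lieutenant signatures
                signed = chain + (receiver,)
                outbox += [(j, value, signed)
                           for j in lieutenants if j not in signed]
        inbox = outbox
    return {i: choice(V[i]) for i in lieutenants if i not in traitors}


if __name__ == "__main__":
    print(sm(1, 3, "attack", traitors={0}))  # traitor commander, n = 3
    print(sm(1, 3, "attack", traitors={2}))  # traitor lieutenant, n = 3
```

With n = 3 and m = 1, a case that has no oral-message solution, both loyal lieutenants end up with V = {"attack", "retreat"} when the commander is the traitor and therefore make the same choice, while a loyal commander's order "attack" is the only order the loyal lieutenant ever sees.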
III. Relaxing the assumption on full connectivity of the
generals graph - extending above solutions
The previous 2 solutions can be extended to relax the
assumption that the message graph among the generals is fully
connected.
- Oral messages: The solution with oral messages is
extended to solve BGP with up to m traitors in a p-regular
graph with m>0 and p>3m-1.
- Unforgeable messages: The earlier solution with
signed messages solves BGP with up to m traitors in
(m+d-1) rounds, where d is the diameter of the subgraph
of loyal generals. Assumption here: the subgraph of loyal
generals is connected (this can be relaxed by relaxing
the problem statement of BGP)
Practical use of BGP in building real life systems
The usual way to provide fault-tolerant decision-making in
redundant systems is majority voting. A faulty input device
may generate meaningless inputs, but the non-faulty processors
should then at least all use the same (meaningless) value.
For majority voting to yield a reliable system, the following
2 conditions must be satisfied
- All non-faulty processors must use the same input
value
- If input unit is non-faulty, then all non-faulty
processes use the value it provides
But these are just the requirements of the BGP!
So we can apply the above solutions to the BGP in real life.
Now what about the practicality of the assumptions made by those
solutions?
About A1: In real life, link failures occur.
However, a link failure is indistinguishable from a failure of
one of the processors it connects, so we can count each failed
link as one of the m faults. The signed-message solution is
insensitive to link failures because no message can be forged
even if links fail.
About A2: What is actually required is that no
traitor can forge a non-faulty process's message. A2 is not
needed in the solution with signed messages.
About A3: In an asynchronous system, this
condition cannot be satisfied. In practice it is usually
implemented via time-outs, which presuppose a bound on message
delay and synchronized clocks.
About A4: Signing a message M has 2 aspects:
- If a processor is non-faulty, then no faulty processor
can generate its signature S(M). This can never be guaranteed
in real life; only the probability of a forgery can be reduced.
- Given M and X, anyone can verify whether X == S(M). This
is doable in the real world.
Further Observations
- Optimizations for the BGP solution.
- combine messages to reduce the total number of
messages.
- reduce the amount of information transferred.
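- A BGP solution is required under the most general model of
process failure, in which a faulty process may behave
arbitrarily and its failure cannot be reliably detected.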
- The solutions presented are optimal in the number of rounds,
because Fischer and Lynch have proved that any solution to
the BGP necessarily has each lieutenant wait for a message
that has passed through the hands of at least m generals
after the commander.
- Solutions to clock synchronization (needed for the
implementation of above BGP solutions) - very similar to
the solutions for BGP.
- Further impossibility results
- BGP with messages transmitted arbitrarily quickly
with upper bound on message transmission delay.
- Consensus in an asynchronous system, even when faults are
restricted to fail-crash only.
- BGP works but is inherently expensive, especially in
terms of the number of messages, which grows exponentially
with m (on the order of n^(m+1) for the oral-message
algorithm). So it's a trade-off between performance and
reliability. If you want more reliability in the most general
failure conditions, you have to settle for a (costly) BGP
solution. If, however, you can relax the failure conditions
in your systems (e.g. assume only fail-crash processes in a
synchronous system and leave it to God to ensure that), you
can go for cheaper solutions.
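To make the message cost above concrete, the count for the recursive oral-message algorithm can be computed directly: the commander sends n-1 messages, and each of the n-1 lieutenants then starts a recursive instance with one general fewer and one recursion level fewer. The following is a minimal sketch; the function name is illustrative.

```python
from math import prod


def om_message_count(n: int, m: int) -> int:
    """Total messages sent by the recursive oral-message algorithm with
    n generals and recursion depth m:
    (n-1) + (n-1)(n-2) + ... + (n-1)(n-2)...(n-m-1)."""
    return sum(prod(n - i for i in range(1, k + 1))
               for k in range(1, m + 2))


if __name__ == "__main__":
    for m in (1, 2, 3):
        n = 3 * m + 1  # smallest n that tolerates m traitors
        print(f"m={m}, n={n}: {om_message_count(n, m)} messages")
```

At the minimum n = 3m+1 this gives 9 messages for m = 1, 156 for m = 2 and 3609 for m = 3, which shows how quickly the cost grows with the number of traitors tolerated.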
Critique and Questions
- Graph connectivity. Are p-regular topologies
that frequent? Can the BGP solutions be extended to any
network topology? Have they been extended to any other
topologies?
- Value of m: How would one obtain a reasonable
value for maximum m in a practical system (note that this
maximum number is required even in the solution with
signed messages).
- Synchronous/asynchronous systems: How many
synchronous systems do we really use (SMP machines, and what
else)? What about asynchronous systems?
- Further work after this paper:
- What other solutions to BGP have been proposed
after this paper ?
- Has any attempt been made to extend the BGP
solutions to asynchronous systems to ensure 'some
degree/probability' of reliability ?
- Answers in next section ;-)
- Bounds on best possible BGP solution (in terms of
messages) ?
Further readings
Impossibility/necessity results
- Fischer, M. J., Lynch, N. A., and Paterson, M. S.
``Impossibility of Distributed Consensus with One Faulty
Process,'' J. ACM 32, 2 (April 1985), 374--382.
- Dolev, D., Dwork, C., and Stockmeyer, L. ``On the Minimal
Synchronism Needed for Distributed Consensus,'' J.
ACM 34, 1 (January 1987), 77--97.
Approximate agreement
- Bracha, G. ``An O(log n) Expected Rounds Randomized
Byzantine Generals Protocol,'' J. ACM 34, 4
(October 1987), 910--920.
- Bracha, G. and Toueg, S. ``Asynchronous Consensus and
Broadcast Protocols,'' J. ACM 32, 4 (October
1985), 824--840.
- Ben-Or, M. ``Another Advantage of Free Choice: Completely
Asynchronous Agreement Protocols,'' ACM Symposium on
Principles of Distributed Computing, 1983, 27--30.
- Dolev, D., Lynch, N. A., Pinter, S. S., Stark, E. W., and
Weihl, W. E. ``Reaching Approximate Agreement in the
Presence of Faults,'' J. ACM 33, 3 (July 1986),
499--516.
- Dolev, D., Reischuk, R., and Strong, H. R. ``Early
Stopping in Byzantine Agreement,'' J. ACM 37, 4
(October 1990), 720--741.
- Hadzilacos, V. and Halpern, J. Y. ``Message-Optimal
Protocols for Byzantine Agreement,'' ACM Symposium on
Principles of Distributed Computing, 1991, 309--323.
- Halpern, J. Y., Moses, Y., and Waarts, O. ``A
Characterization of Eventual Byzantine Agreement,'' ACM
Symposium on Principles of Distributed Computing,
1990, 333--346.
Failure detectors
- Chandra, T. D., Hadzilacos, V., and Toueg, S. ``The
Weakest Failure Detector for Solving Consensus,'' ACM
Symposium on Principles of Distributed Computing,
1992, 147--158.
- Chandra, T. D. and Toueg, S. ``Unreliable Failure
Detectors for Asynchronous Systems,'' ACM Symposium
on Principles of Distributed Computing, 1991,
325--340.