(See: Introduction to Algorithms, Cormen, Leiserson, Rivest, and Stein, 2nd ed., Ch. 17.)
Amortized analysis refers to determining the time-averaged running time for a sequence of operations. It is different from what is commonly referred to as average case analysis, because amortized analysis does not make any assumption about the distribution of the data values, whereas average case analysis assumes the data are not "bad" (e.g., some sorting algorithms do well on "average" over all input orderings but very badly on certain input orderings). That is, amortized analysis is a worst case analysis, but for a sequence of operations, rather than for individual operations. It uses the fact that we are analyzing a sequence to "spread out" the costs (think of insurance where everyone can pay a relatively modest amount despite some catastrophic costs).
The motivation for amortized analysis is to better understand the running time of certain techniques, where standard worst case analysis provides an overly pessimistic bound. Amortized analysis generally applies to a method that consists of a sequence of operations, where the vast majority of the operations are cheap but some are expensive. If we can show that the expensive operations are particularly rare, we can "charge" their cost to the cheap operations, so that it suffices to bound the cost of the cheap operations.
The general approach is to assign an artificial cost to each operation in the sequence, such that the total of the artificial costs for the sequence of operations bounds the total of the real costs for the sequence. This artificial cost is called the amortized cost of an operation. The amortized cost is thus a correct way of understanding the overall running time of the sequence; note, however, that particular operations can still take longer than their amortized cost, so it is not a way of bounding the running time of any individual operation in the sequence.
Three approaches are commonly used for amortized analysis; they have been termed the aggregate method, the banker's method, and the physicist's method.
In the last lecture, we saw the aggregate method, banker's method, and physicist's method for dealing with dynamically resizable arrays. Here, let us use the physicist's method to analyze the implementation of a stack using dynamically resizable arrays. The stack operations are push, pop, and check for empty stack.
We represent a stack of n elements by an array of length m ≥ n with the elements in locations 0, 1, ..., n − 1. We also maintain a pointer to the first free location. This is just like the dynamically resizable array, except that we now have two additional operations, pop and check for empty stack. We start with an array of length 1 and double the array whenever we try to push and the array is full.
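As a concrete illustration, here is a minimal Python sketch of such a stack (the class and method names are our own, not from lecture). The push operation doubles the backing array exactly when it is full, which is the expensive case the analysis must account for; pop and the emptiness check never resize.

    class ArrayStack:
        def __init__(self):
            self.array = [None]       # backing array of length m = 1
            self.n = 0                # number of elements; also the first free location

        def is_empty(self):
            return self.n == 0

        def push(self, x):
            if self.n == len(self.array):            # array full: double it
                bigger = [None] * (2 * len(self.array))
                for i in range(self.n):              # copying costs n (the expensive case)
                    bigger[i] = self.array[i]
                self.array = bigger
            self.array[self.n] = x                   # the cheap case: a single write
            self.n += 1

        def pop(self):
            if self.is_empty():
                raise IndexError("pop from empty stack")
            self.n -= 1
            x = self.array[self.n]
            self.array[self.n] = None
            return x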
We need a potential function Φ satisfying the properties that Φ is 0 for the initial state and Φ(h) ≥ 0 for every state h of the data structure.
It was shown in lecture that these properties imply that the total amortized time is an upper bound on the actual time.
For the resizable array, we used the potential function
Φ(h) = 2n − m.
This was fine as long as we were only adding elements, but unfortunately here it no longer works due to the pop operation, which can cause Φ to go negative. Intuitively though, popping should not hurt the amortized complexity, because it does not incur any resizing.
To handle this formally using the physicist's method, all we need to do is modify the potential function to take popping into account. A simple fix suffices: we will take
Φ(h) = max(2n − m, 0).
Certainly the two properties required of potential functions are satisfied, so it remains only to show that the amortized time of each operation is constant. Recall that the amortized time is defined to be
c + Φ(h') − Φ(h),
where c is the actual time of the operation and h and h' are the states of the data structure before and after the operation, respectively. Thus the amortized time is the actual time plus the change in potential.
Now we just have to consider all the cases that can occur with all the possible operations.

- Push without resizing (n < m): the actual time is 1. The number of elements n increases by 1 and m is unchanged, so Φ increases by at most 2, and the amortized time is at most 3.
- Push that triggers a resize (n = m): the actual time is n + 1, to copy the n existing elements into the new array of length 2m = 2n and then write the new element. Before the operation Φ = 2n − m = n; afterward Φ = 2(n + 1) − 2n = 2. The amortized time is (n + 1) + 2 − n = 3.
- Pop: the actual time is 1, and Φ cannot increase (it decreases by 2 or stays at 0), so the amortized time is at most 1.
- Check for empty stack: the actual time is 1 and Φ is unchanged, so the amortized time is 1.

In all cases, the amortized time is O(1).
Consider the problem of storing a very large binary counter. Say we decide to use an array, where each entry A[i] stores the i-th bit.
We will analyze the running time of the operation of counting using this representation, so the sequence of operations is a sequence of increments of the counter.
We will use the standard way of incrementing the counter, which is to toggle the lowest order bit. If that bit switches to a 0, we toggle the next higher order bit, and so forth, until the bit that we toggle switches to a 1, at which point we stop.
A[m] A[m-1] ... A[3] A[2] A[1] A[0]   cost
------------------------------------  ----
  0     0         0    0    0    0
  0     0         0    0    0    1      1
  0     0         0    0    1    0      2
  0     0         0    0    1    1      1
  0     0         0    1    0    0      3
  0     0         0    1    0    1      1
  0     0         0    1    1    0      2
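The increment itself can be written directly from this description. Below is a Python sketch (our own illustration, with A[0] as the lowest-order bit) that returns the number of bits it toggles, i.e. the actual cost of that increment; growing the list on overflow is a convenience of the sketch, not part of the analysis. Starting from A = [0] and calling it six times reproduces the costs 1, 2, 1, 3, 1, 2 shown in the table.

    def increment(A):
        i = 0
        cost = 0
        while i < len(A) and A[i] == 1:   # toggle low-order 1s back to 0
            A[i] = 0
            cost += 1
            i += 1
        if i == len(A):                   # ran off the end: add a new high-order bit
            A.append(0)
        A[i] = 1                          # toggle the final bit from 0 to 1, then stop
        cost += 1
        return cost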
When the result of the increment operation is n, the number of bits that change is at most 1 + floor(log n), that is, the number of bits in the binary representation of n.
Thus in a traditional worst case analysis, the cost of counting up to n, which is a sequence of n increments, is O(n log n).
But do you really think this is taking that much time? Does each increment in the sequence really cost O(log n)? How do we show it is less? This is the goal of amortized analysis.
Aggregate Method. Let's consider how often we flip each individual bit, and sum those up to bound the total, rather than individually obtaining a worst case bound for each bit. How often is A[0] toggled in counting up to n? How often is A[1] toggled, A[2] toggled, etc? Every time, every other time, every fourth time, ...
Summing over all the bits, the total number of toggles in counting up to n is at most
n + floor(n/2) + floor(n/4) + ... ≤ n + n/2 + n/4 + ... ≤ 2n.
So the amortized cost of an increment is 2, and the time to count to n (n increments) is 2n = O(n).
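For a quick numerical sanity check of this bound (purely illustrative), note that the bits that flip when the counter goes from k − 1 to k are exactly the 1 bits of (k − 1) XOR k, so we can total the actual number of toggles in counting to n and compare it to 2n.

    def total_toggles(n):
        # actual number of bit flips performed while counting from 0 up to n
        return sum(bin((k - 1) ^ k).count("1") for k in range(1, n + 1))

    for n in (10, 100, 1000):
        print(n, total_toggles(n), 2 * n)   # the observed total never exceeds 2n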
Banker's method. This approach works a bit like health insurance. Each individual operation pays a certain amount to a central pot, but some operations that require more resources may pay for it with money from the central pot. The amount that each individual operation pays into the pot is set so that there is just enough to do all the work.
In the binary counter example, the only work is toggling a bit, so let's say that it costs $1 to toggle a bit. The question is: how much money do we need to allocate to each increment to have banked enough to pay for all the toggles as they occur?
Of course, having just used the aggregate method, $2 would be a good guess (though not a guarantee — maybe we run out along the way at some point, even though there is enough money later).
What happens on an increment? Some number of low-order bits, say k (perhaps k = 0), change from 1 to 0, and then the k-th bit changes from 0 to 1. We charge $2 to each increment operation. We pay for flipping the k-th bit from 0 to 1 with $1 of those $2; the other $1 we leave at the k-th bit for later use. The k bits that changed from 1 to 0 are paid for with the $1 left at each of those bits by previous operations.
In other words, we maintain the invariant that every 1-bit has $1 associated with it that it can use the next time it has to flip back to 0.
This shows that the amortized cost of the increment is 2, and the overall time is O(n) for n increments.
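This credit argument can also be checked mechanically. The following sketch (our own illustration; the fixed 64-bit width and the names are assumptions of the sketch) deposits $2 per increment, spends $1 per toggle, and verifies the invariant that every 1-bit carries exactly one dollar.

    def simulate(num_increments):
        A = [0] * 64              # a 64-bit counter, plenty for this demo
        credits = [0] * 64        # credits[i] = dollars parked on bit i
        for _ in range(num_increments):
            budget = 2            # the $2 charged to this increment
            i = 0
            while A[i] == 1:      # flip low-order 1s back to 0 ...
                A[i] = 0
                credits[i] -= 1   # ... paying with the dollar parked there
                assert credits[i] == 0
                i += 1
            A[i] = 1              # flip the 0 bit to 1, paying $1 ...
            budget -= 1
            credits[i] += budget  # ... and park the remaining $1 on that bit
        # invariant: every 1-bit carries exactly one dollar, every 0-bit none
        assert all(credits[i] == A[i] for i in range(64))

    simulate(1000)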
Earlier in the semester we saw a way of implementing a queue (FIFO) using two stacks (LIFO). Say that our stack has three operations, push, pop and empty, each with cost 1.
We saw that a queue can be implemented with the two stacks as follows: to enqueue an element, push it onto stack1; to dequeue, pop from stack2, except that whenever stack2 is empty we first transfer the entire contents of stack1 to stack2 by repeatedly popping from stack1 and pushing onto stack2.
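Here is a minimal Python sketch of this two-stack queue, using Python lists as the stacks (append is push, pop is pop); the class and method names are illustrative.

    class TwoStackQueue:
        def __init__(self):
            self.stack1 = []   # newly enqueued elements are pushed here
            self.stack2 = []   # elements are dequeued from here

        def enqueue(self, x):
            self.stack1.append(x)                  # one push

        def dequeue(self):
            if not self.stack2:                    # stack2 empty: transfer everything
                while self.stack1:
                    self.stack2.append(self.stack1.pop())   # one pop + one push per element
            if not self.stack2:
                raise IndexError("dequeue from empty queue")
            return self.stack2.pop()               # one pop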
We've seen earlier that this algorithm is correct; now we will consider the running time in more detail.
A conventional worst case analysis would establish that dequeue takes O(n) time, but this is clearly a weak bound for a sequence of operations, because very few dequeues actually take that long. Thus O(n²) is not a very accurate characterization of the time needed for a sequence of n enqueue and dequeue operations, even though in the worst case an individual dequeue can take O(n) time.
To simplify the amortized analysis, we will consider only the cost of the push and pop operations and not of checking whether stack2 is empty.
Aggregate method. Each element is pushed at most twice and popped at most twice: at most once on each stack. An element that is enqueued and never dequeued is pushed at most twice and popped at most once, since it is never popped from stack2. Thus the total cost of the sequence is at most 3 per enqueue plus 1 per dequeue; that is, the amortized cost of each enqueue is 3 and of each dequeue is 1.
Banker's method. Each enqueue will be charged $3. This will cover the $2 cost of popping it and pushing it from stack1 to stack2 if that ever needs to be done, plus $1 for the initial push onto stack1. The dequeue operations cost $1 to pop from stack2.
Note that the analysis in both cases seems to charge more for storing than removing, even though in the code it is the other way around. Amortized analysis bounds the overall sequence, which in this case depends on how much stuff is stored in the data structure. It does not bound the individual operations.
Physicist's method. This is a more formal version of the banker's method. Instead of distributing money across items in the data structure, we keep money in a piggy bank, since what's really important is that we have enough money to pay for operations as they happen. The amount of money in the bank depends only on the current state of the data structure; this amount is the value of the potential function Φ.
For each of the problems considered above we can write down a potential function. For the counter problem,
Φ(counter) = number of 1 bits in the counter.
For the queue implemented with two stacks,
Φ(queue) = 2 * (size of stack1).
If the data structure changes from state S to state S' as the result of doing some operation, the amortized cost of the operation is defined as
amortized cost = actual cost + Φ(S') − Φ(S),
which is simply the amount of additional money that we need to maintain our piggy bank and pay for the work.
If the potential is nonnegative and starts at 0, then the sum of the actual costs is bounded by the sum of the amortized costs.
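To see why, write S_0, S_1, ..., S_k for the successive states of the data structure and c_1, ..., c_k for the actual costs of the operations. Summing the amortized costs, the potential terms telescope:

    (c_1 + Φ(S_1) − Φ(S_0)) + (c_2 + Φ(S_2) − Φ(S_1)) + ... + (c_k + Φ(S_k) − Φ(S_{k−1}))
        = (c_1 + c_2 + ... + c_k) + Φ(S_k) − Φ(S_0)
        ≥ c_1 + c_2 + ... + c_k,

since Φ(S_0) = 0 and Φ(S_k) ≥ 0.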