In the last lecture we showed we can compute asymptotic performance bounds by computing a closed-form solution to the recurrence and then converting the solution to an asymptotic complexity. A shorter path to the goal is to directly prove the complexity bound, using induction. This is known as the substitution method, for reasons that will become clear. The shorter recipe for proving an asymptotic complexity is then:

1. Guess the asymptotic bound (for example, that T(n) is O(n lg n)).
2. Write the guess as an explicit inequality, such as T(n) ≤ kn lg n for some constant k > 0 and all n ≥ n₀.
3. Prove the inequality by strong induction on n, substituting the guessed bound for the recursive occurrences of T in the recurrence.
For example, the version of the recurrence relation for merge sort that doesn't sweep any details under the rug is rather complicated, and we don't want to have to produce a closed-form solution for it:
T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + c₁n + c₂
In fact, the equality should really be an inequality with ≤, because we only showed that merging two lists takes at most linear time, not exactly linear time. Fortunately, the substitution method works just fine with inequalities too.
We want to show that T(n) ≤ kn lg n for some constant k > 0 and all n ≥ n₀. We know that T(n) depends on T(⌊n/2⌋) and T(⌈n/2⌉), so the plan is to use the induction hypothesis to replace T(⌊n/2⌋) and T(⌈n/2⌉) with the corresponding bounds. However, in order to do this we need the induction hypothesis to hold at roughly n/2, which might be less than n₀. Therefore we aim to prove our property for all n ≥ 1.
Unfortunately our property doesn't hold at n = 1, because kn lg n = 0 there while T(1) > 0. So we weaken the property we are proving slightly to T(n) ≤ k(n lg n + 1) = kn lg n + k, which does hold at n = 1 as long as k ≥ T(1). Adding the lower-order term (1) has no effect on the asymptotic complexity we show, of course. In general we don't have to worry about small values of n, because by making k large enough we can make the inequality hold outright for any finite set of small n. So for values of n up to some n₀, we simply choose k large enough to make the property hold.
Now, for n ≥ n₀, we have to show that T(n) ≤ kn lg n + k, assuming that the same inequality holds for all n' < n. Let us prove this by cases on the parity of n.
If n is even, then ⌊n/2⌋ = ⌈n/2⌉ = n/2, and we have:

T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + c₁n + c₂
     ≤ k⌊n/2⌋ lg ⌊n/2⌋ + k + k⌈n/2⌉ lg ⌈n/2⌉ + k + c₁n + c₂    (substitution, by IH)
     = k(n/2) lg(n/2) + k(n/2) lg(n/2) + 2k + c₁n + c₂
     = kn(lg n - 1) + 2k + c₁n + c₂
     = kn lg n - kn + 2k + c₁n + c₂
     = kn lg n + k + (k + (c₁ - k)n + c₂)

Therefore T(n) ≤ kn lg n + k as long as k + (c₁ - k)n + c₂ ≤ 0. But if we pick k > c₁, this is clearly true for large enough n.

If n is odd, then ⌊n/2⌋ = (n-1)/2 and ⌈n/2⌉ = (n+1)/2, and we have:

T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + c₁n + c₂
     ≤ k((n-1)/2) lg((n-1)/2) + k + k((n+1)/2) lg((n+1)/2) + k + c₁n + c₂    (substitution, by IH)
     = k((n-1)/2)(lg(n-1) - 1) + k((n+1)/2)(lg(n+1) - 1) + 2k + c₁n + c₂
     = kn(lg(n-1) + lg(n+1))/2 + k(lg(n+1) - lg(n-1))/2 - kn + 2k + c₁n + c₂
     ≤ kn lg n + k/2 - kn + 2k + c₁n + c₂    (for n ≥ 3)
     = kn lg n + k + (3k/2 + (c₁ - k)n + c₂)

Again T(n) ≤ kn lg n + k as long as 3k/2 + (c₁ - k)n + c₂ ≤ 0, which again holds for large enough n when k > c₁. Note that we used two facts: (lg(n-1) + lg(n+1))/2 ≤ lg n, which is true because lg n has a negative second derivative everywhere, and lg(n+1) - lg(n-1) = lg((n+1)/(n-1)) ≤ 1 for n ≥ 3.
Thus, we are able to use this technique to prove asymptotic complexity without cutting any corners on the form of the recurrence.
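As a quick numerical sanity check, here is a small OCaml sketch (not part of the lecture; the constants c₁ = c₂ = 1, the base case T(1) = 1, and the choice k = 4 are assumptions for illustration) that evaluates the recurrence directly and compares it against k(n lg n + 1):

let c1 = 1.0 and c2 = 1.0       (* assumed constants in the recurrence *)
let k = 4.0                     (* assumed constant in the claimed bound *)
let lg x = log x /. log 2.0

(* T(n) = T(floor(n/2)) + T(ceil(n/2)) + c1*n + c2, with T(1) = 1 assumed. *)
let rec t n =
  if n <= 1 then 1.0
  else t (n / 2) +. t ((n + 1) / 2) +. c1 *. float_of_int n +. c2

(* The claimed bound k*(n lg n + 1). *)
let bound n = k *. (float_of_int n *. lg (float_of_int n) +. 1.0)

let () =
  List.iter
    (fun n -> Printf.printf "n = %7d   T(n) = %12.0f   k(n lg n + 1) = %12.0f\n"
                n (t n) (bound n))
    [1; 2; 10; 100; 1_000; 100_000]

For these assumed constants, T(n) stays below the bound at every value printed, as the proof predicts.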
Sometimes the induction hypothesis needs to be tweaked a little to make it stronger. For example, consider the recurrence T(n) = 2T(n/2) + 1, which is O(n). But when we try to prove T(n)≤kn with the substitution method, we get
T(n) = 2T(n/2) + 1
     ≤ 2k(n/2) + 1    (substitution, by IH)
     = kn + 1
We didn't prove what we needed to, because of that +1 term. The key is to strengthen the IH by proving that T(n) ≤ kn - b for some positive b, which will also prove that T(n) ≤ kn, of course. Then things work out:
T(n) = 2T(n/2) + 1
     ≤ 2(k(n/2) - b) + 1    (substitution, by IH)
     = kn - 2b + 1
     = kn - b + (1 - b)
     ≤ kn - b    (as long as b ≥ 1)
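To see the strengthened bound concretely, here is a small OCaml sketch (with an assumed base case T(1) = 1 and assumed constants k = 2 and b = 1; for powers of two the recurrence then solves exactly to T(n) = 2n - 1, which meets kn - b with equality):

(* T(n) = 2*T(n/2) + 1 with assumed base case T(1) = 1, compared against
   k*n - b for assumed constants k = 2 and b = 1. *)
let rec t n = if n <= 1 then 1 else 2 * t (n / 2) + 1

let () =
  let k = 2 and b = 1 in
  List.iter
    (fun n -> Printf.printf "n = %6d   T(n) = %7d   k*n - b = %7d\n"
                n (t n) (k * n - b))
    [1; 2; 4; 64; 1024; 65536]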
The “master method” is a cookbook method for solving recurrences that is very handy for dealing with many recurrences seen in practice. Suppose you have a recursive function that makes a recursive calls and reduces the problem size by at least a factor of b on each call, and suppose each call does h(n) work on a problem of size n, not counting the work done by its recursive calls.
We can visualize this as a tree of calls, where the nodes in the tree have a branching factor of a. The top node has work h(n) associated with it, the next level has work h(n/b) associated with each node, the next level h(n/b^2), and so on. The tree has log_b n levels, so the total number of leaves in the tree is a^(log_b n) = n^(log_b a) (to see the identity, take the log base b of both sides).
The time taken is just the sum of the terms h(n/b^i) over all the nodes. What this sum looks like depends on how the asymptotic growth of h(n) compares to the asymptotic growth of the number of leaves. There are three cases:

1. h(n) grows polynomially more slowly than n^(log_b a) (that is, h(n) is O(n^(log_b a - ε)) for some ε > 0). Then the work at the leaves dominates, and T(n) is Θ(n^(log_b a)).
2. h(n) is Θ(n^(log_b a)). Then each of the roughly log_b n levels of the tree contributes about the same amount of work, and T(n) is Θ(n^(log_b a) lg n) = Θ(h(n) lg n).
3. h(n) grows polynomially faster than n^(log_b a) (that is, h(n) is Ω(n^(log_b a + ε)) for some ε > 0, and h satisfies a mild regularity condition). Then the work at the root dominates, and T(n) is Θ(h(n)).
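When h(n) is a simple polynomial n^d, this comparison is easy to mechanize. The following OCaml sketch (not part of the lecture; the function master_bound and the restriction to polynomial h are assumptions made for illustration) reports which case applies and the resulting bound; for polynomial h the regularity condition needed in case 3 holds automatically:

(* For a recurrence T(n) = a*T(n/b) + n^d, compare d with log_b a and
   report the bound given by the master method. *)
let master_bound ~a ~b ~d =
  let e = log (float_of_int a) /. log b in    (* e = log_b a *)
  if d < e -. 1e-9 then Printf.sprintf "Theta(n^%.3f)" e          (* case 1: leaves dominate *)
  else if d > e +. 1e-9 then Printf.sprintf "Theta(n^%.3f)" d     (* case 3: root dominates *)
  else Printf.sprintf "Theta(n^%.3f lg n)" e                      (* case 2: all levels comparable *)

let () =
  (* Merge sort: a = 2, b = 2, d = 1, giving Theta(n lg n). *)
  print_endline (master_bound ~a:2 ~b:2.0 ~d:1.0);
  (* The sort3 function below: a = 3, b = 3/2, d = 1, giving roughly Theta(n^2.71). *)
  print_endline (master_bound ~a:3 ~b:1.5 ~d:1.0)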
The following function sorts the first two-thirds of a list, then the second two-thirds, then the first two-thirds again:
let rec sort3 (a: int list): int list =
  (* List.take(l, i) is assumed to return the first i elements of l, and
     List.drop(l, i) the remaining elements, as in the SML Basis Library. *)
  match a with
  | [] | [_] -> a
  | [x; y] -> [Int.min x y; Int.max x y]
  | _ ->
    let n = List.length a in
    let m = (2*n + 2) / 3 in                                            (* m is 2n/3, rounded up *)
    let res1 = sort3 (List.take(a, m)) in                               (* sort the first two-thirds *)
    let res2 = sort3 (List.drop(res1, n-m) @ List.drop(a, m)) in        (* sort the second two-thirds *)
    let res3 = sort3 (List.take(res1, n-m) @ List.take(res2, 2*m-n)) in (* sort the first two-thirds again *)
    res3 @ List.drop(res2, 2*m-n)
Perhaps surprisingly, this algorithm does sort the list. We leave the proof that it sorts correctly as an exercise to the reader. The key is to observe that the first two passes ensure that the tail of the final list, List.drop(res2, 2*m-n), does contain the correct elements in the correct order.
We can derive the running time of the algorithm from its recurrence. The routine does O(n) work outside the recursive calls (List.take, List.drop, and @ all take linear time) and then makes three recursive calls on lists of length 2n/3. Therefore its recurrence is:
T(n) = cn + 3T(2n/3)
If we apply the master method to the sort3 algorithm, with a = 3, b = 3/2, and h(n) = cn, we see easily that we are in case 1, so the algorithm is O(n^(log_{3/2} 3)) ≈ O(n^2.71), making it slower than insertion sort!
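As a rough numerical check of this prediction, the following OCaml sketch (with an assumed cost constant c = 1 and an assumed base case T(n) = 1 for n ≤ 1) iterates the recurrence and prints the ratio T(n)/n^(log_{3/2} 3). The ratio stays bounded between small constants as n grows (it oscillates a little because the problem sizes don't divide evenly), rather than tending to 0 or infinity, consistent with T(n) being Θ(n^(log_{3/2} 3)):

(* T(n) = c*n + 3*T(2n/3), evaluated on floats with assumed c = 1 and
   T(n) = 1 for n <= 1.  If T(n) is Theta(n^e) with e = log_{3/2} 3,
   then T(n)/n^e should stay within constant bounds. *)
let rec t n = if n <= 1.0 then 1.0 else n +. 3.0 *. t (2.0 *. n /. 3.0)

let () =
  let e = log 3.0 /. log 1.5 in
  List.iter
    (fun n ->
      let n = float_of_int n in
      Printf.printf "n = %9.0f   T(n)/n^%.2f = %.3f\n" n e (t n /. (n ** e)))
    [1_000; 10_000; 100_000; 1_000_000]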
It turns out that no sorting algorithm that works by comparing elements pairwise can have asymptotic running time lower than Θ(n lg n), and thus, other than constant factors in running time, merge sort is as good an algorithm as we can expect for sorting general data. Its constant factors are also pretty good, so it's a useful algorithm in practice. We can see that Ω(n lg n) time is needed by thinking about sorting a list of n distinct numbers. There are n! = n×(n−1)×(n−2)×...×3×2×1 possible orderings of the input, and the sorting algorithm needs to map all of them to the same sorted list by applying an appropriate inverse permutation. For general data, the algorithm must make enough observations about the input list (by comparing list elements pairwise) to determine which of the n! permutations was given as input, so that the appropriate inverse permutation can be applied to sort the list. Each comparison of two elements to see which is greater yields at most one bit of information about which permutation was given, and at least lg(n!) bits of information are needed; therefore the algorithm must perform at least lg(n!) comparisons and take Ω(lg(n!)) time. It can be seen easily that n! is O(n^n), and lg(n^n) = n lg n, so lg(n!) is O(n lg n). With a bit more difficulty a stronger result can be shown: lg(n!) is Θ(n lg n). (For example, the largest n/2 factors of n! are each at least n/2, so n! ≥ (n/2)^(n/2) and lg(n!) ≥ (n/2)(lg n − 1).) Therefore merge sort is not only much faster than insertion sort on large lists, it is actually optimal to within a constant factor! This shows the value of designing algorithms carefully.
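For a small numerical illustration of this last point (not part of the lecture), the following OCaml sketch compares lg(n!), computed as a sum of logarithms, with n lg n; the ratio slowly approaches 1, consistent with lg(n!) being Θ(n lg n):

(* lg(n!) versus n lg n for a few values of n. *)
let lg x = log x /. log 2.0

let lg_factorial n =
  let s = ref 0.0 in
  for i = 2 to n do s := !s +. lg (float_of_int i) done;
  !s

let () =
  List.iter
    (fun n ->
      let nf = float_of_int n in
      Printf.printf "n = %7d   lg(n!) = %11.1f   n lg n = %11.1f\n"
        n (lg_factorial n) (nf *. lg nf))
    [10; 100; 1_000; 100_000]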
Note: there are sorting algorithms for specialized inputs that have better than O(n lg n) performance: for example, radix sort. This is possible because radix sort doesn't work by comparing elements pairwise; it extracts information about the permutation by using the element itself as an index into an array. This indexing operation can be done in constant time and on average extracts lg n bits of information about the permutation. Thus, radix sort can be performed using O(n) time, assuming that the list is densely populated by integers or by elements that can be mapped monotonically and densely onto integers. By densely, we mean that the largest integer in a list of length n is O(n) in size. By monotonically we mean that the ordering of the integers is the same as the ordering of the corresponding data to be sorted. In general we can't find a dense monotonic mapping, so Θ(n lg n) is the best we can do for sorting arbitrary data.