Consider the following C code:
for (i = 0; i < n; i++) for (j = 0; j < n; j++) f (i, j);
How might we write an equivalent function in SML? Perhaps as follows:
fun applyf (f, n) = let fun lp_i i = if i < n then let fun lp_j j = if j < n then (f (i, j); lp_j (j + 1)) else () in (lp_j 0; lp_i (i + 1)) end else () in lp_i 0 end
While the two loop functions (lp_i and lp_j) are tail-recursive, the call lp_j 0 from the outer loop to the inner loop is not tail-recursive. This non-tail call forces lp_i and lp_j to be mapped into different clusters, which occupy separate procedures with separate stack frames.
What happens when we CPS convert the function? We get the following:
fun applyf (f, n, kt) = let fun lp_i (i, ki) = if i < n then let fun lp_j (j, kj) = if j < n then let fun kf' () = lp_j (j + 1, kj) in f (i, j, kf') end else kj () fun kj' () = lp_i (i + 1, ki) in lp_j (0, kj') end else ki () in lp_i (0, kt) end
A simple control-flow analysis shows that the return continuation of lp_j is always the known function kj', which enables compiling the nested loops into a single machine procedure.