First, let's recall the global CPS transformation:
How does this translate into local CPS conversion? First, we choose
the set of functions that are eligible for CPS conversion:
Now, we define the local CPS conversion as follows:
Returning to our running example, local CPS transformation results in the following program:
fun applyf (f, n) = fun lp_i i = if i < n then fun lp_j (k, j) = if j < n then let foo = f (i, j) in lp_j (k, j + 1) else k () in fun k' (bar) = lp_i (i + 1) in lp_j (k', 0) else () in lp_i 0
Well, that's nice, but have we gained anything? It depends on how we
compute
. A course analysis
says that not all call sites of k' are known (because
k' escapes at the tail-call lp_j (k', 0)), so
k' becomes it's own cluster; lp_i neest to be
accessible from k''s cluster and applyf's cluster,
so it will be placed in its own cluster, but lp_j can be in
lp_i's cluster. Unfortunately, were still required to make
a full function call during each iteration of the outer loop (when
lp_j returns via it's continuation).
But, we know that lp_j has one continuation; if we reverse the definitions of lp_j and k', we can drop lp_j's explicit continuation argument:
fun applyf (f, n) = fun lp_i i = if i < n then fun k' (bar) = lp_i (i + 1) in fun lp_j (j) = if j < n then let foo = f (i, j) in lp_j (j + 1) else k' () in lp_j 0 else () in lp_i 0
In general, this requires mutual recursion in the target language.
Finally, we use light-weight closure conversion to generate the final cluster representation:
fun applyf (f, n) = lp_i (0, f, n) and lp_i (i, f, n) = if i < n then lp-j (0, i, f, n) else () and lp_j (j, i, f, n) = if j < n then let foo = f (i, j) in lp_j (j + 1, i, f, n) else k (i, f, n) and k (i, f, n) = lp_i (i + 1, f, n)