[5/5/99]
===============================================================================
* We have seen and used functions with a variable number of arguments, but we
never saw any way of defining such functions ourselves.  This wasn't avoided
because it is too complicated, but simply because it wasn't necessary.  To
define a function with a variable number of arguments, you write the argument
list as usual, then add a dot and a variable that will be bound to the list of
the remaining arguments.  Here is a quick demonstration:
--------------------
(define (average . numbers)
  (/ (apply + numbers) (length numbers)))
(average 10 15 20) --> 15
--------------------
Note how this is similar to the way that lists are printed (and read), so you
can do some strange things such as:
--------------------
((lambda (a . (b . (c . ()))) (+ a b c)) 1 2 3) --> 6
--------------------
but these should be avoided.  Last remark: the same works for lambda
expressions, except that there is a question of how to define a function with
zero or more arguments, because this:
  (lambda (. x) x)
is an illegal form that wouldn't even read properly.  The solution is to use
just the symbol, without the surrounding parens, as in:
  (lambda x x)
which is equivalent to the list function.  Note also how this is consistent
with the dotted list notation.
===============================================================================
* A quick note about memoization: In future classes (algorithms) you will
learn about something called "dynamic programming", which is a way to develop
algorithms that compute recursive functions that would otherwise take a long
time.  The basic idea is that you identify the recursive nature of the
function, then you make a table of values and figure out the correct way to
fill this table until you get the desired value.  A classic example for that
came up when Jeff was trying to get a short expression made only of parens,
+'s and *'s that will evaluate to 212 (using the fact that (*) is 1).  What we
want to say is that we can combine strings for smaller numbers to make a
string for a bigger number in several ways: if S(n) is the set of strings
representing a number n, and we use + for both string and string-set
concatenation (see below), then:
1. if n = a + b, then the strings "(+" + S(a) + S(b) + ")" are in S(n).
2. if n = a * b, then the strings "(*" + S(a) + S(b) + ")" are in S(n).
Concatenating a string set with something is defined by concatenating each of
its elements and collecting the results into a set; also, as an optimization,
concatenating "(+", "(+X)", "(+Y)" and ")" can simply result in "(+XY)" rather
than "(+(+X)(+Y))".  These rules define a very simple recursive function that
can be used to get the list of all strings that make a number, and selecting
the shortest string is easy once you have that.  So this can be done by
dynamic programming, which is basically the same as memoization.  The nice
thing about memoization is that it does the same thing, but without the
complexity of building a table and changing the algorithm to use that table.
The main change to the algorithm would be making sure that it is applied in
the correct order - memoization does that automatically, and without planning
the table shape in advance, etc.  [Say that the code is available here as an
example.]
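The code below ends with a call to memoize!, which is not defined in these
notes.  As a rough sketch (my own illustration, assuming nothing about the
real helper beyond its name), such a helper could cache results in an
association list keyed by the argument list, with memoize! rebinding a
function name to the caching wrapper:
--------------------
;; sketch only - the real memoize! may be implemented differently
(define (memoize f)
  (let ((cache '()))                       ; list of (args . result) pairs
    (lambda args
      (let ((hit (assoc args cache)))
        (if hit
          (cdr hit)
          (let ((result (apply f args)))
            (set! cache (cons (cons args result) cache))
            result))))))
;; rebind a function name to its memoized version
(defmacro (memoize! name) `(set! ,name (memoize ,name)))
--------------------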
--------------------
(define (random-elt l)
  (list-ref l (random (length l))))

(define (remove-dups l)
  (cond ((null? l) '())
        ((memq (car l) (cdr l)) (remove-dups (cdr l)))
        (else (cons (car l) (remove-dups (cdr l))))))

(define (smallest-factor n)
  (define (iter i)
    (if (and (< i n) (not (zero? (modulo n i)))) (iter (1+ i)) i))
  (if (<= n 3) n (iter 2)))

(define (prime-factorize n)
  (if (= n 1)
    '()
    (let ((f (smallest-factor n))) (cons f (prime-factorize (/ n f))))))

;; all divisors of n between 2 and sqrt(n), built as products of subsets of
;; its prime factors
(define (factorize n)
  (define (power l)                     ; all sublists of l
    (if (null? l)
      '(())
      (let ((pcdr (power (cdr l))))
        (append pcdr (map (lambda (x) (cons (car l) x)) pcdr)))))
  (remove-dups
   (filter (lambda (x) (< 1 x (min n (1+ (ceiling (sqrt n))))))
           (map (lambda (l) (apply * l)) (power (prime-factorize n))))))

;; drop the inner "(+...)" or "(*...)" wrapper when combining with the same op
(define (maybe-strip ch s)
  (if (eq? (string-ref s 1) ch) (substring s 2 (1- (string-length s))) s))

(define ((combine ch) s1 s2)
  (string-append "(" (make-string 1 ch)
                 (maybe-strip ch s1) (maybe-strip ch s2) ")"))

(define ((rand-bin-op f) x y)
  (if (zero? (random 2)) (f x y) (f y x)))

(define ((n-ize f) x . xs)
  (if (null? xs) x (f x (apply (n-ize f) xs))))

(define add (n-ize (rand-bin-op (combine #\+))))
(define mul (n-ize (rand-bin-op (combine #\*))))

(define (shortest n)
  (if (eq? n 1)
    "(*)"
    (let ((min #f) (strs #f))
      (define (consider str)
        (let ((slen (string-length str)))
          (cond ((or (not min) (< slen min))
                 (set! strs (list str)) (set! min slen))
                ((= slen min) (set! strs (cons str strs)))
                (else #f))))
      (define (try-additions i)
        (let ((x (shortest i)) (y (shortest (- n i)))) (consider (add x y)))
        (when (< i (/ n 2)) (try-additions (1+ i))))
      (define (try-multiplications fs)
        (unless (null? fs)
          (let* ((f (car fs)) (x (shortest f)) (y (shortest (/ n f))))
            (consider (mul x y)))
          (try-multiplications (cdr fs))))
      (try-additions 1)
      (try-multiplications (factorize n))
      (random-elt strs))))

(memoize! shortest)
--------------------
===============================================================================
* Lazy evaluation: Scheme uses eager or applicative-order evaluation.  This
means that to evaluate an application expression, we first evaluate all of its
subexpressions and then perform the application.  This means that defining
non-strict functions (functions that can return a value even when some of
their arguments are undefined) is not possible - the following function will
not work properly because the arguments will always be evaluated first:
--------------------
(define (my-if b e1 e2) (if b e1 e2))
--------------------
Most languages are similar to Scheme in this sense, but there are some
languages that use lazy or normal-order evaluation.  If we used a Scheme-like
language with lazy evaluation [note that the Revised Report defines Scheme to
be an eager language], then the above function would work, and this function:
--------------------
(define (foo x) 1)
--------------------
would always return 1 - even for a nonsensical expression such as (foo (1)).
[Note that most lazy languages also include strict type checking which would
make this fail compilation, but with a Scheme-like type system this should
work.]  If this is combined with pure functional programming with no side
effects at all (as in Haskell), then any expression that is determined at
compile time to be an application of foo: (foo E) can be reduced to simply 1,
even if E is a function call that might loop forever or even stop with an
error.  [Note how side effects make such an optimization impossible, or at
least require some code analysis.]
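Returning to the my-if example above: a standard workaround in an eager
Scheme (a sketch of my own, not part of the original notes) is to pass the
branches as thunks, so that only the chosen branch is ever evaluated:
--------------------
(define (lazy-if b e1 e2)        ; e1 and e2 are zero-argument procedures
  (if b (e1) (e2)))

(lazy-if (> 1 0)
         (lambda () 'yes)
         (lambda () (error "never evaluated")))
--> yes
--------------------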
The basic difference between the two ways of evaluating things can be easily
demonstrated in the lambda calculus, using the substitution model: whenever we
have an expression such as (F (G X)), where F and G are both lambda
expressions and X isn't, applicative-order evaluation will reduce G using X
and then F using the result, whereas normal-order evaluation will first reduce
F using the (G X) _expression_ and continue from there (and now the (G X)
argument might disappear without ever being reduced!).  So the rules are (a
redex is some subexpression that can be reduced, i.e., something that looks
like (X Y) where X is a lambda expression):
* applicative-order: always reduce the innermost redex first (if we also
  specify leftmost, then we determine the evaluation order for function
  arguments).
* normal-order: always reduce the outermost redex first.
It can be shown that applicative order is the most efficient way to evaluate
things - but only when you don't get stuck in an infinite loop.  Normal-order
evaluation also has an advantage: it can be shown that if it is possible to
get a result in any evaluation order, then normal order is guaranteed to get
it.  Here is an example of the differences:
--------------------
(define (zero x) 0)
(define (square x) (* x x))
--------------------
Applicative-order:
--------------------
(square (square (square 2)))
--> (square (square (* 2 2)))
--> (square (square 4))
--> (square (* 4 4))
--> (square 16)
--> (* 16 16)
--> 256

(zero (square (square (square 2))))
--> (zero (square (square (* 2 2))))
--> (zero (square (square 4)))
--> (zero (square (* 4 4)))
--> (zero (square 16))
--> (zero (* 16 16))
--> (zero 256)
--> 0
--------------------
Normal-order:
--------------------
(square (square (square 2)))
--> (* (square (square 2)) (square (square 2)))
--> (* (* (square 2) (square 2)) (square (square 2)))
--> (* (* (* 2 2) (square 2)) (square (square 2)))
--> (* (* 4 (square 2)) (square (square 2)))
--> (* (* 4 (* 2 2)) (square (square 2)))
--> (* (* 4 4) (square (square 2)))
--> (* 16 (square (square 2)))
--> (* 16 (* (square 2) (square 2)))
--> (* 16 (* (* 2 2) (square 2)))
--> (* 16 (* 4 (square 2)))
--> (* 16 (* 4 (* 2 2)))
--> (* 16 (* 4 4))
--> (* 16 16)
--> 256

(zero (square (square (square 2)))) --> 0
--------------------
===============================================================================
* delay and force.  As seen, delay and force are used to create a delayed
(lazy) object that will actually be evaluated later.  This is used to define
lazy data structures such as streams (see below), so we can benefit from a
little laziness.  A naive definition can be the following:
--------------------
(defmacro (delay exp) `(lambda () ,exp))
(define (force promise) (promise))
--------------------
Note that delay _must_ be a macro (or a special form), while force can be a
regular function.  The obvious optimization to the above definition is to
memoize the result; this can be done with a helper make-promise function:
--------------------
(define (make-promise thunk)
  (let ((value #f))
    (lambda ()
      (or value
          (let ((r (thunk))) (set! value r) r)))))
(defmacro (delay exp) `(make-promise (lambda () ,exp)))
(define (force promise) (promise))
--------------------
This works fine except for cases where the memoized value should be #f...
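A small sketch of what goes wrong (mine, not from the original notes): a
promise whose value happens to be #f gets recomputed, and its side effects
re-run, on every force:
--------------------
(define p (delay (begin (display "working... ") #f)))
(force p)   ; prints "working... " and returns #f
(force p)   ; prints "working... " again - the #f was never remembered
--------------------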
A quick note: to solve this, the traditional thing to do is to use some
private boxed object and store it as the initial value; as long as this object
is not accessible from the outside, everything is OK.  Here is such a
solution:
--------------------
(define no-value (list 'nothing))
(define (make-promise thunk)
  (let ((value no-value))
    (lambda ()
      (if (eq? value no-value)
        (let ((r (thunk))) (set! value r) r)
        value))))
--------------------
And an even better solution that doesn't allow accessing the tag:
--------------------
(define (make-promise thunk)
  (let* ((no-value (list 'nothing)) (value no-value))
    (lambda ()
      (if (eq? value no-value)
        (let ((r (thunk))) (set! value r) r)
        value))))
--------------------
And the final version, which doesn't create a different tag for every promise:
--------------------
(define make-promise
  (let ((no-value (list 'nothing)))
    (lambda (thunk)
      (let ((value no-value))
        (lambda ()
          (if (eq? value no-value)
            (let ((r (thunk))) (set! value r) r)
            value))))))
--------------------
Note that a roughly equivalent solution to the last one would have been to use
'(nothing) or "nothing" as the tag, since such literals are inserted into the
code once and shared - but that still makes a different binding for every
promise, and in any case the code above is the standard Scheme way to do such
things.
===============================================================================
* Streams - basic idea and pipes.  Many times we are interested in using lazy
features, for example, to represent infinite data structures.  One way to get
this is to use a lazy language - as shown above.  However, we saw that it is
possible to use some lazy features in an eager language by using a lazy data
constructor; this is what we get with cons-stream - it doesn't evaluate its
tail part until you actually need to use it.  This can be defined easily (as
it is actually defined in Swindle):
--------------------
(define empty-stream empty)
(define null-stream? null?)
(defmacro (cons-stream x y) `(cons ,x (delay ,y)))
(define heads car)
(define (tails s) (force (cdr s)))
--------------------
Note that the only thing that must be defined as a macro is the cons-stream
operation, which wraps a piece of code in a delay expression which, as seen
above, is further converted to a lambda expression - the only way to defer
evaluation.  This is hardly the only possible implementation; it is just very
close to the way cons is defined in Scheme.  We could, for example, use cons
cells where both parts are lazy:
--------------------
(define lazy-null empty)
(define null-lazy? null?)
(defmacro (cons-lazy x y) `(cons (delay ,x) (delay ,y)))
(define (lazy-head x) (force (car x)))
(define (lazy-tail x) (force (cdr x)))
--------------------
We could also use a data structure that has exactly one slot which is lazy -
this is exactly what we get by using the delay macro.
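To see the laziness in action, here is a small sketch (my own, not from the
original notes) that uses the cons-stream constructor defined above to build
an infinite stream of integers:
--------------------
(define (integers-from n)
  (cons-stream n (integers-from (1+ n))))
(define naturals (integers-from 0))     ; an "infinite list" 0, 1, 2, ...
(heads naturals)                 --> 0
(heads (tails (tails naturals))) --> 2
;; only the first three elements were ever computed
--------------------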
Another point that makes the stream data structure so intuitive is the fact
that it is close to the notion of time - we get new values as time goes by.
This is used, for example, to simulate side effects with streams.  Streams can
also be used as pipes - a cute example is that pipes in DOS vs. pipes in Unix
are very similar to lists vs. streams in Scheme:
* Unix actually runs more than one process - so you get something very
  similar to streams, except that in Unix values are flowing concurrently,
  whereas in Scheme you evaluate an expression whenever you need its value.
* DOS does a simulation - run one program, collect the output in a temporary
  file, then run the second program on that file.  This is like using lists,
  since you must have the complete list (file) before applying further
  functions.
One example where we see the effect:
  > grep "something" bigfile | more
In Unix, more will display lines as soon as grep outputs them; in DOS, you'll
have to wait for grep to finish its work, and only then will you see some
results.  One use of this in Unix is the special "pipe" and "device" files,
which can be viewed as infinite streams of information - for example, digital
voice input is continuously coming out of /dev/audio, and a running process
can output information into a pipe.  This allows something like the following
to run in Unix:
  > foo | grep "something"
and it works even if foo is a program that outputs text forever.
===============================================================================
* So, as we saw, streams are a very convenient tool for handling infinite
lists, but they can be useful in other situations as well.  An example is the
eight queens problem: how can we place 8 queens on a chess board so that no
queen threatens another?  We can write a program that does this very easily by
representing a board configuration as a list of numbers, each representing the
position of a queen in some row (this is OK since no two queens can be in the
same row).  It is easy to construct the list of all possible results:
--------------------
;;; get a list of lists and append all of them, removing #f's
(define (flatten lsts)
  (cond ((null? lsts) empty)
        ((null? (head lsts)) (flatten (tail lsts)))
        (else (let ((elt (head (head lsts)))
                    (rest (cons (tail (head lsts)) (tail lsts))))
                (if elt (cons elt (flatten rest)) (flatten rest))))))

;;; return a new board configuration, or #f if it would be invalid
(define (make-queens q others)
  (define (iter q-i q+i others)
    (or (null? others)
        (and (let ((o (head others)))
               (not (or (= o q) (= o q-i) (= o q+i))))
             (iter (- q-i 1) (+ q+i 1) (tail others)))))
  (and (iter (- q 1) (+ q 1) others) (cons q others)))

(define options '(0 1 2 3 4 5 6 7))

(define (queens n)
  (if (zero? n)
    (cons empty empty)
    (flatten (map (lambda (new)
                    (map (lambda (solution) (make-queens new solution))
                         (queens (- n 1))))
                  options))))
--------------------
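A quick usage sketch (mine, not from the original notes) to emphasize the cost
of this list version: every solution is constructed before we can look at even
the first one:
--------------------
(car (queens 8))      ; one complete solution - a list of 8 column numbers
(length (queens 8))   ; the number of solutions - but only available after
                      ; every single one of them has been built
--------------------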
Now, this was simple enough, but what would satisfy us is just one solution.
Converting the above code to return just the first solution and not all of
them would be a little tricky.  We could even use non-local control mechanisms
to exit immediately when we get a solution: a simple way would be to just use
"error" to stop the computation immediately when we get a result...  But even
this is not enough - to get the list of solutions for N queens, we first
compute all of the solutions for N-1 queens, and this is exactly what should
be avoided.  So, generally, converting this program to return only the first
solution will probably change its structure dramatically, making it quite
different from the original (simple) solution above.  But there is another
way - simply use streams!  This is extremely easy: just rewrite the program
using stream operations instead of list operations:
--------------------
(define (flatten lsts)
  (cond ((null-stream? lsts) empty-stream)
        ((null-stream? (heads lsts)) (flatten (tails lsts)))
        (else (let ((elt (heads (heads lsts)))
                    (rest (cons-stream (tails (heads lsts)) (tails lsts))))
                (if elt (cons-stream elt (flatten rest)) (flatten rest))))))

(define (make-queens q others)
  (define (iter q-i q+i others)
    (or (null? others)
        (and (let ((o (heads others)))
               (not (or (= o q) (= o q-i) (= o q+i))))
             (iter (- q-i 1) (+ q+i 1) (tails others)))))
  (and (iter (- q 1) (+ q 1) others) (cons-stream q others)))

(define options (list->stream '(0 1 2 3 4 5 6 7)))

(define (queens n)
  (if (zero? n)
    (cons-stream empty-stream empty-stream)
    (flatten (map (lambda (new)
                    (map (lambda (solution) (make-queens new solution))
                         (queens (- n 1))))
                  options))))
--------------------
We only have to rewrite map as a stream operation, and add two helper
functions to convert lists to streams and back:
--------------------
(define (map f lst)
  (if (null-stream? lst)
    empty-stream
    (cons-stream (f (heads lst)) (map f (tails lst)))))
(define (list->stream lst)
  (if (null? lst)
    empty-stream
    (cons-stream (head lst) (list->stream (tail lst)))))
(define (stream->list lst)
  (if (null-stream? lst)
    empty
    (cons (heads lst) (stream->list (tails lst)))))
--------------------
And now, if we evaluate (stream->list (heads (queens 8))), we get the first
solution, and nothing else is evaluated - not even all of the solutions to the
7 queens problem.  However, it can be confusing to use lazy constructors in an
eager language: you always have to be careful not to evaluate things by
mistake.  For example, when I first wrote the above example I used append for
flattening, but as soon as you have a function call, you force the evaluation
of its arguments...  When you do this in a lazy language, you get the benefit
without this disadvantage, because everything you do is lazy anyway.  Another
problem with such a program is that control flows in very unpredictable ways -
you don't get a predefined evaluation order; instead things get evaluated when
they are needed.  This is still the case in lazy languages, except that most
of them are also purely functional, so there is no real problem since there
are no side effects.
===============================================================================
* Garbage Collection
1. Memory is allocated in Scheme using cons and other such constructors;
   examples are instantiating an object, creating a lambda expression, and
   applying a lambda expression (quick - why does applying a function allocate
   memory objects?  see the sketch at the end of this section).
2. We're considering the implementation details now, so we really go down to
   handling the difference between immediate values (unboxed) and references
   (boxed).  The main point here is that unboxed values are not a problem
   since they appear as literal values, but boxed values are allocated, so we
   need to collect the garbage every once in a while.  Actually, we need a
   "type tag" for memory cells to know what they represent.
3. To identify which memory locations are garbage, we need to define the root
   set, which is the set of things that are directly accessible - it is made
   of the global environment and the variables on the computation stack.
   Note that if we implement environments as lists, then the global
   environment can be held as one pointer (which makes it possible to reach
   all the global variables from a single root pointer).
4. Garbage collection can either happen periodically, or when the program
   tries to allocate more space and we're out of available memory (this is
   usually what is done).  If that didn't help, Scheme allocates more memory
   from the operating system.  If _that_ doesn't help - boom!
5. There are three major methods for garbage collection: reference counting,
   Mark&Sweep and Stop&Copy.
   * Reference counting - a lot of maintenance work: keep the counters
     updated - whenever a reference is added, increase the counter, and
     whenever a reference is lost, decrease it; if it reaches 0, we can
     recycle the cell.  Complex problems arise with circular references - what
     happens when we have a cycle that is garbage?  (the cells inside it still
     have pointers to them, so their counters never drop to 0).  There is also
     memory wasted on the counter itself (another value in each cell).
   * Mark&Sweep - use an extra "visited" bit: walk everything reachable from
     the root set, marking the places you've been to, then sweep through the
     whole memory and recycle whatever was not marked.  This is quite slow,
     because the whole memory has to be scanned.  The extra bits are not a big
     problem.
   * Stop&Copy - split the memory into two halves; every garbage collection
     copies all live objects to the other half, updating pointers as
     necessary.  The process itself is surprisingly simple!
6. Ephemeral (generational) garbage collection - tries to mimic human memory;
   similar to Stop&Copy in that the memory is split into several parts.
===============================================================================
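Regarding the question in item 1 above, here is a small sketch (my own
illustration, not from the original notes) of why applying a function
allocates memory: each application of make-adder creates a fresh environment
frame to hold its n, and the returned closure keeps that frame reachable:
--------------------
(define (make-adder n)
  (lambda (x) (+ x n)))      ; the closure captures the frame that binds n
(define add3 (make-adder 3))
(define add7 (make-adder 7))
(add3 10) --> 13
(add7 10) --> 17
;; the two frames holding n=3 and n=7 are both still reachable (through add3
;; and add7), so neither can be collected
--------------------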