Introduction

This document provides a tutorial introduction to Rivl, a Tcl/Tk extension for multimedia processing. Rivl provides primitives for manipulating image, audio, and video objects.

A few examples will give you a feel for Rivl. In the following script, we implement and apply a "picture-in-a-picture" effect:

    proc pip {im1 im2} {
        im_scale! im1 0.3
        im_trans! im1 [expr 0.6*[im_width $im1]] [expr 0.15*[im_height $im1]]
        im_overlay $im1 $im2
    }
    im_write [pip [im_read football.jpg] [im_read red.ppm]] new.jpg

The pip procedure takes two images, im1 and im2. The first two lines of pip shrink im1 to 30% of its original size and move it toward the upper right corner of its frame. The third line overlays the new im1 onto im2. Finally, the im_write line reads two images (a JPEG and a PPM), calls pip to place one inside the other, and writes the result as a new JPEG.

The following script is an example of video assembly editing:

    set andre [seq_read andre.mpg]
    set luxo [seq_read luxo.mpg]
    set out [seq_concat [seq_crop $andre 0.0 10.0] $luxo]
    seq_write $out new.mpg

The first two lines read MPEG files called andre and luxo. The third line pastes together the first ten seconds of andre with luxo. Finally, the seq_write command writes the result as a new MPEG.

Getting started

This tutorial is designed to be used interactively. That is, although you could just read the tutorial, you will get more out of it by trying the commands as you read them.

To use the tutorial, create a temporary working directory:

    % cd ~
    % mkdir tutorial
    % cd tutorial

Then, copy the image and video files used in this tutorial with the following command, replacing RIVL with the root directory of your Rivl source distribution.

    % cp RIVL/data/images/* RIVL/data/movies/* .

Finally, type the command

    % wish-rivl

to invoke wish-rivl, which behaves like an ordinary wish interpreter, reading commands from standard input and writing the results to standard output. Your path must contain the directory in which wish-rivl was installed. For Cornell CS users, wish-rivl is located in /home/sww/arch/bin, where arch is an architecture name like sun4, hpux, or solaris.

We assume that the reader is already familiar with Tcl/Tk. If not, the books by John Ousterhout and Brent Welch [ref,ref] provide excellent introductions.

Image Processing with Rivl

The simplest image processing task involves reading an image from a file, transforming it, previewing the result, and writing it out to a file. The following command reads an image from a file:

    % im_read tiger.jpg
    rivl_im3

The file tiger.jpg is in JPEG [ref] format, one of several file formats Rivl supports. (See rivl_help File-types for more about file formats.) The return value of im_read, rivl_im3 in this case, is a handle to the new image object. The exact name of the handle may differ on your machine. This handle is used in subsequent commands to access the image, much like a file handle is used to access an open file. The next two commands scale and rotate the image around its center:

    % im_scaleC rivl_im3 0.5
    rivl_im6
    % im_rotateC rivl_im6 30
    rivl_im14

These commands, like all Rivl image commands, are non-destructive: they return a handle to a new image rather than modifying an existing image. We can display the original image using im_display:

    % im_display rivl_im3
    .imwin1

im_display brings up a display window like the one shown above, and returns the name of the created window (.imwin1). Because we display images often in this section, we define a procedure:

    proc ? {im} {
        im_display $im .imwin1
    }

The second argument to im_display causes the existing window to be used rather than a new one. To see the scaled and rotated image, we type

    % ? rivl_im14

To save the new image permanently, use im_write:

    % im_write rivl_im14 new-tiger.jpg

This command creates a new image file called new-tiger.jpg.

Issuing commands interactively is a quick way to find out what Rivl commands do, but for complex tasks, one normally writes procedures and refers to image handles through variables. The following procedure scales and rotates an image as a function of a numeric parameter p:

    % proc whirlpool {im p} {
        set im [im_rotateC $im [expr 360*$p]]
        set im [im_scaleC $im $p]
        return $im
    }

The following commands read an image and show the effect of different values of p on whirlpool:

    % set tiger [im_read tiger.jpg]
    % ? [whirlpool $tiger 0.8]    
    % ? [whirlpool $tiger 0.6] 
    % ? [whirlpool $tiger 0.4]

The set im... notation used in whirlpool is cumbersome. Most Rivl commands have a destructive form, which is invoked by appending a ! to the command name. That is,

    op! im ... is equivalent to set im [op $im ...]

Using this notation, we can rewrite the procedure whirlpool:

    % proc whirlpool {im p} {
        im_rotateC! im [expr 360*$p]
        im_scaleC! im $p
        return $im
    }

This procedure is equivalent in behavior to the first one, but the notation is more compact. Note that we omit the dollar sign for the first parameter of a destructive command so that its name, rather than its value, is passed.

Size and ROI

Every image has a size that you can access with im_size:

    % im_size $tiger
    320.000000 240.000000

The size of an image is initialized when it is read from a file. You can change the size with im_setsize:

    % ? [im_setsize $tiger 480 360]

Notice that the displayed region is larger than before. When Rivl displays or writes an image, it clips the data to the image's region of interest (ROI), which is simply the rectangle from (0,0) to (width,height). Because this rectangle is always anchored at the origin, the terms "ROI" and "size" express the same information and are used interchangeably.

Forcing the ROI to anchor at (0,0) is not as restrictive as it seems. It simply means that rather than moving the ROI around the image, you move the image in relation to the ROI. For example, to effectively set the ROI around the rectangle 50,100 - 250,200, you shift the image -50,-100 and then set the size to 200,100:

    % ? [im_setsize [im_trans $tiger -50 -100] 200 100]

The ROI of this image is (0,0) to (200,100); there is no record of the upper-left corner's original location.
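In pixel terms, the trick comes down to two numbers: to frame the rectangle (x1,y1)-(x2,y2), translate by (-x1,-y1) and set the size to (x2-x1, y2-y1). The C sketch below spells out that arithmetic. The Rect and RoiFraming types and both functions are hypothetical helpers for illustration, not part of Rivl, and treating the ROI as half-open at its right and bottom edges is an assumption.

```c
#include <stdbool.h>

/* Hypothetical helper types for illustration; not part of Rivl. */
typedef struct { double x1, y1, x2, y2; } Rect;
typedef struct { double dx, dy, width, height; } RoiFraming;

/* Translation and size that bring rect into the origin-anchored ROI. */
RoiFraming frame_rect(Rect r)
{
    RoiFraming f = { -r.x1, -r.y1, r.x2 - r.x1, r.y2 - r.y1 };
    return f;
}

/* True if original point (x,y) lies inside the ROI after translation.
   Assumes a half-open ROI: [0,width) x [0,height). */
bool visible_after_framing(RoiFraming f, double x, double y)
{
    double tx = x + f.dx, ty = y + f.dy;
    return tx >= 0.0 && tx < f.width && ty >= 0.0 && ty < f.height;
}
```

For the rectangle 50,100 - 250,200 used above, frame_rect yields a translation of (-50,-100) and a size of 200 by 100, matching the im_trans and im_setsize arguments.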

Image data and ROI are independent notions. You can see this by comparing the output of im_setsize with im_scale:

    % ? [im_setsize $tiger 480 360] 
    % ? [im_scale $tiger 1.5]

Im_setsize modifies the size without affecting the data in the image, while im_scale modifies the data without affecting the size. For modifying both size and data, Rivl provides several shortcuts. First, you can scale both image and size by the same factor using the -scaleROI flag to im_scale. Second, you can force the image data and ROI to fit an exact size using im_conform. This command is especially useful for working with images of different aspect ratios.

Overlays and mattes

The commands introduced so far operate on single images. To combine multiple images, one normally uses im_overlay, which lays one image on top of another. To illustrate overlaying, let's implement a pip (picture-in-a-picture) effect.

    % set fb [im_read football.jpg]
    % set red [im_read red.jpg]
    % im_scale! fb 0.3
    % im_trans! fb [expr 0.6*[im_width $fb]] [expr 0.15*[im_height $fb]]

The last two lines shrink fb and shift it so that it sits in the upper right corner of its ROI. The following commands display the two images and their composition:

    % ? $fb
    % ? $red
    % ? [im_overlay $fb $red]

Red can be seen behind fb because fb is transparent outside its box. When two images are overlaid, the bottom image shows through where the top is transparent. An image's transparency/opacity is determined by its matte. A matte is a bi-level image that indicates, for every pixel in an associated image, whether that pixel is opaque or transparent. An image's matte can be extracted using im_matte:

    % ? [im_matte $fb]

An image's matte need not be rectangular, as the following example shows:

    % ? [im_rotate $fb 20] 
    % ? [im_matte [im_rotate $fb 20]] 
    % ? [im_overlay [im_rotate $fb 20] $red]

When an image is displayed or output to a file, it is overlaid onto a solid black background. Thus, transparent pixels inside the ROI appear as black, as in the first image above.

Rivl provides three primitives for clearing parts of an image (i.e., modifying its matte): im_clear, im_clip, and im_crop. Each takes five parameters: an image and the corners of a box. Im_clear clears everything inside the box, and im_clip clears everything outside the box:

    % ? [im_clear $tiger 50 100 250 200] 
    % ? [im_clip $tiger 50 100 250 200]

im_crop is like im_clip, but also sets the ROI to the box. It is equivalent to:

    proc im_crop {im x1 y1 x2 y2} {
        im_clip! im $x1 $y1 $x2 $y2
        im_setsize! im [expr $x2-$x1] [expr $y2-$y1]
        im_trans! im [expr -$x1] [expr -$y1]
    }
    % ? [im_crop $tiger 50 100 250 200]
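At the pixel level, the same steps can be sketched in C on a tiny in-memory image. The Image struct (gray pixels plus a byte matte) and the functions clip_in_place and crop are hypothetical stand-ins chosen for illustration; Rivl's internal representation is not described here, and the sketch assumes the box lies inside the source image.

```c
#include <stdlib.h>

/* Hypothetical image: gray pixels plus a byte matte (255 = opaque). */
typedef struct {
    int w, h;
    unsigned char *pix;
    unsigned char *matte;
} Image;

/* Like im_clip: make every pixel outside [x1,x2) x [y1,y2) transparent. */
void clip_in_place(Image *im, int x1, int y1, int x2, int y2)
{
    for (int y = 0; y < im->h; y++)
        for (int x = 0; x < im->w; x++)
            if (x < x1 || x >= x2 || y < y1 || y >= y2)
                im->matte[y * im->w + x] = 0;
}

/* Like im_crop: translate by (-x1,-y1) and set the size to the box.
   Every pixel of the new ROI comes from inside the box, so copying the box
   has the same visible result as clipping first. Caller frees the buffers. */
Image crop(const Image *src, int x1, int y1, int x2, int y2)
{
    Image out = { x2 - x1, y2 - y1, NULL, NULL };
    size_t n = (size_t)out.w * (size_t)out.h;
    out.pix = malloc(n);
    out.matte = malloc(n);
    if (out.pix == NULL || out.matte == NULL) {
        free(out.pix);
        free(out.matte);
        out.w = out.h = 0;
        out.pix = out.matte = NULL;
        return out;
    }
    for (int y = 0; y < out.h; y++)
        for (int x = 0; x < out.w; x++) {
            int s = (y + y1) * src->w + (x + x1);  /* source index before translation */
            out.pix[y * out.w + x] = src->pix[s];
            out.matte[y * out.w + x] = src->matte[s];
        }
    return out;
}
```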

Transparency

The mattes described so far have been bi-level. In general, a matte is an arbitrary byte-valued image representing a continuous range from opaque (255) to transparent (0). One way to achieve partial transparency is with im_fade:

    % set fb [im_read football.jpg]
    % ? [im_fade $fb 0.7]

im_fade multiplies an image's matte by the specified constant. Thus, im_fade with 1.0 has no effect, and im_fade with 0.0 clears the image. When overlaying a partially transparent image onto another image, the pixels are combined with a weighted sum according to the top matte:

    % set house [im_read house.jpg]
    % ? [im_overlay [im_fade $fb 0.7] $house] 
    % ? [im_overlay [im_fade $fb 0.5] $house] 
    % ? [im_overlay [im_fade $fb 0.2] $house]
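The weighted sum can be sketched per pixel in C. The Pixel struct and the fade and overlay functions below are hypothetical. The document does not give Rivl's exact formula or rounding, so this sketch uses the standard Porter-Duff "over" rule, which reduces to top * a + bottom * (1 - a) when the bottom pixel is opaque (a being the top matte scaled to the range 0 to 1).

```c
/* Hypothetical pixel: a gray value plus a matte byte (255 = opaque, 0 = transparent). */
typedef struct { unsigned char v, a; } Pixel;

/* Like im_fade: multiply the matte by a constant k in [0,1]. */
Pixel fade(Pixel p, double k)
{
    p.a = (unsigned char)(p.a * k + 0.5);
    return p;
}

/* Like im_overlay for one pixel, using the Porter-Duff "over" rule. */
Pixel overlay(Pixel top, Pixel bottom)
{
    double at = top.a / 255.0, ab = bottom.a / 255.0;
    double ao = at + ab * (1.0 - at);            /* combined opacity */
    Pixel out = { 0, 0 };
    if (ao <= 0.0)
        return out;                               /* both pixels fully transparent */
    double v = (top.v * at + bottom.v * ab * (1.0 - at)) / ao;
    out.v = (unsigned char)(v + 0.5);
    out.a = (unsigned char)(ao * 255.0 + 0.5);
    return out;
}
```

Overlaying onto an opaque black pixel, (Pixel){0, 255}, mimics the black background Rivl uses when it displays or writes an image.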

im_fade and im_overlay are used to create cross-fade transitions in video, as shown in the next section.

That concludes our introduction to image processing. In this section you saw a small but representative set of image operations, enough to read the next section on video processing. In section 3 you will see more built-in image operations and learn how to create your own.

If you still have an image display window open, type `q' inside it to destroy the window.

Video Processing with Rivl

This section shows you how to read, manipulate, display, and write video sequences through a simple example: assembling a short TV commercial. Before we begin, however, we have to introduce a bit of terminology.

Preliminaries

A sequence is a one-dimensional array of images. All the images in a sequence have the same type (e.g. RGB) and size (e.g. 320x240). Sequences can be read from and written to several formats: MPEG files, motion JPEG files in CMT clipfile [ref] format, and directories full of image files. The following command reads a sequence from an MPEG file:

    % set farmers [seq_read farmers.mpg]

You can preview a sequence with seq_display:

    % seq_display $farmers
    .strip1

This brings up farmers in a sequence previewer. You may wish to expand the width of the window to view more frames at a time. Because we display sequences often in this section, we define a procedure:

    % proc ? {seq} {
        seq_display $seq .strip1
    }

The second argument to seq_display causes the existing window to be used rather than a new one.

As with images, sequences have an ROI, except that a sequence's ROI is expressed in the time dimension. When Rivl displays or writes a sequence, it clips the data to the ROI. Sequence ROIs are always anchored at 0.0 on the left. Thus, rather than describing an ROI with two coordinates, we simply use the length of a sequence. The sequence length is displayed in the lower right corner of sequences in this document, as well as in the previewer. You can query the length of a sequence with seq_length:

    % seq_length $farmers
    6.0

The effect of Rivl commands on ROIs is different for images and sequences. Most image commands maintain the size of images despite geometric changes to the data (e.g. rotating). In contrast, most sequence commands expand and contract the length of sequences as frames are added or removed.

A commercial

Now that we have defined our vocabulary, we are ready to assemble the commercial. It consists of three video clips and a still sequence of a corporate logo, all connected with fade transitions. farmers (defined above) will be our first video clip, followed by child and house:

    % set child [seq_read child.mpg]
    % set house [seq_read house.mpg]

The corporate logo is an image file in logo.jpg:

    % set logo [im_read logo.jpg]
    % im_display $logo

To combine the logo with child, house, and farmers, it needs to be made into a sequence. This is accomplished with im_to_seq:

    % set logoseq [im_to_seq $logo -length 3.0]
    % ? $logoseq

im_to_seq constructs a video by repeating one still image for the specified length of time.

Now that we have the four pieces of the commercial, we connect them with seq_concat:

    % ? [seq_concat $farmers $child $house $logoseq]

seq_concat is analogous to taping together pieces of film end to end. The resulting sequence switches abruptly between clips (this is called a cut). Suppose we want to join farmers and child with a one-second cross-fade transition instead. This effect is created as follows:

Cut out the last second of farmers and the first second of child.
Fade the last second of farmers from opaque to transparent over time.
Overlay the result onto the first second of child.

To cut out the required pieces from farmers and child, we use seq_crop:

    % set farmers_end [seq_crop $farmers 5.0 6.0]
    % set child_beg [seq_crop $child 0.0 1.0]

seq_crop is analogous to cutting out a piece of film with scissors: it returns a sequence containing just the specified range, anchored at 0.0. Both farmers_end and child_beg are 1 second sequences.

The next step is fading farmers_end to transparency over time. Recall the im_fade command, which multiplies an image's matte by a fraction. The idea is to apply im_fade to the images of farmers_end with a parameter that decreases from 1 to 0 over time. To apply an image operation over a sequence, we use seq_map:

    % seq_map! farmers_end {im_fade %1 [expr 1-%p]}
    % ? $farmers_end

seq_map applies the script {im_fade %1 [expr 1-%p]} to every image in farmers_end. Each time seq_map applies the script, it replaces %1 with the image and %p with the relative time of the image, from 0.0 to 1.0. The results are collected into a new sequence and returned.

To overlay the new farmers_end onto child_beg, you can use another seq_map script:

    % set transition [seq_map "$farmers_end $child_beg" {im_overlay %1 %2}]
    % ? $transition

This time, seq_map maps over two sequences. It applies im_overlay to corresponding pairs of images from farmers_end and child_beg, collecting the results into a new sequence. Since overlaying sequences is a common operation, Rivl provides seq_overlay, defined as

    proc seq_overlay {args} {seq_map $args "im_overlay %*"}

seq_overlay works for any number of sequences. The substitution character %* is equivalent to %1 %2 ... %n, where n is the number of sequences given to seq_map.
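Putting the two seq_map steps together, each frame of the transition is a weighted sum whose weight changes with time. The C sketch below computes a cross-fade between two clips of gray frames stored back to back. It assumes %p for frame i of n is i / n (the text only says %p runs from 0.0 to 1.0) and that both clips are opaque; the crossfade function is hypothetical.

```c
#include <stddef.h>

/* Cross-fade clip a (on top) into clip b: nframes frames of npix gray pixels each,
   stored frame after frame. Both clips are assumed fully opaque. */
void crossfade(const unsigned char *a, const unsigned char *b,
               unsigned char *out, size_t nframes, size_t npix)
{
    for (size_t i = 0; i < nframes; i++) {
        double p = (double)i / (double)nframes;   /* %p, assumed to be i / n */
        double alpha = 1.0 - p;                     /* im_fade %1 [expr 1-%p] */
        for (size_t k = 0; k < npix; k++) {
            size_t j = i * npix + k;
            /* im_overlay: top frame weighted by alpha, opaque bottom by 1 - alpha */
            out[j] = (unsigned char)(a[j] * alpha + b[j] * (1.0 - alpha) + 0.5);
        }
    }
}
```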

The one-second transition is put into place with seq_concat:

    % ? [seq_concat [seq_crop $farmers 0.0 5.0] $transition [seq_crop $child 1.0 6.0]]

To summarize, here is the code to connect farmers to child with a one-second fade:

    % set farmers_end [seq_crop $farmers 5.0 6.0]
    % set child_beg [seq_crop $child 0.0 1.0]
    % seq_map! farmers_end {im_fade %1 [expr 1-%p]}
    % set transition [seq_map "$farmers_end $child_beg" {im_overlay %1 %2}]
    % seq_concat [seq_crop $farmers 0.0 5.0] $transition [seq_crop $child 1.0 6.0]

The task of connecting two sequences with a transition is generalized in the following procedure, which is included in the standard Rivl library:

    rivl_proc! seq_connectWithTransition {movieA movieB script duration} {
        set lengthA [seq_length $movieA]
        set lengthB [seq_length $movieB]
        set begin [seq_crop $movieA 0.0 [expr $lengthA-$duration]]
        set end [seq_crop $movieB $duration $lengthB]
        set mid1 [seq_crop $movieA [expr $lengthA-$duration] $lengthA]
        set mid2 [seq_crop $movieB 0.0 $duration]
        seq_map! mid1 $script
        set transition [seq_overlay $mid1 $mid2]
        seq_concat $begin $transition $end
    }

seq_connectWithTransition takes two sequences, a seq_map script, and a duration. First, it crops out the parts of movieA and movieB that are outside the transition. Next, it maps the effect over the end of movieA and overlays the result on the beginning of movieB. Finally, it concatenates the three segments together.

Now, to assemble the commercial, apply seq_connectWithTransition three times:

    % set fadeScript {im_fade %1 [expr 1-%p]}
    % set comm [seq_connectWithTransition $farmers $child $fadeScript 1.5]
    % set comm [seq_connectWithTransition $comm $house $fadeScript 1.5]
    % set comm [seq_connectWithTransition $comm $logoseq $fadeScript 1.5]
    % ? $comm

The clips are connected with 1.5-second fade transitions. You'll have to zoom in, or make your window wider, to see the transitions.
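Each call to seq_connectWithTransition keeps lengthA - duration seconds of movieA, duration seconds of transition, and lengthB - duration seconds of movieB, so the result is lengthA + lengthB - duration long. The C sketch below applies that rule to the commercial. The helper names are hypothetical, and the 6.0-second lengths for child and house are an assumption, consistent with the crop of child from 1.0 to 6.0 above.

```c
#include <stddef.h>

/* Length of seq_connectWithTransition's result. */
double connected_length(double len_a, double len_b, double duration)
{
    double begin = len_a - duration;   /* movieA before the transition */
    double mid = duration;             /* overlaid transition */
    double end = len_b - duration;     /* movieB after the transition */
    return begin + mid + end;
}

/* Length after chaining clips left to right with the same transition duration,
   as in the three calls that build comm. */
double chained_length(const double *lens, size_t n, double duration)
{
    if (n == 0)
        return 0.0;
    double total = lens[0];
    for (size_t i = 1; i < n; i++)
        total = connected_length(total, lens[i], duration);
    return total;
}
```

With farmers at 6.0 seconds and logoseq at 3.0, three 1.5-second transitions give 6.0 + 6.0 + 6.0 + 3.0 - 3 * 1.5 = 16.5 seconds.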

To save the commercial to an MPEG, use seq_write:

    % seq_write $comm out.mpg

This command creates a new MPEG file called out.mpg. The movie can be played with an application such as the Berkeley MPEG player:

    % mpeg_play out.mpg

Temporal manipulation

Temporal manipulation involves changing the time position of frames without changing their contents. Two examples of temporal commands are seq_speedup, which makes a sequence faster or slower, and seq_reverse, which flips a sequence in time.

Suppose we want to stretch our 16.5 second commercial to the standard 30 second slot, without adding new footage. One way to accomplish this task is to make everything slow-motion:

    % set long_comm [seq_speedup $comm [expr 16.5/30.0]]

seq_speedup takes a sequence and a factor. If the factor is greater than 1, the sequence is sped up and becomes shorter; if it is less than 1, the sequence is slowed down and becomes longer, as in this case.
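The length arithmetic is a single division: the new length is the old length divided by the factor, so a factor of 16.5/30 turns 16.5 seconds into 30. The C sketch below also includes a time mapping, under the assumption (not stated in the text) that output time t shows the input frame at time t * factor; both function names are hypothetical.

```c
/* Length after seq_speedup: factor > 1 shortens the sequence, factor < 1 lengthens it. */
double speedup_length(double length, double factor)
{
    return length / factor;
}

/* Assumption: output time t displays the input frame at time t * factor. */
double speedup_source_time(double t, double factor)
{
    return t * factor;
}
```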

Access to frames

So far, seq_map has proved adequate for operating on the frames of a sequence. However, seq_map is limited to applying image operations independently to each frame. For example, you cannot use seq_map to compare pairs of adjacent frames, or to create a sequence whose images are not in one-to-one correspondence with the input sequence.

Several Rivl commands provide low level access to a sequence's frames. seq_sample samples a sequence at a point in time, returning a single image. seq_to_ims samples a sequence repeatedly at a fixed frame rate (see the next section) and returns a list of images. ims_to_seq performs the reverse operation: it takes a list of images and returns a sequence.

If you look at our commercial closely, you'll notice that one of the frames is damaged:

    % im_display [seq_sample $comm [expr 8/24.0]]

The frame at 8/24 seconds is the ninth frame. The following command removes the frame from the sequence:

    % set imlist [seq_to_ims $comm]
    % set comm [ims_to_seq [lreplace $imlist 8 8]]

This alters the length of the movie slightly. You could also use lreplace $imlist 8 8 [lindex $imlist 7], which replaces the damaged frame with its predecessor.
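The bookkeeping behind these two commands can be sketched in C, with frame handles represented as plain ints. The functions are hypothetical: frame_index assumes seq_sample picks the frame whose timestamp k / fps is nearest t, and remove_frame mirrors lreplace $imlist k k by shifting later frames down.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: seq_sample returns the frame whose timestamp k / fps is nearest t. */
size_t frame_index(double t, double fps)
{
    return (size_t)(t * fps + 0.5);
}

/* Like [lreplace $imlist k k]: drop element k from n frame handles,
   shifting later ones down. Returns the new count. */
size_t remove_frame(int *frames, size_t n, size_t k)
{
    if (k >= n)
        return n;
    memmove(&frames[k], &frames[k + 1], (n - k - 1) * sizeof frames[0]);
    return n - 1;
}
```

Removing one frame from 396 leaves 395, about 16.46 seconds at 24 frames per second, which is the slight change in length mentioned above.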

In principle, some finite sequences could be represented and manipulated as lists of images. However, the operations are many times slower, and the abstractions less powerful, than those of the Rivl sequence type. You should use direct sequence commands (rather than converting a sequence to a list of images) whenever possible. The above task, for example, could be accomplished with the seq_replace command.

Frame rate

Every sequence has a default frame rate that you can access with seq_fps

    % seq_fps $comm
    24

and modify with seq_setfps. Its value is used by commands that sample a sequence at a fixed rate, such as seq_write and seq_to_ims. For example, the call to seq_to_ims above returned a list of 16.5 * 24 = 396 images, where 16.5 is the length in seconds and 24 the frame rate.

A sequence's frame rate is independent of its other properties. For example, setting the frame rate of comm to 12 does not affect its length or contents; it simply halves the number of images written by seq_write or returned from seq_to_ims.
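The frame count is just length times frame rate. The C helper below is a hypothetical sketch; rounding to the nearest whole frame is an assumption, since the examples in the text divide evenly.

```c
/* Number of images produced when a sequence is sampled at a fixed rate,
   as by seq_write or seq_to_ims. Rounds to the nearest whole frame. */
long frame_count(double length_seconds, double fps)
{
    return (long)(length_seconds * fps + 0.5);
}
```

frame_count(16.5, 24) gives 396, and after seq_setfps to 12 the same 16.5-second sequence gives 198.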

Extending Rivl

Rivl is designed to be a framework for media processing. Its philosophy is similar to that of Tcl and Unix: provide a very small core, make it easy to write extensions, and dynamically load just the ones you need.

One way to extend Rivl is to combine the basic commands in procedures, like whirlpool. Fairly sophisticated effects can be built this way. However, the built-in commands are limited to very common operations like translation and scaling, and at some point you'll want to define exactly what happens at the pixel level.

Since Tcl is a convenient high-level language, it would be nice to manipulate images at the Tcl level. You could have pixel_read and pixel_write commands and do anything to the pixels in between. However, because the per-pixel loop would be interpreted, this would be orders of magnitude slower than a transform written in C. Therefore, image transforms must be written in C. There is no reason, however, why we can't use Tcl to generate the C code. Rivl provides an interface for doing this, called generics. Since this method is so much easier to use, we teach it first.

Generics

Rivl generics is a mini-language that lets you specify image transformations with small expressions, usually one or two lines of code. It works by inserting your C code into a template. The result is a C module that can be dynamically linked into a running Rivl application and applied to images.

Generics can express several kinds of transforms, classified by what each output pixel depends on:

Constructive: f(x,y)
Local: f(in_1(x,y), ..., in_n(x,y))
Positional: f(x, y, in_1(x,y), ..., in_n(x,y))
Geometric: f(x, y, in(g(x,y), h(x,y)))
Global: f(x, y, in_1, ..., in_n)

Once again, define ? to display an image, as in the image processing section, and read the flowers image into the variable flowers.

Here is an example of a generic.

    rivl_generic define im_scalarmult {{image out} {image in}} {{double m}} {
        out(x,y) = in(x,y) * m;
    }

Defining the generic takes a few seconds, because Rivl must generate, compile, and load a C module. The arguments are:

rivl_generic - the name of the command
define - the commonly used subcommand. As you will see below, it combines a number of lower level subcommands.
{{image out} {image in}} - a nonempty list of image declarations. The first element is the output image; subsequent elements are input images. The names may be anything, although it is typical to use "out" and "in" for one-input transforms.
{{double m}} - a list of numeric parameters, either double or int.
{out(x,y) = in(x,y) * m;} - the inner loop, C code to be executed at every pixel. It forms the body of a doubly nested for loop over all the x,y coordinates of the image. out(x,y) and in(x,y) access the pixels of out and in at the current location.
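To make the template idea concrete, here is roughly what a generated module could look like for a gray image. This is a hypothetical sketch: the GrayImage struct and the function name are invented for illustration, Rivl's actual template and image layout are not shown in this document, and the clamp to 0..255 is this sketch's own addition (the one-line body in the generic has none). The sketch assumes out has the same size as in.

```c
/* Hypothetical gray image: width * height bytes, row-major. */
typedef struct { int width, height; unsigned char *data; } GrayImage;

/* The inner-loop body "out(x,y) = in(x,y) * m;" wrapped in a doubly nested loop. */
void im_scalarmult_gray(GrayImage *out, const GrayImage *in, double m)
{
    for (int y = 0; y < in->height; y++) {              /* outer loop: rows */
        for (int x = 0; x < in->width; x++) {           /* inner loop: columns */
            double v = in->data[y * in->width + x] * m; /* in(x,y) * m */
            if (v < 0.0) v = 0.0;                       /* clamp: added by this sketch */
            if (v > 255.0) v = 255.0;
            out->data[y * out->width + x] = (unsigned char)v;  /* out(x,y) = ... */
        }
    }
}
```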

To apply the new command:

    ? [im_scalarmult $flowers 0.6]

Note that this operation is different from im_fade, which affects the matte. This image is dimmed but still opaque.

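Constructive transforms

A constructive transform has no input images: each output pixel is computed from its coordinates alone. The following generic creates a diagonal ramp:

    rivl_generic define im_ramp {{gray image out}} {} {
        out(x,y) = (x+y)%255;
    }

    ? [im_ramp]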
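A gray-image sketch of the loop such a generic could produce is below. As with im_scalarmult, the GrayImage struct and the function name are hypothetical, and the caller is assumed to supply the output size, which the text does not specify for im_ramp.

```c
/* Hypothetical gray image: width * height bytes, row-major. */
typedef struct { int width, height; unsigned char *data; } GrayImage;

/* Constructive transform: each pixel depends only on (x,y),
   mirroring the body "out(x,y) = (x+y)%255;". */
void im_ramp_gray(GrayImage *out)
{
    for (int y = 0; y < out->height; y++)
        for (int x = 0; x < out->width; x++)
            out->data[y * out->width + x] = (unsigned char)((x + y) % 255);
}
```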

Image types

An image declaration can take two forms: {type image name} or {image name}. If the first form is used, the image must be of the given type. For an output image, this gives the output type of the function.

If the second form is used, then the image is a wildcard: it can match any type in the wildcard list of the generic, which by default is all types but ptr. See Syntax, below, for how to specify the wildcard type list.

You can have multiple wildcard images in a generic. For a given invocation, all wildcard images must match in type. For example, you can call im_add with two gray images or two rgb images, but not one of each.

If the output image is a wildcard type and there is at least one wildcard input, then the output type of the operation is determined by the type of the wildcard inputs (which must all match). However, if there is no wildcard input, then the type must be specified as the first argument to the generic.

For example, im_random, which has no wildcard input, can generate gray, rgb, short, or long images, so its first argument is a type.

RGB distribution

RGB image parameters are treated differently from wildcard parameters that happen to be instantiated with an RGB input. In the first case, the inner loop may refer to the individual color fields of a pixel (r, g, b). In the second case, the fields cannot be named individually; the body treats them uniformly.

For example, in im_mult, the inner body is applied three times for each pixel, once for each color field.

Examples that take advantage of access to color fields include im_rotatergb and rgb_to_gray.
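The difference between the two cases can be sketched in C. The RGBPixel layout and all function names below are hypothetical. scalarmult_rgb shows a wildcard body applied once per color field; rgb_to_gray_pixel shows an RGB-typed body that reads r, g, and b individually. The BT.601 luma weights (0.299, 0.587, 0.114) are a common choice assumed here, not necessarily what Rivl's rgb_to_gray uses.

```c
/* Hypothetical RGB pixel layout. */
typedef struct { unsigned char r, g, b; } RGBPixel;

/* The wildcard body "out = in * m" for one field, clamped to 0..255. */
static unsigned char scale_field(unsigned char v, double m)
{
    double s = v * m;
    if (s < 0.0) s = 0.0;
    if (s > 255.0) s = 255.0;
    return (unsigned char)s;
}

/* Wildcard instantiated with RGB: the same body runs once per color field. */
RGBPixel scalarmult_rgb(RGBPixel p, double m)
{
    p.r = scale_field(p.r, m);
    p.g = scale_field(p.g, m);
    p.b = scale_field(p.b, m);
    return p;
}

/* RGB-typed parameter: the body combines the fields itself (assumed BT.601 weights). */
unsigned char rgb_to_gray_pixel(RGBPixel p)
{
    double y = 0.299 * p.r + 0.587 * p.g + 0.114 * p.b;
    return (unsigned char)(y + 0.5);
}
```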

Maintenance of generic files

You can customize where generic commands are created and where existing commands are searched for.

The -outDir flag to rivl_generic define specifies the directory in which the C file and shared object file should be created. If -outDir is not specified, then the value of rivl_params(genericOutDir) is used. This is ., the current directory, by default. You may wish to have all of your generic files created in a single subdirectory off your home directory, or a common project group directory.

It is convenient to place rivl_generic define commands in .tcl scripts, intermixed with other Tcl code. Such a script may be sourced many times, especially during prototyping. Before generating the C file, Rivl searches the directories in a path for a .c and a .so file of the same name. If the .c file exists, Rivl checks whether it was generated from exactly the same generic as the one being defined; if so, it need not regenerate it. It then checks whether the .so file is older than the .c file and, if so, compiles and links it. Finally, it loads the .so file. The search path is in rivl_params(dlPath); by default it contains only ., the current directory.
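The timestamp test in that search can be sketched with POSIX stat(). The function needs_rebuild is hypothetical and covers only the age comparison; the check that the .c file came from the same generic definition is not modeled. Treating a missing .c or .so file as requiring a rebuild is an assumption the text does not spell out.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Recompile and relink when the shared object is older than its C file.
   A missing file (assumption) also counts as needing a rebuild. */
bool needs_rebuild(const char *c_path, const char *so_path)
{
    struct stat c_st, so_st;
    if (stat(c_path, &c_st) != 0)
        return true;    /* no C file: it must be generated, then compiled */
    if (stat(so_path, &so_st) != 0)
        return true;    /* no shared object yet */
    return so_st.st_mtime < c_st.st_mtime;
}
```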

Syntax

As in regular expressions, token* means zero or more of token and token? means optional.

cmd = rivl_generic define {image-param*} {numeric-param*} inner-loop option*

image-param = {value-type? image name}

numeric-param = {numeric-type? name}

numeric-type = double | int

option = -loopSetup C-body | -rowSetup C-body | -loopCleanup C-body | -outDir dirname | -types {value-type*} | -types all | -local

inner-loop = C-body

Option summary

-loopSetup: This C body is inserted before the outer loop. It may begin with variable declarations.

-rowSetup: This C body is inserted inside the outer loop but before the inner loop. It may begin with variable declarations.

-loopCleanup: This C body is inserted after the outer loop.

-outDir: Specifies the directory in which the C file and shared object file should be created. If omitted, the value of rivl_params(genericOutDir) is used.

-types: Specifies which value types are allowed for wildcard images. By default, all types are allowed except for ptr. If all is specified, then all types including ptr are allowed.

Examples

Lower-level commands

The command rivl_generic define im_scalarmult ... performs four tasks:

Uses the specified inner loop to generate a C module called gen-im_scalarmult.c.
Runs your system's compiler to get gen-im_scalarmult.o.
Runs your system's linker to get a shared object file like gen-im_scalarmult.<arch>.so, where <arch> is your machine type. The extension (e.g. .so) varies by machine type.
Dynamically loads the shared object file.

We examine each of these steps below.

C code generation

    rivl_generic generate name cFileName

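This command takes the same arguments as the define subcommand, plus a cFileName. The remaining steps, compiling and linking the C file into a shared object and then loading that object, are performed by: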

    rivl_generic compile cFileName soFileName
    dl_use soFileName