INTRODUCTION

The incorporation of video into our computing environment will change the way we interact with computers as much as the shift from alphanumeric terminals to graphical user interfaces did. To realize this vision, video must become as accessible in our computing environment as text and images are today. Because video has semantic, storage, and timing requirements that differ from those of text and images, the realization of truly programmable video requires research in storage systems, transport protocols, compression methods, and algorithms.

The way we encode video algorithms today is similar to the way we expressed numerical algorithms in the days of assembly language. Video is composed of a sequence of images, each of which is represented as a two-dimensional array of pixel values. In the past, floating-point operations were expressed by manipulating individual bits. Today, video and image operations are expressed by manipulating individual pixel values. Some systems (e.g., Data Explorer or Khoros) provide a graphical programming environment where programs are expressed as flowcharts, but the limitations of flowcharts for expressing complex programs are well known.

What is needed is a language that incorporates video as a first-class data type, just as floating-point numbers are a first-class data type in almost all modern programming languages. Thus, just as the floating-point addition "A+B" is well defined regardless of whether A and B are single, double, or even quad-precision floating-point numbers, the operation "cut the first five seconds of the video clip" is well defined whether the frame rate is 16 or 30 frames per second, whether the format is MPEG or motion JPEG, and whether the image size is 100x100 or 6000x4000. This tutorial describes one such language, called Rivl (pronounced "Rival"). In Rivl, video operations are expressed independently of the internal representation of video data. Just as it is the responsibility of traditional languages to map a floating-point operation onto the underlying bit manipulations, it is Rivl's responsibility to map image and video clip operations onto the underlying pixel and frame manipulations.
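
To make the idea of a representation-independent operation concrete, the following is a minimal sketch in Python, not in Rivl and not Rivl's API. The names Clip and cut_first are hypothetical and introduced only for illustration. The sketch assumes a clip is just a sequence of frames plus a frame rate, and it reads "cut" as "remove"; the opposite reading would keep the leading frames instead. The caller speaks in seconds, and the mapping to frame counts is left to the implementation.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Sequence


@dataclass(frozen=True)
class Clip:
    """Hypothetical clip type for illustration only (not Rivl's representation)."""
    frames: Sequence[object]   # frame payloads; their size and format are irrelevant here
    fps: Fraction              # frames per second

    @property
    def duration(self) -> Fraction:
        """Length of the clip in seconds."""
        return Fraction(len(self.frames)) / self.fps


def cut_first(clip: Clip, seconds) -> Clip:
    """Return a new clip with the first `seconds` of video removed.

    The number of frames to drop is derived from the clip's own frame rate
    (rounded down), so the caller never counts frames.
    """
    n = int(Fraction(seconds) * clip.fps)
    return Clip(clip.frames[n:], clip.fps)


# The same call works for clips with different frame rates.
slow = Clip(frames=list(range(160)), fps=Fraction(16))   # 10 seconds at 16 fps
fast = Clip(frames=list(range(300)), fps=Fraction(30))   # 10 seconds at 30 fps
slow_cut = cut_first(slow, 5)
fast_cut = cut_first(fast, 5)
```

In this sketch both results are five seconds long even though a different number of frames was dropped from each clip, which mirrors the way "A+B" means the same thing at any floating-point precision.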