Zeno: The Papers

RESEARCH PAPERS

Passive Capture and Structuring of Lectures

ACM Multimedia 1999, Orlando, FL, Oct. 30 - Nov. 4, 1999.

Sugata Mukhopadhyay, Brian C. Smith
Abstract

Despite recent advances in authoring systems and tools, creating multimedia presentations remains a labor-intensive process. This paper describes a system for automatically constructing structured multimedia documents from live presentations. The automatically produced documents contain synchronized and edited audio, video, images, and text. Two essential problems, synchronization of captured data and automatic editing, are identified and solved.

PDF version (424K)
Word 97 version (2.0M)

Techniques For Improving Multimedia Communication Over Wide Area Networks

PhD Thesis, Department of Electrical Engineering, Cornell University, January 1999.

Soam Acharya
Abstract

Despite widespread interest and technical progress, significant barriers exist for video playback over the Internet. These obstacles include network unreliability, client heterogeneity, and server bottlenecks. Although various playback systems have been proposed, none address all the issues satisfactorily. This thesis proposes and investigates MiddleMan, an alternate approach.

MiddleMan is a collection of cooperating caching proxies running in a local area network (LAN). Such a configuration offers several advantages. By caching videos relatively close to the clients, MiddleMan reduces overall startup delays and the possibility of adverse Internet conditions disrupting video playback. Additionally, MiddleMan dramatically reduces server load by intercepting a large number of server accesses and can be easily extended to provide other services.

Several issues must be addressed before MiddleMan can be built and deployed. The first problem involves determining the intrinsic properties of video files on the web and how they are accessed over the Internet. Such an understanding is needed to work out the detailed architecture of MiddleMan. Hence, I conducted two studies: one to characterize videos on the web and another to analyze how users access videos. I then used these results to derive the architecture of MiddleMan.

The second hindrance to building MiddleMan involves evaluating and refining the design. Hence, I developed a simulation environment for MiddleMan to test various configurations and caching algorithms. The final design achieves both high cache hit rates and excellent proxy load distribution.
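
To make the idea of cooperating caching proxies concrete, here is a minimal Python sketch of the kind of configuration a MiddleMan simulation might compare. It assumes each video is assigned to one "home" proxy by hashing its name and that each proxy runs a plain LRU cache whose capacity is counted in videos rather than bytes; `ProxyCluster`, `LRUCache`, and the placement rule are hypothetical names and choices, not the thesis's actual design.

```python
import hashlib
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache of video ids (capacity counted in videos)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, video_id):
        """Return True on a hit; on a miss, simulate a fetch and cache the video."""
        if video_id in self.items:
            self.items.move_to_end(video_id)  # mark as most recently used
            return True
        self.items[video_id] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used video
        return False


class ProxyCluster:
    """Hypothetical LAN of cooperating proxies; each video has one home proxy."""

    def __init__(self, num_proxies, capacity_per_proxy):
        self.proxies = [LRUCache(capacity_per_proxy) for _ in range(num_proxies)]
        self.hits = 0
        self.requests = 0

    def home_proxy(self, video_id):
        # Stable hash, so every client maps a given video to the same proxy.
        digest = hashlib.md5(video_id.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.proxies)

    def request(self, video_id):
        self.requests += 1
        if self.proxies[self.home_proxy(video_id)].access(video_id):
            self.hits += 1

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0


if __name__ == "__main__":
    cluster = ProxyCluster(num_proxies=4, capacity_per_proxy=2)
    for vid in ["a.mpg", "b.mov", "a.mpg", "c.avi", "a.mpg"]:
        cluster.request(vid)
    print(cluster.hit_rate())  # 2 hits out of 5 requests
```

Hashing gives every client the same answer about where a video lives, which avoids duplicate copies inside the LAN; a real video cache would also have to account for file sizes, which this sketch ignores.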

Finally, MiddleMan supports client heterogeneity by converting video to an intermediate format that allows the system to better adjust to client loads. Thus, techniques for fast conversion of video must be developed and integrated into MiddleMan. To this end, I developed a compressed domain transcoder that converts MPEG to JPEG. The transcoder is about 1.5 to 3 times faster than its spatial domain counterpart.

PDF version (1128K)

The Dalí Multimedia Software Library

SPIE Multimedia Computing and Networking 1999, San Jose, CA, January 25-27, 1999.

Brian C. Smith, Wei Tsang Ooi
Abstract

This paper presents a new approach for constructing libraries for building processing-intensive multimedia software. Such software is currently constructed either by using high-level libraries or by writing it "from scratch" using C. We have found that the first approach produces inefficient code, while the second approach is too time-consuming and produces complex code that is difficult to manage or reuse. We therefore designed and implemented Dalí, a set of reusable, high-performance primitives and abstractions that are at an intermediate level of abstraction between C and conventional libraries. By decomposing common multimedia data types and operations into thin abstractions and primitives, programs written using Dalí achieve performance competitive with hand-tuned C code, but are shorter and more reusable. Furthermore, Dalí programs can employ optimizations that are difficult to exploit in C (because the code is so verbose) and impossible using conventional libraries (because the abstractions are too thick). We discuss the design of Dalí, show several example programs written using Dalí, and show that programs written in Dalí achieve performance competitive with hand-tuned C programs.

PDF version (122K)
Gzipped postscript version (293K)
Word 97 version (334K)

Toward a Common Infrastructure for Multimedia-Networking Middleware

The 7th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 97), St. Louis, Missouri, May 19-21, 1997

Steve McCanne, Brian C. Smith, et al.
Abstract
Real-time multimedia streams like audio and video are now integral data types in modern programming environments. Although a great deal of research has investigated effective and efficient programming support for manipulating such streams and although the design of digital media “middleware” is fairly well understood, no widely available or commonly accepted programming model exists within the research community. We believe this lack of common practice impedes our collective progress because it prevents disparate research groups from easily leveraging each other’s work. In this paper, we propose a solution to this problem that combines the best features of a number of existing multimedia toolkits — Berkeley’s Continuous Media Toolkit, MIT’s VuSystem, and the LBL/UCB MBone tools — into a fine-grained, extensible, and high-performance toolkit. We describe the convergence of these three toolkits into a common programming infrastructure and argue that the availability and acceptance of our middleware could potentially facilitate and accelerate breakthroughs in multimedia networking.

Postscript version (137K)
Gzipped postscript (46K)
Acrobat version (109K)

Thin Streams: An Architecture for Multicasting Layered Video

The 7th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 97), St. Louis, Missouri, May 19-21, 1997

Linda Wu, Rosen Sharma, Brian C. Smith
Abstract
Multicast is a common method for distributing audio and video over the Internet. Since receivers are heterogeneous in processing capability, network bandwidth, and requirements for video quality, a single multicast stream is usually insufficient. A common strategy is to use layered video coding with multiple multicast groups. In this scheme, a receiver adjusts its video quality by selecting the number of multicast groups, and thereby video layers, it receives. Implementing this scheme requires the receivers to decide when to join a new group or leave a subscribed group.
This paper presents a new solution to the join/leave problem using ThinStreams. In ThinStreams, a single video layer is multicast over several multicast groups, each with identical bandwidth. ThinStreams separates the coding scheme (i.e., the video layers) from control (i.e., the multicast groups), helping to bound network oscillations caused by receivers joining and leaving high bandwidth multicast groups.

This work evaluates the join/leave algorithms used in ThinStreams using simulations and preliminary experiments on the MBONE. It also addresses fairness among independent video broadcasts and shows how to prevent interference between them.
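
The join/leave problem can be illustrated with a small receiver-side controller. This Python sketch assumes the receiver compares the bandwidth it expects from its current subscriptions with the bandwidth it actually measures, leaving a group when the shortfall is large and joining one when the shortfall is small; the class name, threshold parameters, and numbers are hypothetical, and the paper's exact algorithm may differ.

```python
from dataclasses import dataclass


@dataclass
class ThinStreamReceiver:
    """Hypothetical receiver that joins or leaves one thin stream at a time.

    Every multicast group carries the same bandwidth, so each join or leave
    changes the expected rate by exactly one group_kbps step.
    """
    group_kbps: float        # bandwidth of each thin stream (identical for all)
    max_groups: int          # total groups across all video layers
    join_threshold: float    # join if expected - actual is below this (kbps)
    leave_threshold: float   # leave if expected - actual is above this (kbps)
    groups: int = 1          # groups currently subscribed

    def expected_kbps(self) -> float:
        return self.groups * self.group_kbps

    def on_measurement(self, actual_kbps: float) -> int:
        """Adjust the subscription after one throughput sample."""
        shortfall = self.expected_kbps() - actual_kbps
        if shortfall > self.leave_threshold and self.groups > 1:
            self.groups -= 1   # losing data: drop the highest group
        elif shortfall < self.join_threshold and self.groups < self.max_groups:
            self.groups += 1   # receiving everything: try one more group
        return self.groups


def layer_of_group(group_index: int, groups_per_layer: list) -> int:
    """Map a 0-based group index to the video layer it carries.

    groups_per_layer[i] is the number of equal-bandwidth groups used by layer i.
    """
    for layer, count in enumerate(groups_per_layer):
        if group_index < count:
            return layer
        group_index -= count
    raise ValueError("group index out of range")
```

Because every group carries the same bandwidth, a failed join over-subscribes the path by at most one thin stream, which is the intuition behind bounding oscillations; `layer_of_group` shows how the coding layers stay separate from the control groups.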

Postscript version (750K)
Word 97 version (500K)
Acrobat version (124K)

Motion and Feature Based Video Metamorphosis

To appear in ACM Multimedia 97

Robert Szewczyk, Andras Ferencz, Henry Andrews, Brian C. Smith
Abstract
We present a new technique for morphing two video sequences. Our approach extends still image metamorphosis techniques to video by performing motion tracking on the objects. Besides reducing the amount of user input required to morph two sequences by an order of magnitude, the additional motion information helps us to segment the image into foreground and background parts. By morphing these parts independently and overlaying the results, output quality is improved. We compare our approach to conventional motion image morphing techniques in terms of the quality of the output image and the human input required.

HTML version (45K + 893K of images)
Postscript version (2.1M)
Gzipped postscript version (1.2M)
Word 97 version (1.1M)
Acrobat version (815K)

An Experiment To Characterize Videos Stored On The Web

Multimedia Computing and Networking 1998

Soam Acharya, Brian C. Smith
Abstract
The design of file systems is strongly influenced by measurements of existing file systems, such as file size distributions and access patterns. We believe that a similar characterization of video stored on the Internet will help network engineers, codec designers, and other multimedia researchers. We therefore executed an experiment to measure how video data is used on the Web today. In this experiment, we downloaded and analyzed over 57,000 AVI, QuickTime and MPEG files stored on the Web, approximately 100 gigabytes of data. Among our more interesting discoveries, we found that the most common video technology in use today is QuickTime, and that the image resolution and frame rate of video files that include audio are much more uniform than those of video-only files. The majority of all audio/video files have dimensions of CIF or QCIF (or very similar) at 10, 12, 15, or 30 fps, whereas the dimensions and frame rates of video-only files are more uniformly distributed. We also experimentally verified the conjecture that current Internet bandwidth is at least an order of magnitude too slow to support streaming playback of video. We present these results and other statistical information characterizing video on the web in this paper.
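
Two of the measurements above reduce to simple arithmetic, sketched here in Python: deciding whether a file's dimensions are CIF (352×288) or QCIF (176×144) "or very similar", and comparing a file's average data rate with the measured download rate. The 10% tolerance and the function names are assumptions for illustration, not the criteria used in the paper.

```python
# Standard picture sizes, (width, height) in pixels.
STANDARD_SIZES = {"CIF": (352, 288), "QCIF": (176, 144)}


def classify_dimensions(width, height, tolerance=0.10):
    """Return "CIF", "QCIF", or "other".

    A file is treated as very similar to a standard size when both dimensions
    are within `tolerance` (an assumed 10%) of it.
    """
    for name, (w, h) in STANDARD_SIZES.items():
        if abs(width - w) <= tolerance * w and abs(height - h) <= tolerance * h:
            return name
    return "other"


def streaming_shortfall(file_bytes, duration_s, download_bytes_per_s):
    """Ratio of the video's average data rate to the measured download rate.

    A ratio above 1 means the file cannot be played while it downloads;
    10 or more means the network is an order of magnitude too slow.
    """
    video_bytes_per_s = file_bytes / duration_s
    return video_bytes_per_s / download_bytes_per_s
```

For example, a 1,000,000-byte clip lasting 10 seconds, fetched at 10,000 bytes per second, gives a ratio of 10.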

Postscript version (871K)
Acrobat version (139K)

Compressed Domain Transcoding of MPEG

Proceedings of IEEE Multimedia 1998

Soam Acharya, Brian C. Smith
Abstract
Current compression formats optimize for either compression or editing. For example, motion JPEG (MJPEG) provides excellent random access and moderate overall compression, while MPEG optimizes for compression at the expense of random access. Converting from one format to another, a process called transcoding, is often desirable over the life of a video segment. In this paper, we show how to transcode MPEG video to motion JPEG without fully decompressing the MPEG source. Our compressed domain transcoding technique differs from previous work because it uses a new technique that is optimized for software implementation, and because we compare the performance of a working implementation of our compressed domain transcoder instead of just counting the number of multiplies needed to transcode. Our experiments show that our compressed domain transcoder is 1.5 to 3 times faster than an optimized spatial domain transcoder, and offers another benefit: a single parameter can improve the speed of transcoding at the expense of the quality of the resulting images. This speed/quality trade-off is important to many real-time applications.
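
For intra-coded (I) frames, part of compressed domain transcoding can be shown in a few lines: the MPEG coefficients are dequantized and requantized with the JPEG step sizes, with no inverse DCT. This Python sketch simplifies the MPEG dequantization rule to a per-coefficient step size, skips motion-compensated P and B frames entirely, and uses a hypothetical `keep` parameter as a stand-in for a speed/quality knob; it illustrates the general idea rather than the authors' specific technique.

```python
def transcode_intra_block(mpeg_levels, mpeg_steps, jpeg_steps, keep=64):
    """Requantize one intra-coded 8x8 block from MPEG to JPEG in the DCT domain.

    mpeg_levels: 64 quantized coefficients in zigzag order, as obtained after
        entropy decoding the MPEG bitstream.
    mpeg_steps, jpeg_steps: effective quantizer step size per coefficient, also
        in zigzag order (a simplification: real MPEG dequantization also
        involves quantizer_scale and its own rounding rules).
    keep: hypothetical speed/quality knob; only the first `keep` zigzag
        coefficients are transcoded and the rest are set to zero.
    """
    out = [0] * 64
    for i in range(keep):
        if mpeg_levels[i] == 0:
            continue  # a zero coefficient stays zero, so no work is needed
        coeff = mpeg_levels[i] * mpeg_steps[i]       # dequantize (MPEG)
        out[i] = int(round(coeff / jpeg_steps[i]))   # requantize (JPEG)
    return out
```

Lowering `keep` skips the high-frequency coefficients, which saves work and discards fine detail; the result is then ready for JPEG entropy coding.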

Postscript (1.1M)
Sample Images (68K)
Acrobat (282K)

CU-SeeMe VR: Immersive Desktop Teleconferencing

To appear in ACM Multimedia '96

Jefferson Han, Brian C. Smith
Abstract
Current video-conferencing systems provide a video-in-a-window user interface. This paper presents a video-conferencing application called CU-SeeMe VR that provides a richer interface. CU-SeeMe VR is a distributed video-conferencing system that allows users to connect to 3D worlds and interact with others using live video and audio embedded in a virtual space. This paper describes a prototype implementation of CU-SeeMe VR, including the user interface, system architecture, and a detailed look at the enabling technologies. Future directions and metaphors for this space are discussed.

HTML version
Acrobat version (211K)

Compressed Domain Processing of JPEG-encoded Images

To appear in Real-Time Imaging Journal

Brian C. Smith, Lawrence A. Rowe, July, 1996
Abstract
This paper addresses the problem of processing motion-JPEG video data in the compressed domain. The operations covered are those where a pixel in the output image is an arbitrary linear combination of pixels in the input image, which includes convolution, scaling, rotation, translation, morphing, de-interlacing, image composition, and transcoding. This paper further develops an approximation technique called condensation to improve performance and evaluates condensations in terms of processing speed and image quality. Using condensation, motion-JPEG video can be processed at near real-time rates on current generation workstations.
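
The key fact behind compressed domain processing is that the DCT is linear: if a pixel-domain operation is y = Mx and C is the orthonormal DCT matrix, the same operation on DCT coefficients is Y = TX with T = CMC^T. The Python sketch below checks this for a one-dimensional 8-point block and a simple smoothing filter, without quantization or entropy coding, and uses a threshold on the entries of T as a stand-in for condensation; the paper works on 8×8 JPEG blocks and its exact condensation rule may differ.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C: X = C x and, since C is orthonormal, x = C^T X."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a, x):
    return [sum(w * v for w, v in zip(row, x)) for row in a]


def smoothing_operator(n=8):
    """Pixel-domain 3-tap [1/4, 1/2, 1/4] filter with clamped (repeated) edges."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][max(i - 1, 0)] += 0.25
        m[i][i] += 0.5
        m[i][min(i + 1, n - 1)] += 0.25
    return m


def dct_domain_operator(m):
    """Turn y = M x into Y = T X on DCT coefficients, with T = C M C^T."""
    c = dct_matrix(len(m))
    return matmul(matmul(c, m), transpose(c))


def condense(t, epsilon):
    """Drop entries of T smaller than epsilon: fewer nonzeros, fewer multiplies."""
    return [[v if abs(v) >= epsilon else 0.0 for v in row] for row in t]
```

For this particular filter, T is diagonal up to rounding error, because the DCT-II basis vectors are eigenvectors of a symmetric filter whose edges repeat the boundary pixel, so `condense(T, 1e-9)` keeps only 8 of 64 entries. Operators such as translation produce a denser T, and a larger threshold then trades image quality for fewer multiplies.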

Acrobat version (931K)

Massively Distributed Video File Server Simulation: Investigating Intelligent Caching Schemes

Alexander Castro, C. Edward Lazzerini, Vivekananda Kolla, December, 1995
Abstract
This paper, the final report for CS631, a graduate multimedia systems course, presents the results of a simulation study that compares the effectiveness of different caching schemes within the DVFS architecture.

HTML version
Acrobat version (34K)

A Survey of Compressed Domain Processing Techniques

Reconnecting Science and Humanities in Digital Libraries, University of Kentucky

Brian C. Smith, Oct 1995
Abstract
This short paper surveys current techniques for processing compressed multimedia data, including compressed audio, video, and images.

HTML version
Acrobat version (160K)

A Resolution Independent Video Language

Presented at ACM Multimedia 95.

Jonathan Swartz, Brian C. Smith, November, 1995
Abstract
As common as video processing is, programmers still implement video programs as manipulations of arrays of pixels. This paper presents a language extension called Rivl (pronounced "rival") where video is a first class data type. Programs in Rivl use high level operators that are independent of video resolution and format, increasing a program's portability, simplifying code reuse, and reducing development time. This paper also describes a Rivl interpreter and the strategies the interpreter uses to optimize Rivl programs. These optimizations include classical programming language optimizations, such as common subexpression elimination and out-of-order execution; image- and video-specific optimizations, such as computing only those images that will affect the output; and an optimized memory manager.
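
Resolution independence can be illustrated by treating an image as a function of normalized coordinates and deferring all pixel work until the result is rendered at a chosen size. This Python sketch is a hypothetical analogue, not Rivl's syntax or its interpreter: `scale`, `overlay`, and `render` are illustrative names, and laziness here comes from closures rather than from the optimizer described above.

```python
from typing import Callable, List

# A resolution-independent image: maps normalized (u, v) in [0, 1] x [0, 1]
# to a gray level. Nothing is computed until render() samples it.
Image = Callable[[float, float], float]


def constant(value: float) -> Image:
    return lambda u, v: value


def horizontal_ramp() -> Image:
    return lambda u, v: 255.0 * u


def scale(img: Image, factor: float) -> Image:
    """Zoom about the origin, expressed in normalized coordinates (no pixel sizes)."""
    return lambda u, v: img(u / factor, v / factor)


def overlay(top: Image, bottom: Image, x0: float, y0: float,
            x1: float, y1: float) -> Image:
    """Show `top` inside the normalized rectangle [x0, x1) x [y0, y1), else `bottom`."""
    def composite(u: float, v: float) -> float:
        if x0 <= u < x1 and y0 <= v < y1:
            return top(u, v)
        return bottom(u, v)
    return composite


def render(img: Image, width: int, height: int) -> List[List[float]]:
    """Sample at pixel centers; only values that reach the output are computed."""
    return [[img((x + 0.5) / width, (y + 0.5) / height) for x in range(width)]
            for y in range(height)]
```

The same program renders at 4×1 or 640×480 without change, and `overlay` evaluates the top image only at pixels inside its rectangle, a small-scale version of computing only what affects the output.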

HTML version
Acrobat version (822K)

Query By Humming -- Musical Information Retrieval in an Audio Database

Presented at ACM Multimedia 95.

Asif Ghias, Jonathan Logan, David Chamberlin, Brian C. Smith, November, 1995
Abstract
The emergence of audio and video data types in databases will require new information retrieval methods adapted to the specific characteristics and needs of these data types. An effective and natural way of querying a musical audio database is by humming the tune of a song. In this paper, a system for querying an audio database by humming is described, along with a scheme for representing the melodic information in a song as relative pitch changes. Relevant difficulties involved with tracking pitch are enumerated, along with the approach we followed, and performance results indicating the effectiveness of the system are presented.
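
The representation of melody as relative pitch changes can be sketched by encoding each interval as U (up), D (down), or S (same) and then searching for the hummed contour inside each song's contour with approximate matching. In this Python sketch the pitches are MIDI-style note numbers, the 0.5-semitone "same" threshold and the choice of edit distance are assumptions, and the song data is hypothetical; the paper's pitch tracker and matching algorithm are not reproduced.

```python
def contour(pitches, same_threshold=0.5):
    """Encode successive pitch changes (in semitones) as U(p), D(own), or S(ame)."""
    out = []
    for prev, cur in zip(pitches, pitches[1:]):
        diff = cur - prev
        if abs(diff) < same_threshold:
            out.append("S")
        elif diff > 0:
            out.append("U")
        else:
            out.append("D")
    return "".join(out)


def substring_distance(query, text):
    """Smallest edit distance between `query` and any substring of `text`.

    Variant of Levenshtein distance in which the match may start and end
    anywhere in `text`, so a hummed fragment can match the middle of a song.
    """
    prev = [0] * (len(text) + 1)  # empty query matches anywhere at cost 0
    for i, q in enumerate(query, start=1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            cur[j] = min(prev[j] + 1,             # query symbol left unmatched
                         cur[j - 1] + 1,          # extra symbol in the song
                         prev[j - 1] + (q != t))  # match or substitution
        prev = cur
    return min(prev)


def rank_songs(hummed_pitches, songs):
    """Return song names ordered from best to worst contour match."""
    q = contour(hummed_pitches)
    return sorted(songs, key=lambda name: substring_distance(q, contour(songs[name])))
```

Because only the direction of each interval is kept, a query hummed in the wrong key still produces the same contour string.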

HTML version
Acrobat version (82K)

Tcl-DP Name Server

Presented at the 1995 Tcl/Tk Workshop.

Peter T. Liu, Brian Smith, Lawrence Rowe, July, 1995
Abstract
This paper describes a general purpose name server for Tcl-DP. This name server maintains host addresses and port numbers of services running in a distributed environment and allows clients to query them. It starts services on demand so services are guaranteed to be available, and it provides a simple authentication protocol for better security. The Tcl-DP name server is also designed to be fault-tolerant. Multiple backup servers can be started on different hosts, and a failover occurs when the main server goes down. In addition, the name server provides mechanisms to interface with external modules for extending its functionality.
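
The name server's main behaviors (lookup, start-on-demand, authentication, and failover to a backup) can be modeled in a few lines. This is a hypothetical Python sketch rather than Tcl-DP code: the shared-secret check stands in for the paper's authentication protocol, services are "started" by calling a launcher function, and state replication between main and backup servers is omitted.

```python
class NameServer:
    """Hypothetical in-memory model of a name server (not the Tcl-DP code)."""

    def __init__(self, launchers, secret):
        self.launchers = launchers   # service name -> function returning (host, port)
        self.secret = secret         # shared secret for the assumed auth check
        self.registry = {}           # service name -> (host, port)
        self.alive = True

    def register(self, name, host, port, secret):
        if secret != self.secret:
            raise PermissionError("authentication failed")
        self.registry[name] = (host, port)

    def lookup(self, name):
        if name not in self.registry:
            # Start the service on demand so a lookup never finds it missing.
            if name not in self.launchers:
                raise KeyError(name)
            self.registry[name] = self.launchers[name]()
        return self.registry[name]


def lookup_with_failover(servers, name):
    """Ask the main server first, then each backup in order."""
    for server in servers:
        if server.alive:
            return server.lookup(name)
    raise ConnectionError("no name server available")
```

A client passes the main server followed by its backups, so a crash of the main server only changes which entry in the list answers.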

Acrobat version (90K)