RESEARCH PAPERS
Passive Capture and Structuring of Lectures
ACM Multimedia 1999, Orlando, FL, Oct. 30 - Nov. 4, 1999.
- Sugata Mukhopadhyay, Brian C. Smith
Abstract
Despite recent advances in authoring systems and tools, creating multimedia
presentations remains a labor-intensive process. This paper describes a system for
automatically constructing structured multimedia documents from live presentations.
The automatically produced documents contain synchronized and edited audio, video,
images, and text. Two essential problems, synchronization of captured data and
automatic editing, are identified and solved.
PDF version (424K)
Word 97 version (2.0M)
Techniques For Improving Multimedia Communication Over Wide Area Networks
PhD Thesis, Department of Electrical Engineering, Cornell University, January 1999.
- Soam Acharya
Abstract
Despite widespread interest and technical progress, significant barriers exist for
video playback over the Internet. These obstacles include network unreliability, client
heterogeneity, and server bottlenecks. Although various playback systems have been
proposed, none address all the issues satisfactorily. This thesis proposes and
investigates MiddleMan, an alternate approach.
MiddleMan is a collection of cooperating caching proxies running in a
local area network (LAN). Such a configuration offers several advantages. By caching
videos relatively close to the clients, MiddleMan reduces overall startup delays and the
possibility of adverse Internet conditions disrupting video playback. Additionally,
MiddleMan dramatically reduces server load by intercepting a large number of server
accesses and can be easily extended to provide other services.
Several issues must be addressed before MiddleMan can be built and
deployed. The first problem involves determining the intrinsic properties of video files
on the web and how they are accessed over the Internet. Such an understanding is useful in
order to effectively detail the architecture of MiddleMan. Hence, I conducted two studies:
one to characterize videos on the web and another that analyzes how users access videos. I
then used these results to derive the architecture of MiddleMan.
The second hindrance to building MiddleMan involves evaluating and
refining the design. Hence, I developed a simulation environment for MiddleMan to test
various configurations and caching algorithms. The final design achieves both high cache
hit rates and excellent proxy load distribution.
Finally, MiddleMan supports client heterogeneity by converting video to
an intermediate format that allows the system to better adjust to client loads. Thus,
techniques for fast conversion of video must be developed and integrated into MiddleMan.
Hence, I developed a compressed domain transcoder that converts MPEG to JPEG. The
transcoder is about 1.5 to 3 times faster than its spatial domain counterpart.
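The cooperating-proxy idea in the abstract above can be illustrated with a minimal sketch. It assumes each proxy keeps a small least-recently-used (LRU) cache and asks its LAN peers before contacting the origin server. The abstract does not say which caching algorithm or lookup scheme MiddleMan uses, so the `Proxy` class, the LRU policy, and the peer lookup are all hypothetical.

```python
from collections import OrderedDict


class Proxy:
    """Hypothetical caching proxy: LRU cache plus lookups on LAN peers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # title -> data, least recently used first
        self.peers = []

    def _store(self, title, data):
        self.cache[title] = data
        self.cache.move_to_end(title)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def local_get(self, title):
        if title in self.cache:
            self.cache.move_to_end(title)  # mark as recently used
            return self.cache[title]
        return None

    def get(self, title, origin):
        """Return (data, source); source is 'local', 'peer', or 'origin'."""
        data = self.local_get(title)
        if data is not None:
            return data, "local"
        for peer in self.peers:
            data = peer.local_get(title)
            if data is not None:
                return data, "peer"
        data = origin(title)  # only a miss on every proxy reaches the server
        self._store(title, data)
        return data, "origin"


server_requests = []


def origin_server(title):
    """Stand-in for the remote video server; records every access."""
    server_requests.append(title)
    return "bytes of " + title


a, b = Proxy(capacity=2), Proxy(capacity=2)
a.peers, b.peers = [b], [a]

sources = [
    a.get("lecture.mpg", origin_server)[1],
    b.get("lecture.mpg", origin_server)[1],
    a.get("lecture.mpg", origin_server)[1],
]
```

In this sketch three client requests cause a single server access, which is the load-reduction effect the abstract describes. A real deployment would also need to handle partial video files and proxy failures.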
The Dalí Multimedia Software Library
SPIE Multimedia Computing and Networking 1999, San Jose, CA, January 25-27, 1999.
- Brian C. Smith, Wei Tsang Ooi
Abstract
This paper presents a new approach for constructing libraries for building
processing-intensive multimedia software. Such software is currently constructed either by
using high-level libraries or by writing it "from scratch" using C. We have
found that the first approach produces inefficient code, while the second approach is too
time-consuming and produces complex code that is difficult to manage or reuse. We
therefore designed and implemented Dalí, a set of reusable, high-performance primitives
and abstractions that are at an intermediate level of abstraction between C and
conventional libraries. By decomposing common multimedia data types and operations into
thin abstractions and primitives, programs written using Dalí achieve performance
competitive with hand-tuned C code, but are shorter and more reusable. Furthermore, Dalí
programs can employ optimizations that are difficult to exploit in C (because the code is
so verbose) and impossible using conventional libraries (because the abstractions are too
thick). We discuss the design of Dalí, show several example programs written using Dalí,
and show that programs written in Dalí achieve performance competitive with hand-tuned C
programs.
PDF version (122K)
Gzipped postscript version (293K)
Word 97 version (334K)
Toward a Common Infrastructure for Multimedia-Networking Middleware
The 7th International Workshop on Network and Operating Systems Support for Digital
Audio and Video (NOSSDAV 97), St. Louis, Missouri, May 19-21, 1997
- Steve McCanne, Brian C. Smith, et al.
Abstract
Real-time multimedia streams like audio and video are now integral data types in modern
programming environments. Although a great deal of research has investigated effective and
efficient programming support for manipulating such streams and although the design of
digital media middleware is fairly well understood, no widely available or
commonly accepted programming model exists within the research community. We believe this
lack of common practice impedes our collective progress because it prevents disparate
research groups from easily leveraging each other's work. In this paper, we propose a
solution to this problem that combines the best features of a number of existing
multimedia toolkits (Berkeley's Continuous Media Toolkit, MIT's VuSystem,
and the LBL/UCB MBone tools) into a fine-grained, extensible, and high-performance
toolkit. We describe the convergence of these three toolkits into a common programming
infrastructure and argue that the availability and acceptance of our middleware could
potentially facilitate and accelerate breakthroughs in multimedia networking.
Postscript version (137K)
Gzipped postscript (46K)
Acrobat version (109K)
To appear in ACM Multimedia 97
- Robert Szewczyk, Andras Ferencz, Henry Andrews, Brian C. Smith
Abstract
We present a new technique for morphing two video sequences. Our approach extends still
image metamorphosis techniques to video by performing motion tracking on the objects.
Besides reducing the amount of user input required to morph two sequences by an order of
magnitude, the additional motion information helps us to segment the image into foreground
and background parts. By morphing these parts independently and overlaying the results,
output quality is improved. We compare our approach to conventional motion image morphing
techniques in terms of the quality of the output image and the human input required.
HTML version (45K + 893K of images)
Postscript version (2.1M)
Gzipped postscript version (1.2M)
Word 97 version (1.1M)
Acrobat version (815K)
Multimedia Computing and Networking 1998
- Soam Acharya, Brian C. Smith
Abstract
The design of file systems is strongly influenced by measuring the use of
existing file systems, such as file size distribution and patterns of access. We believe that
a similar characterization of video stored on the Internet will help network engineers,
codec designers, and other multimedia researchers. We therefore executed an experiment to
measure how video data is used on the Web today. In this experiment, we downloaded and
analyzed over 57,000 AVI, QuickTime and MPEG files stored on the Web -- approximately 100
gigabytes of data. Among our more interesting discoveries, we found that the most common
video technology in use today is QuickTime, and that the image resolution and frame rate
of video files that include audio are much more uniform than video-only files. The
majority of all audio/video files have dimensions of CIF or QCIF (or very similar) at 10,
12, 15, or 30 fps, whereas the dimensions and frame rates of video-only files are more
uniformly distributed. We also experimentally verified the conjecture that current
Internet bandwidth is at least an order of magnitude too slow to support streaming
playback of video. We present these results and other statistical information
characterizing video on the web in this paper.
Postscript version (871K)
Acrobat version (139K)
Proceedings of IEEE Multimedia 1998
- Soam Acharya, Brian C. Smith
Abstract
Current compression formats optimize for either compression or editing. For example,
motion JPEG (MJPEG) provides excellent random access and moderate overall compression, while MPEG
optimizes for compression at the expense of random access. Converting from one format to
another, a process called transcoding, is often desirable over the life of a video
segment. In this paper, we show how to transcode MPEG video to motion-JPEG without fully
decompressing the MPEG source. Our compressed domain transcoding technique differs from
previous work because it uses a new technique that is optimized for software
implementation and because we compare the performance of a working implementation of our
compressed domain transcoder, instead of just counting the number of multiplies needed to
transcode. Our experiments show that our compressed domain transcoder is 1.5 to 3 times
faster than an optimized spatial domain transcoder, and offers another benefit: a single
parameter can improve the speed of transcoding at the expense of the quality of the
resulting images. This speed/quality trade-off is important to many real-time
applications.
Postscript (1.1M)
Sample Images (68K)
Acrobat (282K)
CU-SeeMe VR: Immersive Desktop Teleconferencing
To appear in ACM Multimedia '96
- Jefferson Han, Brian C. Smith
Abstract
Current video-conferencing systems provide a video-in-a-window user interface. This
paper presents a video-conferencing application called CU-SeeMe VR that provides a richer
interface. CU-SeeMe VR is a distributed video-conferencing system that allows users to
connect to 3D worlds and interact with each other using live video and audio embedded in a
virtual space. This paper describes a prototype implementation of CU-SeeMe VR, including
the user interface, system architecture, and a detailed look at the enabling technologies.
Future directions and metaphors for this space are discussed.
HTML version
Acrobat version (211K)
To appear in Real-Time Imaging Journal
- Brian C. Smith, Lawrence A. Rowe, July, 1996
Abstract
This paper addresses the problem of processing motion-JPEG video data in the compressed
domain. The operations covered are those where a pixel in the output image is an arbitrary
linear combination of pixels in the input image, which includes convolution, scaling,
rotation, translation, morphing, de-interlacing, image composition, and transcoding. This
paper further develops an approximation technique called condensation to improve
performance and evaluates condensations in terms of processing speed and image quality.
Using condensation, motion-JPEG video can be processed at near real-time rates on current
generation workstations.
Acrobat version (931K)
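The restriction to linear operations in the abstract above matters because the DCT used by JPEG is itself linear: a linear combination of pixels corresponds to the same linear combination of DCT coefficients. The sketch below checks this numerically for a cross-dissolve of two 8-sample rows. It is only an illustration under simplifying assumptions (a one-dimensional, unnormalized DCT-II, no quantization or entropy coding). It does not implement the paper's condensation technique, and the helper names and sample values are invented for the example.

```python
import math


def dct(block):
    """Unnormalized 1-D DCT-II of a list of samples."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block))
        for k in range(n)
    ]


def blend(a, b, alpha):
    """Elementwise linear combination alpha * a + (1 - alpha) * b."""
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]


def max_abs_difference(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Illustrative pixel rows; the values are arbitrary.
row_a = [52, 55, 61, 66, 70, 61, 64, 73]
row_b = [63, 59, 55, 90, 109, 85, 69, 72]

# Blend the pixels first and then transform ...
pixel_domain = dct(blend(row_a, row_b, 0.25))
# ... or transform first and blend the coefficients directly.
coefficient_domain = blend(dct(row_a), dct(row_b), 0.25)

difference = max_abs_difference(pixel_domain, coefficient_domain)
```

The two results agree up to floating-point rounding, so the blend can be carried out on coefficients without first converting back to pixels.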
Reconnecting Science and Humanities in Digital Libraries, University of Kentucky
Presented at ACM Multimedia 95.
- Jonathan Swartz, Brian C. Smith, November, 1995
Abstract
As common as video processing is, programmers still implement video programs as
manipulations of arrays of pixels. This paper presents a language extension called Rivl
(pronounced "rival") where video is a first class data type. Programs in Rivl
use high level operators that are independent of video resolution and format, increasing a
program's portability, simplifying code reuse, and reducing development time. This paper
also describes a Rivl interpreter and the strategies the interpreter uses to optimize Rivl
programs. These optimizations include classical programming language optimizations, such
as common subexpression elimination and out of order execution, image and video specific
optimizations, such as computing only those images that will affect the output, and an
optimized memory manager.
HTML version
Acrobat version (822K)
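One optimization named in the Rivl abstract above, computing only those images that will affect the output, can be sketched with demand-driven evaluation. The example is Python rather than Rivl, and the `Clip` class with its `map`, `trim`, and `render` methods is a hypothetical stand-in, not Rivl's actual operators.

```python
class Clip:
    """Hypothetical lazy video clip: frames are computed only when requested."""

    def __init__(self, length, frame_at):
        self.length = length
        self.frame_at = frame_at  # function: frame index -> frame value

    def map(self, op):
        """Apply op to every frame, lazily."""
        return Clip(self.length, lambda i: op(self.frame_at(i)))

    def trim(self, start, stop):
        """Keep frames start..stop-1 without touching any frame yet."""
        return Clip(stop - start, lambda i: self.frame_at(start + i))

    def render(self):
        """Force evaluation of exactly the frames that are in the output."""
        return [self.frame_at(i) for i in range(self.length)]


decoded = []


def decode(i):
    """Stand-in for an expensive frame decode; records which frames were needed."""
    decoded.append(i)
    return i * 10


source = Clip(100, decode)
result = source.map(lambda frame: frame + 1).trim(10, 13).render()
```

Although the filter is applied to the whole 100-frame clip before trimming, only frames 10, 11, and 12 are ever decoded, because nothing is computed until `render` asks for it.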
Presented at ACM Multimedia 95.
- Asif Ghias, Jonathan Logan, David Chamberlin, Brian C. Smith, November, 1995
Abstract
The emergence of audio and video data types in databases will require new information
retrieval methods adapted to the specific characteristics and needs of these data types.
An effective and natural way of querying a musical audio database is by humming the tune
of a song. In this paper, a system for querying an audio database by humming is described
along with a scheme for representing the melodic information in a song as relative pitch
changes. Relevant difficulties involved with tracking pitch are enumerated, along with the
approach we followed, and the performance results of the system indicating its effectiveness
are presented.
HTML version
Acrobat version (82K)
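The relative-pitch representation described in the query-by-humming abstract above can be sketched as follows. The sketch assumes a three-symbol alphabet (U for up, D for down, S for same), pitches given as MIDI-style note numbers, and matching by edit distance over whole contour strings. The tolerance value, the tiny database, and the function names are illustrative assumptions, and pitch tracking from audio is not covered.

```python
def pitch_contour(pitches, tolerance=0.5):
    """Encode a pitch sequence as relative changes: U (up), D (down), S (same)."""
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        delta = current - previous
        if delta > tolerance:
            symbols.append("U")
        elif delta < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def edit_distance(a, b):
    """Levenshtein distance between two contour strings."""
    previous_row = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        row = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            row.append(min(previous_row[j] + 1,          # deletion
                           row[j - 1] + 1,               # insertion
                           previous_row[j - 1] + cost))  # substitution
        previous_row = row
    return previous_row[-1]


def best_match(hummed_pitches, database):
    """Return the title whose stored contour is closest to the hummed one."""
    query = pitch_contour(hummed_pitches)
    return min(database, key=lambda title: edit_distance(query, database[title]))


# Hypothetical database: title -> contour of the opening notes.
database = {
    "rising then falling": pitch_contour([60, 60, 67, 67, 69, 69, 67]),
    "descending scale": pitch_contour([72, 71, 69, 67, 65, 64, 62]),
}

# The same opening hummed two semitones higher still matches,
# because only the direction of each pitch change is stored.
match = best_match([62, 62, 69, 69, 71, 71, 69], database)
```

Because the contour discards absolute pitch, a singer does not need to hum in the original key. The allowance for insertions, deletions, and substitutions gives some tolerance for dropped or wrong notes.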
Presented at the 1995 Tcl/Tk Workshop.
- Peter T. Liu, Brian Smith, Lawrence Rowe, July, 1995
Abstract
This paper describes a general purpose name server for Tcl-DP. This name server maintains
host addresses and port numbers of services running in a distributed environment and
allows clients to query about them. It starts services on demand so services are
guaranteed to be available, and it provides a simple authentication protocol for better
security. The Tcl-DP name server is also designed to be fault-tolerant. Multiple backup
servers can be started on different hosts, and a failover occurs when the main server goes
down. In addition, the name server provides mechanisms to interface with external modules
for extending its functionality.
Acrobat version (90K)
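The behaviour described in the name server abstract above (lookup of host and port by service name, starting services on demand, and failover to a backup) can be sketched in a few lines. This is an in-process Python illustration, not Tcl-DP code. The `NameServer` class, the `launchers` table, and `lookup_with_failover` are hypothetical names, and networking, authentication, and state replication between servers are left out.

```python
class NameServer:
    """Hypothetical in-memory registry mapping service names to (host, port)."""

    def __init__(self, launchers=None):
        self.alive = True
        self.services = {}
        # Assumption: each launcher starts a service and returns its (host, port).
        self.launchers = dict(launchers or {})

    def register(self, name, host, port):
        self.services[name] = (host, port)

    def lookup(self, name):
        if not self.alive:
            raise ConnectionError("name server is down")
        if name not in self.services and name in self.launchers:
            # Start the service on demand, then remember where it runs.
            self.services[name] = self.launchers[name]()
        return self.services[name]  # raises KeyError for an unknown service


def lookup_with_failover(servers, name):
    """Ask the main server first, then each backup in order."""
    for server in servers:
        try:
            return server.lookup(name)
        except ConnectionError:
            continue
    raise ConnectionError("no name server reachable")


primary = NameServer()
backup = NameServer(launchers={"video": lambda: ("hostB", 7000)})
for server in (primary, backup):
    server.register("audio", "hostA", 5000)

primary.alive = False  # simulate the main server going down
audio_address = lookup_with_failover([primary, backup], "audio")
video_address = lookup_with_failover([primary, backup], "video")
```

Here the client still resolves `audio` after the main server fails, and the backup starts `video` the first time it is requested.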
Presented at the 1993 Tcl/Tk Workshop.
Unpublished manuscript
Unpublished manuscript
Unpublished manuscript
Master's Report