Bruce Randall Donald
Associate Professor
brd@cs.cornell.edu
Graphics/Multimedia Research and Physical Geometric Algorithms
This document outlines a vision for Graphics/Multimedia research, and
in particular tries to define what the science base for this research
would be. The latter, I believe, is necessary in order to prescribe
the role that Multimedia should have in a Computer Science department.
In particular, the impact of Multimedia on communication and society
in the future is, by itself, not sufficient justification for
organizing this activity in CS. To see this, consider the analogy of
television. While television reaches an enormous number of people, we
do not have professors of television. Instead, we must argue that
there is something about Multimedia which especially suits our
algorithmic and software design methodologies. In particular, I claim
that Multimedia research provides a wealth of geometric and
algorithmic problems in the domain of physical geometric algorithms, which is
our primary research area.
This research agenda, while broad, is targeted to a particular subset
of Multimedia, and naturally many interesting topics exist outside
this framework.
What is to be Done?
Our research in Multimedia concentrates in several areas:
- Authoring tools. Multimedia content can
currently be played by millions but authored by few. We are
working to develop authoring tools, particularly for animation, that
will greatly expand the authoring population.
- Direct manipulation. Multimedia content
should be editable and extensible through intuitive user interfaces,
such as directly tugging on a graphic or animation through a
graphical input device.
- Haptics. It should be possible to provide
haptic interfaces for direct manipulation of computer graphics, and
to manipulate/navigate through animations, sound spaces, the Web, and
authoring environments. Haptics should be explored both as a novel
input device, and as a method for expressing content (shape, texture,
and design choices).
- Interoperability. Multimedia content should be
interoperable: it should be possible, for example, to define
clip-characters and clip-motions, and paste one onto another.
Characters and motions should be reusable and editable.
- Seamless interfaces. Multimedia systems should provide a
seamless, orthogonal interface to graphics, animation, audio,
haptics, and virtual reality.
- Active objects. The user should experience Multimedia
content using active objects, including servoable cameras and
sensors, which can also capture data from the user to influence
the Multimedia presentation (e.g., to control an avatar). Multimedia
playback should be an immersive, interactive experience, in which a
virtual reality system can manipulate the user's physical environment
to generate realistic effects.
Previous Work: Progress So Far
At Cornell University, our initial efforts (joint work with Jed
Lengyel, Mark Reichert, and Don Greenberg) in this area have concerned
model-based geometric algorithms for animation and Multimedia. We
first investigated methods for generating animations from very
high-level scripts. The method is applicable whenever the script has
a geometric encoding. We developed a method for
automatically rendering animation sequences of moving objects without
keyframes. The user specifies the start and goal for each object and
the motions (which may of necessity be quite complex) are synthesized
automatically. We use configuration-space (motion-planning)
algorithms to quickly synthesize object motions subject to kinematic
constraints. Motion synthesis for multiple moving objects is
possible, and the action may be synchronized to music. This work,
together with a video entitled "Enchanted Furniture," was presented
at SIGGRAPH.
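As a minimal sketch of the flavor of this computation (assuming a
single object translating on a rasterized 2D configuration-space grid;
the real system handles richer kinematic constraints and exploits
rasterizing graphics hardware, as in the SIGGRAPH '90 paper below), a
breadth-first search connects a start and goal configuration around
obstacles:

    from collections import deque

    def plan_path(cspace, start, goal):
        """Breadth-first search over a rasterized configuration space.

        cspace -- 2D list of booleans; True marks a forbidden (colliding) cell
        start, goal -- (row, col) configurations
        Returns a list of configurations from start to goal, or None.
        """
        rows, cols = len(cspace), len(cspace[0])
        parent = {start: None}
        frontier = deque([start])
        while frontier:
            cell = frontier.popleft()
            if cell == goal:
                path = []                      # walk parent pointers back
                while cell is not None:
                    path.append(cell)
                    cell = parent[cell]
                return path[::-1]
            r, c = cell
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                        and not cspace[nxt[0]][nxt[1]] and nxt not in parent):
                    parent[nxt] = cell
                    frontier.append(nxt)
        return None                            # goal unreachable

    # A 5x5 space with an obstacle column that has a gap at the bottom.
    space = [[False] * 5 for _ in range(5)]
    for r in range(4):
        space[r][2] = True
    print(plan_path(space, (0, 0), (4, 4)))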
This research represents the first attempt to employ motion-planning
algorithms for animation. Since our paper, this has become a fairly
visible area in animation and computer graphics. Several startup
companies, including Jean-Claude Latombe's The Motion Factory, Inc.,
base their core technologies on these concepts; there is also
significant activity in this area at some of the larger research labs
(most likely Microsoft Research).
In joint work with Amy
Briggs, we have also pursued algorithms for automatic camera
control, with applications to the generation of Multimedia content,
and to teleconferencing. We focused on the problem of controlling a
group of cameras that record an event (e.g., a lecture or a meeting) so
as to obtain a video stream documenting the event. One problem in
automatic generation of video for teleconferencing is camera placement
and control. We have developed some tools for task-directed
and visually-cued control of camera motions.
Task-directed camera control is specified as "Given a geometric
region we wish to monitor, move the camera to observe all activity in
the region." The system takes into account occlusion relationships to
guarantee partial or total visibility of the surveillance area.
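A toy version of the underlying geometric test, as a sketch only
(assuming the surveillance region is summarized by a few sample points
and the occluders by line segments; the algorithms in the papers below
reason about occlusion exactly rather than by sampling):

    def segments_cross(p1, p2, q1, q2):
        """True if segments p1-p2 and q1-q2 properly intersect (2D)."""
        def orient(a, b, c):
            return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
        d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
        return d1 * d2 < 0 and d3 * d4 < 0

    def sees_region(camera, region_pts, occluder_edges):
        """A camera sees the region if no occluder blocks any sightline."""
        return all(not any(segments_cross(camera, pt, a, b)
                           for a, b in occluder_edges)
                   for pt in region_pts)

    def place_camera(candidates, region_pts, occluder_edges):
        """Return the first candidate position with total visibility."""
        for cam in candidates:
            if sees_region(cam, region_pts, occluder_edges):
                return cam
        return None

    wall = [((2.0, 0.0), (2.0, 3.0))]           # one occluding wall
    region = [(4.0, 1.0), (4.0, 2.0)]           # points to monitor
    print(place_camera([(0.0, 1.5), (3.0, 5.0)], region, wall))  # -> (3.0, 5.0)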
Visually-cued camera control waits until a particular visual
event is observed before switching camera control strategies. For
example, one can program a camera to wait until a particular speaker
comes through a doorway, and then to intercept and follow (track) the
target. In particular, the event may cue a non-visual
control strategy (for example, a physical motion, virtual motion, or
a pure computation).
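A minimal sketch of this switching structure (the cues and strategies
below are hypothetical stand-ins; the actual system couples such cues
to live vision and physical camera control):

    class CuedController:
        """Run one camera strategy until the next stage's visual cue fires,
        then switch (e.g., 'watch the doorway' -> 'track the speaker')."""

        def __init__(self, stages):
            # stages: list of (entry_cue, strategy) pairs; stage 0's cue
            # is ignored since it is active from the start.
            self.stages = stages
            self.current = 0

        def step(self, obs):
            nxt = self.current + 1
            if nxt < len(self.stages) and self.stages[nxt][0](obs):
                self.current = nxt              # cue observed: switch strategy
            return self.stages[self.current][1](obs)

    # Hypothetical cues and strategies for the doorway example.
    watch_door = lambda obs: ("aim_at", obs["doorway"])
    track_speaker = lambda obs: ("track", obs["speaker"])
    at_doorway = lambda obs: obs["speaker"] == obs["doorway"]

    ctrl = CuedController([(lambda obs: True, watch_door),
                           (at_doorway, track_speaker)])
    print(ctrl.step({"doorway": (5, 0), "speaker": (9, 9)}))  # still watching
    print(ctrl.step({"doorway": (5, 0), "speaker": (5, 0)}))  # cue: now tracking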
We have developed algorithms that take steps towards solving these
problems, and that provide a framework for posing and analyzing such
control strategies. Our three papers with Amy Briggs describe a
system we built to demonstrate these concepts, together with a video
of the system in operation. This work invites discussion of online
vs. offline approaches to automatic video editing and control. That
distinction arises when considering the difference between online
transmission of a lecture and offline editing for an archival copy.
I spent 1994-97 in Silicon Valley at Interval Research Corporation
working with Tom Ngo to develop advanced pre-competitive technologies
in graphics, animation and multimedia. We hope to
collaborate with Interval in the future as well, on some of the
projects below.
Current Projects: Research Topics
This section describes a cluster of Multimedia research topics I'd
like to explore in the future, building on the work described
above.
Broadly, the goal of this research is to build reusable, recyclable,
interoperable, editable, and extensible multimedia authoring and
playback tools, specifically for computer graphics and image content.
The work would focus on the following topics:
- Converting capture data from humans (e.g., still images, video)
into computer graphics format (splines, control points, texture maps)
with auto-correspondence, to permit direct manipulation (editing),
morphing, and interpolation (see the first sketch following this
list). We are working in collaboration with Professor Ramin Zabih in
the Cornell Robotics and Vision Laboratory.
- Human capture data can be used to drive animations with
phenomenally good results. The resulting motion is often far more
realistic than that obtained by key-framing or using physically-based
simulation. In one example, not only is a Kalman filter used to
recover the motion, but the motion estimation is used to refine the
biometric model (see the second sketch following this list).
(Biometrics is a company based in Santa Clara, CA.)
- We are also exploring the use of machine vision techniques for
obtaining a model of human body motion; a student who worked with me
and Professor Ramin Zabih in the Cornell Robotics and Vision
Laboratory has written a paper on this topic.
- Haptic interfaces to animation and video. Novel methods for
direct manipulation of animations, using haptic interfaces. Using
haptics to manipulate, edit, and author animations. Editing/morphing
of bundles of trajectories.
- Real-time, parametric X, where X = graphics, 3D graphics,
texture, morphing, ...
- Design of high-dimensional splines, shapes, volumes, surfaces,
and kinematic maps.
- Topological data structures for polyhedral, simplicial, and CW
complexes, particularly in high dimensions. Tools for authoring,
editing, visualization, topological verification, and computation of
geometric and topological properties (see the last sketch following
this list).
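First, a sketch of why auto-correspondence matters for morphing: once
control points are in correspondence, interpolation between shapes is
straightforward (the shapes here are illustrative; computing the
correspondence automatically is the research problem):

    import numpy as np

    def morph(src_pts, dst_pts, t):
        """Interpolate between two shapes whose control points are already
        in correspondence (point i of src matches point i of dst)."""
        src = np.asarray(src_pts, dtype=float)
        dst = np.asarray(dst_pts, dtype=float)
        return (1.0 - t) * src + t * dst

    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    diamond = [(0.5, -0.2), (1.2, 0.5), (0.5, 1.2), (-0.2, 0.5)]
    for t in (0.0, 0.5, 1.0):                   # sweep source -> target
        print(t, morph(square, diamond, t).round(2).tolist())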
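Second, a generic sketch of motion recovery with a Kalman filter (a 1D
constant-velocity model with illustrative noise parameters, not the
biometric system itself):

    import numpy as np

    def kalman_track(measurements, dt=1.0, meas_var=1.0, accel_var=0.01):
        """Estimate position and velocity from noisy 1D position
        measurements with a constant-velocity Kalman filter."""
        F = np.array([[1.0, dt], [0.0, 1.0]])           # state transition
        H = np.array([[1.0, 0.0]])                      # observe position only
        Q = accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                                  [dt**3 / 2, dt**2]])  # process noise
        R = np.array([[meas_var]])                      # measurement noise
        x = np.zeros((2, 1))                            # [position, velocity]
        P = np.eye(2)                                   # state covariance
        out = []
        for z in measurements:
            x = F @ x                                   # predict
            P = F @ P @ F.T + Q
            y = np.array([[z]]) - H @ x                 # innovation
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
            x = x + K @ y                               # update
            P = (np.eye(2) - K @ H) @ P
            out.append((float(x[0, 0]), float(x[1, 0])))
        return out

    # Noisy samples of a point moving at roughly unit speed.
    for pos, vel in kalman_track([0.1, 0.9, 2.2, 2.8, 4.1, 5.0]):
        print(f"position {pos:5.2f}   velocity {vel:5.2f}")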
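Last, a small sketch of one such data structure and invariant (a
simplicial complex stored as face-closed sets of vertex ids, with its
Euler characteristic; polyhedral and CW complexes need richer
representations):

    from itertools import combinations

    class SimplicialComplex:
        """Store a complex as a set of simplices (frozensets of vertex
        ids), closed under taking faces."""

        def __init__(self, top_simplices):
            self.simplices = set()
            for s in top_simplices:
                s = frozenset(s)
                # Insert every nonempty face of each top-level simplex.
                for k in range(1, len(s) + 1):
                    for face in combinations(sorted(s), k):
                        self.simplices.add(frozenset(face))

        def k_simplices(self, k):
            return [s for s in self.simplices if len(s) == k + 1]

        def euler_characteristic(self):
            # chi = sum over k of (-1)^k * (number of k-simplices)
            chi, k = 0, 0
            while self.k_simplices(k):
                chi += (-1) ** k * len(self.k_simplices(k))
                k += 1
            return chi

    # The boundary of a triangle (a topological circle): 3 - 3 = 0.
    circle = SimplicialComplex([(0, 1), (1, 2), (0, 2)])
    print(circle.euler_characteristic())   # -> 0
    # A solid triangle (a disk): 3 - 3 + 1 = 1.
    disk = SimplicialComplex([(0, 1, 2)])
    print(disk.euler_characteristic())     # -> 1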
References
- Real-Time Robot Motion Planning Using Rasterizing Computer
Graphics Hardware, (with J. Lengyel, M. Reichert, and D. Greenberg),
Proc. SIGGRAPH '90, Dallas, TX (Aug 1990), pp. 327-336.
- Visibility-Based Planning of Sensor Control Strategies, (with
A. J. Briggs), submitted to Algorithmica, Special Issue on Algorithmic
Foundations of Robotics (1996).
- Robust Geometric Algorithms for Sensor Planning, (with
A. J. Briggs), Proc. International Workshop on the Algorithmic
Foundations of Robotics, Toulouse, France (1996).
- Automatic Sensor Configuration for Task-Directed Planning, (with
A. J. Briggs), Proc. 1994 IEEE International Conference on Robotics
and Automation, San Diego, CA (May 1994).
- System for Image Manipulation and Animation Using Embedded
Constraint Graphics, (with J. T. Ngo), patent application filed
August 5, 1996.