Title: On the Untapped Potential of 3D (Foundation) Models
Abstract: The recent wave of generative AI has led to unprecedented success in fields such as natural language processing (NLP) and 2D computer vision. In comparison, advancements in 3D understanding and generation, while noteworthy, have been relatively modest. For instance, most 3D generative models focus primarily on objects rather than scenes, and existing multimodal large language models (LLMs) still struggle with spatial understanding.
In this talk, I will argue that unlocking the full potential of 3D foundation models hinges on sourcing the right data and developing a principled approach to evaluation. Specifically, I will advocate for the use of 360-degree videos, as opposed to traditional videos, as the preferred data source for next-generation 3D models. I will demonstrate how we extract scalable and diverse data from these videos, enabling, for the first time, the synthesis of large-scale real-world 3D scenes and the reconstruction of their geometry from a single image. Next, I will present our recent efforts in evaluating 3D generative models and multimodal LLMs, discussing how these findings inform the future design of 3D models. Finally, I will showcase how we distill knowledge from multimodal LLMs into existing 3D systems, making them interactable, actionable, and thus suitable for physical intelligence.
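The case for 360-degree videos rests on a simple geometric fact: each equirectangular frame captures the full viewing sphere, so a single frame can be resampled into many pinhole-camera views with arbitrary yaw, pitch, and field of view. Below is a minimal NumPy sketch of that standard resampling step (nearest-neighbor, for brevity); the function name and parameters are illustrative and not drawn from the speaker's actual pipeline.

```python
# Illustrative sketch (not the speaker's pipeline): sample a perspective crop
# from one equirectangular 360-degree frame. Varying yaw/pitch/FOV turns a
# single panoramic frame into many diverse pinhole-style views.
import numpy as np

def equirect_to_perspective(frame, fov_deg=90.0, yaw_deg=0.0, pitch_deg=0.0,
                            out_hw=(256, 256)):
    """Render a pinhole view looking (yaw, pitch) into an (H, W, 3) frame."""
    H, W = frame.shape[:2]
    h, w = out_hw
    f = 0.5 * w / np.tan(0.5 * np.radians(fov_deg))  # focal length in pixels

    # Rays through each output pixel, in camera coordinates (z forward).
    xs = np.arange(w) - 0.5 * (w - 1)
    ys = np.arange(h) - 0.5 * (h - 1)
    x, y = np.meshgrid(xs, ys)
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate rays by pitch (about x), then yaw (about y).
    p, q = np.radians(pitch_deg), np.radians(yaw_deg)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p),  np.cos(p)]])
    Ry = np.array([[ np.cos(q), 0, np.sin(q)],
                   [0, 1, 0],
                   [-np.sin(q), 0, np.cos(q)]])
    dirs = dirs @ (Ry @ Rx).T

    # Ray directions -> spherical coords -> equirectangular pixel coords.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])       # in [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))  # in [-pi/2, pi/2]
    u = ((lon / np.pi + 1.0) * 0.5 * (W - 1)).astype(int)
    v = ((lat / (0.5 * np.pi) + 1.0) * 0.5 * (H - 1)).astype(int)
    return frame[np.clip(v, 0, H - 1), np.clip(u, 0, W - 1)]
```

Sweeping the yaw and pitch across frames of a 360-degree video yields the kind of scalable, viewpoint-diverse real-world training data the abstract alludes to, without requiring a physical camera at each viewpoint.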
Bio: Wei-Chiu Ma is an Assistant Professor at Cornell University. His research lies at the intersection of computer vision and robotics, with a focus on in-the-wild 3D modeling and simulation and their applications to autonomous systems. Wei-Chiu is a recipient of the Siebel Scholarship and was selected as a rising star in Cyber-Physical Systems. His work has been covered by media outlets such as WIRED, DeepLearning.AI, and MIT News. Previously, Wei-Chiu was a Senior Research Scientist at Uber ATG and Waabi, where he served as the technical lead of the sensor simulation team. His contributions to autonomy and simulation have led to 15+ patents. He received his Ph.D. in EECS from MIT and his M.S. in Robotics from CMU.