- About
- Events
- Calendar
- Graduation Information
- Cornell Learning Machines Seminar
- Student Colloquium
- BOOM
- Fall 2024 Colloquium
- Conway-Walker Lecture Series
- Salton 2024 Lecture Series
- Seminars / Lectures
- Big Red Hacks
- Cornell University - High School Programming Contests 2024
- Game Design Initiative
- CSMore: The Rising Sophomore Summer Program in Computer Science
- Explore CS Research
- ACSU Research Night
- Cornell Junior Theorists' Workshop 2024
- People
- Courses
- Research
- Undergraduate
- M Eng
- MS
- PhD
- Admissions
- Current Students
- Computer Science Graduate Office Hours
- Advising Guide for Research Students
- Business Card Policy
- Cornell Tech
- Curricular Practical Training
- A & B Exam Scheduling Guidelines
- Fellowship Opportunities
- Field of Computer Science Ph.D. Student Handbook
- Graduate TA Handbook
- Field A Exam Summary Form
- Graduate School Forms
- Instructor / TA Application
- Ph.D. Requirements
- Ph.D. Student Financial Support
- Special Committee Selection
- Travel Funding Opportunities
- Travel Reimbursement Guide
- The Outside Minor Requirement
- Diversity and Inclusion
- Graduation Information
- CS Graduate Minor
- Outreach Opportunities
- Parental Accommodation Policy
- Special Masters
- Student Spotlights
- Contact PhD Office
Adaptive Discretization for Decision Making in Large Continuous Spaces
Abstract: In this talk, I will present a sequence of two works that explore adaptive discretization for decision making in continuous state and action spaces. In the first work, I will present a Q-learning policy with adaptive data-driven discretization for episodic RL on continuous state-action spaces. We recover the regret guarantees of prior algorithms for continuous state-action spaces, and experiments demonstrate how our algorithm automatically adapts to the underlying structure of the problem, resulting in better performance compared both to heuristics and Q-learning with uniform discretization. In the second work, we consider the question of learning the metric amongst actions when it is unknown, for a nonparametric contextual multi-arm bandit problem. Suppose that there is a large set of actions, yet there is a simple but unknown structure amongst the actions, e.g. finite types or smooth with respect to an unknown metric space. We present an algorithm which learns data-driven similarities amongst the arms, in order to implement adaptive partitioning of the context-arm space for more efficient learning. We will discuss regret bounds and show simulations that illustrate the potential gains for learning data driven similarities.