A Framework for Interactive Learning
In many settings, learning algorithms are deployed "in the wild", where their output is used before the classifier has been fully trained. In this online context, a natural model of interaction is Angluin's (1988) Equivalence Query model: the algorithm proposes an output classifier; the user either responds that the classifier is correct, or provides the algorithm with a point that is mislabeled. The algorithm takes this feedback into account in producing a new, potentially very different, classifier, and the process repeats. Angluin (1988) and much subsequent work have shown, among other results, that a classifier from a class F can be learned using at most log_2 |F| iterations.
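For concreteness, here is a minimal sketch of the classic halving strategy that achieves this bound, assuming a finite hypothesis class given explicitly as Python functions and an oracle that either accepts the proposed classifier or returns a mislabeled point (the names and interface are illustrative, not part of the talk):

```python
from collections import Counter

def halving_equivalence_queries(hypotheses, oracle):
    """Halving algorithm in the Equivalence Query model (illustrative sketch).

    hypotheses: finite list of binary classifiers, each a function x -> 0/1.
    oracle(h):  returns None if the proposed classifier h labels every point
                correctly, otherwise returns a point x that h mislabels.

    Proposing the point-wise majority vote of the surviving hypotheses means
    every counterexample eliminates at least half of them, so the number of
    rounds is at most log2(len(hypotheses)).
    """
    version_space = list(hypotheses)
    rounds = 0
    while True:
        current = tuple(version_space)

        def majority(x, vs=current):
            # Predict the label most of the surviving hypotheses assign to x.
            return Counter(h(x) for h in vs).most_common(1)[0][0]

        counterexample = oracle(majority)
        if counterexample is None:
            return majority, rounds
        rounds += 1
        # The majority label on the counterexample was wrong; in the binary
        # case the true label is therefore the opposite one.  Keep only the
        # hypotheses that label the counterexample correctly.
        true_label = 1 - majority(counterexample)
        version_space = [h for h in version_space
                         if h(counterexample) == true_label]
```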
We propose and analyze a generalization of the Equivalence Query model beyond classifiers. The goal is to learn a "model" (e.g., a classifier) from a general class: in each iteration, the algorithm proposes such a model, and is either told that the model is correct, or otherwise is shown a "local correction". Each piece of feedback may be incorrect, independently with probability 1-p < 0.5. Natural classes of "models" to learn (beyond classifiers) include permutations/rankings (useful, e.g., in web search) and clusterings of data items (e.g., for image segmentation).
Our main contribution is to exhibit a natural condition that allows the log_2 |F| bound to carry over to many domains, and to degrade very gracefully as a function of p. Specifically, we show that if there is an (undirected, weighted) graph whose nodes are the models, such that corrections are always the first edge on a shortest path to the (unknown) ground-truth model, then log_2 |F| iterations are enough, and O(log_2 |F|) queries suffice in the presence of noise. This framework recovers as special cases several classical and recent bounds for learning classifiers and clusterings, and implies new bounds for learning a permutation. The algorithm is the natural generalization of Binary Search to undirected weighted graphs, and can be extended to directed graphs under additional conditions.
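A simplified, noise-free sketch of this graph binary search, under assumptions of my own (the graph is given as a dict of dicts of positive edge weights, and oracle(q) either accepts q or returns a neighbor of q on a shortest path toward the target), is shown below; it illustrates the idea and is not the authors' implementation:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest-path distances in a weighted undirected graph
    given as {node: {neighbor: positive_weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def graph_binary_search(graph, oracle):
    """Noise-free binary search on a weighted undirected graph (sketch).

    oracle(q): returns None if q is the target model, otherwise a neighbor
    of q on a shortest path from q to the target (a "local correction").
    Querying a 1-median of the surviving candidates guarantees that each
    correction eliminates at least half of them, giving O(log2 N) rounds.
    """
    dist = {u: dijkstra(graph, u) for u in graph}   # all-pairs distances
    candidates = set(graph)                         # models still consistent
    rounds = 0
    while True:
        # Propose a 1-median: the vertex minimizing total distance to the
        # remaining candidates (ties broken arbitrarily).
        q = min(graph, key=lambda u: sum(dist[u][v] for v in candidates))
        correction = oracle(q)
        if correction is None:
            return q, rounds
        rounds += 1
        w = graph[q][correction]
        # Keep only candidates v for which the corrected edge (q, correction)
        # lies on a shortest path from q to v.
        candidates = {v for v in candidates
                      if abs(dist[q][v] - (w + dist[correction][v])) < 1e-9}
```

In the noisy setting, roughly the same median step is applied to multiplicative weights maintained over all models rather than to a hard candidate set.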
This talk is based on joint work with Ehsan Emamjomeh-Zadeh, from STOC 2016, NIPS 2017, and SODA 2018.