Abstract:
At the dawn of the computer age in the 1960s, Bellman and his collaborators found it beneficial to use what is now called linear function approximation to address certain multistage stochastic planning problems. Their approach was straightforward: use linear value function approximation to avoid state-space discretization, thereby maintaining polynomial-time computation while also controlling accuracy. However, an answer to the question of when and how this approach is feasible has eluded researchers for over 50 years, even as the prospect of using function approximation to overcome the curse of dimensionality has continued to fuel much of the excitement around reinforcement learning. Early results focused on connecting the approximation spaces with the structure of the underlying problem, and some indicated that it might not be enough for the target function (such as the optimal value function) to simply lie within this space. As it turns out, the emerging picture of when function approximation can help in multistage problems is intricate. In this talk, we will explore recent results, primarily from my group, that contribute to this complex understanding. I will conclude with an outlook on current research directions.
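To make the idea concrete, here is a minimal sketch (not from the talk) of approximate value iteration with linear features on a randomly generated finite MDP: each Bellman backup is projected onto the span of a small feature matrix by least squares, so the work scales with the number of features rather than the number of states. All names (`P`, `R`, `Phi`, `gamma`, the sizes) and the random MDP are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of linear value function approximation on a random MDP.
# Everything here (the MDP, the features, the sizes) is a made-up assumption.

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 20, 2, 4
gamma = 0.9  # discount factor

# Random MDP: P[a, s, s'] transition probabilities, R[a, s] expected rewards.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_actions, n_states))

# Feature map Phi: each state is described by n_features numbers, so the
# value function is represented by n_features weights, not n_states values.
Phi = rng.normal(size=(n_states, n_features))

w = np.zeros(n_features)
for _ in range(200):
    V = Phi @ w                         # current value estimates, one per state
    Q = R + gamma * (P @ V)             # one-step Bellman lookahead, per action
    target = Q.max(axis=0)              # Bellman optimality backup
    # Project the backup onto the span of the features via least squares.
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)

print("fitted weights:", w)
print("approximate optimal values:", Phi @ w)
```

Note that this projected iteration is not guaranteed to converge, and even when it does, the fixed point need not be a good approximation of the optimal value function: characterizing when such schemes succeed is exactly the kind of question the abstract alludes to.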
Bio:
Csaba Szepesvári (Ph.D. '99) leads DeepMind's Foundations team, in addition to serving as a Professor of Computing Science at the University of Alberta and as a Principal Investigator of the Alberta Machine Intelligence Institute. Prof. Szepesvári is best known for his theoretical work on reinforcement learning and as co-inventor of the Monte-Carlo tree search method UCT, which inspired much subsequent work in the AI community. He has published three books: one on control theory, one on reinforcement learning, and the most recent on the theory of bandit algorithms. He serves on the editorial boards of the Journal of Machine Learning Research, Mathematics of Operations Research, and Foundations and Trends in Machine Learning. He co-chaired ICML in 2022 and earlier co-chaired COLT and ALT. Since 2021, he has enthusiastically co-organized and co-hosted the weekly "Reinforcement Learning Theory" virtual seminar, aimed at anyone who wants to learn about the latest advances in this fast-moving field.