Talk title: Randomness-Aware Testing of Machine Learning-based Systems
Abstract: Machine Learning is rapidly revolutionizing the way modern systems are developed. However, testing Machine Learning-based systems is challenging due to (1) the presence of non-determinism, both internal (e.g., stochastic algorithms) and external (e.g., the execution environment), and (2) the absence of well-defined accuracy specifications. Most traditional software testing techniques cannot tackle these challenges because they assume determinism and require a precise test oracle.
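To make the first challenge concrete, the following minimal sketch shows how a test over a stochastically trained model becomes flaky; the dataset, model, and accuracy threshold here are illustrative assumptions, not details from the talk:

```python
# Hypothetical sketch of the flakiness problem described above: the test
# asserts a hard accuracy threshold on a stochastically trained model, so
# its outcome depends on the random seed rather than on a real bug.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

def train_and_score(seed):
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = SGDClassifier(random_state=seed, max_iter=5, tol=None)
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)

def test_accuracy_flaky():
    # A precise oracle like this does not exist for stochastic training:
    # the same test passes for some seeds and fails for others.
    assert train_and_score(seed=np.random.randint(10_000)) >= 0.85
```

Because training depends on the random seed, the same test code can pass or fail across runs without any change to the system under test.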
In this talk, I will present my work on automated testing of Machine Learning-based systems and on improving developer-written tests in such systems. To achieve these goals, I develop principled techniques, built on mathematical foundations from probability theory and statistics, to reason about the underlying non-determinism and accuracy. To date, my research has exposed more than 50 bugs and improved the quality of more than 200 tests in over 60 popular open-source ML libraries, many of which are widely used at companies such as Google, Meta, Microsoft, and Uber, as well as in many academic and scientific communities.
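As one simple illustration of this kind of statistical reasoning (a sketch under a normal-approximation assumption, not the speaker's actual technique), a brittle single-run assertion can be replaced with a bound on the mean accuracy across repeated runs:

```python
# Minimal sketch (not the speaker's actual technique): decide pass/fail
# from repeated runs using a one-sided confidence bound on mean accuracy,
# instead of asserting a hard threshold on a single stochastic run.
import math
import statistics

def statistical_accuracy_check(run_fn, threshold, n_runs=10, z=1.645):
    """Pass if the one-sided 95% lower confidence bound on the mean
    accuracy (normal approximation) clears the threshold."""
    scores = [run_fn(seed) for seed in range(n_runs)]
    mean = statistics.mean(scores)
    stderr = statistics.stdev(scores) / math.sqrt(n_runs)
    return mean - z * stderr >= threshold

def test_accuracy_statistical():
    # Reuses the illustrative train_and_score from the sketch above.
    assert statistical_accuracy_check(train_and_score, threshold=0.80)
```

Choosing the number of runs and the confidence level trades test cost against flakiness, which is exactly the kind of trade-off a principled statistical treatment makes explicit.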
Finally, I will briefly discuss my recent research on leveraging Large Language Models to solve software engineering tasks, such as detecting security bugs via static analysis.
Bio:
Saikat Dutta is a tenure-track Assistant Professor at Cornell University. His research lies at the intersection of software engineering and machine learning. In particular, he focuses on developing novel testing techniques and tools to improve the reliability of machine learning-based systems, and on leveraging machine learning to solve challenging tasks in software engineering. His research has been recognized by several awards, including the Facebook PhD Fellowship, the 3M Foundation Fellowship, and the Mavis Future Faculty Fellowship. Saikat received his PhD in Computer Science from UIUC in 2023 and spent a year as a postdoc at the University of Pennsylvania before joining Cornell.