PIMS-UWashington Mathematics Colloquium: Courtney Paquette
Topic
Hitting the High-D(imensional) Notes: Stochastic Gradient Descent Learning Dynamics
Speakers
Courtney Paquette
Details
Modern machine learning (ML) applications grapple with the challenges posed by high-dimensional datasets and high-dimensional parameter spaces. Stochastic gradient descent (SGD) and its variants have emerged as the go-to algorithms for optimization in this expansive space. However, classical analyses of these algorithms, rooted in low-dimensional geometry, often lead to misconceptions in the realm of modern ML. In this talk, we delve into the intricacies of high-dimensional phenomena in machine learning optimization landscapes. The talk begins with an exploration of typical stochastic algorithms and emphasizes the crucial role that high dimensionality plays in shaping their behavior. Drawing on tools from high-dimensional probability, stochastic differential equations (SDEs), random matrix theory, and optimization, we present a framework for analyzing the dynamics of SGD in scenarios where both the number of samples and the number of parameters are large. The resulting limiting dynamics are governed by an ordinary differential equation (ODE), providing valuable insights into algorithmic choices such as hyperparameters. We demonstrate the applicability of this framework on real datasets, highlighting its alignment with actual performance.
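For readers unfamiliar with the phenomenon the abstract alludes to, the sketch below is a minimal illustration (not the speaker's framework or results): on a random high-dimensional least-squares problem, the risk curve traced by single-sample SGD concentrates around a deterministic trajectory as the number of samples and parameters grow, which is the kind of effectively deterministic limiting behavior that an ODE description captures. The Gaussian data model, problem sizes, and step size are assumptions chosen purely for demonstration.

```python
# Illustrative sketch only: the Gaussian least-squares setup, sizes, and step
# size below are assumptions for demonstration, not the speaker's framework.
import numpy as np

def sgd_risk_curve(n, d, steps, step_size, record_every, seed):
    """Run single-sample SGD on a random least-squares instance and record
    the empirical risk (1/2n)||Ax - b||^2 every `record_every` iterations."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d)) / np.sqrt(d)   # rows have norm ~ 1
    x_star = rng.standard_normal(d)                # planted signal
    b = A @ x_star                                  # noiseless targets
    x = np.zeros(d)
    risks = []
    for k in range(steps):
        i = rng.integers(n)                         # sample one row
        x -= step_size * (A[i] @ x - b[i]) * A[i]   # stochastic gradient step
        if (k + 1) % record_every == 0:
            r = A @ x - b
            risks.append(0.5 * (r @ r) / n)
    return np.array(risks)

# Three runs with independent data draws and independent sampling noise.
n, d, steps, eta = 2000, 1000, 8000, 0.4
curves = np.stack([sgd_risk_curve(n, d, steps, eta, 100, seed)
                   for seed in range(3)])

# Despite the randomness, the risk curves nearly coincide: in high dimensions
# the stochastic trajectory concentrates around a deterministic limit.
print("max spread between independent SGD risk curves:",
      np.max(np.ptp(curves, axis=0)))
```

In this toy setting the spread between independent runs shrinks as n and d increase; the talk's framework goes further and characterizes the deterministic limit itself via an ODE, including the correction coming from gradient noise.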