UWashington Distinguished Seminar in Optimization and Data: Lin Xiao
Topic
Non-negative Gauss-Newton Methods for Empirical Risk Minimization
Speakers
Lin Xiao
Details
We consider the problem of minimizing the average of a large number of smooth but possibly non-convex functions. In machine learning applications, the loss functions are mostly non-negative and can therefore be written as the composition of the square function and their real-valued square roots. With this simple reformulation, we can apply the Gauss-Newton method, or the Levenberg-Marquardt method when an extra quadratic regularization is added. We show that the resulting algorithms are highly adaptive and can automatically warm up and decay the effective step size while tracking the loss landscape. We provide convergence analyses of the methods in the convex, non-convex, and stochastic settings. Both the convergence rates and empirical evaluations compare favorably to those of the classical (stochastic) gradient method. This is joint work with Antonio Orvieto.
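To make the reformulation concrete, here is a minimal sketch (my own illustration, not the speakers' implementation): write a non-negative loss ℓ(w) as r(w)² with r = √ℓ, linearize r around the current point, and add a quadratic regularization ‖d‖²/(2γ) in the Levenberg-Marquardt style. Under that particular weighting convention, the resulting update is a gradient step with effective step size γ / (1 + γ‖∇ℓ(w)‖² / (2ℓ(w))), which is small when the gradient is large relative to the loss and approaches γ as the loss flattens. The function name `ngn_step`, the parameter `gamma`, and the least-squares example below are all illustrative assumptions.

```python
import numpy as np

def ngn_step(w, loss, grad, gamma):
    """One Gauss-Newton step on the reformulation loss = r**2, r = sqrt(loss).

    Illustrative sketch only. Assumes loss(w) > 0 so the square-root
    reformulation is well defined. The effective step size
    gamma / (1 + gamma * ||g||^2 / (2 * l)) shrinks when the gradient is
    large relative to the loss and approaches gamma near a flat region.
    """
    l = loss(w)
    g = grad(w)
    eff = gamma / (1.0 + gamma * np.dot(g, g) / (2.0 * l))
    return w - eff * g

# Toy example (hypothetical): a non-negative least-squares loss
# 0.5 * ||A w - b||^2 with noisy synthetic data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
w_true = np.ones(5)
b = A @ w_true + 0.1 * rng.standard_normal(20)

loss = lambda w: 0.5 * np.sum((A @ w - b) ** 2)
grad = lambda w: A.T @ (A @ w - b)

w = np.zeros(5)
losses = [loss(w)]
for _ in range(200):
    w = ngn_step(w, loss, grad, gamma=0.1)
    losses.append(loss(w))
# The loss decreases toward the noise floor without a hand-tuned
# step-size schedule: the effective step adapts automatically.
```

Note that the constant γ never has to be decayed by hand here; the ratio ‖g‖²/(2ℓ) does the warm-up and decay automatically, which is the adaptivity the abstract refers to.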
Additional Information
A livestream option is available.