PIMS-CORDS SFU Operations Research Seminar: Betty Shea
Topic
Why Line Search When You Can Plane-Search?
Speakers
Betty Shea (University of British Columbia)
Details
The practical performance of an optimization method depends on details such as the choice of step size. Strategies for setting step sizes are generally limited to hyperparameter tuning (for a fixed step size), step size schedules, and line searches. For many common machine learning problems, line optimization and subspace optimization find accurate step sizes for asymptotically the same cost as using a fixed step size. In some cases, line optimization may find step sizes that are ruled out by the standard Armijo condition. For optimization methods that use multiple search directions, such as gradient descent with momentum, using subspace optimization instead of a fixed step size schedule allows for better adaptivity and potentially faster convergence. For some neural networks, subspace optimization allows different step sizes for different layers, which can decrease the training time needed and reduce the dependence on hyperparameter tuning.
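To make the abstract concrete, below is a minimal numpy sketch of subspace (plane) optimization for the special case of a strongly convex quadratic f(x) = (1/2) x^T A x - b^T x, where minimizing over a k-dimensional subspace of search directions reduces to a k-by-k linear solve. The function name plane_search_quadratic and the synthetic problem are illustrative assumptions, not the speaker's implementation; for non-quadratic losses the small k-dimensional problem would be minimized numerically instead.

```python
import numpy as np

def plane_search_quadratic(A, b, x, directions):
    """Exactly minimize f(y) = 0.5*y^T A y - b^T y over y = x + D @ t,
    where the columns of D are the given search directions."""
    D = np.column_stack(directions)            # n x k matrix of directions
    g = A @ x - b                              # gradient of f at x
    H = D.T @ (A @ D)                          # k x k reduced Hessian
    # lstsq guards against nearly dependent directions near a solution
    t = np.linalg.lstsq(H, -(D.T @ g), rcond=None)[0]
    return x + D @ t, t

rng = np.random.default_rng(0)
n = 100
M = rng.standard_normal((n, n))
A = M.T @ M / n + np.eye(n)                    # positive definite Hessian
b = rng.standard_normal(n)

# Momentum with per-iteration plane search: the step size and the momentum
# rate are optimized jointly each iteration instead of fixed by a schedule.
x = np.zeros(n)
x_prev = x.copy()
for k in range(25):
    g = A @ x - b
    dirs = [-g] if k == 0 else [-g, x - x_prev]  # gradient + momentum direction
    x_new, _ = plane_search_quadratic(A, b, x, dirs)
    x_prev, x = x, x_new
print("momentum residual:", np.linalg.norm(A @ x - b))

# Per-layer step sizes: treat coordinate blocks as "layers" and give each
# block its own direction (the negative gradient zeroed outside the block),
# so the subspace solve returns one step size per layer.
blocks = np.array_split(np.arange(n), 4)       # pretend the variables form 4 layers
x = np.zeros(n)
for k in range(25):
    g = A @ x - b
    dirs = []
    for idx in blocks:
        d = np.zeros(n)
        d[idx] = -g[idx]                       # gradient restricted to one layer
        dirs.append(d)
    x, t = plane_search_quadratic(A, b, x, dirs)  # t holds one rate per layer
print("per-layer residual:", np.linalg.norm(A @ x - b))
```

In this quadratic setting, each subspace step costs only k extra matrix-vector products with A beyond the gradient that plain gradient descent already computes, which illustrates one sense in which the abstract's claim can hold: the subspace step can be asymptotically as cheap as using a fixed step size.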