Buenos Aires Machine Learning Summer School 2018
Optimization for Machine Learning
We survey the optimization viewpoint to machine learning and the basic methods that are used to train learning machines. We describe stochastic and online gradient descent, as well as recent improvements: adaptive regularization, acceleration and variance reduction. We survey the state of the art in second order stochastic optimization, conditional gradient algoithm and provable guarantees non-convex continuous optimization.
Textbooks and surveys:
Convex Optimization Boyd and Vandenberghe
Introductory Lectures on Convex Programming Y. Nesterov
Convex Optimization: Algorithms and
Complexity S. Bubeck
Introduction to Online Convex Optimization E. Hazan
Online Learning and Online Convex Optimization S. Shalev-Shwartz
The Multiplicative Weights Update Method: a Meta-Algorithm and Applications S. Arora, E. Hazan and S. Kale
- M. Schmidt, N. Le Roux, F. Bach. "Minimizing Finite Sums with the Stochastic Average Gradient"
- R. Johnson, T. Zhang. "Stochastic Gradient Descent with Variance Reduction"
- S. Shalev-Shwartz, T. Zhang. "Stochastic Dual Coordinate Ascent Methods for Regularized Loss
- M. Mahdavi, L. Zhang, R. Jin . "Mixed Optimization for Smooth Functions"
Online gradient descent, logarithmic regret and applications to soft-margin SVM:
- Martin Zinkevich. "Online Convex Programming and Generalized Infinitesimal Gradient Ascent." ICML 2003: 928-936
- Elad Hazan, Amit Agarwal, Satyen Kale. "Logarithmic regret algorithms for online convex optimization." Machine Learning 69(2-3): 169-192 (2007)
- Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, Andrew Cotter. "Pegasos: primal estimated sub-gradient solver for SVM." Math. Program. 127(1): 3-30 (2011)
Stochastic Second Order Methods
- Murat A. Erdogdu, Andrea Montanari. "Convergence rates of sub-sampled Newton methods
- Naman Agarwal, Brian Bullins, Elad Hazan. "Second Order Stochastic Optimization in Linear Time"
Non-convex optimization for ML
- Rong Ge, Furong Huang, Chi Jin and Yang Yuan. "Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition". COLT 2015.
- Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, Tengyu Ma. "Finding Approximate Local Minima for Nonconvex Optimization in Linear Time"
- Yair Carmon, John Duchi, Oliver Hinder, Aaron Sidford. "Accelerated Methods for Non-Convex Optimization"
- Shai Shalev-Shwartz, Yoram Singer. "A primal-dual perspective of online learning algorithms." Machine Learning 69(2-3): 115-142 (2007)
- Jacob Abernethy, Elad Hazan, Alexander Rakhlin. "Interior-Point Methods for Full-Information and Bandit Online Learning." IEEE Trans. Information Theory 58(7): 4164-4175 (2012)
- Elad Hazan, Satyen Kale. "Extracting certainty from uncertainty: regret bounded by variation in costs." Machine Learning 80(2-3): 165-188 (2010)
- H. Brendan McMahan. "Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization." AISTATS 2011: 525-533
Adaptive regularization and AdaGrad
- John C. Duchi, Elad Hazan, Yoram Singer. "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization." Journal of Machine Learning Research 12: 2121-2159 (2011)
- H. Brendan McMahan, Matthew J. Streeter. "Adaptive Bound Optimization for Online Convex Optimization." COLT 2010: 244-256
- Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang.
"The Case for Full-Matrix Adaptive Regularization."" arXiv:1806.02958
- Marguerite Frank, Philip Wolfe. "An algorithm for quadratic programming." Naval Research Logistics Quarterly 3: 95. (1956)
- Elad Hazan, Satyen Kale. "Projection-free Online Learning." ICML 2012
- Dan Garber, Elad Hazan. "Playing Non-linear Games with Linear Oracles." FOCS 2013: 420-428
- Yossi Arjevani, Ohad Shamir. "Oracle Complexity of Second-Order Methods for Finite-Sum Problems"
- Blake Woodworth, Nathan Srebro. "Tight Complexity Bounds for Optimizing Composite Objectives"
- Jacob Abernethy, Peter L. Bartlett, Alexander Rakhlin, and Ambuj Tewari. "Optimal strategies and minimax lower bounds for online convex games." In Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008), pages 415-423
- Elad Hazan, Tomer Koren. "The computational power of optimization in online learning." STOC 2016: 128-141
- Naman Agarwal, Elad Hazan. "Lower Bounds for Higher-Order Convex Optimization." COLT 2018.