Abstract
We give nearly matching upper and lower bounds on the oracle complexity of
finding $\epsilon$-stationary points ($\|\nabla F(x)\| \leq \epsilon$) in
stochastic convex optimization. We jointly analyze the oracle complexity in
both the local stochastic oracle model and the global oracle (or, statistical
learning) model. This allows us to decompose the complexity of finding
near-stationary points into optimization complexity and sample complexity, and
reveals some surprising differences between the complexity of stochastic
optimization versus learning. Notably, we show that in the global
oracle/statistical learning model, only logarithmic dependence on smoothness is
required to find a near-stationary point, whereas polynomial dependence on
smoothness is necessary in the local stochastic oracle model. In other words,
the separation in complexity between the two models can be exponential, and
the folklore understanding that smoothness is required to find stationary
points is only weakly true for statistical learning.
Our upper bounds are based on extensions of a recent "recursive
regularization" technique proposed by Allen-Zhu (2018). We show how to extend
the technique to achieve near-optimal rates, and in particular show how to
leverage the extra information available in the global oracle model. Our
algorithm for the global model can be implemented efficiently through finite
sum methods, and suggests an interesting new computational-statistical
tradeoff.