Abstract
Variance reduction (VR) techniques that accelerate the convergence rate of the
stochastic gradient descent (SGD) algorithm have attracted considerable
research effort recently. Two variants of VR, the stochastic variance-reduced
gradient method (SVRG-SGD) and importance sampling (IS-SGD), have achieved
remarkable progress. Meanwhile, asynchronous SGD (ASGD) is becoming
increasingly important due to the ever-growing scale of optimization problems.
Applying VR to ASGD to accelerate its convergence rate has therefore attracted
much interest, and SVRG-based ASGD methods (SVRG-ASGD) have been proposed.
However, we find that SVRG performs unsatisfactorily in accelerating ASGD when
the datasets are sparse and large-scale. In such cases, the per-iteration
computation cost of SVRG-ASGD is orders of magnitude higher than that of ASGD,
which makes it very slow. On the other hand, IS achieves an improved
convergence rate with little extra computation cost and is invariant to the
sparsity of the dataset. This advantage makes it well suited to accelerating
ASGD on large-scale sparse datasets. In this paper, we propose a novel ASGD
algorithm combined with IS, namely IS-ASGD, for effective acceleration of the
convergence rate. We theoretically prove the superior convergence bound of
IS-ASGD, and experimental results further support our analysis.