
Exponentiated Gradient versus Gradient Descent for Linear Predictors

Jyrki Kivinen and Manfred K. Warmuth. Information and Computation, 132 (1): 1-63 (1997)
DOI: https://doi.org/10.1006/inco.1996.2612

Abstract

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG±. They both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG± algorithm uses the components of the gradient in the exponents of factors that are used in updating the weight vector multiplicatively. We present worst-case loss bounds for EG± and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the algorithms are in general incomparable, but EG± has a much smaller loss if only a few components of the input are relevant for the predictions. We have performed experiments which show that our worst-case upper bounds are already quite tight on simple artificial data.
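The abstract describes the two update rules only in words; the following is a minimal illustrative sketch (not the paper's exact formulation) of one online step of each, assuming squared loss, a learning rate eta, and a total-weight budget U for EG±. The factor 2 in the gradient and the normalization convention are assumptions made for the example.

    import numpy as np

    def gd_update(w, x, y, eta=0.1):
        # Gradient descent: subtract the gradient of the squared error (y_hat - y)^2.
        y_hat = w @ x
        grad = 2.0 * (y_hat - y) * x
        return w - eta * grad

    def eg_pm_update(w_pos, w_neg, x, y, eta=0.1, U=1.0):
        # EG±: use the gradient components in the exponents of multiplicative factors,
        # maintaining two non-negative weight vectors whose difference is the predictor.
        y_hat = (w_pos - w_neg) @ x
        g = 2.0 * (y_hat - y) * x
        w_pos = w_pos * np.exp(-eta * g)
        w_neg = w_neg * np.exp(+eta * g)
        Z = (w_pos.sum() + w_neg.sum()) / U  # renormalize so the total weight stays at U
        return w_pos / Z, w_neg / Z

Both updates are driven by the same gradient components; GD applies them additively, while EG± applies them multiplicatively, which is the mechanism behind its much smaller loss when only a few input components are relevant.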

