Techreport,

Comparing Value-Function Estimation Algorithms in Undiscounted Problems

, , and .
TR-99-02. Mindmaker Ltd., Budapest 1121, Konkoly Th. M. u. 29-33, Hungary, (1999)

Abstract

We compare the scaling properties of several value-function estimation algorithms. In particular, we prove that Q-learning can scale exponentially slowly with the number of states. We identify the reasons for the slow convergence and show that both TD($\lambda$) and learning with a fixed learning rate enjoy rather fast convergence, just like the model-based method.
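As a rough illustration of the contrast drawn in the abstract, the sketch below runs synchronous one-step value updates on a deterministic, undiscounted chain and counts the sweeps needed for the start state to get within a tolerance of its true value. This is not the construction used in the report: the chain, the reward of 1 on the final transition, the synchronous sweeps, the tolerance, and the names `sweeps_to_converge`, `decaying`, and `fixed` are all assumptions made for the example. With a single action per state, Q-learning reduces to this kind of value update.

```python
def sweeps_to_converge(n_states, step_size, tol=0.1, max_sweeps=10**6):
    """Count synchronous sweeps until the start state's value error drops below tol.

    Hypothetical setup: states 0..n_states-1 form a deterministic chain that
    ends in a terminal state; the only reward is 1 on the final transition and
    there is no discounting, so the true value of every state is 1.
    """
    v = [0.0] * (n_states + 1)  # v[n_states] is the terminal state, fixed at 0
    for t in range(1, max_sweeps + 1):
        a = step_size(t)
        # One-step targets computed from the previous sweep's estimates.
        targets = [
            (1.0 if i == n_states - 1 else 0.0) + v[i + 1]
            for i in range(n_states)
        ]
        for i in range(n_states):
            v[i] += a * (targets[i] - v[i])
        if 1.0 - v[0] < tol:
            return t
    return None  # did not converge within max_sweeps


def decaying(t):
    """Learning rate 1/t, a common decaying schedule."""
    return 1.0 / t


def fixed(t):
    """Constant learning rate (0.5 is an arbitrary choice)."""
    return 0.5


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, sweeps_to_converge(n, decaying), sweeps_to_converge(n, fixed))
```

In this toy setting the sweep count under the 1/t schedule grows much faster with chain length than under the constant rate, which is in the spirit of the abstract's claim; it says nothing about the stochastic case or about the exact bounds proved in the report.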
