Abstract
In this paper we consider sampling-based fitted value iteration for discounted, large (possibly infinite) state space, finite-action Markovian Decision Problems where only a generative model of the transition probabilities and rewards is available. At each step, the image of the current estimate of the optimal value function under a Monte-Carlo approximation to the Bellman operator is projected onto some function space. PAC-style bounds on the weighted $L^p$-norm approximation error are obtained as a function of the covering number and the approximation power of the function space, the iteration number, and the sample size.
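To make the procedure concrete, below is a minimal sketch of one plausible instantiation of sampling-based fitted value iteration. The toy 1-D state space, the generative model `sample`, the polynomial function space `features`, and all parameter names are illustrative assumptions, not taken from the paper: at each iteration, Monte-Carlo backups approximate the Bellman operator at sampled states, and least-squares regression plays the role of the projection onto the function space.

```python
# A minimal sketch of sampling-based fitted value iteration, assuming a toy
# 1-D continuous state space and a hypothetical generative model `sample(s, a)`
# returning (next_state, reward). The linear-in-features function space and all
# constants are illustrative choices, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.9            # discount factor
ACTIONS = [-1.0, 1.0]  # finite action set

def sample(s, a):
    """Hypothetical generative model: noisy drift toward the origin."""
    s_next = 0.8 * s + 0.1 * a + 0.05 * rng.standard_normal()
    reward = -s * s
    return s_next, reward

def features(s):
    """Fixed function space F: low-degree polynomials of the state."""
    s = np.asarray(s)
    return np.stack([np.ones_like(s), s, s**2], axis=-1)

def fitted_value_iteration(n_iters=20, n_states=200, n_mc=30):
    theta = np.zeros(3)  # value estimate V(s) ~ features(s) @ theta
    for _ in range(n_iters):
        states = rng.uniform(-1.0, 1.0, size=n_states)  # sampled base points
        targets = np.empty(n_states)
        for i, s in enumerate(states):
            # Monte-Carlo approximation of the Bellman operator (TV)(s):
            # for each action, average n_mc sampled one-step backups,
            # then maximize over actions.
            backups = []
            for a in ACTIONS:
                vals = []
                for _ in range(n_mc):
                    s_next, r = sample(s, a)
                    vals.append(r + GAMMA * features(s_next) @ theta)
                backups.append(np.mean(vals))
            targets[i] = max(backups)
        # "Projection" onto F: least-squares fit of the targets over the
        # sampled states (an empirical L^2 projection).
        X = features(states)
        theta, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return theta

if __name__ == "__main__":
    print("fitted coefficients:", fitted_value_iteration())
```

Here the least-squares step corresponds to an empirical $L^2$ projection with respect to the sampling distribution over states; the paper's weighted $L^p$-norm error bounds concern how the errors introduced at this step, together with the Monte-Carlo error in the backups, propagate across iterations.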