Inproceedings,

Finite Time Bounds for Sampling Based Fitted Value Iteration

, and .
ICML, pages 881-886. (2005)

Abstract

In this paper we consider sampling-based fitted value iteration for discounted, large (possibly infinite) state-space, finite-action Markovian Decision Problems where only a generative model of the transition probabilities and rewards is available. At each step, the image of the current estimate of the optimal value function under a Monte-Carlo approximation to the Bellman operator is projected onto some function space. PAC-style bounds on the weighted $L^p$-norm approximation error are obtained as a function of the covering number and the approximation power of the function space, the iteration number, and the sample size.
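The iteration described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy one-dimensional simulator `generative_model`, the cubic polynomial function space in `features`, the uniform sampling distribution, the choice $p = 2$ (least squares), and all parameter values are assumptions made for the example. Drawing fresh samples at every iteration is also an assumption of this sketch.

```python
import numpy as np

DEGREE = 3  # hypothetical function space: polynomials of degree <= 3 on [0, 1]

def features(x):
    # Polynomial features 1, x, ..., x^DEGREE for a 1-D array of states.
    return np.vander(np.asarray(x, dtype=float), DEGREE + 1, increasing=True)

def generative_model(x, a, rng):
    # Toy simulator (an assumption): action 1 drifts right, action 0 drifts left.
    step = 0.1 if a == 1 else -0.1
    y = np.clip(x + step + 0.05 * rng.standard_normal(x.shape), 0.0, 1.0)
    r = x  # reward equals the current state, so it is bounded in [0, 1]
    return y, r

def sampled_fvi(n_states=200, n_next=10, n_iters=30, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(DEGREE + 1)  # start from the zero value function
    for _ in range(n_iters):
        # Base states drawn from the sampling distribution (uniform here).
        x = rng.uniform(0.0, 1.0, size=n_states)
        xs = np.repeat(x, n_next)  # n_next simulator calls per base state
        q = []
        for a in (0, 1):
            y, r = generative_model(xs, a, rng)
            backup = r + gamma * features(y) @ w
            # Monte-Carlo estimate of the action value at each base state.
            q.append(backup.reshape(n_states, n_next).mean(axis=1))
        targets = np.max(q, axis=0)  # empirical Bellman backup
        # Projection onto the function space: least-squares fit (p = 2).
        w, *_ = np.linalg.lstsq(features(x), targets, rcond=None)
    return w

if __name__ == "__main__":
    w = sampled_fvi()
    print(features(np.array([0.1, 0.5, 0.9])) @ w)
```

In this sketch the quantities the bounds depend on appear as explicit knobs: `n_states` and `n_next` play the role of the sample size, `n_iters` the iteration number, and `DEGREE` controls the richness of the function space.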
