Inproceedings

Universal Option Models

H. Yao, Cs. Szepesvári, R. Sutton, J. Modayil, and S. Bhatnagar.
NIPS, pages 990–998 (September 2014).

Abstract

We consider the problem of learning models of options for real-time abstract planning, in the setting where reward functions can be specified at any time and their expected returns must be efficiently computed. We introduce a new model for an option that is independent of any reward function, called the universal option model (UOM). We prove that the UOM of an option can construct a traditional option model given a reward function, and the option-conditional return is computed directly by a single dot-product of the UOM with the reward function. We extend the UOM to linear function approximation, and we show it gives the TD solution of option returns and value functions of policies over options. We provide a stochastic approximation algorithm for incrementally learning UOMs from data and prove its consistency. We demonstrate our method in two domains. The first domain is document recommendation, where each user query defines a new reward function and a document's relevance is the expected return of a simulated random-walk through the document's references. The second domain is a real-time strategy game, where the controller must select the best game unit to accomplish dynamically-specified tasks. Our experiments show that UOMs are substantially more efficient in evaluating option returns and policies than previously known methods.
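The abstract's central claim is that a reward-independent model lets the return for any newly specified reward function be read off with a dot product. This can be illustrated in the tabular case. The sketch below is a minimal illustration under assumptions that are ours, not necessarily the paper's. Rewards depend only on the current state, P is the transition matrix of the option's internal policy, and after each transition into state s' the option terminates with probability beta[s']. Under these assumptions the discounted occupancy matrix U = (I - gamma * M)^-1, with M[s][s'] = P[s][s'] * (1 - beta[s']), contains no reward information. The return from start state s is the dot product of row s of U with the reward vector. The function names are hypothetical. The sketch also omits the linear function approximation and the incremental learning algorithm that the abstract describes.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    # Work on an augmented copy [A | B] so the inputs are not mutated.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot magnitude into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def universal_option_model(P: Matrix, beta: List[float], gamma: float) -> Matrix:
    """Hypothetical tabular model: U = (I - gamma * M)^-1 with
    M[s][s2] = P[s][s2] * (1 - beta[s2]), i.e. move to s2 and keep running.
    U[s][s2] is the expected discounted number of visits to s2 while the
    option runs from s; it contains no reward information."""
    n = len(P)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    a = [[eye[i][j] - gamma * P[i][j] * (1.0 - beta[j]) for j in range(n)]
         for i in range(n)]
    return solve_linear(a, eye)


def option_return(U: Matrix, r: List[float]) -> List[float]:
    """Expected discounted reward collected while the option runs, for each
    start state: one dot product of a row of U with the reward vector."""
    return [sum(u * ri for u, ri in zip(row, r)) for row in U]


if __name__ == "__main__":
    # Three-state chain: the option walks 0 -> 1 -> 2 and stops on entering 2.
    P = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
    beta = [0.0, 0.0, 1.0]
    U = universal_option_model(P, beta, gamma=0.9)  # built once per option
    # Different reward functions reuse the same U.
    print(option_return(U, [0.0, 1.0, 0.0]))  # [0.9, 1.0, 0.0]
    print(option_return(U, [1.0, 0.0, 0.0]))  # [1.0, 0.0, 0.0]
```

The linear solve is paid once per option. After that, each newly specified reward function costs only one matrix-vector product, and the option does not need to be re-simulated. Gauss-Jordan elimination is used here only to keep the sketch free of third-party dependencies.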

BibTeX key: YaoSzeSuMoBha14
entry type: inproceedings
booktitle: NIPS
year: 2014
month: September
pages: 990--998
pdf: papers/lamapi.pdf
date-modified: 2015-08-02 00:49:25 +0000
date-added: 2014-09-09 00:10:25 -0600

Cite this publication

@inproceedings{YaoSzeSuMoBha14,
  abstract = {We consider the problem of learning models of options for real-time abstract planning, in the setting where reward functions can be specified at any time and their expected returns must be efficiently computed. We introduce a new model for an option that is independent of any reward function, called the {\it universal option model (UOM)}. We prove that the UOM of an option can construct a traditional option model given a reward function, and the option-conditional return is computed directly by a single dot-product of the UOM with the reward function. We extend the UOM to linear function approximation, and we show it gives the TD solution of option returns and value functions of policies over options. We provide a stochastic approximation algorithm for incrementally learning UOMs from data and prove its consistency. We demonstrate our method in two domains. The first domain is document recommendation, where each user query defines a new reward function and a document's relevance is the expected return of a simulated random-walk through the document's references. The second domain is a real-time strategy game, where the controller must select the best game unit to accomplish dynamically-specified tasks. Our experiments show that UOMs are substantially more efficient in evaluating option returns and policies than previously known methods.},
  added-at = {2020-03-17T03:03:01.000+0100},
  author = {Yao, H. and Szepesv{\'a}ri, {Cs}. and Sutton, R.S. and Modayil, J. and Bhatnagar, S.},
  biburl = {https://www.bibsonomy.org/bibtex/298c952f57c6fd58f75d71f4146a6a741/csaba},
  booktitle = {NIPS},
  date-added = {2014-09-09 00:10:25 -0600},
  date-modified = {2015-08-02 00:49:25 +0000},
  interhash = {2e6599ca25953ef6b99fd8906c13529e},
  intrahash = {98c952f57c6fd58f75d71f4146a6a741},
  keywords = {Decision LSTD Markov Processes,function approximation, control control, difference learning, planning, reinforcement temporal},
  month = {September},
  pages = {990--998},
  pdf = {papers/lamapi.pdf},
  timestamp = {2020-03-17T03:03:01.000+0100},
  title = {Universal Option Models},
  year = 2014
}