Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Y. Wu, E. Mansimov, R. Grosse, S. Liao, and J. Ba. NIPS, pages 5279-5288 (2017)

Abstract

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also a method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods. Code is available at this https URL
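The abstract combines two ideas: a Kronecker-factored approximation to the curvature (K-FAC) and a trust-region limit on the size of each update. The sketch below illustrates both for a single linear layer. It is not the authors' code. The function name, the simple additive damping, the default values of `delta`, `eta_max`, and `damping`, and the particular step-size rule (shrink the step so the quadratic model of the policy change stays within `delta`) are all assumptions made for illustration.

```python
import numpy as np


def kfac_trust_region_step(grad_W, A, S, delta=0.001, eta_max=0.25, damping=1e-3):
    """Illustrative natural-gradient step for one linear layer s = W a.

    Assumes the curvature is approximated as kron(A, S), where
    A = E[a a^T] (layer inputs) and S = E[g g^T] (gradients w.r.t. outputs).
    All names and default values here are hypothetical.
    """
    # Simple additive damping keeps both factors invertible (an assumption).
    A_d = A + damping * np.eye(A.shape[0])
    S_d = S + damping * np.eye(S.shape[0])

    # Kronecker identity (column-stacked vec):
    #   kron(A, S)^{-1} vec(G) = vec(S^{-1} G A^{-1})
    # so only the two small factors are inverted, never the full matrix.
    nat_grad = np.linalg.solve(S_d, grad_W) @ np.linalg.inv(A_d)

    # Quadratic form vec(N)^T kron(A_d, S_d) vec(N); since S_d N A_d = G,
    # it reduces to the elementwise inner product of N and G.
    quad = float(np.sum(nat_grad * grad_W))

    # Trust-region rescaling: cap the step size at eta_max and shrink it
    # whenever 0.5 * eta^2 * quad would exceed delta.
    eta = eta_max if quad <= 0.0 else min(eta_max, float(np.sqrt(2.0 * delta / quad)))
    return eta * nat_grad, eta


# Toy data: 256 samples, 4 inputs, 3 outputs (synthetic, for illustration only).
rng = np.random.default_rng(0)
a = rng.normal(size=(256, 4))      # layer inputs
g = rng.normal(size=(256, 3))      # back-propagated output gradients
A = a.T @ a / len(a)               # input second-moment factor, shape (4, 4)
S = g.T @ g / len(g)               # gradient second-moment factor, shape (3, 3)
grad_W = g.T @ a / len(a)          # gradient w.r.t. W, shape (3, 4)

step, eta = kfac_trust_region_step(grad_W, A, S)
```

The point of the factorization is cost: for a layer with n_in inputs and n_out outputs, the full curvature matrix has (n_in * n_out)^2 entries, while the two factors have only n_in^2 and n_out^2.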

Links and resources

BibTeX key: wu2017acktr
entry type: inproceedings
booktitle: NIPS
year: 2017
pages: 5279-5288
crossref: conf/nips/2017
ee: http://papers.nips.cc/paper/7112-scalable-trust-region-method-for-deep-reinforcement-learning-using-kronecker-factored-approximation
url: http://dblp.uni-trier.de/db/conf/nips/nips2017.html#WuMGLB17

Cite this publication

@inproceedings{wu2017acktr,
  abstract = {In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also a method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods. Code is available at this https URL},
  added-at = {2019-12-16T18:30:30.000+0100},
  author = {Wu, Yuhuai and Mansimov, Elman and Grosse, Roger B. and Liao, Shun and Ba, Jimmy},
  biburl = {https://www.bibsonomy.org/bibtex/2e1da457c9a3a8004025dbc4730291c33/lanteunis},
  booktitle = {NIPS},
  crossref = {conf/nips/2017},
  editor = {Guyon, Isabelle and von Luxburg, Ulrike and Bengio, Samy and Wallach, Hanna M. and Fergus, Rob and Vishwanathan, S. V. N. and Garnett, Roman},
  ee = {http://papers.nips.cc/paper/7112-scalable-trust-region-method-for-deep-reinforcement-learning-using-kronecker-factored-approximation},
  interhash = {a90966c72041a33ba6f8a3b1bf6b4468},
  intrahash = {e1da457c9a3a8004025dbc4730291c33},
  keywords = {DRLAlgoComparison acktr reinforcement_learning},
  pages = {5279-5288},
  timestamp = {2019-12-29T16:29:37.000+0100},
  title = {Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation},
  url = {http://dblp.uni-trier.de/db/conf/nips/nips2017.html#WuMGLB17},
  year = 2017
}
