Author of the publication

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning.

ICML, volume 80 of Proceedings of Machine Learning Research, page 1573-1581. PMLR, (2018)

Other publications of authors with the same name

Regret bounds for restless Markov bandits. Theor. Comput. Sci., (2014)

Online Regret Bounds for Markov Decision Processes with Deterministic Transitions. ALT, volume 5254 of Lecture Notes in Computer Science, page 123-137. Springer, (2008)

Exploiting Similarity Information in Reinforcement Learning - Similarity Models for Multi-Armed Bandits and MDPs. ICAART (1), page 203-210. INSTICC Press, (2010)

Variational Regret Bounds for Reinforcement Learning. UAI, volume 115 of Proceedings of Machine Learning Research, page 81-90. AUAI Press, (2019)

Regret Bounds for Learning State Representations in Reinforcement Learning. NeurIPS, page 12717-12727. (2019)

Improved Rates for the Stochastic Continuum-Armed Bandit Problem. COLT, volume 4539 of Lecture Notes in Computer Science, page 454-468. Springer, (2007)

Pseudometrics for State Aggregation in Average Reward Markov Decision Processes. ALT, volume 4754 of Lecture Notes in Computer Science, page 373-387. Springer, (2007)

Online Regret Bounds for Undiscounted Continuous Reinforcement Learning. CoRR, (2013)

Pareto Front Identification from Stochastic Bandit Feedback. AISTATS, volume 51 of JMLR Workshop and Conference Proceedings, page 939-947. JMLR.org, (2016)

Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information. COLT, volume 99 of Proceedings of Machine Learning Research, page 159-163. PMLR, (2019)