This program aims to reunite researchers across disciplines that have played a role in developing the theory of reinforcement learning. It will review past developments and identify promising directions of research, with an emphasis on addressing existing open problems, ranging from the design of efficient, scalable algorithms for exploration to how to control learning and planning. It also aims to deepen the understanding of model-free vs. model-based learning and control, and the design of efficient methods to exploit structure and adapt to easier environments.
- Aug. 31 – Sep. 4, 2020
- Csaba Szepesvari (University of Alberta, Google DeepMind; chair), Emma Brunskill (Stanford University), Sébastien Bubeck (MSR), Alan Malek (DeepMind), Sean Meyn (University of Florida), Ambuj Tewari (University of Michigan), Mengdi Wang (Princeton)
- Sep. 28 – Oct. 2, 2020
- Lihong Li (Google Brain; chair), Marc G. Bellemare (Google Brain)
- The success of deep neural networks in modeling complicated functions has recently been applied by the reinforcement learning community, resulting in algorithms that are able to learn in environments previously thought to be much too large. Successful applications span domains from robotics to health care. However, the success is not well understood from a theoretical perspective. What are the modeling choices necessary for good performance, and how does the flexibility of deep neural nets help learning? This workshop will connect practitioners to theoreticians with the goal of understanding the most impactful modeling decisions and the properties of deep neural networks that make them so successful. Specifically, we will study the ability of deep neural nets to approximate in the context of reinforcement learning.
A paper by DeepMind scientist triggered much debate about the path to artificial intelligence. Here, we'll try to draw the line between theory and practice.
A research team from McGill University, Université de Montréal, DeepMind and Mila presents an end-to-end, model-based deep reinforcement learning (RL) agent that dynamically attends to relevant parts of its environments to facilitate out-of-distribution (OOD) and systematic generalization.
In this article, we will try to understand where On-Policy learning, Off-policy learning and offline learning algorithms fundamentally differ. Though there is a fair amount of intimidating jargon in…
This is a PyTorch implementation/tutorial of Deep Q Networks (DQN) from paper Playing Atari with Deep Reinforcement Learning. This includes dueling network architecture, a prioritized replay buffer and double-Q-network training.
In Q-Learning, we represent the Q-value as a table. However, in many real-world problems, there are enormous state and/or action spaces and tabular representation is insufficient. For instance…
When the agent interacts with the environment, the sequence of experienced tuples can be highly correlated. The naive Q-Learning algorithm that learns from each of these experience tuples in…
Hi Geeks, welcome to Part-3 of our Reinforcement Learning Series. In the last two blogs, we covered some basic concepts in RL and also studied the multi-armed bandit problem and its solution methods…
One of the key enablers of the ChatGPT magic can be traced back to 2017 under the obscure name of reinforcement learning with human feedback(RLHF).
Large language models(LLMs) have become one of the most interesting environments for applying modern reinforcement learning(RL) techniques. While LLMs are great at deriving knowledge from vast amounts of text, RL can help to translate that knowledge into actions. That has been the secret behind RLHF.
A. Slivkins. (2019)cite arxiv:1904.07272Comment: The manuscript is complete, but comments are very welcome! To be published with Foundations and Trends in Machine Learning.
K. Lee, K. Lee, J. Shin, and H. Lee. (2019)cite arxiv:1910.05396Comment: Accepted in ICLR 2020 and NeurIPS Workshop on Deep RL 2019 / First two authors are equally contributed.
C. Lyle, P. Castro, and M. Bellemare. (2019)cite arxiv:1901.11084Comment: To appear in the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.