In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge of the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a variational auto-encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.
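A minimal sketch of the non-parametric spatial memory idea: latent codes are stored keyed by the agent's low-dimensional pose, and a query at a new pose retrieves the code stored at the nearest visited location. The class name, the nearest-neighbour retrieval rule, and the stand-in codes are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

class SpatialMemory:
    """Illustrative non-parametric memory: latent codes keyed by 2-D pose."""
    def __init__(self):
        self.poses = []   # low-dimensional agent positions visited so far
        self.codes = []   # learned latent codes (here: hand-picked stand-ins)

    def write(self, pose, code):
        self.poses.append(np.asarray(pose, dtype=float))
        self.codes.append(np.asarray(code, dtype=float))

    def read(self, pose):
        # Retrieve the code stored at the nearest visited pose.
        dists = [np.linalg.norm(p - pose) for p in self.poses]
        return self.codes[int(np.argmin(dists))]

mem = SpatialMemory()
mem.write([0.0, 0.0], [1.0, 0.0])
mem.write([5.0, 5.0], [0.0, 1.0])
retrieved = mem.read(np.array([0.2, -0.1]))   # query closest to pose (0, 0)
```

In the paper the stored representations are learned and the pose itself is inferred by a state-space model; the numpy lookup above only conveys the indexing-by-position idea.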
The score function estimator is widely used for estimating gradients of stochastic objectives in stochastic computation graphs (SCGs), e.g. in reinforcement learning and meta-learning. While deriving the first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order gradients is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order of gradient involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for higher-order gradient estimators. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and through numerical evaluation of the DiCE gradient estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://goo.gl/xkkgxn.
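As a numerical illustration of the score function estimator the abstract builds on (a generic first-order REINFORCE example, not DiCE itself), the sketch below estimates the gradient of E[f(x)] for x ~ Bernoulli(theta) with f(x) = x, whose true gradient with respect to theta is 1. The parameter value and sample count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3                                   # Bernoulli parameter
n = 200_000                                   # Monte Carlo sample count

x = (rng.random(n) < theta).astype(float)     # samples x ~ Bern(theta)
cost = x                                      # f(x) = x, so E[f] = theta

# Score: d/dtheta log p(x; theta) = x/theta - (1 - x)/(1 - theta)
score = x / theta - (1 - x) / (1 - theta)

# Score function (REINFORCE) estimator: E[f(x) * score] = d/dtheta E[f] = 1
grad_estimate = np.mean(cost * score)
```

Differentiating the corresponding surrogate loss a second time is where the terms go missing, which is the failure mode DiCE's single repeatedly-differentiable objective is designed to fix.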
Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks. Furthermore, we propose a neural network-based permutation-invariant aggregation operator that corresponds to the attention mechanism. Notably, an application of the proposed attention-based operator provides insight into the contribution of each instance to the bag label. We show empirically that our approach achieves comparable performance to the best MIL methods on benchmark MIL datasets, and that it outperforms other methods on an MNIST-based MIL dataset and two real-life histopathology datasets without sacrificing interpretability.
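A sketch of attention-based MIL pooling in the spirit described above: per-instance weights a_k = softmax(w^T tanh(V h_k)) and bag embedding z = sum_k a_k h_k. The dimensions and the randomly initialised parameters are placeholder assumptions; in the paper V and w are trained end-to-end with the bag classifier.

```python
import numpy as np

rng = np.random.default_rng(1)

K, D, L = 5, 8, 4              # instances per bag, embedding dim, attention dim
H = rng.normal(size=(K, D))    # instance embeddings h_1..h_K (stand-ins)
V = rng.normal(size=(L, D))    # attention parameters (untrained here)
w = rng.normal(size=L)

scores = w @ np.tanh(V @ H.T)          # one unnormalised score per instance
a = np.exp(scores - scores.max())
a /= a.sum()                           # softmax: non-negative weights summing to 1
z = a @ H                              # bag embedding: attention-weighted sum
```

Because the weighted sum is invariant to reordering the rows of H, the operator is permutation-invariant, and the weights a_k directly expose each instance's contribution to the bag label.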
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.
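A minimal sketch of the graph-neural-network decoding step used by models of this kind: node-to-edge messages computed for each interacting pair, then edge-to-node aggregation to predict the next state. The adjacency matrix is fixed by hand here, whereas in NRI it is the sampled latent interaction graph; the single-linear-layer "MLPs" and all dimensions are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

N, D = 4, 3                          # particles, feature dim (e.g. position/velocity)
x = rng.normal(size=(N, D))          # current node states
adj = np.array([[0, 1, 1, 0],        # interaction graph (assumed fixed here;
                [1, 0, 0, 1],        # in NRI it comes from the inferred latent code)
                [1, 0, 0, 1],
                [0, 1, 1, 0]])

W_edge = rng.normal(size=(2 * D, D)) # edge "MLP" stand-in (one linear layer + tanh)
W_node = rng.normal(size=(D, D))     # node-update stand-in

# Node -> edge: compute a message for every interacting pair (i, j).
msgs = np.zeros((N, N, D))
for i in range(N):
    for j in range(N):
        if adj[i, j]:
            msgs[i, j] = np.tanh(np.concatenate([x[i], x[j]]) @ W_edge)

# Edge -> node: aggregate incoming messages, then update each node's state.
agg = msgs.sum(axis=0)               # sum of messages arriving at each node
x_next = np.tanh(agg @ W_node)       # predicted next state for all nodes
```

In the full model the encoder runs similar message passing in the other direction to infer edge types, and the VAE objective ties the two halves together.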
A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities.
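A toy sketch of plan-by-gradient-descent: unroll a forward model in a latent space, measure distance to the goal at the final step, and descend that loss with respect to the action sequence. Here the forward model is an assumed hand-written linear system with an analytic backward pass, standing in for the learned latent dynamics and autodiff used by UPNs; all matrices and hyperparameters are illustrative.

```python
import numpy as np

A = np.array([[1.0, 0.1],            # latent dynamics z' = A z + B a
              [0.0, 1.0]])           # (assumed linear; learned in the paper)
B = np.array([[0.0], [0.1]])
z0 = np.zeros(2)
goal = np.array([1.0, 0.0])          # target latent state
T = 20
actions = np.zeros((T, 1))           # plan to be optimised

def rollout(actions):
    z = z0
    for a in actions:
        z = A @ z + B @ a
    return z

lr = 0.5
for _ in range(200):
    # Forward rollout, then analytic backprop through z_{t+1} = A z_t + B a_t.
    zs = [z0]
    for a in actions:
        zs.append(A @ zs[-1] + B @ a)
    grad_z = 2.0 * (zs[-1] - goal)            # dL/dz_T for L = ||z_T - goal||^2
    for t in reversed(range(T)):
        actions[t] -= lr * (B.T @ grad_z)     # dL/da_t = B^T dL/dz_{t+1}
        grad_z = A.T @ grad_z                 # propagate gradient one step back

final_err = np.linalg.norm(rollout(actions) - goal)
```

In a UPN this inner gradient loop is itself differentiated so that the imitation loss shapes the latent space in which planning happens; the toy above only shows the inner trajectory-optimization step.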
Today I successfully submitted my first paper to arXiv! We've submitted this paper to a journal, but it hasn't been published yet, so we wanted to get a pre-print up before advertising the corresponding software packages. Unfortunately, the process of submitting to arXiv wasn't painless. Now that I've figured out some of the quirks, however, hopefully your…
S. Laine and T. Karras. In Proceedings of the 2010 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pages 55–63, New York, NY, USA, 2010. ACM.