Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling.

, and . COLT, volume 178 of Proceedings of Machine Learning Research, pp. 2776-2814. PMLR, (2022)

Other publications by persons with the same name

Fast global convergence of gradient methods for solving regularized M-estimation., , and . SSP, pp. 409-412. IEEE, (2012)

Leveraging User-Triggered Supervision in Contextual Bandits., , and . CoRR, (2023)

Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics., , , , and . ICLR, OpenReview.net, (2022)

Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions., , and . NIPS, pp. 1547-1555. (2012)

Stochastic Gradient Succeeds for Bandits., , , , , and . ICML, volume 202 of Proceedings of Machine Learning Research, pp. 24325-24360. PMLR, (2023)

Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal., , and . COLT, volume 125 of Proceedings of Machine Learning Research, pp. 67-83. PMLR, (2020)

The Non-linear F-Design and Applications to Interactive Learning., , , and . ICML, OpenReview.net, (2024)

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches, , , , and . (2018) cite arxiv:1811.08540 Comment: COLT 2019.

Off-Policy Policy Gradient with State Distribution Correction., , , and . CoRR, (2019)

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback., , , , and . ICML, volume 97 of Proceedings of Machine Learning Research, pp. 7335-7344. PMLR, (2019)