Author of the publication

Conditional Language Policy: A General Framework For Steerable Multi-Objective Finetuning.

EMNLP (Findings), pages 2153-2186. Association for Computational Linguistics, (2024)


Other publications of authors with the same name

Fast global convergence of gradient methods for solving regularized M-estimation. SSP, pages 409-412. IEEE, (2012)
Leveraging User-Triggered Supervision in Contextual Bandits. CoRR, (2023)
Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions. NIPS, pages 1547-1555. (2012)
Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics. ICLR, OpenReview.net, (2022)
Stochastic Gradient Succeeds for Bandits. ICML, volume 202 of Proceedings of Machine Learning Research, pages 24325-24360. PMLR, (2023)
Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal. COLT, volume 125 of Proceedings of Machine Learning Research, pages 67-83. PMLR, (2020)
The Non-linear F-Design and Applications to Interactive Learning. ICML, OpenReview.net, (2024)
Off-Policy Policy Gradient with State Distribution Correction. CoRR, (2019)
Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches. (2018). arXiv:1811.08540. Comment: COLT 2019.
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. ICML, pages 1129-1136. Omnipress, (2011)