Article

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs.

A. Ahmadian, C. Cremer, M. Gallé, M. Fadaee, J. Kreutzer, O. Pietquin, A. Üstün, and S. Hooker.
CoRR, (2024)

Metadata

BibTeX key: journals/corr/abs-2402-14740
entry type: article
year: 2024
journal: CoRR
volume: abs/2402.14740
ee: https://doi.org/10.48550/arXiv.2402.14740
url: http://dblp.uni-trier.de/db/journals/corr/corr2402.html#abs-2402-14740

Tags

dblp
