Misc

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. Manning, and C. Finn.
(2023)

Metadata

BibTeX key: rafailov2023direct
entry type: misc
year: 2023
eprint: 2305.18290
archiveprefix: arXiv
primaryclass: cs.LG

