Description

[2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
