tag :: llms OpenAI | BibSonomy

bookmarks (2)

The Magic Behind ChatGPT: Reinforcement Learning with Human Feedback
One of the key enablers of the ChatGPT magic can be traced back to 2017 under the obscure name of reinforcement learning with human feedback (RLHF). Large language models (LLMs) have become one of the most interesting environments for applying modern reinforcement learning (RL) techniques. While LLMs are great at deriving knowledge from vast amounts of text, RL can help to translate that knowledge into actions. That has been the secret behind RLHF.
11 months ago by @ghagerer
tags: chatgpt, llms, openai, reinforcement-learning
Google "We Have No Moat, And Neither Does OpenAI"
We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be? But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch. I’m talking, of course, about open source. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today.
a year ago by @ghagerer
tags: LLMs, google, open-source, openai


publications (3)

Improving language understanding by generative pre-training
A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. (2018)
10 months ago by @ghagerer
tags: chatgpt, codefreeze, llms, openai, reinforcement-learning
Fine-Tuning Language Models from Human Preferences.
D. Ziegler, N. Stiennon, J. Wu, T. Brown, A. Radford, D. Amodei, P. Christiano, and G. Irving. CoRR, (2019)
10 months ago by @ghagerer
tags: ChatGPT, OpenAI, codefreeze, llms, reinforcement-learning
Learning to summarize from human feedback.
N. Stiennon, L. Ouyang, J. Wu, D. Ziegler, R. Lowe, C. Voss, A. Radford, D. Amodei, and P. Christiano. CoRR, (2020)
11 months ago by @ghagerer
tags: ChatGPT, OpenAI, abstractive, llms, reinforcement-learning, summarization
