@techreport{chen2021evaluating,
abstract = {We introduce Codex, a GPT language model fine-tuned on publicly available
code from GitHub, and study its Python code-writing capabilities. A distinct
production version of Codex powers GitHub Copilot. On HumanEval, a new
evaluation set we release to measure functional correctness for synthesizing
programs from docstrings, our model solves 28.8% of the problems, while GPT-3
solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling
from the model is a surprisingly effective strategy for producing working
solutions to difficult prompts. Using this method, we solve 70.2% of our
problems with 100 samples per problem. Careful investigation of our model
reveals its limitations, including difficulty with docstrings describing long
chains of operations and with binding operations to variables. Finally, we
discuss the potential broader impacts of deploying powerful code generation
technologies, covering safety, security, and economics.},
author = {Chen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and Pinto, Henrique Ponde de Oliveira and Kaplan, Jared and Edwards, Harri and Burda, Yuri and Joseph, Nicholas and Brockman, Greg and Ray, Alex and Puri, Raul and Krueger, Gretchen and Petrov, Michael and Khlaaf, Heidy and Sastry, Girish and Mishkin, Pamela and Chan, Brooke and Gray, Scott and Ryder, Nick and Pavlov, Mikhail and Power, Alethea and Kaiser, Lukasz and Bavarian, Mohammad and Winter, Clemens and Tillet, Philippe and Such, Felipe Petroski and Cummings, Dave and Plappert, Matthias and Chantzis, Fotios and Barnes, Elizabeth and Herbert-Voss, Ariel and Guss, William Hebgen and Nichol, Alex and Paino, Alex and Tezak, Nikolas and Tang, Jie and Babuschkin, Igor and Balaji, Suchir and Jain, Shantanu and Saunders, William and Hesse, Christopher and Carr, Andrew N. and Leike, Jan and Achiam, Josh and Misra, Vedant and Morikawa, Evan and Radford, Alec and Knight, Matthew and Brundage, Miles and Murati, Mira and Mayer, Katie and Welinder, Peter and McGrew, Bob and Amodei, Dario and McCandlish, Sam and Sutskever, Ilya and Zaremba, Wojciech},
institution = {arXiv},
keywords = {llm programming},
number = {arXiv:cs.LG/2107.03374},
title = {Evaluating Large Language Models Trained on Code},
url = {http://arxiv.org/abs/2107.03374},
year = 2021
}