We continue the investigation into the power of smaller Transformer-based
language models initiated by TinyStories, a 10 million parameter model
that can produce coherent English, and the follow-up work on phi-1, a
1.3 billion parameter model with Python coding performance close to the
state of the art. The latter work proposed using existing Large Language
Models (LLMs) to generate "textbook quality" data as a way to enhance the
learning process compared to traditional web data. We follow the
"Textbooks Are All You Need" approach, focusing this time on common sense
reasoning in natural language, and create a new 1.3 billion parameter
model named phi-1.5, with performance on natural language tasks
comparable to models 5x larger, and surpassing most non-frontier LLMs on
more complex reasoning tasks such as grade-school mathematics and basic
coding. More generally, phi-1.5 exhibits many of the traits of much
larger LLMs, both good, such as the ability to "think step by step" or
perform some rudimentary in-context learning, and bad, including
hallucinations and the potential for toxic and biased generations.
Encouragingly, though, we see improvement on that front thanks to the
absence of web data. We open-source phi-1.5 to promote further research
on these urgent topics.
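
Since the model is open-sourced, the sketch below shows one plausible way
to load phi-1.5 and elicit the "think step by step" behavior the abstract
describes. It is a minimal sketch, assuming the Hugging Face transformers
library; the model identifier "microsoft/phi-1_5" and the sample prompt
are assumptions, not details stated in this entry.

# Minimal sketch (assumption: the checkpoint is published on the Hugging
# Face Hub as "microsoft/phi-1_5"; this entry does not state the repo name).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# A grade-school math question with a "think step by step" cue, mirroring
# the reasoning traits the abstract attributes to phi-1.5.
prompt = ("Question: Alice has 3 apples and buys 2 bags of 4 apples each. "
          "How many apples does she have?\nLet's think step by step.")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Greedy decoding is used by default here; sampling parameters are omitted
to keep the sketch small.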
Description
Textbooks Are All You Need II: phi-1.5 technical report
@misc{li2023textbooks,
abstract = {We continue the investigation into the power of smaller Transformer-based
language models as initiated by \textbf{TinyStories} -- a 10 million parameter
model that can produce coherent English -- and the follow-up work on
\textbf{phi-1}, a 1.3 billion parameter model with Python coding performance
close to the state-of-the-art. The latter work proposed to use existing Large
Language Models (LLMs) to generate ``textbook quality'' data as a way to enhance
the learning process compared to traditional web data. We follow the
``Textbooks Are All You Need'' approach, focusing this time on common sense
reasoning in natural language, and create a new 1.3 billion parameter model
named \textbf{phi-1.5}, with performance on natural language tasks comparable
to models 5x larger, and surpassing most non-frontier LLMs on more complex
reasoning tasks such as grade-school mathematics and basic coding. More
generally, \textbf{phi-1.5} exhibits many of the traits of much larger LLMs,
both good -- such as the ability to ``think step by step'' or perform some
rudimentary in-context learning -- and bad, including hallucinations and the
potential for toxic and biased generations -- encouragingly though, we are
seeing improvement on that front thanks to the absence of web data. We
open-source \textbf{phi-1.5} to promote further research on these urgent
topics.},
author = {Li, Yuanzhi and Bubeck, Sébastien and Eldan, Ronen and Del Giorno, Allie and Gunasekar, Suriya and Lee, Yin Tat},
keywords = {llm},
note = {arXiv:2309.05463},
title = {Textbooks Are All You Need II: phi-1.5 technical report},
url = {http://arxiv.org/abs/2309.05463},
year = 2023
}