Abstract
Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to run them efficiently on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of Transformer-based models. By leveraging this new KD method, the rich knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT. We then introduce a new two-stage learning framework for TinyBERT, which performs Transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and the task-specific knowledge in BERT. TinyBERT4, with 4 layers, is empirically effective and achieves more than 96.8% of the performance of its teacher BERT-Base on the GLUE benchmark, while being 7.5x smaller and 9.4x faster at inference. TinyBERT4 is also significantly better than 4-layer state-of-the-art baselines on BERT distillation, with only ~28% of their parameters and ~31% of their inference time. Moreover, TinyBERT6, with 6 layers, performs on par with its teacher BERT-Base.
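To make the Transformer distillation idea concrete, below is a minimal, hypothetical PyTorch sketch of the kinds of layer-wise objectives such a method can combine: MSE between teacher and student attention matrices, MSE between (linearly projected) hidden states, and a soft cross-entropy on the prediction layer. The function name `transformer_distillation_loss`, the dict-based inputs, the `proj` projection, the uniform layer mapping, and the `temperature` parameter are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F


def transformer_distillation_loss(student_out, teacher_out, proj, temperature=1.0):
    """Combine attention-based, hidden-state-based, and prediction-layer
    distillation losses between a deep teacher and a shallow student.

    `student_out` / `teacher_out` are dicts holding per-layer 'attentions'
    (lists of attention matrices), per-layer 'hidden_states', and 'logits'.
    `proj` is a learned linear map from the student hidden size to the
    teacher hidden size (assumed; needed when the student is narrower).
    """
    n_s = len(student_out["attentions"])
    n_t = len(teacher_out["attentions"])
    stride = n_t // n_s  # uniform layer mapping: student layer i <- teacher layer (i + 1) * stride

    # Attention-based distillation: match attention matrices of mapped layers.
    attn_loss = sum(
        F.mse_loss(student_out["attentions"][i],
                   teacher_out["attentions"][(i + 1) * stride - 1])
        for i in range(n_s)
    )

    # Hidden-state-based distillation: match projected student hidden states.
    hidden_loss = sum(
        F.mse_loss(proj(student_out["hidden_states"][i]),
                   teacher_out["hidden_states"][(i + 1) * stride - 1])
        for i in range(n_s)
    )

    # Prediction-layer distillation: soft cross-entropy on temperature-scaled logits.
    soft_targets = F.softmax(teacher_out["logits"] / temperature, dim=-1)
    log_probs = F.log_softmax(student_out["logits"] / temperature, dim=-1)
    pred_loss = -(soft_targets * log_probs).sum(dim=-1).mean()

    return attn_loss + hidden_loss + pred_loss
```

In the two-stage framework described above, such a loss would be applied once during general distillation on a large unlabeled corpus and again during task-specific distillation on (possibly augmented) task data.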