GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. Bowman.
(2018). arXiv:1804.07461. Comment: https://gluebenchmark.com/

Abstract

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.

BibTeX key: wang2018multitask
entry type: misc
year: 2018
url: http://arxiv.org/abs/1804.07461
note: cite arxiv:1804.07461. Comment: https://gluebenchmark.com/

Comments and Reviews

@michan 3 years ago
The GLUE benchmark is mentioned in the write-up because KEPLER's performance on GLUE is an important indicator: it shows that, despite its KE training, KEPLER still performs very similarly to the base model RoBERTa on NLP tasks.