Abstract
Current state-of-the-art approaches for named entity recognition (NER) using
BERT-style transformers typically follow one of two approaches: (1) The
first fine-tunes the transformer itself on the NER task and adds only a simple
linear layer for word-level predictions. (2) The second uses the transformer
only to provide features to a standard LSTM-CRF sequence labeling architecture
and thus performs no fine-tuning. In this paper, we perform a comparative
analysis of both approaches in a variety of settings currently considered in
the literature. In particular, we evaluate how well they work when
document-level features are leveraged. Our evaluation on the classic CoNLL
benchmark datasets for four languages shows that document-level features
significantly improve NER quality and that fine-tuning generally outperforms
the feature-based approach. We present recommendations for parameters as well
as several new state-of-the-art numbers. Our approach is integrated into the
Flair framework to facilitate reproduction of our experiments.