Annotating and Recognising Named Entities in Clinical Notes
Y. Wang. Proceedings of the ACL-IJCNLP 2009 Student Research Workshop, pages 18-26. (August 2009)
Abstract
This paper presents ongoing research in
clinical information extraction. This work
introduces a new genre of text which is
not well written, noise-prone, ungrammatical,
and has much cryptic content. A corpus
of clinical progress notes drawn from
an Intensive Care Service has been manually
annotated with more than 15000 clinical
named entities in 11 entity types. This
paper reports on the challenges involved in
creating the annotation schema, and recognising
and annotating clinical named entities.
The information extraction task has
initially used two approaches: a rule-based
system and a machine learning system
using Conditional Random Fields (CRF).
Different features are investigated to assess
the interaction of feature sets and the
supervised learning approaches to establish
the combination best suited to this
data set. The rule-based and CRF systems
achieved an F-score of 64.12% and
81.48% respectively.
%0 Conference Paper
%1 ER.clinical.2009
%A Wang, Yefeng
%B Proceedings of the ACL-IJCNLP 2009 Student Research Workshop
%D 2009
%K CAT CAT-CLINICAL CAT-NER ER clinical notes
%P 18-26
%T Annotating and Recognising Named Entities in Clinical Notes
%U http://www.aclweb.org/anthology/P/P09/P09-3003.pdf
%X This paper presents ongoing research in
clinical information extraction. This work
introduces a new genre of text which is
not well written, noise-prone, ungrammatical,
and has much cryptic content. A corpus
of clinical progress notes drawn from
an Intensive Care Service has been manually
annotated with more than 15000 clinical
named entities in 11 entity types. This
paper reports on the challenges involved in
creating the annotation schema, and recognising
and annotating clinical named entities.
The information extraction task has
initially used two approaches: a rule-based
system and a machine learning system
using Conditional Random Fields (CRF).
Different features are investigated to assess
the interaction of feature sets and the
supervised learning approaches to establish
the combination best suited to this
data set. The rule-based and CRF systems
achieved an F-score of 64.12% and
81.48% respectively.
@inproceedings{ER.clinical.2009,
abstract = {This paper presents ongoing research in
clinical information extraction. This work
introduces a new genre of text which is
not well written, noise-prone, ungrammatical,
and has much cryptic content. A corpus
of clinical progress notes drawn from
an Intensive Care Service has been manually
annotated with more than 15000 clinical
named entities in 11 entity types. This
paper reports on the challenges involved in
creating the annotation schema, and recognising
and annotating clinical named entities.
The information extraction task has
initially used two approaches: a rule-based
system and a machine learning system
using Conditional Random Fields (CRF).
Different features are investigated to assess
the interaction of feature sets and the
supervised learning approaches to establish
the combination best suited to this
data set. The rule-based and CRF systems
achieved an F-score of 64.12\% and
81.48\% respectively.},
added-at = {2010-04-12T04:01:42.000+0200},
author = {Wang, Yefeng},
biburl = {https://www.bibsonomy.org/bibtex/217b1d25e53bf1d8aafd59ade67df1a19/huiyangsfsu},
booktitle = {Proceedings of the ACL-IJCNLP 2009 Student Research Workshop},
interhash = {86e62ef2ccfbe72ae9ac5783e6793c23},
intrahash = {17b1d25e53bf1d8aafd59ade67df1a19},
keywords = {CAT CAT-CLINICAL CAT-NER ER clinical notes},
month = {August},
pages = {18--26},
timestamp = {2010-11-12T04:53:43.000+0100},
title = {Annotating and Recognising Named Entities in Clinical Notes},
url = {http://www.aclweb.org/anthology/P/P09/P09-3003.pdf},
year = 2009
}