Cheap and Fast---but is It Good?: Evaluating Non-expert Annotations for Natural Language Tasks
R. Snow, B. O'Connor, D. Jurafsky, and A. Ng. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 254--263. Stroudsburg, PA, USA, Association for Computational Linguistics, (2008)
Abstract
Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out in this method at a fraction of the usual expense.
%0 Conference Paper
%1 snow2008cheap
%A Snow, Rion
%A O'Connor, Brendan
%A Jurafsky, Daniel
%A Ng, Andrew Y.
%B Proceedings of the Conference on Empirical Methods in Natural Language Processing
%C Stroudsburg, PA, USA
%D 2008
%I Association for Computational Linguistics
%K crowdsourcing mturk quality
%P 254--263
%T Cheap and Fast---but is It Good?: Evaluating Non-expert Annotations for Natural Language Tasks
%U http://dl.acm.org/citation.cfm?id=1613715.1613751
%X Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out in this method at a fraction of the usual expense.
@inproceedings{snow2008cheap,
abstract = {Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out in this method at a fraction of the usual expense.},
acmid = {1613751},
address = {Stroudsburg, PA, USA},
author = {Snow, Rion and O'Connor, Brendan and Jurafsky, Daniel and Ng, Andrew Y.},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing},
keywords = {crowdsourcing mturk quality},
location = {Honolulu, Hawaii},
numpages = {10},
pages = {254--263},
publisher = {Association for Computational Linguistics},
series = {EMNLP '08},
title = {Cheap and Fast---but is It Good?: Evaluating Non-expert Annotations for Natural Language Tasks},
url = {http://dl.acm.org/citation.cfm?id=1613715.1613751},
year = 2008
}