Automatically creating datasets for measures of semantic relatedness
T. Zesch and I. Gurevych. Proceedings of the Workshop on Linguistic Distances, pages 16--24. Stroudsburg, PA, USA, Association for Computational Linguistics (2006)
Abstract
Semantic relatedness is a special form of linguistic distance between words. Evaluating semantic relatedness measures is usually performed by comparison with human judgments. Previous test datasets had been created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, test datasets cover all types of lexical-semantic relations and contain domain-specific words naturally occurring in texts.
%0 Conference Paper
%1 Zesch:2006:ACD:1641976.1641980
%A Zesch, Torsten
%A Gurevych, Iryna
%B Proceedings of the Workshop on Linguistic Distances
%C Stroudsburg, PA, USA
%D 2006
%I Association for Computational Linguistics
%K automatic discovery documents similarity
%P 16--24
%T Automatically creating datasets for measures of semantic relatedness
%U http://dl.acm.org/citation.cfm?id=1641976.1641980
%X Semantic relatedness is a special form of linguistic distance between words. Evaluating semantic relatedness measures is usually performed by comparison with human judgments. Previous test datasets had been created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, test datasets cover all types of lexical-semantic relations and contain domain-specific words naturally occurring in texts.
%@ 1-932432-83-3
@inproceedings{Zesch:2006:ACD:1641976.1641980,
abstract = {Semantic relatedness is a special form of linguistic distance between words. Evaluating semantic relatedness measures is usually performed by comparison with human judgments. Previous test datasets had been created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, test datasets cover all types of lexical-semantic relations and contain domain-specific words naturally occurring in texts.},
acmid = {1641980},
added-at = {2012-12-14T10:12:50.000+0100},
address = {Stroudsburg, PA, USA},
author = {Zesch, Torsten and Gurevych, Iryna},
biburl = {https://www.bibsonomy.org/bibtex/26e3ec656c3367d3c6231bf19adc39bfa/martin-zenker},
booktitle = {Proceedings of the Workshop on Linguistic Distances},
description = {Automatically creating datasets for measures of semantic relatedness},
interhash = {d50d84a32203397fdcadcc3d52c2dc2a},
intrahash = {6e3ec656c3367d3c6231bf19adc39bfa},
isbn = {1-932432-83-3},
keywords = {automatic discovery documents similarity},
location = {Sydney, Australia},
numpages = {9},
pages = {16--24},
publisher = {Association for Computational Linguistics},
series = {LD '06},
timestamp = {2012-12-14T10:12:50.000+0100},
title = {Automatically creating datasets for measures of semantic relatedness},
url = {http://dl.acm.org/citation.cfm?id=1641976.1641980},
year = 2006
}