Distance Based clustering of Semantic Web Resources
G. Grimnes, P. Edwards, and A. Preece. Proceedings of the 5th European Semantic Web Conference, Berlin, Heidelberg, Springer Verlag, (June 2008)
Abstract
The original Semantic Web vision was explicit in the need for intelligent autonomous agents that would represent users and help them navigate the Semantic Web. We argue that an essential feature for such agents is the capability to analyse data and learn. In this paper we outline the challenges and issues surrounding the application of clustering algorithms to Semantic Web data. We present several ways to extract instances from a large RDF graph and to compute the distance between them. We evaluate our approaches on three different data-sets, one representing a typical relational database to RDF conversion, one based on data from an ontologically rich Semantic Web enabled application, and one consisting of a crawl of FOAF documents; applying both supervised and unsupervised evaluation metrics. Our evaluation did not support choosing a single combination of instance extraction method and similarity metric as superior in all cases, and as expected the behaviour depends greatly on the data being clustered. Instead, we attempt to identify characteristics of data that make particular methods more suitable.
%0 Conference Paper
%1 grimnes2008distance
%A Grimnes, Gunnar
%A Edwards, Peter
%A Preece, Alun
%B Proceedings of the 5th European Semantic Web Conference
%C Berlin, Heidelberg
%D 2008
%E Hauswirth, Manfred
%E Koubarakis, Manolis
%E Bechhofer, Sean
%I Springer Verlag
%K rdf measure clustering distance learning
%T Distance Based clustering of Semantic Web Resources
%U http://data.semanticweb.org/conference/eswc/2008/papers/246
%X The original Semantic Web vision was explicit in the need for intelligent autonomous agents that would represent users and help them navigate the Semantic Web. We argue that an essential feature for such agents is the capability to analyse data and learn. In this paper we outline the challenges and issues surrounding the application of clustering algorithms to Semantic Web data. We present several ways to extract instances from a large RDF graph and to compute the distance between them. We evaluate our approaches on three different data-sets, one representing a typical relational database to RDF conversion, one based on data from an ontologically rich Semantic Web enabled application, and one consisting of a crawl of FOAF documents; applying both supervised and unsupervised evaluation metrics. Our evaluation did not support choosing a single combination of instance extraction method and similarity metric as superior in all cases, and as expected the behaviour depends greatly on the data being clustered. Instead, we attempt to identify characteristics of data that make particular methods more suitable.
@inproceedings{grimnes2008distance,
abstract = {The original Semantic Web vision was explicit in the need for intelligent autonomous agents that would represent users and help them navigate the Semantic Web. We argue that an essential feature for such agents is the capability to analyse data and learn. In this paper we outline the challenges and issues surrounding the application of clustering algorithms to Semantic Web data. We present several ways to extract instances from a large RDF graph and to compute the distance between them. We evaluate our approaches on three different data-sets, one representing a typical relational database to RDF conversion, one based on data from an ontologically rich Semantic Web enabled application, and one consisting of a crawl of FOAF documents; applying both supervised and unsupervised evaluation metrics. Our evaluation did not support choosing a single combination of instance extraction method and similarity metric as superior in all cases, and as expected the behaviour depends greatly on the data being clustered. Instead, we attempt to identify characteristics of data that make particular methods more suitable.},
added-at = {2008-05-28T14:50:01.000+0200},
address = {Berlin, Heidelberg},
author = {Grimnes, Gunnar and Edwards, Peter and Preece, Alun},
biburl = {https://www.bibsonomy.org/bibtex/21a97459b80d2cac3fd8b935452fe0418/eswc2008},
booktitle = {Proceedings of the 5th European Semantic Web Conference},
editor = {Hauswirth, Manfred and Koubarakis, Manolis and Bechhofer, Sean},
interhash = {ca0a3d3694050b29f6b68ba5242a22fb},
intrahash = {1a97459b80d2cac3fd8b935452fe0418},
keywords = {rdf measure clustering distance learning},
month = jun,
publisher = {Springer Verlag},
series = {LNCS},
timestamp = {2008-05-28T14:50:01.000+0200},
title = {Distance Based clustering of Semantic Web Resources},
url = {http://data.semanticweb.org/conference/eswc/2008/papers/246},
year = 2008
}