C. Wartena and R. Brussee. DEXA Workshops, pages 54-58. IEEE Computer Society, (2008)
Abstract
We consider topic detection without any prior knowledge
of category structure or possible categories. Keywords are
extracted and clustered based on different similarity measures
using the induced k-bisecting clustering algorithm.
Evaluation on Wikipedia articles shows that clusters of keywords
correlate strongly with the Wikipedia categories of
the articles. In addition, we find that a distance measure
based on the Jensen-Shannon divergence of probability distributions
outperforms the cosine similarity. In particular,
a newly proposed term distribution taking co-occurrence of
terms into account gives best results.
%0 Conference Paper
%1 DBLP:conf/dexaw/WartenaB08
%A Wartena, Christian
%A Brussee, Rogier
%B DEXA Workshops
%D 2008
%I IEEE Computer Society
%K clustering con_ocurance tag topic
%P 54-58
%T Topic Detection by Clustering Keywords
%X We consider topic detection without any prior knowledge
of category structure or possible categories. Keywords are
extracted and clustered based on different similarity measures
using the induced k-bisecting clustering algorithm.
Evaluation on Wikipedia articles shows that clusters of keywords
correlate strongly with the Wikipedia categories of
the articles. In addition, we find that a distance measure
based on the Jensen-Shannon divergence of probability distributions
outperforms the cosine similarity. In particular,
a newly proposed term distribution taking co-occurrence of
terms into account gives best results.
%@ 978-0-7695-3299-8
@inproceedings{DBLP:conf/dexaw/WartenaB08,
abstract = {We consider topic detection without any prior knowledge
of category structure or possible categories. Keywords are
extracted and clustered based on different similarity measures
using the induced k-bisecting clustering algorithm.
Evaluation on Wikipedia articles shows that clusters of keywords
correlate strongly with the Wikipedia categories of
the articles. In addition, we find that a distance measure
based on the Jensen-Shannon divergence of probability distributions
outperforms the cosine similarity. In particular,
a newly proposed term distribution taking co-occurrence of
terms into account gives best results.},
added-at = {2014-03-04T07:10:18.000+0100},
author = {Wartena, Christian and Brussee, Rogier},
bibsource = {DBLP, http://dblp.uni-trier.de},
biburl = {https://www.bibsonomy.org/bibtex/2848c358fab6dae7f498b0a391cc72211/inmantang},
booktitle = {DEXA Workshops},
crossref = {DBLP:conf/dexaw/2008},
ee = {http://doi.ieeecomputersociety.org/10.1109/DEXA.2008.120},
interhash = {c506dc044945fef757211c9eafdd932e},
intrahash = {848c358fab6dae7f498b0a391cc72211},
isbn = {978-0-7695-3299-8},
keywords = {clustering con_ocurance tag topic},
pages = {54--58},
publisher = {IEEE Computer Society},
timestamp = {2014-03-04T07:11:38.000+0100},
title = {Topic Detection by Clustering Keywords},
year = 2008
}