ConDist: A Context-Driven Categorical Distance Measure
M. Ring, F. Otto, M. Becker, T. Niebler, D. Landes, and A. Hotho. Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, volume 9284 of Lecture Notes in Computer Science, pages 251-266. Springer International Publishing (2015)
Abstract
A distance measure between objects is a key requirement for many data mining tasks like clustering, classification or outlier detection. However, for objects characterized by categorical attributes, defining meaningful distance measures is a challenging task since the values within such attributes have no inherent order, especially without additional domain knowledge. In this paper, we propose an unsupervised distance measure for objects with categorical attributes based on the idea that categorical attribute values are similar if they appear with similar value distributions on correlated context attributes. Thus, the distance measure is automatically derived from the given data set. We compare our new distance measure to existing categorical distance measures and evaluate on different data sets from the UCI machine-learning repository. The experiments show that our distance measure is recommendable, since it achieves similar or better results in a more robust way than previous approaches.
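The core idea in the abstract (two categorical values are close when they co-occur with similar value distributions on a context attribute) can be illustrated with a toy sketch. This is not the paper's formula: the function names, the toy data, and the use of total variation distance are assumptions made for illustration only, and the step that selects and weights correlated context attributes is omitted entirely.

```python
from collections import Counter, defaultdict


def conditional_distributions(rows, target, context):
    """Estimate P(context value | target value) from a list of dict rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[target]][row[context]] += 1
    return {
        value: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for value, ctr in counts.items()
    }


def value_distance(dist_a, dist_b):
    """Total variation distance between two discrete distributions, in [0, 1].

    Assumption: chosen here for simplicity, not taken from the paper.
    """
    keys = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)


def context_distance(rows, target, context, a, b):
    """Distance between values a and b of `target`, judged by one context attribute."""
    dists = conditional_distributions(rows, target, context)
    return value_distance(dists[a], dists[b])


# Hypothetical toy data: "color" is the target attribute, "fruit" the context.
rows = [
    {"color": "red", "fruit": "apple"},
    {"color": "red", "fruit": "cherry"},
    {"color": "crimson", "fruit": "apple"},
    {"color": "crimson", "fruit": "cherry"},
    {"color": "yellow", "fruit": "banana"},
    {"color": "yellow", "fruit": "banana"},
]

print(context_distance(rows, "color", "fruit", "red", "crimson"))  # 0.0
print(context_distance(rows, "color", "fruit", "red", "yellow"))   # 1.0
```

In this toy data, "red" and "crimson" appear with identical fruit distributions and so get distance 0, while "red" and "yellow" share no context values and get the maximum distance of 1.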
%0 Conference Paper
%1 ring2015condist
%A Ring, Markus
%A Otto, Florian
%A Becker, Martin
%A Niebler, Thomas
%A Landes, Dieter
%A Hotho, Andreas
%B Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases
%D 2015
%E Appice, Annalisa
%E Rodrigues, Pedro Pereira
%E Costa, Vítor Santos
%E Soares, Carlos
%E Gama, João
%E Jorge, Alípio
%I Springer International Publishing
%K categorical distance measure myown published
%P 251-266
%T ConDist: A Context-Driven Categorical Distance Measure
%V 9284
%X A distance measure between objects is a key requirement for many data mining tasks like clustering, classification or outlier detection. However, for objects characterized by categorical attributes, defining meaningful distance measures is a challenging task since the values within such attributes have no inherent order, especially without additional domain knowledge. In this paper, we propose an unsupervised distance measure for objects with categorical attributes based on the idea that categorical attribute values are similar if they appear with similar value distributions on correlated context attributes. Thus, the distance measure is automatically derived from the given data set. We compare our new distance measure to existing categorical distance measures and evaluate on different data sets from the UCI machine-learning repository. The experiments show that our distance measure is recommendable, since it achieves similar or better results in a more robust way than previous approaches.
@inproceedings{ring2015condist,
abstract = {A distance measure between objects is a key requirement for many data mining tasks like clustering, classification or outlier detection. However, for objects characterized by categorical attributes, defining meaningful distance measures is a challenging task since the values within such attributes have no inherent order, especially without additional domain knowledge. In this paper, we propose an unsupervised distance measure for objects with categorical attributes based on the idea that categorical attribute values are similar if they appear with similar value distributions on correlated context attributes. Thus, the distance measure is automatically derived from the given data set. We compare our new distance measure to existing categorical distance measures and evaluate on different data sets from the UCI machine-learning repository. The experiments show that our distance measure is recommendable, since it achieves similar or better results in a more robust way than previous approaches.},
added-at = {2016-10-13T16:57:21.000+0200},
author = {Ring, Markus and Otto, Florian and Becker, Martin and Niebler, Thomas and Landes, Dieter and Hotho, Andreas},
author+an = {4=highlight},
biburl = {https://www.bibsonomy.org/bibtex/2e07aaaecc57af3e9882a822ad6fa7133/thoni},
booktitle = {Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases},
editor = {Appice, Annalisa and Rodrigues, Pedro Pereira and Costa, Vítor Santos and Soares, Carlos and Gama, João and Jorge, Alípio},
interhash = {c062a57a17a0910d6c27ecd664502ac1},
intrahash = {e07aaaecc57af3e9882a822ad6fa7133},
keywords = {categorical distance measure myown published},
pages = {251--266},
publisher = {Springer International Publishing},
series = {Lecture Notes in Computer Science},
timestamp = {2018-12-29T12:27:47.000+0100},
title = {ConDist: A Context-Driven Categorical Distance Measure},
volume = 9284,
year = 2015
}