What is the Dimension of Your Binary Data?

N. Tatti, T. Mielikainen, A. Gionis, and H. Mannila.
Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM 2006), pages 603--612. IEEE, December 2006.
DOI: 10.1109/ICDM.2006.167

Abstract

Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal dimension can be adapted for binary data. However, as such the fractal dimension is difficult to interpret. Hence we introduce the concept of normalized fractal dimension. For a dataset D, its normalized fractal dimension counts the number of independent columns needed to achieve the unnormalized fractal dimension of D. The normalized fractal dimension measures the degree of dependency structure of the data. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against PCA.
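The abstract does not spell out how fractal dimension is adapted to binary data. As a rough illustration only, the sketch below uses one common fractal-dimension estimator, the correlation dimension: the slope of the log fraction of row pairs within Hamming distance r, plotted against log r. The choice of estimator, the Hamming metric, and the function names `hamming`, `pair_fraction_within`, and `correlation_dimension` are all assumptions made for this example, not definitions taken from the paper, and the sketch covers only an unnormalized quantity.

```python
import math
from itertools import combinations


def hamming(a, b):
    """Count the positions where two equal-length 0/1 rows differ."""
    return sum(x != y for x, y in zip(a, b))


def pair_fraction_within(rows, r):
    """Fraction of distinct row pairs at Hamming distance at most r."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two rows")
    return sum(hamming(a, b) <= r for a, b in pairs) / len(pairs)


def correlation_dimension(rows, radii):
    """Least-squares slope of log(pair fraction) against log(radius).

    Assumption: this is a generic correlation-dimension estimate, not
    necessarily the definition used in the paper. Radii must be positive;
    radii at which no pair is close enough are skipped, since log(0) is
    undefined.
    """
    points = []
    for r in radii:
        fraction = pair_fraction_within(rows, r)
        if fraction > 0:
            points.append((math.log(r), math.log(fraction)))
    if len(points) < 2:
        raise ValueError("need at least two radii with a nonzero pair fraction")

    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("radii must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Enumerating all pairs costs time quadratic in the number of rows, so for anything beyond a toy dataset the pair fraction would more plausibly be estimated from a random sample of pairs.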

BibTeX key: tatti2006dimension
entry type: inproceedings
booktitle: Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM 2006)
year: 2006
month: dec
organization: IEEE
pages: 603--612
issn: 1550-4786
DOI: 10.1109/ICDM.2006.167
url: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4053086
