D. Dürrschnabel and G. Stumme. (2023). arXiv:2302.11554. Comment: 11 pages, 6 figures, 2 tables, 3 algorithms.
Abstract
In large datasets, it is hard to discover and analyze structure. It is thus
common to annotate the items with tags or keywords, and in applications such
datasets are then filtered by these tags. Still, even medium-sized datasets
with only a few tags give rise to complex systems that are hard for humans to
navigate. In this work, we adopt ordinal factor analysis to address this
problem. An ordinal factor arranges a subset of the tags in a linear order
based on their underlying structure. A complete ordinal factorization, which
consists of such ordinal factors, represents the original dataset exactly.
Based on such an ordinal factorization, we provide a way to discover and
explain relationships between the items and attributes of the dataset.
However, computing even a single ordinal factor of high cardinality is
computationally expensive. We therefore propose a greedy algorithm that
extracts ordinal factors using existing fast algorithms from formal concept
analysis. We then leverage these factors to propose a comprehensive way to
discover relationships in the dataset. Furthermore, we introduce a distance
measure, based on the representation that emerges from the ordinal
factorization, to discover similar items. To evaluate the method, we conduct
a case study on several datasets.
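The abstract describes ordinal factors only at a high level. The Python sketch below illustrates the idea on a toy tag dataset under simplifying assumptions: it treats an ordinal factor as a set of tags whose item sets are nested (so the tags fall into a linear order), grows each factor greedily by covering as many still-uncovered (item, tag) pairs as possible, and compares items with a hypothetical L1 distance over their positions in the factors. It is not the authors' algorithm: it calls no formal concept analysis routines, and all names (`greedy_factorization`, `position_vector`, `distance`) are illustrative.

```python
from itertools import combinations

# A formal context: object (item) -> set of attributes (tags) it carries.
Context = dict[str, set[str]]


def extent(context: Context, attr: str) -> frozenset[str]:
    """Objects that carry the attribute `attr`."""
    return frozenset(g for g, attrs in context.items() if attr in attrs)


def is_chain(extents: list[frozenset[str]]) -> bool:
    """True if every pair of extents is comparable under set inclusion."""
    return all(a <= b or b <= a for a, b in combinations(extents, 2))


def greedy_ordinal_factor(context: Context, covered: set[tuple[str, str]]) -> list[str]:
    """Grow one factor: a set of attributes whose extents are nested.

    Hypothetical heuristic (not the paper's method): repeatedly add the
    attribute that keeps the extents a chain and covers the most
    not-yet-covered (object, attribute) incidences.
    """
    attributes = sorted({m for attrs in context.values() for m in attrs})
    factor: list[str] = []
    while True:
        best, best_gain = None, 0
        for m in attributes:
            if m in factor:
                continue
            ext = extent(context, m)
            if not is_chain([extent(context, f) for f in factor] + [ext]):
                continue
            gain = sum(1 for g in ext if (g, m) not in covered)
            if gain > best_gain:
                best, best_gain = m, gain
        if best is None:
            break
        factor.append(best)
    # Linear order of the factor: from the largest extent to the smallest.
    return sorted(factor, key=lambda m: len(extent(context, m)), reverse=True)


def greedy_factorization(context: Context) -> list[list[str]]:
    """Extract factors until every (object, attribute) incidence is covered."""
    incidences = {(g, m) for g, attrs in context.items() for m in attrs}
    covered: set[tuple[str, str]] = set()
    factors: list[list[str]] = []
    while covered != incidences:
        # Each round covers at least one new incidence, so the loop terminates.
        factor = greedy_ordinal_factor(context, covered)
        covered |= {(g, m) for m in factor for g in extent(context, m)}
        factors.append(factor)
    return factors


def position_vector(context: Context, factors: list[list[str]], g: str) -> tuple[int, ...]:
    """Position of object g in each factor; nested extents make this a prefix count."""
    return tuple(sum(1 for m in f if m in context[g]) for f in factors)


def distance(context: Context, factors: list[list[str]], g: str, h: str) -> int:
    """Hypothetical L1 distance between the position vectors of two objects."""
    p = position_vector(context, factors, g)
    q = position_vector(context, factors, h)
    return sum(abs(x - y) for x, y in zip(p, q))
```

On a toy context in which every item tagged `light` is also tagged `small`, and every item tagged `small` is also tagged `cheap`, the sketch places these three tags in one factor and an unrelated tag such as `fast` in a second one; each item is then described by how far it reaches along each factor.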
%0 Generic
%1 durrschnabel2023greedy
%A Dürrschnabel, Dominik
%A Stumme, Gerd
%D 2023
%K 2023 greedy_algorithm itegpub myown ordinal_factor_analysis
%T Greedy Discovery of Ordinal Factors
%U http://arxiv.org/abs/2302.11554
%X In large datasets, it is hard to discover and analyze structure. It is thus
common to annotate the items with tags or keywords, and in applications such
datasets are then filtered by these tags. Still, even medium-sized datasets
with only a few tags give rise to complex systems that are hard for humans to
navigate. In this work, we adopt ordinal factor analysis to address this
problem. An ordinal factor arranges a subset of the tags in a linear order
based on their underlying structure. A complete ordinal factorization, which
consists of such ordinal factors, represents the original dataset exactly.
Based on such an ordinal factorization, we provide a way to discover and
explain relationships between the items and attributes of the dataset.
However, computing even a single ordinal factor of high cardinality is
computationally expensive. We therefore propose a greedy algorithm that
extracts ordinal factors using existing fast algorithms from formal concept
analysis. We then leverage these factors to propose a comprehensive way to
discover relationships in the dataset. Furthermore, we introduce a distance
measure, based on the representation that emerges from the ordinal
factorization, to discover similar items. To evaluate the method, we conduct
a case study on several datasets.
@misc{durrschnabel2023greedy,
abstract = {In large datasets, it is hard to discover and analyze structure. It is thus
common to annotate the items with tags or keywords, and in applications such
datasets are then filtered by these tags. Still, even medium-sized datasets
with only a few tags give rise to complex systems that are hard for humans to
navigate. In this work, we adopt ordinal factor analysis to address this
problem. An ordinal factor arranges a subset of the tags in a linear order
based on their underlying structure. A complete ordinal factorization, which
consists of such ordinal factors, represents the original dataset exactly.
Based on such an ordinal factorization, we provide a way to discover and
explain relationships between the items and attributes of the dataset.
However, computing even a single ordinal factor of high cardinality is
computationally expensive. We therefore propose a greedy algorithm that
extracts ordinal factors using existing fast algorithms from formal concept
analysis. We then leverage these factors to propose a comprehensive way to
discover relationships in the dataset. Furthermore, we introduce a distance
measure, based on the representation that emerges from the ordinal
factorization, to discover similar items. To evaluate the method, we conduct
a case study on several datasets.},
added-at = {2023-04-24T11:03:45.000+0200},
author = {Dürrschnabel, Dominik and Stumme, Gerd},
biburl = {https://www.bibsonomy.org/bibtex/25187cd8ecdb39f7e42117f524c8ca092/duerrschnabel},
interhash = {af8d748c658c4995150a1d70901bf72b},
intrahash = {5187cd8ecdb39f7e42117f524c8ca092},
keywords = {2023 greedy_algorithm itegpub myown ordinal_factor_analysis},
note = {arXiv:2302.11554. Comment: 11 pages, 6 figures, 2 tables, 3 algorithms},
timestamp = {2024-05-21T10:22:49.000+0200},
title = {Greedy Discovery of Ordinal Factors},
url = {http://arxiv.org/abs/2302.11554},
year = 2023
}