Article

Utilising a statistical inequality for efficiently finding term sets

M. Melucci.
Information Processing & Management, 52 (6): 1086--1121 (November 2016)
DOI: 10.1016/j.ipm.2016.04.011

Abstract

Information Retrieval (IR) systems aim to find sets of terms that discriminate documents, and they often exploit frequency as evidence that signals a non-random set of terms. Frequent Itemset (FI) mining refers to a class of algorithms that can be applied to IR to find non-random sets of terms. Finding FIs is a very expensive computational task because of the exponential number of itemsets. To reduce this cost, many approaches to mining FIs are based on the monotonicity property that an itemset is frequent only if all its subsets are frequent. However, it remains uncertain whether an itemset is frequent even when all its subsets are frequent, which requires additional scans and, ultimately, additional computational cost. We introduce a statistical inequality called the Bell-Wigner Inequality (BWI) as a conceptual enhancement of monotonicity to predict with certainty when an itemset is frequent and when it is infrequent. Using both data mining datasets and a large IR test collection, an empirical validation shows that the BWI can significantly reduce computational cost.
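
The monotonicity property mentioned in the abstract, and the uncertainty it leaves open, can be illustrated with a minimal Python sketch. This is not the paper's method and does not implement the BWI. The toy transactions, the `min_support` threshold, and the helper names `support` and `all_subsets_frequent` are assumptions made only for illustration. The sketch shows a candidate itemset whose subsets are all frequent but which still turns out to be infrequent once the data are scanned.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def all_subsets_frequent(candidate, frequent_smaller):
    """Monotonicity check: every (k-1)-subset of a k-itemset must be frequent.

    This is a necessary condition only; passing it does not prove that
    the candidate itself is frequent.
    """
    k = len(candidate)
    return all(
        frozenset(s) in frequent_smaller
        for s in combinations(candidate, k - 1)
    )


# Hypothetical toy data: each transaction is a set of terms.
transactions = [frozenset(t) for t in (
    {"a", "b"}, {"a", "c"}, {"b", "c"},
    {"a", "b"}, {"a", "c"}, {"b", "c"},
)]
min_support = 0.3  # assumed threshold, chosen for the example

items = sorted(set().union(*transactions))
frequent_pairs = {
    frozenset(p)
    for p in combinations(items, 2)
    if support(p, transactions) >= min_support
}

candidate = frozenset({"a", "b", "c"})
passes_pruning = all_subsets_frequent(candidate, frequent_pairs)

# Monotonicity cannot decide the candidate; a further scan is needed.
is_frequent = support(candidate, transactions) >= min_support

print(passes_pruning, is_frequent)
```

Here every pair of terms is frequent, so the candidate survives the monotonicity check, yet no transaction contains all three terms. Settling such cases takes an additional scan of the data, which is the cost the abstract says the BWI is intended to reduce.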

BibTeX key: melucci_utilising_2016
entry type: article
year: 2016
month: nov
journal: Information Processing & Management
number: 6
pages: 1086--1121
volume: 52
issn: 0306-4573
DOI: 10.1016/j.ipm.2016.04.011
