Inproceedings

SECODA: Segmentation- and Combination-Based Detection of Anomalies

2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 755-764. (October 2017)
DOI: 10.1109/DSAA.2017.35

Abstract

This study introduces SECODA, a novel general-purpose unsupervised non-parametric anomaly detection algorithm for datasets containing continuous and categorical attributes. The method is guaranteed to identify cases with unique or sparse combinations of attribute values. Continuous attributes are discretized repeatedly in order to correctly determine the frequency of such value combinations. The concept of constellations, exponentially increasing weights and discretization cut points, as well as a pruning heuristic are used to detect anomalies with an optimal number of iterations. Moreover, the algorithm has a low memory imprint and its runtime performance scales linearly with the size of the dataset. An evaluation with simulated and real-life datasets shows that this algorithm is able to identify many different types of anomalies, including complex multidimensional instances. An evaluation in terms of a data quality use case with a real dataset demonstrates that SECODA can bring relevant and practical value to real-world settings.
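The core idea in the abstract, repeatedly discretizing continuous attributes and counting how often each combination of attribute values (a constellation) occurs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal-width binning, the fixed sequence of bin counts, and the plain averaging of frequencies are all assumptions made for illustration, and the exponentially increasing weights and the pruning heuristic mentioned in the abstract are deliberately left out.

```python
from collections import Counter


def equal_width_bin(value, lo, hi, n_bins):
    """Map a value to a bin index in [0, n_bins - 1] using equal-width intervals."""
    if hi == lo:
        return 0  # a constant attribute falls into a single bin
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last bin


def average_constellation_frequency(rows, continuous_cols, bin_counts=(2, 3, 4, 5)):
    """Hypothetical helper: average frequency of each row's constellation.

    rows            -- list of tuples mixing numeric and categorical values
    continuous_cols -- column indexes to discretize; other columns are used as-is
    bin_counts      -- assumed sequence of discretization granularities
    Lower scores indicate rarer value combinations, i.e. more anomalous rows.
    """
    # Value range of every continuous column, computed once up front.
    ranges = {
        c: (min(r[c] for r in rows), max(r[c] for r in rows))
        for c in continuous_cols
    }
    totals = [0.0] * len(rows)
    for n_bins in bin_counts:
        # Build one constellation key per row at this granularity.
        keys = [
            tuple(
                equal_width_bin(v, *ranges[c], n_bins) if c in ranges else v
                for c, v in enumerate(row)
            )
            for row in rows
        ]
        counts = Counter(keys)
        for i, key in enumerate(keys):
            totals[i] += counts[key]
    return [t / len(bin_counts) for t in totals]


rows = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (1.2, "a"), (1.0, "b"), (9.0, "a")]
scores = average_constellation_frequency(rows, continuous_cols=[0])
print(scores)
```

In this toy dataset the row with the rare category `"b"` and the row with the extreme value `9.0` both receive the lowest score, because their constellations occur only once at every granularity, while the four similar rows share a constellation and score higher.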
