Are Outlier Detection Methods Resilient to Sampling?

L. Berti-Equille, J. Loh, and S. Thirumuruganathan (2019). arXiv:1907.13276. Comment: 18 pages.

Abstract

Outlier detection is a fundamental task in data mining and has many applications including detecting errors in databases. While there has been extensive prior work on methods for outlier detection, modern datasets often have sizes that are beyond the ability of commonly used methods to process the data within a reasonable time. To overcome this issue, outlier detection methods can be trained over samples of the full-sized dataset. However, it is not clear how a model trained on a sample compares with one trained on the entire dataset. In this paper, we introduce the notion of resilience to sampling for outlier detection methods. Orthogonal to traditional performance metrics such as precision/recall, resilience represents the extent to which the outliers detected by a method applied to samples from a sampling scheme match those detected when it is applied to the whole dataset. We propose a novel approach for estimating the resilience to sampling of both individual outlier methods and their ensembles. We performed an extensive experimental study on synthetic and real-world datasets in which we study seven diverse and representative outlier detection methods, compare results obtained from samples versus those obtained from the whole datasets, and evaluate the accuracy of our resilience estimates. We observed that the methods are not equally resilient to a given sampling scheme and that careful joint selection of both the sampling scheme and the outlier detection method is often necessary. It is our hope that the paper initiates research on designing outlier detection algorithms that are resilient to sampling.
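The idea of resilience described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formal definition or estimator, so everything below is an assumption made for illustration: a simple z-score detector stands in for an outlier detection method, uniform random sampling without replacement stands in for a sampling scheme, and Jaccard similarity between the two flagged sets stands in for the agreement measure. The function names (`fit`, `detect`, `jaccard`, `resilience_estimate`) are hypothetical.

```python
import random
import statistics


def fit(values):
    """Fit the (assumed) z-score detector: return mean and population std."""
    return statistics.fmean(values), statistics.pstdev(values)


def detect(data, params, threshold=3.0):
    """Return the indices in `data` whose |z-score| under `params` exceeds threshold."""
    mean, std = params
    if std == 0:
        return set()
    return {i for i, v in enumerate(data) if abs(v - mean) / std > threshold}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as full agreement."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def resilience_estimate(data, sample_fraction, n_trials=20, threshold=3.0, seed=0):
    """Average agreement between outliers flagged by a detector fitted on the
    whole dataset and by the same detector fitted on random samples.

    Illustrative proxy only; not the estimator proposed in the paper.
    """
    rng = random.Random(seed)
    reference = detect(data, fit(data), threshold)
    k = max(2, int(len(data) * sample_fraction))
    scores = []
    for _ in range(n_trials):
        sample = rng.sample(data, k)  # uniform sampling without replacement
        flagged = detect(data, fit(sample), threshold)
        scores.append(jaccard(reference, flagged))
    return statistics.fmean(scores)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: 1000 standard-normal points plus 5 planted outliers.
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)] + [15.0] * 5
    for fraction in (0.01, 0.1, 0.5):
        print(fraction, resilience_estimate(data, fraction))
```

A value near 1 means the sample-fitted detector flags nearly the same points as the full-data detector under these assumptions; swapping in a different detector or sampling scheme only requires replacing `fit`, `detect`, or the `rng.sample` call.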

Description

[1907.13276v1] Are Outlier Detection Methods Resilient to Sampling?

Links and resources

BibTeX key: bertiequille2019outlier
entry type: article
year: 2019
url: http://arxiv.org/abs/1907.13276
note: cite arxiv:1907.13276 Comment: 18 pages

Tags

anomaly-detection, outliers, sampling

Cite this publication

@article{bertiequille2019outlier,
  abstract = {Outlier detection is a fundamental task in data mining and has many applications including detecting errors in databases. While there has been extensive prior work on methods for outlier detection, modern datasets often have sizes that are beyond the ability of commonly used methods to process the data within a reasonable time. To overcome this issue, outlier detection methods can be trained over samples of the full-sized dataset. However, it is not clear how a model trained on a sample compares with one trained on the entire dataset. In this paper, we introduce the notion of resilience to sampling for outlier detection methods. Orthogonal to traditional performance metrics such as precision/recall, resilience represents the extent to which the outliers detected by a method applied to samples from a sampling scheme matches those when applied to the whole dataset. We propose a novel approach for estimating the resilience to sampling of both individual outlier methods and their ensembles. We performed an extensive experimental study on synthetic and real-world datasets where we study seven diverse and representative outlier detection methods, compare results obtained from samples versus those obtained from the whole datasets and evaluate the accuracy of our resilience estimates. We observed that the methods are not equally resilient to a given sampling scheme and it is often the case that careful joint selection of both the sampling scheme and the outlier detection method is necessary. It is our hope that the paper initiates research on designing outlier detection algorithms that are resilient to sampling.},
  added-at = {2019-08-22T20:46:24.000+0200},
  author = {Berti-Equille, Laure and Loh, Ji Meng and Thirumuruganathan, Saravanan},
  biburl = {https://www.bibsonomy.org/bibtex/225cfaef3b0d4d8f6409e8db13d10396e/kirk86},
  description = {[1907.13276v1] Are Outlier Detection Methods Resilient to Sampling?},
  interhash = {f8f520b7ce626c052dc8c285a096d5b7},
  intrahash = {25cfaef3b0d4d8f6409e8db13d10396e},
  keywords = {anomaly-detection outliers sampling},
  note = {cite arxiv:1907.13276Comment: 18 pages},
  timestamp = {2019-08-22T20:46:24.000+0200},
  title = {Are Outlier Detection Methods Resilient to Sampling?},
  url = {http://arxiv.org/abs/1907.13276},
  year = 2019
}

Metadata

Last modified 5 years ago
Created 5 years ago
