Anomaly-based Fault Detection System in Distributed System
B. U. Kim and S. Hariri. SERA '07: Proceedings of the 5th ACIS International Conference on Software Engineering Research, Management & Applications, pages 782--789. Washington, DC, USA, IEEE Computer Society, (2007)
DOI: http://dx.doi.org/10.1109/SERA.2007.55
Abstract
One of the important design criteria for distributed systems and their applications is their reliability and robustness to hardware and software failures. The increase in complexity, interconnectedness, dependency and the asynchronous interactions between the components that include hardware resources (computers, servers, network devices), and software (application services, middleware, web services, etc.) makes the fault detection and tolerance a challenging research problem. In this paper, we present an innovative approach based on statistical and data mining techniques to detect faults (hardware or software) and also identify the source of the fault. In our approach, we monitor and analyze in realtime all the interactions between all the components of a distributed system. We used data mining and supervised learning techniques to obtain the rules that can accurately model the normal interactions among these components. Our anomaly analysis engine will immediately produce an alert whenever one or more of the interaction rules that capture normal operations is violated due to a software or hardware failure. We evaluate the effectiveness of our approach and its performance to detect software faults that we inject asynchronously, and compare the results for different noise level.
Description
Anomaly-based Fault Detection System in Distributed System
%0 Conference Paper
%1 Kim07_ABF
%A Kim, Byoung Uk
%A Hariri, Salim
%B SERA '07: Proceedings of the 5th ACIS International Conference on Software Engineering Research, Management & Applications
%C Washington, DC, USA
%D 2007
%I IEEE Computer Society
%K ac autonomic autonomic_computing distributed_system imported self-healing unread
%P 782--789
%R http://dx.doi.org/10.1109/SERA.2007.55
%T Anomaly-based Fault Detection System in Distributed System
%U http://portal.acm.org/citation.cfm?id=1307430
%X One of the important design criteria for distributed systems and their applications is their reliability and robustness to hardware and software failures. The increase in complexity, interconnectedness, dependency and the asynchronous interactions between the components that include hardware resources (computers, servers, network devices), and software (application services, middleware, web services, etc.) makes the fault detection and tolerance a challenging research problem. In this paper, we present an innovative approach based on statistical and data mining techniques to detect faults (hardware or software) and also identify the source of the fault. In our approach, we monitor and analyze in realtime all the interactions between all the components of a distributed system. We used data mining and supervised learning techniques to obtain the rules that can accurately model the normal interactions among these components. Our anomaly analysis engine will immediately produce an alert whenever one or more of the interaction rules that capture normal operations is violated due to a software or hardware failure. We evaluate the effectiveness of our approach and its performance to detect software faults that we inject asynchronously, and compare the results for different noise level.
%@ 0-7695-2867-8
@inproceedings{Kim07_ABF,
abstract = {One of the important design criteria for distributed systems and their applications is their reliability and robustness to hardware and software failures. The increase in complexity, interconnectedness, dependency and the asynchronous interactions between the components that include hardware resources (computers, servers, network devices), and software (application services, middleware, web services, etc.) makes the fault detection and tolerance a challenging research problem. In this paper, we present an innovative approach based on statistical and data mining techniques to detect faults (hardware or software) and also identify the source of the fault. In our approach, we monitor and analyze in realtime all the interactions between all the components of a distributed system. We used data mining and supervised learning techniques to obtain the rules that can accurately model the normal interactions among these components. Our anomaly analysis engine will immediately produce an alert whenever one or more of the interaction rules that capture normal operations is violated due to a software or hardware failure. We evaluate the effectiveness of our approach and its performance to detect software faults that we inject asynchronously, and compare the results for different noise level.},
added-at = {2009-04-24T13:17:58.000+0200},
address = {Washington, DC, USA},
author = {Kim, Byoung Uk and Hariri, Salim},
biburl = {https://www.bibsonomy.org/bibtex/2a324f2c8a20c456e1cca2d17d3368592/rgolombe},
booktitle = {SERA '07: Proceedings of the 5th ACIS International Conference on Software Engineering Research, Management \& Applications},
description = {Anomaly-based Fault Detection System in Distributed System},
doi = {10.1109/SERA.2007.55},
interhash = {ed94f12baabeb50dc663ccd06ab6a256},
intrahash = {a324f2c8a20c456e1cca2d17d3368592},
isbn = {0-7695-2867-8},
keywords = {ac autonomic autonomic_computing distributed_system imported self-healing unread},
pages = {782--789},
publisher = {IEEE Computer Society},
timestamp = {2009-04-24T13:17:58.000+0200},
title = {Anomaly-based Fault Detection System in Distributed System},
url = {http://portal.acm.org/citation.cfm?id=1307430},
year = 2007
}