Inproceedings

Availability modeling and analysis on high performance cluster computing systems

H. Song, C. Leangsuksun, R. Nassar, N. Gottumukkala, and S. Scott.
The First International Conference on Availability, Reliability and Security (ARES 2006), 8 pp. (April 2006)
DOI: 10.1109/ARES.2006.37

Abstract

Cluster computing has attracted growing attention from both industry and academia for its enormous computing power, cost-effectiveness, and scalability. Availability is a key system attribute that must be considered at the system design stage and must also reflect the system's actual behavior. System monitoring and logging make it possible to identify unplanned events and thus to reflect the system's actual availability. This paper proposes a single framework that coordinates event monitoring, filtering, data analysis, and dynamic availability modeling. The availability model is abstracted and categorized based on functionality. We describe the proposed architecture and a sample analysis of real-time event logs from a 512-node cluster at Lawrence Livermore National Laboratory.

BibTeX key: song2006availability
entry type: inproceedings
booktitle: Availability, Reliability and Security, 2006. ARES 2006. The First International Conference on
year: 2006
month: April
pages: 8 pp.-
DOI: 10.1109/ARES.2006.37
url: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1625325
