Article,

Hybrid data mining-regression for infrastructure risk assessment based on zero-inflated data

, and .
Reliability Engineering & System Safety, (March 2012)
DOI: 10.1016/j.ress.2011.10.012

Abstract

Infrastructure disaster risk assessment seeks to estimate the probability of a given customer or area losing service during a disaster, sometimes in conjunction with estimating the duration of each outage. This is often done on the basis of past data about the effects of similar events impacting the same or similar systems. In many situations this past performance data from infrastructure systems is zero-inflated; it has more zeros than can be appropriately modeled with standard probability distributions. The data are also often non-linear and exhibit threshold effects due to the complexities of infrastructure system performance. Standard zero-inflated statistical models such as zero-inflated Poisson and zero-inflated negative binomial regression models do not adequately capture these complexities. In this paper we develop a novel method that is a hybrid classification tree/regression method for complex, zero-inflated data sets. We investigate its predictive accuracy based on a large number of simulated data sets and then demonstrate its practical usefulness with an application to hurricane power outage risk assessment for a large utility based on actual data from the utility. While formulated for infrastructure disaster risk assessment, this method is promising for data-driven analysis for other situations with zero-inflated, complex data exhibiting response thresholds.

Tags

Users

  • @yourwelcome

Comments and Reviews