Abstract

During data collection and analysis, it is often necessary to identify, and possibly remove, outliers. An objective method for identifying the outliers to be removed is critical. Many automated outlier detection methods are available, but many are limited by distributional assumptions or require predefined upper and lower boundaries within which the data should fall. If the distribution of the data is known, it can aid in finding outliers. Often, however, the distribution is not known, or the experimenter does not want to assume a particular one. In addition, there may not be enough information about a data set to determine reliable upper and lower boundaries. For these cases, an outlier detection method based on Chebyshev’s inequality and using the empirical data was developed. The method detects multiple outliers at once rather than one at a time. It assumes that the data are independent measurements and that they contain a relatively small percentage of outliers.
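
As an illustration of how Chebyshev’s inequality can yield distribution-free outlier bounds, the following is a minimal sketch and not necessarily the procedure of the paper itself. Chebyshev’s inequality states that for any distribution with finite mean and standard deviation, at most a fraction 1/k² of the values lie k or more standard deviations from the mean, so choosing k = 1/sqrt(p) gives bounds outside which at most a fraction p of the data is expected. The two-pass structure (trim first, then recompute the bounds from the trimmed data), the function names `chebyshev_bounds` and `chebyshev_outliers`, and the default probabilities `p_trim=0.1` and `p_final=0.01` are assumptions made for this example.

```python
import math
import statistics


def chebyshev_bounds(values, p):
    """Return (lower, upper) = mean -/+ k * std with k = 1 / sqrt(p).

    By Chebyshev's inequality, at most a fraction p of any distribution
    with this mean and standard deviation lies outside these bounds.
    """
    k = 1.0 / math.sqrt(p)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return mu - k * sigma, mu + k * sigma


def chebyshev_outliers(values, p_trim=0.1, p_final=0.01):
    """Flag outliers in two passes (hypothetical parameter names).

    Pass 1 uses loose bounds (p_trim) to set aside extreme values so
    they do not inflate the mean and standard deviation.
    Pass 2 recomputes the bounds from the trimmed data (p_final) and
    flags every original value that falls outside them.
    """
    lo, hi = chebyshev_bounds(values, p_trim)
    trimmed = [v for v in values if lo <= v <= hi]
    lo2, hi2 = chebyshev_bounds(trimmed, p_final)
    return [v for v in values if v < lo2 or v > hi2]


if __name__ == "__main__":
    # Made-up measurements with one obviously aberrant reading.
    data = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1,
            9.9, 10.0, 10.2, 9.8, 10.1, 55.0]
    print(chebyshev_outliers(data))
```

Because both passes use only the empirical mean and standard deviation, the sketch needs no assumed distribution and no predefined boundaries, and it can flag several outliers in a single call. It does rely on the outliers being a small share of the data, so that the first pass can separate them from the bulk.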
