Abstract

<b>Motivation:</b> Ranking gene feature sets is a key issue for both phenotype classification, for instance, tumor classification in a DNA microarray experiment, and prediction in the context of genetic regulatory networks. Two broad methods are available to estimate the error (misclassification rate) of a classifier. Resubstitution fits a single classifier to the data, and applies this classifier in turn to each data observation. Cross-validation (in leave-one-out form) removes each observation in turn, constructs the classifier, and then computes whether this leave-one-out classifier correctly classifies the deleted observation. Resubstitution typically underestimates classifier error, severely so in many cases. Cross-validation has the advantage of producing an effectively unbiased error estimate, but the estimate is highly variable. In many applications it is not the misclassification rate <i>per se</i> that is of interest, but rather the construction of gene sets that have the potential to classify or predict. Hence, one needs to rank feature sets based on their performance.</p> <p><b>Results:</b> A model-based approach is used to compare the ranking performances of resubstitution and cross-validation for classification based on real-valued feature sets and for prediction in the context of probabilistic Boolean networks (PBNs). For classification, a Gaussian model is considered, along with classification via linear discriminant analysis and the 3-nearest-neighbor classification rule. Prediction is examined in the steady-distribution of a PBN. Three metrics are proposed to compare feature-set ranking based on error estimation with ranking based on the true error, which is known owing to the model-based approach. In all cases, resubstitution is competitive with cross-validation relative to ranking accuracy. This is in addition to the enormous savings in computation time afforded by resubstitution.

Description

Is cross-validation better than resubstitution for ranking genes?

Links and resources

Tags

community