Abstract
A classification algorithm based on a multi-chip, multi-SNP
approach is proposed for Affymetrix SNP arrays. Current procedures
for calling genotypes on SNP arrays process all the features
associated with one chip and one SNP at a time. Using a large
training sample where the genotype labels are known, we develop a
supervised learning algorithm to obtain more accurate
classification results on new data. The method we propose, RLMM, is
based on a robustly fitted linear model and uses the Mahalanobis
distance for classification. The chip-to-chip non-biological
variance is reduced through normalization. This model-based
algorithm captures the similarities across genotype groups and
probes, as well as across thousands of SNPs for accurate
classification. In this paper, we apply RLMM to Affymetrix 100K
SNP array data, present classification results, and compare them
with genotype calls obtained from the Affymetrix procedure DM, as
well as with the publicly available genotype calls from the HapMap
project.
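
The classification step mentioned above, assigning a genotype by Mahalanobis distance, can be illustrated with a minimal sketch. It is not the RLMM implementation: the function names, the two-dimensional allele-signal feature, and the genotype centers and covariances are hypothetical values chosen for illustration. In RLMM these quantities would come from the robustly fitted linear model trained on labelled data, a step that is not reproduced here.

```python
import numpy as np


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of point x from a group (mean, cov)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov * y = diff rather than inverting cov explicitly.
    return float(diff @ np.linalg.solve(cov, diff))


def call_genotype(x, centers, covs):
    """Return the genotype whose group is nearest to x, plus all distances."""
    distances = {g: mahalanobis_sq(x, centers[g], covs[g]) for g in centers}
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical genotype groups in a 2-D (allele A signal, allele B signal) space.
centers = {
    "AA": np.array([2.0, 0.0]),
    "AB": np.array([1.0, 1.0]),
    "BB": np.array([0.0, 2.0]),
}
# Assumed: the same diagonal covariance for every group, for simplicity.
covs = {g: np.array([[0.04, 0.0], [0.0, 0.04]]) for g in centers}

call, dists = call_genotype([1.1, 0.9], centers, covs)
print(call, dists)
```

With these assumed values the observation lies closest to the heterozygous group. In practice the group covariances need not be equal or diagonal, which is where the Mahalanobis distance differs from a plain Euclidean one.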