Abstract
High density oligonucleotide expression arrays are widely used in
many areas of biomedical research. Affymetrix GeneChip arrays are
the most popular. In the Affymetrix system, a fair amount of further
pre-processing and data reduction occurs following the image processing
step. Statistical procedures developed by academic groups have been
successful at improving the default algorithms provided by the Affymetrix
system. In this paper we present a solution to one of the pre-processing
steps, background adjustment, based on a formal statistical framework.
Our solution greatly improves the performance of the technology in
various practical applications.
Affymetrix GeneChip arrays use short oligonucleotides to probe for
genes in an RNA sample. Typically each gene will be represented by
11-20 pairs of oligonucleotide probes. The first component of these
pairs is referred to as a perfect match probe and is designed to
hybridize only with transcripts from the intended gene (specific
hybridization). However, hybridization by other sequences (non-specific
hybridization) is unavoidable. Furthermore, hybridization strengths
are measured by a scanner that introduces optical noise. Therefore,
the observed intensities need to be adjusted to give accurate measurements
of specific hybridization. One approach to adjusting is to pair each
perfect match probe with a mismatch probe that is designed with the
intention of measuring non-specific hybridization. The default adjustment,
provided as part of the Affymetrix system, is based on the difference
between perfect match and mismatch probe intensities. We have found
that this approach can be improved by using estimators derived
from a statistical model that uses probe sequence information. The
model is based on simple hybridization theory from molecular biology
and experiments specifically designed to help develop it.
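To make the default adjustment concrete, the following minimal Python sketch subtracts each mismatch intensity from its paired perfect match intensity. It illustrates only the difference-based idea described above, not the Affymetrix implementation or the model-based estimators proposed here. The function name `pm_mm_adjust` and all intensity values are hypothetical, chosen for illustration.

```python
def pm_mm_adjust(pm, mm):
    """Subtract each mismatch (MM) intensity from its paired perfect match (PM) intensity."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must contain the same number of probes")
    return [p - m for p, m in zip(pm, mm)]


# Hypothetical intensities for four probe pairs of one gene (illustrative only).
pm = [1200.0, 850.0, 430.0, 2100.0]
mm = [300.0, 900.0, 410.0, 500.0]

adjusted = pm_mm_adjust(pm, mm)

# Indices of probe pairs whose adjusted value is not positive,
# i.e. where the mismatch probe was at least as bright as the perfect match.
non_positive_pairs = [i for i, a in enumerate(adjusted) if a <= 0]

print(adjusted)
print(non_positive_pairs)
```

In this made-up example the second probe pair has a mismatch intensity above its perfect match intensity, so the difference is negative and cannot be log-transformed directly. A plain difference gives no guard against such cases.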
A final step in the pre-processing of these arrays is to combine the
11-20 probe pair intensities, after background adjustment and normalization,
for a given gene to define a measure of expression that represents
the amount of the corresponding mRNA species. In this paper we illustrate
the practical consequences of not adjusting appropriately for the
presence of non-specific hybridization and provide a solution based
on our background adjustment procedure. Software that computes our
adjustment is available as part of the Bioconductor project (http://www.bioconductor.