Abstract
Genetic programming (GP) can be used to classify a
given gene sequence as either constitutively or
alternatively spliced. We describe the principles of GP
and apply it to a well-defined data set of
alternatively spliced genes. A feature matrix of
sequence properties, such as nucleotide composition or
exon length, was passed to the GP system Discipulus. To
test its performance, we concentrated on cassette exons
(SCE) and retained introns (SIR). We analysed 27,519
constitutively spliced and 9,641 cassette exons
including their neighbouring introns; in addition, we
analysed 33,316 constitutively spliced introns compared
to 2,712 retained introns. We find that the classifier
yields highly accurate predictions on the SIR data with
a sensitivity of 92.1% and a specificity of
79.2%. Prediction accuracies on the SCE data are
lower, 47.3% (sensitivity) and 70.9%
(specificity), indicating that alternative splicing of
introns can be better captured by sequence properties
than that of exons.
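
As an illustration of the quantities mentioned above, the following minimal Python sketch builds one row of a sequence feature matrix (length and nucleotide composition) and computes sensitivity and specificity from binary labels. It is not the Discipulus workflow or the feature set used in the study; the function names, the feature names and the label convention (1 = alternatively spliced, 0 = constitutively spliced) are assumptions made for this example only.

```python
from collections import Counter


def sequence_features(seq):
    """Return one illustrative feature row: length and A/C/G/T fractions.

    The feature names are hypothetical; the study's actual feature
    matrix is not reproduced here.
    """
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    row = {"length": n}
    for base in "ACGT":
        # Fraction of each nucleotide; 0.0 for an empty sequence.
        row["frac_" + base] = counts[base] / n if n else 0.0
    return row


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels.

    Assumed convention: 1 = alternatively spliced (positive class),
    0 = constitutively spliced (negative class).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

With this convention, sensitivity is the fraction of alternatively spliced sequences that are recognised as such, and specificity is the fraction of constitutively spliced sequences that are correctly left unflagged.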