Abstract
Short interfering RNA (siRNA) efficacy prediction
algorithms aim to increase the probability of selecting
target sites that are suitable for gene silencing by
RNA interference. Many algorithms have been published
recently, and they base their predictions on features
as diverse as duplex stability, sequence
characteristics, mRNA secondary structure, and target
site uniqueness. We compare the performance of the
algorithms on a collection of publicly available
siRNAs. First, we show that our regularised genetic
programming algorithm GPboost appears to have a higher
and more stable performance than other algorithms on
the collected datasets. Second, several algorithms give
close to random classification on unseen data, and only
GPboost and three other algorithms have a reasonably
high and stable performance on all parts of the
dataset. Third, the results indicate that the siRNA
sequence is sufficient input to siRNA efficacy prediction
algorithms, and that other features that have been
suggested to be important may be indirectly captured by
the sequence.