Abstract
Our rapidly growing knowledge regarding genetic
variation in the human genome offers great potential
for understanding the genetic etiology of disease.
This, in turn, could revolutionise detection,
treatment, and in some cases prevention of disease.
While genes for most of the rare monogenic diseases
have already been discovered, most common diseases are
complex traits, resulting from multiple gene-gene and
gene-environment interactions. Detecting epistatic
genetic interactions that predispose to disease is an
important, but computationally daunting, task currently
facing bioinformaticists. Here, we propose a new
evolutionary approach that attempts to hill-climb from
large sets of candidate epistatic genetic features to
smaller sets, inspired by Kauffman's "random
chemistry" approach to detecting small auto-catalytic
sets of molecules from within large sets. Although the
algorithm is conceptually straightforward, its success
hinges upon the creation of a fitness function able to
discriminate large sets that contain subsets of
interacting genetic features from those that do not.
Here, we employ an approximate and noisy fitness
function based on the ReliefF data mining algorithm. We
establish proof-of-concept using synthetic data sets,
where individual features have no marginal effects. We
show that the resulting algorithm can successfully
detect epistatic pairs from up to 1,000 candidate
single nucleotide polymorphisms in time that is linear
in the size of the initial set, although success rate
degrades as heritability declines. Ongoing research
seeks a more accurate fitness approximator for large
sets, along with other algorithmic improvements, that
will enable us to extend the approach to larger data
sets and to lower heritabilities.
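
The set-reduction idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names random_chemistry, looks_promising, shrink, and max_tries are hypothetical, and the demo replaces the noisy ReliefF-based fitness function with an exact oracle that already knows the planted pair, purely to show the control flow of shrinking a large candidate set toward a small one.

```python
import random


def random_chemistry(candidates, looks_promising, shrink=0.5,
                     max_tries=1000, rng=None):
    """Shrink `candidates` toward a small subset that passes a fitness test.

    `looks_promising(subset) -> bool` is a hypothetical stand-in for a
    fitness function that flags sets containing interacting features.
    """
    rng = rng or random.Random()
    current = list(candidates)
    while len(current) > 2:
        # Target size for the next, smaller set (never below a pair).
        size = max(2, int(len(current) * shrink))
        for _ in range(max_tries):
            subset = rng.sample(current, size)
            if looks_promising(subset):
                current = subset  # accept the smaller set and continue
                break
        else:
            return None  # no promising subset found within the budget
    return sorted(current)


# Demo with an assumed planted pair of feature indices and an exact oracle.
planted = {137, 642}


def oracle(subset):
    # Assumption: a perfect test; a real fitness function would be noisy.
    return planted.issubset(subset)


result = random_chemistry(range(1000), oracle, rng=random.Random(0))
print(result)
```

With an exact test and shrink=0.5, a random half of a large set contains both planted features roughly one time in four, so each round needs only a few tries on average. A noisy fitness function would produce false acceptances and rejections, which this sketch does not model.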