Article,

Genomic mining for complex disease traits with ``random chemistry''

M. Eppstein, J. Payne, B. White, and J. Moore.
Genetic Programming and Evolvable Machines, 8 (4): 395--411 (December 2007). Special issue on medical applications of Genetic and Evolutionary Computation.
DOI: 10.1007/s10710-007-9039-5

Abstract

Our rapidly growing knowledge regarding genetic variation in the human genome offers great potential for understanding the genetic etiology of disease. This, in turn, could revolutionise detection, treatment, and in some cases prevention of disease. While genes for most of the rare monogenic diseases have already been discovered, most common diseases are complex traits, resulting from multiple gene-gene and gene-environment interactions. Detecting epistatic genetic interactions that predispose for disease is an important, but computationally daunting, task currently facing bioinformaticists. Here, we propose a new evolutionary approach that attempts to hill-climb from large sets of candidate epistatic genetic features to smaller sets, inspired by Kauffman's ``random chemistry'' approach to detecting small auto-catalytic sets of molecules from within large sets. Although the algorithm is conceptually straightforward, its success hinges upon the creation of a fitness function able to discriminate large sets that contain subsets of interacting genetic features from those that don't. Here, we employ an approximate and noisy fitness function based on the ReliefF data mining algorithm. We establish proof-of-concept using synthetic data sets, where individual features have no marginal effects. We show that the resulting algorithm can successfully detect epistatic pairs from up to 1,000 candidate single nucleotide polymorphisms in time that is linear in the size of the initial set, although success rate degrades as heritability declines. Research continues into seeking a more accurate fitness approximator for large sets and other algorithmic improvements that will enable us to extend the approach to larger data sets and to lower heritabilities.
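
As a rough illustration of the set-reduction idea described in the abstract, the following Python sketch repeatedly draws random subsets of the current candidate set and keeps the first one that a fitness test still flags as containing the interacting features. It is not the authors' implementation: the function name, the `shrink` and `max_tries` parameters, and the `contains_signal` test are all assumptions made for illustration. In particular, the paper uses an approximate, noisy fitness function based on ReliefF, whereas this sketch substitutes a noise-free check against a known hidden pair.

```python
import random


def random_chemistry(features, contains_signal, shrink=0.5,
                     target_size=2, max_tries=10_000, rng=None):
    """Shrink a large candidate set down to `target_size` features.

    `contains_signal(subset)` is a hypothetical stand-in for the fitness
    test: it should return True when the subset still appears to contain
    the interacting features.
    """
    rng = rng or random.Random()
    current = list(features)
    while len(current) > target_size:
        # Next set is a constant fraction of the current one, never
        # smaller than the target size.
        new_size = max(target_size, int(len(current) * shrink))
        for _ in range(max_tries):
            candidate = rng.sample(current, new_size)
            if contains_signal(candidate):
                current = candidate  # keep the smaller set and continue
                break
        else:
            return None  # no acceptable subset found within max_tries
    return sorted(current)


# Toy example (assumed data): 1,000 candidate SNP indices, one hidden pair.
hidden_pair = {17, 402}


def oracle(subset):
    # Noise-free placeholder for the ReliefF-based fitness in the paper.
    return hidden_pair.issubset(subset)


found = random_chemistry(range(1000), oracle, rng=random.Random(0))
print(found)
```

With a constant shrink factor the set sizes form a geometric series, so if each fitness test costs time proportional to the set size and the expected number of tries per round is bounded, the total work grows linearly with the size of the initial set. A noisy fitness function, as used in the paper, would make both false acceptances and false rejections possible, which this sketch does not model.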

BibTeX key: Eppstein:2007:GPEM
entry type: article
year: 2007
month: December
journal: Genetic Programming and Evolvable Machines
number: 4
pages: 395--411
volume: 8
issn: 1389-2576
notes: SNP, ROC, AUC
DOI: 10.1007/s10710-007-9039-5
note: special issue on medical applications of Genetic and Evolutionary Computation

Cite this publication

%0 Journal Article
%1 Eppstein:2007:GPEM
%A Eppstein, Margaret J.
%A Payne, Joshua L.
%A White, Bill C.
%A Moore, Jason H.
%D 2007
%J Genetic Programming and Evolvable Machines
%K Complex traits, Data mining, Epistasis, Evolutionary algorithms, Feature selection, Genome-wide association studies, Single nucleotide polymorphisms
%N 4
%P 395--411
%R 10.1007/s10710-007-9039-5
%T Genomic mining for complex disease traits with ``random chemistry''
%V 8
%X Our rapidly growing knowledge regarding genetic variation in the human genome offers great potential for understanding the genetic etiology of disease. This, in turn, could revolutionise detection, treatment, and in some cases prevention of disease. While genes for most of the rare monogenic diseases have already been discovered, most common diseases are complex traits, resulting from multiple gene-gene and gene-environment interactions. Detecting epistatic genetic interactions that predispose for disease is an important, but computationally daunting, task currently facing bioinformaticists. Here, we propose a new evolutionary approach that attempts to hill-climb from large sets of candidate epistatic genetic features to smaller sets, inspired by Kauffman's ``random chemistry'' approach to detecting small auto-catalytic sets of molecules from within large sets. Although the algorithm is conceptually straightforward, its success hinges upon the creation of a fitness function able to discriminate large sets that contain subsets of interacting genetic features from those that don't. Here, we employ an approximate and noisy fitness function based on the ReliefF data mining algorithm. We establish proof-of-concept using synthetic data sets, where individual features have no marginal effects. We show that the resulting algorithm can successfully detect epistatic pairs from up to 1,000 candidate single nucleotide polymorphisms in time that is linear in the size of the initial set, although success rate degrades as heritability declines. Research continues into seeking a more accurate fitness approximator for large sets and other algorithmic improvements that will enable us to extend the approach to larger data sets and to lower heritabilities.

@article{Eppstein:2007:GPEM,
  abstract = {Our rapidly growing knowledge regarding genetic variation in the human genome offers great potential for understanding the genetic etiology of disease. This, in turn, could revolutionise detection, treatment, and in some cases prevention of disease. While genes for most of the rare monogenic diseases have already been discovered, most common diseases are complex traits, resulting from multiple gene-gene and gene-environment interactions. Detecting epistatic genetic interactions that predispose for disease is an important, but computationally daunting, task currently facing bioinformaticists. Here, we propose a new evolutionary approach that attempts to hill-climb from large sets of candidate epistatic genetic features to smaller sets, inspired by Kauffman's ``random chemistry'' approach to detecting small auto-catalytic sets of molecules from within large sets. Although the algorithm is conceptually straightforward, its success hinges upon the creation of a fitness function able to discriminate large sets that contain subsets of interacting genetic features from those that don't. Here, we employ an approximate and noisy fitness function based on the ReliefF data mining algorithm. We establish proof-of-concept using synthetic data sets, where individual features have no marginal effects. We show that the resulting algorithm can successfully detect epistatic pairs from up to 1,000 candidate single nucleotide polymorphisms in time that is linear in the size of the initial set, although success rate degrades as heritability declines. Research continues into seeking a more accurate fitness approximator for large sets and other algorithmic improvements that will enable us to extend the approach to larger data sets and to lower heritabilities.},
  added-at = {2008-06-19T17:35:00.000+0200},
  author = {Eppstein, Margaret J. and Payne, Joshua L. and White, Bill C. and Moore, Jason H.},
  biburl = {https://www.bibsonomy.org/bibtex/24b187002d78f88b37551876c29de2af6/brazovayeye},
  doi = {10.1007/s10710-007-9039-5},
  interhash = {7aa2a69c5125a3cea8607efbbab16391},
  intrahash = {4b187002d78f88b37551876c29de2af6},
  issn = {1389-2576},
  journal = {Genetic Programming and Evolvable Machines},
  keywords = {Complex traits, Data mining, Epistasis, Evolutionary algorithms, Feature selection, Genome-wide association studies, Single nucleotide polymorphisms},
  month = {December},
  note = {special issue on medical applications of Genetic and Evolutionary Computation},
  notes = {SNP, ROC, AUC},
  number = 4,
  pages = {395--411},
  timestamp = {2008-06-19T17:39:15.000+0200},
  title = {Genomic mining for complex disease traits with ``random chemistry''},
  volume = 8,
  year = 2007
}
