Evolving Regular Expression-based Sequence Classifiers
for Protein Nuclear Localisation
A. Heddad, M. Brameier, and R. MacCallum. Applications of Evolutionary Computing,
EvoWorkshops2004: EvoBIO, EvoCOMNET, EvoHOT,
EvoIASP, EvoMUSART, EvoSTOC, volume 3005 of LNCS, pp. 31--40. Coimbra, Portugal, Springer Verlag, (5-7 April 2004)
Abstract
A number of bioinformatics tools use regular
expression (RE) matching to locate protein or DNA
sequence motifs that have been discovered by
researchers in the laboratory. For example, patterns
representing nuclear localisation signals (NLSs) are
used to predict nuclear localisation. NLSs are not yet
well understood, and so the set of currently known NLSs
may be incomplete. Here we use genetic programming (GP)
to generate RE-based classifiers for nuclear
localisation. While the approach is a supervised one
(with respect to protein location), it is unsupervised
with respect to already known NLSs. It therefore has
the potential to discover new NLS motifs. We apply both
tree based and linear GP to the problem. The inclusion
of predicted secondary structure in the input does not
improve performance. Benchmarking shows that our
majority classifiers are competitive with existing
tools. The evolved REs are usually "NLS like" and
work is underway to analyse these for novelty.
%0 Conference Paper
%1 heddad:evows04
%A Heddad, Amine
%A Brameier, Markus
%A MacCallum, Robert M.
%B Applications of Evolutionary Computing,
EvoWorkshops2004: EvoBIO, EvoCOMNET, EvoHOT,
EvoIASP, EvoMUSART, EvoSTOC
%C Coimbra, Portugal
%D 2004
%E Raidl, Guenther R.
%E Cagnoni, Stefano
%E Branke, Jurgen
%E Corne, David W.
%E Drechsler, Rolf
%E Jin, Yaochu
%E Johnson, Colin R.
%E Machado, Penousal
%E Marchiori, Elena
%E Rothlauf, Franz
%E Smith, George D.
%E Squillero, Giovanni
%I Springer Verlag
%K BNF, GP, LGP, RE, algorithms, computation, evolutionary, expressions, genetic, grammar, linear, perl, programming, regular
%P 31--40
%T Evolving Regular Expression-based Sequence Classifiers
for Protein Nuclear Localisation
%U http://www.sbc.su.se/~maccallr/publications/heddad-evobio2004.pdf
%V 3005
%X A number of bioinformatics tools use regular
expression (RE) matching to locate protein or DNA
sequence motifs that have been discovered by
researchers in the laboratory. For example, patterns
representing nuclear localisation signals (NLSs) are
used to predict nuclear localisation. NLSs are not yet
well understood, and so the set of currently known NLSs
may be incomplete. Here we use genetic programming (GP)
to generate RE-based classifiers for nuclear
localisation. While the approach is a supervised one
(with respect to protein location), it is unsupervised
with respect to already known NLSs. It therefore has
the potential to discover new NLS motifs. We apply both
tree based and linear GP to the problem. The inclusion
of predicted secondary structure in the input does not
improve performance. Benchmarking shows that our
majority classifiers are competitive with existing
tools. The evolved REs are usually "NLS like" and
work is underway to analyse these for novelty.
%@ 3-540-21378-3
@inproceedings{heddad:evows04,
abstract = {A number of bioinformatics tools use regular
expression (RE) matching to locate protein or DNA
sequence motifs that have been discovered by
researchers in the laboratory. For example, patterns
representing nuclear localisation signals (NLSs) are
used to predict nuclear localisation. NLSs are not yet
well understood, and so the set of currently known NLSs
may be incomplete. Here we use genetic programming (GP)
to generate RE-based classifiers for nuclear
localisation. While the approach is a supervised one
(with respect to protein location), it is unsupervised
with respect to already known NLSs. It therefore has
the potential to discover new NLS motifs. We apply both
tree based and linear GP to the problem. The inclusion
of predicted secondary structure in the input does not
improve performance. Benchmarking shows that our
majority classifiers are competitive with existing
tools. The evolved REs are usually {"}NLS like{"} and
work is underway to analyse these for novelty.},
added-at = {2008-06-19T17:35:00.000+0200},
address = {Coimbra, Portugal},
author = {Heddad, Amine and Brameier, Markus and MacCallum, Robert M.},
biburl = {https://www.bibsonomy.org/bibtex/29328df5f7830c733a6abab966134d221/brazovayeye},
booktitle = {Applications of Evolutionary Computing,
EvoWorkshops2004: {EvoBIO}, {EvoCOMNET}, {EvoHOT},
{EvoIASP}, {EvoMUSART}, {EvoSTOC}},
editor = {Raidl, Guenther R. and Cagnoni, Stefano and Branke, Jurgen and Corne, David W. and Drechsler, Rolf and Jin, Yaochu and Johnson, Colin R. and Machado, Penousal and Marchiori, Elena and Rothlauf, Franz and Smith, George D. and Squillero, Giovanni},
interhash = {e3ac49d134d91b5dd1f4b508c3987ca6},
intrahash = {9328df5f7830c733a6abab966134d221},
isbn = {3-540-21378-3},
keywords = {BNF, GP, LGP, RE, algorithms, computation, evolutionary, expressions, genetic, grammar, linear, perl, programming, regular},
month = {5-7 April},
notes = {EvoWorkshops2004, perlGP, grammar (not needed, cf
p39?). http://www.sbc.su.se/~maccallr/nucpred/ perl
eval(), grammar, stgp, matches(), pdiv, plog, multiple
classifier combination majority vote. 'No crossover is
allowed between REs' p38. Removing ineffective code.
'LGP very close to PerlGP' p38. RE matching done in C.
cf. \cite{brameier:nucpred}},
pages = {31--40},
publisher = {Springer Verlag},
publisher_address = {Berlin},
series = {LNCS},
timestamp = {2008-06-19T17:41:16.000+0200},
title = {Evolving Regular Expression-based Sequence Classifiers
for Protein Nuclear Localisation},
url = {http://www.sbc.su.se/~maccallr/publications/heddad-evobio2004.pdf},
volume = 3005,
year = 2004
}