The Application of Genetic Programming For Feature
Construction in Classification
M. Muharram. School of Computing Sciences at the University of East
Anglia, Norwich, England (July 2005)
Abstract
This Thesis addresses the task of feature construction
for classification. The quality of the data is one of
the most important factors influencing the performance
of any classification algorithm. The attributes
defining the feature space of a given data set can
often be inadequate, making it difficult to discover
interesting knowledge. However, even when the original
attributes are individually inadequate, it is often
possible to combine such attributes in order to
construct new ones with greater predictive power.
The goal of this Thesis is to restructure the feature
space in order to improve the performance of decision
tree classification techniques on complex, real-world
data. The proposed framework involves the use of
genetic programming to evolve (construct) new
attributes, which are non-linear combinations of the
original attributes. This approach incorporates a
number of decision tree splitting mechanisms in the
fitness measures of the genetic program.
The empirical results obtained are encouraging and show
that classification techniques can definitely benefit
from the inclusion of an evolved attribute in terms of
the accuracy and model size (for decision tree
classifiers). When compared to existing approaches, the
use of decision tree splitting criteria as the
fitness of the genetic program proves to be competitive
and robust in terms of predictive accuracy. Additionally,
some of the evolved attributes manage to uncover
physical properties in the data.
Notes
Evolving new, more predictive features for a number of
classification techniques, particularly decision tree
classification algorithms.
The GP incorporates the splitting mechanism of a
decision tree classifier as its fitness for
constructing new features.
pages 3-4
%0 Thesis
%1 Muharram:thesis
%A Muharram, Mohammed Ahmed Yahya
%C Norwich, England
%D 2005
%K algorithms, genetic programming
%T The Application of Genetic Programming For Feature
Construction in Classification
%U http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/Muharram_thesis.pdf
%X This Thesis addresses the task of feature construction
for classification. The quality of the data is one of
the most important factors influencing the performance
of any classification algorithm. The attributes
defining the feature space of a given data set can
often be inadequate, making it difficult to discover
interesting knowledge. However, even when the original
attributes are individually inadequate, it is often
possible to combine such attributes in order to
construct new ones with greater predictive power.
The goal of this Thesis is to restructure the feature
space in order to improve the performance of decision
tree classification techniques on complex, real-world
data. The proposed framework involves the use of
genetic programming to evolve (construct) new
attributes, which are non-linear combinations of the
original attributes. This approach incorporates a
number of decision tree splitting mechanisms in the
fitness measures of the genetic program.
The empirical results obtained are encouraging and show
that classification techniques can definitely benefit
from the inclusion of an evolved attribute in terms of
the accuracy and model size (for decision tree
classifiers). When compared to existing approaches, the
use of decision tree splitting criteria as the
fitness of the genetic program proves to be competitive
and robust in terms of predictive accuracy. Additionally,
some of the evolved attributes manage to uncover
physical properties in the data.
@phdthesis{Muharram:thesis,
abstract = {This Thesis addresses the task of feature construction
for classification. The quality of the data is one of
the most important factors influencing the performance
of any classification algorithm. The attributes
defining the feature space of a given data set can
often be inadequate, making it difficult to discover
interesting knowledge. However, even when the original
attributes are individually inadequate, it is often
possible to combine such attributes in order to
construct new ones with greater predictive power.
The goal of this Thesis is to restructure the feature
space in order to improve the performance of decision
tree classification techniques on complex, real-world
data. The proposed framework involves the use of
genetic programming to evolve (construct) new
attributes, which are non-linear combinations of the
original attributes. This approach incorporates a
number of decision tree splitting mechanisms in the
fitness measures of the genetic program.
The empirical results obtained are encouraging and show
that classification techniques can definitely benefit
from the inclusion of an evolved attribute in terms of
the accuracy and model size (for decision tree
classifiers). When compared to existing approaches, the
use of decision tree splitting criteria as the
fitness of the genetic program proves to be competitive
and robust in terms of predictive accuracy. Additionally,
some of the evolved attributes manage to uncover
physical properties in the data.},
added-at = {2008-06-19T17:46:40.000+0200},
address = {Norwich, England},
author = {Muharram, Mohammed Ahmed Yahya},
biburl = {https://www.bibsonomy.org/bibtex/229df9726bd41cb5844694e4c2c473dd9/brazovayeye},
interhash = {6847bac0488e4c9a3e3540c10ef33d57},
intrahash = {29df9726bd41cb5844694e4c2c473dd9},
keywords = {algorithms, genetic programming},
month = {July},
notes = {Evolving new, more predictive features for a number of
classification techniques, particularly decision tree
classification algorithms.
The GP incorporates the splitting mechanism of a
decision tree classifier as its fitness for
constructing new features.
pages 3-4},
school = {School of Computing Sciences at the University of East
Anglia},
size = {187 pages},
timestamp = {2008-06-19T17:47:45.000+0200},
title = {The Application of Genetic Programming For Feature
Construction in Classification},
url = {http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/Muharram_thesis.pdf},
year = 2005
}