Genetic Programming-Based Decision Trees for
Software Quality Classification
T. Khoshgoftaar, Y. Liu, and N. Seliya. Proceedings of the Fifteenth International Conference
on Tools with Artificial Intelligence (ICTAI 03), pages 374–383. Los Alamitos, California: IEEE Computer Society, 3–5 November 2003.
Abstract
Knowledge of the likely problematic areas of a
software system is very useful for improving its
overall quality. Based on such information, a more
focused software testing and inspection plan can be
devised. Decision trees are attractive for a software
quality classification problem which predicts the
quality of program modules in terms of risk-based
classes. They provide a comprehensible classification
model which can be directly interpreted by observing
the tree-structure. A simultaneous optimisation of the
classification accuracy and the size of the decision
tree is a difficult problem, and very few studies have
addressed the issue. This paper presents an automated
and simplified genetic programming (GP) based decision
tree modelling technique for the software quality
classification problem. Genetic programming is ideally
suited for problems that require optimisation of
multiple criteria. The proposed technique is based on
multi-objective optimisation using strongly typed GP.
In the context of an industrial high-assurance software
system, two fitness functions are used for the
optimisation problem: one for minimising the average
weighted cost of misclassification, and one for
controlling the size of the decision tree. The
classification performances of the GP-based decision
trees are compared with those based on standard GP,
i.e., S-expression trees. It is shown that the GP-based
decision tree technique yielded better classification
models. As compared to other decision tree-based
methods, such as C4.5, GP-based decision trees are more
flexible and can allow optimisation of performance
objectives other than accuracy. Moreover, the technique provides a
practical solution for building models in the presence
of conflicting objectives, which is commonly observed
in software development practice.
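The two objectives named above (average weighted cost of misclassification, and tree size) can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes a binary fault-prone ("fp") / not-fault-prone ("nfp") classification, the common Type I / Type II cost formulation (C_I * N_I + C_II * N_II) / N, node count as the size measure, and Pareto dominance as one possible way to compare two candidate trees. The classes Leaf and Split, the metric name "loc", and the cost values are hypothetical; the paper's strongly typed GP operators and selection scheme are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # "fp" (fault-prone) or "nfp" (not fault-prone)


@dataclass
class Split:
    metric: str       # hypothetical software-metric name, e.g. "loc"
    threshold: float
    left: "Node"      # followed when module[metric] <= threshold
    right: "Node"     # followed otherwise


Node = Union[Leaf, Split]


def classify(node: Node, module: Dict[str, float]) -> str:
    """Walk from the root to a leaf and return that leaf's class label."""
    while isinstance(node, Split):
        node = node.left if module[node.metric] <= node.threshold else node.right
    return node.label


def tree_size(node: Node) -> int:
    """Size objective (assumed): total number of nodes; smaller is simpler."""
    if isinstance(node, Leaf):
        return 1
    return 1 + tree_size(node.left) + tree_size(node.right)


def average_weighted_cost(tree: Node, modules: List[Dict[str, float]],
                          labels: List[str], c_type1: float, c_type2: float) -> float:
    """Cost objective (assumed form): (C_I * N_I + C_II * N_II) / N.

    Type I error: an nfp module predicted as fp.
    Type II error: an fp module predicted as nfp.
    """
    n_type1 = n_type2 = 0
    for module, actual in zip(modules, labels):
        predicted = classify(tree, module)
        if actual == "nfp" and predicted == "fp":
            n_type1 += 1
        elif actual == "fp" and predicted == "nfp":
            n_type2 += 1
    return (c_type1 * n_type1 + c_type2 * n_type2) / len(modules)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance for minimised objectives: no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


if __name__ == "__main__":
    # One split on a hypothetical "loc" metric; Type II errors weighted 10x (hypothetical).
    tree = Split("loc", 100.0, Leaf("nfp"), Leaf("fp"))
    modules = [{"loc": 50.0}, {"loc": 150.0}, {"loc": 80.0}, {"loc": 120.0}]
    labels = ["nfp", "fp", "fp", "nfp"]
    cost = average_weighted_cost(tree, modules, labels, c_type1=1.0, c_type2=10.0)
    print(cost, tree_size(tree))  # 2.75 3
```

In this toy run one nfp module is flagged as fp (Type I) and one fp module is missed (Type II), so the cost is (1 + 10) / 4 = 2.75 for a 3-node tree. Keeping the two objectives separate, rather than summing them, lets a search prefer a smaller tree only when it does not raise the misclassification cost, which is the kind of trade-off the abstract describes.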
%0 Conference Paper
%1 Khoshgoftaar03
%A Khoshgoftaar, Taghi M.
%A Liu, Yi
%A Seliya, Naeem
%B Proceedings of the Fifteenth International Conference
on Tools with Artificial Intelligence (ICTAI 03)
%C Los Alamitos, California
%D 2003
%I IEEE Computer Society
%K C4.5 GP-based S-expression algorithms, automated classes, classification classification, cost, criteria, decision development, genetic inspection, metrics, misclassification model, module, multiobjective multiple optimization, program programming, quality quality, risk-based simultaneous software system, testing, tree, tree-structure trees,
%P 374--383
%T Genetic Programming-Based Decision Trees for
Software Quality Classification
%U http://doi.ieeecomputersociety.org/10.1109/TAI.2003.1250214
%X Knowledge of the likely problematic areas of a
software system is very useful for improving its
overall quality. Based on such information, a more
focused software testing and inspection plan can be
devised. Decision trees are attractive for a software
quality classification problem which predicts the
quality of program modules in terms of risk-based
classes. They provide a comprehensible classification
model which can be directly interpreted by observing
the tree-structure. A simultaneous optimisation of the
classification accuracy and the size of the decision
tree is a difficult problem, and very few studies have
addressed the issue. This paper presents an automated
and simplified genetic programming (GP) based decision
tree modelling technique for the software quality
classification problem. Genetic programming is ideally
suited for problems that require optimisation of
multiple criteria. The proposed technique is based on
multi-objective optimisation using strongly typed GP.
In the context of an industrial high-assurance software
system, two fitness functions are used for the
optimisation problem: one for minimising the average
weighted cost of misclassification, and one for
controlling the size of the decision tree. The
classification performances of the GP-based decision
trees are compared with those based on standard GP,
i.e., S-expression trees. It is shown that the GP-based
decision tree technique yielded better classification
models. As compared to other decision tree-based
methods, such as C4.5, GP-based decision trees are more
flexible and can allow optimisation of performance
objectives other than accuracy. Moreover, the technique provides a
practical solution for building models in the presence
of conflicting objectives, which is commonly observed
in software development practice.
@inproceedings{Khoshgoftaar03,
abstract = {Knowledge of the likely problematic areas of a
software system is very useful for improving its
overall quality. Based on such information, a more
focused software testing and inspection plan can be
devised. Decision trees are attractive for a software
quality classification problem which predicts the
quality of program modules in terms of risk-based
classes. They provide a comprehensible classification
model which can be directly interpreted by observing
the tree-structure. A simultaneous optimisation of the
classification accuracy and the size of the decision
tree is a difficult problem, and very few studies have
addressed the issue. This paper presents an automated
and simplified genetic programming (GP) based decision
tree modelling technique for the software quality
classification problem. Genetic programming is ideally
suited for problems that require optimisation of
multiple criteria. The proposed technique is based on
multi-objective optimisation using strongly typed GP.
In the context of an industrial high-assurance software
system, two fitness functions are used for the
optimisation problem: one for minimising the average
weighted cost of misclassification, and one for
controlling the size of the decision tree. The
classification performances of the GP-based decision
trees are compared with those based on standard GP,
i.e., S-expression trees. It is shown that the GP-based
decision tree technique yielded better classification
models. As compared to other decision tree-based
methods, such as C4.5, GP-based decision trees are more
flexible and can allow optimisation of performance
objectives other than accuracy. Moreover, the technique provides a
practical solution for building models in the presence
of conflicting objectives, which is commonly observed
in software development practice.},
added-at = {2008-06-19T17:35:00.000+0200},
address = {Los Alamitos, California},
author = {Khoshgoftaar, Taghi M. and Liu, Yi and Seliya, Naeem},
biburl = {https://www.bibsonomy.org/bibtex/294dfd8801c2c2b1971f5a8a4fdde7cdf/brazovayeye},
booktitle = {Proceedings of the Fifteenth International Conference
on Tools with Artificial Intelligence (ICTAI 03)},
interhash = {13d4723d17488cb39d0f115b6d265b89},
intrahash = {94dfd8801c2c2b1971f5a8a4fdde7cdf},
issn = {1082-3409},
keywords = {C4.5 GP-based S-expression algorithms, automated classes, classification classification, cost, criteria, decision development, genetic inspection, metrics, misclassification model, module, multiobjective multiple optimization, program programming, quality quality, risk-based simultaneous software system, testing, tree, tree-structure trees,},
month = {3-5 November},
notes = {Inspec Accession Number: 7862146},
pages = {374--383},
publisher = {IEEE Computer Society},
size = {10 pages},
timestamp = {2008-06-19T17:43:12.000+0200},
title = {Genetic {P}rogramming-{B}ased {D}ecision {T}rees for
{S}oftware {Q}uality {C}lassification},
url = {http://doi.ieeecomputersociety.org/10.1109/TAI.2003.1250214},
year = 2003
}