Abstract
Data mining emphasises the need to learn
from huge, incomplete and imperfect data sets (Fayyad
et al., 1996; Frawley et al., 1991; Piatetsky-Shapiro and
Frawley, 1991). To handle noise in the problem domain,
existing learning systems avoid overfitting the
imperfect training examples by excluding insignificant
patterns. However, these systems use a
restrictive attribute-value language for representing the
training examples and the induced knowledge. Moreover,
some important patterns are ignored because they are
statistically insignificant. In this paper, we present
a framework that combines Genetic Programming (Koza,
1992; 1994) and Inductive Logic Programming (Muggleton,
1992) to induce knowledge represented in various
knowledge representation formalisms from noisy
databases. The framework is based on a logic grammar
formalism, which allows the search space to be specified
declaratively. An implementation of the framework,
LOGENPRO (The Logic grammar based GENetic PROgramming
system), has been developed. The performance of
LOGENPRO is evaluated on the chess endgame domain. We
compare LOGENPRO with FOIL and other learning systems
in detail and find its performance is significantly
better than that of the others. This result indicates
that the Darwinian principle of natural selection is a
plausible noise-handling method that can avoid
overfitting and identify important patterns at the same
time. Moreover, the system is applied to one real-life
medical database. The knowledge discovered provides
insights into, and allows a better understanding of, the
medical domain.