Abstract
Knowledge discovery in data (KDD) is the general
process of analysing data collections and extracting
useful knowledge from them using computerised tools.
Applying KDD techniques directly to a database is not
straightforward: unlike a data collection stored in a
single flat file, a database may present several views
depending on the user's interests. Moreover, in many cases, there
is a data model discrepancy between the target database
and the representation format for the input data set
that most KDD techniques expect. The presented research
centres on developing methodologies, techniques, and
tools to discover useful characteristic knowledge in
databases. Our approach is first to partition a given
database into several clusters with similar properties
using cluster analysis, and then to discover
characteristic knowledge in each cluster using genetic
programming. In this research, we analysed the problems
in clustering databases. We proposed an extended data
set format that, unlike a traditional flat file format,
can store related information. We developed an automatic
tool that generates an extended data set from a database;
the data set may contain related information drawn from
related tables or classes.
We proposed a unified similarity framework that can
cope with various kinds of data sets, and generalised
clustering algorithms for the proposed similarity
framework. We also developed a discovery system that
takes the set of data objects in each cluster and
discovers characteristic knowledge for the given object
set using genetic programming.