Discovery of characteristic knowledge in databases using cluster analysis and genetic programming

PhD thesis, Department of Computer Science, University of Houston, USA (December 1998)

Abstract

Knowledge discovery in data (KDD) is the general approach of analysing data collections and extracting useful knowledge from them using computerised tools. Applying KDD techniques directly to a database is not straightforward: unlike a data collection stored in a single flat file, a database may present several views depending on the user's interests. Moreover, in many cases there is a data model discrepancy between the target database and the input data set format that most KDD techniques expect. The presented research centres on developing methodologies, techniques, and tools to discover useful characteristic knowledge in databases. Our approach is first to partition a given database into several clusters of objects with similar properties using cluster analysis, and then to discover characteristic knowledge in each cluster using genetic programming. In this research, we analysed the problems that arise in clustering databases. We proposed an extended data set format that, unlike a traditional flat file format, can store related information as part of the input data set. We developed an automatic tool that generates an extended data set from a database, including information drawn from related tables or classes. We proposed a unified similarity framework that can cope with various kinds of data sets, and generalised clustering algorithms to work with the proposed similarity framework. We also developed a discovery system that takes the set of data objects in each cluster and discovers characteristic knowledge for that object set using genetic programming.
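To make the idea of a similarity measure over an extended data set more concrete, the following is a minimal illustrative sketch, not the framework defined in the thesis. It assumes that an object is a dictionary whose attributes may be numeric, categorical, or set-valued (standing in for information gathered from related tables), and it combines a range-normalised distance, an exact-match test, and the Jaccard coefficient into a weighted mean. The function names, the `ranges` and `weights` parameters, and the sample records are all hypothetical.

```python
from typing import Any, Dict, Optional


def attribute_similarity(a: Any, b: Any, value_range: Optional[float] = None) -> float:
    """Similarity in [0, 1] for one attribute (illustrative assumption, not the thesis's definition)."""
    # Set-valued attribute, e.g. values collected from a related table: Jaccard coefficient.
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        union = a | b
        return len(a & b) / len(union) if union else 1.0
    # Numeric attribute: 1 minus the distance normalised by the attribute's range.
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if not value_range:
            return 1.0 if a == b else 0.0
        return max(0.0, 1.0 - abs(a - b) / value_range)
    # Anything else is treated as categorical: exact match or nothing.
    return 1.0 if a == b else 0.0


def object_similarity(
    x: Dict[str, Any],
    y: Dict[str, Any],
    ranges: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of per-attribute similarities over the attributes both objects share."""
    weights = weights or {}
    shared = sorted(x.keys() & y.keys())
    total_weight = sum(weights.get(k, 1.0) for k in shared)
    if total_weight == 0:
        return 0.0
    score = sum(
        weights.get(k, 1.0) * attribute_similarity(x[k], y[k], ranges.get(k))
        for k in shared
    )
    return score / total_weight


# Hypothetical customer records; "purchases" plays the role of related-table information.
alice = {"age": 30, "city": "Houston", "purchases": frozenset({"book", "pen"})}
bob = {"age": 40, "city": "Houston", "purchases": frozenset({"book", "lamp"})}
print(object_similarity(alice, bob, ranges={"age": 50}))
```

A pairwise measure of this kind could be passed to any clustering algorithm that accepts a user-supplied similarity function, which is one way to read the abstract's remark about generalising clustering algorithms; the measure actually used in the thesis may differ.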
