Inproceedings,

Automatic data fusion with HumMer

, , , , , and .
Proc. of the 31st International Conference on Very Large Data Bases (VLDB), Demo Abstract Band, (2005)

Abstract

Heterogeneous and dirty data is abundant. It is stored under different, often opaque schemata, it represents identical real-world objects multiple times, causing duplicates, and it has missing values and conflicting values. The Humboldt Merger (HumMer) is a tool that allows ad-hoc, declarative fusion of such data using a simple extension to SQL.Guided by a query against multiple tables, HumMer proceeds in three fully automated steps: First, instance-based schema matching bridges schematic heterogeneity of the tables by aligning corresponding attributes. Next, duplicate detection techniques find multiple representations of identical real-world objects. Finally, data fusion and conflict resolution merges duplicates into a single, consistent, and clean representation.

Tags

Users

  • @tkirsten

Comments and Reviews