Abstract
Openly available datasets originate from different data providers, ranging from government agencies and commercial enterprises to communities of data enthusiasts. Integrating different source datasets into a single RDF graph by using ETL (Extract-Transform-Load) systems that perform offline transformation, ontology matching and linking techniques usually takes many iterations of revision before the target dataset is free of the most obvious mapping, linking and consistency errors. Because ETL systems produce the RDF offline, any mapping or content change requires a re-ingest of the relevant source data. When dealing with heterogeneous source datasets, creating a unified target dataset can therefore be a tedious undertaking. This paper proposes an RDF view based ingestion approach that allows real-time "debugging" of the unified dataset, in which mappings and links can be changed with immediate effect. Once the unified graph passes all data quality tests, the RDF can be materialized. This process offers an alternative to existing ETL solutions.