
Exploring Scientific Workflow Provenance Using Hybrid Queries over Nested Data and Lineage Graphs

Scientific and Statistical Database Management (2009)

Abstract

Existing approaches for representing the provenance of scientific workflow runs largely ignore computation models that work over structured data, including XML. Unlike models based on transformation semantics, these computation models often employ update semantics, in which only a portion of an incoming XML stream is modified by each workflow step. Applying conventional provenance approaches to such models results in provenance information that is either too coarse (e.g., stating that one version of an XML document depends entirely on a prior version) or potentially incorrect (e.g., stating that each element of an XML document depends on every element in a prior version). We describe a generic provenance model that naturally represents workflow runs involving processes that work over nested data collections and that employ update semantics. Moreover, we extend current query approaches to support our model, enabling queries to be posed not only over data lineage relationships, but also over versions of nested data structures produced during a workflow run. We show how hybrid queries can be expressed against our model using high-level query constructs and implemented efficiently over relational provenance storage schemes.

Description

SpringerLink - Book Chapter
