Querying Provenance Information in Distributed Environments

International Journal of Computers and Their Applications (IJCA), 18 (3): 196–215 (September 2011)

Abstract

The growing recognition of the importance of provenance for data-intensive and multidisciplinary domains is leading to careful collection of provenance. One consequence is the proliferation of provenance repositories hosted for individual organizations or communities, with limited ability to reconstruct provenance and to query for and over it across them. Community standards like the Open Provenance Model (OPM) allow uniform interpretation and exchange of provenance metadata but do not prescribe query or service specifications to access provenance. If data reuse and sharing across institutions is not accompanied by passing provenance at the time of data exchange, we need to track the provenance and query for it, or over it, across distributed provenance repositories. In this article, we present approaches for querying over distributed provenance information, and we address and formalize two common provenance query models: the provenance retrieval query and the provenance filter query. Our problem is motivated by Smart Oilfield applications in the energy informatics domain, and we evaluate the performance of our algorithms using synthetic workflows based on that domain.
