Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store

Proceedings of the fourth international workshop on Data-intensive distributed computing, pages 35-44. New York, NY, USA, ACM, (2011)
DOI: 10.1145/1996014.1996021

Abstract

Graph data processing is an emerging application area for cloud computing because there are few other information infrastructures that cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to process queries on graph data utilizing the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the Clause-Iteration approach comes from 1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and 2) our use of flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of our scalable, SHARD graph-store built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph-stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.
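
The two ideas named in the abstract, growing the response one query clause at a time and joining on flagged keys, can be illustrated with a small in-memory sketch. This is not the SHARD implementation: the function names, the `OLD`/`NEW` flag values, and the sample triples are assumptions made for illustration, and a plain Python loop stands in for a Hadoop job. Each iteration runs one simulated map step that emits (join key, flagged record) pairs and one reduce step that joins the records sharing a key.

```python
from collections import defaultdict
from itertools import product


def is_var(term):
    """Query variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def match_clause(clause, triple):
    """Return the variable bindings if the triple matches the clause, else None."""
    binding = {}
    for pattern, value in zip(clause, triple):
        if is_var(pattern):
            if binding.get(pattern, value) != value:
                return None  # same variable would need two different values
            binding[pattern] = value
        elif pattern != value:
            return None
    return binding


def map_step(partial, clause, triples, join_vars):
    """Emit (key, (flag, binding)) pairs; the flag records where a binding came from."""
    for b in partial:
        yield tuple(b[v] for v in join_vars), ("OLD", b)
    for t in triples:
        b = match_clause(clause, t)
        if b is not None:
            yield tuple(b[v] for v in join_vars), ("NEW", b)


def reduce_step(pairs):
    """Group by key, then join every OLD binding with every NEW binding in the group."""
    groups = defaultdict(lambda: {"OLD": [], "NEW": []})
    for key, (flag, b) in pairs:
        groups[key][flag].append(b)
    for sides in groups.values():
        for old, new in product(sides["OLD"], sides["NEW"]):
            yield {**old, **new}


def clause_iteration(triples, clauses):
    """Answer a conjunctive query by adding one clause per simulated MapReduce pass."""
    first, *rest = clauses
    partial = [b for t in triples if (b := match_clause(first, t)) is not None]
    bound = {v for v in first if is_var(v)}
    for clause in rest:
        clause_vars = {v for v in clause if is_var(v)}
        join_vars = sorted(bound & clause_vars)  # variables shared with earlier clauses
        partial = list(reduce_step(map_step(partial, clause, triples, join_vars)))
        bound |= clause_vars
    return partial


# Hypothetical sample graph and query: who does ?x know, and where do they live?
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "livesIn", "Boston"),
    ("carol", "livesIn", "Paris"),
]
query = [("?x", "knows", "?y"), ("?y", "livesIn", "?c")]
results = clause_iteration(triples, query)
```

In this sketch the flag only distinguishes bindings carried over from earlier clauses from matches of the newly added clause, so that the reducer knows which records to pair. When a new clause shares no variable with the earlier ones, the key is empty and the reduce step degenerates to a cross product.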
