Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store
K. Rohloff and R. Schantz. Proceedings of the fourth international workshop on Data-intensive distributed computing, pages 35--44. New York, NY, USA, ACM, (2011)
DOI: 10.1145/1996014.1996021
Abstract
Graph data processing is an emerging application area for cloud computing because there are few other information infrastructures that cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to process queries on graph data utilizing the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the Clause-Iteration approach comes from 1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and 2) our use of flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of our scalable, SHARD graph-store built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph-stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.
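The two ideas named in the abstract (growing the answer one query clause at a time, and joining partial answers through flagged records grouped by key) can be illustrated with a small in-memory sketch. This is not the SHARD implementation and does not use Hadoop: the function names, the tuple-based triple format, the "?"-prefixed variables, and the "old"/"new" flags are all assumptions made for illustration, and the map and reduce phases are simulated with a plain dictionary.

```python
from collections import defaultdict


def match_clause(clause, triple):
    """Return variable bindings if the triple matches the clause, else None."""
    binding = {}
    for term, value in zip(clause, triple):
        if term.startswith("?"):
            if binding.get(term, value) != value:
                return None  # same variable would need two different values
            binding[term] = value
        elif term != value:
            return None  # constant term does not match
    return binding


def join_step(partial, clause, triples):
    """One simulated map/reduce round that adds a single clause to the answer."""
    bound_vars = set(partial[0]) if partial else set()
    # Variables shared between the earlier clauses and this one form the join key.
    shared = sorted(t for t in set(clause) if t.startswith("?") and t in bound_vars)
    groups = defaultdict(lambda: {"old": [], "new": []})

    # "Map" phase: emit each record under its join key, flagged by origin.
    for binding in partial:
        groups[tuple(binding[v] for v in shared)]["old"].append(binding)
    for triple in triples:
        match = match_clause(clause, triple)
        if match is not None:
            groups[tuple(match[v] for v in shared)]["new"].append(match)

    # "Reduce" phase: within each key group, combine old and new records.
    joined = []
    for sides in groups.values():
        for old in sides["old"]:
            for new in sides["new"]:
                joined.append({**old, **new})
    return joined


def clause_iteration(clauses, triples):
    """Answer a conjunctive query by considering one more clause per round."""
    first, rest = clauses[0], clauses[1:]
    partial = [m for t in triples if (m := match_clause(first, t)) is not None]
    for clause in rest:
        partial = join_step(partial, clause, triples)
    return partial


# Hypothetical toy graph and query: who works for an organization in boston?
triples = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("carol", "worksFor", "globex"),
    ("acme", "locatedIn", "boston"),
    ("globex", "locatedIn", "paris"),
]
clauses = [("?person", "worksFor", "?org"), ("?org", "locatedIn", "boston")]
answers = clause_iteration(clauses, triples)
```

In this toy run the first clause yields three partial bindings, and the second round keeps only those whose ?org value also appears in a matching locatedIn triple. In a real MapReduce job each round would be a separate job over distributed data rather than a loop over a Python list.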
%0 Conference Paper
%1 Rohloff:2011:CMS:1996014.1996021
%A Rohloff, Kurt
%A Schantz, Richard E.
%B Proceedings of the fourth international workshop on Data-intensive distributed computing
%C New York, NY, USA
%D 2011
%I ACM
%K graph mapreduce query rdf scalably
%P 35--44
%R 10.1145/1996014.1996021
%T Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store
%U http://doi.acm.org/10.1145/1996014.1996021
%X Graph data processing is an emerging application area for cloud computing because there are few other information infrastructures that cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to process queries on graph data utilizing the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the Clause-Iteration approach comes from 1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and 2) our use of flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of our scalable, SHARD graph-store built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph-stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.
%@ 978-1-4503-0704-8
@inproceedings{Rohloff:2011:CMS:1996014.1996021,
abstract = {Graph data processing is an emerging application area for cloud computing because there are few other information infrastructures that cost-effectively permit scalable graph data processing. We present a scalable cloud-based approach to process queries on graph data utilizing the MapReduce model. We call this approach the Clause-Iteration approach. We present algorithms that, when used in conjunction with a MapReduce framework, respond to SPARQL queries over RDF data. Our innovation in the Clause-Iteration approach comes from 1) the iterative construction of query responses by incrementally growing the number of query clauses considered in a response, and 2) our use of flagged keys to join the results of these incremental responses. The Clause-Iteration algorithms form the basis of our scalable, SHARD graph-store built on the Hadoop implementation of MapReduce. SHARD performs favorably when compared to existing "industrial" graph-stores on a standard benchmark graph with 800 million edges. We discuss design considerations and alternatives associated with constructing scalable graph processing technologies.},
acmid = {1996021},
added-at = {2012-08-02T15:04:35.000+0200},
address = {New York, NY, USA},
author = {Rohloff, Kurt and Schantz, Richard E.},
biburl = {https://www.bibsonomy.org/bibtex/22f570eea4604f70e8f4b39c88c78d0c6/sac},
booktitle = {Proceedings of the fourth international workshop on Data-intensive distributed computing},
description = {Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store},
doi = {10.1145/1996014.1996021},
interhash = {24cf46fdcae8eeaf8336024641bdbed3},
intrahash = {2f570eea4604f70e8f4b39c88c78d0c6},
isbn = {978-1-4503-0704-8},
keywords = {graph mapreduce query rdf scalably},
location = {San Jose, California, USA},
numpages = {10},
pages = {35--44},
publisher = {ACM},
series = {DIDC '11},
timestamp = {2012-08-02T15:04:35.000+0200},
title = {Clause-iteration with {MapReduce} to scalably query datagraphs in the {SHARD} graph-store},
url = {http://doi.acm.org/10.1145/1996014.1996021},
year = 2011
}