Goods: Organizing Google's Datasets

A. Halevy, F. Korn, N. Noy, C. Olston, N. Polyzotis, S. Roy, and S. Whang.
Proceedings of the 2016 International Conference on Management of Data, pages 795--806. New York, NY, USA, ACM (2016)
DOI: 10.1145/2882903.2903730

Abstract

Enterprises increasingly rely on structured datasets to run their businesses. These datasets take a variety of forms, such as structured files, databases, spreadsheets, or even services that provide access to the data. The datasets often reside in different storage systems, may vary in their formats, and may change every day. In this paper, we present GOODS, a project to rethink how we organize structured datasets at scale, in a setting where teams use diverse and often idiosyncratic ways to produce the datasets and where there is no centralized system for storing and querying them. GOODS extracts metadata ranging from salient information about each dataset (owners, timestamps, schema) to relationships among datasets, such as similarity and provenance. It then exposes this metadata through services that allow engineers to find datasets within the company, to monitor datasets, to annotate them in order to enable others to use their datasets, and to analyze relationships between them. We discuss the technical challenges that we had to overcome in order to crawl and infer the metadata for billions of datasets, to maintain the consistency of our metadata catalog at scale, and to expose the metadata to users. We believe that many of the lessons that we learned are applicable to building large-scale enterprise-level data-management systems in general.

BibTeX key: halevy2016goods
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 2016 International Conference on Management of Data
year: 2016
pages: 795--806
publisher: ACM
series: SIGMOD '16
shorttitle: Goods
isbn: 978-1-4503-3531-7
DOI: 10.1145/2882903.2903730
urldate: 2018-12-23
url: http://doi.acm.org/10.1145/2882903.2903730
