James Hamilton has published a thorough summary of Facebook's Cassandra, another scalable key-value store for your perusal. It's open source and is described as a "BigTable data model running on a Dynamo-like infrastructure." Cassandra is used at Facebook as an email search system holding 25 TB of data across more than 100 million mailboxes.

Related links:
# Google Code for Cassandra - A Structured Storage System on a P2P Network
# SIGMOD 2008 Presentation
# Video Presentation at Facebook
# Facebook Engineering Blog for Cassandra
# Anti-RDBMS: A list of distributed key-value stores
# Facebook Cassandra Architecture and Design by James Hamilton
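For concreteness, here is a minimal Python sketch of the data model that description implies: rows addressed by a row key, grouped into column families, with timestamped columns, and row keys placed on a consistent-hashing ring Dynamo-style. The class and helper names are illustrative assumptions, not Cassandra's client API.

```python
# Toy sketch of a BigTable-style data model on a Dynamo-style ring:
# row key -> column family -> column name -> (value, timestamp).
# Names here are illustrative only, not Cassandra's actual API.
import hashlib
import time


def token(key):
    """Hash a key onto the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class TinyColumnStore:
    def __init__(self, node_names):
        # Place each node on the ring at the hash of its name.
        self.ring = sorted((token(name), name) for name in node_names)
        self.data = {}  # row key -> {family -> {column -> (value, ts)}}

    def coordinator(self, row_key):
        """The node whose ring position follows the row key's token."""
        t = token(row_key)
        for node_token, name in self.ring:
            if node_token >= t:
                return name
        return self.ring[0][1]  # wrap around the ring

    def insert(self, row_key, family, column, value):
        fam = self.data.setdefault(row_key, {}).setdefault(family, {})
        fam[column] = (value, time.time())

    def get(self, row_key, family, column):
        return self.data[row_key][family][column][0]


store = TinyColumnStore(["node-a", "node-b", "node-c"])
print("row 'user:42' is coordinated by", store.coordinator("user:42"))
store.insert("user:42", "inbox", "subject:msg001", "Hello")
print(store.get("user:42", "inbox", "subject:msg001"))
```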
Distributed Sage is a framework that allows one to do distributed computing from within Sage. It includes a server, a client, and workers, as well as a set of classes that one can subclass to write distributed computation jobs. It is designed mainly for ‘coarsely’ distributed computations, i.e., computations whose jobs do not need to communicate much with each other. This is also sometimes referred to as ‘grid’ computing.
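A rough idea of what a ‘coarsely’ distributed job looks like, as a Python sketch: a job class is subclassed, each job runs independently on a worker, and workers never talk to each other. The class names are illustrative assumptions rather than Distributed Sage's actual API, and multiprocessing stands in for the real server/client/worker machinery.

```python
# Conceptual sketch of a coarse-grained distributed job: subclass a job
# class, hand instances to workers, collect results. Not DSage's real API.
from multiprocessing import Pool


class DistributedJob:
    """Base class: subclass and override run() with the computation."""

    def run(self):
        raise NotImplementedError


class FactorJob(DistributedJob):
    def __init__(self, n):
        self.n = n

    def run(self):
        # Trial division: each job is self-contained and communication-free.
        n, factors, d = self.n, [], 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return self.n, factors


def execute(job):
    return job.run()


if __name__ == "__main__":
    jobs = [FactorJob(n) for n in (2**31 - 1, 600851475143, 10**12 + 39)]
    with Pool() as pool:  # stands in for remote workers
        for n, factors in pool.map(execute, jobs):
            print(n, "=", factors)
```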
HBase is the Hadoop database. It's an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase's goal is the hosting of very large tables -- billions of rows by millions of columns -- atop clusters of commodity hardware. Try it if your plans for a data store run to the very large.
HBase: Bigtable-like structured storage for Hadoop HDFS. Just as Google's Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop Core. Data is organized into tables, rows, and columns. An iterator-like interface is available for scanning through a row range (and, of course, a column value can be retrieved for a specific key). Any particular column may have multiple versions for the same row key.
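A hedged sketch of that access pattern in Python, using the third-party happybase Thrift client (an assumption; HBase's native API is Java). The table name 'webtable' and the column family 'contents' are made up for illustration.

```python
# Sketch of put / get / range scan / versioned reads against HBase via
# the happybase Thrift client; assumes a Thrift gateway on localhost and
# an existing table 'webtable' with column family 'contents'.
import happybase

connection = happybase.Connection('localhost')
table = connection.table('webtable')

# Write a cell; repeated puts to the same row/column create new versions.
table.put(b'com.example/index.html', {b'contents:html': b'<html>v2</html>'})

# Retrieve a column value for a specific row key.
row = table.row(b'com.example/index.html')
print(row[b'contents:html'])

# Iterator-like scan over a row-key range (here, a key prefix).
for key, data in table.scan(row_prefix=b'com.example'):
    print(key, list(data))

# Fetch multiple stored versions of one cell, newest first.
for old_value in table.cells(b'com.example/index.html', b'contents:html', versions=3):
    print(old_value)
```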
Apache's Hadoop project aims to solve the problem of processing very large datasets by providing a framework for running large data-processing applications on clusters of commodity hardware. Combined with Amazon EC2 for running the application and Amazon S3 for storing the data, we can run large jobs very economically. This paper describes how to use Amazon Web Services and Hadoop to run an ad hoc analysis on a large collection of web access logs that would otherwise have cost a prohibitive amount in either time or money.
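As a taste of what such an ad hoc job can look like, here is a hedged Hadoop Streaming sketch in Python that counts hits per URL in access logs. The field position assumes the common Apache log format, and the script name and paths are illustrative.

```python
#!/usr/bin/env python
# Hadoop Streaming mapper/reducer: both read stdin and emit
# tab-separated key/value pairs. Field 7 of a common-format access
# log line is the requested URL (an assumption about the log format).
import sys


def mapper():
    # Emit one count per requested URL.
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 6:
            print(f"{fields[6]}\t1")


def reducer():
    # Input arrives sorted by key, so counts for a URL are contiguous.
    current, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = key, 0
        total += int(value)
    if current is not None:
        print(f"{current}\t{total}")


if __name__ == "__main__":
    if sys.argv[1:] == ["map"]:
        mapper()
    else:
        reducer()
```

With Hadoop Streaming the same file can serve as both stages (for example -mapper 'logcount.py map' and -reducer 'logcount.py reduce'), with -input and -output pointing at S3 paths; logcount.py is a hypothetical name for this script.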
The goal of the Condor® Project is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput.
JumboMem provides a low-effort solution to the problem of running memory-hungry programs on memory-starved computers. The JumboMem software gives programs access to memory spread across multiple computers, providing the illusion that all of the memory resides within a single computer. When a program exceeds the memory in one computer, it automatically spills over into the memory of the next computer.
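The spill-over behaviour can be pictured with a toy Python simulation; this illustrates the idea only, not JumboMem's mechanism, which operates transparently underneath the program. All names here are made up.

```python
# Toy simulation of spill-over: when the "local" node is full, further
# pages land on the next node, while lookups remain uniform.
class SpillingStore:
    def __init__(self, nodes, pages_per_node):
        # Each dict stands in for one machine's RAM.
        self.nodes = [dict() for _ in range(nodes)]
        self.capacity = pages_per_node
        self.where = {}  # page id -> node index

    def write(self, page_id, data):
        for idx, node in enumerate(self.nodes):
            if len(node) < self.capacity:
                node[page_id] = data
                self.where[page_id] = idx
                return idx
        raise MemoryError("all nodes full")

    def read(self, page_id):
        return self.nodes[self.where[page_id]][page_id]


store = SpillingStore(nodes=3, pages_per_node=2)
for page in range(5):
    node = store.write(page, bytes(4096))
    print(f"page {page} -> node {node}")  # pages 0-1 stay local, the rest spill over
```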
J. Weerasinghe, F. Abel, C. Hagleitner, and A. Herkersdorf. In 2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing, 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing, and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom), pp. 1078--1086. IEEE, 2015.