News: all of the few remaining calls to scipy have been replaced with calls to numpy; versions 0.1.8 and above do not require scipy as a dependency. This library provides Python functions for agglomerative clustering. Its features include:
* generating hierarchical clusters from distance matrices
* computing distance matrices from observation vectors
* computing statistics on clusters
* cutting linkages to generate flat clusters
* visualizing clusters with dendrograms
The interface is very similar to MATLAB's Statistics Toolbox API, to make code easier to port from MATLAB to Python/Numpy. The core implementation of this library is in C for efficiency.
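A quick sketch of the basic workflow, assuming hcluster 0.1.8+ is installed (the function names follow the MATLAB-style API described above; treat the exact keyword arguments as assumptions):

    import numpy as np
    from hcluster import pdist, linkage, fcluster, dendrogram

    X = np.random.rand(20, 4)                      # 20 observation vectors in R^4
    Y = pdist(X)                                   # condensed distance matrix
    Z = linkage(Y, method='single')                # agglomerative (single-linkage) clustering
    T = fcluster(Z, t=1.15, criterion='distance')  # cut the linkage tree into flat clusters
    dendrogram(Z)                                  # visualize the hierarchy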
OSCAR allows users, regardless of their experience level with a *nix environment, to install a Beowulf-type high-performance computing cluster. It also contains everything needed to administer and program this type of HPC cluster. OSCAR's flexible package management system has a rich set of pre-packaged applications and utilities, which means you can get up and running without laboriously installing and configuring complex cluster administration and communication packages. It also lets administrators create customized packages for any kind of distributed application or utility, and distribute those packages from an online package repository, either on or off site.
InfiniBand is a switched-fabric communications link primarily used in high-performance computing. Its features include quality of service and failover, and it is designed to be scalable. The InfiniBand architecture specification defines a connection between processor nodes and high-performance I/O nodes such as storage devices. InfiniBand forms a superset of the Virtual Interface Architecture.
The Ohio Supercomputer Center provides supercomputing, research and educational resources to a diverse state and national community, including education, academic research, industry and state government. At the Ohio Supercomputer Center, our duty is to empower our clients, partner strategically to develop new research and business opportunities, and lead Ohio's knowledge economy.
The Large Synoptic Survey Telescope (LSST) is a project to build an 8.4m telescope at Cerro Pachón, Chile, and survey the entire sky every three days, starting around 2014. The scientific goals of the project range from characterizing the population of largish asteroids in orbits that could bring them to Earth to understanding the nature of the dark energy that is causing the Universe's expansion to accelerate. The application codes, which handle the images coming from the telescope and generate catalogs of astronomical sources, are being implemented in C++ and exported to Python using SWIG. The pipeline processing framework allows these Python modules to be connected together to process data in a parallel environment.
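A purely hypothetical sketch of that stage-pipeline idea (none of these names come from the actual LSST framework; in the real system a routine like detect_sources would live in a SWIG-generated extension module backed by C++):

    def detect_sources(image):
        # Stand-in for a C++ routine that, in the real system, would be
        # exported to Python via SWIG.  Here: a toy threshold 'detector'.
        return [pixel for pixel in image if pixel > 5.0]

    class SourceDetectionStage(object):
        def process(self, clipboard):
            clipboard['sources'] = detect_sources(clipboard['image'])
            return clipboard

    def run_pipeline(stages, clipboard):
        # Each stage reads from and writes to a shared clipboard dict;
        # a real framework would distribute stages across nodes.
        for stage in stages:
            clipboard = stage.process(clipboard)
        return clipboard

    print(run_pipeline([SourceDetectionStage()], {'image': [1.0, 9.5, 7.2]}))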
Distributed Sage is a framework that allows one to do distributed computing from within Sage. It includes a server, a client, and workers, as well as a set of classes that one can subclass to write distributed computation jobs. It is designed mainly for ‘coarsely’ distributed computations, i.e., computations whose jobs do not have to communicate much with each other. This is also sometimes referred to as ‘grid’ computing.
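The coarse-grained model is easy to picture. Here is a generic illustration using Python's standard multiprocessing module rather than Distributed Sage's own classes (whose exact names I won't guess at); each job is independent, so workers never need to talk to each other:

    # Generic illustration of 'coarsely' distributed computation.
    # Not Distributed Sage's API: multiprocessing stands in for its
    # server/client/worker machinery.
    from multiprocessing import Pool

    def factor(n):
        # A self-contained unit of work: trial-division factorization.
        factors, d = [], 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    if __name__ == '__main__':
        pool = Pool(4)     # four local workers; dsage would use remote ones
        print(pool.map(factor, [2**31 - 1, 2**32 + 1, 10**9 + 7]))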
Ack. Parallel Python requires workers on each cluster node. I want an ssh private-key (no password) solution. 1) Start the parallel python execution server on all your remote computational nodes:
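Something along these lines should work, assuming pp is installed on every node, key-based ssh access is set up, and ppserver.py (the execution server that ships with pp) is on each node's PATH; the host names and port are placeholders:

    # Launch the execution server on each remote node, e.g.:
    #     ssh -f node1 ppserver.py -p 35000
    #     ssh -f node2 ppserver.py -p 35000
    # Then submit jobs from the client:
    import pp

    def busy(n):
        return sum(i * i for i in range(n))

    job_server = pp.Server(ppservers=("node1:35000", "node2:35000"))
    job = job_server.submit(busy, (1000000,))   # runs on any free worker
    print(job())                                # block and fetch the result
    job_server.print_stats()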
Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls. Hundreds of researchers from around the world have used Rocks to deploy their own cluster (see the Rocks Cluster Register).
HBase is the Hadoop database. It's an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase's goal is the hosting of very large tables -- billions of rows by millions of columns -- atop clusters of commodity hardware. Try it if your plans for a data store run to the very large.
HBase: Bigtable-like structured storage for Hadoop HDFS. Just as Google's Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop Core. Data is organized into tables, rows, and columns. An iterator-like interface is available for scanning through a row range (and of course there is the ability to retrieve a column value for a specific key). Any particular column may have multiple versions for the same row key.
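A toy Python model of that layout makes the versioning point concrete (purely illustrative; HBase's real client API is Java):

    # Toy model of the Bigtable/HBase data layout:
    # table -> row key -> column -> {version: value}.
    import itertools

    _stamp = itertools.count()          # stand-in for real timestamps
    table = {}

    def put(row, column, value):
        table.setdefault(row, {}).setdefault(column, {})[next(_stamp)] = value

    def get(row, column):
        versions = table[row][column]
        return versions[max(versions)]  # newest version wins

    def scan(start, stop):
        # Iterate rows in key order within [start, stop), like the
        # iterator-style scanner interface described above.
        for row in sorted(table):
            if start <= row < stop:
                yield row, table[row]

    put('row1', 'contents:html', '<html>v1</html>')
    put('row1', 'contents:html', '<html>v2</html>')
    print(get('row1', 'contents:html'))            # -> <html>v2</html>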
Apache's Hadoop project aims to solve such problems by providing a framework for running large data-processing applications on clusters of commodity hardware. Combined with Amazon EC2 for running the application, and Amazon S3 for storing the data, we can run large jobs very economically. This paper describes how to use Amazon Web Services and Hadoop to run an ad hoc analysis on a large collection of web access logs that otherwise would have cost a prohibitive amount in either time or money.
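However the paper itself implements the analysis, a Hadoop Streaming job gives the flavor in a few lines of Python; the log format, file names, and S3 paths below are assumptions, not the paper's actual code:

    # mapper.py -- emit "<url>\t1" for each request in a combined-format
    # access log (the quoted field holds e.g. GET /index.html HTTP/1.0).
    import sys

    for line in sys.stdin:
        parts = line.split('"')
        if len(parts) > 1:
            request = parts[1].split()
            if len(request) == 3:
                print('%s\t1' % request[1])

    # reducer.py -- sum the counts per URL; input arrives sorted by key.
    import sys

    current, count = None, 0
    for line in sys.stdin:
        line = line.rstrip('\n')
        if not line:
            continue
        url, n = line.rsplit('\t', 1)
        if url != current:
            if current is not None:
                print('%s\t%d' % (current, count))
            current, count = url, 0
        count += int(n)
    if current is not None:
        print('%s\t%d' % (current, count))

    # Submitted with something like:
    #     hadoop jar contrib/streaming/hadoop-*-streaming.jar \
    #         -input s3://bucket/logs -output s3://bucket/hits \
    #         -mapper mapper.py -reducer reducer.py \
    #         -file mapper.py -file reducer.py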