Disco is an open-source implementation of the MapReduce framework for distributed computing. Like the original framework, Disco supports parallel computation over large data sets on unreliable clusters of computers.
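The data flow such a framework distributes can be sketched in plain Python. This is a minimal stand-in for the MapReduce model itself, not Disco's actual API; `map_fn`, `shuffle`, and `run_job` are illustrative names.

```python
from collections import defaultdict

def map_fn(line):
    # Map phase: emit a (word, 1) pair for every word in an input line.
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group intermediate values by key, as the framework
    # would between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_fn(key, values):
    # Reduce phase: combine all counts for one word.
    return key, sum(values)

def run_job(lines):
    intermediate = [kv for line in lines for kv in map_fn(line)]
    return dict(reduce_fn(k, vs) for k, vs in shuffle(intermediate))

counts = run_job(["a rose is a rose", "is it"])
# counts == {'a': 2, 'rose': 2, 'is': 2, 'it': 1}
```

In a real framework the map and reduce calls run on different machines and the shuffle moves data over the network; the program structure, however, is exactly this.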
"For a while now, IBM has had multiple and competing tools for managing AIX and Linux clusters for its supercomputer customers and yet another set of tools that were used for other HPC setups with a slightly more commercial bent to them. But Big Blue has now cleaned house, killing off its closed-source Cluster Systems Management (CSM) tool and tapping its own open source Extreme Cluster Administration Toolkit (known as xCAT) as its replacement."
Building and Promoting a Linux-based Operating System to Support Virtual Organizations for Next Generation Grids (2006-2010). The emergence of Grids enables the sharing of a wide range of resources to solve large-scale computational and data-intensive problems in science, engineering and commerce. While much has been done to build Grid middleware on top of existing operating systems, little has been done to extend the underlying operating systems to enable and facilitate Grid computing, for example by embedding important functionality directly into the operating system kernel.
SystemImager is software that automates Linux installs, software distribution, and production deployment. It makes it easy to do automated installs (clones), software distribution, content or data distribution, configuration changes, and operating system updates across your network of Linux machines. You can even update from one Linux release version to another! It can also be used to ensure safe production deployments: by saving your current production image before updating to the new one, you have a highly reliable contingency mechanism. If the new production environment is found to be flawed, simply roll back to the last production image with a single update command! Some typical environments include: Internet server farms, database server farms, high performance clusters, computer labs, and corporate desktop environments.
Modern graphics processing units (GPUs) contain hundreds of arithmetic units and can be harnessed to provide tremendous acceleration for many numerically intensive scientific applications. The key to effective utilization of GPUs for scientific computing
As high performance computing (HPC) becomes a ubiquitous part of the scientific computing landscape, the science of visualizing HPC datasets has become a critical field of its own. One of the hottest solutions can be found in commoditized high performance
Red Hat on Wednesday announced a significant departure from its current business plan, saying its flagship Linux product will be available on Amazon.com's Elastic Compute Cloud online service.
The Ohio Supercomputer Center provides supercomputing, research and educational resources to a diverse state and national community, including education, academic research, industry and state government. At the Ohio Supercomputer Center, our duty is to empower our clients, partner strategically to develop new research and business opportunities, and lead Ohio's knowledge economy.
Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls. Hundreds of researchers from around the world have used Rocks to deploy their own cluster (see the Rocks Cluster Register).
Handcock, M.S., Raftery, A.E., and Tantrum, J. (2005). Model-Based Clustering for Social Networks. Working Paper no. 46, Center for Statistics and the Social Sciences, University of Washington.
This seems difficult at first glance, but really, it's not. At all. Depending on your needs, the time from getting all your hardware plugged in to doing some massive parallel processing can be anywhere from 10 minutes to 2 hours. And
In this first of five articles, learn what it means for software to be highly available and how to install and set up heartbeat software from the High-Availability Linux project on a two-node system. You'll also learn how to configure the Apache Web serve
You MUST have a third server as a management node, but this can be shut down after the cluster starts. Also note that I do not recommend shutting down the management server (see the extra notes at the bottom of this document for more information). You can no
This tutorial shows how to configure a MySQL 5 cluster with three nodes: two storage nodes and one management node. This cluster is load-balanced by a high-availability load balancer that in fact has two nodes that use the Ultra Monkey package which provi
James Hamilton has published a thorough summary of Facebook's Cassandra, another scalable key-value store for your perusal. It's open source and is described as a "BigTable data model running on a Dynamo-like infrastructure." Cassandra is used at Facebook as an email search system holding 25 TB and over 100 million mailboxes. Related links:
- Google Code for Cassandra - A Structured Storage System on a P2P Network
- SIGMOD 2008 Presentation
- Video Presentation at Facebook
- Facebook Engineering Blog for Cassandra
- Anti-RDBMS: A list of distributed key-value stores
- Facebook Cassandra Architecture and Design by James Hamilton
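Part of what "Dynamo-like infrastructure" means is partitioning keys across nodes with consistent hashing. The sketch below is a generic consistent-hash ring in plain Python, not Cassandra's actual partitioner; `HashRing` and the MD5-derived tokens are assumptions made for illustration.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit token derived from MD5 (illustrative; real systems
    # use their own partitioner functions).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hash ring: each virtual node owns the arc of
    hash space that ends at its token."""

    def __init__(self, nodes, vnodes=8):
        # Several virtual nodes per physical node smooth the load balance.
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key):
        # First token clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._tokens, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # the same key always maps to the same node
```

The payoff of this scheme is that adding or removing one node only remaps the keys on the arcs adjacent to its tokens, rather than reshuffling the whole keyspace.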
Spark is a fast, in-memory cluster computing framework with a language-integrated interface in Scala. It shines at iterative MapReduce (e.g. machine learning) and interactive data mining, where keeping data in memory provides substantial speedups.
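The speedup for iterative algorithms comes from parsing the dataset once and reusing it in memory on every pass. Here is a plain-Python stand-in for that access pattern (nothing below is Spark's API): a tiny gradient-descent fit where each iteration rereads the same cached list instead of going back to disk.

```python
# Iterative fit over a cached, in-memory dataset -- the access pattern
# Spark accelerates.  Plain-Python stand-in; not Spark's actual API.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # parsed from "disk" once

w = 0.0
for _ in range(200):                            # every pass reuses `points`
    grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
    w -= 0.05 * grad                            # gradient step toward y = 2x
# w converges to 2.0, the slope of the data
```

In Spark the equivalent dataset would be an RDD marked for caching, and the per-pass work would be distributed across the cluster; the iteration structure is the same.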
This tutorial will show you how to create a High Availability HAProxy load balancer setup on DigitalOcean, with the support of a Floating IP and the Corosync/Pacemaker cluster stack. The HAProxy load balancers will each be configured to split traffic
The country's second-largest city reports the next mini-cluster: after the case of an infected waiter in the Herrengasse in Wiener Neustadt – the ...
The number of cases in the coronavirus cluster in St. Wolfgang, in Upper Austria's Salzkammergut region, rose further on Saturday evening. As of 21 ...
The Diamond Princess cruise ship was put under quarantine offshore Yokohama, Japan, after a passenger who disembarked in Hong Kong was confirmed as a coronavirus disease 2019 case. We performed whole-genome sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) directly from PCR+ clinical specimens and conducted a phylogenetic analysis of the outbreak. All tested isolates exhibited a transversion at G11083T, suggesting that SARS-CoV-2 dissemination on the Diamond Princess originated from a single introduction event before the quarantine started. Although further spreading might have been prevented by quarantine, some progeny clusters could be linked to transmission through mass-gathering events in the recreational areas and direct transmission among passengers who shared cabins during the quarantine. This study demonstrates the usefulness of haplotype network/phylogeny analysis in identifying potential infection routes.
openMosix is a Linux kernel extension for single-system image clustering. This kernel extension turns a network of ordinary computers into a supercomputer for Linux applications.
Ultra Monkey is a project to create load balanced and highly available network services. For example a cluster of web servers that appear as a single web server to end-users. The service may be for end-users across the world connected via the internet, or
The goal of the Condor® Project is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput
JPPF is an open source Grid Computing platform written in Java that makes it easy to run applications in parallel, and speed up their execution by orders of magnitude. Write once, deploy once, execute everywhere!
DRBD® refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network. DRBD can be understood as network-based RAID-1.
In the illustration above, the two orange boxes represent two servers that form an HA cluster. The boxes contain the usual components of a Linux™ kernel: file system, buffer cache, disk scheduler, disk drivers, TCP/IP stack and network interface card (NIC) driver. The black arrows illustrate the flow of data between these components.
The orange arrows show the flow of data as DRBD mirrors the data of a highly available service from the active node of the HA cluster to the standby node of the HA cluster.
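That mirroring flow can be caricatured in a few lines of Python. The sketch models DRBD's synchronous replication mode (protocol C), in which a write completes only after both nodes have it; `BlockDevice` and `MirroredDevice` are toy stand-ins for illustration, not DRBD interfaces.

```python
class BlockDevice:
    """Toy block store: block number -> bytes."""
    def __init__(self):
        self.blocks = {}

    def write(self, n, data):
        self.blocks[n] = data

class MirroredDevice:
    """Sketch of DRBD-style synchronous mirroring (protocol C): a write is
    acknowledged only after both the local and the peer device hold it."""
    def __init__(self, local, peer):
        self.local, self.peer = local, peer

    def write(self, n, data):
        self.local.write(n, data)   # local disk, via the normal block layer
        self.peer.write(n, data)    # shipped over the network in real DRBD
        return "ack"                # ack issued only after both writes land

active, standby = BlockDevice(), BlockDevice()
drbd = MirroredDevice(active, standby)
drbd.write(7, b"payload")
# standby now holds an identical copy; on failover it can serve block 7
```

The design point this captures is why protocol C gives zero data loss on failover: the application never sees a completed write that the standby does not also have.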
Basic principle: each time you visit a new site, you gain one point of expertise. With every 10 points, you move to the next level. Your search engine mutates: new buttons appear, giving you access to advanced features (search video, images, news, encyclopedia, advanced filters, animated skins, web archive, traffic details...)
Hole transport in molecularly doped polymers (MDPs) is modeled as random walks on fixed donors (Ds) embedded in a polymer matrix. Dilution p<1 corresponds to placing individual Ds, dimers D2, or tetramers D4 randomly on a fraction p of sites in a face-centered-cubic lattice. Monte Carlo simulations of the drift velocity vD(E) in a bias field E have maxima in dilute (p = 8%) systems of D2 or D4 that are related to the formation and polarization of clusters of nearest-neighbor donors. Marcus or...
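The drift velocity in such studies is estimated by direct Monte Carlo sampling of many hops. The sketch below is a drastic 1D simplification of the paper's fcc-lattice hopping model, offered only to illustrate the method; the `bias` parameter loosely plays the role of the field E, and the linear hop rule is an assumption, not the Marcus rates used in the paper.

```python
import random

def drift_velocity(bias, steps=100_000, seed=1):
    """Monte Carlo estimate of the drift velocity of a 1D biased random walk.

    Each hop goes +1 with probability p = (1 + bias) / 2, else -1, so the
    expected displacement per hop is `bias`.  A toy 1D stand-in for the
    fcc-lattice hopping simulations described above.
    """
    rng = random.Random(seed)
    p = (1 + bias) / 2
    displacement = sum(1 if rng.random() < p else -1 for _ in range(steps))
    return displacement / steps  # mean displacement per hop

v = drift_velocity(0.2)  # close to 0.2 for large `steps`
```

The interesting physics in the paper (maxima of vD(E) at dilute concentrations) comes precisely from what this toy omits: disorder and clustering of the donor sites.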
Last week I moderated a webinar entitled Optimizing Performance for HPC: Part 2 - Interconnect with InfiniBand. It was a great presentation with a lot of practical information and good questions. If you missed it, it will be available for a few months, so you still have a chance to check it out. As part of the webinar, Vallard Benincosa of IBM mentioned that the speed of light was becoming an issue in network design. In engineering terms, that is referred to as a hard limit.
PelicanHPC is a distribution of GNU/Linux that runs as a "live CD" (or it can be put on a USB device, or it can be used as a virtualized OS). If the ISO image file is burnt to a CD, the resulting CD can be used to boot a computer. The computer on which PelicanHPC is booted is referred to as the "frontend node". It is the computer with which the user interacts. Once PelicanHPC is running, a script - "pelican_setup" - may be run. This script configures the frontend node as a netboot server. After this has been done, other computers can boot copies of PelicanHPC over the network. These other computers are referred to as "compute nodes". PelicanHPC configures the cluster made up of the frontend node and the compute nodes so that MPI-based parallel computing may be done.
Thankfully, Rocks v4.3 kernel roll and Lustre v1.6.1 appear to be based on the same kernel version. This saved me from having to get the kernel source, and patch it for Lustre. I've done that before with some success, but I'm happy to be able to avoid doing it again. The following applies to setting up Rocks so that Lustre is installed and ready on compute nodes. I also put Lustre on the frontend node, but that was done manually, simply by installing the rpms, and configuring grub.
"ParallelKnoppix is a modified Knoppix live CD designed for use in creating HPC clusters. You can start up PK on multiple nodes to run a cluster, and customize PK to add or remove applications. "
"OpenSSI is an open source single-system image clustering system. It allows a collection of computers to be treated as one large system, allowing applications running on any one machine access to the resources of all the machines in the cluster."
J. Ko, B. Stewart, D. Fox, K. Konolige, and B. Limketkai. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), 27-31 Oct. 2003.
D. Kannan and N. Mangalam. IRJCS: International Research Journal of Computer Science, Volume IV, Issue XII, pages 01-06, December 2017.
1. S. Berchtold, C. Böhm, and H.-P. Kriegel. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pages 142-153, Seattle, Washington, 1998.
2. S. Berchtold, D. A. Keim, and H.-P. Kriegel. The X-tree: An Index Structure for High-Dimensional Data. In Proceedings of the 22nd International Conference on Very Large Data Bases, VLDB '96, pages 28-39, Bombay, India, 1996.
3. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 322-331, Atlantic City, NJ, May 1990.
4. K. Chakrabarti and S. Mehrotra. The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. In Proceedings of the 15th International Conference on Data Engineering, pages 440-447, Sydney, Australia, March 1999.
5. S. Guha, R. Rastogi, and K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 73-84, Seattle, WA, 1998.
6. R. Kurniawati, J. S. Jin, and J. A. Shepherd. The SS+-tree: An Improved Index Structure for Similarity Searches in a High-Dimensional Feature Space. In Proceedings of SPIE Storage and Retrieval for Image and Video Databases, pages 13-24, February 1997.
7. N. Katayama and S. Satoh. The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pages 369-380, Tucson, Arizona, 1997.
8. J. T. Robinson. The K-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 10-18, Ann Arbor, MI, April 1981.
9. D. A. White and R. Jain. Similarity Indexing with the SS-tree. In Proceedings of the 12th International Conference on Data Engineering, pages 516-523, New Orleans, Louisiana, February 1996.
10. D. Yu, S. Chatterjee, G. Sheikholeslami, and A. Zhang. Efficiently Detecting Arbitrary Shaped Clusters in Very Large Datasets with High Dimensions. Technical Report 98-8, State University of New York at Buffalo, Department of Computer Science and Engineering, November 1998.
11. T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103-114, Montreal, Canada, 1996.
F. Perteneder, M. Bresler, E. Grossauer, J. Leong, C. Rendl, and M. Haller. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion, pages 81-85, San Francisco, California, USA. ACM, New York, NY, USA, 2016.
A. Mulia, R. Chandar, and B. Whitmore (2016). arXiv:1607.03577. Comment: 18 pages, 15 figures; accepted for publication in ApJ. Image quality on some figures has been degraded to comply with arXiv file limitations.