From Thread to Transcontinental Computer: Disturbing Lessons in
Distributed Supercomputing
D. Groen and S. Portegies Zwart (2015). arXiv:1507.01138. Comment: Accepted for publication in IEEE conference on ERRORs.
Abstract
We describe the political and technical complications encountered during the
astronomical CosmoGrid project. CosmoGrid is a numerical study of the formation
of large-scale structure in the universe. The simulations are challenging due
to the enormous dynamic range in spatial and temporal coordinates, as well as
the enormous computational resources required. In CosmoGrid we dealt with the
computational requirements by connecting up to four supercomputers via an
optical network and making them operate as a single machine. This was
challenging, if only because the supercomputers of our choice are separated by
half the planet: three of them are scattered across Europe and the fourth is in
Tokyo. The co-scheduling of multiple computers and the 'gridification' of the
code enabled us to achieve an efficiency of up to $93\%$ for this distributed
intercontinental supercomputer. In this work, we find that high-performance
computing on a grid can be done much more effectively if the sites involved are
willing to be flexible about their user policies, and that having facilities to
provide such flexibility could be key to strengthening the position of the HPC
community in an increasingly Cloud-dominated computing landscape. Given that
smaller computer clusters owned by research groups or university departments
usually have flexible user policies, we argue that it could be easier to
instead realize distributed supercomputing by combining tens, hundreds or even
thousands of these resources.
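The quoted $93\%$ is a parallel efficiency. The abstract does not spell out the definition, but a common convention, which we assume here, is
$E = T_{\mathrm{ideal}} / T_{\mathrm{measured}}$,
where $T_{\mathrm{ideal}}$ is the wall-clock time expected under perfect scaling on the combined resources and $T_{\mathrm{measured}}$ is the time actually observed. Under that reading, $E = 93\%$ means roughly $7\%$ of the combined capacity is lost to wide-area communication and coordination overhead.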
Description
[1507.01138] From Thread to Transcontinental Computer: Disturbing Lessons in Distributed Supercomputing
@misc{groen2015thread,
author = {Groen, Derek and Portegies Zwart, Simon},
keywords = {distributed supercomputing},
note = {arXiv:1507.01138. Accepted for publication in IEEE conference on ERRORs},
title = {From Thread to Transcontinental Computer: Disturbing Lessons in
Distributed Supercomputing},
url = {http://arxiv.org/abs/1507.01138},
year = 2015
}
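A minimal LaTeX usage sketch, assuming the entry above is saved in a file named refs.bib (the filename is our choice; the citation key groen2015thread comes from the entry itself):

\documentclass{article}
\begin{document}
% Cite the entry by its BibTeX key.
Lessons from distributed supercomputing are discussed by Groen and
Portegies Zwart~\cite{groen2015thread}.
\bibliographystyle{plain}  % any standard bibliography style works here
\bibliography{refs}        % refs.bib holds the @misc entry above
\end{document}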