Quasi-opportunistic supercomputing
Encyclopedia
Quasi-opportunistic supercomputing is a computational paradigm for supercomputing on a large number of geographically disperse computers
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...

. Quasi-opportunistic supercomputing aims to provide a higher quality of service than opportunistic resource sharing
Grid computing
Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files...

.

The quasi-opportunistic approach coordinates computers which are often under different ownerships to achieve reliable and fault-tolerant high performance with more control than opportunistic computer grids
Grid computing
Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files...

 in which computational resources are used whenever they may become available.

While the "opportunistic match-making" approach to task scheduling on computer grids is simpler in that it merely matches tasks to whatever resources may be available at a given time, demanding supercomputer applications such as weather simulations or computational fluid dynamics
Computational fluid dynamics
Computational fluid dynamics, usually abbreviated as CFD, is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows. Computers are used to perform the calculations required to simulate the interaction of liquids and gases with...

 have remained out of reach, partly due to the barriers in reliable sub-assignment of a large number of tasks as well as the reliable availability of resources at a given time.

The quasi-opportunistic approach enables the execution of demanding applications within computer grids by establishing grid-wise resource allocation agreements; and fault tolerant
Fault-tolerant system
Fault-tolerance or graceful degradation is the property that enables a system to continue operating properly in the event of the failure of some of its components. A newer approach is progressive enhancement...

 message passing to abstractly shield against the failures of the underlying resources, thus maintaining some opportunism, while allowing a higher level of control.

Opportunistic supercomputing on grids

The general principle of grid computing
Grid computing
Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files...

 is to use distributed computing resources from diverse administrative domains to solve a single task, by using resources as they become available. Traditionally, most grid systems have approached the task scheduling challenge by using an "opportunistic match-making" approach in which tasks are matched to whatever resources may be available at a given time.
BOINC, developed at the University of California, Berkeley
University of California, Berkeley
The University of California, Berkeley , is a teaching and research university established in 1868 and located in Berkeley, California, USA...

 is an example of a volunteer-based
Volunteer computing
Volunteer computing is a type of distributed computing in which computer owners donate their computing resources to one or more "projects".-History:...

, opportunistic grid computing system. The applications based on the BOINC grid have reached multi-petaflop levels by using close to half a million computers connected on the internet, whenever volunteer resources become available. Another system, Folding@home
Folding@home
Folding@home is a distributed computing project designed to use spare processing power on personal computers to perform simulations of disease-relevant protein folding and other molecular dynamics, and to improve on the methods of doing so...

, which is not based on BOINC, computes protein folding
Protein folding
Protein folding is the process by which a protein structure assumes its functional shape or conformation. It is the physical process by which a polypeptide folds into its characteristic and functional three-dimensional structure from random coil....

, has reached 8..8 petaflops by using clients that include GPU and PlayStation 3
PlayStation 3
The is the third home video game console produced by Sony Computer Entertainment and the successor to the PlayStation 2 as part of the PlayStation series. The PlayStation 3 competes with Microsoft's Xbox 360 and Nintendo's Wii as part of the seventh generation of video game consoles...

 systems. However, these results are not applicable to the TOP500
TOP500
The TOP500 project ranks and details the 500 most powerful known computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year...

 ratings because they do not run the general purpose Linpack
LINPACK
LINPACK is a software library for performing numerical linear algebra on digital computers. It was written in Fortran by Jack Dongarra, Jim Bunch, Cleve Moler, and Gilbert Stewart, and was intended for use on supercomputers in the 1970s and early 1980s...

 benchmark.

A key strategy for grid computing is the use of middleware
Middleware
Middleware is computer software that connects software components or people and their applications. The software consists of a set of services that allows multiple processes running on one or more machines to interact...

 that partitions pieces of a program among the different computers on the network. Although general grid computing
Grid computing
Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files...

 has had success in parallel task execution, demanding supercomputer applications such as weather simulations or computational fluid dynamics
Computational fluid dynamics
Computational fluid dynamics, usually abbreviated as CFD, is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows. Computers are used to perform the calculations required to simulate the interaction of liquids and gases with...

 have remained out of reach, partly due to the barriers in reliable sub-assignment of a large number of tasks as well as the reliable availability of resources at a given time.

The opportunistic Internet PrimeNet Server supports GIMPS
Great Internet Mersenne Prime Search
The Great Internet Mersenne Prime Search is a collaborative project of volunteers who use freely available computer software to search for Mersenne prime numbers. The project was founded by George Woltman, who also wrote the software Prime95 and MPrime for the project...

, one of the earliest grid computing projects since 1997, researching Mersenne prime
Mersenne prime
In mathematics, a Mersenne number, named after Marin Mersenne , is a positive integer that is one less than a power of two: M_p=2^p-1.\,...

 numbers. , GIMPS's distributed research currently achieves about 60 teraflops as an volunteer-based computing project. The use of computing resources on "volunteer grids
Volunteer computing
Volunteer computing is a type of distributed computing in which computer owners donate their computing resources to one or more "projects".-History:...

" such as GIMPS is usually purely opportunistic: geographically disperse distributively owned computers are contributing whenever they become available, with no preset commitments that any resources will be available at any given time. Hence, hypothetically, if many of the volunteers unwittingly decide to switch their computers off on a certain day, grid resources will become significantly reduced. Furthermore, users will find it exceedingly costly to organize a very large number of opportunistic computing resources in a manner that can achieve resonable high performance computing.

Quasi-control of computational resources

An example of a more structured grid for high performance computing is DEISA
DEISA
The Distributed European Infrastructure for Supercomputing Applications is a European Union supercomputer project. It is made up of a consortium of eleven leading national supercomputing centres from seven European countries...

, a supercomputer project organized by the European Community which uses computers in seven European countries. Although different parts of a program executing within DEISA may be running on computers located in different countries under different ownerships and administrations, there is more control and coordination than with a purely opportunistic approach. DEISA has a two level integration scheme: the "inner level" consists of a number of strongly connected high performance computer clusters that share similar operating systems and scheduling mechanisms and provide a homogeneous computing environment; while the "outer level" consists of heterogeneous systems that have supercomputing capabilities. Thus DEISA can provide somewhat controlled, yet dispersed high performance computing services to users.

The quasi-opportunistic paradigm aims to overcome this by achieving more control over the assignment of tasks to distributed resources and the use of pre-negotiated scenarios for the availability of systems within the network. Quasi-opportunistic distributed execution of demanding parallel computing software in grids focuses on the implementation of grid-wise allocation agreements, co-allocation subsystems, communication topology-aware allocation mechanisms, fault tolerant message passing libraries and data pre-conditioning. In this approach, fault tolerant message passing is essential to abstractly shield against the failures of the underlying resources.

The quasi-opportunistic approach goes beyond volunteer computing
Volunteer computing
Volunteer computing is a type of distributed computing in which computer owners donate their computing resources to one or more "projects".-History:...

 on a highly distributed systems such as BOINC, or general grid computing
Grid computing
Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files...

 on a system such as Globus
Globus Toolkit
The Globus Toolkit, currently at version 5, is an open source toolkit for building computing grids developed and provided by the Globus Alliance.-Standards implementation:The Globus Toolkit is an implementation of the following standards:...

 by allowing the middleware
Middleware
Middleware is computer software that connects software components or people and their applications. The software consists of a set of services that allows multiple processes running on one or more machines to interact...

 to provide almost seamless access to many computing clusters so that existing programs in languages such as Fortran or C can be distributed among multiple computing resources.

A key component of the quasi-opportunistic approach, as in the Qoscos Grid
Qoscos Grid
The Qoscos Grid is a quasi-opportunistic supercomputing system using grid computing.QosCos Grid acts as middleware resource management facilities which provide end-users with supercomputer-like performance by connecting many computing clusters together...

, is an economic-based resource allocation model in which resources are provided based on agreements among specific supercomputer administration sites. Unlike volunteer systems that rely on altruism, specific contractual terms are stipulated for the performance of specific types of tasks. However, "tit-for-tat" paradigms in which computations are paid back via future computations is not suitable for supercomputing applications, and is avoided.

The other key component of the quasi-opportunistic approach is a reliable message passing
Message passing
Message passing in computer science is a form of communication used in parallel computing, object-oriented programming, and interprocess communication. In this model, processes or objects can send and receive messages to other processes...

 system to provide distributed checkpoint restart
Application checkpointing
Checkpointing is a technique for inserting fault tolerance into computing systems. It basically consists of storing a snapshot of the current application state, and later on, use it for restarting the execution in case of failure.- Technique properties :...

mechanisms when computer hardware or networks inevitably experience failures. In this way, if some part of a large computation fails, the entire run need not be abandoned, but can restart from the last saved checkpoint.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK