TORQUE Resource Manager
The TORQUE Resource Manager is a distributed resource manager providing control over batch jobs and distributed compute nodes. Its name stands for Terascale Open-Source Resource and QUEue Manager.
Cluster Resources, Inc. describes it as open-source, while Debian classifies it as non-free software. It is a community effort based on the original PBS project and, with more than 1,200 patches, has incorporated significant advances in scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the US DOE, Sandia, PNNL, UB, TeraGrid, and many other leading-edge HPC organizations.
TORQUE can integrate with the non-commercial Maui Cluster Scheduler or the commercial Moab Workload Manager to improve overall utilization, scheduling and administration on a cluster. TORQUE is distributed under the OpenPBS version 2.3 license; its developers describe it as open-source software, whereas it counts as non-free software under the Debian Free Software Guidelines.
Feature Set
TORQUE provides enhancements over standard OpenPBS in the following areas:
- Fault Tolerance
  - Additional failure conditions checked/handled
  - Node health check script support
- Scheduling Interface
  - Extended query interface providing the scheduler with additional and more accurate information
  - Extended control interface allowing the scheduler increased control over job behavior and attributes
  - Allows the collection of statistics for completed jobs
- Scalability
  - Significantly improved server-to-MOM communication model
  - Ability to handle larger clusters (over 15 TF/2,500 processors)
  - Ability to handle larger jobs (over 2,000 processors)
  - Ability to support larger server messages
- Usability
  - Extensive logging additions
  - More human-readable logging (i.e. no more 'error 15038 on command 42')
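
To make the node health check item in the list above more concrete, the following is a minimal sketch of what such a script might look like. It rests on assumptions not stated in this article: that the script is registered in the MOM configuration (for example through a `$node_check_script` entry) and that it signals a problem by writing a line beginning with `ERROR` to standard output. Both should be verified against the documentation for the installed TORQUE version. The scratch path, the 5% threshold, and the function name `check_free_space` are hypothetical and chosen only for illustration.

```python
import shutil

# Hypothetical values, for illustration only.
MIN_FREE_FRACTION = 0.05   # flag the node when under 5% of space is free
SCRATCH_PATH = "/tmp"      # filesystem that jobs are assumed to write to


def check_free_space(total_bytes, free_bytes, min_fraction=MIN_FREE_FRACTION):
    """Return an 'ERROR ...' message if free space is too low, else None."""
    if total_bytes <= 0:
        return "ERROR could not determine size of scratch filesystem"
    fraction = free_bytes / total_bytes
    if fraction < min_fraction:
        return f"ERROR scratch filesystem low on space ({fraction:.1%} free)"
    return None


def main():
    try:
        usage = shutil.disk_usage(SCRATCH_PATH)
    except OSError as exc:
        print(f"ERROR cannot inspect {SCRATCH_PATH}: {exc}")
        return
    message = check_free_space(usage.total, usage.free)
    if message is not None:
        # Assumption: output starting with "ERROR" marks the check as failed.
        print(message)


if __name__ == "__main__":
    main()
```

A healthy node produces no output, so the script stays silent unless the check fails. Keeping the decision in a pure function separate from the filesystem query makes the threshold logic easy to test without touching a real node.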