QCDOC
The QCDOC (Quantum ChromoDynamics On a Chip) is a supercomputer technology focusing on using relatively cheap, low-power processing elements to produce a massively parallel machine. As the name suggests, the machine is custom made to solve small but extremely demanding problems in the field of quantum physics.

Overview

The computers were designed and built jointly by the University of Edinburgh (UKQCD), Columbia University, the RIKEN BNL Brookhaven Research Center and IBM. The purpose of the collaboration was to exploit computing facilities for lattice field theory calculations whose primary aim is to increase the predictive power of the Standard Model of elementary particle interactions through numerical simulation of quantum chromodynamics (QCD). The target was to build a massively parallel supercomputer able to peak at 10 Tflops, with sustained performance at 50% of that peak.

There are three QCDOCs in service, each reaching 10 Tflops peak operation.
  • University of Edinburgh's Parallel Computing Centre (EPCC). In operation by the UKQCD since 2005
  • RIKEN BNL Brookhaven Research Center at Brookhaven National Laboratory
  • U.S. Department of Energy Program in High Energy and Nuclear Physics at Brookhaven National Laboratory

Around 23 UK academic staff, their postdocs and students, from seven universities, belong to UKQCD. Costs were funded through a Joint Infrastructure Fund Award of £6.6 million. Staff costs (system support, physicist programmers and postdocs) are around £1 million per year; other computing and operating costs are around £0.2 million per year. http://www.scitech.ac.uk/roadmap/rmProject.aspx?q=82

QCDOC was to replace an earlier design, QCDSP, where the power came from connecting large numbers of DSPs together in a similar fashion. The QCDSP connected 12,288 nodes in a 4D network and reached 1 Tflops in 1998.

QCDOC can be seen as a predecessor to the highly successful Blue Gene/L supercomputer. They share many design traits, and the similarities go beyond superficial characteristics. Blue Gene is also a massively parallel supercomputer built with a large number of cheap, relatively weak PowerPC 440 based SoC nodes connected with a high bandwidth multidimensional mesh. They differ, however, in that the computing nodes in BG/L are more powerful and are connected with a faster, more sophisticated network that scales up to several hundred thousand nodes per system.

Computing node

The computing nodes are custom ASICs with about fifty million transistors each. They are mainly made up of existing building blocks from IBM. They are built around a 500 MHz PowerPC 440 core with 4 MB of on-chip DRAM, memory management for external DDR SDRAM, system I/O for internode communications, and dual Ethernet built in. The computing node is capable of 1 Gflops in double precision. Each node has one DIMM socket capable of holding between 128 and 2048 MB of 333 MHz ECC DDR SDRAM.

Internode communication

Each node can send data to and receive data from each of its twelve nearest neighbors in a six-dimensional mesh at a rate of 500 Mbit/s per channel. This provides a total off-node bandwidth of 12 Gbit/s. Each of these 24 channels has DMA to the other nodes' on-chip DRAM or the external SDRAM. In practice only four dimensions will be used to form a communications sub-torus, while the remaining two dimensions will be used to partition the system.
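
The neighbor count and bandwidth figures above can be checked with a short sketch. It assumes periodic (torus) boundaries in all six dimensions and uses a hypothetical machine shape of 4 nodes per dimension; the function name `torus_neighbors` and the shape are illustrative choices, not details of the actual QCDOC configuration.

```python
def torus_neighbors(coord, shape):
    """Return the nearest neighbors of `coord` on a periodic mesh of the given shape."""
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, +1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around at the mesh edges
            neighbors.append(tuple(n))
    return neighbors


LINK_RATE_MBIT = 500          # per-channel rate quoted in the text
shape = (4, 4, 4, 4, 4, 4)    # hypothetical extents, for illustration only

neighbors = torus_neighbors((0, 0, 0, 0, 0, 0), shape)

# One send channel and one receive channel per neighbor.
channels = 2 * len(neighbors)
total_gbit = channels * LINK_RATE_MBIT / 1000

print(len(neighbors), channels, total_gbit)
```

Two neighbors per dimension across six dimensions gives the twelve neighbors, and one send plus one receive channel per neighbor gives the 24 channels whose combined rate matches the quoted 12 Gbit/s.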

The operating system communicates with the computing nodes using the Ethernet network. This is also used for diagnostics, configuration and communications with disk storage.

Mechanical design

Two nodes are placed together on a daughter card with one DIMM socket and a 4:1 Ethernet hub for off-card communications. The daughter cards have two connectors, one carrying the internode communications network and one carrying power, Ethernet, clock and other housekeeping facilities.

Thirty-two daughter cards are placed in two rows on a motherboard that supports 800 Mbit/s off-board Ethernet communications. Eight motherboards are placed in a crate with two backplanes supporting four motherboards each. Each crate consists of 512 processor nodes and a 2^6 hypercube communications network. One node consumes about 5 W of power, and each crate is air and water cooled. A complete system can consist of any number of crates, for a total of up to several tens of thousands of nodes.
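
As a rough check of the packaging arithmetic, the sketch below multiplies out the figures quoted in this article. It reads the hypercube figure as 2^6 = 64, which matches the node count of one motherboard; that reading is an assumption. The power estimate counts only the nodes themselves (at the quoted "about 5 W"), and the crate count for a 10 Tflops machine assumes the quoted 1 Gflops peak per node.

```python
import math

NODES_PER_DAUGHTER_CARD = 2
DAUGHTER_CARDS_PER_MOTHERBOARD = 32
MOTHERBOARDS_PER_CRATE = 8
WATTS_PER_NODE = 5            # "about 5 W" per node, nodes only
GFLOPS_PER_NODE = 1           # peak double precision per node
TARGET_PEAK_TFLOPS = 10       # design target quoted in the overview

nodes_per_motherboard = NODES_PER_DAUGHTER_CARD * DAUGHTER_CARDS_PER_MOTHERBOARD
nodes_per_crate = nodes_per_motherboard * MOTHERBOARDS_PER_CRATE
crate_node_power_w = nodes_per_crate * WATTS_PER_NODE

# Nodes and whole crates needed to reach the peak target.
nodes_for_target = TARGET_PEAK_TFLOPS * 1000 // GFLOPS_PER_NODE
crates_for_target = math.ceil(nodes_for_target / nodes_per_crate)

print(nodes_per_motherboard, nodes_per_crate, crate_node_power_w)
print(nodes_for_target, crates_for_target)
```

Under these assumptions a motherboard holds 64 nodes, a crate holds 512 nodes drawing roughly 2.5 kW for the nodes alone, and a 10 Tflops peak machine needs on the order of 10,000 nodes, or about 20 crates.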

Operating system

The QCDOC runs a custom-built operating system, QOS, which facilitates boot, runtime, monitoring, diagnostics, and performance and simplifies management of the large number of computing nodes. It uses a custom embedded kernel and provides single-process POSIX ("unix-like") compatibility using the Cygnus newlib library. The kernel includes a specially written UDP/IP stack and NFS client for disk access.

The operating system also maintains system partitions so several users can have access to separate parts of the system for different applications. Each partition will only run one client application at any given time. Any multitasking is scheduled by the host controller system, which is a regular computer with a large number of Ethernet ports connecting to the QCDOC.
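
The partition model can be illustrated with a minimal sketch: each partition holds at most one running application, and a host-side scheduler queues anything submitted while the partition is busy. The class and method names (`Partition`, `HostController`, `submit`, `finish`) are hypothetical and do not reflect the actual QOS interfaces, and the first-come, first-served queue is an assumed policy.

```python
from collections import deque


class Partition:
    """A group of nodes that runs at most one client application at a time."""

    def __init__(self, name):
        self.name = name
        self.running = None      # application currently occupying the partition
        self.waiting = deque()   # applications queued by the host controller


class HostController:
    """Hypothetical stand-in for the host computer that schedules the work."""

    def __init__(self, partition_names):
        self.partitions = {name: Partition(name) for name in partition_names}

    def submit(self, partition_name, application):
        """Start the application if the partition is idle, otherwise queue it."""
        partition = self.partitions[partition_name]
        if partition.running is None:
            partition.running = application
            return True
        partition.waiting.append(application)
        return False

    def finish(self, partition_name):
        """Mark the current application done and start the next queued one, if any."""
        partition = self.partitions[partition_name]
        partition.running = partition.waiting.popleft() if partition.waiting else None
        return partition.running


host = HostController(["A", "B"])
started_first = host.submit("A", "app-1")    # partition A is idle, so it starts
started_second = host.submit("A", "app-2")   # partition A is busy, so it is queued
started_other = host.submit("B", "app-3")    # partition B is independent of A
next_on_a = host.finish("A")                 # app-1 ends, the queued app takes over
```

Keeping the queue on the host side mirrors the description above, in which the nodes themselves run a single process and any multitasking is handled by the host controller.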

See also

  • Norman Christ
  • PowerPC 440
  • BlueGene/L
  • QPACE
  • Power Architecture
  • Supercomputer

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 