Kentucky Linux Athlon Testbed
The Kentucky Linux Athlon Testbed (KLAT2) is a 64+2 node Beowulf cluster built by the University of Kentucky College of Engineering in 2000. The cluster was built entirely from off-the-shelf components. It was capable of over 64 GFLOPS using a 32-bit version of ScaLAPACK and approximately 22.8 GFLOPS using the standard, untuned 80/64-bit version. These figures are measured performance; the theoretical maxima are calculated at 179 and 89 GFLOPS for the 32-bit and 80/64-bit versions, respectively. At a total cost of $41,205 USD, it was one of the first two supercomputers to bring the cost of supercomputing below $1,000 USD per GFLOPS.
Specifications
The entire cluster was based on readily available off-the-shelf hardware. After testing which hardware would be most effective, the Aggregate (the University of Kentucky research group responsible for the project) chose 700 MHz AMD Athlon processors. The decision was made because the 3DNow! instruction set (AMD's answer to Intel's MMX technology) allowed for better processor performance in high-end mathematical computing.
The cluster contained 64 primary systems with 2 "hot spare" nodes, all of which contained this basic hardware:
o One 700MHz AMD Athlon Slot A module and dual-fan heat sink
o 128MB CAS2 PC100 SDRAM
o FIC SD11 motherboard
o Four RealTek-based Fast Ethernet NICs
o Floppy drive (for net boot code)
o 300W power supply and mid-tower case with extra fan
After some performance testing, the Aggregate determined that, because of cost concerns, the best way to cluster the computers was to build a Flat Neighborhood Network instead of using Gigabit Ethernet. The four Fast Ethernet cards in each node delivered performance similar to Gigabit Ethernet once overhead was taken into account.
The cluster also required ten 32-port switches (one of which was an uplink) and over 264 CAT5 cables to connect all of the systems. All of the systems ran Red Hat Linux 6.0, with an updated kernel to support the Message Passing Interface.
Costs
Although the AMD Athlon CPUs used for this project were donated by AMD, the Aggregate compiled a list of the costs incurred in purchasing all the parts and added a market-value cost for the donated CPUs. Overall, the project cost approximately $41,205, with the primary costs being roughly $13,200 in processors, $8,100 in the network, $6,900 in motherboards, and $6,200 in memory.
The KLAT2 project was remarkable because it was one of the first two supercomputers or clustered computer networks to bring the cost of processing under $1,000 USD per GFLOPS. Although the exact timetables are not clear, KLAT2 and Bunyip (the Beowulf cluster created by the Australian National University in Canberra) were built and brought online at roughly the same time. While Bunyip was the first to officially pass the mark, KLAT2's performance was not officially measured until well after it had passed the $1,000/GFLOPS mark. Since Bunyip beat the $1,000/GFLOPS mark by only approximately 2%, strong fluctuations in the exchange rate between US and Australian dollars can temporarily push it out of contention. Also, the KLAT2 project used standard benchmarking software, while Bunyip used a customized version specifically tuned to its hardware. Given all the caveats and disclaimers, the two supercomputers essentially share credit for breaking the $1,000 USD/GFLOPS mark.
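The cost-per-GFLOPS claim can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this article; treating "over 64 GFLOPS" as exactly 64 is an assumption that makes the result an upper bound on the cost per GFLOPS, and the function names are illustrative.

```python
# Figures quoted in the article.
TOTAL_COST_USD = 41_205
MEASURED_GFLOPS = {"32-bit": 64.0, "80/64-bit": 22.8}  # "over 64" taken as 64 (assumption)
PEAK_GFLOPS = {"32-bit": 179.0, "80/64-bit": 89.0}


def cost_per_gflops(cost_usd, gflops):
    """Dollars spent per GFLOPS of measured performance."""
    return cost_usd / gflops


def efficiency(measured, peak):
    """Fraction of the theoretical maximum that was actually achieved."""
    return measured / peak


for version, measured in MEASURED_GFLOPS.items():
    usd = cost_per_gflops(TOTAL_COST_USD, measured)
    eff = efficiency(measured, PEAK_GFLOPS[version])
    print(f"{version}: ${usd:,.2f} per GFLOPS, {eff:.1%} of theoretical peak")
```

With these inputs, only the 32-bit figure falls below the $1,000 per GFLOPS line; the 80/64-bit figure works out to well above it.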
Flat Neighborhood Network
Because of the large amount of network traffic passed between the computers using the Message Passing Interface, it was important to choose an appropriate network topology to connect the machines. This meant creating a mesh network in which, although not every machine could connect directly to every other machine, each machine could reach every other machine through a switch the two had in common.
The Flat Neighborhood Network design is incredibly complex. Only small design problems can be tackled by hand, because of the scaling complexity of combining multiple network cards per machine with large numbers of machines and switches. With 66 machines each having 4 network cards, the KLAT2 network had 264 network cards with which to make single-hop paths between any two given computers. The network also had to be optimized for the specific network traffic it would carry.
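The defining property described above, that every pair of machines shares at least one switch, is easy to state as a check even though finding a good wiring is hard. The following is a minimal sketch on a hypothetical toy wiring (6 nodes, 2 NICs each, three 4-port switches); it is not KLAT2's actual layout, and all names in it are illustrative.

```python
from itertools import combinations

# Hypothetical toy wiring, not KLAT2's actual layout.
# Maps node id -> set of switches that node's NICs plug into.
TOY_WIRING = {
    0: {"S0", "S1"},
    1: {"S0", "S1"},
    2: {"S0", "S2"},
    3: {"S0", "S2"},
    4: {"S1", "S2"},
    5: {"S1", "S2"},
}


def uncovered_pairs(wiring):
    """Return the node pairs that have no switch in common."""
    return [
        (a, b)
        for a, b in combinations(sorted(wiring), 2)
        if not wiring[a] & wiring[b]
    ]


def ports_per_switch(wiring):
    """Count how many node links land on each switch."""
    counts = {}
    for switches in wiring.values():
        for switch in switches:
            counts[switch] = counts.get(switch, 0) + 1
    return counts


def is_flat_neighborhood(wiring, nics_per_node, switch_ports):
    """True if NIC and port limits hold and every pair shares a switch."""
    nics_ok = all(len(s) <= nics_per_node for s in wiring.values())
    ports_ok = all(n <= switch_ports for n in ports_per_switch(wiring).values())
    return nics_ok and ports_ok and not uncovered_pairs(wiring)


print(is_flat_neighborhood(TOY_WIRING, nics_per_node=2, switch_ports=4))

# Scale of the KLAT2 design problem, using the figures in the text.
nodes, nics = 66, 4
print(nodes * nics)              # 264 network cards
print(nodes * (nodes - 1) // 2)  # 2145 node pairs, each needing a shared switch
```

Checking a candidate wiring is cheap; the difficulty lies in searching the space of wirings that satisfy all 2,145 pair constraints within the port limits, which is why only small designs can be done by hand.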
On top of the design issues, Flat Neighborhood Networks pose several wiring problems. The designs often have no symmetry in their wiring schemes, which places certain requirements on site design for the FNN to work. Routing also needed attention. Because the switch shared by two computers differs from pair to pair, asking Computer X for the IP address of Computer Y can yield a different result than asking Computer Z for the IP address of Computer Y. Finally, the standard Linux channel bonding features typically used for clustered computing do not work with FNN topologies.
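The addressing problem can be illustrated with a small sketch. It assumes a hypothetical scheme in which each switch is its own subnet and each NIC gets an address on the subnet of the switch it plugs into; the wiring, the `10.0.x.y` addresses, and the function names are all invented for illustration and do not describe how KLAT2 was actually configured.

```python
# Hypothetical toy wiring (not KLAT2's): node id -> switch numbers it plugs into.
WIRING = {0: [0, 1], 1: [0, 1], 2: [0, 2], 3: [0, 2], 4: [1, 2], 5: [1, 2]}


def nic_address(node, switch):
    """Hypothetical scheme: one subnet per switch, host part = node id + 1."""
    return f"10.0.{switch}.{node + 1}"


def address_of(target, asker, wiring=WIRING):
    """Address the asker should use to reach target over a shared switch."""
    shared = sorted(set(wiring[asker]) & set(wiring[target]))
    if not shared:
        raise ValueError(f"nodes {asker} and {target} share no switch")
    # Pick the lowest-numbered shared switch; any shared switch would do.
    return nic_address(target, shared[0])


print(address_of(2, asker=0))  # 10.0.0.3 (via switch 0)
print(address_of(2, asker=4))  # 10.0.2.3 (via switch 2)
```

Node 2 is reached at two different addresses depending on who is asking, which is why each machine needs its own view of the address table rather than one shared table.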
The upside of a Flat Neighborhood Network, despite the difficulty of setting one up, is that, given the right configuration and components, it is much more cost-effective. The hardware for the KLAT2 network cost approximately $8,100 USD and provided a bisection bandwidth of about 25 Gbit/s. This was far more effective than going with Gigabit Ethernet (which, at the time, was also far more expensive), and it provided performance similar to channel bonding for a fraction of the cost.
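As a rough sanity check on the bandwidth figure, the sketch below assumes each Fast Ethernet NIC contributes its nominal 100 Mbit/s and counts only the 64 primary nodes; both are assumptions rather than statements from the source. Under them, the total link rate is 25.6 Gbit/s, which lines up with the quoted figure of about 25 when it is read in gigabits per second.

```python
PRIMARY_NODES = 64         # hot spares excluded (assumption)
TOTAL_NODES = 66
NICS_PER_NODE = 4
FAST_ETHERNET_MBIT = 100   # nominal Fast Ethernet rate per NIC
NETWORK_COST_USD = 8_100   # network hardware cost quoted in the text


def aggregate_gbit(nodes, nics_per_node, mbit_per_nic):
    """Sum of nominal NIC rates across the cluster, in Gbit/s."""
    return nodes * nics_per_node * mbit_per_nic / 1000


def cost_per_link(cost_usd, nodes, nics_per_node):
    """Network cost (switches and cabling included) spread over every NIC link."""
    return cost_usd / (nodes * nics_per_node)


print(aggregate_gbit(PRIMARY_NODES, NICS_PER_NODE, FAST_ETHERNET_MBIT))
print(round(cost_per_link(NETWORK_COST_USD, TOTAL_NODES, NICS_PER_NODE), 2))
```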
External links
The Aggregate: KLAT2
The Aggregate: Flat Neighborhood Networks