BrookGPU
Encyclopedia
BrookGPU is the Stanford University
Stanford University
The Leland Stanford Junior University, commonly referred to as Stanford University or Stanford, is a private research university on an campus located near Palo Alto, California. It is situated in the northwestern Santa Clara Valley on the San Francisco Peninsula, approximately northwest of San...

 graphics group's compiler and runtime implementation of the Brook stream programming
Stream processing
Stream processing is a computer programming paradigm, related to SIMD , that allows some applications to more easily exploit a limited form of parallel processing...

 language for using modern graphics hardware for non-graphical, general purpose computations
GPGPU
General-purpose computing on graphics processing units is the technique of using a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the CPU...

. It can be used to program highly parallel GPU
Graphics processing unit
A graphics processing unit or GPU is a specialized circuit designed to rapidly manipulate and alter memory in such a way so as to accelerate the building of images in a frame buffer intended for output to a display...

s such as those found on ATI
ATI Technologies
ATI Technologies Inc. was a semiconductor technology corporation based in Markham, Ontario, Canada, that specialized in the development of graphics processing units and chipsets. Founded in 1985 as Array Technologies Inc., the company was listed publicly in 1993 and was acquired by Advanced Micro...

 or Nvidia
NVIDIA
Nvidia is an American global technology company based in Santa Clara, California. Nvidia is best known for its graphics processors . Nvidia and chief rival AMD Graphics Techonologies have dominated the high performance GPU market, pushing other manufacturers to smaller, niche roles...

 graphics cards or increasingly Intel's integrated graphics solutions.

BrookGPU compiles programs written using the Brook stream programming language, which is a variant of ANSI C
ANSI C
ANSI C refers to the family of successive standards published by the American National Standards Institute for the C programming language. Software developers writing in C are encouraged to conform to the standards, as doing so aids portability between compilers.-History and outlook:The first...

. It can use OpenGL
OpenGL
OpenGL is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics. The interface consists of over 250 different function calls which can be used to draw complex three-dimensional scenes from simple primitives. OpenGL...

 v1.3+, DirectX
DirectX
Microsoft DirectX is a collection of application programming interfaces for handling tasks related to multimedia, especially game programming and video, on Microsoft platforms. Originally, the names of these APIs all began with Direct, such as Direct3D, DirectDraw, DirectMusic, DirectPlay,...

 v9+ or AMD's Close to Metal
Close to Metal
Close To Metal is the name of a beta version of a low-level programming interface developed by ATI , aimed at enabling GPGPU computing...

 for the computational backend and runs on both Microsoft Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

, Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

 and possibly Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

. It can also simulate a virtual graphics card by itself via a special CPU
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...

 backend which is useful for debugging Brook kernels.

Unlike most increasingly proprietary GPGPU frameworks currently available, Brook is licensed under the BSD license (parts are under the GNU General Public License
GNU General Public License
The GNU General Public License is the most widely used free software license, originally written by Richard Stallman for the GNU Project....

) and is free software. This makes it ideal for students interested in GPGPU programming without having to delve into OpenGL or DirectX implementation details.

Status

Brook has been in beta for a long time. The last major beta release (v0.4) was in October 2004 but renewed development began and stopped again in November 2007 with a v0.5 beta 1 release.

The new features of v0.5 include a much upgraded and faster OpenGL
OpenGL
OpenGL is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics. The interface consists of over 250 different function calls which can be used to draw complex three-dimensional scenes from simple primitives. OpenGL...

 backend which uses framebuffer objects instead of PBuffers and harmonised the code around standard OpenGL interfaces instead of using proprietary vendor extensions. GLSL
GLSL
OpenGL Shading Language , is a high-level shading language based on the syntax of the C programming language...

 support was added which brings all the functionality (complex branching and loops) previously only supported by DirectX 9 to OpenGL. In particular, this means that Brook is now just as capable on Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

 as Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

.

Other improvements in the v0.5 series include multi-backend usage whereby different threads can run different Brook programs concurrently (this allows a multi-GPU setup to be maxed out) and SSE
Streaming SIMD Extensions
In computing, Streaming SIMD Extensions is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series processors as a reply to AMD's 3DNow! . SSE contains 70 new instructions, most of which work on single precision floating point...

 and OpenMP
OpenMP
OpenMP is an API that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most processor architectures and operating systems, including Linux, Unix, AIX, Solaris, Mac OS X, and Microsoft Windows platforms...

 support for the CPU backend (this allows near maximal usage of modern CPUs).

Later versions might include Brook+, a version supporting integer and double precision processing for AMD GPUs, possibly with scatter support.

Performance Comparison

A like for like comparison between desktop CPUs and GPGPUs is problematic because of algorithmic & structural differences.

For example, a 2.66 GHz Intel Core 2 Duo can perform a maximum of 25 GFLOPs
FLOPS
In computing, FLOPS is a measure of a computer's performance, especially in fields of scientific calculations that make heavy use of floating-point calculations, similar to the older, simpler, instructions per second...

 (25 billion single-precision floating-point operations per second) if optimally using SSE and streaming memory access so the prefetcher works perfectly. However, traditionally (due to shader program length limits) most GPGPU kernels tend to perform relatively small amounts of work on large amounts of data in parallel, so the big problem with directly executing GPGPU algorithms on desktop CPUs is vastly lower memory bandwidth as generally speaking the CPU spends most of its time waiting on RAM
Random-access memory
Random access memory is a form of computer data storage. Today, it takes the form of integrated circuits that allow stored data to be accessed in any order with a worst case performance of constant time. Strictly speaking, modern types of DRAM are therefore not random access, as data is read in...

. As an example, dual-channel PC2-6400 DDR2 RAM can throughput about 11 Gb/sec which is around 1.5 GFLOPs maximum given that there is a total of 3 GFLOPs total bandwidth and one must both read and write. As a result, if memory bandwidth constrained, Brook's CPU backend won't exceed 2 GFLOPs. In practice, it's even lower than that most especially for anything other than float4 which is the only data type which can be SSE accelerated.

On an ATI HD 2900 XT
Radeon R600
The graphics processing unit codenamed the Radeon R600 is the foundation of the Radeon HD 2000/3000 series and the FireGL 2007 series video cards developed by ATI Technologies...

 (740 MHz core 1000 MHz memory), Brook can perform a maximum of 410 GFLOPs via its DirectX 9 backend. OpenGL is currently (due to driver and Cg compiler limitations) much less efficient as a GPGPU backend and Brook can only manage 210 GFLOPs via OpenGL. On paper, this looks like around twenty times faster than the CPU, but as just explained it isn't as easy as that. GPUs currently have major branch and read/write access penalties so expect a reasonable maximum of one third of the peak maximum in real world code - this still leaves that ATI card at around 125 GFLOPs some five times faster than the Intel Core 2 Duo.

However this discounts the important part of transferring the data to be processed to and from the GPU. With a PCI Express
PCI Express
PCI Express , officially abbreviated as PCIe, is a computer expansion card standard designed to replace the older PCI, PCI-X, and AGP bus standards...

 1.0 x8 interface, the memory of an ATI HD 2900 XT can be written to at about 730 Mb/sec and read from at about 311 Mb/sec which is significantly slower than normal PC memory. For large datasets, this can greatly diminish the speed increase of using a GPU over a well-tuned CPU implementation. Of course, as GPUs become faster far more quickly than CPUs and the PCI Express interface improves, it will make more sense to offload large processing to GPUs.

See also

  • GPGPU
    GPGPU
    General-purpose computing on graphics processing units is the technique of using a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the CPU...

  • CUDA
    CUDA
    CUDA or Compute Unified Device Architecture is a parallel computing architecture developed by Nvidia. CUDA is the computing engine in Nvidia graphics processing units that is accessible to software developers through variants of industry standard programming languages...

  • Close to Metal
    Close to Metal
    Close To Metal is the name of a beta version of a low-level programming interface developed by ATI , aimed at enabling GPGPU computing...

  • OpenCL
    OpenCL
    OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language for writing kernels , plus APIs that are used to define and then control the platforms...

  • Lib Sh
    Lib Sh
    Sh is a metaprogramming language for programmable GPUs. Programmable GPUs are graphics processing units that execute some operations with higher efficiency than CPUs...

  • Intel Ct
    Intel Ct
    Intel Ct is a programming model developed by Intel to ease the exploitation of its future multicore chips, as demonstrated by the Tera-Scale research program.It is based on the exploitation of SIMD to produce automatically parallelized programs....


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK