GPGPU
General-purpose computing on graphics processing units (GPGPU, also referred to as GPGP and less often GP²U) is the technique of using a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the CPU. It is made possible by the addition of programmable stages and higher-precision arithmetic to the rendering pipelines, which allows programmers to use stream processing on non-graphics data. Additionally, the use of multiple graphics cards in a single computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.
GPU improvements
GPU functionality has traditionally been very limited. In fact, for many years the GPU was only used to accelerate certain parts of the graphics pipeline. Some improvements were needed before GPGPU became feasible.
Programmability
Programmable vertex and fragment shaders were added to the graphics pipeline to enable game programmers to generate even more realistic effects. Vertex shaders allow the programmer to alter per-vertex attributes, such as position, color, texture coordinates, and normal vector. Fragment shaders are used to calculate the color of a fragment, i.e. per-pixel. Programmable fragment shaders allow the programmer to substitute, for example, a lighting model other than those provided by default by the graphics card, typically simple Gouraud shading. Shaders have enabled graphics programmers to create lens effects, displacement mapping, and depth of field.
The programmability of the pipelines has trended according to Microsoft's DirectX specification, with DirectX 8 introducing Shader Model 1.1, DirectX 8.1 introducing Pixel Shader Models 1.2, 1.3, and 1.4, and DirectX 9 defining Shader Model 2.x and 3.0. Each shader model increased the flexibility and capability of the programming model, ensuring that conforming hardware follows suit. The DirectX 10 specification introduces Shader Model 4.0, which unifies the programming specification for vertex, geometry ("geometry shaders" are new to DirectX 10), and fragment processing, allowing for a better fit for unified shader hardware and thus providing a single computational pool of programmable resources.
Data types
Pre-DirectX 9 graphics cards only supported paletted or integer color types. Various formats are available, each containing a red element, a green element, and a blue element. Sometimes an additional alpha value is added, to be used for transparency. Common formats are:
- 8 bits per pixel – Sometimes palette mode, where each value is an index into a table with the real color value specified in one of the other formats; sometimes two bits for red, three bits for green, and three bits for blue.
- 16 bits per pixel – Usually allocated as five bits for red, six bits for green, and five bits for blue.
- 24 bits per pixel – eight bits for each of red, green, and blue
- 32 bits per pixel – eight bits for each of red, green, blue, and alpha
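As an illustration (not part of the original article), the 16-bit 5-6-5 layout above can be sketched in Python; the bit widths follow the list above, and the channel-truncation choice is one common convention among several:

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit-per-channel color into the 5-6-5 16-bit layout above.

    The high bits of each 8-bit channel are kept (a common, lossy choice).
    """
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_rgb565(p):
    """Recover approximate 8-bit channels by shifting back into place."""
    r = (p >> 11) & 0x1F
    g = (p >> 5) & 0x3F
    b = p & 0x1F
    # Scale back toward 0..255 (replicating high bits is another option).
    return (r << 3, g << 2, b << 3)

print(hex(pack_rgb565(255, 255, 255)))  # 0xffff: all bits set
print(unpack_rgb565(pack_rgb565(255, 0, 128)))  # (248, 0, 128): lossy round trip
```

The round trip is lossy, which is exactly the limitation that motivates the floating point formats discussed next.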
For early fixed-function or limited-programmability graphics (i.e. up to and including DirectX 8.1-compliant GPUs) this was sufficient because this is also the representation used in displays. This representation does have certain limitations, however. Given sufficient graphics processing power, even graphics programmers would like to use better formats, such as floating point data formats, in order to obtain effects such as high dynamic range imaging. Many GPGPU applications require floating point accuracy, which came with graphics cards conforming to the DirectX 9 specification.
DirectX 9 Shader Model 2.x suggested the support of two precision types: full and partial precision. Full precision support could be either FP32 (32-bit floating point per component) or FP24 (24-bit floating point per component) or greater, while partial precision was FP16. ATI's R300 series of GPUs supported FP24 precision only in the programmable fragment pipeline (although FP32 was supported in the vertex processors), while Nvidia's NV30 series supported both FP16 and FP32; other vendors such as S3 Graphics and XGI supported a mixture of formats up to FP24.
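The practical gap between partial and full precision can be seen by round-tripping a value through the two formats. A small sketch using Python's struct module, whose 'e' and 'f' codes correspond to IEEE 754 half and single precision (FP24 has no standard host representation, so it is omitted):

```python
import struct

def roundtrip(fmt, x):
    """Encode x in the given IEEE format and decode it again."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

x = 1.0 + 1.0 / 4096         # needs 13 significand bits to represent exactly
fp16 = roundtrip('e', x)     # half precision: 11 significand bits
fp32 = roundtrip('f', x)     # single precision: 24 significand bits

print(fp16)  # 1.0 -- the small term is rounded away
print(fp32)  # 1.000244140625 -- preserved exactly
```

The FP16 result silently loses the small addend, which is why Shader Model 2.x treated it only as a "partial precision" hint.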
Shader Model 3.0 altered the specification, increasing full precision requirements to a minimum of FP32 support in the fragment pipeline. ATI's Shader Model 3.0 compliant R5xx generation (Radeon X1000 series) supports just FP32 throughout the pipeline, while Nvidia's NV4x and G7x series continued to support both FP32 full precision and FP16 partial precision. Although not stipulated by Shader Model 3.0, both ATI's and Nvidia's Shader Model 3.0 GPUs introduced support for blendable FP16 render targets, more easily facilitating the support for high dynamic range rendering.
The implementations of floating point on Nvidia GPUs are mostly IEEE compliant; however, this is not true across all vendors. This has implications for correctness which are considered important to some scientific applications. While 64-bit floating point values (double precision float) are commonly available on CPUs, these are not universally supported on GPUs; some GPU architectures sacrifice IEEE compliance while others lack double precision altogether. There have been efforts to emulate double-precision floating point values on GPUs; however, the speed tradeoff negates any benefit to offloading the computation onto the GPU in the first place.
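Such emulation schemes typically represent one high-precision number as the unevaluated sum of two lower-precision floats. A minimal sketch of the core building block (Knuth's TwoSum, here simulated in Python by rounding every intermediate to single precision with struct; names are illustrative):

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def two_sum_f32(a, b):
    """Return (s, e) with s = f32(a + b) and e the exact rounding error,
    so that s + e recovers a + b. Pairs like this are the basis of
    'float-float' double-precision emulation on single-precision hardware."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

s, e = two_sum_f32(1.0, 2.0 ** -30)
print(s, e)  # 1.0 9.313225746154785e-10 -- the tiny addend survives in e
```

Every high-precision add or multiply expands into several such single-precision operations, which is the speed tradeoff the text refers to.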
Most operations on the GPU operate in a vectorized fashion: a single operation can be performed on up to four values at once. For instance, if one color is to be modulated by another color, the GPU can produce the resulting color in a single operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional). Examples include vertices, colors, normal vectors, and texture coordinates. Many other applications can put this to good use, and because of their higher performance, vector instructions (SIMD) have long been available on CPUs.
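Color modulation, the example above, is a componentwise multiply of two 4-vectors that a GPU issues as one instruction. A scalar Python sketch of what that single vector operation computes (channels assumed normalized to [0, 1]):

```python
def modulate(c1, c2):
    """Componentwise product of two RGBA colors in [0, 1].
    On a GPU this maps to a single 4-wide multiply."""
    return tuple(a * b for a, b in zip(c1, c2))

white = (1.0, 1.0, 1.0, 1.0)
half_red = (0.5, 0.0, 0.0, 1.0)
print(modulate(white, half_red))  # (0.5, 0.0, 0.0, 1.0)
```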
In 2002, Fung et al. at the University of Toronto developed OpenVIDIA and demonstrated this work, which was later published in 2003, 2004, and 2005 in conjunction with a collaboration between the University of Toronto and Nvidia.
In November 2006 Nvidia launched CUDA, an SDK and API that allows a programmer to use the C programming language to code algorithms for execution on GeForce 8 series GPUs. OpenCL, an open standard defined by the Khronos Group, provides a cross-platform GPGPU platform that additionally supports data-parallel compute on CPUs. OpenCL is actively supported on Intel, AMD, Nvidia, and Arm platforms. Compared, for example, to traditional floating point accelerators such as the 64-bit CSX700 boards from ClearSpeed that are used in today's supercomputers, current top-end GPUs from AMD and Nvidia emphasize single-precision (32-bit) computation; double-precision (64-bit) computation executes more slowly.
GPGPU programming concepts
GPUs are designed specifically for graphics and thus are very restrictive in terms of operations and programming. Because of their nature, GPUs are only effective at tackling problems that can be solved using stream processing, and the hardware can only be used in certain ways.
Stream processing
GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors – processors that can operate in parallel by running a single kernel on many records in a stream at once.

A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. In GPUs, vertices and fragments are the elements in streams, and vertex and fragment shaders are the kernels to be run on them. Since GPUs process elements independently, there is no way to have shared or static data. For each element we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.
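A rough Python sketch of this model (names are hypothetical; a real GPU runs the kernel on all elements in parallel, while here they are processed in a plain loop). The kernel sees only its own input records and produces only an output record, with no shared state:

```python
def run_stream_kernel(kernel, *input_streams):
    """Apply `kernel` independently to each record of the input streams.

    Mirrors the constraints in the text: multiple read-only inputs are
    allowed, the output is write-only, and elements never share state.
    A GPU would execute every kernel invocation in parallel.
    """
    return [kernel(*records) for records in zip(*input_streams)]

# Example: a kernel combining two input streams into one output stream.
scaled = run_stream_kernel(lambda a, b: a * b,
                           [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(scaled)  # [10.0, 40.0, 90.0]
```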
Arithmetic intensity is defined as the number of operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity; otherwise, memory access latency will limit computational speedup.
Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.
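For example (illustrative figures, not from the original article), a SAXPY-style kernel y[i] = a*x[i] + y[i] performs 2 floating-point operations while transferring 3 words (read x[i], read y[i], write y[i]), for an arithmetic intensity of 2/3 – low enough to be memory-bound. A quick check:

```python
def arithmetic_intensity(flops, words_transferred):
    """Operations per word of memory traffic, as defined above."""
    return flops / words_transferred

# y[i] = a * x[i] + y[i]: one multiply + one add, three words moved.
print(arithmetic_intensity(2, 3))                 # ~0.67: memory-bound
# A dense NxN matrix multiply: 2*N**3 ops over roughly 3*N**2 words.
n = 1024
print(arithmetic_intensity(2 * n**3, 3 * n**2))   # ~683: compute-friendly
```

The contrast shows why dense matrix algebra is a classic GPGPU workload while simple elementwise kernels often are not.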
Computational resources
There are a variety of computational resources available on the GPU:
- Programmable processors – vertex, primitive, and fragment pipelines allow the programmer to perform kernels on streams of data
- Rasterizer – creates fragments and interpolates per-vertex constants such as texture coordinates and color
- Texture unit – read-only memory interface
- Framebuffer – write-only memory interface
In fact, the programmer can substitute a write-only texture for output instead of the framebuffer. This is accomplished either through Render to Texture (RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out.
Textures as stream
The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.

Since textures are used as memory, texture lookups are then used as memory reads. Certain operations can be done automatically by the GPU because of this.
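The texture-lookup-as-memory-read idea can be sketched as follows (illustrative Python; a real fetch would also handle filtering and wrap modes, which are omitted here):

```python
def texture_fetch(texture, u, v):
    """Nearest-neighbor lookup: normalized (u, v) in [0, 1) -> array read.

    This is how a texture fetch doubles as a random-access memory read
    in GPGPU code: the coordinate pair is just a 2D address.
    """
    h, w = len(texture), len(texture[0])
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return texture[y][x]

data = [[1.0, 2.0],
        [3.0, 4.0]]
print(texture_fetch(data, 0.75, 0.25))  # 2.0: column 1, row 0
```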
Kernels
Kernels can be thought of as the body of loops. For example, a programmer operating on a grid on the CPU might write a loop that visits every element of the grid and applies the same function to each one. On the GPU, the programmer only specifies the body of the loop as the kernel, and what data to loop over, by invoking geometry processing.
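That correspondence can be sketched like this (hypothetical names, with Python standing in for shader code): the CPU version owns the loop, while on the GPU only the loop body is written.

```python
# CPU version: the programmer writes the loop explicitly.
def brighten_cpu(grid, factor):
    out = [[0.0] * len(row) for row in grid]
    for y in range(len(grid)):                  # the loop structure...
        for x in range(len(grid[0])):
            out[y][x] = grid[y][x] * factor     # ...and the loop body
    return out

# GPU version: only the loop body is written, as a kernel. The hardware
# "loops" by rasterizing a grid-sized quad and running the kernel per fragment.
def brighten_kernel(value, factor):
    return value * factor

def dispatch(kernel, grid, *args):
    """Stand-in for geometry processing: runs the kernel over every cell."""
    return [[kernel(v, *args) for v in row] for row in grid]

grid = [[0.1, 0.2], [0.3, 0.4]]
assert dispatch(brighten_kernel, grid, 2.0) == brighten_cpu(grid, 2.0)
```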
Flow control
In sequential code it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs. Conditional writes could be accomplished using a properly crafted series of arithmetic/bit operations, but looping and conditional branching were not possible.
Recent GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code, and various techniques, such as static branch resolution, pre-computation, predication, loop splitting, and Z-cull can be used to achieve branching when hardware support does not exist.
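The predication technique mentioned above replaces a branch with arithmetic that computes both sides and selects one, so every element executes the same instruction sequence. A sketch (hypothetical kernel, Python standing in for shader code):

```python
def select_branch(cond, a, b):
    """Branching version: what sequential CPU code would write."""
    return a if cond else b

def select_predicated(cond, a, b):
    """Branch-free version: both values are computed and a 0/1 predicate
    blends them -- the pattern used where GPUs lack (or heavily
    penalize) real branching."""
    p = 1.0 if cond else 0.0   # on a GPU, a comparison yields this mask
    return p * a + (1.0 - p) * b

for cond in (True, False):
    assert select_predicated(cond, 3.0, 7.0) == select_branch(cond, 3.0, 7.0)
```

The cost is that both sides are always evaluated, which is the performance penalty the text alludes to.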
Map
The map operation simply applies the given function (the kernel) to every element in the stream. A simple example is multiplying each value in the stream by a constant (increasing the brightness of an image). The map operation is simple to implement on the GPU. The programmer generates a fragment for each pixel on screen and applies a fragment program to each one. The result stream of the same size is stored in the output buffer.
Reduce
Some computations require calculating a smaller stream (possibly a stream of only 1 element) from a larger stream. This is called a reduction of the stream. Generally a reduction can be accomplished in multiple steps. The results from the previous step are used as the input for the current step, and the range over which the operation is applied is reduced until only one stream element remains.
Stream filtering
Stream filtering is essentially a non-uniform reduction. Filtering involves removing items from the stream based on some criteria.
Scatter
The scatter operation is most naturally defined on the vertex processor. The vertex processor is able to adjust the position of the vertex, which allows the programmer to control where information is deposited on the grid. Other extensions are also possible, such as controlling how large an area the vertex affects.
The fragment processor cannot perform a direct scatter operation because the location of each fragment on the grid is fixed at the time of the fragment's creation and cannot be altered by the programmer. However, a logical scatter operation may sometimes be recast or implemented with an additional gather step. A scatter implementation would first emit both an output value and an output address. An immediately following gather operation uses address comparisons to see whether the output value maps to the current output slot.
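That scatter-to-gather conversion can be sketched as follows (illustrative Python; on a real GPU the second pass would run as a fragment program over all output slots in parallel):

```python
def scatter_as_gather(values, addresses, n_slots):
    """Emulate scatter on hardware that can only gather.

    Pass 1 (the 'scatter' kernel) emits (address, value) pairs.
    Pass 2 runs one kernel per output slot, searching the pairs for an
    address that matches the slot -- a gather with address comparison.
    """
    pairs = list(zip(addresses, values))          # pass 1 output
    out = [0.0] * n_slots
    for slot in range(n_slots):                   # pass 2: per-slot kernel
        for addr, val in pairs:
            if addr == slot:                      # does this value map here?
                out[slot] = val
    return out

print(scatter_as_gather([9.0, 7.0], [2, 0], 4))  # [7.0, 0.0, 9.0, 0.0]
```

Note the cost: the gather pass does work proportional to the number of emitted pairs for every output slot, which is why a direct scatter is preferred when the hardware supports it.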
Gather
The fragment processor is able to read textures in a random-access fashion, so it can gather information from any grid cell, or multiple grid cells, as desired.
Sort
The sort operation transforms an unordered set of elements into an ordered set of elements. The most common implementation on GPUs is using sorting networks.
Search
The search operation allows the programmer to find a particular element within the stream, or possibly find neighbors of a specified element. The GPU is not used to speed up the search for an individual element, but instead is used to run multiple searches in parallel.
Data structures
A variety of data structures can be represented on the GPU:
- Dense arrays
- Sparse arrays – static or dynamic
- Adaptive structures
Applications
The following are some of the areas where GPUs have been used for general-purpose computing:
- Bitcoin: the peer-to-peer currency relies on a distributed computing network for performing SHA-256 calculations, where GPGPUs have become the dominant mode of calculation
- MATLAB acceleration using the Parallel Computing Toolbox and MATLAB Distributed Computing Server, as well as third-party packages such as Jacket
- k-nearest neighbor algorithm
- Computer clusters, or a variation of parallel computing (utilizing GPU cluster technology), for highly calculation-intensive tasks:
  - High-performance computing (HPC) clusters, often referred to as supercomputers, including cluster technologies such as the Message Passing Interface (MPI), single-system image (SSI), distributed computing, and Beowulf
  - Grid computing (a form of distributed computing): networking many heterogeneous computers to create a virtual computer architecture
  - Load-balancing clusters, sometimes referred to as server farms
- Physically based simulation and physics engines (usually based on Newtonian physics models):
  - Conway's Game of Life, cloth simulation, incompressible fluid flow by solution of the Navier–Stokes equations
- Statistical physics:
  - Ising model
- Lattice gauge theory
- Segmentation – 2D and 3D
- Level-set methods
- CT reconstruction
- Fast Fourier transform
- Tone mapping
- Audio signal processing:
  - Audio and sound-effects processing, using a GPU for digital signal processing (DSP)
  - Analog signal processing
  - Speech processing
- Digital image processing
- Video processing:
  - Hardware-accelerated video decoding and post-processing:
    - Motion compensation (mo comp)
    - Inverse discrete cosine transform (iDCT)
    - Variable-length decoding (VLD)
    - Inverse quantization (IQ)
    - In-loop deblocking
    - Bitstream processing (CAVLC/CABAC), using special-purpose hardware, because this serial task is not suitable for regular GPGPU computation
    - Deinterlacing:
      - Spatial-temporal de-interlacing
    - Noise reduction
    - Edge enhancement
    - Color correction
  - Hardware-accelerated video encoding and pre-processing
- Global illumination – ray tracing, photon mapping, radiosity, and subsurface scattering, among others
- Geometric computing – constructive solid geometry, distance fields, collision detection, transparency computation, shadow generation
- Scientific computing:
  - Monte Carlo simulation of light propagation
  - Weather forecasting
  - Climate research
  - Molecular modeling on GPU
  - Quantum mechanical physics
  - Astrophysics
- Bioinformatics
- Computational finance
- Medical imaging
- Computer vision
- Digital signal processing / signal processing
- Control engineering
- Neural networks
- Database operations
- Lattice Boltzmann methods
- Cryptography and cryptanalysis:
  - Implementations of MD6, AES, DES, RSA, and elliptic curve cryptography (ECC)
  - Password cracking
- Electronic design automation
- Antivirus software
- Intrusion detection
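Among the simulations listed above, Conway's Game of Life illustrates why such workloads suit GPUs: every cell's next state depends only on its eight neighbors, so all cells can be updated in lockstep. A minimal CPU-side sketch of that stencil pattern (using numpy's `np.roll` for toroidal neighbor gathering; a GPU version would run the same per-cell rule as a fragment shader or CUDA kernel):

```python
import numpy as np

def life_step(grid):
    """One data-parallel update of Conway's Game of Life.

    Every cell is updated simultaneously from its 8 neighbors -- the same
    stencil pattern a GPGPU kernel would apply once per cell. np.roll
    gives wrap-around (toroidal) boundaries.
    """
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)

# A "blinker" oscillates between a horizontal and a vertical bar.
g = np.zeros((5, 5), dtype=np.uint8)
g[2, 1:4] = 1            # horizontal bar of three live cells
g = life_step(g)         # now a vertical bar
assert g.sum() == 3 and g[1:4, 2].sum() == 3
```

The branch-free, whole-array update is exactly the structure that ports well to GPU hardware; serial, data-dependent tasks (like the CAVLC/CABAC bitstream decoding noted above) do not.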
See also
- OpenMP
- OpenHMPP (http://www.openhmpp.org)
- Graphics processing unit
  - Comparison of ATI graphics processing units
  - Comparison of Nvidia graphics processing units
- Graphics pipeline
- Graphics card
- Stream processing
- BrookGPU
- Mark Harris, founder of GPGPU.org, who coined the term "GPGPU"
- Physics engine – a computer program that simulates Newtonian physics (on CPU, GPU, or PPU):
  - Physics processing unit (PPU)
  - Havok Physics / Havok FX, a commercial physics engine middleware SDK for computer and video games
  - PhysX SDK, a commercial realtime physics engine middleware SDK developed by AGEIA
    - AGEIA also designed a dedicated physics processing unit expansion card to accelerate the PhysX SDK
- GPU programming libraries/layers:
  - Close to Metal, now called Stream, AMD/ATI's GPGPU technology for ATI Radeon-based GPUs
  - CUDA (Compute Unified Device Architecture), Nvidia's GPGPU technology for Nvidia GeForce-, Quadro-, and Tesla-based GPUs
  - Sh, a GPGPU library for C++
  - BrookGPU, the Stanford University Graphics group's compiler and runtime implementation of the Brook stream programming language
  - OpenCL (Open Computing Language), a cross-platform GPGPU language for GPUs (AMD/ATI/Nvidia) and general-purpose CPUs; Apple's GPU utilization, introduced in Mac OS X v10.6 'Snow Leopard'
  - DirectCompute, Microsoft's GPU computing API, initially released with the DirectX 11 API
- Audio processing unit (DSP can also be done on a GPU with GPGPU technology)
- List of emerging technologies
- Larrabee (microarchitecture)
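The GPGPU libraries listed above differ in syntax and vendor support, but share one programming model: a small kernel function applied independently to every element of a stream. A hypothetical sketch of that model in plain Python (the names `saxpy_kernel` and `launch` are illustrative only and belong to none of these APIs):

```python
import numpy as np

def saxpy_kernel(a, x_i, y_i):
    """Per-element kernel: conceptually runs once per index, like a
    CUDA thread or an OpenCL work-item computing a*x[i] + y[i]."""
    return a * x_i + y_i

def launch(kernel, a, x, y):
    """Sequential stand-in for a GPU kernel launch over n elements.
    On real hardware these iterations execute in parallel."""
    return np.array([kernel(a, x[i], y[i]) for i in range(len(x))],
                    dtype=np.float32)

x = np.arange(4, dtype=np.float32)   # [0, 1, 2, 3]
y = np.ones(4, dtype=np.float32)
out = launch(saxpy_kernel, 2.0, x, y)
assert np.allclose(out, 2.0 * x + y)  # [1, 3, 5, 7]
```

Because the kernel has no dependence between elements, the runtime is free to map each index to a separate GPU thread, which is the basic contract behind CUDA, OpenCL, Brook, and DirectCompute alike.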
External links
- openhmpp.org - New Open Standard for Many-Core
- OCLTools Open Source OpenCL Compiler and Linker
- GPGPU.org - General-Purpose Computation Using Graphics Hardware
- GPGPU Wiki
- SIGGRAPH 2005 GPGPU Course Notes
- IEEE VIS 2005 GPGPU Course Notes
- NVIDIA Developer Zone
- AMD GPU Tools
- CPU vs. GPGPU
- What is GPU Computing?
- Tech Report article: "ATI stakes claims on physics, GPGPU ground" by Scott Wasson
- GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model - porting a standard model to GPU hardware
- GPGPU Computing @ Duke Statistical Science
- GPGPU Programming in F# using the Microsoft Research Accelerator system.
- GPGPU Review, Tobias Preis, European Physical Journal Special Topics 194, 87–119 (2011)