GPGPU - AbsoluteAstronomy.com

General-purpose computing on graphics processing units (GPGPU, also referred to as GPGP and less often GP²U) is the technique of using a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the CPU. It is made possible by the addition of programmable stages and higher-precision arithmetic to the rendering pipelines, which allows programmers to use stream processing on non-graphics data. Additionally, the use of multiple graphics cards in a single computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.

GPU improvements

GPU functionality has, traditionally, been very limited. In fact, for many years the GPU was only used to accelerate certain parts of the graphics pipeline. Some improvements were needed before GPGPU became feasible.

Programmability

Programmable vertex and fragment shaders were added to the graphics pipeline to enable game programmers to generate even more realistic effects. Vertex shaders allow the programmer to alter per-vertex attributes, such as position, color, texture coordinates, and normal vector. Fragment shaders are used to calculate the color of a fragment, that is, per pixel. Programmable fragment shaders allow the programmer to substitute, for example, a lighting model other than those provided by default by the graphics card, typically simple Gouraud shading. Shaders have enabled graphics programmers to create lens effects, displacement mapping, and depth of field.

The programmability of the pipelines has followed Microsoft’s DirectX specification, with DirectX 8 introducing Shader Model 1.1, DirectX 8.1 Pixel Shader Models 1.2, 1.3 and 1.4, and DirectX 9 defining Shader Model 2.x and 3.0. Each shader model increased the flexibility and capabilities of the programming model, ensuring that conforming hardware followed suit. The DirectX 10 specification introduces Shader Model 4.0, which unifies the programming specification for vertex, geometry (“Geometry Shaders” are new to DirectX 10) and fragment processing, allowing for a better fit for unified shader hardware and thus providing a single computational pool of programmable resources.

Data types

Pre-DirectX 9 graphics cards only supported paletted or integer color types. Various formats are available, each containing a red element, a green element, and a blue element. Sometimes an additional alpha value is added, to be used for transparency. Common formats are:

8 bits per pixel – Sometimes palette mode, where each value is an index in a table with the real color value specified in one of the other formats. Sometimes two bits for red, three bits for green, and three bits for blue.
16 bits per pixel – Usually allocated as five bits for red, six bits for green, and five bits for blue.
24 bits per pixel – eight bits for each of red, green, and blue
32 bits per pixel – eight bits for each of red, green, blue, and alpha
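
As a rough illustration of the 16-bit layout in the list above, the following C sketch packs an 8-bit-per-channel color into five, six, and five bits. The function names and the choice to place red in the high bits are assumptions made for this example; actual hardware formats may order the channels differently.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit channels into 16 bits: 5 bits red, 6 bits green, 5 bits blue.
   Red in the high bits is an assumption of this sketch. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    /* Drop the low bits of each channel, then shift into position. */
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Expand a packed pixel back to 8 bits per channel. The high bits are
   replicated into the low bits so that full intensity maps to 255. */
void unpack_rgb565(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(r && g && b);
    uint8_t r5 = (uint8_t)((p >> 11) & 0x1F);
    uint8_t g6 = (uint8_t)((p >> 5) & 0x3F);
    uint8_t b5 = (uint8_t)(p & 0x1F);
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}
```

The round trip is lossy: the discarded low bits of each channel cannot be recovered, which is one reason such integer formats are a poor fit for general numeric data.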

For early fixed-function or limited-programmability graphics (i.e., up to and including DirectX 8.1-compliant GPUs) this was sufficient because this is also the representation used in displays. This representation does have certain limitations, however. Given sufficient graphics processing power, even graphics programmers would like to use better formats, such as floating point data formats, in order to obtain effects such as high dynamic range imaging. Many GPGPU applications require floating point accuracy, which came with graphics cards conforming to the DirectX 9 specification.

DirectX 9 Shader Model 2.x suggested the support of two precision types: full and partial precision. Full precision support could be either FP32 or FP24 (floating point 24-bit per component) or greater, while partial precision was FP16. ATI’s R300 series of GPUs supported FP24 precision only in the programmable fragment pipeline (although FP32 was supported in the vertex processors), while Nvidia’s NV30 series supported both FP16 and FP32; other vendors such as S3 Graphics and XGI supported a mixture of formats up to FP24.

Shader Model 3.0 altered the specification, increasing full precision requirements to a minimum of FP32 support in the fragment pipeline. ATI’s Shader Model 3.0 compliant R5xx generation (Radeon X1000 series) supports just FP32 throughout the pipeline, while Nvidia’s NV4x and G7x series continued to support both FP32 full precision and FP16 partial precision. Although not stipulated by Shader Model 3.0, both ATI’s and Nvidia’s Shader Model 3.0 GPUs introduced support for blendable FP16 render targets, making it easier to support High Dynamic Range Rendering.

The implementations of floating point on Nvidia GPUs are mostly IEEE compliant; however, this is not true across all vendors. This has implications for correctness, which is considered important in some scientific applications. While 64-bit floating point values (double precision float) are commonly available on CPUs, these are not universally supported on GPUs; some GPU architectures sacrifice IEEE compliance while others lack double-precision altogether. There have been efforts to emulate double-precision floating point values on GPUs; however, the speed tradeoff negates any benefit to offloading the computation onto the GPU in the first place.
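
One way such emulation can be approached is to represent each value as an unevaluated sum of two single-precision numbers. The C sketch below shows only addition, using the well-known error-free "two-sum" step; the type and function names are hypothetical, and it assumes IEEE single-precision arithmetic with no fast-math style reordering. It is not taken from any particular GPU library.

```c
/* A value stored as hi + lo, where lo holds the rounding error of hi. */
typedef struct { float hi; float lo; } ds_t;

ds_t ds_from_float(float x)
{
    ds_t r = { x, 0.0f };
    return r;
}

double ds_to_double(ds_t x)
{
    return (double)x.hi + (double)x.lo;
}

/* Add two double-single values. Several single-precision operations are
   needed per addition, which is the speed tradeoff the text refers to. */
ds_t ds_add(ds_t a, ds_t b)
{
    float s = a.hi + b.hi;                    /* rounded sum of high parts */
    float v = s - a.hi;
    float e = (a.hi - (s - v)) + (b.hi - v);  /* exact rounding error of s */
    e += a.lo + b.lo;                         /* fold in the low parts */
    float hi = s + e;
    float lo = e - (hi - s);                  /* renormalize */
    ds_t r = { hi, lo };
    return r;
}
```

For example, adding 1 to 100000000 in plain single precision leaves the value unchanged, whereas the pair representation keeps the 1 in the low part.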

Most operations on the GPU operate in a vectorized fashion: a single operation can be performed on up to four values at once. For instance, if one color is to be modulated by another color, the GPU can produce the resulting color in a single operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional). Examples include vertices, colors, normal vectors, and texture coordinates. Many other applications can put this to good use, and because of their higher performance, vector instructions (SIMD) have long been available on CPUs.
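
The color modulation mentioned above amounts to a component-wise multiply of two four-element vectors. A minimal C sketch follows; the `color4` type and `modulate` function are names invented for this example. On a CPU this is written as four scalar multiplies, which a compiler may or may not turn into a single SIMD instruction.

```c
/* Four-component color: red, green, blue, alpha. */
typedef struct { float r, g, b, a; } color4;

/* Component-wise product of two colors. A GPU of the kind described
   would perform the four multiplies as one vector operation. */
color4 modulate(color4 x, color4 y)
{
    color4 out = { x.r * y.r, x.g * y.g, x.b * y.b, x.a * y.a };
    return out;
}
```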

In 2002 Fung et al. developed OpenVIDIA at the University of Toronto, and demonstrated this work, which was later published in 2003, 2004, and 2005, in conjunction with a collaboration between the University of Toronto and Nvidia.
In November 2006 Nvidia launched CUDA, an SDK and API that allows a programmer to use the C programming language to code algorithms for execution on GeForce 8 series GPUs. OpenCL, an open standard defined by the Khronos Group, provides a cross-platform GPGPU platform that additionally supports data-parallel compute on CPUs. OpenCL is actively supported on Intel, AMD, Nvidia and Arm platforms. Compared, for example, to traditional floating point accelerators such as the 64-bit CSX700 boards from ClearSpeed that are used in today's supercomputers, current top-end GPUs from AMD and Nvidia emphasize single-precision (32-bit) computation; double-precision (64-bit) computation executes more slowly.

GPGPU programming concepts

GPUs are designed specifically for graphics and thus are very restrictive in terms of operations and programming. Because of their nature, GPUs are only effective at tackling problems that can be solved using stream processing, and the hardware can only be used in certain ways.

Stream processing

GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors – processors that can operate in parallel by running a single kernel on many records in a stream at once.

A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. On GPUs, vertices and fragments are the elements in streams, and vertex and fragment shaders are the kernels to be run on them. Since GPUs process elements independently, there is no way to have shared or static data. For each element we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.

Arithmetic intensity is defined as the number of operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity; otherwise, memory access latency will limit the computational speedup.
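
The definition can be written as a one-line calculation. In the C sketch below the function name is hypothetical, and the caller is assumed to supply the operation and word counts. As an illustrative estimate, a kernel that computes a*x + y per element performs two operations while transferring three words (read x, read y, write the result), giving an intensity of about 0.67.

```c
#include <assert.h>

/* Operations performed per word of memory transferred. */
double arithmetic_intensity(double operations, double words_transferred)
{
    assert(words_transferred > 0.0);
    return operations / words_transferred;
}
```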

Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.

Computational resources

There are a variety of computational resources available on the GPU:

Programmable processors – vertex, primitive, and fragment pipelines allow the programmer to run kernels on streams of data
Rasterizer – creates fragments and interpolates per-vertex constants such as texture coordinates and color
Texture unit – read-only memory interface
Framebuffer – write-only memory interface

In fact, the programmer can substitute a write-only texture for output instead of the framebuffer. This is accomplished through Render to Texture (RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out.

Textures as stream

The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.
Since textures are used as memory, texture lookups are then used as memory reads. Certain operations can be done automatically by the GPU because of this.

Kernels

Kernels can be thought of as the body of loops. For example, if the programmer were operating on a grid on the CPU they might have code that looked like this:

// Input and output grids have 10000 x 10000 or 100 million elements.

// Placeholder for the per-element computation; assumed to be defined elsewhere.
float do_some_hard_work(float value);

void transform_10k_by_10k_grid(float in[10000][10000], float out[10000][10000])
{
    for (int x = 0; x < 10000; x++)
    {
        for (int y = 0; y < 10000; y++)
        {
            // The next line is executed 100 million times
            out[x][y] = do_some_hard_work(in[x][y]);
        }
    }
}

On the GPU, the programmer only specifies the body of the loop as the kernel and what data to loop over by invoking geometry processing.
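
The split between kernel and iteration can be imitated on the CPU. In the following C sketch, `example_kernel` stands in for the loop body and `launch_over_grid` stands in for the hardware that invokes it once per element; both names, and the flat row-major grid layout, are assumptions made for illustration. On a GPU the invocations would run in parallel and in no guaranteed order, which is why the kernel sees only its own input element.

```c
#include <assert.h>
#include <stddef.h>

/* A kernel: a function of one input element, with no shared state. */
typedef float (*kernel_fn)(float);

/* Hypothetical kernel standing in for the body of the loop. */
float example_kernel(float v)
{
    return v * v + 1.0f;
}

/* Stand-in for the GPU: applies the kernel once per grid element.
   The serial loop here only emulates what hardware does in parallel. */
void launch_over_grid(kernel_fn kernel, const float *in, float *out,
                      size_t width, size_t height)
{
    assert(kernel && in && out);
    for (size_t i = 0; i < width * height; i++)
        out[i] = kernel(in[i]);
}
```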

Flow control

In sequential code it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs. Conditional writes could be accomplished using a properly crafted series of arithmetic/bit operations, but looping and conditional branching were not possible.

Recent GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code, and various techniques, such as static branch resolution, pre-computation, predication, loop splitting, and Z-cull can be used to achieve branching when hardware support does not exist.
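
Predication, one of the techniques listed above, replaces a branch with arithmetic or bit operations that compute both alternatives and keep one. A minimal C sketch follows, with hypothetical function names. The floating-point variant is only safe when both inputs are finite, since multiplying an infinity or NaN by zero does not yield zero.

```c
#include <stdint.h>

/* Integer select without a branch: returns a if cond is nonzero, else b. */
int32_t select_int(int cond, int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(cond != 0);   /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}

/* Floating-point select by blending; assumes a and b are finite. */
float select_float(int cond, float a, float b)
{
    float c = (float)(cond != 0);           /* 1.0f or 0.0f */
    return c * a + (1.0f - c) * b;
}
```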

Map

The map operation simply applies the given function (the kernel) to every element in the stream. A simple example is multiplying each value in the stream by a constant (increasing the brightness of an image). The map operation is simple to implement on the GPU. The programmer generates a fragment for each pixel on screen and applies a fragment program to each one. The result stream of the same size is stored in the output buffer.

Reduce

Some computations require calculating a smaller stream (possibly a stream of only 1 element) from a larger stream. This is called a reduction of the stream. Generally a reduction can be accomplished in multiple steps. The results from the previous step are used as the input for the current step and the range over which the operation is applied is reduced until only one stream element remains.
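
A multi-step sum reduction of this kind can be sketched in C as below. Each pass roughly halves the stream by adding the upper half onto the lower half, and the additions within one pass are independent of each other. The function name is hypothetical, and the sketch works in place for brevity; under the read-or-write restriction described earlier, a GPU version would instead alternate between two buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n values by repeated halving. Overwrites the input array. */
float reduce_sum(float *data, size_t n)
{
    assert(data && n > 0);
    while (n > 1) {
        size_t half = (n + 1) / 2;          /* size of the next stream */
        /* Each i is independent, so a pass could run in parallel. */
        for (size_t i = 0; i + half < n; i++)
            data[i] += data[i + half];
        n = half;
    }
    return data[0];
}
```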

Stream filtering

Stream filtering is essentially a non-uniform reduction. Filtering involves removing items from the stream based on some criteria.
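
One common way to express filtering in a data-parallel style, offered here as an assumption rather than as the method the text has in mind, is to compute a running count of kept elements and use it as each survivor's output position. The C sketch below does this serially; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*predicate_fn)(float);

/* Example criterion: keep strictly positive values. */
int is_positive(float v)
{
    return v > 0.0f;
}

/* Copies the elements of in that satisfy keep into out, preserving
   order, and returns how many were kept. out must have room for n
   values. Returns 0 if the scratch allocation fails. */
size_t filter_stream(const float *in, size_t n, predicate_fn keep, float *out)
{
    assert(in && out && keep);
    size_t *offset = malloc((n + 1) * sizeof *offset);
    if (!offset)
        return 0;
    /* Exclusive prefix sum of the keep flags gives output positions. */
    offset[0] = 0;
    for (size_t i = 0; i < n; i++)
        offset[i + 1] = offset[i] + (keep(in[i]) ? 1u : 0u);
    /* Every kept element writes to its own, distinct slot. */
    for (size_t i = 0; i < n; i++)
        if (offset[i + 1] != offset[i])
            out[offset[i]] = in[i];
    size_t kept = offset[n];
    free(offset);
    return kept;
}
```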

Scatter

The scatter operation is most naturally defined on the vertex processor. The vertex processor is able to adjust the position of the vertex, which allows the programmer to control where information is deposited on the grid. Other extensions are also possible, such as controlling how large an area the vertex affects.

The fragment processor cannot perform a direct scatter operation because the location of each fragment on the grid is fixed at the time of the fragment's creation and cannot be altered by the programmer. However, a logical scatter operation may sometimes be recast or implemented with an additional gather step. A scatter implementation would first emit both an output value and an output address. An immediately following gather operation uses address comparisons to see whether the output value maps to the current output slot.
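
The scatter-as-gather idea can be sketched in C as follows. The first stage is assumed to have already produced parallel arrays of output addresses and values; the function below is the gather stage, in which each output slot compares every emitted address against its own position. The names are hypothetical, and the quadratic cost of the comparison is a property of this simple sketch.

```c
#include <assert.h>
#include <stddef.h>

/* For each output slot, pick up any value whose emitted address matches.
   Slots that no address targets keep their previous contents; if several
   addresses target one slot, the last one in the stream wins. */
void scatter_via_gather(const size_t *address, const float *value,
                        size_t n_in, float *out, size_t n_out)
{
    assert(address && value && out);
    for (size_t slot = 0; slot < n_out; slot++)      /* one per fragment */
        for (size_t j = 0; j < n_in; j++)
            if (address[j] == slot)
                out[slot] = value[j];
}
```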

Gather

The fragment processor is able to read textures in a random access fashion, so it can gather information from any grid cell, or multiple grid cells, as desired.

Sort

The sort operation transforms an unordered set of elements into an ordered set of elements. The most common implementation on GPUs uses sorting networks.
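
Bitonic sort is one well-known sorting network; it is used here as an example, not as the specific network the text refers to. Its sequence of compare-and-swap steps is fixed in advance and does not depend on the data, so all comparisons within a pass are independent. The C sketch below runs the passes serially and assumes the element count is a power of two; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sort a into ascending order with a bitonic network.
   n must be a power of two. */
void bitonic_sort(float *a, size_t n)
{
    assert(a && n > 0 && (n & (n - 1)) == 0);
    for (size_t k = 2; k <= n; k <<= 1) {            /* merged block size */
        for (size_t j = k >> 1; j > 0; j >>= 1) {    /* compare distance */
            /* All iterations of this loop are independent of each other. */
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    int ascending = (i & k) == 0;
                    if ((ascending && a[i] > a[partner]) ||
                        (!ascending && a[i] < a[partner])) {
                        float t = a[i];
                        a[i] = a[partner];
                        a[partner] = t;
                    }
                }
            }
        }
    }
}
```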

Search

The search operation allows the programmer to find a particular element within the stream, or possibly find neighbors of a specified element. The GPU is not used to speed up the search for an individual element, but instead is used to run multiple searches in parallel.
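
Running many searches at once can be pictured as one independent binary search per query. In the C sketch below, the function names are hypothetical and the data is assumed to be sorted in ascending order; the outer loop over queries is the part a GPU would execute in parallel, while each individual search remains sequential.

```c
#include <assert.h>
#include <stddef.h>

/* Index of key in the ascending array sorted, or -1 if absent. */
long binary_search(const int *sorted, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && sorted[lo] == key) ? (long)lo : -1L;
}

/* One search per query; the queries do not depend on each other. */
void search_many(const int *sorted, size_t n,
                 const int *keys, size_t m, long *result)
{
    assert(sorted && keys && result);
    for (size_t q = 0; q < m; q++)
        result[q] = binary_search(sorted, n, keys[q]);
}
```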

Data structures

A variety of data structures can be represented on the GPU:

Dense arrays
Sparse arrays – static or dynamic
Adaptive structures

Applications

The following are some of the areas where GPUs have been used for general purpose computing:

Bitcoin peer-to-peer currency relies on a distributed computing network for performing SHA-256 calculations, where GPGPU has become the dominant mode of calculation
MATLAB acceleration using the Parallel Computing Toolbox and MATLAB Distributed Computing Server, as well as 3rd party packages like Jacket
k-nearest neighbor algorithm
Computer clusters or a variation of parallel computing (utilizing GPU cluster technology) for highly calculation-intensive tasks:
- High-performance computing clusters (HPC clusters) (often referred to as supercomputers)
  - including cluster technologies like Message Passing Interface, single-system image (SSI), distributed computing, and Beowulf
- Grid computing (a form of distributed computing) (networking many heterogeneous computers to create a virtual computer architecture)
- Load-balancing clusters (sometimes referred to as a server farm)
Physically based simulation and physics engines (usually based on Newtonian physics models)
- Conway's Game of Life, cloth simulation, incompressible fluid flow by solution of Navier-Stokes equations
Statistical physics
- Ising model
Lattice gauge theory
Segmentation – 2D and 3D
Level-set methods
CT reconstruction
Fast Fourier transform
Tone mapping
Audio signal processing
- Audio and Sound Effects Processing, to use a GPU for DSP (digital signal processing)
- Analog signal processing
- Speech processing
Digital image processing
Video processing
- Hardware accelerated video decoding and post-processing
  - Motion compensation (mo comp)
  - Inverse discrete cosine transform (iDCT)
  - Variable-length decoding (VLD)
  - Inverse quantization (IQ)
  - In-loop deblocking
  - Bitstream processing (CAVLC/CABAC), which uses special-purpose hardware because it is a serial task not suited to regular GPGPU computation
  - Deinterlacing
    - Spatial-temporal de-interlacing
  - Noise reduction
  - Edge enhancement
  - Color correction
- Hardware accelerated video encoding and pre-processing
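
The note on bitstream processing in the decoding list above can be illustrated with a minimal sketch. It assumes a hypothetical four-entry prefix code (`CODE_TABLE`) and a single uniform quantizer step; real CAVLC/CABAC tables and quantization matrices are considerably more involved. The point is only the contrast: in variable-length decoding, the start of each code word is known only after every earlier code word has been decoded, whereas inverse quantization treats each coefficient independently and so maps naturally onto many GPU threads.

```python
# Hypothetical toy prefix code, for illustration only.
CODE_TABLE = {"0": 0, "10": 1, "110": 2, "111": 3}


def decode_variable_length(bits):
    """Serial stage: each symbol's start offset depends on all earlier symbols."""
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        if current in CODE_TABLE:
            symbols.append(CODE_TABLE[current])
            current = ""
    if current:
        raise ValueError("bitstream ends inside a code word")
    return symbols


def inverse_quantize(levels, step):
    """Data-parallel stage: each output depends only on its own input."""
    return [level * step for level in levels]


levels = decode_variable_length("0101101110")
coefficients = inverse_quantize(levels, step=4)
print(levels)        # [0, 1, 2, 3, 0]
print(coefficients)  # [0, 4, 8, 12, 0]
```

The list comprehension in `inverse_quantize` stands in for a kernel applied once per element; the loop in `decode_variable_length` cannot be split the same way, because no iteration knows where it begins until the previous one has finished.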
Global illumination – ray tracing, photon mapping, radiosity among others, subsurface scattering
Geometric computing – constructive solid geometry, distance fields, collision detection, transparency computation, shadow generation
Scientific computing
- Monte Carlo simulation of light propagation
- Weather forecasting
- Climate research
- Molecular modeling on GPU
- Quantum mechanical physics
- Astrophysics
Bioinformatics
Computational finance
Medical imaging
Computer vision
Digital signal processing / signal processing
Control engineering
Neural networks
Database operations
Lattice Boltzmann methods
Cryptography and cryptanalysis
- Implementation of MD6
- Implementation of AES
- Implementation of DES
- Implementation of RSA
- Implementation of ECC
- Password cracking
Electronic design automation
Antivirus software
Intrusion detection

External links

openhmpp.org - New Open Standard for Many-Core
OCLTools Open Source OpenCL Compiler and Linker
GPGPU.org - General-Purpose Computation Using Graphics Hardware
GPGPU Wiki
SIGGRAPH 2005 GPGPU Course Notes
IEEE VIS 2005 GPGPU Course Notes
NVIDIA Developer Zone
AMD GPU Tools
CPU vs. GPGPU
What is GPU Computing?
Tech Report article: "ATI stakes claims on physics, GPGPU ground" by Scott Wasson
GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model - porting a standard model to GPU hardware
GPGPU Computing @ Duke Statistical Science
GPGPU Programming in F# using the Microsoft Research Accelerator system.
GPGPU Review, Tobias Preis, European Physical Journal Special Topics 194, 87-119 (2011)

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
