Vector quantization
Encyclopedia
Vector quantization is a classical quantization
Quantization (signal processing)
Quantization, in mathematics and digital signal processing, is the process of mapping a large set of input values to a smaller set – such as rounding values to some unit of precision. A device or algorithmic function that performs quantization is called a quantizer. The error introduced by...

technique from signal processing
Signal processing
Signal processing is an area of systems engineering, electrical engineering and applied mathematics that deals with operations on or analysis of signals, in either discrete or continuous time...

which allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

. It works by dividing a large set of points (vector
Coordinate vector
In linear algebra, a coordinate vector is an explicit representation of a vector in an abstract vector space as an ordered list of numbers or, equivalently, as an element of the coordinate space Fn....

s) into groups having approximately the same number of points closest to them. Each group is represented by its centroid
Centroid
In geometry, the centroid, geometric center, or barycenter of a plane figure or two-dimensional shape X is the intersection of all straight lines that divide X into two parts of equal moment about the line. Informally, it is the "average" of all points of X...

point, as in k-means and some other clustering
Clustering
Clustering can refer to the following:In demographics:* Clustering , the gathering of various populations based on factors such as ethnicity, economics or religion.In graph theory:...

algorithms.

The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensioned data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for lossy data compression
Lossy data compression
In information technology, "lossy" compression is a data encoding method that compresses data by discarding some of it. The procedure aims to minimize the amount of data that need to be held, handled, and/or transmitted by a computer...

. It can also be used for lossy data correction and density estimation
Density estimation
In probability and statistics,density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function...

.

Vector quantization is based on the competitive learning
Competitive learning
Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network...

paradigm, so it is closely related to the self-organizing map
Self-organizing map
A self-organizing map or self-organizing feature map is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional , discretized representation of the input space of the training samples, called a map...

model.

Training

A simple training algorithm for vector quantization is:
1. Pick a sample point at random
2. Move the nearest quantization vector centroid towards this sample point, by a small fraction of the distance
3. Repeat

A more sophisticated algorithm reduces the bias in the density matching estimation, and ensures that all points are used, by including an extra sensitivity parameter:
1. Increase each centroid's sensitivity by a small amount
2. Pick a sample point at random
3. Find the quantization vector centroid with the smallest
1. Move the chosen centroid toward the sample point by a small fraction of the distance
2. Set the chosen centroid's sensitivity to zero
4. Repeat

It is desirable to use a cooling schedule to produce convergence: see Simulated annealing
Simulated annealing
Simulated annealing is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete...

.

The algorithm can be iteratively updated with 'live' data, rather than by picking random points from a data set, but this will introduce some bias if the data is temporally correlated over many samples.

Applications

Vector quantization is used for lossy data compression, lossy data correction and density estimation.

Lossy data correction, or prediction, is used to recover data missing from some dimensions. It is done by finding the nearest group with the data dimensions available, then predicting the result based on the values for the missing dimensions, assuming that they will have the same value as the group's centroid.

For density estimation
Density estimation
In probability and statistics,density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function...

, the area/volume that is closer to a particular centroid than to any other is inversely proportional to the density (due to the density matching property of the algorithm).

Use in data compression

Vector quantization, also called "block quantization" or "pattern matching quantization" is often used in lossy data compression
Lossy data compression
In information technology, "lossy" compression is a data encoding method that compresses data by discarding some of it. The procedure aims to minimize the amount of data that need to be held, handled, and/or transmitted by a computer...

. It works by encoding values from a multidimensional vector space
Vector space
A vector space is a mathematical structure formed by a collection of vectors: objects that may be added together and multiplied by numbers, called scalars in this context. Scalars are often taken to be real numbers, but one may also consider vector spaces with scalar multiplication by complex...

into a finite set of values from a discrete subspace
Linear subspace
The concept of a linear subspace is important in linear algebra and related fields of mathematics.A linear subspace is usually called simply a subspace when the context serves to distinguish it from other kinds of subspaces....

of lower dimension. A lower-space vector requires less storage space, so the data is compressed. Due to the density matching property of vector quantization, the compressed data have errors that are inversely proportional to their density.

The transformation is usually done by projection
Projection (mathematics)
Generally speaking, in mathematics, a projection is a mapping of a set which is idempotent, which means that a projection is equal to its composition with itself. A projection may also refer to a mapping which has a left inverse. Bot notions are strongly related, as follows...

or by using a codebook
Codebook
A codebook is a type of document used for gathering and storing codes. Originally codebooks were often literally books, but today codebook is a byword for the complete record of a series of codes, regardless of physical format.-Cryptography:...

. In some cases, a codebook can be also used to entropy code the discrete value in the same step, by generating a prefix code
Prefix code
A prefix code is a type of code system distinguished by its possession of the "prefix property"; which states that there is no valid code word in the system that is a prefix of any other valid code word in the set...

d variable-length encoded value as its output.

The set of discrete amplitude levels is quantized jointly rather than each sample being quantized separately. Consider a K-dimensional vector of amplitude levels. It is compressed by choosing the nearest matching vector from a set of N-dimensional vectors .

All possible combinations of the N-dimensional vector form the vector space
Vector space
A vector space is a mathematical structure formed by a collection of vectors: objects that may be added together and multiplied by numbers, called scalars in this context. Scalars are often taken to be real numbers, but one may also consider vector spaces with scalar multiplication by complex...

to which all the quantized vectors belong.

Block Diagram:
A simple vector quantizer is shown below
Only the index of the codeword in the codebook is sent instead of the quantized values. This conserves space and achieves more compression.

Twin vector quantization
Twin vector quantization
In data compression, twin vector quantization is related to vector quantization, but the speed of the quantization is doubled by the secondary vector analyzer.By using a subdimensional vector space useless hyperspace will be destroyed in the process....

(VQF) is part of the MPEG-4
MPEG-4
MPEG-4 is a method of defining compression of audio and visual digital data. It was introduced in late 1998 and designated a standard for a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group under the formal standard ISO/IEC...

standard dealing with time domain weighted interleaved vector quantization.

Video codecs based on vector quantization

• Cinepak
Cinepak
Cinepak is a video codec developed by Peter Barrett at SuperMac Technologies, and released in 1991 with the Video Spigot, and then in 1992 as part of Apple Computer's QuickTime video suite. It was designed to encode 320x240 resolution video at 1x CD-ROM transfer rates. The codec was ported to the...

and old versions of its spiritual successors:
• Sorenson codec
Sorenson codec
Sorenson codec may refer to either of three proprietary video codecs: Sorenson Video, Sorenson Video 3 or Sorenson Spark. Sorenson Video is also known as Sorenson Video Codec, Sorenson Video Quantizer or SVQ...

• Indeo
Indeo
Indeo Video is a video codec developed by Intel in 1992. It was sold to Ligos Corporation in 2000. While its original version was related to Intel's DVI video stream format, a hardware-only codec for the compression of television-quality video onto compact disks, Indeo was distinguished by being...

• VQA
.VQA
Vector Quantized Animation, known by its acronym VQA is a file format originally developed by Westwood Studios for video encoding in their game The Legend of Kyrandia and monopoly.- Description :...

format, used in many games

All of which are superseded by the MPEG family.

Audio codecs based on vector quantization

• CELP
• G.729
G.729
G.729 is an audio data compression algorithm for voice that compresses digital voice in packets of 10 milliseconds duration. It is officially described as Coding of speech at 8 kbit/s using conjugate-structure algebraic code-excited linear prediction .Because of its low bandwidth requirements,...

• TwinVQ
TwinVQ
TwinVQ is an audio compression technique developed by Nippon Telegraph and Telephone Corporation Human Interface Laboratories in 1994...

• Ogg Vorbis
• AMR-WB+
AMR-WB+
Extended Adaptive Multi-Rate – Wideband is an audio codec that extends AMR-WB. It adds support for stereo signals and higher sampling rates. Another main improvement is the use of transform coding additionally to ACELP. This greatly improves the generic audio coding...

• DTS

• speech coding
Speech coding
Speech coding is the application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting...

• Ogg Vorbis
• Voronoi diagram
Voronoi diagram
In mathematics, a Voronoi diagram is a special kind of decomposition of a given space, e.g., a metric space, determined by distances to a specified family of objects in the space...

• rate-distortion function
• data clustering
Data clustering
Cluster analysis or clustering is the task of assigning a set of objects into groups so that the objects in the same cluster are more similar to each other than to those in other clusters....

• Learning Vector Quantization
LVQ
In Computer Science, Learning Vector Quantization , is a prototype-based supervised classification algorithm. LVQ is the supervised counterpart of vector quantization systems.- Overview :...

• Centroidal Voronoi tessellation
Centroidal Voronoi tessellation
In geometry, a centroidal Voronoi tessellation is a special type of Voronoi tessellation or Voronoi diagrams. A Voronoi tessellation is called centroidal when the generating point of each Voronoi cell is also its mean . It can be viewed as an optimal partition corresponding to an optimal...

• Growing Neural Gas
Neural gas
Neural gas is an artificial neural network, inspired by the self-organizing map and introduced in 1991 by Thomas Martinetz and Klaus Schulten. The neural gas is a simple algorithm for finding optimal data representations based on feature vectors...

, a neural network-like system for vector quantization

Part of this article was originally based on material from the Free On-line Dictionary of Computing
Free On-line Dictionary of Computing
The Free On-line Dictionary of Computing is an online, searchable, encyclopedic dictionary of computing subjects. It was founded in 1985 by Denis Howe and is hosted by Imperial College London...

and is used with permission under the GFDL.