Vector quantization - AbsoluteAstronomy.com

Vector quantization is a classical quantization

Quantization (signal processing)

Quantization, in mathematics and digital signal processing, is the process of mapping a large set of input values to a smaller set – such as rounding values to some unit of precision. A device or algorithmic function that performs quantization is called a quantizer. The error introduced by...

technique from signal processing

Signal processing

Signal processing is an area of systems engineering, electrical engineering and applied mathematics that deals with operations on or analysis of signals, in either discrete or continuous time...

which allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression

Data compression

In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

. It works by dividing a large set of points (vector

Coordinate vector

In linear algebra, a coordinate vector is an explicit representation of a vector in an abstract vector space as an ordered list of numbers or, equivalently, as an element of the coordinate space Fn....

s) into groups having approximately the same number of points closest to them. Each group is represented by its centroid

Centroid

In geometry, the centroid, geometric center, or barycenter of a plane figure or two-dimensional shape X is the intersection of all straight lines that divide X into two parts of equal moment about the line. Informally, it is the "average" of all points of X...

point, as in k-means and some other clustering

Clustering

Clustering can refer to the following:In demographics:* Clustering , the gathering of various populations based on factors such as ethnicity, economics or religion.In graph theory:...

algorithms.

The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensioned data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for lossy data compression

Lossy data compression

In information technology, "lossy" compression is a data encoding method that compresses data by discarding some of it. The procedure aims to minimize the amount of data that need to be held, handled, and/or transmitted by a computer...

. It can also be used for lossy data correction and density estimation

Density estimation

In probability and statistics,density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function...

.

Vector quantization is based on the competitive learning

Competitive learning

Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network...

paradigm, so it is closely related to the self-organizing map

Self-organizing map

A self-organizing map or self-organizing feature map is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional , discretized representation of the input space of the training samples, called a map...

model.

Training

A simple training algorithm for vector quantization is:

Pick a sample point at random
Move the nearest quantization vector centroid towards this sample point, by a small fraction of the distance
Repeat

A more sophisticated algorithm reduces the bias in the density matching estimation, and ensures that all points are used, by including an extra sensitivity parameter:

Increase each centroid's sensitivity by a small amount
Pick a sample point at random
Find the quantization vector centroid with the smallest
1. Move the chosen centroid toward the sample point by a small fraction of the distance
2. Set the chosen centroid's sensitivity to zero
Repeat

It is desirable to use a cooling schedule to produce convergence: see Simulated annealing

Simulated annealing

Simulated annealing is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete...

.

The algorithm can be iteratively updated with 'live' data, rather than by picking random points from a data set, but this will introduce some bias if the data is temporally correlated over many samples.

Applications

Vector quantization is used for lossy data compression, lossy data correction and density estimation.

Lossy data correction, or prediction, is used to recover data missing from some dimensions. It is done by finding the nearest group with the data dimensions available, then predicting the result based on the values for the missing dimensions, assuming that they will have the same value as the group's centroid.

For density estimation

Density estimation

In probability and statistics,density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function...

, the area/volume that is closer to a particular centroid than to any other is inversely proportional to the density (due to the density matching property of the algorithm).

Use in data compression

Vector quantization, also called "block quantization" or "pattern matching quantization" is often used in lossy data compression

Lossy data compression

. It works by encoding values from a multidimensional vector space

Vector space

A vector space is a mathematical structure formed by a collection of vectors: objects that may be added together and multiplied by numbers, called scalars in this context. Scalars are often taken to be real numbers, but one may also consider vector spaces with scalar multiplication by complex...

into a finite set of values from a discrete subspace

Linear subspace

The concept of a linear subspace is important in linear algebra and related fields of mathematics.A linear subspace is usually called simply a subspace when the context serves to distinguish it from other kinds of subspaces....

of lower dimension. A lower-space vector requires less storage space, so the data is compressed. Due to the density matching property of vector quantization, the compressed data have errors that are inversely proportional to their density.

The transformation is usually done by projection

Projection (mathematics)

Generally speaking, in mathematics, a projection is a mapping of a set which is idempotent, which means that a projection is equal to its composition with itself. A projection may also refer to a mapping which has a left inverse. Bot notions are strongly related, as follows...

or by using a codebook

Codebook

A codebook is a type of document used for gathering and storing codes. Originally codebooks were often literally books, but today codebook is a byword for the complete record of a series of codes, regardless of physical format.-Cryptography:...

. In some cases, a codebook can be also used to entropy code the discrete value in the same step, by generating a prefix code

Prefix code

A prefix code is a type of code system distinguished by its possession of the "prefix property"; which states that there is no valid code word in the system that is a prefix of any other valid code word in the set...

d variable-length encoded value as its output.

The set of discrete amplitude levels is quantized jointly rather than each sample being quantized separately. Consider a K-dimensional vector

of amplitude levels. It is compressed by choosing the nearest matching vector from a set of N-dimensional vectors

.

All possible combinations of the N-dimensional vector

form the vector space

Vector space

to which all the quantized vectors belong.

Block Diagram:
A simple vector quantizer is shown below
Only the index of the codeword in the codebook is sent instead of the quantized values. This conserves space and achieves more compression.

Twin vector quantization

Twin vector quantization

In data compression, twin vector quantization is related to vector quantization, but the speed of the quantization is doubled by the secondary vector analyzer.By using a subdimensional vector space useless hyperspace will be destroyed in the process....

(VQF) is part of the MPEG-4

MPEG-4

MPEG-4 is a method of defining compression of audio and visual digital data. It was introduced in late 1998 and designated a standard for a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group under the formal standard ISO/IEC...

standard dealing with time domain weighted interleaved vector quantization.

Video codecs based on vector quantization

Cinepak
Cinepak
Cinepak is a video codec developed by Peter Barrett at SuperMac Technologies, and released in 1991 with the Video Spigot, and then in 1992 as part of Apple Computer's QuickTime video suite. It was designed to encode 320x240 resolution video at 1x CD-ROM transfer rates. The codec was ported to the...

and old versions of its spiritual successors:

Sorenson codec
Sorenson codec
Sorenson codec may refer to either of three proprietary video codecs: Sorenson Video, Sorenson Video 3 or Sorenson Spark. Sorenson Video is also known as Sorenson Video Codec, Sorenson Video Quantizer or SVQ...
Indeo
Indeo
Indeo Video is a video codec developed by Intel in 1992. It was sold to Ligos Corporation in 2000. While its original version was related to Intel's DVI video stream format, a hardware-only codec for the compression of television-quality video onto compact disks, Indeo was distinguished by being...
VQA
.VQA
Vector Quantized Animation, known by its acronym VQA is a file format originally developed by Westwood Studios for video encoding in their game The Legend of Kyrandia and monopoly.- Description :...

format, used in many games

All of which are superseded by the MPEG family.

Audio codecs based on vector quantization

CELP
G.729
G.729
G.729 is an audio data compression algorithm for voice that compresses digital voice in packets of 10 milliseconds duration. It is officially described as Coding of speech at 8 kbit/s using conjugate-structure algebraic code-excited linear prediction .Because of its low bandwidth requirements,...
TwinVQ
TwinVQ
TwinVQ is an audio compression technique developed by Nippon Telegraph and Telephone Corporation Human Interface Laboratories in 1994...
Ogg Vorbis
AMR-WB+
AMR-WB+
Extended Adaptive Multi-Rate – Wideband is an audio codec that extends AMR-WB. It adds support for stereo signals and higher sampling rates. Another main improvement is the use of transform coding additionally to ACELP. This greatly improves the generic audio coding...
DTS

External links

http://www.data-compression.com/vq.html
QccPack — Quantization, Compression, and Coding Library (open source)
http://www.mqasem.net/vectorquantization/vq.html
VQ Indexes Compression and Information Hiding Using Hybrid Lossless Index Coding, Wen-Jan Chen and Wen-Tsung Huang

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.