Coding theory
Coding theory is the study of the properties of codes and their fitness for a specific application. Codes are used for data compression, cryptography, error correction and, more recently, network coding. Codes are studied by various scientific disciplines, such as information theory, electrical engineering, mathematics, and computer science, for the purpose of designing efficient and reliable data transmission methods. This typically involves the removal of redundancy and the correction (or detection) of errors in the transmitted data.

There are essentially two aspects to coding theory:
1. Data compression (or, source coding)
2. Error correction (or, channel coding).

These two aspects may be studied in combination (joint source and channel coding). Source encoding attempts to compress the data from a source in order to transmit it more efficiently. This practice is found every day on the Internet, where the common Zip data compression is used to reduce the network load and make files smaller. The second, channel encoding, adds extra data bits to make the transmission of data more robust to disturbances present on the transmission channel. The ordinary user may not be aware of many applications using channel coding. A typical music CD uses the Reed–Solomon code to correct for scratches and dust. In this application the transmission channel is the CD itself. Cell phones also use coding techniques to correct for the fading and noise of high-frequency radio transmission. Data modems, telephone transmissions, and NASA all employ channel coding techniques to get the bits through, for example turbo codes and LDPC codes.

## Source coding

The aim of source coding is to take the source data and make it smaller.

### Principle

The entropy of a source is a measure of its information content. Source codes try to reduce the redundancy present in the source and to represent it with fewer bits, each of which carries more information.

Data compression which explicitly tries to minimize the average length of messages according to a particular assumed probability model is called entropy encoding.

Various techniques used by source coding schemes try to approach the entropy limit of the source: C(x) ≥ H(x), where H(x) is the entropy of the source (bitrate) and C(x) is the bitrate after compression. In particular, no source coding scheme can do better than the entropy of the source.
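
The bound C(x) ≥ H(x) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it estimates the entropy from the symbol frequencies of a sample string and uses a Huffman code as one possible entropy code. The function names and the sample message are hypothetical and do not come from the text above.

```python
import heapq
import math
from collections import Counter


def entropy_bits(text):
    """Empirical Shannon entropy H(x) in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def huffman_code_lengths(text):
    """Return {symbol: codeword length} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        # A single-symbol source still needs one bit per symbol.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def average_length(text):
    """Average codeword length C(x) in bits per symbol."""
    counts = Counter(text)
    lengths = huffman_code_lengths(text)
    return sum(counts[s] * lengths[s] for s in counts) / len(text)


message = "aaaabbcd"          # probabilities 1/2, 1/4, 1/8, 1/8
H = entropy_bits(message)     # 1.75 bits per symbol
C = average_length(message)   # 1.75 bits per symbol
```

For this message the symbol probabilities are powers of 1/2, so the Huffman code meets the entropy exactly. For other frequency distributions the average length stays at or above the entropy.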

### Example

Facsimile (fax) transmission uses a simple run-length code. Source coding also includes the removal of all data that is superfluous to the needs of the transmitter, which decreases the bandwidth required for transmission.
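
As a rough illustration of run-length coding, the following sketch encodes a line of black and white pixels as (symbol, run length) pairs. It is a minimal example, not the coding scheme of any fax standard. The names `rle_encode`, `rle_decode` and `scan_line` are hypothetical.

```python
from itertools import groupby


def rle_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(symbols)]


def rle_decode(pairs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


# A scan line that is mostly white ("0") with two short black ("1") runs.
scan_line = "0" * 8 + "1" * 3 + "0" * 12 + "1" * 5
encoded = rle_encode(scan_line)
# encoded == [("0", 8), ("1", 3), ("0", 12), ("1", 5)]
```

The 28 pixels are represented by four pairs. The saving depends entirely on the source containing long runs, which is why the method suits scanned documents.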

## Channel coding

The aim of channel coding theory is to find codes which transmit quickly, contain many valid code words and can correct or at least detect many errors. While not mutually exclusive, performance in these areas is a trade-off, so different codes are optimal for different applications. The needed properties of a code depend mainly on the probability of errors happening during transmission. In a typical CD, the impairment is mainly dust or scratches, so codes are used in an interleaved manner: the data is spread out over the disk.

Although not a very good code, a simple repeat code can serve as an understandable example. Suppose we take a block of data bits (representing sound) and send it three times. At the receiver we examine the three repetitions bit by bit and take a majority vote. The twist is that we don't merely send the bits in order; we interleave them. The block of data bits is first divided into 4 smaller blocks. Then we cycle through the blocks and send one bit from the first, then the second, and so on. This is done three times to spread the data out over the surface of the disk. In the context of the simple repeat code, this may not appear effective. However, more powerful codes are known which are very effective at correcting the "burst" error of a scratch or a dust spot when this interleaving technique is used.
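
The effect of spreading the repetitions apart can be sketched in a few lines. The example below is a simplification and an assumption, not the CD scheme: it compares repeating each bit three times in a row with sending the whole block three times, then applies the same burst of flipped bits to both. All function names are hypothetical.

```python
def encode_adjacent(bits):
    """Repeat every bit three times in a row: a a a b b b ..."""
    return [b for b in bits for _ in range(3)]


def decode_adjacent(received):
    """Majority vote over each group of three neighbouring bits."""
    return [int(sum(received[i:i + 3]) >= 2) for i in range(0, len(received), 3)]


def encode_spread(bits):
    """Send the whole block three times, so copies of a bit lie far apart."""
    return list(bits) * 3


def decode_spread(received):
    """Majority vote over the three copies, which are one block length apart."""
    n = len(received) // 3
    return [int(received[i] + received[i + n] + received[i + 2 * n] >= 2)
            for i in range(n)]


def burst(codeword, start, length):
    """Flip `length` consecutive bits, like a scratch on the disk."""
    damaged = list(codeword)
    for i in range(start, start + length):
        damaged[i] ^= 1
    return damaged


data = [1, 0, 1, 1, 0, 0, 1, 0]
adjacent_result = decode_adjacent(burst(encode_adjacent(data), 6, 4))
spread_result = decode_spread(burst(encode_spread(data), 6, 4))
```

With the copies adjacent, the four-bit burst wipes out all three copies of one data bit and the vote fails. With the copies spread out, the same burst touches each data bit at most once and the vote recovers the block.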

Other codes are more appropriate for different applications. Deep space communications are limited by the thermal noise of the receiver, which is more continuous than bursty in nature. Likewise, narrowband modems are limited by the noise present in the telephone network, which is also better modeled as a continuous disturbance. Cell phones are subject to rapid fading: the high frequencies used can cause rapid fading of the signal even if the receiver is moved a few inches. Again, there is a class of channel codes designed to combat fading.

### Linear codes

The term algebraic coding theory denotes the sub-field of coding theory where the properties of codes are expressed in algebraic terms and then further researched.

Algebraic coding theory is basically divided into two major types of codes:
1. Linear block codes
2. Convolutional codes.

It analyzes mainly the following three properties of a code:
• code word length
• total number of valid code words
• the minimum distance between two valid code words, using mainly the Hamming distance, sometimes also other distances like the Lee distance.
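
The three properties listed above can be computed directly for a small code. This is a minimal sketch. The four-word binary code is an arbitrary example chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code):
    """Smallest Hamming distance between any two distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


code = ["00000", "01011", "10101", "11110"]

word_length = len(code[0])        # 5
number_of_words = len(code)       # 4
d_min = minimum_distance(code)    # 3
```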

#### Linear block codes

Linear block codes have the property of linearity, i.e., the sum of any two codewords is also a codeword, and they are applied to the source bits in blocks, hence the name linear block codes. There are block codes that are not linear, but it is difficult to prove that a code is a good one without this property.

Linear block codes are summarized by their symbol alphabets (e.g., binary or ternary) and parameters (n, m, dmin), where
1. n is the length of the codeword, in symbols,
2. m is the number of source symbols that will be used for encoding at once,
3. dmin is the minimum Hamming distance for the code.
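
To make the parameters concrete, the sketch below builds a binary code from a generator matrix and reads off (n, m, dmin). The matrix `G` is one common systematic form of the (7, 4) Hamming code, used here as an assumed example. For a linear code the minimum distance equals the smallest weight of a nonzero codeword, and the code relies on that fact.

```python
from itertools import product

# Assumed example: a systematic generator matrix of the binary (7, 4) Hamming code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(message, generator):
    """Multiply the message row vector by the generator matrix over GF(2)."""
    n = len(generator[0])
    return tuple(
        sum(m * row[j] for m, row in zip(message, generator)) % 2
        for j in range(n)
    )


# Enumerate all 2**m messages to obtain the full set of codewords.
codewords = {encode(msg, G) for msg in product([0, 1], repeat=len(G))}

n = len(G[0])   # codeword length
m = len(G)      # source symbols encoded at once
# Minimum distance of a linear code = minimum weight of a nonzero codeword.
d_min = min(sum(c) for c in codewords if any(c))

# Linearity check: the sum (XOR) of any two codewords is again a codeword.
is_linear = all(
    tuple((a + b) % 2 for a, b in zip(x, y)) in codewords
    for x in codewords
    for y in codewords
)
```

Under these assumptions the code has 16 codewords and parameters (7, 4, 3).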

There are many types of linear block codes, such as
1. Cyclic codes (e.g., Hamming codes)
2. Repetition codes
3. Parity codes
4. Polynomial codes (e.g., BCH codes)
5. Reed–Solomon codes
6. Algebraic geometric codes
7. Reed–Muller codes
8. Perfect codes.

Block codes are tied to the sphere packing problem, which has received some attention over the years. In two dimensions, it is easy to visualize: take a bunch of pennies flat on the table and push them together. The result is a hexagon pattern like a bee's nest. But block codes rely on more dimensions, which cannot easily be visualized. The powerful (24,12) Golay code used in deep space communications uses 24 dimensions. If used as a binary code (which it usually is), the dimensions refer to the length of the codeword as defined above.

The theory of coding uses the N-dimensional sphere model. For example, how many pennies can be packed into a circle on a tabletop, or, in 3 dimensions, how many marbles can be packed into a globe? Other considerations enter the choice of a code. For example, hexagon packing into the constraint of a rectangular box will leave empty space at the corners. As the dimensions get larger, the percentage of empty space grows smaller. But at certain dimensions, the packing uses all the space, and these codes are the so-called "perfect" codes. The only nontrivial and useful perfect codes are the distance-3 Hamming codes with parameters satisfying (2^r – 1, 2^r – 1 – r, 3), and the [23,12,7] binary and [11,6,5] ternary Golay codes.
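
Whether a set of parameters "uses all the space" can be checked with the sphere-packing (Hamming) bound: a code over an alphabet of q symbols with length n, q^k codewords and minimum distance d is perfect when the q^k spheres of radius t = ⌊(d − 1)/2⌋ around the codewords contain exactly q^n words. The sketch below only tests this counting condition. It does not construct the codes, and the function name is hypothetical.

```python
from math import comb


def meets_hamming_bound_exactly(n, k, d, q=2):
    """True if q**k spheres of radius t = (d - 1) // 2 exactly fill the space."""
    t = (d - 1) // 2
    # Number of words within Hamming distance t of a given codeword.
    sphere_size = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    return q ** k * sphere_size == q ** n


# Distance-3 Hamming codes (2^r - 1, 2^r - 1 - r, 3) for a few values of r.
hamming_checks = [
    meets_hamming_bound_exactly(2 ** r - 1, 2 ** r - 1 - r, 3)
    for r in range(2, 8)
]

binary_golay = meets_hamming_bound_exactly(23, 12, 7)          # True
ternary_golay = meets_hamming_bound_exactly(11, 6, 5, q=3)     # True
extended_golay = meets_hamming_bound_exactly(24, 12, 8)        # False
```

The extended (24,12) Golay code mentioned earlier does not satisfy the condition with equality, which is consistent with the list of perfect codes above containing the length-23 code instead.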

Another code property is the number of neighbors that a single codeword may have.
Again, consider pennies as an example. First we pack the pennies in a rectangular grid. Each penny will have 4 near neighbors (and 4 at the corners which are farther away). In a hexagon, each penny will have 6 near neighbors. When we increase the dimensions, the number of near neighbors increases very rapidly. The result is the number of ways for noise to make the receiver choose a neighbor (hence an error) grows as well. This is a fundamental limitation of block codes, and indeed all codes. It may be harder to cause an error to a single neighbor, but the number of neighbors can be large enough so the total error probability actually suffers.

Properties of linear block codes are used in many applications. For example, the syndrome-coset uniqueness property of linear block codes is used in trellis shaping, one of the best known shaping codes. This same property is used in sensor networks for distributed source coding.

#### Convolutional codes

The idea behind a convolutional code is to make every codeword symbol the weighted sum of the various input message symbols. This is like the convolution used in LTI systems to find the output of a system when the input and impulse response are known.

So we generally find the output of the convolutional encoder as the convolution of the input bits with the states of the encoder's registers.
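
The convolution view can be written out literally. In the sketch below each output stream is the input sequence convolved, modulo 2, with a tap pattern that plays the role of an impulse response. The two tap patterns (1, 1, 1) and (1, 0, 1) are an assumption, chosen because they are a common textbook example of a rate-1/2 encoder. They are not taken from the text above.

```python
def convolve_mod2(bits, taps):
    """Discrete convolution of a bit sequence with a tap pattern over GF(2)."""
    out = [0] * (len(bits) + len(taps) - 1)
    for i, b in enumerate(bits):
        for j, t in enumerate(taps):
            out[i + j] ^= b & t
    return out


def conv_encode(bits, generators=((1, 1, 1), (1, 0, 1))):
    """Rate-1/n encoder: interleave one output stream per generator."""
    streams = [convolve_mod2(bits, g) for g in generators]
    return [s[k] for k in range(len(streams[0])) for s in streams]


message = [1, 0, 1, 1]
encoded = conv_encode(message)
# encoded == [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]
```

Because the full convolution is computed, the output includes the tail produced while the last input bits pass through the registers. Four input bits therefore give twelve output bits rather than eight.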

Fundamentally, convolutional codes do not offer more protection against noise than an equivalent block code. In many cases, however, they offer greater simplicity of implementation than a block code of equal power. The encoder is usually a simple circuit which has state memory and some feedback logic, normally XOR gates. The decoder can be implemented in software or firmware.

The Viterbi algorithm is the optimum algorithm used to decode convolutional codes. There are simplifications to reduce the computational load. They rely on searching only the most likely paths. Although not optimum, they have generally been found to give good results in lower-noise environments.
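
A hard-decision Viterbi decoder for a small code fits in a few dozen lines. The sketch below is illustrative only and rests on several assumptions: a feedforward rate-1/2 encoder with taps (1, 1, 1) and (1, 0, 1), an encoder that is flushed back to the all-zero state, and the Hamming distance as the path metric. None of these choices come from the text above, and all names are hypothetical.

```python
GENERATORS = ((1, 1, 1), (1, 0, 1))  # assumed tap patterns
MEMORY = 2                           # number of register stages


def step(state, bit):
    """Advance the encoder by one input bit; return (next_state, output pair)."""
    window = (bit,) + state
    out = tuple(sum(x & t for x, t in zip(window, g)) % 2 for g in GENERATORS)
    return window[:-1], out


def encode(bits):
    """Encode and append MEMORY zero bits to flush the registers."""
    state, out = (0,) * MEMORY, []
    for b in list(bits) + [0] * MEMORY:
        state, pair = step(state, b)
        out.extend(pair)
    return out


def viterbi_decode(received):
    """Return the message whose codeword is closest in Hamming distance."""
    start = (0,) * MEMORY
    # For each state keep only the best path so far: (distance, decoded bits).
    survivors = {start: (0, [])}
    for i in range(0, len(received), 2):
        pair = tuple(received[i:i + 2])
        candidates = {}
        for state, (metric, path) in survivors.items():
            for bit in (0, 1):
                nxt, out = step(state, bit)
                cost = metric + sum(a != b for a, b in zip(out, pair))
                if nxt not in candidates or cost < candidates[nxt][0]:
                    candidates[nxt] = (cost, path + [bit])
        survivors = candidates
    # The flushed encoder always ends in the all-zero state.
    _, path = survivors[start]
    return path[:-MEMORY]  # drop the flush bits


message = [1, 0, 1, 1, 0, 1, 0, 0]
received = encode(message)
received[3] ^= 1    # two channel errors
received[12] ^= 1
decoded = viterbi_decode(received)
```

Keeping a single survivor per state is what makes the search tractable: the number of paths tracked stays at the number of encoder states instead of doubling with every input bit.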

Convolutional codes are used in voiceband modems (V.32, V.17, V.34) and in GSM mobile phones, as well as satellite and military communication devices.

## Other applications of coding theory

Another concern of coding theory is designing codes that help synchronization. A code may be designed so that a phase shift can be easily detected and corrected and so that multiple signals can be sent on the same channel.

Another application of codes, used in some mobile phone systems, is code-division multiple access (CDMA). Each phone is assigned a code sequence that is approximately uncorrelated with the codes of other phones. When transmitting, the code word is used to modulate the data bits representing the voice message. At the receiver, a demodulation process is performed to recover the data. The properties of this class of codes allow many users (with different codes) to use the same radio channel at the same time. To the receiver, the signals of other users will appear to the demodulator only as a low-level noise.
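
The sharing of one channel by several users can be sketched with short spreading codes. The example below uses Walsh (Hadamard) sequences, which are exactly orthogonal. That is an idealizing assumption: the text above only requires the codes to be approximately uncorrelated, in which case the other users appear as low-level noise rather than cancelling completely. The function names and the two users are hypothetical.

```python
def walsh_codes(order):
    """Rows of a 2**order by 2**order Hadamard matrix with entries +1 and -1."""
    rows = [[1]]
    for _ in range(order):
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def spread(bits, code):
    """Map bits 0/1 to -1/+1 and multiply each one by the user's code."""
    return [(1 if b else -1) * chip for b in bits for chip in code]


def despread(signal, code):
    """Correlate each code-length segment with the code and take the sign."""
    n = len(code)
    bits = []
    for i in range(0, len(signal), n):
        correlation = sum(s * c for s, c in zip(signal[i:i + n], code))
        bits.append(1 if correlation > 0 else 0)
    return bits


codes = walsh_codes(2)              # four codes of length 4
alice_bits = [1, 0, 1, 1]
bob_bits = [0, 0, 1, 0]

# Both users transmit at the same time; the channel adds their signals.
channel = [a + b for a, b in zip(spread(alice_bits, codes[1]),
                                 spread(bob_bits, codes[2]))]

alice_received = despread(channel, codes[1])
bob_received = despread(channel, codes[2])
```

Each receiver recovers its own data from the summed signal because correlating with one code removes the contribution spread with the other.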

Another general class of codes are the automatic repeat-request (ARQ) codes. In these codes the sender adds redundancy to each message for error checking, usually by adding check bits. If the check bits are not consistent with the rest of the message when it arrives, the receiver will ask the sender to retransmit the message. All but the simplest wide area network protocols use ARQ. Common protocols include SDLC (IBM), TCP (Internet), X.25 (International) and many others. There is an extensive field of research on this topic because of the problem of matching a rejected packet against a new packet: is it a new one or is it a retransmission? Typically numbering schemes are used, as in TCP.
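
The retransmission logic and the role of numbering can be sketched as a stop-and-wait simulation. This is a toy model under stated assumptions, not the mechanism of SDLC, TCP or X.25: frames carry a one-bit sequence number and a CRC-32 as check bits, and the channel is simulated by naming which transmissions get corrupted and which acknowledgements get lost. All function and parameter names are hypothetical.

```python
import zlib


def make_frame(seq, payload):
    """Frame = sequence byte + payload + 4 check bytes (CRC-32 of the rest)."""
    body = bytes([seq]) + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def parse_frame(frame):
    """Return (seq, payload), or None if the check bits are inconsistent."""
    body, check = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != check:
        return None
    return body[0], body[1:]


def transfer(messages, corrupt=(), lost_acks=()):
    """Simulate stop-and-wait ARQ; return (delivered payloads, transmissions)."""
    delivered = []
    expected_seq = 0   # receiver state: sequence number of the next new frame
    attempt = 0        # counts every transmission, including retransmissions
    for index, payload in enumerate(messages):
        seq = index % 2
        acked = False
        while not acked:
            frame = bytearray(make_frame(seq, payload))
            if attempt in corrupt:
                frame[-1] ^= 0x01          # the channel flips one bit
            this_attempt = attempt
            attempt += 1
            parsed = parse_frame(bytes(frame))
            if parsed is None:
                continue                   # rejected: sender must retransmit
            rx_seq, rx_payload = parsed
            if rx_seq == expected_seq:
                delivered.append(rx_payload)   # a new frame
                expected_seq ^= 1
            # Otherwise it is a retransmission of a frame already delivered.
            acked = this_attempt not in lost_acks
    return delivered, attempt


messages = [b"alpha", b"beta", b"gamma"]
delivered, transmissions = transfer(messages, corrupt={1}, lost_acks={3})
```

In this run one frame is rejected by the check bits and one acknowledgement is lost, so five transmissions are needed for three messages. The sequence number is what lets the receiver recognize the fifth transmission as a duplicate and avoid delivering it twice.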

### Group Testing

Group testing uses codes in a different way. Consider a large group of items in which a very few are different in a particular way (e.g., defective products or infected test subjects). The idea of group testing is to determine which items are "different" by using as few tests as possible. The problem has its roots in the Second World War, when the United States Army Air Forces needed to test its soldiers for syphilis. It originated from a ground-breaking paper by Robert Dorfman.
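
The saving in tests can be illustrated with a simple two-stage pooling scheme of the kind usually attributed to Dorfman: test each group as a whole, and test the members individually only when the pooled test is positive. The sketch below assumes error-free tests. The group size, the sample data and the function name are hypothetical.

```python
def two_stage_group_test(samples, group_size):
    """Return (indices of defective items, number of tests performed).

    `samples` is a list of booleans, True meaning the item is defective.
    """
    tests = 0
    found = []
    for start in range(0, len(samples), group_size):
        group = range(start, min(start + group_size, len(samples)))
        tests += 1                          # one pooled test for the group
        if any(samples[i] for i in group):
            for i in group:                 # positive pool: test each member
                tests += 1
                if samples[i]:
                    found.append(i)
    return found, tests


samples = [False] * 100
samples[7] = True
samples[63] = True

found, tests = two_stage_group_test(samples, group_size=10)
# found == [7, 63], tests == 30 instead of 100 individual tests
```

The scheme pays off only when defective items are rare. If most pools test positive, the pooled tests are wasted and the total exceeds one test per item.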

### Analog coding

Information is encoded analogously in the neural networks of brains, in analog signal processing, and in analog electronics. Aspects of analog coding include analog error correction, analog data compression, and analog encryption.

## Neural coding

Neural coding is a neuroscience-related field concerned with how sensory and other information is represented in the brain by networks of neurons. The main goal of studying neural coding is to characterize the relationship between the stimulus and the individual or ensemble neuronal responses and the relationship among electrical activity of the neurons in the ensemble. It is thought that neurons can encode both digital and analog information, and that neurons follow the principles of information theory and compress information, and detect and correct errors in the signals that are sent throughout the brain and wider nervous system.

## See also

• Coding gain
• Covering code
• Error-correcting code
• Group testing
• Hamming distance, Hamming weight
• Information theory
• Lee distance
• Spatial coding and MIMO in multiple antenna research
• Spatial diversity coding is spatial coding that transmits replicas of the information signal along different spatial paths, so as to increase the reliability of the data transmission.
• Spatial interference cancellation coding
• Spatial multiplex coding
• Timeline of information theory, data compression, and error correcting codes
• List of algebraic coding theory topics