Distance matrix
In mathematics, computer science and graph theory, a distance matrix is a matrix (two-dimensional array) containing the distances, taken pairwise, between the elements of a set of points. The matrix has size N×N, where N is the number of points, nodes or vertices (often in a graph).
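As a minimal sketch of this definition, the following Python snippet builds the N×N matrix for a small set of points. The function name `distance_matrix` and the sample coordinates are illustrative assumptions, and Euclidean distance is only one possible choice of distance.

```python
import math

def distance_matrix(points):
    """Return the N x N matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

# Hypothetical sample points in the plane (not taken from the article).
points = [(0, 0), (3, 4), (6, 8)]
D = distance_matrix(points)

# Entry D[i][j] holds the distance between points[i] and points[j].
for row in D:
    print(row)
```

With three points the result is a 3×3 matrix; the entry in row i and column j is the distance between point i and point j.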
Comparison with adjacency matrix
Distance matrices are related to adjacency matrices, with two differences: (a) an adjacency matrix only records which vertices are connected and says nothing about the costs or distances between them, and (b) an entry of a distance matrix is smaller when two elements are closer, whereas "close" (connected) vertices yield larger entries in an adjacency matrix.
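The contrast can be illustrated with a small weighted graph. The sketch below assumes that "distance" means the cost of the cheapest path between two vertices and fills the distance matrix with the Floyd-Warshall algorithm; the graph, its edge costs and the variable names are hypothetical.

```python
INF = float("inf")

# Hypothetical weighted, undirected graph: (u, v, edge cost).
edges = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 1.0)]
n = 4

# Adjacency matrix: 1 where an edge exists, 0 otherwise.
adjacency = [[0] * n for _ in range(n)]
# Distance matrix: start from direct edge costs, infinity where no edge exists.
dist = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
for u, v, w in edges:
    adjacency[u][v] = adjacency[v][u] = 1
    dist[u][v] = dist[v][u] = w

# Floyd-Warshall: allow each vertex k in turn as an intermediate stop.
for k in range(n):
    for i in range(n):
        for j in range(n):
            if dist[i][k] + dist[k][j] < dist[i][j]:
                dist[i][j] = dist[i][k] + dist[k][j]

print(adjacency[0][3], dist[0][3])
```

Vertices 0 and 3 are not adjacent, so their adjacency entry is 0, yet the distance matrix still records the cost of the path between them.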
Comparison with Euclidean distance matrix
Unlike a Euclidean distance matrix, a general distance matrix does not need to be symmetric; that is, the values x_{i,j} do not necessarily equal x_{j,i}. Similarly, the matrix values are not restricted to non-negative reals (as they would be in a Euclidean distance matrix) but can be negative, zero or imaginary numbers, depending on the cost metric and specific use. Although it is often the case, distance matrices are not required to be hollow; that is, they can have non-zero entries on the main diagonal.
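These properties are easy to test for a given matrix. The following sketch uses hypothetical helper names and a made-up cost matrix, and it assumes real-valued entries (the non-negativity check would not work for imaginary values).

```python
def is_symmetric(m):
    """True if m[i][j] == m[j][i] for every pair of indices."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))

def is_hollow(m):
    """True if every entry on the main diagonal is zero."""
    return all(m[i][i] == 0 for i in range(len(m)))

def is_nonnegative(m):
    """True if no entry is negative (assumes real-valued entries)."""
    return all(x >= 0 for row in m for x in row)

# Hypothetical cost matrix: asymmetric, with a non-zero diagonal entry
# and one negative cost.
cost = [
    [0, 4, 9],
    [6, 1, 2],
    [9, -3, 0],
]

print(is_symmetric(cost), is_hollow(cost), is_nonnegative(cost))
```

A Euclidean distance matrix would pass all three checks, while this general cost matrix passes none of them.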
Examples and uses
For example, suppose the six points a to f are to be analyzed, where pixel Euclidean distance is the distance metric.
The distance matrix would be:
Point | a | b | c | d | e | f |
---|---|---|---|---|---|---|
a | 0 | 184 | 222 | 177 | 216 | 231 |
b | 184 | 0 | 45 | 123 | 128 | 200 |
c | 222 | 45 | 0 | 129 | 121 | 203 |
d | 177 | 123 | 129 | 0 | 46 | 83 |
e | 216 | 128 | 121 | 46 | 0 | 83 |
f | 231 | 200 | 203 | 83 | 83 | 0 |
These data can then be viewed in graphic form as a heat map. In this image, black denotes a distance of 0 and white is maximal distance.
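The color mapping can be sketched with the matrix from the table above. The snippet assumes a linear 8-bit grayscale (0 for black, 255 for white), which is one plausible way to produce such an image and not necessarily how the original figure was rendered.

```python
labels = ["a", "b", "c", "d", "e", "f"]
D = [
    [0, 184, 222, 177, 216, 231],
    [184, 0, 45, 123, 128, 200],
    [222, 45, 0, 129, 121, 203],
    [177, 123, 129, 0, 46, 83],
    [216, 128, 121, 46, 0, 83],
    [231, 200, 203, 83, 83, 0],
]

# Largest distance in the matrix; it will be drawn as white.
d_max = max(max(row) for row in D)

# Assumed linear grayscale: 0 (black) for distance 0, 255 (white) for d_max.
gray = [[round(255 * d / d_max) for d in row] for row in D]

for label, row in zip(labels, gray):
    print(label, row)
```

The diagonal maps to 0 (black), and the pair a, f, which has the largest distance in the table, maps to 255 (white).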
In bioinformatics, distance matrices are used to represent protein structures in a coordinate-independent manner, as well as the pairwise distances between two sequences in sequence space. They are used in structural and sequence alignment, and for the determination of protein structures from NMR or X-ray crystallography.
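"Coordinate-independent" here means that the matrix does not change when the whole structure is rotated or translated. The sketch below checks this numerically for a handful of made-up 3D coordinates; the helper names and the values are hypothetical and are not real structure data.

```python
import math

def distance_matrix(points):
    """Return the matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def rotate_z_and_shift(p, angle, shift):
    """Rotate a 3D point about the z-axis, then translate it."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])

# Hypothetical atom coordinates (arbitrary units, not real structure data).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]
moved = [rotate_z_and_shift(p, math.pi / 3, (10.0, -4.0, 2.5)) for p in atoms]

before = distance_matrix(atoms)
after = distance_matrix(moved)

# Compare entry by entry, allowing for floating-point rounding.
same = all(
    math.isclose(before[i][j], after[i][j], abs_tol=1e-9)
    for i in range(len(atoms))
    for j in range(len(atoms))
)
print(same)
```

The coordinates of every atom change, but the pairwise distances agree up to rounding, so two copies of a structure in different orientations can be compared through their distance matrices.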
Sometimes it is more convenient to express data as a similarity matrix.
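There is no single standard conversion from distances to similarities. As one possible convention, the sketch below assumes non-negative real distances and uses s = 1 - d / d_max, so that identical elements get similarity 1 and the most distant pair gets 0. The function name `to_similarity` is hypothetical; the sample values are the b, c, d entries of the table above.

```python
def to_similarity(dist):
    """Map distances to similarities in [0, 1] via s = 1 - d / d_max.

    This is an assumed convention and requires non-negative real distances.
    """
    d_max = max(max(row) for row in dist)
    if d_max == 0:
        # All elements coincide, so every pair is maximally similar.
        return [[1.0 for _ in row] for row in dist]
    return [[1.0 - d / d_max for d in row] for row in dist]

# Distances between points b, c and d from the example table.
D = [
    [0, 45, 123],
    [45, 0, 129],
    [123, 129, 0],
]
S = to_similarity(D)

for row in S:
    print([round(s, 3) for s in row])
```

Under this convention larger entries mean closer elements, which is the opposite reading of a distance matrix.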
See also
- Data clustering
- Computer vision
- Min-plus matrix multiplication