Chemical similarity
Chemical similarity refers to the similarity of chemical elements, molecules or chemical compounds with respect to either structural or functional qualities, i.e. the effect that the chemical compound has on reaction partners in inorganic or biological settings. Biological effects, and thus also similarity of effects, are usually quantified using the biological activity of a compound. In general terms, function can be related to the chemical activity of compounds (among others).
The notion of chemical similarity (or molecular similarity) is one of the most important concepts in chemoinformatics. It plays an important role in modern approaches to predicting the properties of chemical compounds, designing chemicals with a predefined set of properties and, especially, in conducting drug design studies by screening large databases containing structures of available (or potentially available) chemicals. These studies are based on the similar property principle of Johnson and Maggiora, which states: similar compounds have similar properties.
Similarity Measures
Chemical similarity is often described as an inverse of a measure of distance in descriptor space. Distance measures can be classified into Euclidean measures and non-Euclidean measures depending on whether the triangle inequality holds.
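As an illustration of the idea of similarity as an inverse of distance, the following Python sketch computes the Euclidean distance between descriptor vectors, converts it to a similarity, and tests the triangle inequality on a small set of points. The mapping 1 / (1 + d), the helper names and the descriptor values are assumptions chosen for the example; they are not prescribed by any particular chemoinformatics package.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance, used here as a counterexample."""
    return euclidean_distance(a, b) ** 2


def similarity_from_distance(d):
    """Map a distance d >= 0 to a similarity in (0, 1].

    Assumption: 1 / (1 + d) is only one of several possible conventions.
    """
    return 1.0 / (1.0 + d)


def satisfies_triangle_inequality(dist, points, tol=1e-12):
    """Check d(a, c) <= d(a, b) + d(b, c) for every triple of points."""
    return all(
        dist(a, c) <= dist(a, b) + dist(b, c) + tol
        for a in points
        for b in points
        for c in points
    )


# Hypothetical two-dimensional descriptor vectors for three compounds.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

d = euclidean_distance(points[0], points[2])
print(d, similarity_from_distance(d))
print(satisfies_triangle_inequality(euclidean_distance, points))
print(satisfies_triangle_inequality(squared_euclidean, points))
```

On these three collinear points the Euclidean distance passes the check, while its square does not: the squared distance from the first to the third point is 4, which exceeds the sum 1 + 1 of the two intermediate squared distances.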
Similarity Search and Virtual Screening
Similarity-based virtual screening (a kind of ligand-based virtual screening) assumes that all compounds in a database that are similar to a query compound have similar biological activity. Although this hypothesis is not always valid, the set of retrieved compounds is quite often considerably enriched with actives. To screen databases containing millions of compounds efficiently, molecular structures are usually represented by molecular screens (structural keys) or by fixed-size or variable-size molecular fingerprints. Molecular screens and fingerprints can contain both 2D and 3D information. However, 2D fingerprints, which are a kind of binary fragment descriptor, dominate in this area. Fragment-based structural keys, like MDL keys, are sufficiently good for handling small and medium-sized chemical databases, whereas large databases are processed with fingerprints having much higher information density. Fragment-based Daylight, BCI, and UNITY 2D (Tripos) fingerprints are the best-known examples. The most popular similarity measure for comparing chemical structures represented by means of fingerprints is the Tanimoto (or Jaccard) coefficient T. Two structures are usually considered similar if T > 0.85 (for Daylight fingerprints).
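For binary fingerprints, the Tanimoto coefficient can be written as T = c / (a + b - c), where a and b are the numbers of bits set in the two fingerprints and c is the number of bits set in both. The following Python sketch shows this calculation and a naive similarity search over a toy database. It assumes that each fingerprint is given as a set of on-bit positions; the function names, the molecule labels, the bit positions and the cutoff used in the example are all hypothetical and serve only as illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two binary fingerprints.

    Each fingerprint is passed as a collection of on-bit positions.
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints get similarity 0.
        return 0.0
    return len(a & b) / union


def similarity_search(query_fp, database, threshold):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in database.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical query and database; bit positions are made up.
query = {1, 2, 3, 4}
database = {
    "mol_1": {1, 2, 3, 4},
    "mol_2": {1, 2, 3, 5},
    "mol_3": {7, 8, 9},
}

# The cutoff 0.5 is arbitrary and chosen only for this toy data.
hits = similarity_search(query, database, threshold=0.5)
print(hits)  # [('mol_1', 1.0), ('mol_2', 0.6)]
```

A linear scan like this is adequate only for small collections; the cutoff is left as a parameter because a suitable value depends on the fingerprint type.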
External links
- Small Molecule Subgraph Detector (SMSD) is a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules, which can be used to derive a similarity or distance between two molecules. MCS is also used to screen for drug-like compounds by finding molecules that share a common subgraph (substructure).
- Chemical Similarity (QSAR World)
- Similarity Principle
- Fingerprint-based Similarity used in QSAR Modeling
- Brutus is a similarity analysis tool based on molecular interaction fields.