Graph canonization
Encyclopedia
In graph theory
Graph theory
In mathematics and computer science, graph theory is the study of graphs, mathematical structures used to model pairwise relations between objects from a certain collection. A "graph" in this context refers to a collection of vertices or 'nodes' and a collection of edges that connect pairs of...

, a branch of mathematics, graph canonization is finding a canonical form
Canonical form
Generally, in mathematics, a canonical form of an object is a standard way of presenting that object....

 of a graph
G, which is a graph Canon(G) isomorphic
Graph isomorphism
In graph theory, an isomorphism of graphs G and H is a bijection between the vertex sets of G and H f \colon V \to V \,\!such that any two vertices u and v of G are adjacent in G if and only if ƒ and ƒ are adjacent in H...

 to G such that Canon(H) = Canon(G) if and only if H is isomorphic to G. The canonical form of a graph is an example of a complete graph invariant. Since the vertex sets of (finite) graphs are commonly identified with the intervals of integers 1,..., n, where n is the number of the vertices of a graph, a canonical form of a graph is commonly called canonical labeling of a graph. Graph canonization is also sometimes known as graph canonicalization.

A commonly known canonical form is the lexicographically smallest
Lexicographical order
In mathematics, the lexicographic or lexicographical order, , is a generalization of the way the alphabetical order of words is based on the alphabetical order of letters.-Definition:Given two partially ordered sets A and B, the lexicographical order on...

 graph within the isomorphism class
Isomorphism class
An isomorphism class is a collection of mathematical objects isomorphic to each other.Isomorphism classes are often defined if the exact identity of the elements of the set is considered irrelevant, and the properties of the structure of the mathematical object are studied. Examples of this are...

, which is the graph of the class with lexicographically smallest adjacency matrix
Adjacency matrix
In mathematics and computer science, an adjacency matrix is a means of representing which vertices of a graph are adjacent to which other vertices...

 considered as a linear string.

Computational complexity

Clearly, the graph canonization problem is at least as computationally hard
Computational Complexity
Computational Complexity may refer to:*Computational complexity theory*Computational Complexity...

 as the graph isomorphism problem
Graph isomorphism problem
The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.Besides its practical importance, the graph isomorphism problem is a curiosity in computational complexity theory as it is one of a very small number of problems belonging to NP...

. In fact, Graph Isomorphism is even AC0
AC0
AC0 is a complexity class used in circuit complexity. It is the smallest class in the AC hierarchy, and consists of all families of circuits of depth O and polynomial size, with unlimited-fanin AND gates and OR gates....

-reducible to Graph Canonization. However it is still an open question whether the two problems are polynomial time equivalent.

While existence of (deterministic) polynomial algorithms for Graph Isomorphism is still an open problem in the computational complexity theory
Computational complexity theory
Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...

, in 1977 Laszlo Babai
László Babai
László Babai is a Hungarian professor of mathematics and computer science at the University of Chicago. His research focuses on computational complexity theory, algorithms, combinatorics, and finite groups, with an emphasis on the interactions between these fields...

 reported that a simple vertex classification algorithm after only two refinement steps produces a canonical labeling of an n-vertex random graph with probability 1 − exp(−O(n)). Small modifications and added depth-first search
Depth-first search
Depth-first search is an algorithm for traversing or searching a tree, tree structure, or graph. One starts at the root and explores as far as possible along each branch before backtracking....

 step produce canonical labeling of all graphs in linear average time. This result sheds some light on the fact why many reported graph isomorphism algorithms behave well in practice. This was an important breakthrough in probabilistic complexity theory which became widely known in its manuscript form and which was still cited as an "unpublished manuscript" long after it was reported at a symposium.

The computation of the lexicographically smallest graph is NP-Hard
NP-hard
NP-hard , in computational complexity theory, is a class of problems that are, informally, "at least as hard as the hardest problems in NP". A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial time Turing-reducible to H...

.

Applications

Graph canonization is the essence of many graph isomorphism algorithms.

A common application of graph canonization is in graphical data mining
Data mining
Data mining , a relatively young and interdisciplinary field of computer science is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems...

, in particular in chemical database
Chemical database
A chemical database is a database specifically designed to store chemical information. This information is about chemical and crystal structures, spectra, reactions and syntheses, and thermophysical data.- Chemical structures :...

 applications.

A number of identifier
Identifier
An identifier is a name that identifies either a unique object or a unique class of objects, where the "object" or class may be an idea, physical [countable] object , or physical [noncountable] substance...

s for chemical substance
Chemical substance
In chemistry, a chemical substance is a form of matter that has constant chemical composition and characteristic properties. It cannot be separated into components by physical separation methods, i.e. without breaking chemical bonds. They can be solids, liquids or gases.Chemical substances are...

s, such as SMILES and InChI, designed to provide a standard and human-readable way to encode molecular information and to facilitate the search for such information in databases and on the web, use canonization step in their computation, which is essentially the canonization of the graph which represents the molecule.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK