Combinatorics on words
Encyclopedia
Combinatorics on words is a branch of mathematics which applies combinatorics
to words and formal language
s. The study of combinatorics on words arose independently within several branches of mathematics, e.g. number theory
, group theory
and probability
. It has applications to combinatorial enumeration and fractal analysis
and appears in problems of theoretical computer science
, automata theory
, symbolic dynamics
and linguistics
. While many applications are new, the classical Chomsky–Schützenberger hierarchy of classes of formal grammar
s is perhaps the best known result in the field. The development of computerized text and string processing has led to important applications of combinatorics on words. It is involved in the core algorithms for text processing, natural language processing
, speech processing
and bioinformatics
.
The study of combinatorics of words goes back to the works of Axel Thue
on nonrepetitive sequences of symbols at the beginning of the 20th century. In the years following there were a few researchers working on words, then major work began in the 1950s when Marcel-Paul Schützenberger
studied words in his work on coding theory
, and Pyotr Novikov and Sergei Adian
studied words in connection with Burnside's problem
for groups. The study of combinatorics on words really began to develop rapidly after the publication of Lothaire's book in 1983. Lothaire is a nom de plume of a group of researchers working in this area.
Combinatorics
Combinatorics is a branch of mathematics concerning the study of finite or countable discrete structures. Aspects of combinatorics include counting the structures of a given kind and size , deciding when certain criteria can be met, and constructing and analyzing objects meeting the criteria ,...
to words and formal language
Formal language
A formal language is a set of words—that is, finite strings of letters, symbols, or tokens that are defined in the language. The set from which these letters are taken is the alphabet over which the language is defined. A formal language is often defined by means of a formal grammar...
s. The study of combinatorics on words arose independently within several branches of mathematics, e.g. number theory
Number theory
Number theory is a branch of pure mathematics devoted primarily to the study of the integers. Number theorists study prime numbers as well...
, group theory
Group theory
In mathematics and abstract algebra, group theory studies the algebraic structures known as groups.The concept of a group is central to abstract algebra: other well-known algebraic structures, such as rings, fields, and vector spaces can all be seen as groups endowed with additional operations and...
and probability
Probability
Probability is ordinarily used to describe an attitude of mind towards some proposition of whose truth we arenot certain. The proposition of interest is usually of the form "Will a specific event occur?" The attitude of mind is of the form "How certain are we that the event will occur?" The...
. It has applications to combinatorial enumeration and fractal analysis
Fractal analysis
Fractal analysis is the modelling of data by fractals.It consists of methods to assign a fractal dimension and other fractal characteristics to a signal, dataset or object which may be sound, images, molecules, networks or other data....
and appears in problems of theoretical computer science
Theoretical computer science
Theoretical computer science is a division or subset of general computer science and mathematics which focuses on more abstract or mathematical aspects of computing....
, automata theory
Automata theory
In theoretical computer science, automata theory is the study of abstract machines and the computational problems that can be solved using these machines. These abstract machines are called automata...
, symbolic dynamics
Symbolic dynamics
In mathematics, symbolic dynamics is the practice of modeling a topological or smooth dynamical system by a discrete space consisting of infinite sequences of abstract symbols, each of which corresponds to a state of the system, with the dynamics given by the shift operator...
and linguistics
Linguistics
Linguistics is the scientific study of human language. Linguistics can be broadly broken into three categories or subfields of study: language form, language meaning, and language in context....
. While many applications are new, the classical Chomsky–Schützenberger hierarchy of classes of formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
s is perhaps the best known result in the field. The development of computerized text and string processing has led to important applications of combinatorics on words. It is involved in the core algorithms for text processing, natural language processing
Natural language processing
Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....
, speech processing
Speech processing
Speech processing is the study of speech signals and the processing methods of these signals.The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signal.It is also closely tied to...
and bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...
.
The study of combinatorics of words goes back to the works of Axel Thue
Axel Thue
Axel Thue was a Norwegian mathematician, known for highly original work in diophantine approximation, and combinatorics....
on nonrepetitive sequences of symbols at the beginning of the 20th century. In the years following there were a few researchers working on words, then major work began in the 1950s when Marcel-Paul Schützenberger
Marcel-Paul Schützenberger
Marcel-Paul "Marco" Schützenberger was a French mathematician and Doctor of Medicine. His work had impact across the fields of formal language, combinatorics, and information theory...
studied words in his work on coding theory
Coding theory
Coding theory is the study of the properties of codes and their fitness for a specific application. Codes are used for data compression, cryptography, error-correction and more recently also for network coding...
, and Pyotr Novikov and Sergei Adian
Sergei Adian
Sergei Ivanovich Adian, also Adjan is one of the most prominent Soviet and Russian mathematicians. He is a professor at the Moscow State University. He is most famous for his work in group theory, especially on the Burnside problem.-Biography:...
studied words in connection with Burnside's problem
Burnside's problem
The Burnside problem, posed by William Burnside in 1902 and one of the oldest and most influential questions in group theory, asks whether a finitely generated group in which every element has finite order must necessarily be a finite group...
for groups. The study of combinatorics on words really began to develop rapidly after the publication of Lothaire's book in 1983. Lothaire is a nom de plume of a group of researchers working in this area.
See also
- Fibonacci wordFibonacci wordthumb|350px|Characterization by a [[cutting sequence]] with a line of slope 1/\varphi or \varphi-1, with \varphi the [[golden ratio]].A Fibonacci word is a specific sequence of binary digits...
- Kolakoski sequenceKolakoski sequenceIn mathematics, the Kolakoski sequence is an infinite list that begins withThe rules for generating this sequence are as follows: 1) The only numbers allowed are 1 and 2 2) Each number tells us how many numbers to append to the sequence...
- Lyndon wordLyndon wordIn mathematics, in the areas of combinatorics and computer science, a Lyndon word is a string that is strictly smaller in lexicographic order than all of its rotations...
- M. LothaireM. LothaireM. Lothaire is the pseudonym of a group of mathematicians, many of whom were students of Marcel-Paul Schützenberger. The name is used as the author of several of their joint books about combinatorics on words...
- Necklace (combinatorics)Necklace (combinatorics)In combinatorics, a k-ary necklace of length n is an equivalence class of n-character strings over an alphabet of size k, taking all rotations as equivalent...
- Partial word
- Shift spaceShift spaceIn symbolic dynamics and related branches of mathematics, a shift space or subshift is a set of infinite words representing the evolution of a discrete system. In fact, shift spaces and symbolic dynamical systems are often considered synonyms....
- Squarefree word
- Sturmian wordSturmian wordIn mathematics, a Sturmian word, named after Jacques Charles François Sturm, is a certain kind of infinite word.- Definition :A word is a infinite sequence of symbols drawn from a finite alphabet. Call any finite contiguous subsequence of a word a factor...
- Thue-Morse sequenceThue-Morse sequenceIn mathematics, the Thue–Morse sequence, or Prouhet–Thue–Morse sequence, is a binary sequence that begins:Any other ordered pair of symbols may be used instead of 0 and 1; the logical structure of the Thue–Morse sequence does not depend on the symbols that are used to represent it.- Direct...
- Word (group theory)
- Word metric
- Word problem (computability)
- Word problem (mathematics)Word problem (mathematics)In mathematics and computer science, a word problem for a set S with respect to a system of finite encodings of its elements is the algorithmic problem of deciding whether two given representatives represent the same element of the set...
- Word problem for groupsWord problem for groupsIn mathematics, especially in the area of abstract algebra known as combinatorial group theory, the word problem for a finitely generated group G is the algorithmic problem of deciding whether two words in the generators represent the same element...
- Young-Fibonacci lattice