Formal language
Encyclopedia
A formal language is a set of words—that is, finite strings
of letters, symbols, or tokens
that are defined in the language. The set from which these letters are taken is the alphabet over which the language is defined. A formal language is often defined by means of a formal grammar
(also called its formation rule
s). Words that belong to a formal language are sometimes called wellformed words or wellformed formula
s.
Formal languages are studied in computer science
and linguistics
. The field of formal language theory studies the purely syntactical aspects of such languages—that is, their internal structural patterns.
Formal languages are often used as the basis for richer constructs endowed with semantics
. In computer science they are used, among other things, for the precise definition of data formats and the syntax of programming language
s. Formal languages play a crucial role in the development of compiler
s, typically produced by means of a compiler compiler, which may be a single program or may be separated in tools like lexical analyzer generators (e.g.
), and parser generators (e.g.
and in foundations of mathematics
to represent the syntax of formal theories. Logical systems can be seen as a formal language with additional constructs, like proof calculi, which define a consequence relation. Alfred Tarski
's definition of truth, in terms of a Tschema
for firstorder logic
, is an example of fully interpreted formal language: all its sentences have meanings that make them either true or false.
In less technical contexts, the term artificial language is sometimes used to denote a formal language, although the latter term can also refer to artificially constructed natural language
s.
in his Begriffsschrift
(1879), literally meaning "concept writing", and which Frege described as a "formal language of pure thought."
in the usual sense of the word, or more generally a character set such as ASCII
. Alphabets can also be infinite; e.g. firstorder logic
is often expressed using an alphabet which, besides symbols such as ∧, ¬, ∀ and parentheses, contains infinitely many elements x_{0}, x_{1}, x_{2}, … that play the role of variables. The elements of an alphabet are called its letters.
A word over an alphabet can be any finite sequence, or string
, of letters. The set of all words over an alphabet Σ is usually denoted by Σ^{*} (using the Kleene star
). For any alphabet there is only one word of length 0, the empty word, which is often denoted by e, ε or λ. By concatenation
one can combine two words to form a new word, whose length is the sum of the lengths of the original words. The result of concatenating a word with the empty word is the original word.
In some applications, especially in logic
, the alphabet is also known as the vocabulary and words are known as formulas or sentences; this breaks the letter/word metaphor and replaces it by a word/sentence metaphor.
of Σ^{*}, that is, a set of words over that alphabet.
In computer science and mathematics, which do not usually deal with natural language
s, the adjective "formal" is often omitted as redundant.
While formal language theory usually concerns itself with formal languages that are described by some syntactical rules, the actual definition of the concept "formal language" is only as above: a (possibly infinite) set of finitelength strings, no more nor less. In practice, there are many languages that can be described by rules, such as regular language
s or contextfree language
s. The notion of a formal grammar
may be closer to the intuitive concept of a "language," one described by syntactic rules. By an abuse of the definition, a particular formal language is often thought of as being equipped with a formal grammar that describes it.
Under these rules, the string "23+4=555" is in L, but the string "=234=+" is not. This formal language expresses natural number
s, wellformed addition statements, and wellformed addition equalities, but it expresses only what they look like (their syntax
), not what they mean (semantics
). For instance, nowhere in these rules is there any indication that 0 means the number zero, or that + means addition.
For finite languages one can simply enumerate all wellformed words. For example, we can describe a language L as just L = {"a", "b", "ab", "cba"}.
However, even over a finite (nonempty) alphabet such as Σ = {a, b} there are infinitely many words: "a", "abb", "ababba", "aaababbbbaab", …. Therefore formal languages are typically infinite, and describing an infinite formal language is not as simple as writing L = {"a", "b", "ab", "cba"}. Here are some examples of formal languages:
Typical questions asked about such formalisms include:
Surprisingly often, the answer to these decision problems is "it cannot be done at all", or "it is extremely expensive" (with a characterization of how expensive). Therefore, formal language theory is a major application area of computability theory
and complexity theory
. Formal languages may be classified in the Chomsky hierarchy
based on the expressive power of their generative grammar as well as the complexity of their recognizing automaton
. Contextfree grammar
s and regular grammar
s provide a good compromise between expressivity and ease of parsing
, and are widely used in practical applications.
Examples: suppose L_{1} and L_{2} are languages over some common alphabet.
Such string operations
are used to investigate closure properties
of classes of languages. A class of languages is closed under a particular operation when the operation, applied to languages in the class, always produces a language in the same class again. For instance, the contextfree language
s are known to be closed under union, concatenation, and intersection with regular language
s, but not closed under intersection or complement. The theory of trios
and abstract families of languages
studies the most common closure properties of language families in their own right.
, identifies the tokens of the programming language grammar, e.g. identifier
s or keyword
s, which are themselves expressed in a simpler formal language, usually by means of regular expressions. At the most basic conceptual level, a parser, usually generated by a parser generator like
, which is used by subsequent stages of the compiler to eventually generate an executable
containing machine code
that runs directly on the hardware, or some intermediate code that requires a virtual machine
to execute.
, a formal theory is a set of sentence
s expressed in a formal language.
A formal system (also called a logical calculus, or a logical system) consists of a formal language together with a deductive apparatus (also called a deductive system). The deductive apparatus may consist of a set of transformation rules which may be interpreted as valid rules of inference or a set of axiom
s, or have both. A formal system is used to derive
one expression from one or more other expressions. Although a formal language can be identified with its formulas, a formal system cannot be likewise identified by its theorems. Two formal systems and may have all the same theorems and yet differ in some significant prooftheoretic way (a formula A may be a syntactic consequence of a formula B in one but not another for instance).
A formal proof or derivation is a finite sequence of wellformed formulas (which may be interpreted as proposition
s) each of which is an axiom or follows from the preceding formulas in the sequence by a rule of inference
. The last sentence in the sequence is a theorem of a formal system. Formal proofs are useful because their theorems can be interpreted as true propositions.
that give meaning to the elements of the language. For instance, in mathematical logic
, the set of possible formulas of a particular logic is a formal language, and an interpretation
assigns a meaning to each of the formulas—usually, a truth value.
The study of interpretations of formal languages is called formal semantics. In mathematical logic, this is often done in terms of model theory
. In model theory, the terms that occur in a formula are interpreted as mathematical structures
, and fixed compositional interpretation rules determine how the truth value of the formula can be derived from the interpretation of its terms; a model for a formula is an interpretation of terms such that the formula becomes true.
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....
of letters, symbols, or tokens
Symbol (formal)
For other uses see Symbol In logic, symbols build literal utility to illustrate ideas. A symbol is an abstraction, tokens of which may be marks or a configuration of marks which form a particular pattern...
that are defined in the language. The set from which these letters are taken is the alphabet over which the language is defined. A formal language is often defined by means of a formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
(also called its formation rule
Formation rule
In mathematical logic, formation rules are rules for describing which strings of symbols formed from the alphabet of a formal language are syntactically valid within the language. These rules only address the location and manipulation of the strings of the language. It does not describe anything...
s). Words that belong to a formal language are sometimes called wellformed words or wellformed formula
Wellformed formula
In mathematical logic, a wellformed formula, shortly wff, often simply formula, is a word which is part of a formal language...
s.
Formal languages are studied in computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
and linguistics
Linguistics
Linguistics is the scientific study of human language. Linguistics can be broadly broken into three categories or subfields of study: language form, language meaning, and language in context....
. The field of formal language theory studies the purely syntactical aspects of such languages—that is, their internal structural patterns.
Formal languages are often used as the basis for richer constructs endowed with semantics
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....
. In computer science they are used, among other things, for the precise definition of data formats and the syntax of programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
s. Formal languages play a crucial role in the development of compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...
s, typically produced by means of a compiler compiler, which may be a single program or may be separated in tools like lexical analyzer generators (e.g.
lex
Lex programming tool
Lex is a computer program that generates lexical analyzers . Lex is commonly used with the yacc parser generator. Lex, originally written by Mike Lesk and Eric Schmidt, is the standard lexical analyzer generator on many Unix systems, and a tool exhibiting its behavior is specified as part of the...
), and parser generators (e.g.
yaccYaccThe computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...
). Since formal languages alone do not have semantics, other formal constructs are needed for the formal specification of program semantics. Formal languages are also used in logicLogic
In philosophy, Logic is the formal systematic study of the principles of valid inference and correct reasoning. Logic is used in most intellectual activities, but is studied primarily in the disciplines of philosophy, mathematics, semantics, and computer science...
and in foundations of mathematics
Foundations of mathematics
Foundations of mathematics is a term sometimes used for certain fields of mathematics, such as mathematical logic, axiomatic set theory, proof theory, model theory, type theory and recursion theory...
to represent the syntax of formal theories. Logical systems can be seen as a formal language with additional constructs, like proof calculi, which define a consequence relation. Alfred Tarski
Alfred Tarski
Alfred Tarski was a Polish logician and mathematician. Educated at the University of Warsaw and a member of the LwowWarsaw School of Logic and the Warsaw School of Mathematics and philosophy, he emigrated to the USA in 1939, and taught and carried out research in mathematics at the University of...
's definition of truth, in terms of a Tschema
Tschema
The Tschema or truth schema is used to give an inductive definition of truth which lies at the heart of any realisation of Alfred Tarski's semantic theory of truth...
for firstorder logic
Firstorder logic
Firstorder logic is a formal logical system used in mathematics, philosophy, linguistics, and computer science. It goes by many names, including: firstorder predicate calculus, the lower predicate calculus, quantification theory, and predicate logic...
, is an example of fully interpreted formal language: all its sentences have meanings that make them either true or false.
In less technical contexts, the term artificial language is sometimes used to denote a formal language, although the latter term can also refer to artificially constructed natural language
Constructed language
A planned or constructed language—known colloquially as a conlang—is a language whose phonology, grammar, and/or vocabulary has been consciously devised by an individual or group, instead of having evolved naturally...
s.
History
The first formal language is thought be the one used by Gottlob FregeGottlob Frege
Friedrich Ludwig Gottlob Frege was a German mathematician, logician and philosopher. He is considered to be one of the founders of modern logic, and made major contributions to the foundations of mathematics. He is generally considered to be the father of analytic philosophy, for his writings on...
in his Begriffsschrift
Begriffsschrift
Begriffsschrift is a book on logic by Gottlob Frege, published in 1879, and the formal system set out in that book...
(1879), literally meaning "concept writing", and which Frege described as a "formal language of pure thought."
Words over an alphabet
An alphabet, in the context of formal languages can be any set, although it often makes sense to use an alphabetAlphabet
An alphabet is a standard set of letters—basic written symbols or graphemes—each of which represents a phoneme in a spoken language, either as it exists now or as it was in the past. There are other systems, such as logographies, in which each character represents a word, morpheme, or semantic...
in the usual sense of the word, or more generally a character set such as ASCII
ASCII
The American Standard Code for Information Interchange is a characterencoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
. Alphabets can also be infinite; e.g. firstorder logic
Firstorder logic
Firstorder logic is a formal logical system used in mathematics, philosophy, linguistics, and computer science. It goes by many names, including: firstorder predicate calculus, the lower predicate calculus, quantification theory, and predicate logic...
is often expressed using an alphabet which, besides symbols such as ∧, ¬, ∀ and parentheses, contains infinitely many elements x_{0}, x_{1}, x_{2}, … that play the role of variables. The elements of an alphabet are called its letters.
A word over an alphabet can be any finite sequence, or string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....
, of letters. The set of all words over an alphabet Σ is usually denoted by Σ^{*} (using the Kleene star
Kleene star
In mathematical logic and computer science, the Kleene star is a unary operation, either on sets of strings or on sets of symbols or characters. The application of the Kleene star to a set V is written as V*...
). For any alphabet there is only one word of length 0, the empty word, which is often denoted by e, ε or λ. By concatenation
Concatenation
In computer programming, string concatenation is the operation of joining two character strings endtoend. For example, the strings "snow" and "ball" may be concatenated to give "snowball"...
one can combine two words to form a new word, whose length is the sum of the lengths of the original words. The result of concatenating a word with the empty word is the original word.
In some applications, especially in logic
Logic
In philosophy, Logic is the formal systematic study of the principles of valid inference and correct reasoning. Logic is used in most intellectual activities, but is studied primarily in the disciplines of philosophy, mathematics, semantics, and computer science...
, the alphabet is also known as the vocabulary and words are known as formulas or sentences; this breaks the letter/word metaphor and replaces it by a word/sentence metaphor.
Definition
A formal language L over an alphabet Σ is a subsetSubset
In mathematics, especially in set theory, a set A is a subset of a set B if A is "contained" inside B. A and B may coincide. The relationship of one set being a subset of another is called inclusion or sometimes containment...
of Σ^{*}, that is, a set of words over that alphabet.
In computer science and mathematics, which do not usually deal with natural language
Natural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
s, the adjective "formal" is often omitted as redundant.
While formal language theory usually concerns itself with formal languages that are described by some syntactical rules, the actual definition of the concept "formal language" is only as above: a (possibly infinite) set of finitelength strings, no more nor less. In practice, there are many languages that can be described by rules, such as regular language
Regular language
In theoretical computer science and formal language theory, a regular language is a formal language that can be expressed using regular expression....
s or contextfree language
Contextfree language
In formal language theory, a contextfree language is a language generated by some contextfree grammar. The set of all contextfree languages is identical to the set of languages accepted by pushdown automata.Examples:...
s. The notion of a formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
may be closer to the intuitive concept of a "language," one described by syntactic rules. By an abuse of the definition, a particular formal language is often thought of as being equipped with a formal grammar that describes it.
Examples
The following rules describe a formal language L over the alphabet Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, =}: Every nonempty string that does not contain + or = and does not start with 0 is in L.
 The string 0 is in L.
 A string containing = is in L if and only if there is exactly one =, and it separates two valid strings in L.
 A string containing + but not = is in L if and only if every + in the string separates two valid strings in L.
 No string is in L other than those implied by the previous rules.
Under these rules, the string "23+4=555" is in L, but the string "=234=+" is not. This formal language expresses natural number
Natural number
In mathematics, the natural numbers are the ordinary whole numbers used for counting and ordering . These purposes are related to the linguistic notions of cardinal and ordinal numbers, respectively...
s, wellformed addition statements, and wellformed addition equalities, but it expresses only what they look like (their syntax
Syntax
In linguistics, syntax is the study of the principles and rules for constructing phrases and sentences in natural languages....
), not what they mean (semantics
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....
). For instance, nowhere in these rules is there any indication that 0 means the number zero, or that + means addition.
For finite languages one can simply enumerate all wellformed words. For example, we can describe a language L as just L = {"a", "b", "ab", "cba"}.
However, even over a finite (nonempty) alphabet such as Σ = {a, b} there are infinitely many words: "a", "abb", "ababba", "aaababbbbaab", …. Therefore formal languages are typically infinite, and describing an infinite formal language is not as simple as writing L = {"a", "b", "ab", "cba"}. Here are some examples of formal languages:
 L = Σ^{*}, the set of all words over Σ;
 L = {a}^{*} = {a^{n}}, where n ranges over the natural numbers and a^{n} means "a" repeated n times (this is the set of words consisting only of the symbol "a");
 the set of syntactically correct programs in a given programming language (the syntax of which is usually defined by a contextfree grammarContextfree grammarIn formal language theory, a contextfree grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
);  the set of inputs upon which a certain Turing machineTuring machineA Turing machine is a theoretical device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a...
halts; or  the set of maximal strings of alphanumericAlphanumericAlphanumeric is a combination of alphabetic and numeric characters, and is used to describe the collection of Latin letters and Arabic digits or a text constructed from this collection. There are either 36 or 62 alphanumeric characters. The alphanumeric character set consists of the numbers 0 to...
ASCIIASCIIThe American Standard Code for Information Interchange is a characterencoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
characters on this line, (i.e., the set {"the", "set", "of", "maximal", "strings", "alphanumeric", "ASCII", "characters", "on", "this", "line", "i", "e"}).
Languagespecification formalisms
Formal language theory rarely concerns itself with particular languages (except as examples), but is mainly concerned with the study of various types of formalisms to describe languages. For instance, a language can be given as those strings generated by some formal grammarFormal grammarA formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
;  those strings described or matched by a particular regular expressionRegular expressionIn computing, a regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp"...
;  those strings accepted by some automatonAutomata theoryIn theoretical computer science, automata theory is the study of abstract machines and the computational problems that can be solved using these machines. These abstract machines are called automata...
, such as a Turing machineTuring machineA Turing machine is a theoretical device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a...
or finite state automatonFinite state machineA finitestate machine or finitestate automaton , or simply a state machine, is a mathematical model used to design computer programs and digital logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states...
;  those strings for which some decision procedureDecision problemIn computability theory and computational complexity theory, a decision problem is a question in some formal system with a yesorno answer, depending on the values of some input parameters. For example, the problem "given two numbers x and y, does x evenly divide y?" is a decision problem...
(an algorithmAlgorithmIn mathematics and computer science, an algorithm is an effective method expressed as a finite list of welldefined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
that asks a sequence of related YES/NO questions) produces the answer YES.
Typical questions asked about such formalisms include:
 What is their expressive power? (Can formalism X describe every language that formalism Y can describe? Can it describe other languages?)
 What is their recognizability? (How difficult is it to decide whether a given word belongs to a language described by formalism X?)
 What is their comparability? (How difficult is it to decide whether two languages, one described in formalism X and one in formalism Y, or in X again, are actually the same language?).
Surprisingly often, the answer to these decision problems is "it cannot be done at all", or "it is extremely expensive" (with a characterization of how expensive). Therefore, formal language theory is a major application area of computability theory
Computability theory (computer science)
Computability is the ability to solve a problem in an effective manner. It is a key topic of the field of computability theory within mathematical logic and the theory of computation within computer science...
and complexity theory
Computational complexity theory
Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...
. Formal languages may be classified in the Chomsky hierarchy
Chomsky hierarchy
Within the field of computer science, specifically in the area of formal languages, the Chomsky hierarchy is a containment hierarchy of classes of formal grammars....
based on the expressive power of their generative grammar as well as the complexity of their recognizing automaton
Automata theory
In theoretical computer science, automata theory is the study of abstract machines and the computational problems that can be solved using these machines. These abstract machines are called automata...
. Contextfree grammar
Contextfree grammar
In formal language theory, a contextfree grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
s and regular grammar
Regular grammar
In theoretical computer science, a regular grammar is a formal grammar that describes a regular language. Strictly regular grammars :A right regular grammar is a formal grammar such that all the production rules in P are of one of the following forms:# B → a  where B is a nonterminal in N and...
s provide a good compromise between expressivity and ease of parsing
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...
, and are widely used in practical applications.
Operations on languages
Certain operations on languages are common. This includes the standard set operations, such as union, intersection, and complement. Another class of operation is the elementwise application of string operations.Examples: suppose L_{1} and L_{2} are languages over some common alphabet.
 The concatenationConcatenationIn computer programming, string concatenation is the operation of joining two character strings endtoend. For example, the strings "snow" and "ball" may be concatenated to give "snowball"...
L_{1}L_{2} consists of all strings of the form vw where v is a string from L_{1} and w is a string from L_{2}.  The intersection L_{1} ∩ L_{2} of L_{1} and L_{2} consists of all strings which are contained in both languages
 The complement ¬L of a language with respect to a given alphabet consists of all strings over the alphabet that are not in the language.
 The Kleene starKleene starIn mathematical logic and computer science, the Kleene star is a unary operation, either on sets of strings or on sets of symbols or characters. The application of the Kleene star to a set V is written as V*...
: the language consisting of all words that are concatenations of 0 or more words in the original language;  Reversal:
 Let e be the empty word, then e^{R} = e, and
 for each nonempty word w = x_{1}…x_{n} over some alphabet, let w^{R} = x_{n}…x_{1},
 then for a formal language L, L^{R} = {w^{R}  w ∈ L}.
 String homomorphism
Such string operations
String operations
In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used on computer programming, and some commonly used functions in the theoretical realm are rarely used when programming...
are used to investigate closure properties
Closure (mathematics)
In mathematics, a set is said to be closed under some operation if performance of that operation on members of the set always produces a unique member of the same set. For example, the real numbers are closed under subtraction, but the natural numbers are not: 3 and 8 are both natural numbers, but...
of classes of languages. A class of languages is closed under a particular operation when the operation, applied to languages in the class, always produces a language in the same class again. For instance, the contextfree language
Contextfree language
In formal language theory, a contextfree language is a language generated by some contextfree grammar. The set of all contextfree languages is identical to the set of languages accepted by pushdown automata.Examples:...
s are known to be closed under union, concatenation, and intersection with regular language
Regular language
In theoretical computer science and formal language theory, a regular language is a formal language that can be expressed using regular expression....
s, but not closed under intersection or complement. The theory of trios
Cone (formal languages)
In formal language theory, a cone is a set of formal languages that has some desirable closure properties enjoyed by some wellknown sets of languages, in particular by the families of regular languages, contextfree languages and the recursive languages...
and abstract families of languages
Abstract family of languages
In computer science, in particular in the field of formal language theory,the term abstract family of languages refers to an abstract mathematical notion generalizing characteristics common to the regular languages, the contextfree languages and the recursively enumerable languages, and other...
studies the most common closure properties of language families in their own right.
Operation  Regular  DCFL Deterministic contextfree language A deterministic contextfree language is a formal language which is defined by a deterministic contextfree grammar. The set of deterministic contextfree languages is called DCFL and is identical to the set of languages accepted by a deterministic pushdown automaton.The set of deterministic... 
CFL  IND Indexed language Indexed languages are a class of formal languages discovered by Alfred Aho; they are described by indexed grammars and can be recognized by nested stack automatons.... 
CSL  recursive Recursive language In mathematics, logic and computer science, a formal language is called recursive if it is a recursive subset of the set of all possible finite sequences over the alphabet of the language... 
RE Recursively enumerable language In mathematics, logic and computer science, a recursively enumerable language is a type of formal language which is also called partially decidable or Turingacceptable. It is known as a type0 language in the Chomsky hierarchy of formal languages... 


Union Union (set theory) In set theory, the union of a collection of sets is the set of all distinct elements in the collection. The union of a collection of sets S_1, S_2, S_3, \dots , S_n\,\! gives a set S_1 \cup S_2 \cup S_3 \cup \dots \cup S_n. Definition :... 

Intersection Intersection (set theory) In mathematics, the intersection of two sets A and B is the set that contains all elements of A that also belong to B , but no other elements.... 

Complement Complement (set theory) In set theory, a complement of a set A refers to things not in , A. The relative complement of A with respect to a set B, is the set of elements in B but not in A... 

Concatenation Concatenation In computer programming, string concatenation is the operation of joining two character strings endtoend. For example, the strings "snow" and "ball" may be concatenated to give "snowball"... 

Kleene star  
Homomorphism  
efree Homomorphism  
Substitution  
Inverse Homomorphism  
Reverse  
Intersection with a regular language Regular language In theoretical computer science and formal language theory, a regular language is a formal language that can be expressed using regular expression.... 
Programming languages
A compiler usually has two distinct components. A lexical analyzer, generated by a tool likelex
Lex programming tool
Lex is a computer program that generates lexical analyzers . Lex is commonly used with the yacc parser generator. Lex, originally written by Mike Lesk and Eric Schmidt, is the standard lexical analyzer generator on many Unix systems, and a tool exhibiting its behavior is specified as part of the...
, identifies the tokens of the programming language grammar, e.g. identifier
Identifier
An identifier is a name that identifies either a unique object or a unique class of objects, where the "object" or class may be an idea, physical [countable] object , or physical [noncountable] substance...
s or keyword
Keyword (computer programming)
In computer programming, a keyword is a word or identifier that has a particular meaning to the programming language. The meaning of keywords — and, indeed, the meaning of the notion of keyword — differs widely from language to language....
s, which are themselves expressed in a simpler formal language, usually by means of regular expressions. At the most basic conceptual level, a parser, usually generated by a parser generator like
yaccYaccThe computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...
, attempts to decide if the source program is valid, that is if it belongs to the programming language for which the compiler was built. Of course, compilers do more than just parse the source code—they usually translate it into some executable format. Because of this, a parser usually outputs more than a yes/no answer, typically an abstract syntax treeAbstract syntax tree
In computer science, an abstract syntax tree , or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is 'abstract' in the sense that it...
, which is used by subsequent stages of the compiler to eventually generate an executable
Executable
In computing, an executable file causes a computer "to perform indicated tasks according to encoded instructions," as opposed to a data file that must be parsed by a program to be meaningful. These instructions are traditionally machine code instructions for a physical CPU...
containing machine code
Machine code
Machine code or machine language is a system of impartible instructions executed directly by a computer's central processing unit. Each instruction performs a very specific task, typically either an operation on a unit of data Machine code or machine language is a system of impartible instructions...
that runs directly on the hardware, or some intermediate code that requires a virtual machine
Virtual machine
A virtual machine is a "completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with either software emulation or hardware virtualization or both together.VM Definitions:A virtual machine is a software...
to execute.
Formal theories, systems and proofs
In mathematical logicMathematical logic
Mathematical logic is a subfield of mathematics with close connections to foundations of mathematics, theoretical computer science and philosophical logic. The field includes both the mathematical study of logic and the applications of formal logic to other areas of mathematics...
, a formal theory is a set of sentence
Sentence (mathematical logic)
In mathematical logic, a sentence of a predicate logic is a booleanvalued well formed formula with no free variables. A sentence can be viewed as expressing a proposition, something that may be true or false...
s expressed in a formal language.
A formal system (also called a logical calculus, or a logical system) consists of a formal language together with a deductive apparatus (also called a deductive system). The deductive apparatus may consist of a set of transformation rules which may be interpreted as valid rules of inference or a set of axiom
Axiom
In traditional logic, an axiom or postulate is a proposition that is not proven or demonstrated but considered either to be selfevident or to define and delimit the realm of analysis. In other words, an axiom is a logical statement that is assumed to be true...
s, or have both. A formal system is used to derive
Proof theory
Proof theory is a branch of mathematical logic that represents proofs as formal mathematical objects, facilitating their analysis by mathematical techniques. Proofs are typically presented as inductivelydefined data structures such as plain lists, boxed lists, or trees, which are constructed...
one expression from one or more other expressions. Although a formal language can be identified with its formulas, a formal system cannot be likewise identified by its theorems. Two formal systems and may have all the same theorems and yet differ in some significant prooftheoretic way (a formula A may be a syntactic consequence of a formula B in one but not another for instance).
A formal proof or derivation is a finite sequence of wellformed formulas (which may be interpreted as proposition
Proposition
In logic and philosophy, the term proposition refers to either the "content" or "meaning" of a meaningful declarative sentence or the pattern of symbols, marks, or sounds that make up a meaningful declarative sentence...
s) each of which is an axiom or follows from the preceding formulas in the sequence by a rule of inference
Rule of inference
In logic, a rule of inference, inference rule, or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises, analyses their syntax, and returns a conclusion...
. The last sentence in the sequence is a theorem of a formal system. Formal proofs are useful because their theorems can be interpreted as true propositions.
Interpretations and models
Formal languages are entirely syntactic in nature but may be given semanticsSemantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....
that give meaning to the elements of the language. For instance, in mathematical logic
Logic
In philosophy, Logic is the formal systematic study of the principles of valid inference and correct reasoning. Logic is used in most intellectual activities, but is studied primarily in the disciplines of philosophy, mathematics, semantics, and computer science...
, the set of possible formulas of a particular logic is a formal language, and an interpretation
Interpretation (logic)
An interpretation is an assignment of meaning to the symbols of a formal language. Many formal languages used in mathematics, logic, and theoretical computer science are defined in solely syntactic terms, and as such do not have any meaning until they are given some interpretation...
assigns a meaning to each of the formulas—usually, a truth value.
The study of interpretations of formal languages is called formal semantics. In mathematical logic, this is often done in terms of model theory
Model theory
In mathematics, model theory is the study of mathematical structures using tools from mathematical logic....
. In model theory, the terms that occur in a formula are interpreted as mathematical structures
Structure (mathematical logic)
In universal algebra and in model theory, a structure consists of a set along with a collection of finitary operations and relations which are defined on it....
, and fixed compositional interpretation rules determine how the truth value of the formula can be derived from the interpretation of its terms; a model for a formula is an interpretation of terms such that the formula becomes true.
See also
 Combinatorics on wordsCombinatorics on wordsCombinatorics on words is a branch of mathematics which applies combinatorics to words and formal languages. The study of combinatorics on words arose independently within several branches of mathematics, e.g. number theory, group theory and probability...
 Grammar framework
 Formal method
 Mathematical notationMathematical notationMathematical notation is a system of symbolic representations of mathematical objects and ideas. Mathematical notations are used in mathematics, the physical sciences, engineering, and economics...
 Rational languageRational languageIn computer science rational languages are a category of formal languages. They are defined as the set of strings for which some rational series assigns nonzero values . When the semiring of the rational series is boolean, the associated rational languages are simply the regular languages....
General references
 A. G. Hamilton, Logic for Mathematicians, Cambridge University Press, 1978, ISBN 0 521 21838 1.
 Seymour GinsburgSeymour GinsburgSeymour Ginsburg was a pioneer of automata theory, formal language theory, anddatabase theory, in particular; and computer science, in general...
, Algebraic and automata theoretic properties of formal languages, NorthHolland, 1975, ISBN 0 7204 2506 9.  Michael A. Harrison, Introduction to Formal Language Theory, AddisonWesley, 1978.
 John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and ComputationIntroduction to Automata Theory, Languages, and ComputationIntroduction to Automata Theory, Languages, and Computation, is an influential computer science textbook by John Hopcroft and Jeffrey Ullman on formal languages and the theory of computation. Nickname :...
, AddisonWesley Publishing, Reading Massachusetts, 1979. ISBN 0201029880X.  Grzegorz Rozenberg, Arto Salomaa, Handbook of Formal Languages: Volume IIII, Springer, 1997, ISBN 3 540 61486 9.
 Patrick Suppes, Introduction to Logic, D. Van Nostrand, 1957, ISBN 0 442 08072 7.
External links
 University of MarylandUniversity of Maryland, BaltimoreUniversity of Maryland, Baltimore, was founded in 1807. It comprises some of the oldest professional schools in the nation and world. It is the original campus of the University System of Maryland. Located on 60 acres in downtown Baltimore, Maryland, it is part of the University System of Maryland...
, Formal Language Definitions  James Power, "Notes on Formal Language Theory and Parsing", 29 November 2002.
 Drafts of some chapters in the "Handbook of Formal Language Theory", Vol. 13, G. Rozenberg and A. Salomaa (eds.), Springer Verlag, (1997):t
 Alexandru Mateescu and Arto Salomaa, "Preface" in Vol.1, pp. vviii, and "Formal Languages: An Introduction and a Synopsis", Chapter 1 in Vol. 1, pp.139
 Sheng Yu, "Regular Languages", Chapter 2 in Vol. 1
 JeanMichel Autebert, Jean Berstel, Luc Boasson, "ContextFree Languages and PushDown Automata", Chapter 3 in Vol. 1
 Christian Choffrut and Juhani Karhumäki, "Combinatorics of Words", Chapter 6 in Vol. 1
 Tero Harju and Juhani Karhumäki, "Morphisms", Chapter 7 in Vol. 1, pp. 439  510
 JeanEric Pin, "Syntactic semigroups", Chapter 10 in Vol. 1, pp. 679746
 M. Crochemore and C. Hancart, "Automata for matching patterns", Chapter 9 in Vol. 2
 Dora Giammarresi, Antonio Restivo, "Twodimensional Languages", Chapter 4 in Vol. 3, pp. 215  267