Ambiguous grammar
Encyclopedia
In computer science
, a context-free grammar
is said to be an ambiguous grammar if there exists a string
that can be generated by the grammar in more than one way (i.e., the string admits more than one parse tree
or, equivalently, more than one leftmost derivation). A context-free language
is inherently ambiguous if all context-free grammars generating that language are ambiguous.
Some programming language
s have ambiguous grammars; in this case, semantic information is needed to select the intended parse tree of an ambiguous construct. For example, in C
the following:
can be interpreted as either:
To correctly choose between the two possible interpretations, a compiler
must consult its symbol table
to find out whether
is ambiguous since there are two leftmost derivations for the string a + a + a:
As another example, the grammar is ambiguous since there are two parse trees for the string a + a − a:
The language that it generates, however, is not inherently ambiguous; the following is a non-ambiguous grammar generating the same language:
can exist to determine the ambiguity of a grammar because the undecidable Post correspondence problem
can be encoded as an ambiguity problem. At least, there are tools implementing some semi-decision procedure for detecting ambiguity of context-free grammars, see e.g. .
There is an obvious difficulty in parsing an ambiguous grammar by a deterministic parser (see deterministic context-free grammar
) but nondeterministic parsing imposes a great efficiency penalty. Most constructs of interest to parsing can be recognized by unambiguous grammars. Some ambiguous grammars can be converted into unambiguous grammars, but no general procedure for doing this is possible just as no algorithm exists for detecting ambiguous grammars. Compiler generators such as YACC
include features for disambiguating some kinds of ambiguity, such as by using the precedence and associativity constraints.
in an MIT research report and later a journal version ).
While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of with . This set is context-free, since the union of two context-free languages is always context-free. But give a proof that there is no way to unambiguously parse strings in the (non-context-free) subset which is the intersection of these two languages.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
is said to be an ambiguous grammar if there exists a string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....
that can be generated by the grammar in more than one way (i.e., the string admits more than one parse tree
Parse tree
A concrete syntax tree or parse tree or parsing treeis an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the...
or, equivalently, more than one leftmost derivation). A context-free language
Context-free language
In formal language theory, a context-free language is a language generated by some context-free grammar. The set of all context-free languages is identical to the set of languages accepted by pushdown automata.-Examples:...
is inherently ambiguous if all context-free grammars generating that language are ambiguous.
Some programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
s have ambiguous grammars; in this case, semantic information is needed to select the intended parse tree of an ambiguous construct. For example, in C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
the following:
can be interpreted as either:
- the declaration of an identifier named
y
of type pointer-to-x
, or - an expression in which
x
is multiplied byy
and then the result is discarded.
To correctly choose between the two possible interpretations, a compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...
must consult its symbol table
Symbol table
In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and...
to find out whether
x
has been declared as a typedef name that is visible at this point.Example
The context free grammar- A → A + A | A − A | a
is ambiguous since there are two leftmost derivations for the string a + a + a:
A | → A + A | A | → A + A | ||
→ a + A | → A + A + A (First A is replaced by A+A. Replacement of the second A would yield a similar derivation) | ||||
→ a + A + A | → a + A + A | ||||
→ a + a + A | → a + a + A | ||||
→ a + a + a | → a + a + a |
As another example, the grammar is ambiguous since there are two parse trees for the string a + a − a:
The language that it generates, however, is not inherently ambiguous; the following is a non-ambiguous grammar generating the same language:
- A → A + a | A − a | a
Recognizing ambiguous grammars
The general question of whether a grammar is not ambiguous is undecidable. No algorithmAlgorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
can exist to determine the ambiguity of a grammar because the undecidable Post correspondence problem
Post correspondence problem
The Post correspondence problem is an undecidable decision problem that was introduced by Emil Post in 1946. Because it is simpler than the halting problem and the Entscheidungsproblem it is often used in proofs of undecidability....
can be encoded as an ambiguity problem. At least, there are tools implementing some semi-decision procedure for detecting ambiguity of context-free grammars, see e.g. .
There is an obvious difficulty in parsing an ambiguous grammar by a deterministic parser (see deterministic context-free grammar
Deterministic context-free grammar
In formal grammar theory, the deterministic context-free grammars are a proper subset of the context-free grammars. The deterministic context-free grammars are those a deterministic pushdown automaton can recognize...
) but nondeterministic parsing imposes a great efficiency penalty. Most constructs of interest to parsing can be recognized by unambiguous grammars. Some ambiguous grammars can be converted into unambiguous grammars, but no general procedure for doing this is possible just as no algorithm exists for detecting ambiguous grammars. Compiler generators such as YACC
Yacc
The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...
include features for disambiguating some kinds of ambiguity, such as by using the precedence and associativity constraints.
Inherently ambiguous languages
Inherent ambiguity was first shown to exist in 1961 by Rohit ParikhRohit Jivanlal Parikh
Rohit Jivanlal Parikh , is a mathematician, logician, and philosopher who has worked in many areas in traditional logic, including recursion theory and proof theory...
in an MIT research report and later a journal version ).
While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of with . This set is context-free, since the union of two context-free languages is always context-free. But give a proof that there is no way to unambiguously parse strings in the (non-context-free) subset which is the intersection of these two languages.
External links
- dk.brics.grammar - a grammar ambiguity analyzer.
- CFGAnalyzer - tool for analyzing context-free grammars with respect to language universality, ambiguity, and similar properties.