Left recursion
Encyclopedia
In computer science
, left recursion is a special case of recursion
.
In terms of context-free grammar
, a non-terminal
where and are sequences of nonterminals and terminals, and doesn't start with . For example, the rule
is immediately left-recursive. The recursive descent parser
for this rule might look like:
and a recursive descent parser would fall into infinite recursion when trying to parse a grammar which contains this rule.
possibly giving the derivation
More generally, for the nonterminals , indirect left recursion can be defined as being of the form:
where are sequences of nonterminals and terminals.
that contains left recursion cannot be parsed by a LL(k)-parser
or other naive recursive descent parser
unless it is converted to a weakly equivalent
right-recursive form. In contrast, left recursion is preferred for LALR parsers because it results in lower stack usage than right recursion. However, more sophisticated top-down parsers can implement general context-free grammar
s by use of curtailment. In 2006, Frost and Hafiz describe an algorithm which accommodates ambiguous grammar
s with direct left-recursive production rules. That algorithm was extended to a complete parsing
algorithm to accommodate indirect as well as direct left-recursion in polynomial
time, and to generate compact polynomial-size representations of the potentially-exponential number of parse trees for highly-ambiguous grammars by Frost, Hafiz and Callaghan in 2007. The authors then implemented the algorithm as a set of parser combinator
s written in the Haskell
programming language.
For each rule of the form
where:
replace the A-production by the production:
And create a new nonterminal
This newly created symbol is often called the "tail", or the "rest".
As an example, consider the rule
This could be rewritten to avoid left recursion as
The last rule happens to be equivalent to the slightly shorter form
Arrange the nonterminals in some (any) fixed order .
Example :
We start out with a grammar :
After having applied standard transformations to remove left-recursion, we have the following grammar :
Parsing the string 'a + a + a' with the first grammar in an LALR parser (which can recognize left-recursive grammars) would have resulted in the parse tree:
Expr
/ | \
Expr + Term
/ | \ \
Expr + Term Factor
| | |
Term Factor Int
| |
Factor Int
|
Int
This parse tree grows to the left, indicating that the '+' operator is left associative, representing (a + a) + a.
But now that we've changed the grammar, our parse tree looks like this :
Expr ---
/ \
Term Expr' --
| / | \
Factor + Term Expr' ------
| | | \ \
Int Factor + Term Expr'
| | |
Int Factor
|
Int
We can see that the tree grows to the right, representing a + ( a + a). We have changed the associativity of our operator '+', it is now right-associative. While this isn't a problem for the associativity of addition, it would have a significantly different value if this were subtraction.
The problem is that normal arithmetic requires left associativity. Several solutions are: (a) rewrite the grammar to be left recursive, or (b) rewrite the grammar with more nonterminals to force the correct precedence/associativity, or (c) if using YACC
or Bison
, there are operator declarations, %left, %right and %nonassoc, which tell the parser generator which associativity to force.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, left recursion is a special case of recursion
Recursion
Recursion is the process of repeating items in a self-similar way. For instance, when the surfaces of two mirrors are exactly parallel with each other the nested images that occur are a form of infinite recursion. The term has a variety of meanings specific to a variety of disciplines ranging from...
.
In terms of context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
, a non-terminal
r
is left-recursive if the left-most symbol in any of r
’s ‘alternatives’ either immediately (direct left-recursive) or through some other non-terminal definitions (indirect/hidden left-recursive) rewrites to r
again.Definition
"A grammar is left-recursive if we can find some non-terminal A which will eventually derive a sentential form with itself as the left-symbol."Immediate left recursion
Immediate left recursion occurs in rules of the formwhere and are sequences of nonterminals and terminals, and doesn't start with . For example, the rule
is immediately left-recursive. The recursive descent parser
Recursive descent parser
A recursive descent parser is a top-down parser built from a set of mutually-recursive procedures where each such procedure usually implements one of the production rules of the grammar...
for this rule might look like:
and a recursive descent parser would fall into infinite recursion when trying to parse a grammar which contains this rule.
Indirect left recursion
Indirect left recursion in its simplest form could be defined as:possibly giving the derivation
More generally, for the nonterminals , indirect left recursion can be defined as being of the form:
where are sequences of nonterminals and terminals.
Accommodating left recursion in top-down parsing
A formal grammarFormal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
that contains left recursion cannot be parsed by a LL(k)-parser
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
or other naive recursive descent parser
Recursive descent parser
A recursive descent parser is a top-down parser built from a set of mutually-recursive procedures where each such procedure usually implements one of the production rules of the grammar...
unless it is converted to a weakly equivalent
Weak equivalence
-Mathematics:In mathematics, a weak equivalence is a notion from homotopy theory which in some sense identifies objects that have the same basic "shape"...
right-recursive form. In contrast, left recursion is preferred for LALR parsers because it results in lower stack usage than right recursion. However, more sophisticated top-down parsers can implement general context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
s by use of curtailment. In 2006, Frost and Hafiz describe an algorithm which accommodates ambiguous grammar
Ambiguous grammar
In computer science, a context-free grammar is said to be an ambiguous grammar if there exists a string that can be generated by the grammar in more than one way...
s with direct left-recursive production rules. That algorithm was extended to a complete parsing
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...
algorithm to accommodate indirect as well as direct left-recursion in polynomial
Polynomial
In mathematics, a polynomial is an expression of finite length constructed from variables and constants, using only the operations of addition, subtraction, multiplication, and non-negative integer exponents...
time, and to generate compact polynomial-size representations of the potentially-exponential number of parse trees for highly-ambiguous grammars by Frost, Hafiz and Callaghan in 2007. The authors then implemented the algorithm as a set of parser combinator
Parser Combinator
In functional programming, a parser combinator is a higher-order function which accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of...
s written in the Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...
programming language.
Removing immediate left recursion
The general algorithm to remove immediate left recursion follows. Several improvements to this method have been made, including the ones described in "Removing Left Recursion from Context-Free Grammars", written by Robert C. Moore.For each rule of the form
where:
- A is a left-recursive nonterminal
- is a sequence of nonterminals and terminals that is not null ()
- is a sequence of nonterminals and terminals that does not start with A.
replace the A-production by the production:
And create a new nonterminal
This newly created symbol is often called the "tail", or the "rest".
As an example, consider the rule
This could be rewritten to avoid left recursion as
The last rule happens to be equivalent to the slightly shorter form
Removing indirect left recursion
If the grammar has no -productions (no productions of the form ) and is not cyclic (no derivations of the form for any nonterminal A), this general algorithm may be applied to remove indirect left recursion :Arrange the nonterminals in some (any) fixed order .
- for i = 1 to n {
- for j = 1 to i – 1 {
-
- let the current productions be
-
- replace each production by
-
- }
- remove direct left recursion for
- for j = 1 to i – 1 {
- }
Pitfalls
The above transformations remove left-recursion by creating a right-recursive grammar; but this changes the associativity of our rules. Left recursion makes left associativity; right recursion makes right associativity.Example :
We start out with a grammar :
After having applied standard transformations to remove left-recursion, we have the following grammar :
Parsing the string 'a + a + a' with the first grammar in an LALR parser (which can recognize left-recursive grammars) would have resulted in the parse tree:
Expr
/ | \
Expr + Term
/ | \ \
Expr + Term Factor
| | |
Term Factor Int
| |
Factor Int
|
Int
This parse tree grows to the left, indicating that the '+' operator is left associative, representing (a + a) + a.
But now that we've changed the grammar, our parse tree looks like this :
Expr ---
/ \
Term Expr' --
| / | \
Factor + Term Expr' ------
| | | \ \
Int Factor + Term Expr'
| | |
Int Factor
|
Int
We can see that the tree grows to the right, representing a + ( a + a). We have changed the associativity of our operator '+', it is now right-associative. While this isn't a problem for the associativity of addition, it would have a significantly different value if this were subtraction.
The problem is that normal arithmetic requires left associativity. Several solutions are: (a) rewrite the grammar to be left recursive, or (b) rewrite the grammar with more nonterminals to force the correct precedence/associativity, or (c) if using YACC
Yacc
The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...
or Bison
GNU bison
GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser which reads sequences of tokens and decides whether the sequence conforms to the syntax...
, there are operator declarations, %left, %right and %nonassoc, which tell the parser generator which associativity to force.
External links
- CMU lecture on left-recursion
- Practical Considerations for LALR(1) Grammars
- X-SAIGA - eXecutable SpecificAtIons of GrAmmars