Top-down parsing
Encyclopedia
Top-down parsing is a type of parsing strategy where in one first looks at the highest level of the parse tree
Parse tree
A concrete syntax tree or parse tree or parsing treeis an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the...

 and works down the parse tree by using the rewriting rules of a formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

. LL parser
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

s are a type of parser that uses a top-down parsing strategy.

Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree
Parse tree
A concrete syntax tree or parse tree or parsing treeis an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the...

 structures and then considering whether the known fundamental structures are compatible with the hypothesis. It occurs in the analysis of both natural language
Language
Language may refer either to the specifically human capacity for acquiring and using complex systems of communication, or to a specific instance of such a system of complex communication...

s and computer languages.

Top-down parsing can be viewed as an attempt to find left-most derivations of an input-stream by searching for parse-trees
Parse tree
A concrete syntax tree or parse tree or parsing treeis an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the...

 using a top-down expansion of the given formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

 rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity
Ambiguity
Ambiguity of words or phrases is the ability to express more than one interpretation. It is distinct from vagueness, which is a statement about the lack of precision contained or available in the information.Context may play a role in resolving ambiguity...

 by expanding all alternative right-hand-sides of grammar rules.

Simple implementations of top-down parsing do not terminate for left-recursive
Left recursion
In computer science, left recursion is a special case of recursion.In terms of context-free grammar, a non-terminal r is left-recursive if the left-most symbol in any of r’s ‘alternatives’ either immediately or through some other non-terminal definitions rewrites to r again.- Definition :"A...

 grammars, and top-down parsing with backtracking may have exponential time complexity with respect to the length of the input for ambiguous CFGs
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

. However, more sophisticated top-down parsers have been created by Frost, Hafiz, and Callaghan which do accommodate ambiguity and left recursion in polynomial time and which generate polynomial-sized representations of the potentially-exponential number of parse trees.

Programming language application

A compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

 parses input from a programming language to assembly language or an internal representation by matching the incoming symbols to Backus-Naur form production rules. An LL parser
LL parser
An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...

, is a type of parser that does top-down parsing by applying each production rule to the incoming symbols, working from the left-most symbol yielded on a production rule and then proceeding to the next production rule for each non-terminal symbol encountered. In this way the parsing starts on the Left of the result side (right side) of the production rule and evaluates non-terminals from the Left first and, thus, proceeds down the parse tree for each new non-terminal before continuing to the next symbol for a production rule.

For example:


would match and attempt to match next. Then would be tried. As one may expect, some languages are more ambiguous
Ambiguity
Ambiguity of words or phrases is the ability to express more than one interpretation. It is distinct from vagueness, which is a statement about the lack of precision contained or available in the information.Context may play a role in resolving ambiguity...

 than others. For a non-ambiguous language in which all productions for a non-terminal produce distinct strings: the string produced by one production will not start with the same symbol as the string produced by another production. A non-ambiguous language may be parsed by an LL(1) grammar where the (1) signifies the parser reads ahead one token at a time. For an ambiguous language to be parsed by an LL parser, the parser must lookahead more than 1 symbol, e.g. LL(3).

The common solution to this problem is to use an LR parser
LR parser
In computer science, an LR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions...

, which is a type of shift-reduce parser, and does bottom-up parsing
Bottom-up parsing
Bottom-up parsing is a strategy for analyzing unknown information that attempts to identify the most fundamental units first, and then to infer higher-order structures from them...

.

Accommodating left recursion in top-down parsing

A formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...

 that contains left recursion
Left recursion
In computer science, left recursion is a special case of recursion.In terms of context-free grammar, a non-terminal r is left-recursive if the left-most symbol in any of r’s ‘alternatives’ either immediately or through some other non-terminal definitions rewrites to r again.- Definition :"A...

 cannot be parsed
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...

 by a naive recursive descent parser
Recursive descent parser
A recursive descent parser is a top-down parser built from a set of mutually-recursive procedures where each such procedure usually implements one of the production rules of the grammar...

 unless they are converted to a weakly equivalent right-recursive form. However, recent research demonstrates that it is possible to accommodate left-recursive grammars (along with all other forms of general CFGs
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

) in a more sophisticated top-down parser by use of curtailment. A recognition algorithm which accommodates ambiguous
Ambiguity
Ambiguity of words or phrases is the ability to express more than one interpretation. It is distinct from vagueness, which is a statement about the lack of precision contained or available in the information.Context may play a role in resolving ambiguity...

 grammars and curtails an ever-growing direct left-recursive parse by imposing depth restrictions with respect to input length and current input position, is described by Frost and Hafiz in 2006. That algorithm was extended to a complete parsing
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...

 algorithm to accommodate indirect (by comparing previously-computed context with current context) as well as direct left-recursion in polynomial
Polynomial
In mathematics, a polynomial is an expression of finite length constructed from variables and constants, using only the operations of addition, subtraction, multiplication, and non-negative integer exponents...

 time, and to generate compact polynomial-size representations of the potentially-exponential number of parse trees for highly-ambiguous grammars by Frost, Hafiz and Callaghan in 2007. The algorithm has since been implemented as a set of parser combinator
Parser Combinator
In functional programming, a parser combinator is a higher-order function which accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of...

s written in the Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

 programming language. The implementation details of these new set of combinators can be found in a paper by the above-mentioned authors, which was presented in PADL'08.
The X-SAIGA site has more about the algorithms and implementation details.

Time and space complexity of top-down parsing

When top-down parser tries to parse an ambiguous input with respect to an ambiguous CFG, it may need exponential number of steps (with respect to the length of the input) to try all alternatives of the CFG in order to produce all possible parse trees, which eventually would require exponential memory space. The problem of exponential time complexity in top-down parsers constructed as sets of mutually-recursive functions has been solved by Norvig in 1991. His technique is similar to the use of dynamic programming and state-sets in Earley's algorithm
Earley parser
In computer science, the Earley parser is an algorithm for parsing strings that belong to a given context-free language, named after its inventor, Jay Earley...

 (1970), and tables in the CYK algorithm
CYK algorithm
The Cocke–Younger–Kasami algorithm is a parsing algorithm for context-free grammars. It employs bottom-up parsing and dynamic programming....

 of Cocke, Younger and Kasami.

The key idea is to store results of applying a parser p at position j in a memotable and to reuse results whenever the same situation arises. Frost, Hafiz and Callaghan also use memoization
Memoization
In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs...

 for refraining redundant computations to accommodate any form of CFG in polynomial
Polynomial
In mathematics, a polynomial is an expression of finite length constructed from variables and constants, using only the operations of addition, subtraction, multiplication, and non-negative integer exponents...

 time (Θ
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

(n4) for left-recursive grammars and Θ
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

(n3) for non left-recursive grammars). Their top-down parsing algorithm also requires polynomial space for potentially exponential ambiguous parse trees by 'compact representation' and 'local ambiguities grouping'. Their compact representation is comparable with Tomita’s compact representation of bottom-up parsing
Bottom-up parsing
Bottom-up parsing is a strategy for analyzing unknown information that attempts to identify the most fundamental units first, and then to infer higher-order structures from them...

.

Using PEG's, another representation of grammars, packrat parsers provide an elegant and powerful parsing algorithm. See Parsing expression grammar
Parsing expression grammar
A parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language...

.

External links

  • X-SAIGA - eXecutable SpecificAtIons of GrAmmars
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK