LALR parser - AbsoluteAstronomy.com

Computer science

Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, an LALR parser (or look-ahead LR parser) is a type of LR parser

LR parser

In computer science, an LR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions...

based on a finite-state-automata concept. The data structure used by an LALR parser is a pushdown automaton

Pushdown automaton

In automata theory, a pushdown automaton is a finite automaton that can make use of a stack containing data.- Operation :Pushdown automata differ from finite state machines in two ways:...

(PDA). A deterministic PDA is a deterministic-finite automaton (DFA) with the addition of a stack for a memory, indicating which states the parser has passed through to arrive at the current state. Because of the stack, a PDA can recognize grammars that would be impossible with a DFA; for example, a PDA can determine whether an expression has any unmatched parentheses, whereas an automaton with no stack would require an infinite number of states due to unlimited nesting of parentheses.

LALR parsers are driven by a parser table in a finite-state machine (FSM) format. An FSM is tedious enough for humans to construct by hand that it is more convenient to use a software tool called an LALR parser generator

LALR parser generator

An LALR parser generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing text files written in the computer language defined by the BNF grammar...

to generate a parser table automatically from a grammar in Backus–Naur Form

Backus–Naur form

In computer science, BNF is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols.It is applied wherever exact descriptions of...

which defines the syntax of the computer language the parser will process. The parser table is often generated in source code format in a computer language (such as C++ or Java). When the parser (with parser table) is compiled and/or executed, it will recognize text files written in the language defined by the BNF grammar.

LALR parsers are generated from LALR grammars, which are capable of defining a larger class of languages than
SLR grammar

SLR Grammar

In computer science, an SLR grammar is a class of formal grammar that can be parsed by a simple LR parser.- Definitions :A rule of the form B → y. within a state of a SLR automaton is said to be irreducible or in a reduce state because it has been completely expanded and is incapable of undergoing...

s, but not as large a class as LR grammars. Real computer languages can often be expressed as LALR(1) grammars, and in cases where a LALR(1) grammar is insufficient, usually an LALR(2) grammar is adequate. If the parser generator handles only LALR(1) grammars, then the LALR parser will have to interface with some hand-written code when it encounters the special LALR(2) situation in the input language.

History

LR parsing was invented by Donald Knuth

Donald Knuth

Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...

in 1965 in a paper, "On the Translation of Languages from Left to Right". LR parsers have the potential of providing the highest parsing speed of all parsing algorithms. However, LR parsers were once considered impractical because algorithms to generate LR parsers of tractable size were not known until the mid 1970s.

LALR parsing was invented by Frank DeRemer in 1969 in a paper, "Practical LR(k) Translators". LALR parsers offer the same high performance of LR parsers, and produce much smaller tables than the early LR parser generation algorithms of the late 1960s. For this reason, the LALR algorithm became popular, and LALR parsers are the ones most often generated by compiler-compiler

Compiler-compiler

A compiler-compiler or compiler generator is a tool that creates a parser, interpreter, or compiler from some form of formal description of a language and machine...

s such as yacc

Yacc

The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...

and GNU bison

GNU bison

GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser which reads sequences of tokens and decides whether the sequence conforms to the syntax...

.

(Compiler-compilers such as Menhir and HYacc that generate true LR parser tables using Pager's algorithm have appeared in recent years but have not seen widespread adoption -- their chief advantage is that they do not create spurious reduce/reduce conflicts for unambiguous deterministic grammars.)

Generating LALR Parsers

Similar to an SLR parser generator, an LALR parser generator

LALR parser generator

An LALR parser generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing text files written in the computer language defined by the BNF grammar...

constructs the LR(0) state machine first and then computes the lookahead sets for all rules in the grammar, checking for ambiguity. An SLR parser generator computes the lookahead sets by examining the BNF grammar (these are called follow sets). However, an LALR parser generator computes the lookahead sets by examining the LR(0) state machine (these are called LALR lookahead sets, which are more accurate and less likely to cause a conflict/ambiguity to be reported by the parser generator).

Like SLR, LALR is a refinement to the technique for constructing LR(0) parse tables. While SLR uses follow sets to construct reduce actions, LALR uses lookahead sets, which are more specific because they take more of the parsing context into account. Follow sets are associated with a symbol, while lookahead sets are specific to an LR(0) item and a parser state.

Specifically, the follow set for a given LR(0) item I in a given parser state S contains all symbols that are allowed by the grammar to appear after I's left-hand-side nonterminal. In contrast, the lookahead set for item I in state S contains only those symbols that are allowed by the grammar to appear after I's right-hand-side has been parsed starting from state S. follow(I) is effectively the union of the lookahead sets for all LR(0) items with the same left-hand-side as I, regardless of parser states or right-hand-sides, therefore losing all context information. Because the lookahead set is specific to a particular parsing context, it can be more selective, therefore allowing finer distinctions than the follow set.

Unfortunately, computing LALR lookahead sets is much more complicated than SLR. Frank DeRemer and Tom Pennello wrote a paper about this, published in SIGPLAN Notices in 1979 and in TOPLAS in 1982, called "Efficient Computation Of LALR(1) Look-Ahead Sets".

Advantages

An LALR parser can be automatically generated from an LALR grammar.
An LALR grammar can be used to define many computer languages.
An LALR parser is small.
An LALR parser is fast (if the parsing algorithm uses a matrix parser-table format).
An LALR parser is linear in speed (i.e. the speed is based on the size of the input text file only and not based on the size of the language being recognized).
The LALR grammar provides valuable documentation of the language being recognized.
Error recovery may already be built-in to the parser.
Generalized error messages may already be built into the parser.
Abstract-syntax-tree construction may already be built into the parser.
Recognition of context-sensitive language constructs may already be built into the parser.

Disadvantages

Software engineers are required to use an LALR parser generator, which may or may not be user friendly and may require some learning time.
Implementing meaningful error messages in the parser may be very difficult or impossible.
Understanding the parsing algorithm is often quite difficult.
If an error occurs, it may be difficult to determine whether it's in the grammar or the parser code.
If there is an error in the parser generator, this may be very difficult to fix.

External links

Parsing Simulator This simulator is used to generate parsing tables LALR and resolve the exercises of the book.
JS/CC JavaScript based implementation of a LALR(1) parser generator, which can be run in a web-browser or from the command-line.
LALR(1) tutorial A flash card like tutorial on LALR(1) parsing.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.

History

Generating LALR Parsers

Advantages

Disadvantages

See also

External links