LALR parser
Encyclopedia
In computer science
, an LALR parser (or look-ahead LR parser) is a type of LR parser
based on a finite-state-automata concept. The data structure used by an LALR parser is a pushdown automaton
(PDA). A deterministic PDA is a deterministic-finite automaton (DFA) with the addition of a stack for a memory, indicating which states the parser has passed through to arrive at the current state. Because of the stack, a PDA can recognize grammars that would be impossible with a DFA; for example, a PDA can determine whether an expression has any unmatched parentheses, whereas an automaton with no stack would require an infinite number of states due to unlimited nesting of parentheses.
LALR parsers are driven by a parser table in a finite-state machine (FSM) format. An FSM is tedious enough for humans to construct by hand that it is more convenient to use a software tool called an LALR parser generator
to generate a parser table automatically from a grammar in Backus–Naur Form
which defines the syntax of the computer language the parser will process. The parser table is often generated in source code format in a computer language (such as C++ or Java). When the parser (with parser table) is compiled and/or executed, it will recognize text files written in the language defined by the BNF grammar.
LALR parsers are generated from LALR grammars, which are capable of defining a larger class of languages than
SLR grammar
s, but not as large a class as LR grammars. Real computer languages can often be expressed as LALR(1) grammars, and in cases where a LALR(1) grammar is insufficient, usually an LALR(2) grammar is adequate. If the parser generator handles only LALR(1) grammars, then the LALR parser will have to interface with some hand-written code when it encounters the special LALR(2) situation in the input language.
in 1965 in a paper, "On the Translation of Languages from Left to Right". LR parsers have the potential of providing the highest parsing speed of all parsing algorithms. However, LR parsers were once considered impractical because algorithms to generate LR parsers of tractable size were not known until the mid 1970s.
LALR parsing was invented by Frank DeRemer in 1969 in a paper, "Practical LR(k) Translators". LALR parsers offer the same high performance of LR parsers, and produce much smaller tables than the early LR parser generation algorithms of the late 1960s. For this reason, the LALR algorithm became popular, and LALR parsers are the ones most often generated by compiler-compiler
s such as yacc
and GNU bison
.
(Compiler-compilers such as Menhir and HYacc that generate true LR parser tables using Pager's algorithm have appeared in recent years but have not seen widespread adoption -- their chief advantage is that they do not create spurious reduce/reduce conflicts for unambiguous deterministic grammars.)
constructs the LR(0) state machine first and then computes the lookahead sets for all rules in the grammar, checking for ambiguity. An SLR parser generator computes the lookahead sets by examining the BNF grammar (these are called follow sets). However, an LALR parser generator computes the lookahead sets by examining the LR(0) state machine (these are called LALR lookahead sets, which are more accurate and less likely to cause a conflict/ambiguity to be reported by the parser generator).
Like SLR, LALR is a refinement to the technique for constructing LR(0) parse tables. While SLR uses follow sets to construct reduce actions, LALR uses lookahead sets, which are more specific because they take more of the parsing context into account. Follow sets are associated with a symbol, while lookahead sets are specific to an LR(0) item and a parser state.
Specifically, the follow set for a given LR(0) item I in a given parser state S contains all symbols that are allowed by the grammar to appear after I's left-hand-side nonterminal. In contrast, the lookahead set for item I in state S contains only those symbols that are allowed by the grammar to appear after I's right-hand-side has been parsed starting from state S. follow(I) is effectively the union of the lookahead sets for all LR(0) items with the same left-hand-side as I, regardless of parser states or right-hand-sides, therefore losing all context information. Because the lookahead set is specific to a particular parsing context, it can be more selective, therefore allowing finer distinctions than the follow set.
Unfortunately, computing LALR lookahead sets is much more complicated than SLR. Frank DeRemer and Tom Pennello wrote a paper about this, published in SIGPLAN Notices in 1979 and in TOPLAS in 1982, called "Efficient Computation Of LALR(1) Look-Ahead Sets".
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, an LALR parser (or look-ahead LR parser) is a type of LR parser
LR parser
In computer science, an LR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions...
based on a finite-state-automata concept. The data structure used by an LALR parser is a pushdown automaton
Pushdown automaton
In automata theory, a pushdown automaton is a finite automaton that can make use of a stack containing data.- Operation :Pushdown automata differ from finite state machines in two ways:...
(PDA). A deterministic PDA is a deterministic-finite automaton (DFA) with the addition of a stack for a memory, indicating which states the parser has passed through to arrive at the current state. Because of the stack, a PDA can recognize grammars that would be impossible with a DFA; for example, a PDA can determine whether an expression has any unmatched parentheses, whereas an automaton with no stack would require an infinite number of states due to unlimited nesting of parentheses.
LALR parsers are driven by a parser table in a finite-state machine (FSM) format. An FSM is tedious enough for humans to construct by hand that it is more convenient to use a software tool called an LALR parser generator
LALR parser generator
An LALR parser generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing text files written in the computer language defined by the BNF grammar...
to generate a parser table automatically from a grammar in Backus–Naur Form
Backus–Naur form
In computer science, BNF is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols.It is applied wherever exact descriptions of...
which defines the syntax of the computer language the parser will process. The parser table is often generated in source code format in a computer language (such as C++ or Java). When the parser (with parser table) is compiled and/or executed, it will recognize text files written in the language defined by the BNF grammar.
LALR parsers are generated from LALR grammars, which are capable of defining a larger class of languages than
SLR grammar
SLR Grammar
In computer science, an SLR grammar is a class of formal grammar that can be parsed by a simple LR parser.- Definitions :A rule of the form B → y. within a state of a SLR automaton is said to be irreducible or in a reduce state because it has been completely expanded and is incapable of undergoing...
s, but not as large a class as LR grammars. Real computer languages can often be expressed as LALR(1) grammars, and in cases where a LALR(1) grammar is insufficient, usually an LALR(2) grammar is adequate. If the parser generator handles only LALR(1) grammars, then the LALR parser will have to interface with some hand-written code when it encounters the special LALR(2) situation in the input language.
History
LR parsing was invented by Donald KnuthDonald Knuth
Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...
in 1965 in a paper, "On the Translation of Languages from Left to Right". LR parsers have the potential of providing the highest parsing speed of all parsing algorithms. However, LR parsers were once considered impractical because algorithms to generate LR parsers of tractable size were not known until the mid 1970s.
LALR parsing was invented by Frank DeRemer in 1969 in a paper, "Practical LR(k) Translators". LALR parsers offer the same high performance of LR parsers, and produce much smaller tables than the early LR parser generation algorithms of the late 1960s. For this reason, the LALR algorithm became popular, and LALR parsers are the ones most often generated by compiler-compiler
Compiler-compiler
A compiler-compiler or compiler generator is a tool that creates a parser, interpreter, or compiler from some form of formal description of a language and machine...
s such as yacc
Yacc
The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...
and GNU bison
GNU bison
GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser which reads sequences of tokens and decides whether the sequence conforms to the syntax...
.
(Compiler-compilers such as Menhir and HYacc that generate true LR parser tables using Pager's algorithm have appeared in recent years but have not seen widespread adoption -- their chief advantage is that they do not create spurious reduce/reduce conflicts for unambiguous deterministic grammars.)
Generating LALR Parsers
Similar to an SLR parser generator, an LALR parser generatorLALR parser generator
An LALR parser generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing text files written in the computer language defined by the BNF grammar...
constructs the LR(0) state machine first and then computes the lookahead sets for all rules in the grammar, checking for ambiguity. An SLR parser generator computes the lookahead sets by examining the BNF grammar (these are called follow sets). However, an LALR parser generator computes the lookahead sets by examining the LR(0) state machine (these are called LALR lookahead sets, which are more accurate and less likely to cause a conflict/ambiguity to be reported by the parser generator).
Like SLR, LALR is a refinement to the technique for constructing LR(0) parse tables. While SLR uses follow sets to construct reduce actions, LALR uses lookahead sets, which are more specific because they take more of the parsing context into account. Follow sets are associated with a symbol, while lookahead sets are specific to an LR(0) item and a parser state.
Specifically, the follow set for a given LR(0) item I in a given parser state S contains all symbols that are allowed by the grammar to appear after I's left-hand-side nonterminal. In contrast, the lookahead set for item I in state S contains only those symbols that are allowed by the grammar to appear after I's right-hand-side has been parsed starting from state S. follow(I) is effectively the union of the lookahead sets for all LR(0) items with the same left-hand-side as I, regardless of parser states or right-hand-sides, therefore losing all context information. Because the lookahead set is specific to a particular parsing context, it can be more selective, therefore allowing finer distinctions than the follow set.
Unfortunately, computing LALR lookahead sets is much more complicated than SLR. Frank DeRemer and Tom Pennello wrote a paper about this, published in SIGPLAN Notices in 1979 and in TOPLAS in 1982, called "Efficient Computation Of LALR(1) Look-Ahead Sets".
Advantages
- An LALR parser can be automatically generated from an LALR grammar.
- An LALR grammar can be used to define many computer languages.
- An LALR parser is small.
- An LALR parser is fast (if the parsing algorithm uses a matrix parser-table format).
- An LALR parser is linear in speed (i.e. the speed is based on the size of the input text file only and not based on the size of the language being recognized).
- The LALR grammar provides valuable documentation of the language being recognized.
- Error recovery may already be built-in to the parser.
- Generalized error messages may already be built into the parser.
- Abstract-syntax-tree construction may already be built into the parser.
- Recognition of context-sensitive language constructs may already be built into the parser.
Disadvantages
- Software engineers are required to use an LALR parser generator, which may or may not be user friendly and may require some learning time.
- Implementing meaningful error messages in the parser may be very difficult or impossible.
- Understanding the parsing algorithm is often quite difficult.
- If an error occurs, it may be difficult to determine whether it's in the grammar or the parser code.
- If there is an error in the parser generator, this may be very difficult to fix.
See also
- LR parserLR parserIn computer science, an LR parser is a parser that reads input from Left to right and produces a Rightmost derivation. The term LR parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions...
- parser generator
- LALR parser generatorLALR parser generatorAn LALR parser generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing text files written in the computer language defined by the BNF grammar...
- Comparison of parser generatorsComparison of parser generatorsThis is a list of notable lexer generators and parser generators for various language classes.- Regular languages :- Deterministic context-free languages :-Parsing expression grammars, deterministic boolean grammars:...
- Lookahead in parsing
- LL parserLL parserAn LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence...
External links
- Parsing Simulator This simulator is used to generate parsing tables LALR and resolve the exercises of the book.
- JS/CC JavaScript based implementation of a LALR(1) parser generator, which can be run in a web-browser or from the command-line.
- LALR(1) tutorial A flash card like tutorial on LALR(1) parsing.