GNU bison
Encyclopedia
GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project
. Bison reads a specification of a context-free language
, warns about any parsing
ambiguities, and generates a parser (either in C
, C++
, or Java
) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar. Bison generates LALR parsers. Bison also supports “Generalized Left-to-right Rightmost” (GLR
) parsers for grammars that are not LALR.
In POSIX
mode, Bison is compatible with yacc
, but also supports several improvements over this earlier program. flex
, an automatic lexical analyser
, is often used with Bison, to tokenise input data and provide Bison with tokens.
Bison is licensed as free software
and is available in source code
form. Earlier releases of bison used to stipulate that parts of its output were protected under the GPL, due to the inclusion of the yyparse function from the original source code in the output. However, an exception was made, to allow other licenses to apply to the use of the output.
. The next two files provide definition and implementation of the syntax tree functions.
Since the tokens are provided by flex we must provide the means to communicate between the parser and the lexer. This is accomplished by defining the YYSTYPE. More details on this can be found on the flex manual.
Since in this sample we use the reentrant version of both flex and yacc we are forced to provide parameters for the yylex function, when called from yyparse.
The tokens needed by the bison parser will be generated using flex.
The bison file providing describing the grammar of the numerical expressions.
The code needed to obtain the syntax tree using the parser generated by bison and the scanner generated by flex is the following.
, C++
and Java
. For using the bison generated parser from other languages a language binding
tool such as SWIG
can be used.
GNU Project
The GNU Project is a free software, mass collaboration project, announced on September 27, 1983, by Richard Stallman at MIT. It initiated GNU operating system development in January, 1984...
. Bison reads a specification of a context-free language
Context-free language
In formal language theory, a context-free language is a language generated by some context-free grammar. The set of all context-free languages is identical to the set of languages accepted by pushdown automata.-Examples:...
, warns about any parsing
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...
ambiguities, and generates a parser (either in C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
, C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...
, or Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar. Bison generates LALR parsers. Bison also supports “Generalized Left-to-right Rightmost” (GLR
GLR parser
A GLR parser is an extension of an LR parser algorithm to handle nondeterministic and ambiguous grammars. First described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser"...
) parsers for grammars that are not LALR.
In POSIX
POSIX
POSIX , an acronym for "Portable Operating System Interface", is a family of standards specified by the IEEE for maintaining compatibility between operating systems...
mode, Bison is compatible with yacc
Yacc
The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...
, but also supports several improvements over this earlier program. flex
Flex lexical analyser
flex is a free software alternative to lex. It is frequently used with the free Bison parser generator. Unlike Bison, flex is not part of the GNU Project. Flex was written in C by Vern Paxson around 1987...
, an automatic lexical analyser
Lexical analysis
In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function which performs lexical analysis is called a lexical analyzer, lexer or scanner...
, is often used with Bison, to tokenise input data and provide Bison with tokens.
Bison is licensed as free software
Free software
Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...
and is available in source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...
form. Earlier releases of bison used to stipulate that parts of its output were protected under the GPL, due to the inclusion of the yyparse function from the original source code in the output. However, an exception was made, to allow other licenses to apply to the use of the output.
A complete reentrant parser example
The example shows how to use bison and flex to do a simple calculator (only add and multiply) and provides also an example for creating an abstract syntax treeAbstract syntax tree
In computer science, an abstract syntax tree , or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is 'abstract' in the sense that it...
. The next two files provide definition and implementation of the syntax tree functions.
Since the tokens are provided by flex we must provide the means to communicate between the parser and the lexer. This is accomplished by defining the YYSTYPE. More details on this can be found on the flex manual.
Since in this sample we use the reentrant version of both flex and yacc we are forced to provide parameters for the yylex function, when called from yyparse.
The tokens needed by the bison parser will be generated using flex.
The bison file providing describing the grammar of the numerical expressions.
The code needed to obtain the syntax tree using the parser generated by bison and the scanner generated by flex is the following.
Reentrancy
Normally, Bison generates a parser which is not reentrant. In order to achieve reentrancy the declaration %define api.pure must be used. More details on Bison reentrancy can be found in the Bison manual.Using bison from other languages
Bison can only generate code for CC (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
, C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...
and Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
. For using the bison generated parser from other languages a language binding
Language binding
In computing, a binding from a programming language to a library or OS service is an API providing that service in the language.Many software libraries are written in systems programming languages such as C or C++...
tool such as SWIG
SWIG
SWIG is an open source software tool used to connect computer programs or libraries written in C or C++ with scripting languages such as Lua, Perl, PHP, Python, R, Ruby, Tcl, and other languages like C#, Java, Modula-3, Objective Caml, Octave, and Scheme...
can be used.
Where is it used?
Here is a non-comprehensive list of software built using Bison:- The Ruby Programming Language (YARV);
- The PHPPHPPHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document...
Programming Language (Zend Parser); - GCCGNU Compiler CollectionThe GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain...
started out using Bison, but switched to a hand-written parser in 2000; - The GoGo (programming language)Go is a compiled, garbage-collected, concurrent programming language developed by Google Inc.The initial design of Go was started in September 2007 by Robert Griesemer, Rob Pike, and Ken Thompson. Go was officially announced in November 2009. In May 2010, Rob Pike publicly stated that Go was being...
Programming Language (GC); - BashBashBash is a Unix shell written by Brian Fox for the GNU Project as a free software replacement for the Bourne shell . Released in 1989, it has been distributed widely as the shell for the GNU operating system and as the default shell on Linux, Mac OS X and Darwin...
shell uses a yacc grammar for parsing the command input. It is distributed with bison-generated files.