GNU bison
Encyclopedia
GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project
GNU Project
The GNU Project is a free software, mass collaboration project, announced on September 27, 1983, by Richard Stallman at MIT. It initiated GNU operating system development in January, 1984...

. Bison reads a specification of a context-free language
Context-free language
In formal language theory, a context-free language is a language generated by some context-free grammar. The set of all context-free languages is identical to the set of languages accepted by pushdown automata.-Examples:...

, warns about any parsing
Parsing
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens , to determine its grammatical structure with respect to a given formal grammar...

 ambiguities, and generates a parser (either in C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

, or Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar. Bison generates LALR parsers. Bison also supports “Generalized Left-to-right Rightmost” (GLR
GLR parser
A GLR parser is an extension of an LR parser algorithm to handle nondeterministic and ambiguous grammars. First described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser"...

) parsers for grammars that are not LALR.

In POSIX
POSIX
POSIX , an acronym for "Portable Operating System Interface", is a family of standards specified by the IEEE for maintaining compatibility between operating systems...

 mode, Bison is compatible with yacc
Yacc
The computer program yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler." It generates a parser based on an analytic grammar written in a notation similar to BNF.Yacc used to be available as...

, but also supports several improvements over this earlier program. flex
Flex lexical analyser
flex is a free software alternative to lex. It is frequently used with the free Bison parser generator. Unlike Bison, flex is not part of the GNU Project. Flex was written in C by Vern Paxson around 1987...

, an automatic lexical analyser
Lexical analysis
In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function which performs lexical analysis is called a lexical analyzer, lexer or scanner...

, is often used with Bison, to tokenise input data and provide Bison with tokens.

Bison is licensed as free software
Free software
Free software, software libre or libre software is software that can be used, studied, and modified without restriction, and which can be copied and redistributed in modified or unmodified form either without restriction, or with restrictions that only ensure that further recipients can also do...

 and is available in source code
Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

 form. Earlier releases of bison used to stipulate that parts of its output were protected under the GPL, due to the inclusion of the yyparse function from the original source code in the output. However, an exception was made, to allow other licenses to apply to the use of the output.

A complete reentrant parser example

The example shows how to use bison and flex to do a simple calculator (only add and multiply) and provides also an example for creating an abstract syntax tree
Abstract syntax tree
In computer science, an abstract syntax tree , or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is 'abstract' in the sense that it...

. The next two files provide definition and implementation of the syntax tree functions.


/*
* Expression.h
* Definition of the structure used to build the syntax tree.
*/
  1. ifndef __EXPRESSION_H__
  2. define __EXPRESSION_H__


/**
* @brief The operation type
*/
typedef enum tagEOperationType
{
eVALUE,
eMULTIPLY,
ePLUS
}EOperationType;

/**
* @brief The expression structure
*/
typedef struct tagSExpression
{
EOperationType type;///< type of operation

int value;///< valid only when type is eVALUE
struct tagSExpression* left; ///< left side of the tree
struct tagSExpression* right;///< right side of the tree
}SExpression;

/**
* @brief It creates an identifier
* @param value The number value
* @return The expression or NULL in case of no memory
*/
SExpression* createNumber(int value);

/**
* @brief It creates an operation
* @param type The operation type
* @param left The left operand
* @param right The right operand
* @return The expression or NULL in case of no memory
*/
SExpression* createOperation(
EOperationType type,
SExpression *left,
SExpression *right);

/**
* @brief Deletes a expression
* @param b The expression
*/
void deleteExpression(SExpression *b);
  1. endif // __EXPRESSION_H__




/*
* Expression.c
* Implementation of functions used to build the syntax tree.
*/
  1. include "Expression.h"

  1. include


/**
* @brief Allocates space for expression
* @return The expression or NULL if not enough memory
*/
static SExpression* allocateExpression
{
SExpression* b = malloc(sizeof *b);

if( b

NULL ) return NULL;

b->type = eVALUE;
b->value = 0;

b->left = NULL;
b->right = NULL;

return b;
}

SExpression* createNumber(int value)
{
SExpression* b = allocateExpression;

if( b

NULL ) return NULL;

b->type = eVALUE;
b->value = value;

return b;
}

SExpression *createOperation(
EOperationType type,
SExpression *left,
SExpression *right)
{
SExpression* b = allocateExpression;

if( b

NULL ) return NULL;

b->type = type;
b->left = left;
b->right = right;

return b;
}

void deleteExpression(SExpression *b)
{
if (b

NULL) return;

deleteExpression(b->left);
deleteExpression(b->right);

free(b);
}


Since the tokens are provided by flex we must provide the means to communicate between the parser and the lexer. This is accomplished by defining the YYSTYPE. More details on this can be found on the flex manual.


/*
* TypeParser.h
* Definition of the structure used internally by the parser and lexer
* to exchange data.
*/
  1. ifndef __TYPE_PARSER_H__
  2. define __TYPE_PARSER_H__

  1. include "Expression.h"


/**
* @brief The structure used by flex and bison
*/
typedef union tagTypeParser
{
SExpression *expression;
int value;
}STypeParser;

// define the type for flex and bison
  1. define YYSTYPE STypeParser

  1. endif // __TYPE_PARSER_H__



Since in this sample we use the reentrant version of both flex and yacc we are forced to provide parameters for the yylex function, when called from yyparse.


/*
* ParserParam.h
* Definitions of the parameters for the reentrant functions
* of flex (yylex) and bison (yyparse)
*/
  1. ifndef __PARSERPARAM_H__
  2. define __PARSERPARAM_H__

  1. ifndef YY_NO_UNISTD_H
  2. define YY_NO_UNISTD_H 1
  3. endif // YY_NO_UNISTD_H

  1. include "TypeParser.h"
  2. include "Lexer.h"
  3. include "Expression.h"


/**
* @brief structure given as argument to the reentrant 'yyparse' function.
*/
typedef struct tagSParserParam
{
yyscan_t scanner;
SExpression *expression;
}SParserParam;

// the parameter name (of the reentrant 'yyparse' function)
// data is a pointer to a 'SParserParam' structure
  1. define YYPARSE_PARAM data


// the argument for the 'yylex' function
  1. define YYLEX_PARAM ((SParserParam*)data)->scanner

  1. endif // __PARSERPARAM_H__



The tokens needed by the bison parser will be generated using flex.


%{

/*
* Lexer.l file
* To generate the lexical analyzer run: "flex --outfile=Lexer.c --header-file=Lexer.h Lexer.l"
*/
  1. include "TypeParser.h"
  2. include "Parser.h"


%}

%option reentrant noyywrap never-interactive nounistd
%option bison-bridge

LPAREN "("
RPAREN ")"
PLUS "+"
MULTIPLY "*"

NUMBER [0-9]+
WS [ \r\n\t]*

%%

{WS} { /* Skip blanks. */ }
{NUMBER} { sscanf(yytext,"%d",&yylval->value); return TOKEN_NUMBER; }

{MULTIPLY} { return TOKEN_MULTIPLY; }
{PLUS} { return TOKEN_PLUS; }
{LPAREN} { return TOKEN_LPAREN; }
{RPAREN} { return TOKEN_RPAREN; }
. { }

%%

int yyerror(const char *msg) { fprintf(stderr,"Error:%s\n",msg); return 0; }


The bison file providing describing the grammar of the numerical expressions.


%{

/*
* Parser.y file
* To generate the parser run: "bison --defines=Parser.h Parser.y"
*/
  1. include "TypeParser.h"
  2. include "ParserParam.h"


%}

%define api.pure

%left '+' TOKEN_PLUS
%left '*' TOKEN_MULTIPLY

%token TOKEN_LPAREN
%token TOKEN_RPAREN
%token TOKEN_PLUS
%token TOKEN_MULTIPLY

%token TOKEN_NUMBER

%type expr

%%

input:
expr { ((SParserParam*)data)->expression = $1; }
;

expr:
expr TOKEN_PLUS expr { $$ = createOperation( ePLUS, $1, $3 ); }
| expr TOKEN_MULTIPLY expr { $$ = createOperation( eMULTIPLY, $1, $3 ); }
| TOKEN_LPAREN expr TOKEN_RPAREN { $$ = $2; }
| TOKEN_NUMBER { $$ = createNumber($1); }


%%


The code needed to obtain the syntax tree using the parser generated by bison and the scanner generated by flex is the following.

  1. include "ParserParam.h"
  2. include "Parser.h"
  3. include "Lexer.h"

  1. include


int yyparse(void *param);

static int initParserParam(SParserParam* param)
{
int ret = 0;

ret = yylex_init(¶m->scanner);
param->expression = NULL;

return ret;
}

static int destroyParserParam(SParserParam* param)
{
return yylex_destroy(param->scanner);
}

SExpression *getAST(const char *expr)
{
SParserParam p;
YY_BUFFER_STATE state;

if ( initParserParam(&p) )
{
// couldn't initialize
return NULL;
}

state = yy_scan_string(expr, p.scanner);

if ( yyparse(&p) )
{
// error parsing
return NULL;
}

yy_delete_buffer(state, p.scanner);

destroyParserParam(&p);

return p.expression;
}

int evaluate(SExpression *e)
{
switch(e->type)
{
case eVALUE:
return e->value;
case eMULTIPLY:
return evaluate(e->left) * evaluate(e->right);
case ePLUS:
return evaluate(e->left) + evaluate(e->right);
default:
// shouldn't be here
return 0;
}
}

int main(void)
{
SExpression *e = NULL;
char test[]=" 4 + 2*10 + 3*( 5 + 1 )";
int result = 0;

e = getAST(test);

result = evaluate(e);

printf("Result of '%s' is %d\n", test, result);

deleteExpression(e);

return 0;
}


Reentrancy

Normally, Bison generates a parser which is not reentrant. In order to achieve reentrancy the declaration %define api.pure must be used. More details on Bison reentrancy can be found in the Bison manual.

Using bison from other languages

Bison can only generate code for C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

 and Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

. For using the bison generated parser from other languages a language binding
Language binding
In computing, a binding from a programming language to a library or OS service is an API providing that service in the language.Many software libraries are written in systems programming languages such as C or C++...

 tool such as SWIG
SWIG
SWIG is an open source software tool used to connect computer programs or libraries written in C or C++ with scripting languages such as Lua, Perl, PHP, Python, R, Ruby, Tcl, and other languages like C#, Java, Modula-3, Objective Caml, Octave, and Scheme...

 can be used.

Where is it used?

Here is a non-comprehensive list of software built using Bison:
  • The Ruby Programming Language (YARV);
  • The PHP
    PHP
    PHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document...

     Programming Language (Zend Parser);
  • GCC
    GNU Compiler Collection
    The GNU Compiler Collection is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain...

     started out using Bison, but switched to a hand-written parser in 2000;
  • The Go
    Go (programming language)
    Go is a compiled, garbage-collected, concurrent programming language developed by Google Inc.The initial design of Go was started in September 2007 by Robert Griesemer, Rob Pike, and Ken Thompson. Go was officially announced in November 2009. In May 2010, Rob Pike publicly stated that Go was being...

     Programming Language (GC);
  • Bash
    Bash
    Bash is a Unix shell written by Brian Fox for the GNU Project as a free software replacement for the Bourne shell . Released in 1989, it has been distributed widely as the shell for the GNU operating system and as the default shell on Linux, Mac OS X and Darwin...

    shell uses a yacc grammar for parsing the command input. It is distributed with bison-generated files.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK