Extended Backus–Naur form
Encyclopedia
In computer science
, Extended Backus–Naur Form (EBNF) is a family of metasyntax
notations used for expressing context-free grammar
s: that is, a formal way to describe computer programming language
s and other formal language
s. They are extensions of the basic Backus–Naur Form
(BNF) metasyntax notation.
The earliest EBNF was originally developed by Niklaus Wirth
. However, many variants of EBNF are in use. The International Organization for Standardization
has adopted an EBNF standard (ISO/IEC 14977). This article uses EBNF as specified by the ISO for examples applying to all EBNF:s. Other EBNF variants use somewhat different syntactic conventions.
that expresses the grammar of a computer language. An EBNF consists of terminal symbol and non-terminal production rules which are the restrictions governing how terminal symbols can be combined into a legal sequence. Examples of terminal symbols include alphanumeric characters, punctuation marks, and white space characters.
The EBNF defines production rule
s where sequences of symbols are respectively assigned to a nonterminal
:
digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit = "0" | digit excluding zero ;
This production rule defines the nonterminal digit which is on the left side of the assignment. The vertical bar represents an alternative and the terminal symbols are enclosed with quotation marks followed by a semicolon as terminating character. Hence a Digit is a 0 or a digit excluding zero that can be 1 or 2 or 3 and so forth until 9.
A production rule can also include a sequence of terminals or nonterminals, each separated by a comma:
twelve = "1" , "2" ;
two hundred one = "2" , "0" , "1" ;
three hundred twelve = "3" , twelve ;
twelve thousand two hundred one = twelve , two hundred one ;
Expressions that may be omitted or repeated can be represented through curly braces { ... }:
natural number = digit excluding zero , { digit } ;
In this case, the strings 1, 2, ...,10,...,12345,... are correct expressions. To represent this, everything that is set within the curly braces may be repeated arbitrarily often, including not at all.
An option can be represented through squared brackets [ ... ]. That is, everything that is set within the square brackets may be present just once, or not at all:
integer = "0" | [ "-" ] , natural number ;
Therefore an integer is a zero (0) or a natural number that may be preceded by an optional minus sign.
EBNF also provides, among other things, the syntax to describe repetitions of a specified number of times, to exclude some part of a production, or to insert comments in an EBNF grammar.
(* a simple program syntax in EBNF − Wikipedia *)
program = 'PROGRAM' , white space , identifier , white space ,
'BEGIN' , white space ,
{ assignment , ";" , white space } ,
'END.' ;
identifier = alphabetic character , { alphabetic character | digit } ;
number = [ "-" ] , digit , { digit } ;
string = '"' , { all characters − '"' } , '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
A syntactically correct program then would be:
PROGRAM DEMO1
BEGIN
A0:=3;
B:=45;
H:=-100023;
C:=A;
D123:=B34A;
BABOON:=GIRAFFE;
TEXT:="Hello world!";
END.
The language can easily be extended with control flow
s, arithmetical expressions, and Input/Output instructions. Then a small, usable programming language would be developed.
The BNF used the symbols (<, >, |, ::=) for itself, but did not include quotes around terminal strings. This prevented these characters from being used in the languages, and required a special symbol for the empty string. In EBNF, terminals are strictly enclosed within quotation marks ("..." or '...'). The angle brackets ("<...>") for nonterminals
can be omitted.
BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon, marks the end of a rule.
Furthermore, EBNF includes mechanisms for enhancements, defining the number of repetitions, excluding alternatives, comments, etc.
It should be noted that EBNF is not "more powerful" than the BNF: As a matter of principle, any grammar
defined in EBNF can also be represented in BNF (though representations in the latter are generally lengthier).
2. The normal character representing each operator of Extended BNF and its implied precedence is (highest precedence at the top):
* repetition-symbol
- except-symbol
, concatenate-symbol
| definition-separator-symbol
= defining-symbol
; terminator-symbol
3. The normal precedence is overridden by the following bracket pairs:
' first-quote-symbol first-quote-symbol '
" second-quote-symbol second-quote-symbol "
(* start-comment-symbol end-comment-symbol *)
( start-group-symbol end-group-symbol )
[ start-option-symbol end-option-symbol ]
{ start-repeat-symbol end-repeat-symbol }
? special-sequence-symbol special-sequence-symbol ?
The first-quote-symbol is the apostrophe
as defined by ISO/IEC 646:1991, that is to say Unicode U+0027 ('); the font used in ISO/IEC 14977:1996(E) renders it very much like the acute, Unicode U+00B4 (´), so confusion sometimes arises. However, the ISO Extended BNF standard invokes ISO/IEC 646:1991, "ISO 7-bit coded character set for information interchange", as a normative reference and makes no mention of any other character sets, so formally, there is no confusion with Unicode characters outside the 7-bit ASCII range.
As examples, the following syntax rules illustrate the facilities for expressing repetition:
aa = "A";
bb = 3 * aa, "B";
cc = 3 * [aa], "C";
dd = {aa}, "D";
ee = aa, {aa}, "E";
ff = 3 * aa, 3 * [aa], "F";
gg = {3 * aa}, "G";
Terminal strings defined by these rules are as follows:
aa: A
bb: AAAB
cc: C AC AAC AAAC
dd: D AD AAD AAAD AAAAD etc.
ee: AE AAE AAAE AAAAE AAAAAE etc.
ff: AAAF AAAAF AAAAAF AAAAAAF
gg: G AAAG AAAAAAG etc.
space = ? US-ASCII character 32 ?;
The second facility for extension is using the fact that parentheses cannot in EBNF be placed next to identifiers (they must be concatenated with them). The following is valid EBNF:
something = foo, ( bar );
The following is not valid EBNF:
something = foo ( bar );
Therefore, an extension of EBNF could use that notation. For example, in a Lisp grammar, function application could be defined by the following rule:
function application = list( symbol , { expression } );
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, Extended Backus–Naur Form (EBNF) is a family of metasyntax
Metasyntax
A metasyntax describes the allowable structure and composition of phrases and sentences of a metalanguage, which is used to describe either a natural language or a computer programming language...
notations used for expressing context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
s: that is, a formal way to describe computer programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
s and other formal language
Formal language
A formal language is a set of words—that is, finite strings of letters, symbols, or tokens that are defined in the language. The set from which these letters are taken is the alphabet over which the language is defined. A formal language is often defined by means of a formal grammar...
s. They are extensions of the basic Backus–Naur Form
Backus–Naur form
In computer science, BNF is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols.It is applied wherever exact descriptions of...
(BNF) metasyntax notation.
The earliest EBNF was originally developed by Niklaus Wirth
Niklaus Wirth
Niklaus Emil Wirth is a Swiss computer scientist, best known for designing several programming languages, including Pascal, and for pioneering several classic topics in software engineering. In 1984 he won the Turing Award for developing a sequence of innovative computer languages.-Biography:Wirth...
. However, many variants of EBNF are in use. The International Organization for Standardization
International Organization for Standardization
The International Organization for Standardization , widely known as ISO, is an international standard-setting body composed of representatives from various national standards organizations. Founded on February 23, 1947, the organization promulgates worldwide proprietary, industrial and commercial...
has adopted an EBNF standard (ISO/IEC 14977). This article uses EBNF as specified by the ISO for examples applying to all EBNF:s. Other EBNF variants use somewhat different syntactic conventions.
Basics
EBNF is a codeCode
A code is a rule for converting a piece of information into another form or representation , not necessarily of the same type....
that expresses the grammar of a computer language. An EBNF consists of terminal symbol and non-terminal production rules which are the restrictions governing how terminal symbols can be combined into a legal sequence. Examples of terminal symbols include alphanumeric characters, punctuation marks, and white space characters.
The EBNF defines production rule
Production rule
Production rule may refer to:*For production rules used in business rule engines, cognitive modeling and artificial intelligence, see production system*For production rules that expand nodes in formal grammars, see formal grammar-See also:...
s where sequences of symbols are respectively assigned to a nonterminal
Terminal and nonterminal symbols
In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules that constitute a formal grammar...
:
digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit = "0" | digit excluding zero ;
This production rule defines the nonterminal digit which is on the left side of the assignment. The vertical bar represents an alternative and the terminal symbols are enclosed with quotation marks followed by a semicolon as terminating character. Hence a Digit is a 0 or a digit excluding zero that can be 1 or 2 or 3 and so forth until 9.
A production rule can also include a sequence of terminals or nonterminals, each separated by a comma:
twelve = "1" , "2" ;
two hundred one = "2" , "0" , "1" ;
three hundred twelve = "3" , twelve ;
twelve thousand two hundred one = twelve , two hundred one ;
Expressions that may be omitted or repeated can be represented through curly braces { ... }:
natural number = digit excluding zero , { digit } ;
In this case, the strings 1, 2, ...,10,...,12345,... are correct expressions. To represent this, everything that is set within the curly braces may be repeated arbitrarily often, including not at all.
An option can be represented through squared brackets [ ... ]. That is, everything that is set within the square brackets may be present just once, or not at all:
integer = "0" | [ "-" ] , natural number ;
Therefore an integer is a zero (0) or a natural number that may be preceded by an optional minus sign.
EBNF also provides, among other things, the syntax to describe repetitions of a specified number of times, to exclude some part of a production, or to insert comments in an EBNF grammar.
Table of symbols
The following represents a proposed standard.Usage | Notation |
---|---|
definition | = |
concatenation | , |
termination | ; |
alternation | > |
option | [ ... ] |
repetition | { ... } |
grouping | ( ... ) |
terminal string | " ... " |
terminal string | ' ... ' |
comment | (* ... *) |
special sequence | ? ... ? |
exception | - |
Another example
A Pascal-like programming language that allows only assignments can be defined in EBNF as follows:(* a simple program syntax in EBNF − Wikipedia *)
program = 'PROGRAM' , white space , identifier , white space ,
'BEGIN' , white space ,
{ assignment , ";" , white space } ,
'END.' ;
identifier = alphabetic character , { alphabetic character | digit } ;
number = [ "-" ] , digit , { digit } ;
string = '"' , { all characters − '"' } , '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
A syntactically correct program then would be:
PROGRAM DEMO1
BEGIN
A0:=3;
B:=45;
H:=-100023;
C:=A;
D123:=B34A;
BABOON:=GIRAFFE;
TEXT:="Hello world!";
END.
The language can easily be extended with control flow
Control flow
In computer science, control flow refers to the order in which the individual statements, instructions, or function calls of an imperative or a declarative program are executed or evaluated....
s, arithmetical expressions, and Input/Output instructions. Then a small, usable programming language would be developed.
Advantages of EBNF over BNF
The BNF had the problem that options and repetitions could not be directly expressed. Instead, they needed the use of an intermediate rule or alternative production defined to be either nothing or the optional production for option, or either the repeated production or itself, recursively, for repetition. The same constructs can still be used in EBNF.The BNF used the symbols (<, >, |, ::=) for itself, but did not include quotes around terminal strings. This prevented these characters from being used in the languages, and required a special symbol for the empty string. In EBNF, terminals are strictly enclosed within quotation marks ("..." or '...'). The angle brackets ("<...>") for nonterminals
Terminal and nonterminal symbols
In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules that constitute a formal grammar...
can be omitted.
BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon, marks the end of a rule.
Furthermore, EBNF includes mechanisms for enhancements, defining the number of repetitions, excluding alternatives, comments, etc.
It should be noted that EBNF is not "more powerful" than the BNF: As a matter of principle, any grammar
Grammar
In linguistics, grammar is the set of structural rules that govern the composition of clauses, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology, syntax, and phonology, often complemented by phonetics, semantics,...
defined in EBNF can also be represented in BNF (though representations in the latter are generally lengthier).
Conventions
1. The following conventions are used:- Each meta-identifier of Extended BNF is written as one or more words joined together by hyphenHyphenThe hyphen is a punctuation mark used to join words and to separate syllables of a single word. The use of hyphens is called hyphenation. The hyphen should not be confused with dashes , which are longer and have different uses, or with the minus sign which is also longer...
s - A meta-identifier ending with -symbol is the name of a terminal symbol of Extended BNF.
2. The normal character representing each operator of Extended BNF and its implied precedence is (highest precedence at the top):
* repetition-symbol
- except-symbol
, concatenate-symbol
| definition-separator-symbol
= defining-symbol
; terminator-symbol
3. The normal precedence is overridden by the following bracket pairs:
' first-quote-symbol first-quote-symbol '
" second-quote-symbol second-quote-symbol "
(* start-comment-symbol end-comment-symbol *)
( start-group-symbol end-group-symbol )
[ start-option-symbol end-option-symbol ]
{ start-repeat-symbol end-repeat-symbol }
? special-sequence-symbol special-sequence-symbol ?
The first-quote-symbol is the apostrophe
Apostrophe
The apostrophe is a punctuation mark, and sometimes a diacritic mark, in languages that use the Latin alphabet or certain other alphabets...
as defined by ISO/IEC 646:1991, that is to say Unicode U+0027 ('); the font used in ISO/IEC 14977:1996(E) renders it very much like the acute, Unicode U+00B4 (´), so confusion sometimes arises. However, the ISO Extended BNF standard invokes ISO/IEC 646:1991, "ISO 7-bit coded character set for information interchange", as a normative reference and makes no mention of any other character sets, so formally, there is no confusion with Unicode characters outside the 7-bit ASCII range.
As examples, the following syntax rules illustrate the facilities for expressing repetition:
aa = "A";
bb = 3 * aa, "B";
cc = 3 * [aa], "C";
dd = {aa}, "D";
ee = aa, {aa}, "E";
ff = 3 * aa, 3 * [aa], "F";
gg = {3 * aa}, "G";
Terminal strings defined by these rules are as follows:
aa: A
bb: AAAB
cc: C AC AAC AAAC
dd: D AD AAD AAAD AAAAD etc.
ee: AE AAE AAAE AAAAE AAAAAE etc.
ff: AAAF AAAAF AAAAAF AAAAAAF
gg: G AAAG AAAAAAG etc.
EBNF extensibility
According to the ISO 14977 standard EBNF is meant to be extensible, and two facilities are mentioned. The first is part of EBNF grammar, the special sequence, which is arbitrary text enclosed with question marks. The interpretation of the text inside a special sequence is beyond the scope of the EBNF standard. For example, the space character could be defined by the following rule:space = ? US-ASCII character 32 ?;
The second facility for extension is using the fact that parentheses cannot in EBNF be placed next to identifiers (they must be concatenated with them). The following is valid EBNF:
something = foo, ( bar );
The following is not valid EBNF:
something = foo ( bar );
Therefore, an extension of EBNF could use that notation. For example, in a Lisp grammar, function application could be defined by the following rule:
function application = list( symbol , { expression } );
Related work
- The W3C used a different EBNF to specify the XMLXMLExtensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....
syntax. - The British Standards Institution published a standard for an EBNF: BS 6154 in 1981.
- The IETF uses Augmented BNFAugmented Backus–Naur formIn computer science, Augmented Backus–Naur Form is a metalanguage based on Backus–Naur Form , but consisting of its own syntax and derivation rules. The motive principle for ABNF is to describe a formal system of a language to be used as a bidirectional communications protocol...
(ABNF), specified in RFC 5234.
See also
- Augmented Backus–Naur FormAugmented Backus–Naur formIn computer science, Augmented Backus–Naur Form is a metalanguage based on Backus–Naur Form , but consisting of its own syntax and derivation rules. The motive principle for ABNF is to describe a formal system of a language to be used as a bidirectional communications protocol...
- Backus–Naur FormBackus–Naur formIn computer science, BNF is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols.It is applied wherever exact descriptions of...
- Regular expressionRegular expressionIn computing, a regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp"...
- Spirit Parser FrameworkSpirit Parser FrameworkThe Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of Extended Backus Naur Form completely in C++...
- Wirth syntax notationWirth syntax notationWirth syntax notation is a metasyntax, that is, a formal way to describe formal languages. Originally proposed by Niklaus Wirth in 1977 as an alternative to Backus-Naur form , it has several advantages over BNF in that it can be defined using itself, it contains an explicit iteration construct,...
- Phrase structure rulesPhrase structure rulesPhrase-structure rules are a way to describe a given language's syntax. They are used to break down a natural language sentence into its constituent parts namely phrasal categories and lexical categories...
, the direct equivalent of EBNF in natural languages.
External links
- Article "EBNF: A Notation to Describe Syntax (PDF)" by Richard E. PattisRichard E. PattisRichard E. Pattis was a professor of Computer Science at Carnegie Mellon University. He is the author of the Karel programming language, and published Karel the Robot: A gentle introduction to the art of programming....
describing the functions and syntax of EBNF - Article "BNF and EBNF: What are they and how do they work?" by Lars Marius Garshol
- Article "The Naming of Parts" by John E. Simpson
- ISO/IEC 14977 : 1996(E)
- RFC 4234 – Augmented BNF for Syntax Specifications: ABNF
- BNF/EBNF variants – a table by Pete Jinks comparing several syntaxes.
- Create syntax diagrams from EBNF
- EBNF Parser & Renderer
- Parser & Renderer in PHP5