History of compiler writing
In computing, a compiler is a computer program that transforms source code written in a programming language or computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

Any program written in a high-level programming language must be translated to object code before it can be executed, so all programmers using such a language use a compiler or an interpreter. Thus, compilers are very important to programmers. Any improvement to a compiler leads to a large number of improved programs.

Compilers are large and complex programs, but systematic analysis and research by computer scientists have led to a clearer understanding of compiler construction, and a large body of theory has been developed around it. Research into compiler construction has led to tools that make it much easier to create compilers, so that today computer science students can create their own small language and develop a simple compiler for it in a few weeks.

First compilers

Software for early computers was primarily written in assembly language. It is usually more productive for a programmer to use a high-level language, and programs written in a high-level language can be reused on different kinds of computers. Even so, it took a while for compilers to become established, because they generated code that did not perform as well as hand-written assembler, they were daunting development projects in their own right, and the very limited memory capacity of early computers created many technical problems for practical compiler implementations.

The first compiler was written by Grace Hopper, in 1952, for the A-0 System language. The term compiler was coined by Hopper. The FORTRAN team led by John W. Backus at IBM is generally credited as having introduced the first complete compiler, in 1957. The first FORTRAN compiler took 18 person-years to create.

By 1960, an extended Fortran compiler, ALTAC, was also available on the Philco 2000, so it is probable that a Fortran program was compiled for both computer architectures in mid-1960. The first known demonstrated cross-platform high-level language was COBOL. In a demonstration in December 1960, a COBOL program was compiled and executed on both the UNIVAC II and the RCA 501.

The COBOL compiler for the UNIVAC II was probably the first to be written in a high-level language, namely FLOW-MATIC, by a team led by Grace Hopper.

Self-hosting compilers

As with any other software, there are benefits to implementing a compiler in a high-level language. In particular, a compiler can be self-hosting, that is, written in the programming language it compiles. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by a compiler written in a different language, or by running the compiler in an interpreter.

NELIAC

The Navy Electronics Laboratory International ALGOL Compiler, or NELIAC, was a dialect and compiler implementation of the ALGOL 58 programming language, developed by the Naval Electronics Laboratory in 1958.

NELIAC was the brainchild of Harry Huskey, then Chairman of the ACM and a well-known computer scientist, and was supported by Maury Halstead, the head of the computational center at NEL. The earliest version was implemented on the prototype USQ-17 computer (called the Countess) at the laboratory. It was the world's first self-compiling compiler. This means that the compiler was first coded in simplified form in assembly language (the bootstrap), then rewritten in its own language and compiled by this bootstrap compiler, and finally recompiled by itself, making the bootstrap obsolete.

Lisp

The first self-hosting compiler (excluding assemblers) was written for Lisp by Tim Hart and Mike Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.

"The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression definition of the compiler work on itself through the interpreter." (AI Memo 39)

This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the proof that the halting problem is undecidable.

Context-free grammars and parsers

A parser is an important component of a compiler: it analyzes the source code of a programming language to create some form of internal representation. Programming languages tend to be specified in terms of a context-free grammar because fast and efficient parsers can be written for them. Parsers can be written by hand or generated by a parser generator. A context-free grammar provides a simple and precise mechanism for describing the block structure of a program, that is, how programming language constructs are built from smaller blocks; for example, a rule such as <block> ::= begin <statement list> end says that a block is a list of statements bracketed by begin and end. The formalism of context-free grammars was developed in the mid-1950s by Noam Chomsky.

Block structure was introduced into computer programming languages by the ALGOL project (1957–1960), which, as a consequence, also featured a context-free grammar to describe the resulting ALGOL syntax.

Context-free grammars are simple enough to allow the construction of efficient parsing algorithms which, for a given string, determine whether and how it can be generated from the grammar. If a programming language designer is willing to work within some limited subsets of context-free grammars, more efficient parsers are possible.

LR parsing

The LR parser (left to right) was invented by Donald Knuth in 1965 in a paper, "On the Translation of Languages from Left to Right". An LR parser is a parser that reads input from Left to right (as it would appear if visually displayed) and produces a Rightmost derivation. The term LR(k) parser is also used, where k refers to the number of unconsumed "lookahead" input symbols that are used in making parsing decisions.

Knuth proved that LR(k) grammars can be parsed with an execution time essentially proportional to the length of the program, and that every LR(k) grammar for k > 1 can be mechanically transformed into an LR(1) grammar for the same language. In other words, it is only necessary to have one symbol of lookahead to parse any deterministic context-free grammar (DCFG).

Korenjak (1969) was the first to show that parsers for programming languages could be produced using these techniques. Frank DeRemer devised the more practical SLR and LALR techniques, published in his PhD dissertation at MIT in 1969. This was an important breakthrough, because LR(k) translators, as defined by Donald Knuth, were much too large for implementation on computer systems in the 1960s and 1970s.

In practice, LALR offers a good solution, because LALR(1) grammars are more powerful than SLR(1) and LL(1) grammars. LR(1) grammars are more powerful than LALR(1); however, canonical LR(1) parsers can be extremely large and are not considered practical. Minimal LR(1) parsers are small and comparable in size to LALR(1) parsers. The syntax of many programming languages is defined by a grammar that is LALR(1), and for this reason LALR parsers are often used by compilers to perform syntax analysis of source code.

A recursive ascent parser implements an LALR parser using mutually recursive functions rather than tables. Thus, the parser is directly encoded in the host language, similar to recursive descent. Direct encoding usually yields a parser which is faster than its table-driven equivalent, for the same reason that compilation is faster than interpretation. It is also (in principle) possible to hand-edit a recursive ascent parser, whereas a tabular implementation is nigh unreadable to the average human.
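
To make the idea concrete, here is a minimal recursive-ascent recognizer in Python for the toy grammar S' -> E $, E -> E + n | n. This is only a sketch: the grammar, the state numbering, and all names are invented for illustration, not taken from Pennello's paper. Each LR(0) state becomes one function, and returning from a function pops one parser state.

    # Recursive ascent for the toy grammar:  S' -> E $ ;  E -> E + n | n
    # Each LR(0) state is one function; returning from a function pops one
    # parser state. A reduction by a rule with k right-hand-side symbols
    # returns k-1, the number of frames still to pop before the goto fires.
    def parse(tokens):
        i = 0                                  # index of the next input token

        def state0():                          # items: S'->.E$  E->.E+n  E->.n
            nonlocal i
            tok = tokens[i]
            if tok[0] != 'n':
                raise SyntaxError('expected number')
            i += 1
            pops, nt, val = state2(tok[1])     # shift n
            while True:                        # goto on E re-enters state 1
                assert pops == 0 and nt == 'E'
                r = state1(val)
                if r[1] == 'accept':
                    return r[2]
                pops, nt, val = r

        def state1(e):                         # items: S'->E.$  E->E.+n
            nonlocal i
            tok = tokens[i]
            if tok[0] == '$':
                return (0, 'accept', e)
            if tok[0] == '+':
                i += 1
                pops, nt, val = state3(e)      # shift +
                return (pops - 1, nt, val)     # no goto here: keep popping
            raise SyntaxError("expected '+' or end of input")

        def state2(n):                         # item: E -> n .
            return (0, 'E', n)                 # reduce; this return is the pop

        def state3(e):                         # item: E -> E + . n
            nonlocal i
            tok = tokens[i]
            if tok[0] != 'n':
                raise SyntaxError('expected number')
            i += 1
            pops, nt, val = state4(e, tok[1])  # shift n
            return (pops - 1, nt, val)

        def state4(e, n):                      # item: E -> E + n .
            return (2, 'E', e + n)             # reduce: 3 symbols, 2 pops left

        return state0()

    print(parse([('n', 1), ('+',), ('n', 2), ('+',), ('n', 3), ('$',)]))  # 6

The goto on E is the loop in state0; reductions simply return through the call stack, which is what makes the encoding direct rather than table-driven.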

Recursive ascent was first described by Thomas Pennello in his article "Very fast LR parsing" in 1986. The technique was later expounded upon by G. H. Roberts in 1988, as well as in an article by Leermakers, Augusteijn, and Kruseman Aretz in 1992 in the journal Theoretical Computer Science.

LL parsing

An LL parser parses the input from Left to right and constructs a Leftmost derivation of the sentence (hence LL, as opposed to LR). The class of grammars which are parsable in this way is known as the LL grammars. LL grammars are an even more restricted class of context-free grammars than LR grammars. Nevertheless, they are of great interest to compiler writers, because such a parser is simple and efficient to implement.

LL(k) grammars can be parsed by a recursive descent parser, which is usually coded by hand.
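
As a sketch of hand-coded recursive descent (the grammar here is an invented example), each rule of an LL(1) grammar becomes one procedure, and a single token of lookahead decides which branch to take:

    # Recursive descent for the LL(1) grammar:
    #   expr -> term (('+' | '-') term)*
    #   term -> NUMBER | '(' expr ')'
    # One procedure per rule; each procedure consumes the tokens its rule
    # derives and returns the value of the phrase it parsed.
    def parse(tokens):
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def eat(expected):
            nonlocal pos
            if peek() != expected:
                raise SyntaxError('expected %r, got %r' % (expected, peek()))
            pos += 1

        def expr():
            value = term()
            while peek() in ('+', '-'):        # one token of lookahead
                op = peek()
                eat(op)
                value = value + term() if op == '+' else value - term()
            return value

        def term():
            nonlocal pos
            tok = peek()
            if isinstance(tok, int):           # NUMBER
                pos += 1
                return tok
            eat('(')
            value = expr()
            eat(')')
            return value

        result = expr()
        if peek() is not None:
            raise SyntaxError('trailing input')
        return result

    print(parse([1, '+', '(', 2, '-', 3, ')']))   # prints 0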

The design of ALGOL sparked investigation of recursive descent, since the ALGOL language itself is recursive. The concept of recursive descent parsing was discussed in the January 1961 issue of CACM in separate papers by A. A. Grau and Edgar T. "Ned" Irons. Richard Waychoff independently conceived of and used recursive descent in the Burroughs ALGOL compiler in March 1961.

The idea of LL(1) grammars was introduced by Lewis and Stearns (1968).

Recursive descent was popularised by Niklaus Wirth with PL/0, an educational programming language used to teach compiler construction in the 1970s.

LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting: it detects syntactic errors as soon as the input fails to conform to the grammar.

Earley parser

In 1970, Jay Earley invented what came to be known as the Earley parser. Earley parsers are appealing because they can parse all context-free languages reasonably efficiently.
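
For illustration, here is a compact Earley recognizer in Python. It is a sketch with an invented grammar encoding, and it omits the back-pointers a full parser keeps in order to recover parse trees; worst-case running time is cubic in the input length.

    # Earley recognizer. An item (lhs, rhs, dot, origin) means: while trying
    # to derive lhs -> rhs starting at input position 'origin', everything
    # before 'dot' has already been matched.
    def recognize(grammar, start, tokens):
        n = len(tokens)
        chart = [set() for _ in range(n + 1)]
        for rhs in grammar[start]:
            chart[0].add((start, rhs, 0, 0))
        for i in range(n + 1):
            changed = True
            while changed:                     # run predict/complete to a fixed point
                changed = False
                for lhs, rhs, dot, origin in list(chart[i]):
                    if dot < len(rhs):
                        sym = rhs[dot]
                        if sym in grammar:     # predict: expand a nonterminal
                            for prod in grammar[sym]:
                                item = (sym, prod, 0, i)
                                if item not in chart[i]:
                                    chart[i].add(item)
                                    changed = True
                        elif i < n and tokens[i] == sym:   # scan a terminal
                            chart[i + 1].add((lhs, rhs, dot + 1, origin))
                    else:                      # complete: lhs fully matched
                        for plhs, prhs, pdot, porig in list(chart[origin]):
                            if pdot < len(prhs) and prhs[pdot] == lhs:
                                item = (plhs, prhs, pdot + 1, porig)
                                if item not in chart[i]:
                                    chart[i].add(item)
                                    changed = True
        return any(lhs == start and dot == len(rhs) and origin == 0
                   for lhs, rhs, dot, origin in chart[n])

    # An ambiguous grammar, E -> E + E | n, which no LR(k) parser accepts as-is:
    grammar = {'E': [('E', '+', 'E'), ('n',)]}
    print(recognize(grammar, 'E', ['n', '+', 'n', '+', 'n']))   # True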

Grammar description languages

John Backus proposed "metalinguistic formulas" to describe the syntax of the new programming language IAL, known today as ALGOL 58 (1959). Backus's work was based on the Post canonical system devised by Emil Post.

Further development of ALGOL led to ALGOL 60; in its report (1963), Peter Naur named Backus's notation Backus Normal Form (BNF), and simplified it to minimize the character set used. However, Donald Knuth argued that BNF should rather be read as Backus–Naur Form, and that has become the commonly accepted usage.

Niklaus Wirth defined Extended Backus–Naur Form (EBNF), a refined version of BNF, in the early 1970s for PL/0. Augmented Backus–Naur Form (ABNF) is another variant. Both EBNF and ABNF are widely used to specify the grammar of programming languages, as the inputs to parser generators, and in other fields such as defining communication protocols.
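
For comparison, here is the same rule written first in classic BNF and then in ISO-style EBNF (the rule itself is an invented example). In ISO EBNF the comma is concatenation, the braces mean "zero or more", and a semicolon ends the rule, so the repetition braces remove the need for BNF's recursive alternative:

    BNF:    <integer> ::= <digit> | <digit> <integer>
    EBNF:   integer = digit, { digit } ;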

Parser generators

A parser generator, or compiler-compiler, is a program that takes a description of the formal grammar of a programming language and produces a parser for that language.

The first compiler-compiler to use that name was written by Tony Brooker in 1960 and was used to create compilers for the Atlas computer at the University of Manchester, including the Atlas Autocode compiler. However, it was rather different from modern compiler-compilers, and today would probably be described as being somewhere between a highly customisable generic compiler and an extensible-syntax language. The name "compiler-compiler" was far more appropriate for Brooker's system than it is for most modern compiler-compilers, which are more accurately described as parser generators. It is almost certain that the "Compiler Compiler" name has entered common use due to Yacc rather than Brooker's work being remembered.

In the early 1960s, Robert McClure at Texas Instruments invented a compiler-compiler called TMG, the name taken from "transmogrification". In the following years TMG was ported to several UNIVAC and IBM mainframe computers.

The Multics project, a joint venture between MIT and Bell Labs, was one of the first to develop an operating system in a high-level language. PL/I was chosen as the language, but an external supplier could not supply a working compiler. The Multics team developed their own subset dialect of PL/I, known as Early PL/I (EPL), as their implementation language in 1964. TMG was ported to the GE-600 series and used to develop EPL by Douglas McIlroy, Robert Morris, and others.

Not long after Ken Thompson wrote the first version of Unix for the PDP-7 in 1969, Doug McIlroy created the new system's first higher-level language: an implementation of McClure's TMG. TMG was also the compiler definition tool used by Ken Thompson to write the compiler for the B language on his PDP-7 in 1970. B was the immediate ancestor of C.

An early LALR parser generator was called "TWS", created by Frank DeRemer and Tom Pennello.

XPL

XPL is a dialect of the PL/I programming language, developed in 1967, used for the development of compilers for computer languages. It was designed and implemented by a team of William McKeeman, James J. Horning, and David B. Wortman at Stanford University and the University of California, Santa Cruz. It was first announced at the 1968 Fall Joint Computer Conference in San Francisco, California.

XPL is the name of both the programming language and the LALR parser generator system (or TWS: translator writing system) based on the language.

XPL featured a relatively simple bottom-up compiler system dubbed MSP (mixed strategy precedence) by its authors. It was bootstrapped through Burroughs Algol onto the IBM System/360 computer. Subsequent implementations of XPL featured an SLR(1) parser.

yacc

yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler". It generates an LALR(1) parser based on a grammar written in a notation similar to Backus–Naur Form.
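
yacc itself emits C, but the flavor of its notation is easy to show with PLY, a third-party Python reimplementation of the lex/yacc interface. This is a sketch: the grammar is an invented example, and it assumes the ply package is installed. Token rules come first, then BNF-like productions whose actions assign a value to the left-hand side:

    # yacc-style grammar with PLY: lexer rules, then productions with actions.
    import ply.lex as lex
    import ply.yacc as yacc

    tokens = ('NUMBER', 'PLUS')
    t_PLUS = r'\+'
    t_ignore = ' \t'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(t):
        raise SyntaxError('illegal character %r' % t.value[0])

    def p_expr_plus(p):
        'expr : expr PLUS NUMBER'
        p[0] = p[1] + p[3]

    def p_expr_number(p):
        'expr : NUMBER'
        p[0] = p[1]

    def p_error(p):
        raise SyntaxError('syntax error')

    lexer = lex.lex()
    parser = yacc.yacc()               # builds the LALR(1) tables
    print(parser.parse('1 + 2 + 3'))   # prints 6

Bison and the other yacc descendants keep essentially this shape: grammar rules with embedded actions, and a generated LALR(1) automaton underneath.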

Johnson worked on yacc in the early 1970s at Bell Labs. He was familiar with TMG, and its influence can be seen in yacc and in the design of the C programming language. Because yacc was the default parser generator on most Unix systems, it was widely distributed and used. Derivatives such as GNU Bison are still in use.

The parser generated by yacc requires a lexical analyzer. Lexical analyzer generators, such as Lex or Flex, are widely available. The IEEE POSIX P1003.2 standard defines the functionality and requirements for both Lex and Yacc.

See also

  • Comparison of parser generators - for a more complete list, which also includes LL, SLR, GLR and LR parser generators.

Cross compilation

A cross compiler runs in one environment but produces object code for another. Cross compilers are used for embedded development, where the target computer has limited capabilities.

An early example of cross compilation was AIMICO, where a FLOW-MATIC program on a UNIVAC II was used to generate assembly language for the IBM 705, which was then assembled on the IBM computer.

The ALGOL 68C compiler generated ZCODE output, which could then be either compiled into the local machine code by a ZCODE translator or run interpreted. ZCODE is a register-based intermediate language. This ability to interpret or compile ZCODE encouraged the porting of ALGOL 68C to numerous different computer platforms.
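
To illustrate what a register-based intermediate language buys: the same instruction list can be handed to a translator that emits native code or to a small interpreter. The mnemonics below are invented for this sketch, not actual ZCODE.

    # An invented register-based IR for the assignment  a := b + c * d.
    ir = [
        ('LOAD',  'r1', 'b'),
        ('LOAD',  'r2', 'c'),
        ('LOAD',  'r3', 'd'),
        ('MUL',   'r2', 'r2', 'r3'),
        ('ADD',   'r1', 'r1', 'r2'),
        ('STORE', 'a',  'r1'),
    ]

    # A minimal interpreter for this IR: one way to "run interpreted".
    def run(ir, mem):
        regs = {}
        for op, *args in ir:
            if op == 'LOAD':
                regs[args[0]] = mem[args[1]]
            elif op == 'ADD':
                regs[args[0]] = regs[args[1]] + regs[args[2]]
            elif op == 'MUL':
                regs[args[0]] = regs[args[1]] * regs[args[2]]
            elif op == 'STORE':
                mem[args[0]] = regs[args[1]]
        return mem

    print(run(ir, {'b': 2, 'c': 3, 'd': 4}))   # adds 'a': 14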

Optimizing compilers

Compiler optimization is the process of improving the quality of object code without changing the results it produces.

The developers of the first FORTRAN compiler aimed to generate code that was better than the average hand-coded assembler, so that customers would actually use their product. In one of the first real compilers, they often succeeded.

Later compilers, like IBM's Fortran IV compiler, placed more priority on good diagnostics and executing more quickly, at the expense of object code optimization. It was not until the IBM System/360 series that IBM provided two separate compilers: a fast-executing code checker, and a slower optimizing one.

Frances E. Allen, working alone and jointly with John Cocke, introduced many of the concepts for optimization. Allen's 1966 paper, Program Optimization, introduced the use of graph data structures to encode program content for optimization. Her 1970 papers, Control Flow Analysis and A Basis for Program Optimization, established intervals as the context for efficient and effective data flow analysis and optimization. Her 1971 paper with Cocke, A Catalog of Optimizing Transformations, provided the first description and systematization of optimizing transformations. Her 1973 and 1974 papers on interprocedural data flow analysis extended the analysis to whole programs. Her 1976 paper with Cocke describes one of the two main analysis strategies used in optimizing compilers today.

Allen developed and implemented her methods as part of compilers for the IBM 7030 Stretch-Harvest and the experimental Advanced Computing System. This work established the feasibility and structure of modern machine- and language-independent optimizers. She went on to establish and lead the PTRAN project on the automatic parallel execution of FORTRAN programs. Her PTRAN team developed new parallelism detection schemes and created the concept of the program dependence graph, the primary structuring method used by most parallelizing compilers.

Programming Languages and their Compilers, by John Cocke and Jacob T. Schwartz, published early in 1970, devoted more than 200 pages to optimization algorithms. It included many of the now familiar techniques such as redundant code elimination and strength reduction.
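
As a sketch of one of those techniques: strength reduction replaces an expensive operation with a cheaper one that is equivalent in context, classically turning a multiplication by the loop index into a running addition. The code below is illustrative, not taken from the book.

    # Before strength reduction: one multiplication per iteration.
    def addresses_mul(n, element_size):
        return [i * element_size for i in range(n)]

    # After strength reduction: the multiply becomes a running sum,
    # trading a multiplication for an addition on each iteration.
    def addresses_add(n, element_size):
        out, addr = [], 0
        for _ in range(n):
            out.append(addr)
            addr += element_size        # addr tracks i * element_size
        return out

    assert addresses_mul(5, 4) == addresses_add(5, 4) == [0, 4, 8, 12, 16]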

Peephole optimization

Peephole optimization is a very simple but effective optimization technique. It was invented by William McKeeman and published in 1965 in CACM. It was used in the XPL compiler that McKeeman helped develop.
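
The technique slides a small window (the "peephole") over the instruction stream and rewrites locally wasteful patterns. Here is a minimal sketch over an invented toy instruction set:

    # Peephole optimization over a toy instruction list: examine adjacent
    # instructions and drop patterns that do no useful work.
    def peephole(code):
        out = []
        for inst in code:
            # pattern: a load immediately after a store to the same
            # location is redundant -- the value is already in place.
            if (out and inst[0] == 'LOAD' and out[-1][0] == 'STORE'
                    and out[-1][1] == inst[1]):
                continue
            # pattern: adding zero is a no-op.
            if inst[0] == 'ADDI' and inst[2] == 0:
                continue
            out.append(inst)
        return out

    code = [('STORE', 'x'), ('LOAD', 'x'), ('ADDI', 'acc', 0), ('LOAD', 'y')]
    print(peephole(code))   # [('STORE', 'x'), ('LOAD', 'y')]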

Capex COBOL Optimizer

Capex Corporation developed the "COBOL Optimizer" in the mid-1970s for COBOL. This type of optimizer depended, in this case, upon knowledge of 'weaknesses' in the standard IBM COBOL compiler, and actually replaced (or patched) sections of the object code with more efficient code. The replacement code might replace a linear table lookup with a binary search, for example, or sometimes simply replace a relatively 'slow' instruction with a known faster one that was otherwise functionally equivalent within its context. This technique is now known as "strength reduction". For example, on the IBM/360 hardware the CLI instruction was, depending on the particular model, between two and five times as fast as a CLC instruction for single-byte comparisons.

Modern compilers typically provide optimization options, so programmers can choose whether or not to execute an optimization pass.

Diagnostics

When a compiler is given a syntactically incorrect program, a good, clear error message is helpful; from the perspective of the compiler writer, this is often difficult to achieve.

The WATFIV Fortran compiler was developed at the University of Waterloo, Canada, in the late 1960s. It was designed to give better error messages than IBM's Fortran compilers of the time. In addition, WATFIV was far more usable, because it combined compiling, linking, and execution into one step, whereas IBM's compilers had three separate components to run.

PL/C

PL/C was a computer programming language developed at Cornell University in the early 1970s. While PL/C was a subset of IBM's PL/I language, it was designed with the specific goal of being used for teaching programming. The two researchers and academic teachers who designed PL/C were Richard W. Conway and Thomas R. Wilcox. They submitted the famous article "Design and implementation of a diagnostic compiler for PL/I", published in the Communications of the ACM in March 1973.

PL/C eliminated some of the more complex features of PL/I, and added extensive debugging and error recovery facilities. The PL/C compiler had the unusual capability of never failing to compile any program, through the use of extensive automatic correction of many syntax errors and by converting any remaining syntax errors to output statements.

See also

  • History of programming languages
  • Lex (and the Flex lexical analyser), the tokenizer commonly used in conjunction with yacc (and Bison)
  • BNF, a metasyntax used to express context-free grammars: that is, a formal way to describe formal languages

Further reading

  • Backus, John, et al., "The FORTRAN Automatic Coding System", Proceedings of the Western Joint Computer Conference, Los Angeles, California, February 1957. Describes the design and implementation of the first FORTRAN compiler by the IBM team.
  • Bauer, Friedrich L.; Eickel, Jürgen (eds.), Compiler Construction, An Advanced Course, 2nd ed., Lecture Notes in Computer Science 21, Springer, 1976. ISBN 3-540-07542-9.
  • Cheatham, T. E.; Sattley, K., "Syntax directed compilation", SJCC, p. 31 (1964).
  • Cocke, John; Schwartz, Jacob T., Programming Languages and their Compilers: Preliminary Notes, Courant Institute of Mathematical Sciences technical report, New York University, 1969.
  • Conway, Melvin E., "Design of a separable transition-diagram compiler", Communications of the ACM, Volume 6, Issue 7 (July 1963).
  • Floyd, R. W., "Syntactic analysis and operator precedence", Journal of the ACM, Vol. 10, p. 316 (July 1963).
  • Gries, David, Compiler Construction for Digital Computers, New York: Wiley, 1971. ISBN 047132776X.
  • Irons, Edgar T., "A syntax directed compiler for ALGOL 60", Communications of the ACM, Vol. 4, p. 51 (Jan. 1961).
  • Knuth, D. E., "On the translation of languages from left to right", Information and Control, Vol. 8, p. 607 (1965).
  • Knuth, D. E., "RUNCIBLE - algebraic translation on a limited computer", Communications of the ACM, Vol. 2, p. 18 (Nov. 1959).
  • Randell, Brian; Russell, Lawford John, ALGOL 60 Implementation: The Translation and Use of ALGOL 60 Programs on a Computer, Academic Press, 1964.
  • Dick Grune's Annotated Literature Lists - Compiler Construction before 1980: http://www.dickgrune.com/Summaries/CS/CompilerConstruction-1979.html
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 