Programming language specification
Encyclopedia
In computing
Computing
Computing is usually defined as the activity of using and improving computer hardware and software. It is the computer-specific part of information technology...

, a programming language specification is an artifact that defines a programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

 so that user
Programmer
A programmer, computer programmer or coder is someone who writes computer software. The term computer programmer can refer to a specialist in one area of computer programming or to a generalist who writes code for many kinds of software. One who practices or professes a formal approach to...

s and implementors
Programming language implementation
A programming language implementation is a system for executing programs written in a programming language.There are two general approaches to programming language implementation:...

 can agree on what programs in that language mean.

A programming language specification can take several forms, including the following:
  • An explicit definition of the syntax and semantics of the language. While syntax is commonly specified using a formal grammar, semantic definitions may be written in natural language
    Natural language
    In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...

     (e.g., the approach taken for the C language
    C (programming language)
    C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

    ), or a formal semantics
    Formal semantics of programming languages
    In programming language theory, semantics is the field concerned with the rigorous mathematical study of the meaning of programming languages and models of computation...

     (e.g., the Standard ML
    Standard ML
    Standard ML is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of theorem provers.SML is a modern descendant of the ML...

    and Scheme specifications).
  • A description of the behavior of a translator
    Compiler
    A compiler is a computer program that transforms source code written in a programming language into another computer language...

     for the language (e.g., the C++
    C++
    C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

     and Fortran
    Fortran
    Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...

    ). The syntax and semantics of the language has to be inferred from this description, which may be written in natural or a formal language.
  • A model implementation, sometimes written in the language being specified (e.g., the Prolog
    Prolog
    Prolog is a general purpose logic programming language associated with artificial intelligence and computational linguistics.Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is declarative: the program logic is expressed in terms of...

    ). The syntax and semantics of the language are explicit in the behavior of the model implementation.

Syntax

Syntax
Syntax
In linguistics, syntax is the study of the principles and rules for constructing phrases and sentences in natural languages....

 in a programming language is usually described using a combination of
  • regular expression
    Regular expression
    In computing, a regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp"...

    s to describe lexeme
    Lexeme
    A lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of forms taken by a single word. For example, in the English language, run, runs, ran and running are forms of the same lexeme, conventionally written as RUN...

    s, and
  • context-free grammar
    Context-free grammar
    In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....

    s to describe how lexemes may be combined to form a valid program.

Semantics

Formulating a rigorous semantics of a large, complex, practical programming language is a daunting task even for experienced specialists, and the resulting specification can be difficult for anyone but experts to understand. The following are some of the ways in which programming language semantics can be described; all languages use at least one of these description methods, and some languages combine more than one:
  • Natural language
    Natural language
    In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...

    : Description by human natural language.
  • Formal semantics
    Formal semantics of programming languages
    In programming language theory, semantics is the field concerned with the rigorous mathematical study of the meaning of programming languages and models of computation...

    : Description by mathematics
    Mathematics
    Mathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...

    .
  • Reference implementations: Description by computer program
    Computer program
    A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

  • Test suites: Description by examples of programs and their expected behaviors. While few language specifications start off in this form, the evolution of some language specifications has been influenced by the semantics of a test suite (eg, in the past the specification of Ada
    Ada (programming language)
    Ada is a structured, statically typed, imperative, wide-spectrum, and object-oriented high-level computer programming language, extended from Pascal and other languages...

     has been modified to match the behavior of the Ada Conformity Assessment Test Suite
    Ada Conformity Assessment Test Suite
    The Ada Conformity Assessment Test Suite is the test suite used for Ada processor conformity testing.The preface to the test report includes the following:The second paragraph of the background of the current says:...

    ).

Natural language

Most widely-used languages are specified using natural language descriptions of their semantics. This description usually takes the form of a reference manual for the language. These manuals can run to hundreds of pages. For example, the print version of The Java Language Specification, 3rd Ed. is 596 pages long.

The imprecision of natural language as a vehicle for describing programming language semantics can lead to problems with interpreting the specification. For example, the semantics of Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

 threads were specified in English, and it was later discovered that the specification did not provide adequate guidance for implementors.

Formal semantics

Formal semantics are grounded in mathematics. As a result, they can be more precise and less ambiguous than semantics given in natural language. However, supplemental natural language descriptions of the semantics are often included to aid understanding of the formal definitions. For example, The ISO Standard for Modula-2
Modula-2
Modula-2 is a computer programming language designed and developed between 1977 and 1980 by Niklaus Wirth at ETH Zurich as a revision of Pascal to serve as the sole programming language for the operating system and application software for the personal workstation Lilith...

 contains both a formal and a natural language definition on opposing pages.

Programming languages whose semantics are described formally can reap many benefits. For example:
  • Formal semantics enable mathematical proofs of program correctness;
  • Formal semantics facilitate the design of type system
    Type system
    A type system associates a type with each computed value. By examining the flow of these values, a type system attempts to ensure or prove that no type errors can occur...

    s, and proofs about the soundness of those type systems;
  • Formal semantics can establish unambiguous and uniform standards for implementations of a language.


Automatic tool support can help to realize many of these benefits. For example, an automated theorem prover or theorem checker can increase a programmer's (or language designer's) confidence in the correctness of proofs about programs (or the language itself). The power and scalability of these tools varies widely: full formal verification
Formal verification
In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics .- Usage :Formal verification can be...

 is computationally intensive, rarely scales beyond programs containing a few hundred lines and may require considerable manual assistance from a programmer; more lightweight tools such as model checkers require fewer resources and have been used on programs containing tens of thousands of lines; many compilers apply static type checks
Type system
A type system associates a type with each computed value. By examining the flow of these values, a type system attempts to ensure or prove that no type errors can occur...

 to any program they compile.

Reference implementation

A reference implementation is a single implementation of a programming language that is designated as authoritative. The behavior of this implementation is held to define the proper behavior of a program written in the language. This approach has several attractive properties. First, it is precise, and requires no human interpretation: disputes as to the meaning of a program can be settled simply by executing the program on the reference implementation (provided that the implementation behaves deterministically for that program).

On the other hand, defining language semantics through a reference implementation also has several potential drawbacks. Chief among them is that it conflates limitations of the reference implementation with properties of the language. For example, if the reference implementation has a bug, then that bug must be considered to be an authoritative behavior. Another drawback is that programs written in this language may rely on quirks in the reference implementation, hindering portability across different implementations.

Nevertheless, several languages have successfully used the reference implementation approach. For example, the Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

 interpreter is considered to define the authoritative behavior of Perl programs. In the case of Perl, the Open Source
Open source
The term open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology...

model of software distribution has contributed to the fact that nobody has ever produced another implementation of the language, so the issues involved in using a reference implementation to define the language semantics are moot.

Test suite

Defining the semantics of a programming language in terms of a test suite involves writing a number of example programs in the language, and then describing how those programs ought to behave — perhaps by writing down their correct outputs. The programs, plus their outputs, are called the "test suite" of the language. Any correct language implementation must then produce exactly the correct outputs on the test suite programs.

The chief advantage of this approach to semantic description is that it is easy to determine whether a language implementation passes a test suite. The user can simply execute all the programs in the test suite, and compare the outputs to the desired outputs. However, when used by itself, the test suite approach has major drawbacks as well. For example, users want to run their own programs, which are not part of the test suite; indeed, a language implementation that could only run the programs in its test suite would be largely useless. But a test suite does not, by itself, describe how the language implementation should behave on any program not in the test suite; determining that behavior requires some extrapolation on the implementor's part, and different implementors may disagree. In addition, it is difficult to use a test suite to test behavior that is intended or allowed to be nondeterministic.

Therefore, in common practice, test suites are used only in combination with one of the other language specification techniques, such as a natural language description or a reference implementation.

Language specifications

A few examples of official or draft language specifications:
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK