Formal semantics of programming languages

Programming language theory

Programming language theory is a branch of computer science that deals with the design, implementation, analysis, characterization, and classification of programming languages and their individual features. It falls within the discipline of computer science, both depending on and affecting...

, semantics is the field concerned with the rigorous mathematical study of the meaning of programming language

Programming language

A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

s and models of computation

Model of computation

In computability theory and computational complexity theory, a model of computation is the definition of the set of allowable operations used in computation and their respective costs...

. The formal semantics of a language is given by a mathematical model

Mathematical model

A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used not only in the natural sciences and engineering disciplines A mathematical model is a...

that describes the possible computations described by the language.

Overview

The field of formal semantics encompasses all of the following:

The definition of semantic models
The relations between different semantic models
The relations between different approaches to meaning
The relation between computation and the underlying mathematical structures from fields such as logic
Mathematical logic
Mathematical logic is a subfield of mathematics with close connections to foundations of mathematics, theoretical computer science and philosophical logic. The field includes both the mathematical study of logic and the applications of formal logic to other areas of mathematics...

, set theory
Set theory
Set theory is the branch of mathematics that studies sets, which are collections of objects. Although any type of object can be collected into a set, set theory is applied most often to objects that are relevant to mathematics...

, model theory
Model theory
In mathematics, model theory is the study of mathematical structures using tools from mathematical logic....

, category theory
Category theory
Category theory is an area of study in mathematics that examines in an abstract way the properties of particular mathematical concepts, by formalising them as collections of objects and arrows , where these collections satisfy certain basic conditions...

, etc.

It has close links with other areas of computer science

Computer science

Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

such as programming language design, type theory

Type theory

In mathematics, logic and computer science, type theory is any of several formal systems that can serve as alternatives to naive set theory, or the study of such formalisms in general...

, compiler

Compiler

A compiler is a computer program that transforms source code written in a programming language into another computer language...

s and interpreter

Interpreter (computing)

In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language...

s, program verification and a model checking

Model checking

In computer science, model checking refers to the following problem:Given a model of a system, test automatically whether this model meets a given specification....

Approaches

There are many approaches to formal semantics; these are many approaches belong to three major classes:

Denotational semantics
Denotational semantics
In computer science, denotational semantics is an approach to formalizing the meanings of programming languages by constructing mathematical objects which describe the meanings of expressions from the languages...

, whereby each phrase in the language is translated into a denotation, i.e. a phrase in some other language. Denotational semantics loosely corresponds to compilation
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

, although the "target language" is usually a mathematical formalism rather than another computer language. For example, denotational semantics of functional languages often translates the language into domain theory
Domain theory
Domain theory is a branch of mathematics that studies special kinds of partially ordered sets commonly called domains. Consequently, domain theory can be considered as a branch of order theory. The field has major applications in computer science, where it is used to specify denotational...

;

Operational semantics
Operational semantics
In computer science, operational semantics is a way to give meaning to computer programs in a mathematically rigorous way. Operational semantics are classified into two categories: structural operational semantics formally describe how the individual steps of a computation take place in a...

, whereby the execution of the language is described directly (rather than by translation). Operational semantics loosely corresponds to interpretation
Interpreter (computing)
In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language...

, although again the "implementation language" of the interpreter is generally a mathematical formalism. Operational semantics may define an abstract machine
Abstract machine
An abstract machine, also called an abstract computer, is a theoretical model of a computer hardware or software system used in automata theory...

(such as the SECD machine
SECD machine
The SECD machine is a highly influential virtual machine and abstract machine intended as a target for functional programming language compilers. The letters stand for Stack, Environment, Code, Dump, the internal registers of the machine...

), and give meaning to phrases by describing the transitions they induce on states of the machine. Alternatively, as with the pure lambda calculus
Lambda calculus
In mathematical logic and computer science, lambda calculus, also written as λ-calculus, is a formal system for function definition, function application and recursion. The portion of lambda calculus relevant to computation is now called the untyped lambda calculus...

, operational semantics can be defined via syntactic transformations on phrases of the language itself;

Axiomatic semantics
Axiomatic semantics
Axiomatic semantics is an approach based on mathematical logic to proving the correctness of computer programs. It is closely related to Hoare logic....

, whereby one gives meaning to phrases by describing the logic
Logic
In philosophy, Logic is the formal systematic study of the principles of valid inference and correct reasoning. Logic is used in most intellectual activities, but is studied primarily in the disciplines of philosophy, mathematics, semantics, and computer science...

al axiom
Axiom
In traditional logic, an axiom or postulate is a proposition that is not proven or demonstrated but considered either to be self-evident or to define and delimit the realm of analysis. In other words, an axiom is a logical statement that is assumed to be true...

s that apply to them. Axiomatic semantics makes no distinction between a phrase's meaning and the logical formulas that describe it; its meaning is exactly what can be proven about it in some logic. The canonical example of axiomatic semantics is Hoare logic
Hoare logic
Hoare logic is a formal system with a set of logical rules for reasoning rigorously about the correctness of computer programs. It was proposed in 1969 by the British computer scientist and logician C. A. R. Hoare, and subsequently refined by Hoare and other researchers...

.

The distinctions between the three broad classes of approaches can sometimes be vague, but all known approaches to formal semantics use the above techniques, or some combination thereof.

Apart from the choice between denotational, operational, or axiomatic approaches, most variation in formal semantic systems arises from the choice of supporting mathematical formalism.

Variations

Some variations of formal semantics include the following:

Action semantics
Action semantics
Action semantics is a framework for the formal specification of semantics of programming languages invented by David Watt and Peter D. Mosses. It is a mixture of denotational, operational and algebraic semantics....

is an approach that tries to modularize denotational semantics, splitting the formalization process in two layers (macro and microsemantics) and predefining three semantic entities (actions, data and yielders) to simplify the specification;
Algebraic semantics
Algebraic semantics
An programming language theory, the algebraic semantics of a programming language is a form of axiomatic semantics based on algebraic laws for describing and reasoning about program semantics in a formal manner....

describes semantics in terms of algebras;
Attribute grammar
Attribute grammar
An attribute grammar is a formal way to define attributes for the productions of a formal grammar, associating these attributes to values. The evaluation occurs in the nodes of the abstract syntax tree, when the language is processed by some parser or compiler....

s define systems that systematically compute "metadata
Metadata
The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

" (called attributes) for the various cases of the language's syntax
Syntax
In linguistics, syntax is the study of the principles and rules for constructing phrases and sentences in natural languages....

. Attribute grammars can be understood as a denotational semantics where the target language is simply the original language enriched with attribute annotations. Aside from formal semantics, attribute grammars have also been used for code generation in compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

s, and to augment regular or context-free grammars with context-sensitive conditions;
Categorical (or "functorial") semantics uses category theory
Category theory
Category theory is an area of study in mathematics that examines in an abstract way the properties of particular mathematical concepts, by formalising them as collections of objects and arrows , where these collections satisfy certain basic conditions...

as the core mathematical formalism;
Concurrency semantics
Concurrency semantics
In computer science, concurrency semantics is a way to give meaning to concurrent systems in a mathematically rigorous way. Concurrency semantics is often based on mathematical theories of concurrency such as various process calculi, the actor model, or Petri nets....

is a catch-all term for any formal semantics that describes concurrent computations. Historically important concurrent formalisms have included the Actor model
Actor model
In computer science, the Actor model is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent digital computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and...

and process calculi;
Game semantics
Game semantics
Game semantics is an approach to formal semantics that grounds the concepts of truth or validity on game-theoretic concepts, such as the existence of a winning strategy for a player, somewhat resembling Socratic dialogues or medieval theory of Obligationes. In the late 1950s Paul Lorenzen was the...

uses a metaphor inspired by game theory
Game theory
Game theory is a mathematical method for analyzing calculated circumstances, such as in games, where a person’s success is based upon the choices of others...

.
Predicate transformer semantics
Predicate transformer semantics
Predicate transformer semantics was introduced by Dijkstra in his seminal paper "Guarded commands, nondeterminacy and formal derivation of programs". They define the semantics of an imperative programming paradigm by assigning to each statement in this language a corresponding predicate...

, developed by Edsger W. Dijkstra, describes the meaning of a program fragment as the function transforming a postcondition
Postcondition
In computer programming, a postcondition is a condition or predicate that must always be true just after the execution of some section of code or after an operation in a formal specification. Postconditions are sometimes tested using assertions within the code itself...

to the precondition
Precondition
In computer programming, a precondition is a condition or predicate that must always be true just prior to the execution of some section of code or before an operation in a formal specification....

needed to establish it.

Describing relationships

For a variety of reasons, one might wish to describe the relationships between different formal semantics. For example:

To prove that a particular operational semantics for a language satisfies the logical formulas of an axiomatic semantics for that language. Such a proof demonstrates that it is "sound" to reason about a particular (operational) interpretation strategy using a particular (axiomatic) proof system.
To prove that operational semantics over a high-level machine is related by a bisimulation
Bisimulation
In theoretical computer science a bisimulation is a binary relation between state transition systems, associating systems which behave in the same way in the sense that one system simulates the other and vice-versa....

with the semantics over a low-level machine, whereby the low-level abstract machine contains more primitive operations than the high-level abstract machine definition of a given language. Such a proof demonstrates that the low-level machine "faithfully implements" the high-level machine.

It is also possible to relate multiple semantics through abstractions via the theory of abstract interpretation

Abstract interpretation

In computer science, abstract interpretation is a theory of sound approximation of the semantics of computer programs, based on monotonic functions over ordered sets, especially lattices. It can be viewed as a partial execution of a computer program which gains information about its semantics In...

Overview

Approaches

Variations

Describing relationships

Further reading