Type inference
Encyclopedia
Type inference refers to the automatic deduction of the type of an expression in a programming language
. If some, but not all, type annotations are already present it is referred to as type reconstruction.
It is a feature present in some strongly
statically typed languages. It is often characteristic of — but not limited to — functional programming languages in general. Some languages that include type inference are ML, OCaml, Haskell
, Scala, D
, Clean, Opa
and Go
. It has lately been added (to some extent) to Visual Basic (starting with version 9.0), C# (starting with version 3.0) and C++11. It is also planned for Perl 6
. The ability to infer types automatically makes many programming tasks easier, leaving the programmer free to omit type annotations
while still permitting type checking.
which describes the kind of data a particular value describes. In some languages, the type of a value is known only at run-time; these languages are dynamically typed. In other languages, the type of an expression is known at compile time
; these languages are statically typed. In statically typed languages, the input and output types of functions and local variable
s ordinarily must be explicitly provided by type annotations. For example, in C
:
The signature of this function definition,
, and returns an integer.
This looks very similar to how code is written in a dynamically typed language, but with some extra constraints (described below) it would be possible to infer the types of all the variables at compile time. In the example above, the compiler would infer that
In the imaginary language in which the last example is written, the compiler would assume that, in the absence of information to the contrary,
However, in the subsequent line, result2 is calculated by adding a decimal "
of a function, without explicit type annotations having been given. In many cases, it is possible to omit type annotations from a program completely if the type inference system is robust enough, or the program or language is simple enough.
To obtain the information required to infer the type of an expression, the compiler either gathers this information as an aggregate and subsequent reduction of the type annotations given for its subexpressions, or through an implicit understanding of the type of various atomic values (e.g. true : Bool; 42 : Integer; 3.14159 : Real; etc.). It is through recognition of the eventual reduction of expressions to implicitly typed atomic values that the compiler for a type inferring language is able to compile a program completely without type annotations. In the case of complex forms of higher-order programming and polymorphism, it is not always possible for the compiler to infer as much, however, and type annotations are occasionally necessary for disambiguation.
function
map f [] = []
map f (first:rest) = f first : map f rest
From this, it is evident that the function
So assuming that a list contains elements of the same type, we can reliably construct a type signature
map :: (a -> b) -> [a] -> [b]
where the syntax "
Note that the inferred type of
: The type of the arguments and results of
The origin of this algorithm is the type inference algorithm for the simply typed lambda calculus
, which was devised by Haskell Curry
and Robert Feys
in 1958.
In 1969 J. Roger Hindley
extended this work and proved that their algorithm always inferred the most general type.
In 1978 Robin Milner
, independently of Hindley's work, provided an equivalent algorithm, Algorithm W.
In 1982 Luis Damas finally proved that Milner's algorithm is complete and extended it to support systems with polymorphic references.
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
. If some, but not all, type annotations are already present it is referred to as type reconstruction.
It is a feature present in some strongly
Strongly-typed programming language
In computer science and computer programming, a type system is said to feature strong typing when it specifies one or more restrictions on how operations involving values of different data types can be intermixed...
statically typed languages. It is often characteristic of — but not limited to — functional programming languages in general. Some languages that include type inference are ML, OCaml, Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...
, Scala, D
D (programming language)
The D programming language is an object-oriented, imperative, multi-paradigm, system programming language created by Walter Bright of Digital Mars. It originated as a re-engineering of C++, but even though it is mainly influenced by that language, it is not a variant of C++...
, Clean, Opa
Opa (programming language)
Opa is an open source programming language for web applications.The language was first officially presented at the OWASP conference in 2010, and the source code was released onGitHubin June 2011, under a GNU Affero General Public License....
and Go
Go (programming language)
Go is a compiled, garbage-collected, concurrent programming language developed by Google Inc.The initial design of Go was started in September 2007 by Robert Griesemer, Rob Pike, and Ken Thompson. Go was officially announced in November 2009. In May 2010, Rob Pike publicly stated that Go was being...
. It has lately been added (to some extent) to Visual Basic (starting with version 9.0), C# (starting with version 3.0) and C++11. It is also planned for Perl 6
Perl 6
Perl 6 is a major revision to the Perl programming language. It is still in development, as a specification from which several interpreter and compiler implementations are being written. It is introducing elements of many modern and historical languages. Perl 6 is intended to have many...
. The ability to infer types automatically makes many programming tasks easier, leaving the programmer free to omit type annotations
Type signature
In computer science, a type signature or type annotation defines the inputs and outputs for a function, subroutine or method. A type signature includes at least the function name and the number of its arguments...
while still permitting type checking.
Nontechnical explanation
In most programming languages, all values have a typeData type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...
which describes the kind of data a particular value describes. In some languages, the type of a value is known only at run-time; these languages are dynamically typed. In other languages, the type of an expression is known at compile time
Compile time
In computer science, compile time refers to either the operations performed by a compiler , programming language requirements that must be met by source code for it to be successfully compiled , or properties of the program that can be reasoned about at compile time.The operations performed at...
; these languages are statically typed. In statically typed languages, the input and output types of functions and local variable
Local variable
In computer science, a local variable is a variable that is given local scope. Such a variable is accessible only from the function or block in which it is declared. In programming languages with only two levels of visibility, local variables are contrasted with global variables...
s ordinarily must be explicitly provided by type annotations. For example, in C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
:
The signature of this function definition,
int addone(int x)
, declares that addone
is a function which takes one argument, an integerInteger (computer science)
In computer science, an integer is a datum of integral data type, a data type which represents some finite subset of the mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values....
, and returns an integer.
int result;
declares that the local variable result
is an integer. In a hypothetical language supporting type inference, the code might be written like this instead:This looks very similar to how code is written in a dynamically typed language, but with some extra constraints (described below) it would be possible to infer the types of all the variables at compile time. In the example above, the compiler would infer that
result
and x
have type integer and addone
is a function int -> int
. The variable result2
isn't used in a legal manner, so it wouldn't have a type.In the imaginary language in which the last example is written, the compiler would assume that, in the absence of information to the contrary,
+
takes two integers and returns one integer. (This is how it works in, for example, OCaml). From this, the type inferencer can infer that the type of x+1
is an integer, which means result
is an integer and thus the return value of addone
is an integer. Similarly, since +
requires that both of its arguments be of the same type, x
must be an integer, and therefore addone
accepts one integer as an argument.However, in the subsequent line, result2 is calculated by adding a decimal "
1.0
" with floating-point arithmetic, causing a conflict in the use of x
for both integer and floating-point expressions. Such a situation would generate a compile-time error message. In a different language, result2 might have been implicitly declared as a floating-point variable, and the addition would implicitly convert x
to a floating point. However, languages that support type inference to the degree the above example illustrates rarely support such implicit type conversions. Such a situation shows the difference between type inference, which does not involve type conversion, and implicit type conversion, which forces data to a different data type, often without restrictions.Technical description
Type inference is the ability to automatically deduce, either partially or fully, the type of an expression at compile time. The compiler is often able to infer the type of a variable or the type signatureType signature
In computer science, a type signature or type annotation defines the inputs and outputs for a function, subroutine or method. A type signature includes at least the function name and the number of its arguments...
of a function, without explicit type annotations having been given. In many cases, it is possible to omit type annotations from a program completely if the type inference system is robust enough, or the program or language is simple enough.
To obtain the information required to infer the type of an expression, the compiler either gathers this information as an aggregate and subsequent reduction of the type annotations given for its subexpressions, or through an implicit understanding of the type of various atomic values (e.g. true : Bool; 42 : Integer; 3.14159 : Real; etc.). It is through recognition of the eventual reduction of expressions to implicitly typed atomic values that the compiler for a type inferring language is able to compile a program completely without type annotations. In the case of complex forms of higher-order programming and polymorphism, it is not always possible for the compiler to infer as much, however, and type annotations are occasionally necessary for disambiguation.
Example
For example, let us consider the HaskellHaskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...
function
map
, which applies a function to each element of a list, and may be defined as:map f [] = []
map f (first:rest) = f first : map f rest
From this, it is evident that the function
map
takes a list as its second argument, that its first argument f
is a function that can be applied to the type of elements of that list, and that the result of map
is constructed as a list with elements that are results of f
.So assuming that a list contains elements of the same type, we can reliably construct a type signature
map :: (a -> b) -> [a] -> [b]
where the syntax "
a -> b
" denotes a function that takes an a
as its parameter and produces a b
. "a -> b -> c
" is equivalent to "a -> (b -> c)
".Note that the inferred type of
map
is parametrically polymorphicParametric polymorphism
In programming languages and type theory, parametric polymorphism is a way to make a language more expressive, while still maintaining full static type-safety. Using parametric polymorphism, a function or a data type can be written generically so that it can handle values identically without...
: The type of the arguments and results of
f
are not inferred, but left as type variables, and so map
can be applied to functions and lists of various types, as long as the actual types match in each invocation.Hindley–Milner type inference algorithm
The algorithm first used to perform type inference is now informally referred to as the Hindley–Milner algorithm, although the algorithm should properly be attributed to Damas and Milner.The origin of this algorithm is the type inference algorithm for the simply typed lambda calculus
Simply typed lambda calculus
The simply typed lambda calculus , a formof type theory, is a typed interpretation of the lambda calculus with only one type constructor: \to that builds function types. It is the canonical and simplest example of a typed lambda calculus...
, which was devised by Haskell Curry
Haskell Curry
Haskell Brooks Curry was an American mathematician and logician. Curry is best known for his work in combinatory logic; while the initial concept of combinatory logic was based on a single paper by Moses Schönfinkel, much of the development was done by Curry. Curry is also known for Curry's...
and Robert Feys
Robert Feys
Robert Feys was a Belgian logician and philosopher, who worked at the University of Leuven .In 1958 Feys and Haskell B. Curry devised the type inference algorithm for the simply typed lambda calculus ....
in 1958.
In 1969 J. Roger Hindley
J. Roger Hindley
J. Roger Hindley is a prominent British logician best known for the Hindley–Milner type inference algorithm. Since 1998, he has been an Honorary Research Fellow at Swansea University.-Education:...
extended this work and proved that their algorithm always inferred the most general type.
In 1978 Robin Milner
Robin Milner
Arthur John Robin Gorell Milner FRS FRSE was a prominent British computer scientist.-Life, education and career:...
, independently of Hindley's work, provided an equivalent algorithm, Algorithm W.
In 1982 Luis Damas finally proved that Milner's algorithm is complete and extended it to support systems with polymorphic references.
External links
- Archived e-mail message by Roger Hindley, explains history of type inference
- Polymorphic Type Inference by Michael Schwartzbach, gives an overview of Polymorphic type inference.
- Principal type-schemes for functional programs. A re-typeset copy of the Damas and Milner paper which described the soundness and completeness proofs.
- Tutorial and complete implementation in Standard ML The tutorial includes some of the logical history of type systems as well as a detailed description of the algorithm as implemented. Some typographic errors in the original Damas Milner paper are corrected.
- Basic Typechecking paper by Luca Cardelli, describes algorithm, includes implementation in Modula-2Modula-2Modula-2 is a computer programming language designed and developed between 1977 and 1980 by Niklaus Wirth at ETH Zurich as a revision of Pascal to serve as the sole programming language for the operating system and application software for the personal workstation Lilith...
- Implementation of Hindley-Milner type inference in Scala, by Andrew Forrest (retrieved July 30, 2009)
- What is Hindley-Milner? (and why is it cool?) Explains Hindley-Milner, examples in Scala