Intentional Programming
In computer programming, intentional programming is a collection of concepts which enable software source code to reflect the precise information, called intention, which programmers had in mind when conceiving their work. By closely matching the level of abstraction at which the programmer was thinking, browsing and maintaining computer programs becomes easier.
The concept was introduced by long-time Microsoft employee Charles Simonyi, who led a team in Microsoft Research which developed an integrated development environment (IDE) called IP that demonstrates these concepts. For reasons that are unclear, Microsoft stopped working on intentional programming and ended development of IP in the early 2000s.
An overview of intentional programming is given in Chapter 11 of the book Generative Programming: Methods, Tools, and Applications.
Development cycle
As envisioned by Simonyi, developing a new application via the Intentional Programming paradigm proceeds as follows. A programmer first builds a toolbox specific to a given problem domain (such as life insurance). Domain experts, aided by the programmer, then describe the application's intended behavior in a WYSIWYG-like manner. Finally, an automated system uses the program description and the toolbox to generate the final program. Successive changes are done at the WYSIWYG level only, employing a system called the "domain workbench".
Separating source code storage and presentation
Key to the benefits of IP is that source code is not stored in text files, but in a binary file that bears a resemblance to XML. As with XML, there is no need for a specific parser for each piece of code that wishes to operate on the information that forms the program, lowering the barrier to writing analysis or restructuring tools.
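As a rough illustration of why a stored tree lowers the barrier for tools, the following sketch builds a small counting loop directly as a tree and analyses it without parsing any text. The `Node` and `TreeTools` classes and the node kinds (`"for"`, `"ref"`, `"literal"` and so on) are hypothetical names chosen for this example; they are not the actual IP file format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node: a kind tag, an optional payload, and child nodes.
class Node {
    final String kind;
    final String value;
    final List<Node> children = new ArrayList<>();

    Node(String kind, String value, Node... children) {
        this.kind = kind;
        this.value = value;
        this.children.addAll(Arrays.asList(children));
    }
}

class TreeTools {
    // Counts nodes of a given kind by walking the tree; no text is parsed.
    static int count(Node root, String kind) {
        int total = root.kind.equals(kind) ? 1 : 0;
        for (Node child : root.children) {
            total += count(child, kind);
        }
        return total;
    }

    // A loop that prints the numbers 1 to 10, built directly as a tree.
    static Node sampleLoop() {
        return new Node("for", null,
            new Node("declare", "i", new Node("literal", "1")),
            new Node("lessOrEqual", null,
                new Node("ref", "i"), new Node("literal", "10")),
            new Node("increment", null, new Node("ref", "i")),
            new Node("call", "println",
                new Node("concat", null,
                    new Node("literal", "the number is "), new Node("ref", "i"))));
    }

    public static void main(String[] args) {
        Node loop = sampleLoop();
        System.out.println("references: " + count(loop, "ref"));
        System.out.println("literals: " + count(loop, "literal"));
    }
}
```

Any other analysis or restructuring tool could reuse the same traversal, since the structure is already available and does not have to be recovered from text.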
Tight integration of the editor with the binary format brings some of the nicer features of database normalization to source code. Redundancy is eliminated by giving each definition a unique identity, and storing the names of variables and operators in exactly one place. This makes it easier to intrinsically distinguish declarations from references, and the environment shows declarations in boldface type. Whitespace is also not stored as part of the source code, and each programmer working on a project can choose how the source is indented when it is displayed. More radical visualizations include showing statement lists as nested boxes, editing conditional expressions as logic gates, or re-rendering names in Chinese.
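The separation of stored structure from its display can be sketched as follows. This is a minimal illustration under stated assumptions: the `Block` and `Projection` classes are hypothetical, and statements are kept as plain strings only to keep the example short. The stored block contains no whitespace, and each viewer supplies an indentation unit when rendering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stored form: a header plus statements and nested blocks, no whitespace.
class Block {
    final String header;
    final List<Object> items = new ArrayList<>(); // String statements or nested Blocks

    Block(String header) {
        this.header = header;
    }

    Block add(Object item) {
        items.add(item);
        return this;
    }
}

class Projection {
    // Renders the same stored block using a per-viewer indentation unit.
    static String render(Block block, String indentUnit) {
        StringBuilder out = new StringBuilder();
        render(block, indentUnit, "", out);
        return out.toString();
    }

    private static void render(Block block, String unit, String prefix, StringBuilder out) {
        out.append(prefix).append(block.header).append(" {\n");
        for (Object item : block.items) {
            if (item instanceof Block) {
                render((Block) item, unit, prefix + unit, out);
            } else {
                out.append(prefix).append(unit).append(item).append(";\n");
            }
        }
        out.append(prefix).append("}\n");
    }

    public static void main(String[] args) {
        Block program = new Block("while (running)")
            .add("step()")
            .add(new Block("if (done)").add("stop()"));
        System.out.print(render(program, "  "));  // one programmer prefers two spaces
        System.out.print(render(program, "\t"));  // another prefers tabs
    }
}
```

Both calls display the same stored program; only the projection differs, so no formatting change ever has to be saved or merged.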
The project appears to standardize a kind of XML Schema for popular languages like C++ and Java, while letting users of the environment mix and match these with ideas from Eiffel and other languages. Often mentioned in the same context as language-oriented programming via domain-specific languages, and aspect-oriented programming, IP purports to provide some breakthroughs in generative programming. These techniques allow developers to extend the language environment to capture domain-specific constructs without investing in writing a full compiler and editor for any new languages.
Example
A Java program that writes out the numbers from 1 to 10, using a curly bracket syntax, might look like this:
for (int i = 1; i <= 10; i++) {
    System.out.println("the number is " + i);
}
The code above contains a common construct of most programming languages, the bounded loop, in this case represented by the for construct. The code, when compiled and run, will loop 10 times, incrementing the value of i each time after printing it out.
But this code does not capture the intentions of the programmer, namely to "print the numbers 1 to 10". In this simple case, a programmer asked to maintain the code could likely figure out what it is intended to do, but it is not always so easy. Loops that extend across many lines, or pages, can become very difficult to understand, notably if the original programmer uses unclear labels. Traditionally the only way to indicate the intention of the code was to add source code comments, but often comments are not added, or are unclear, or drift out of sync with the source code they originally described.
In intentional programming systems the above loop could be represented, at some level, as something as obvious as "print the numbers 1 to 10". The system would then use the intentions to generate source code, likely something very similar to the code above. The key difference is that the intentional programming systems maintain the semantic level, which the source code lacks, and which can dramatically ease readability in larger programs.
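One way to picture this is an object that records the intention itself and can produce ordinary source code on request. The sketch below is an assumption-laden illustration, not how IP actually represents intentions: the `PrintNumbers` class and its `describe` and `generateJava` methods are hypothetical names introduced here.

```java
// Hypothetical intention: "print the numbers <from> to <to>".
class PrintNumbers {
    final int from;
    final int to;

    PrintNumbers(int from, int to) {
        this.from = from;
        this.to = to;
    }

    // The semantic-level description that the environment would keep and display.
    String describe() {
        return "print the numbers " + from + " to " + to;
    }

    // One possible reduction of the intention to ordinary Java source text.
    String generateJava() {
        return "for (int i = " + from + "; i <= " + to + "; i++) {\n"
             + "    System.out.println(\"the number is \" + i);\n"
             + "}\n";
    }

    public static void main(String[] args) {
        PrintNumbers intent = new PrintNumbers(1, 10);
        System.out.println(intent.describe());
        System.out.print(intent.generateJava());
    }
}
```

The generated loop can be thrown away and regenerated at any time, because the description it came from is what is stored.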
Although most languages contain mechanisms for capturing certain kinds of abstraction, IP, like the Lisp family of languages, allows for the addition of entirely new mechanisms. Thus, if a developer started with a language like C, they would be able to extend the language with features such as those in C++ without waiting for the compiler developers to add them. By analogy, many more powerful expression mechanisms could be used by programmers than mere classes and procedures.
Identity
IP focuses on the concept of identity. Since most programming languages represent the source code as plain text, objects are defined by names, and their uniqueness has to be inferred by the compiler. For example, the same symbolic name may be used to name different variables, procedures, or even types. In code that spans several pages – or, for globally visible names, multiple files – it can become very difficult to tell what symbol refers to what actual object. If a name is changed, the code where it is used must carefully be examined.
By contrast, in an IP system, all definitions not only assign symbolic names, but also unique private identifiers to objects. This means that in the IP development environment, every reference to a variable or procedure is not just a name – it is a link to the original entity.
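The idea of references as links rather than names can be sketched with a small table keyed by identity. The `DefinitionStore` class and the integer identifiers below are hypothetical simplifications made for this example. References are stored as identities, and the name is looked up only when the code is displayed, so renaming a definition changes a single table entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store: each definition has a private identity and one stored name.
class DefinitionStore {
    private final Map<Integer, String> names = new HashMap<>();
    private int nextId = 1;

    // Creates a definition and returns its identity.
    int define(String name) {
        int id = nextId++;
        names.put(id, name);
        return id;
    }

    // Renaming touches exactly one entry; references are left untouched.
    void rename(int id, String newName) {
        if (!names.containsKey(id)) {
            throw new IllegalArgumentException("unknown id " + id);
        }
        names.put(id, newName);
    }

    // Displays references (stored as identities) using the current names.
    String show(List<Integer> references) {
        StringBuilder out = new StringBuilder();
        for (int id : references) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(names.get(id));
        }
        return out.toString();
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        DefinitionStore store = new DefinitionStore();
        int first = store.define("to_string");   // defined in one namespace
        int second = store.define("to_string");  // same name, different identity
        List<Integer> refs = new ArrayList<>();
        refs.add(first);
        refs.add(second);
        refs.add(first);

        store.rename(first, "format_date");
        System.out.println(store.show(refs));
    }
}
```

Only the references that link to the renamed definition display the new name; the other definition that happened to share the old name is unaffected.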
The major advantage of this is that if an entity is renamed, all of the references to it in the program remain valid (known as referential integrity). This also means that if the same name is used for distinct definitions in different namespaces (such as ".to_string"), references with the same name but different identity will not be renamed, as sometimes happens with search/replace in current editors. This feature also makes it easy to have multi-language versions of the program; it can have a set of English-language names for all the definitions as well as a set of Japanese-language names which can be swapped in at will.
Having a unique identity for every defined object in the program also makes it easy to perform automated refactoring tasks, as well as simplifying code check-ins in versioning systems. For example, in many current code collaboration systems (e.g. CVS), when two programmers commit changes that conflict (for example, if one programmer renames a function while another changes one of the lines in that function), the versioning system will think that one programmer created a new function while another modified an old function. An IP versioning system will know that one programmer merely changed a name while another changed the code.
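The versioning scenario can be illustrated with a three-way merge that compares entities by identity and merges each field separately. This is a minimal sketch under stated assumptions: `FunctionVersion`, `IdentityMerge` and the per-field strategy are hypothetical and do not describe an actual IP versioning system.

```java
// Hypothetical versioned function: identity is fixed; name and body change independently.
class FunctionVersion {
    final int id;
    final String name;
    final String body;

    FunctionVersion(int id, String name, String body) {
        this.id = id;
        this.name = name;
        this.body = body;
    }
}

class IdentityMerge {
    // Three-way merge per field: take whichever side differs from the base.
    static FunctionVersion merge(FunctionVersion base, FunctionVersion mine,
                                 FunctionVersion theirs) {
        if (base.id != mine.id || base.id != theirs.id) {
            throw new IllegalArgumentException("versions belong to different entities");
        }
        return new FunctionVersion(base.id,
            pick(base.name, mine.name, theirs.name),
            pick(base.body, mine.body, theirs.body));
    }

    private static String pick(String base, String mine, String theirs) {
        if (mine.equals(theirs)) return mine;
        if (mine.equals(base)) return theirs;
        if (theirs.equals(base)) return mine;
        throw new IllegalStateException("both sides changed the same field");
    }

    public static void main(String[] args) {
        FunctionVersion base = new FunctionVersion(7, "calc", "return a + b;");
        FunctionVersion renamed = new FunctionVersion(7, "computeTotal", "return a + b;");
        FunctionVersion edited = new FunctionVersion(7, "calc", "return a + b + tax;");

        FunctionVersion merged = merge(base, renamed, edited);
        System.out.println(merged.name + ": " + merged.body);
    }
}
```

Because both edits carry the same identity, the rename and the body change are recognised as changes to one function and combine without a conflict.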
Levels of detail
IP systems also offer several levels of detail, allowing the programmer to "zoom in" or out. In the example above, the programmer could zoom out to get a level that would say something like:
<<print the numbers 1 to 10>>
Thus IP systems are self-documenting to a large degree, allowing the programmer to keep a good high-level picture of the program as a whole.
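Zooming can be pictured as rendering a tree of summaries down to a chosen depth. The `Outline` class below is a hypothetical structure used only for illustration: each node carries a one-line summary, and its children hold the next level of detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outline node: a summary line plus more detailed children.
class Outline {
    final String summary;
    final List<Outline> details = new ArrayList<>();

    Outline(String summary) {
        this.summary = summary;
    }

    Outline add(Outline detail) {
        details.add(detail);
        return this;
    }

    // Renders down to the requested level; level 0 shows only the summary.
    String render(int level) {
        StringBuilder out = new StringBuilder();
        render(level, "", out);
        return out.toString();
    }

    private void render(int level, String indent, StringBuilder out) {
        out.append(indent).append(summary).append('\n');
        if (level > 0) {
            for (Outline detail : details) {
                detail.render(level - 1, indent + "  ", out);
            }
        }
    }

    public static void main(String[] args) {
        Outline program = new Outline("print the numbers 1 to 10")
            .add(new Outline("for i from 1 to 10")
                .add(new Outline("println(\"the number is \" + i)")));
        System.out.print(program.render(0)); // zoomed out
        System.out.print(program.render(2)); // zoomed in
    }
}
```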
Similar works
There are projects that exploit similar ideas to create code with a higher level of abstraction. Among them are:
- Concept programming
- Language-oriented programming
- Domain-specific language
- Program transformation
- Semantic-oriented programming
- Literate programming
- Model-driven architecture (MDA)
- Software factory
- Metaprogramming
- Lisp (programming language)
See also
- Programming paradigm
- Code generation
- Object database
- Programming by demonstration
- Artefaktur
- Abstract syntax tree
- Semantic resolution tree
- Structure editor
External links
- Intentional Software - Charles Simonyi's company
- The Death Of Computer Languages, The Birth of Intentional Programming, a technical report by Charles Simonyi (1995): ftp://ftp.research.microsoft.com/pub/tr/tr-95-52.doc
- Intentional Programming - Innovation in the Legacy Age, a talk by Charles Simonyi (1996)
- Edge.org interview with Charles Simonyi (interviewer: John Brockman)
- Language Workbenches: The Killer-App for Domain Specific Languages? - Martin Fowler's article on the general class of tools of which Intentional Programming is an example.
- "Anything You Can Do, I Can Do Meta", by Scott Rosenberg, Technology Review, 9 January 2007
- Awaiting the Day When Everyone Writes Software, The New York Times, 28 January 2007
- Is programming a form of encryption?, by Charles Simonyi (2005)
- The information contents of programs, by Charles Simonyi (2005)
- Feature X Considered Harmful, by Charles Simonyi (2005)
- Notations and Programming Languages, by Charles Simonyi (2005)
- Personal Observations from a Developer, by Mark Edel (2005)