Abstract interpretation
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, abstract interpretation is a theory of sound approximation of the semantics
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....

 of computer programs, based on monotonic functions over ordered set
Ordered set
In order theory in mathematics, a set with a binary relation R on its elements that is reflexive , antisymmetric and transitive is described as a partially ordered set or poset...

s, especially lattice
Lattice (order)
In mathematics, a lattice is a partially ordered set in which any two elements have a unique supremum and an infimum . Lattices can also be characterized as algebraic structures satisfying certain axiomatic identities...

s. It can be viewed as a partial execution
Execution (computers)
Execution in computer and software engineering is the process by which a computer or a virtual machine carries out the instructions of a computer program. The instructions in the program trigger sequences of simple actions on the executing machine...

 of a computer program
Computer program
A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

 which gains information about its semantics
Semantics
Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....

 (e.g. control-flow, data-flow
Data-flow analysis
Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program. A program's control flow graph is used to determine those parts of a program to which a particular value assigned to a variable might propagate. The...

) without performing all the calculation
Calculation
A calculation is a deliberate process for transforming one or more inputs into one or more results, with variable change.The term is used in a variety of senses, from the very definite arithmetical calculation of using an algorithm to the vague heuristics of calculating a strategy in a competition...

s.

Its main concrete application is formal static analysis
Static code analysis
Static program analysis is the analysis of computer software that is performed without actually executing programs built from that software In most cases the analysis is performed on some version of the source code and in the other cases some form of the object code...

, the automatic extraction of information about the possible executions of computer programs; such analyses have two main usages:
  • inside compiler
    Compiler
    A compiler is a computer program that transforms source code written in a programming language into another computer language...

    s, to analyse programs in order to decide whether certain optimization
    Optimization (computer science)
    In computer science, program optimization or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources...

    s or transformation
    Program transformation
    A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular formal semantics and in fewer cases the transformations result in programs...

    s are applicable;
  • for debugging
    Debugging
    Debugging is a methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece of electronic hardware, thus making it behave as expected. Debugging tends to be harder when various subsystems are tightly coupled, as changes in one may cause bugs to emerge...

     or even the certification of programs against classes of bugs.


Abstract interpretation was formalized by Patrick Cousot
Patrick Cousot
Patrick Cousot is a French computer scientist.Together with his wife Radhia, Cousot is the originator of abstract interpretation, an influential technique in formal methods. In the 2000s, he has worked on practical methods of static analysis for critical embedded software, such as found in avionics...

 and Radhia Cousot.

Intuition

This article now illustrates what abstract interpretation means, by using real-world, non-computing, examples.

Consider the people in a conference room. To prove that some persons were not present, one concrete method is to look up a list of the names and some identifiers unique to them, like a social security number
Social Security number
In the United States, a Social Security number is a nine-digit number issued to U.S. citizens, permanent residents, and temporary residents under section 205 of the Social Security Act, codified as . The number is issued to an individual by the Social Security Administration, an independent...

 in the United States, of all participants. Since two different people cannot have the same number, it is possible to prove or disprove the presence of a participant simply by looking up his or her number in the list.

However it is possible that only the names of attendees were registered. If the name of a person is not found in the list, we may safely conclude that that person was not present; but if it is, we cannot conclude definitely without further inquiries, due to the possibility of homonym
Homonym
In linguistics, a homonym is, in the strict sense, one of a group of words that often but not necessarily share the same spelling and the same pronunciation but have different meanings...

s (for example two people named John Smith). Note that this imprecise information will still be adequate for most purposes, because homonyms are rare in practice. However, in all rigor, we cannot say for sure that somebody was present in the room; all we can say is that he or she was possibly here. If the person we are looking up is a criminal, we will issue an alarm; but there is of course the possibility of issuing a false alarm. Similar phenomena will occur in the analysis of programs.

If we are only interested in some specific information, say, "was there a person of age n in the room", keeping a list of all names and dates of births is unnecessary. We may safely and without loss of precision restrict ourselves to keeping a list of the participants' ages. If this is already too much to handle, we might keep only the minimal m and maximal M ages. If the question is about an age strictly lower than m or strictly higher than M, then we may safely respond that no such participant was present. Otherwise, we may only be able to say that we do not know.

In the case of computing, concrete, precise information is in general not computable within finite time and memory (see Rice's theorem
Rice's theorem
In computability theory, Rice's theorem states that, for any non-trivial property of partial functions, there is no general and effective method to decide whether an algorithm computes a partial function with that property...

 and the halting problem
Halting problem
In computability theory, the halting problem can be stated as follows: Given a description of a computer program, decide whether the program finishes running or continues to run forever...

). Abstraction
Abstraction
Abstraction is a process by which higher concepts are derived from the usage and classification of literal concepts, first principles, or other methods....

 is used to allow for vague answers to questions (for example, answering "maybe" to a yes/no question); this simplifies the problems making them amenable to automatic solutions. One crucial requirement is to add enough vagueness so as to make problems manageable while still retaining enough precision for answering the important questions (such as "may the program crash?").

Abstract interpretation of computer programs

Given a programming or specification language, abstract interpretation consists in giving several semantics linked by relations of abstraction. A semantics is a mathematical characterization of a possible behavior of the program. The most precise semantics, describing very closely the actual execution of the program, are called the concrete semantics. For instance, the concrete semantics of an imperative programming language may associate to each program the set of execution traces it may produce – an execution trace being a sequence of possible consecutive states of the execution of the program; a state typically consists of the value of the program counter and the memory locations (globals, stack and heap). More abstract semantics are then derived; for instance, one may consider only the set of reachable states in the executions (which amounts to considering the last states in finite traces).

The goal of static analysis is to derive a computable semantic interpretation at some point. For instance, one may choose to represent the state of a program manipulating integer variables by forgetting the actual values of the variables and only keeping their signs (+, − or 0). For some elementary operations, such as multiplication
Multiplication
Multiplication is the mathematical operation of scaling one number by another. It is one of the four basic operations in elementary arithmetic ....

, such an abstraction does not lose any precision: to get the sign of a product, it is sufficient to know the sign of the operands. For some other operations, the abstraction may lose precision: for instance, it is impossible to know the sign of a sum whose operands are respectively positive and negative.

Sometimes a loss of precision is necessary to make the semantics decidable (see Rice's theorem
Rice's theorem
In computability theory, Rice's theorem states that, for any non-trivial property of partial functions, there is no general and effective method to decide whether an algorithm computes a partial function with that property...

, halting problem
Halting problem
In computability theory, the halting problem can be stated as follows: Given a description of a computer program, decide whether the program finishes running or continues to run forever...

). In general, there is a compromise to be made between the precision of the analysis and its decidability (computability), or tractability (complexity
Complexity
In general usage, complexity tends to be used to characterize something with many parts in intricate arrangement. The study of these complex linkages is the main goal of complex systems theory. In science there are at this time a number of approaches to characterizing complexity, many of which are...

).

In practice the abstractions that are defined are tailored to both the program properties one desires to analyze, and to the set of target programs. The first large scale automated analysis of computer programs with abstract interpretation can be attributed to an accident that resulted in the destruction of the first flight of the Ariane 5
Ariane 5 Flight 501
Cluster was a constellation of four European Space Agency spacecraft which were launched on the maiden flight of the Ariane 5 rocket, Flight 501, and subsequently lost when that rocket failed to achieve orbit. The launch, which took place on Tuesday, June 4, 1996, ended in failure due to an error...

 rocket in 1996.

Formalization

Let L be an ordered set, called a concrete set and let L′ be another ordered set, called an abstract set. These two sets are related to each other by defining total functions that map elements from one to the other.

A function α is called an abstraction function if it maps an element x in the concrete set L to an element α(x) in the abstract set L′. That is, element α(x) in L′ is the abstraction of x in L.

A function γ is called a concretization function if it maps an element x′ in the abstract set L′ to an element γ(x′) in the concrete set L. That is, element γ(x′) in L is a concretization of x′ in L′.

Let L1, L2, L1 and L2 be ordered sets. The concrete semantics f is a monotonic function from L1 to L2. A function f′ from L1 to L2 is said to be a valid abstraction of f if for all x′ in L1, (f ∘ γ)(x′) ≤ (γ ∘ f′)(x′).

Program semantics are generally described using fixed point
Fixed point (mathematics)
In mathematics, a fixed point of a function is a point that is mapped to itself by the function. A set of fixed points is sometimes called a fixed set...

s in the presence of loops or recursive procedures. Let us suppose that L is a complete lattice and let f be a monotonic function from L into L. Then, any x′ such that f′(x′) ≤ x′ is an abstraction of the least fixed-point of f, which exists, according to the Knaster–Tarski theorem.

The difficulty is now to obtain such an x′. If L′ is of finite height, or at least verifies the «ascending chain condition» (all ascending sequences are ultimately stationary), then such an x′ may be obtained as the stationary limit of the ascending sequence xn defined by induction as follows: x0=⊥ (the least element of L′) and xn+1=f′(xn).

In other cases, it is still possible to obtain such an x′ through a widening operator ∇: for all x and y, xy should be greater or equal than both x and y, and for any sequence yn, the sequence defined by x0=⊥ and xn+1=xnyn is ultimately stationary. We can then take yn=f′(xn).

In some cases, it is possible to define abstractions using Galois connection
Galois connection
In mathematics, especially in order theory, a Galois connection is a particular correspondence between two partially ordered sets . The same notion can also be defined on preordered sets or classes; this article presents the common case of posets. Galois connections generalize the correspondence...

s (α, γ) where α is from L to L′ and γ is from L′ to L. This supposes the existence of best abstractions, which is not necessarily the case. For instance, if we abstract sets of couples (x, y) of real number
Real number
In mathematics, a real number is a value that represents a quantity along a continuum, such as -5 , 4/3 , 8.6 , √2 and π...

s by enclosing convex polyhedra
Polyhedron
In elementary geometry a polyhedron is a geometric solid in three dimensions with flat faces and straight edges...

, there is no optimal abstraction to the disc defined by x2+y2 ≤ 1.

Examples of abstract domains

One can assign to each variable x available at a given program point an interval [Lx, Hx]. A state assigning the value v(x) to variable x will be a concretization of these intervals if for all x, then v(x) is in [Lx, Hx]. From the intervals [Lx, Hx] and [Ly, Hy] for variables x and y, one can easily obtain intervals for x+y ([Lx+Ly, Hx+Hy]) and for xy ([LxHy, HxLy]); note that these are exact abstractions, since the set of possible outcomes for, say, x+y, is precisely the interval ([Lx+Ly, Hx+Hy]). More complex formulas can be derived for multiplication, division, etc., yielding so-called interval arithmetics.

Let us now consider the following very simple program:

y = x;
z = x - y;

With reasonable arithmetic types, the result for z should be zero. But if we do interval arithmetics starting from x in [0, 1], one gets z in [−1, +1]. While each of the operations taken individually was exactly abstracted, their composition isn't.

The problem is evident: we did not keep track of the equality relationship between x and y; actually, this domain of intervals does not take into account any relationships between variables, and is thus a non-relational domain. Non-relational domains tend to be fast and simple to implement, but imprecise.

Some examples of relational numerical abstract domains are:
  • congruence relations on integers
  • convex polyhedra – with some high computational costs
  • "octagons"
  • difference-bound matrices
  • linear equalities

and combinations thereof.

When one chooses an abstract domain, one typically has to strike a balance between keeping fine-grained relationships, and high computational costs.

Tools

  • Abstract Rewriting Machine
    Abstract Rewriting Machine
    The Abstract Rewriting Machine is a virtual machine which implements term rewriting for minimal term rewriting systems.Minimal term rewriting systems are left-linear term rewriting systems in which each rule takes on one of six forms:...

  • Polyspace
    Polyspace
    Polyspace is a static code analysis tool inspired by the failure of the maiden flight of Ariane 5 where a run time error resulted in destruction of the launch vehicle. It is the first example of large-scale static code analysis by abstract interpretation to detect and prove the absence of certain...

  • CodeSonar
    GrammaTech
    GrammaTech is a software-development tools vendor based in Ithaca, New York. The company was founded in 1988 as a technology spin-off of Cornell University...

  • Coverity Prevent
    Coverity
    Coverity is a software vendor based in San Francisco. It was incorporated in November 2002. It develops static code analysis tools, for C, C++ and other programming languages, used to find defects and security vulnerabilities in source code...

  • Klocwork Insight
    Klocwork
    Klocwork is a software company with headquarters in Burlington, MA and R&D based in Ottawa, ON, Canada. Klocwork was founded in 2001 as a spin-out of Nortel Networks and has over 850 customers who use its software development tools.-Products:...

  • Parasoft Jtest
    Jtest
    Jtest is an automated Java testing and static code analysis product that is made by Parasoft. It aims to improve Java code reliability, functionality, security, performance, and maintainability. Basic functionality includes Unit test-case generation, static analysis, regression testing, runtime...

  • Parasoft C/C++test
    Parasoft
    Parasoft is an independent software vendor with headquarters in Monrovia, California. It was founded in 1987 by five graduates of the California Institute of Technology who had been working on Caltech Cosmic Cube....

  • Red Lizard's Goanna
    Red Lizard Software
    Red Lizard Software is a privately held software vendor for static analysis tools. The company was founded in 2009 as a spinout from NICTA, after four years of research. Its headquarters are in Sydney, Australia.-Products:...


See also

  • Standard interpretation
    Interpreter (computing)
    In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language...

  • Model checking
    Model checking
    In computer science, model checking refers to the following problem:Given a model of a system, test automatically whether this model meets a given specification....

  • Symbolic simulation
    Symbolic simulation
    In computer science, a simulation is a computation of the execution of some appropriately modelled state-transition system. Typically this process models the complete state of the system at individual points in a discrete linear time frame, computing each state sequentially from its predecessor...

  • Symbolic execution
    Symbolic execution
    In computer science, symbolic execution refers to the analysis of programs by tracking symbolic rather than actual values, a case of abstract interpretation. The field of symbolic simulation applies the same concept to hardware...


External links


Lecture notes
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK